preface

  • So just to summarize what we’ve done so far

  • The InnoDB data page consists of seven parts. Knowing that each data page can form a bidirectional linked list, and the records in each data page form a one-way linked list in order of primary key value, each data page will generate a page directory for the records stored in it. When searching for a record through the primary key, the binary method can be used to quickly locate the corresponding slot in the page directory, and then traverse the records in the corresponding group of the slot to quickly find the specified record

  • Page A, page B, page C… Pages n These pages may not be connected physically, but simply associated via a bidirectional linked list.

A, index,

  • Create a table:
mysql> CREATE TABLE index_demo(
    ->     c1 INT.->     c2 INT.->     c3 CHAR(1),
    ->     PRIMARY KEY(c1)
    -> ) ROW_FORMAT = Compact;
Query OK, 0 rows affected (0.03 sec)
Copy the code
  • This new oneindex_demoThere are two in the tableINTType column, 1CHAR(1)Type column, and we specify thatc1Column primary key used by this tableCompactLine format to actually store records. Let’s simplify this a little bit for our sakeindex_demoTable row format diagram:

We only show these parts of the record in the diagram:

  • record_type: An attribute of the record header indicating the type of record,0Represents common record,2Represents the minimum record,3Represents the maximum record,1We haven’t used it yet, but we’ll talk about it later
  • next_record: An attribute of the record header that represents the offset of the next address relative to this record. We use arrows to indicate who the next record is for ease of understanding.
  • The values of the individual columns: This is only recorded inindex_demoThe three columns in the table arec1,c2andc3.
  • Other information: All information except the above three, including the values of other hidden columns and additional information recorded.

1. A simple index scheme

  • Assume that each data page can hold up to three records (in fact, a data page is large enough to hold many records). So with that assumption we’re going to go toindex_demoAlter table insert 3 records:
mysql> INSERT INTO index_demo VALUES(1.4.'u'), (3.9.'d'), (5.3.'y');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0
Copy the code
  • The records are then cascaded into a one-way linked list by the size of the primary key, as shown in the figure below:

  • As you can see from the picture,index_demoAll three records in the table have been inserted with the number10In the data page of. Now let’s insert another record:
mysql> INSERT INTO index_demo VALUES(4.4.'a');
Query OK, 1 row affected (0.00 sec)
Copy the code
  • becausePages 10We can only put 3 records at most, so we have to assign a new page:

  • How to assign the page number28Oh, it shouldn’t be11? Once again,The newly assigned data page numbers may not be contiguous, meaning that the pages we use may not be next to each other in the storage space. They simply establish linked lists by maintaining the numbers of the previous and the next pages. In addition,Pages 10The maximum primary key value of the user record in5And thePage 28The primary key of one of the records in4Because the5 > 4, so this does not meet the requirement that the primary key value of the user record in the next data page must be greater than the primary key value of the user record in the previous page, so the primary key value in the insert is4The primary key value is set to5The record moves toPage 28, and then set the primary key value to4Insert the record toPages 10, the process is shown as follows:

  • The primary key value recorded by the user on the next data page must be greater than the primary key value recorded by the user on the previous page. This process we can also callPage divided.
  • In theindex_demoAfter inserting many records into a table, the effect might look like this:

  • Because these 16KB pages may not be physically next to each other, if we want to quickly locate the page of some records based on the primary key, we need to make a directory for them. Each page corresponds to a directory entry, and each directory entry contains the following two parts:

    • Page user record the smallest primary key value we usekeyTo represent.
    • Page number, we usepage_noSaid.

For example, page 28 corresponds to directory entry 2, which contains the page number 28 and the minimum primary key value 5 for user records on the page. We just need to store several items in physical storage consecutively, such as putting them in an array, can realize the function of a certain record based on the primary key value. For example, if we want to find the record whose primary key value is 20, the specific search process is divided into two steps:

  1. The primary key value is quickly determined by dichotomy from the directory entry20Records of theDirectory entry 3(because of12 < 20 < 209), its corresponding page isPage 9.
  2. Look for records in the page as described abovePage 9To locate a specific record.
    • In the binary + order search through the page class, find the final primary key is 20

2. Index scheme in InnoDB

  • How does InnoDB distinguish between a user record and a directory entry record? Don’t forget the record_type attribute in the record header, whose values mean the following:

    • 0: Common user records
    • 1: Records of directory entries
    • 2: Minimum record
    • 3: Maximum record
  • Record_type = 1 record_type = 1 record_type = 1

  • As you can see from the figure, a new page numbered 30 has been allocated to store catalog entry records. Again, there are differences between directory entry records and normal user records:

  • The record_type value of a directory entry record is 1, whereas the record_type value of a common user record is 0.

  • The directory entry record has only two columns: primary key and page number, whereas the normal user record column is user-defined and may contain many columns, as well as hidden columns added by InnoDB itself.

  • Remember we talked about a property called min_rec_mask earlier when we were talking about logging headers. The min_rec_mask value is 1 for the directory entry record with the smallest primary key value on the page where the directory entry record is stored, and 0 for all other records.

  • If we insert a user record with a primary key of 320, then we need to allocate a new page to store the directory entry record:

  • These storageDirectory entry recordThe page is regenerated into a more advanced directory, like a multi-level directory, with smaller directories inside the larger directory where the actual data is stored, so the page diagram now looks something like this:

3, single table storage upper limit

  • Suppose that the data pages represented by all leaf nodes storing user records can store 100 user records, and the data pages represented by all internal nodes storing catalog records can store 1000 catalog records, then:

    • ifB+The tree has only one layer, that is, only one node for storing user records, at most100Records.
    • ifB+The tree has two layers and can hold as much as possible1000 x 100 = 100000Records.
    • ifB+The tree has three layers and can hold as much as possible1000 x 1000 x 100 = 100000000Records.
    • ifB+The tree has four layers and can hold as much as possible1000 x 1000 x 1000 x 100 = 100000000000records
  • Typically, the B+ tree we use is no more than four levels, so we only need to do a maximum of four Page look-up through primary key values to find a particular record (three item pages and one user record Page), and because there is a so-called Page Directory in each Page, So in the page can also be achieved through dichotomy fast location record, this is not very cool, ha ha!

Clustering index

  • Clustering indexIs how data is stored (all user records are stored in theA leaf node)

Secondary indexes

  • We usec2The size of the column is used as the sorting rule for the data page and the records in the pageB+Tree, as shown below:

  • This B+ tree differs from the cluster index described above in several ways:

  • Ordering records and pages using the size of the record C2 column has three implications:

    • Page records are in accordance withc2The column sizes are arranged in a one-way linked list.
    • The pages that store user records are also based on the records in the pagesc2The column size order is arranged in a two-way linked list.
    • The pages that hold directory entry records are divided into different levels, and the pages in the same level are also recorded according to the directory entry in the pagec2The column size order is arranged in a two-way linked list.
  • The leaf node of the B+ tree does not store the complete user record, but only the values of the C2 + primary key columns.

  • Instead of a primary key + page number, there is a C2 column + page number in the directory entry record.

  • Remember to return to the table

Joint index
  • Fang says we want the B+ tree to be sorted by the size of the C2 and C3 columns. This has two implications:

  • Start by sorting individual records and pages into c2 columns.

  • In the case that the C2 columns of the records are the same, the C3 column is used for sorting

4. InnoDB B+ tree index

(1) The root page will not move

  • B+ tree formation
    • Each time one is created for a tableB+When a tree index (a clustered index is not created artificially, it is created by default), an index is created for this indexThe root nodePage. At the beginning, when there’s no data in the table, eachB+Corresponding to the tree indexThe root nodeThere are neither user records nor directory entry records in.
    • When you subsequently insert user records into the table, you store the user records in this firstThe root nodeIn the.
    • whenThe root nodeContinues to insert records when the available space inThe root nodeCopy all records in to a newly assigned page, for exampleA page, and proceed to the new pagePage dividedTo get another new page, for examplePage b. At this point, the newly inserted record is assigned to the value of the key (that is, the primary key value in the cluster index, the corresponding index column value in the secondary index)A pageorPage b, and theThe root nodeUpgrade to a page that stores directory entry records.

The root node of a B+ tree index does not move as soon as it is born. So whenever we create an index for a table, the page number of the root node will be recorded somewhere, and then whenever InnoDB storage engine needs to use the index, the page number of the root node will be retrieved from that fixed place to access the index.

(2) Uniqueness of directory entry records in inner nodes

  • knowB+The contents of the directory entry in the inner node of the tree index areIndex column + page numberBut this collocation is a bit loose for secondary indexes. Also takeindex_demoFor example, suppose the data in this table looks like this:

  • If the contents of the directory entry in the secondary index are onlyIndex column + page numberCollocation, then forc2Column after the index is createdB+A tree should look like this:

  • If we want to insert a new row where c1, c2, and c3 are: 9, 1, ‘c’, then we have a big problem modifying the B+ tree corresponding to the c2 secondary index: Since the table of contents stored in page 3 is composed of c2 column + page number, the value of c2 column corresponding to the two table of contents records in page 3 is 1, and the value of C2 column of the newly inserted record is also 1, so should we put the newly inserted record on page 4 or on page 5?

  • In order for the newly inserted record to find itself on that page, we need to ensure that the entry record of the node in the same level of the B+ tree is unique except for the page number field. Therefore, the contents of the entry record for the inner node of the secondary index are actually composed of three parts:

  • In other words, we added the primary key value to the directory entry record of the node in the secondary index. This ensures that each directory entry record of the node in each layer of the B+ tree is unique except for the page number field. Therefore, the diagram after we build the secondary index for c2 column should actually look like this:

  • When inserting records (9, 1, ‘c’), we can compare the values of column C2 of the new record with the values of column C2 of each entry in page 3. If the values of column C2 are the same, we can compare the values of column C2 of each entry in page 3. Because the c2 column + primary key values for different entries in the same layer of the B+ tree are bound to be different, the only entry record must eventually be located and, in this case, the new record should be inserted into page 5.

  • The index value is compared, the primary key value is compared, and the final page number is selected

(3) Store at least 2 records on a page

InnoDB can store at least two records on a data page, which is a conclusion we used when we were talking about row formatting (we used this conclusion to deduce how many bytes a column can store in a table with only one column without row overflow, go back and see if you forget).