A quick look at the data page structure

  • The 16KB storage space represented by the data page can be divided into multiple parts, and different parts have different functions. Each part is shown in the figure:

  • The storage space of an InnoDB data page is roughly divided into seven parts

One, records in the page storage

  • The stored Records are stored in the User Records section in the row format we specified
  • When we first generated the page, we didn’t actually have the User Records section, and every time we inserted a record, we took a record size from the Free Space section, which is unused storage Space, and divided it into the User Records section.
  • When all the Space in the Free Space part is replaced by the User Records part, it means that the page is used up. If there are new Records inserted, it is necessary to apply for a new page

1. Record the secret of header information

Let’s create a table:

mysql> CREATE TABLE page_demo(
    ->     c1 INT,
    ->     c2 INT,
    ->     c3 VARCHAR(10000),
    ->     PRIMARY KEY (c1)
    -> ) CHARSET=ascii ROW_FORMAT=Compact;
Query OK, 0 rows affected (0.03 sec)
Copy the code
  • Columns C1 and C2 are used to store integers, and column C3 is used to store strings

  • In the page_demo table row format diagram, draw the header properties and columns C1, C2, and C3 (the other information does not mean that they do not exist). The simplified row format diagram looks like this:

  • Continue inserting data
mysql> INSERT INTO page_demo VALUES(1, 100, 'aaaa'), (2, 200, 'bbbb'), (3, 300, 'cccc'), (4, 400, 'dddd'); 
Query OK, 4 rows affected (0.00 sec) 
Records: 4 Duplicates: 0 Warnings: 0
Copy the code
  • As is shown in

  • The delete_mask attribute indicates whether the current record is deleted. It takes up one binary bit. A value of 0 indicates that the record is not deleted, and a value of 1 indicates that the record is deleted. Instead of being deleted immediately, all the deleted records will form a so-called junk list. The space occupied by the records in the list is called reusable space. If new records are inserted into the table later, the storage space occupied by the deleted records may be overwritten.
  • The min_REC_mask B+ tree adds this flag to the smallest record in each layer of non-leaf nodes
  • N_owned is a secret, but later it will be the main character
  • The heap_NO attribute indicates the position of the current record on the page. As you can see from the figure, the positions of the four records we inserted on the page are 2, 3, 4, and 5. Without the 0 and 1, mysql adds two records to the edge of each page. Since these records are not inserted by ourselves, they are sometimes referred to as pseudo records or virtual records. One is the minimum record and the other is the maximum record. The record is the primary key, and no matter how many of our own records we insert into the page, Mysql automatically generates two pseudo records, one is the minimum record and the other is the maximum record. The construction of these two records is quite simple, consisting of a 5-byte record header information and a fixed 8-byte section, as shown in the figure

  • Since these two Records are not our own defined Records, they are not stored in the User Records section of the page, they are placed in a separate section called Infimum + Supremum, as shown in the figure below:

  • The values heap_no are 0 and 1, respectively
  • Record_type Specifies the type of the current record. There are four types of records: 0 for normal records, 1 for B+ leaf nodes, 2 for minimum records, and 3 for maximum records. We can also see from the figure that the records we insert are normal records with the record_type value of 0, while the minimum record and maximum record have the record_type value of 2 and 3, respectively
  • Next_record It represents the address offset from the real data in the current record to the real data in the next record. The next_record value for the record is 0

  • If we delete one record from the list, the list will change. For example, if we delete the second record:

  • As you can see from the figure, the main changes occurred before and after deleting the second record: the second record was not removed from the storage space, but its delete_mask value was set to 1. The next_record value for the second record is changed to 0, indicating that there is no next record for the record. The next_record of record 1 points to record 3. Another thing you may have missed is that the n_OWNED value for the largest record has changed from 5 to 4, which we’ll explain more about later.
  • If we insert this record into the table again

  • As you can see from the figure, InnoDB does not apply for new storage space for the new record insertion, but directly reuse the storage space of the original deleted record.

2, Page Directory

  • Now that we know that records are concatenated in a single linked list by primary key on a page, what if we want to find a record on a page by primary key
SELECT * FROM page_demo WHERE c1 = 3;
Copy the code
  • MySQL’s query process divides all normal records (including the largest and smallest records, but not those marked as deleted) into groups. The N_OWNED attribute in the header of the last record of each group (the largest record in the group) indicates how many records that record has, that is, how many records there are in the group. The address offset for the last record of each group is extracted separately and stored sequentially near the end of the Page, which is called the Page Directory. (You should go back to the head to see the map of each section of the Page.) These address offsets in the page catalog are called slots, so the page catalog is made up of slots.
  • For example, the page_demo table now has 6 normal records, InnoDB will divide them into two groups, the first group has only one minimum record, the second group is the remaining 5 records, see the diagram below:

  • Note the N_OWNED attribute in the header information for the minimum and maximum records. The N_OWNED value for the minimum record is 1, which means that there is only one record in the group that ends with the minimum record, the minimum record itself. The maximum record has an N_OWNED value of 5, which means that the group ending with the maximum record has only five records, including the maximum record itself and the four records we inserted ourselves.
  • Only for the group where the smallest record is located1The maximum number of records in the group can only be1 ~ 8Between conditions, the number of records in the remaining groups must be in the range of yes4 ~ 8Between the article.
  • So the n_owned of the smallest record is 1, and the n_owned of the current largest record is 5

  • Each time a record is inserted, it finds a slot in the page directory where the primary key is larger than the primary key of the current record and the difference between the primary key and the primary key of the current record is the smallest. Then, the n_OWNED value of the record corresponding to this slot is increased by 1 to indicate that another record is added to the group until the number of records in the group is equal to 8.
  • If the number of records in a group is equal to eight and a record is inserted, the group is divided into two groups. One group has four records and the other has five records. This process creates a new slot in the page directory to record the offset of the largest record in the new group.
  • Add another 12 pieces of data

  • Let’s say we want to find a record with a primary key of 6
  1. Calculate the position of the middle slot :(0+4)/2=2, so check the primary key of the record corresponding to slot 2 is 8, and since 8 > 6, set high=2, and keep low unchanged.
  2. Recalculate the position of the middle slot :(0+2)/2=1, so check that the primary key for slot 1 is 4, and since 4 < 6, set low=1, and leave high unchanged.
  3. Because the value of high-low is 1, make sure that the record with a primary key of 6 is in the group corresponding to slot 2. At this point we need to find the record in slot 2 with the smallest primary key, and then traverse the records in slot 2 along the unidirectional list.
  4. We can get the record for slot 1 (primary key 4), and the next record for that record is the record with the smallest primary key in slot 2, which has the primary key 5. So we can start with the record with the primary key of 5 and go through the records in slot 2 until we find the record with the primary key of 6. Since a group can contain only 1 to 8 records, the cost of traversing the records in a group is small.

So finding a record in a data page with a specified primary key is a two-step process:

  1. The slot in which the record is located is determined by dichotomy and the record with the smallest primary key in the group in which the slot is located is found.
  2. The next_record attribute of the record is used to traverse the records in the group to which the slot belongs.

Page Header

  • In order to get the records stored in a data Page state information, such as how many records have been stored in this Page, what is the address of the first record, in the Page directory how many storage tank and so on, specially in the Page defines a named Page Header section, it is the second part of the Page structure, this part of the fixed 56 bytes, It stores all kinds of state information. See the following table for what each byte does:
The name of the Occupied space describe
PAGE_N_DIR_SLOTS 2 – Number of slots in the page catalog
PAGE_HEAP_TOP 2 – Minimum address of unused Space, i.e., Free Space after this address
PAGE_N_HEAP 2 – The number of records in this page (including minimum and maximum records and records marked for deletion)
PAGE_FREE 2 – The address of the first record that has been marked for deletion (next_record also forms a single linked list that can be reused)
PAGE_GARBAGE 2 – Number of bytes occupied by deleted records
PAGE_LAST_INSERT 2 – The last place to insert a record
PAGE_DIRECTION 2 – Record the direction of insertion
PAGE_N_DIRECTION 2 – The number of consecutive records inserted in one direction
PAGE_N_RECS 2 – The number of records on the page (excluding minimum and maximum records and those marked for deletion)
PAGE_MAX_TRX_ID 8 bytes Changes the maximum transaction ID for the current page, which is defined only in the secondary index
PAGE_LEVEL 2 – The level of the current page in the B+ tree
PAGE_INDEX_ID 8 bytes Index ID, which indicates the index to which the current page belongs
PAGE_BTR_SEG_LEAF 10 bytes The header information for the B+ tree leaf segment is defined only on the Root page of the B+ tree
PAGE_BTR_SEG_TOP 10 bytes The header information for the non-leaf segments of the B+ tree is defined only on the Root page of the B+ tree

4. File Header

  • This section is a fixed 38 bytes. It consists of the following contents:

:

The name of the Occupied space describe
FIL_PAGE_SPACE_OR_CHKSUM 4 bytes Page checksum
FIL_PAGE_OFFSET 4 bytes Page number
FIL_PAGE_PREV 4 bytes The number of the previous page
FIL_PAGE_NEXT 4 bytes The page number of the next page
FIL_PAGE_LSN 8 bytes Log Sequence Number where the page was last modified
FIL_PAGE_TYPE 2 – The type of the page
FIL_PAGE_FILE_FLUSH_LSN 8 bytes Defined in only one page of the system tablespace, which means that the file has been refreshed to at least the corresponding LSN value
FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID 4 bytes Which table space the page belongs to

Against this chart, let’s take a look at some of the most important parts:

  • FIL_PAGE_SPACE_OR_CHKSUM Represents the checksum of the current page. What is a checksum? That is, for a very, very long byte string, we’re going to use some algorithm to compute a short value to represent that long byte string, and that short value is called the checksum. This compares the checksum of two long bytes before comparing them. If the checksum is different, the two long bytes must be different, so the time loss of directly comparing two long bytes is eliminated.
  • FIL_PAGE_OFFSET Each page has a separate page number, just like your ID number. InnoDB uses the page number to uniquely locate a page.
  • FIL_PAGE_TYPE indicates the type of the current page. InnoDB divides pages into different types for different purposes. InnoDB has many other types of pages as shown in the following table: Type Name Hexadecimal description FIL_PAGE_TYPE_ALLOCATED0x0000 Latest allocated, The FIL_PAGE_UNDO_LOG0x0002Undo log page FIL_PAGE_INODE0x0003 information node FIL_PAGE_IBUF_FREE_LIST0x0004Insert has not been used Buffer free list FIL_PAGE_IBUF_BITMAP0x0005Insert Buffer bitmap FIL_PAGE_TYPE_SYS0x0006 System page FIL_PAGE_TYPE_TRX_SYS0x0007 Transaction system data FIL_PAGE_TYPE_FSP_HDR0x0008 Table space header FIL_PAGE_TYPE_XDES0 X0009 Extension description page FIL_PAGE_TYPE_BLOB0x000A Overflow page FIL_PAGE_INDEX0x45BF Index page the type of data page where we store records is actually FIL_PAGE_INDEX, which is called the index page. As for what an index is, let’s break it down next time
  • FIL_PAGE_PREV and FIL_PAGE_NEXT

Five, the summary

A data page can be roughly divided into seven sections, called File headers, which represent some common information about the page and occupy a fixed 38 bytes. The Page Header, which represents some information proprietary to the data Page, is a fixed number of 56 bytes. Infimum + Supremum, two virtual pseudo records, representing the minimum and maximum records in the page, occupying a fixed 26 bytes. User Records: The part that actually stores the Records we insert. The size is not fixed. Free Space: An unused portion of a page of uncertain size. Page Directory: The relative location of some records in the Page, that is, the address offset of each slot in the Page, the size is not fixed, the more records inserted, the more space this part of the space. File Trailer: A fixed 8 bytes section used to verify that the page is complete.