InnoDB is the default engine for mysql and the one we use most often, so learn the page structure based on InnoDB. And the learning page structure is for better learning index.

One, page introduction

A page is the basic unit of storage space managed by InnoDB, and the size of a page is usually 16KB.

In order to achieve different purposes, the author designed various types of pages, such as:

  • The page that holds the table space header information
  • The page where the change buffer is stored
  • The page that holds the inode information
  • Page that stores undo log information
  • . .

What we are most concerned about, however, is the type of page on which the data records are stored in the table. This type of page is officially called an INDEX page, but for the sake of understanding, it will be referred to as a data page.

Second, the structure of the data page

This data page also has 16KB of storage space and can be roughly divided into seven sections.

As can be seen from the structure diagram, the number of bytes occupied by some parts is determined, while some are uncertain. The User Records that we care most about are in the User Records section.

However, when the page is first generated, there is no User Records section. When a new record is inserted, a record size Space is applied from the Free Space section and then divided into the User Records section until Free Space is completely replaced by User Records, indicating that the page is used up. If a new record is inserted, you need to request a new page.

I think this data page can be regarded as a page of a book. The content on the page is usually presented line by line. When the whole page is used up, you have to turn to the next page (new page) to continue writing.

Third, the storage structure recorded in the page

So, how are these Records managed in the User Records section?

Let’s create a table:

CREATE TABLE pingguo_demo(
	c1 INT,
	c2 INT,
	c3 VARCHAR(10000),
	PRIMARY KEY (c1)
	) CHARSET = ASCII ROW_FORMAT = COMPACT;
Copy the code

The row format specified here is COMPACT (there are other row formats in the engine), but we’ll just know COMPACT.

When we insert a record into the database, the underlying row format looks like this:

Notice the record header in orange. It contains a lot of important information:

  • Reserved bit 1: Occupies one bit and is not used.
  • Reserved bit 2: Occupies one bit and is not used.
  • Deleted_flag: 1 bit is used to mark whether the record is deleted.
  • Min_rec_flag: Takes 1 bit and is added to the smallest entry in a non-leaf node at each level of the B+ tree (index later).
  • N_owned: Records on a page are divided into groups, one of which is the “big brother” and the other is the “little brother”. The n_owned value of the “big brother” is the number of entries in the group, and the n_owned value of the “little brother” is 0
  • Heap_no: Takes 13 bits, indicating the relative position of the current record in the page heap.
  • Record_type: Takes 3 bits, indicating the current record type. 0 is a common record, 1 is a directory entry record on the B+ tree, 2 is a Infimum record, and 3 is a Suprememum record.
  • Next_record: occupies 16 bits and represents the relative position of the next record.

4. Record header information

Now, insert four records into the table created above:

INSERT INTO pingguo_demo VALUES
	(1, 100, 'aaaa'),
	(2, 200, 'bbbb'),
	(3, 300, 'cccc'),
	(4, 400, 'dddd');
Copy the code

Then, the row format corresponding to these four records should be:

Notice that this is a simplification for memorization. In addition, the information in the record is actually bits of binary data, which is written in decimal notation for the sake of understanding. Furthermore, there is no gap between individual Records stored in User Records, which is represented abstractly here.

1. deleted_flag

This attribute is used to indicate whether the current record has been deleted, with 1 indicating deletion and 0 indicating no deletion.

Huh? I deleted the data from my table and it’s still on the page.

Yeah, you thought it was deleted, but it’s still on the disk. Why is that?

Because of the performance cost of removing these records on disk and having to rearrange the other records, only one delete is marked.

Then, all the deleted records form a junk list. The space occupied by the records in the list is called reusable space, and when new records are inserted into the table later, they may overwrite this space.

2. min_rec_flag

This tag is added to the smallest directory entry in a non-leaf node at each level of the B+ tree. The directory entries mentioned here will be explained later.

Here, the min_rec_flag of the 4 records is 0, indicating that none of them is the smallest directory entry record in the non-leaf node of the B+ tree.

3. n_owned

That’s in the next chapter.

4. heap_no

Represents the relative position of the current record in the page heap.

The above four records are an abstract description, but in fact they are all laid out one by one in a tight, seamless arrangement called the heap.

For administrative purposes, the relative position of a record in the heap is called HEap_NO.

  • The record heap_NO at the front of the page is relatively small
  • The record heap_no at the end of the page is relatively large
  • When the storage space of a record is requested, the heAP_no value of this record is 1 greater than that of the record whose physical location precedes it

Heap_no of the above 4 records is 2, 3, 4, 5, eh? Where are the zeros and ones?

Virtual records -Infimum and Supremum

This is mentioned in the second part of this article. In fact, these two records are automatically added in the page:

  • Infimum: represents the smallest record in the page
  • Supremum: represents the maximum record on the page

The authors stipulate that no matter how many records are inserted into the page, any user record is larger than the Infimum record and smaller than the Supremum record.

The structure of the two virtual records is also simple.

So, for the four user records you inserted above, you should add these two default records to the top of the list.

Also, note that when the heAP_NO value recorded in the heap is allocated, it does not change. Even if a record is deleted from the heap, the heAP_NO value for the deleted record remains the same.

5. record_type

This property represents the type of the current record. There are four types:

  • 0: indicates common records
  • 1: indicates the record of non-leaf entries in the B+ tree
  • 2: Indicates Infimum records
  • 3: indicates the Supremum record

6. next_record

This attribute is important because it represents the distance between the real data in the current record and the real data in the next record.

  • A positive value indicates that the next record of the current record comes after the current record.
  • A negative attribute value: indicates that the next record in the current record precedes the current record.

For example, if the first record has a next_record value of 32, then 32 bytes from the actual data address of the record is the real data of the next record. For example, if the value is -111, it means looking 111 bytes forward from the record.

Familiar with? That’s right, a linked list.

  • The next entry is the next entry in order of primary keys from smallest to largest.
  • The next record in the Infrimum record is the user record with the smallest primary key on this page.
  • The next record for the user with the largest primary key on this page is the Supremum record.

So, now look again at the diagram between the records, can be described as a unidirectional linked list:

If you delete one of these records, you change the pointer.