This is the third article illustrating MySQL, and this article will make it very clear:

  • What is InnoDB row format? What are InnoDB pages?
  • What field information does InnoDB page and InnoDB row format have?
  • Why is it recommended to use the auto-increment ID as the primary key and not the UUID?
  • How InnoDB designers design efficient algorithms to quickly search records on a page.

Begin the text!


Note: All of the following descriptions are for InnoDB storage engines, and will be specified if other storage engines are involved

InnoDB row format (ROW_FORMAT)

We usually insert data into MySQL tables on a record basis, and these records are stored on disk in InnoDB row format.

To prove that I am not lying, for example, I query my local database for the row format of a table that starts with forward

We rarely work with row formatting, so this concept may not be clear. In fact, the InnoDB storage engine provides us with 4 different row formats

  • DYNAMIC (default row format)

  • COMPACT

  • REDUNDANT

  • COMPRESSED

We can specify the row format when creating the table (if not, the default row format is DYNAMIC). For example, I specify the row_format_TABLE row format as COMPACT

mysql> CREATE TABLE row_format_table(
    -> id INT,
    -> c1 VARCHAR(10),
    -> c2 CHAR(10),
    -> PRIMARY KEY(id)
    -> ) CHARSET=utf8 ROW_FORMAT=COMPACT;
Copy the code

Now that you know the syntax, let’s look at the four line formats and draw a picture

As can be seen from the figure, a complete record can be divided into two parts: “Recorded additional information” and “real data information”. The differences of the four line formats are mainly reflected in the part of “real data information”. That is, different line formats use different data formats to store our real data, and the specific differences are not important for this article.

Next_record represents the relative position of the next record. With this field, records can be connected in a single linked list. This is easier to understand. As for the other field information, we will introduce it when we use it.

Note: The order in which fields are listed (including those that will be mentioned below) is not the actual order in which InnoDB rows are stored on storage devices. I have simplified this to illustrate the rest of the story, so you can understand the idea

2. Introduce InnoDB pages

For any storage engine of MySQL, data is stored on disk. The storage engine must load data from disk into memory before it can manipulate data.

So how much data is appropriate to load from disk into memory at once? Does InnoDB storage engine need to read records from disk one by one when retrieving them?

Of course not! We know that disk read/write speeds are several orders of magnitude different from memory read/write speeds, and if the data we need to read happens to be running in different places on the disk, that means multiple I/O operations.

Therefore, both the operating system and MySQL storage engine have a prefetch concept. The concept is based on the principle of locality that governs computing.

Spatial locality: If the data is currently in use, other data adjacent to the spatial address of the data is more likely to be used in the future, so it can be loaded into registers or main memory first to improve efficiency

When a piece of data on a disk is read, we simply read more, not more.

The InnoDB storage engine divides data into pages, with pages being the smallest unit of interaction between disk and memory. The default page size in InnoDB is 16KB. By default, at least 16KB of data is read from the disk to the memory and at least 16KB of content is flushed to the disk at a time.

For the InnoDB storage engine, all data (indexes that store user data, various metadata, system data) is stored as pages.

There are many types of InnoDB pages, such as Insert Buffer pages, undo log pages, etc., but we won’t focus on the other messy pages today. The focus of this article is the page that holds the records in our table, let’s call it the data page.

3. Data page structure

Obviously, the data page will have its own format, like the row format, so I’ll list the two fields we’ll use first, and we’ll talk about the rest.

3.1 How are user records stored

The actual table Records we store will be stored in the User Records section of the diagram in the specified row format. If the current data page is newly generated and there are no Records, the User Records section will not actually exist. When the Free Space is used up (or the remaining Space is insufficient to carry new data), it means that the Space of the current data page is occupied. If Records continue to be inserted, it is necessary to apply for a new data page. The schematic diagram is as follows:

Note that the records in the figure above pass each othernext_recordThe fields are concatenated into a single linked list, which I just didn’t draw in the diagram.

But is it just a series?

If we were to design concatenation rules, we would want some sort of “size relationship” to determine the concatenation order, not just the insertion order. After all, we are people who have studied data structures.

But can records compare sizes? Yes, the title of this article is about primary keys. We can concatenate all the records in the current data page in order of primary keys from smallest to largest. In fact, that’s exactly what the designers of MySQL did.

If you’re a rebel, you might be thinking, well, if you don’t set a primary key, does MySQL crash?

InnoDB uses a Unique key as its primary key when there is no Unique key. If there is no Unique key in the table, InnoDB automatically adds a column called DB_ROW_ID as the default primary key for each record.

Now let’s add a downlink format

Once again

  • The order in which I draw the fields is not the order in which they are actually stored on the storage device
  • Only if InnoDB really cannot determine the primary key (created without specifying a primary key, and has noThe Unique key) will be addedDB_ROW_IDcolumn

3.2 Extras: Why is it recommended to use auto-increment ID as primary key instead of UUID?

Speaking of which, why is it recommended to use an incremented ID as the primary key rather than a UUID?

In addition to the fact that the UUID primary key index takes up a lot of space, the self-increment ID is also much less expensive than the UUID in terms of resource overhead for inserting data. Since records in the data page are concatenated according to the primary key from small to large, the increment ID determines that later inserted records must be arranged after the previous record, simply add the next_record pointer. If the current data page is full, feel free to simply insert the new data page.

And UUID is different, it is not sure of the size of the order, then insert records may be considerable (and probability) is inserted into a record before (and even before the current data page), which means need to traverse the record of the current data page (or page to find relevant data), then insert your place; If the current data page is full, you can only find the data page that fits your position first, and then traverse the records in the data page to find the appropriate position for insertion.

Therefore, inserting records using UUID takes longer.

3.3 Two pseudo records in the data page

In fact, InnoDB’s designers added two pseudo-records to the InnoDB page, one Infimum and one Supremum. And the designers specify that any user record on the current data page is larger than Infimum and any user record is smaller than Supremum.

Since they are pseudo Records, they need to be separated from the contents of User Records, so they are placed in a section called Infimum+Supremum, as shown below:

Finally, in the data page, user records are saved like this:

In the figure above, I draw the primary key ID value in the real data information for our subsequent interpretation.

You may not understand why InnoDB designers add these two fields for no reason, they don’t seem to do any good for our search work.

Yes, these two items are not added to facilitate the retrieval of data in the data page, they work in the battlefield of MySQL LOCK_GAP record lock. What? Don’t understand? Nothing, I’m just saying it, it’s not useful for this article, more on that later…

3.4 Efficient primary key query scheme in data pages

So far, we have seen that in a data page, user records are one-way linked lists in ascending order of primary keys. The next thing we need to figure out is how to search for data by primary key in a data page.

If we execute the following query statement

SELECT * FROM row_format_table WHERE id = 4;
Copy the code

The easiest way to do this is to go through all the records on the current page, starting with the Infimum record and searching down the one-way linked list until you find the record with ID 4. It’s fine when the numbers are small, but if there are tens of thousands of them, no one can stand it.

So InnoDb’s designers came up with a clever way of searching by dividing all the records in the data page (including pseudo records) into groups. Each group selected the largest record in the group as the “leader” and then took the addresses of all the group leaders and catalogued them.

It’s like if we go to a school and find someone, we only know what grade they are in (determine the data page), and then ask each teacher if they have that person (the group in the data page), instead of just going through the whole grade.

In order to maximize the efficiency of this scheme (no arbitrary grouping, after all, a data page is divided into a group or each record has a group is no different from traversal), so InnoDB’s designers specify the following grouping scheme:

  • InfimumPseudo records are grouped separately
  • SupremumThe number of pseudo records in a group ranges from 1 to 8
  • The number of records in other groups ranges from 4 to 8

That’s the rule, but how does InnoDB determine how many members there are in each group? The designer came up with another idea, adding an attribute to the “group leader” to record how many members there are in the group (including himself). So let’s extend the downlink format:

The n_owned value of the group leader is the number of members (including itself). The n_owned value of the group leader is 0.

Let’s add a few more entries to the table and see what grouping is all about. Note that the DB_ROW_ID parameter is no longer drawn because we have specified the primary key ID in the table.

All the records in the image above (including pseudo records) are divided into four groups. The “leaders” of each group are individually promoted and compiled into separate “catalogs”, which InnoDB officially calls “slots”. The fact that slots are continuous in physical space means that you can easily find the one before and the one after it through a slot, which is very important.

Slot numbers start from 0. When we search for data, we can first find the corresponding slot, and then traverse in the group. Because the number of records in a group is not large, the performance loss of traverse can be ignored. Moreover, the primary key value of the “group leader” represented by each slot is also arranged in ascending order, so we can use the dichotomy method to find the slot quickly.

The figure contains 4 slots, which are 0, 1, 2 and 3 respectively. Before the binary search, the lowest slot low=0, and the highest slot high=3. Now let’s see what happens when we query the record with ID 7 on the data page.

  1. Using dichotomy, calculate the position of the middle slot,(0 + 3) / 2 = 1To see theSlot 1The primary key value of group leader is4Because the4 < 7, so setlow=1.highStay the same;
  2. Using the dichotomy again, calculate the position of the middle slot,(1 + 3) / 2 = 2To see theChannel 2The primary key value of group leader is8Because the8 > 7, so sethigh=2.lowStay the same;
  3. nowhigh=2.low=1The difference between the two is 1, there is no need to continue the dichotomy, we can be sure that our record atChannel 2And we know itChannel 2The primary key of the corresponding group leader is8But there is a one-way linked list between the records, so we cannot traverse it forward. As mentioned above, we can passChannel 2findSlot 1, and then find its “group leader”, and then walk down the “group leader” until you find the record with primary key 7.

At this point, we have a pretty good idea of how to search by primary key in a data page. However, we have only half answered the question of why MySQL primary key query is so fast. After all, you have to find the data page in order to search the data page. That’s all you need to know about MySQL indexes in every interview. I’ll cover that in the next article.

4. Important! Other fields on the data page

Finally, a couple of points, there are two things I didn’t cover in the article

  1. How are slots stored?
  2. How do I know how many slots I have in binary search?

First answer the first question, we have introduced the data Page structure, in fact, is not complete, next we introduce a field Page Directory, slot is saved in this field information.

Page Directory translates into “Page Directory” in Chinese. Does this deepen your understanding of the Directory?

As for the second question, it is also about the data page structure. I didn’t draw it all at once before, because I think it is more helpful to memorize when needed.

Next I will use all the data page structure to draw for you (very simple, don’t be afraid), temporarily useless shielding, later use again.

  • FIL_PAGE_OFFSET

The page number of an InnoDB page, which is the id of the page

  • FIL_PAGE_PREV, FIL_PAGE_NEXT

If you look at the picture, there is a two-way list between each page

  • FIL_PAGE_TYPE

There are many types of InnoDB pages, such as data pages in this article, and others such as Insert Buffer pages, undo log pages, etc. This field identifies the type of page

  • PAGE_N_DIR_SLOTS

This field stores the number of slots, and the binary method determines the value of high based on the value of this field

  • PAGE_LAST_INSERT

When a new record is inserted, read this data directly and put the new record in the corresponding position

  • PAGE_N_RECS

Number of records on the page (excluding min and Max records)

5. Recommended reading

  • How is an SQL query executed?
  • How is an SQL update statement executed

This article is the eve of the index, see you in the next index!