Interviewer: Why do MySQL indexes use B+ trees instead of other trees? Like a B tree?

How many rows can an InnoDB B+ tree hold?

Answer: About 20 million

Why so many?

Since this can be calculated, to understand this problem, start with InnoDB index data structure, data organization.

When a computer stores data, it has a minimum unit of storage, which is like cash flow and the smallest unit is a dime.

In a computer, the smallest unit of disk storage data is sector, the size of a sector is 512 bytes, and the smallest unit of file system (such as XFS/EXT4) is block, the size of a block is 4K, and for InnoDB storage engine has its own minimum storage unit, Page, the size of a Page is 16K.

The following diagrams can be used to understand the minimum storage unit:

A file in a file system is only 1 byte in size, but it has to take up 4KB of disk space.

All InnoDB data files (files with the suffix IBD) are always integer multiples of 16384 (16K).

Disk sectors, file systems, and InnoDB storage engines all have their own minimum storage units.

In MySQL, InnoDB page size is 16K by default.

The data in a table is stored in pages, so how many rows can you store on a page?

Assuming a row of data is 1K in size, a page can hold 16 rows of such data.

If the database is only stored this way, finding the data becomes a problem ** because you don’t know which page the data is on and it’s impossible to traverse all the pages. That’s too slow.

However, you can use B+ tree to organize the data, as shown in the figure below:

First, sort the data records by primary key and store them in different pages (to make it easier to understand that there are only 3 records on a page, but in fact there are many)

In addition to the page where the data is stored, there is also a page where the key + pointer is stored. The page where the key + pointer is stored is a page where the key + pointer is stored. Such a page is composed of N key + pointer.

And of course it’s also sorted. This form of data organization is called an index organization table.

Now, if you want to find a piece of data, how do you do that?

Select * from user where id=5;

So here ID is the primary key, and I’m going to look it up through this B+ tree, and I’m going to look at the root page, and how do you know where the root page of the user table is?

The root page position of each table is fixed in the tablespace file, i.e. the page with page number=3.

After finding the root page, the binary search method is used to locate the data whose ID =5 should be in the page pointed to by pointer P5. Then, the binary search method is also used to find the records whose ID =5:

| 5 | zhao2 | 27 |

It is now clear how the primary key index B+ tree in InnoDB organizes and queries data.

To sum up:

InnoDB storage engine’s smallest storage unit is a page, which can be used to store data or keys + Pointers. In the B+ tree, leaf nodes store data, and non-leaf nodes store keys + Pointers.
The index organization table determines which page the data is in through the binary search method of non-leaf nodes and Pointers, and then finds the needed data in the data page.

So back to our original question, how many rows can a typical B+ tree hold?

Here, we first assume that the height of B+ tree is 2, that is, there is a root node and several leaf nodes. Then the total number of records stored in this B+ tree is: the number of Pointers to the root node * the number of records in a single leaf node.

It has been stated above that the number of records in a single leaf node (page) =16K/1K=16. (Here we assume that the data size of a row is 1K, which is usually about 1K for most Internet business records.)

So now we need to figure out how many Pointers can a non-leaf node hold?

In fact, this is also easy to calculate, assuming that the primary key ID is bigint and the length is 8 bytes, and the pointer size is set to 6 bytes in the InnoDB source code, making a total of 14 bytes

How many of these cells we can fit on a page is essentially how many Pointers we have, 16384/14=1170.

Then a B+ tree of 2 height can hold 1170*16=18720 such data records.

According to the same principle, a B+ tree of height 3 can hold 1170117016=21902400 such records.

So in InnoDB, the B+ tree height is usually 1-3 layers, which can accommodate tens of millions of data storage.

When looking for data, a page lookup represents one IO, so a primary key index query typically takes only 1-3 IO operations to find the data.

How to find InnoDB primary key index B+ tree height?

The height of a B+ tree is usually 1-3.

In the InnoDB table space file, the convention page number is 3 to represent the root page of the primary key index, while the root page offset is 64 to store the page level of the B+ tree.

If page level is 1, tree height is 2, and Page level is 2, the tree height is 3. Page level =page level+1; Let’s try to find this page level in the real world.

Before the actual operation, you can check the page number of the root page of the primary key index to be 3 from the InnoDB metadata table or from the book “InnoDB Storage Engine”.

It can be seen that the page number of primary key index root page of customer table and LINEItem table under database DBT3 is 3, while the page number of other secondary indexes is 4.

For details about the difference between secondary index and primary key index, refer to related MySQL books.

SQL > alter table space files;

Since the root page of the primary key index B+ tree starts on page 3 of the entire table space file, its offset in the file can be calculated: 16384*3=49152 (16384 is the page size).

In addition, the page level value is stored in the first two bytes of the 64 offset of the root page as described in the InnoDB Storage Engine

So THE offset I want for page level throughout the file is: 16384*3+64=49152+64=49216, in the first two bytes.

Now use hexdump to query data at the specified offset of the tablespace file:

The page level of the linetem table is 2, and the B+ tree height is page level+1=3.

The page level of the region table is 0, and the B+ tree height is page level+1=1.

Customer table page level = 2, B+ tree height = page level+1=3;

The data volumes of the three tables are as follows:

Conclusion:

The number of rows in the LINEItem table is more than 6 million and the B+ tree height is 3. The number of rows in the Customer table is only 150,000 and the B+ tree height is also 3. You can see that the height of both table trees is 3, despite the large difference in data volume

In other words, there is not much difference in the efficiency of the two tables through the index, because only 3 IO are required for each table. So if you have a table with 10 million rows, and its B+ tree height is still 3, the query efficiency is still not much different.

The region has only 5 rows of data, and of course its B+ tree height is 1.

The interview questions

Why MySQL index uses B+ tree instead of some other tree structure? Like a B tree?

Refer to this article for a more complex version of this problem;

The short answer is:

Since B tree will store data regardless of leaf node or non-leaf node, the number of Pointers that can be saved in non-leaf node is reduced (some data are also called fan-out).

If a large amount of data needs to be stored with a small number of Pointers, the tree height can only be increased, resulting in more I/O operations and lower query performance.

summary

In this paper, starting from a problem, gradually introduced InnoDB index organization table principle, query way, and combined with the existing knowledge, answer the question, combined with practice to prove.

Of course, for the sake of simple expression, some details are ignored in the paper. For example, it is impossible for all space in a page to store data. It also stores a few other fields, such as page level, index number, etc., and the padding factor of a page also makes it impossible for a page to store all data.

Interviewer: Why do MySQL indexes use B+ trees instead of other trees? Like a B tree?

Why so many?

The interview questions

summary

Related Posts

E-commerce project optimization, RabbitMQ, Git, OSI, VIM, Intellj IDEA, HTTP, JS, Java

JMeter distributed lightweight log solution for performance monitoring

A preliminary study on Git version control