The introduction
I believe that every background development engineer in the interview process, have been asked “MySQL’s default storage engine is what? What data structure is a MySQL index?” Questions like this. I believe that everyone who is well prepared can easily answer the question “MySQL’s default storage engine is InnoDB and MySQL index uses B+ tree”. That’s the answer. But why did the programmers who wrote MySQL design it this way?
An overview of the
First of all, it needs to be clear that MySQL has no direct relationship with B+ tree. What is really related to B+ tree is InnoDB, the default storage engine of MySQL. The storage engine in MySQL is mainly responsible for data storage and extraction. MySQL also supports engines such as MyISAM as the underlying storage engine for tables. But no matter InnoDB or MyISAM, the data structure of index is B+ tree. But InnoDB uses a clustered index, and the actual data is on the B+ tree leaf nodes; MyISAM creates a separate index for the primary key of the table, and the leaf node holds Pointers to the actual data.
Next, let’s discuss why MySQL uses B+ trees.
Start with disk I/O
1. Basic concept of disk
Let’s go back in time to when the programmer uncle designed MySQL. When it comes to storage media, we can think of two types: disk and SSD hard disk. SSD hard disk is certainly sweet, but also expensive, and the database is to support the storage of T data, 2021 think this road is too expensive, you still have to use the disk ~
The structure of a traditional hard disk is shown in the figure above. It has one or more platters, each of which can have two sides and is used to store data. There’s a principal axis in the middle, and all the plates are going around that principal axis. A composite arm has multiple magnetic head arms, and each magnetic head arm has a head, which is responsible for reading and writing data.
As shown in the figure above, the surface of each disk is divided into several narrow concentric rings, on which data is stored. Such rings are called tracks. Depending on the size of the hard drive, the number of tracks can range from hundreds to thousands. Each track can store several kilobytes of data, but the computer doesn’t have to read and write that much data at a time.
Therefore, each track is divided into several arcs, each of which is a Sector. Sectors are the physical units of storage on a hard disk, and it is now an industry convention that each sector can store 512 bytes of data. That is, even if the computer only needs one byte of data, it has to read all 512 bytes of data into memory and then select the desired byte.
Cylinder is an abstract logical concept. Simply speaking, a track in the same vertical region is called a cylinder. As shown in the figure above, the collection of tracks at the same position on each disk belongs to the same cylinder.
It is important to note that the disk reads and writes data on cylinders. When the magnetic head reads and writes data, it starts at the starting cylinder of the disk, and then works its way down to different disks on the same cylinder. The head is not transferred to the next cylinder until all heads on the same cylinder have been read and written. The read-write design is made because selecting the head (from which head the data will be taken) is done electronically, while selecting the cylinder (from which head the data will be taken) is done mechanically. And the speed of the mechanical switch is certainly far less than the electronic switch. In order to read and write more quickly, data is read and written on cylinders instead of disks. Because of this, it is valuable to store data on the same cylinder.
According to the above information, we can get the calculation formula of disk capacity as follows:
Hard disk capacity = number of disks × number of cylinders × number of sectors × 512 bytes
2. Disk reading and writing
Modern hard disk search uses CHS Head Sector. We can look at the disk read and write data in three parts.
- When the hard disk reads data, the read-write magnetic head moves along the radial direction and moves to the top of the track where the sector is to be read. This mechanical switching time is called seek time. The seek time varies due to the distance between the start position of the read-write head and the target position.
- When the head reaches a specified track, the disk rotates to cause the sector to be read to rotate below the read-write head, a time known as rotational latencytime.
- It also takes time to read and write data, which is called transfer time.
From this introduction, it is easy to understand that seek time and rotation delay time are particularly time-consuming. Because the goal of computers is higher, faster, and stronger. Database depends on computer storage, so MySQL uncle design data structure must also consider the characteristics of disk read and write, to design a data structure to make the query faster.
3. Continuous I/O over random I/O
As we all know, the function of database software like MySQL is actually divided into data storage and data query. Query data depends on saving data, and the way data is saved certainly affects the speed of the query. Since the data is stored on the disk, the computer memory must have to deal with the disk, and this process, along with the disk I/O. We can divide disk I/O into the following two types based on how we query the disk:
- Continuous I/O: The initial sector address given by this I/O and the ending sector address of the previous I/O are completely contiguous or not far apart.
- Random I/O: If the initial sector address given by this I/O is significantly different from the end sector of the last I/O, it will be counted as a random I/O.
Because for continuous I/O, the head hardly needs to change channel, or change channel for a short time; For random I/O, if there is a lot of I/O, the magnetic head will keep changing lanes, resulting in a great loss of efficiency. This is why continuous I/O is more efficient than random I/O.
Because read and write are storage-dependent and queries often have conditions, the data in the query is not contiguous. So MySQL people thought, can you design a storage way, avoid random I/O or reduce the number of random I/O, to improve the efficiency of the query?
Two, faster search – trees
As a programmer, the term “tree” must be familiar to all of you (what? You don’t know? Face the wall). The data structure of the tree is often involved in algorithm problems. The types of trees are generally:
- Binary (search/sort) tree: BST
- Balanced binary search tree: BBST
- Red-black tree: BRT
- B-trees (also called B-trees)
- B + tree
- B * tree
- R tree
This article will not cover the characteristics of each tree, but will open a new article to cover these trees and their characteristics in more detail. Because we are an article suitable for all ages, so we start from the principle of tree search ~
1. Binary (search/sort) tree lookup
You’ve all heard of binary trees, but it’s usually a root node, with a left child and a right child hanging from the root node. The left and right children can be the root of the subtree. If you add a few more requirements to this, it becomes a binary search tree (BST). The binary search tree is defined as follows:
- If its left subtree is not empty, then the value of all the nodes in the left subtree is less than the value of its root node.
- If its right subtree is not empty, then the value of all the nodes in the right subtree is greater than the value of its root node.
- Its left and right subtrees are binary search trees, respectively.
We can see from the above, the root node 5 left subtree of the value of any node is less than 5, 5 right son all nodes in the tree root nodes are greater than 5, and we with 2 or 7 as a root node, still can draw “left subtree all node values are less than the value of its root node, all right child the tree node values were greater than the value of its root node” this conclusion.
Because binary search trees have this property, let’s say we look for data 3. Our algorithm path is as follows:
- 3 is smaller than the root node 5. Comparing the left subtree, the temporary root node is set as the left subtree root node 2.
- 3 is larger than root node 2. Compare the right subtree, and the temporary root node is set as root node 3 of the right subtree.
- 3 is equal to the root of 3, find the target data.
As we can see from the query path above, we do not need to traverse all the nodes, and the binary search tree does not consume any extra space. Compared to traversal search, this way to find a specific value, the efficiency is greatly optimized. And remember, it’s not just numbers you can compare. Because Unicode,ASCII,UTF-8, and so on are computer codes that make characters comparable. If we are looking for a character or number, this approach can greatly reduce the query time.
2. B-Tree (B-Tree)
Although the binary search tree can optimize the query, but we have found a problem. The database needs to be able to process tens of millions of pieces of data. When the amount of data becomes extremely large, if we still use binary search tree to store data, then the binary search tree will become very, very high. And, because when we store data, we generally store it sequentially, that is, sequential I/O that performs a write. When querying, however, the data is not always ordered, so the data is read into memory for computational comparison, along with random I/O. Because a downward search of a binary tree is often a random I/O, if the tree is too high, then there will be a lot of random I/O, and the query efficiency will be reduced.
At this point, the smart friend is thinking, if the tree is “dumpy, dumpy”, reduce its random I/O, will be able to speed up the query!
Therefore, our B-tree is shining, and B-tree is also called B-tree. For a B-tree of order M (where a subtree has at most M children), it is defined as follows compared to binary search tree:
- Each node has at most m child nodes.
- Every non-leaf node (except the root) has at least Ceil (m/2) child nodes. (order 3 has at least 2 children, and order 5 has at least 3 children…)
- If the root is not a leaf node, the root has at least two child nodes. (Order 2 has at least two child nodes)
- A non-leaf node with k child nodes contains k -1 keys. (If he has k sons, then his node has k-1 tokens.)
- All the leaf nodes are on the same level, and the leaf nodes have only keywords and a pointer to the child is null
Note: Ceil is the furtherance of division. For example, if 7/6 results in 1 with 1 remaining, then we will output 2.
As shown in the figure above, this is a fourth-order B-tree. If we want to find data 19, we have the following path:
- 19<24, because the root node has only one key code 24, so directly compare the left subtree A;
- To judge whether 19<5 is true, the result is not true, so subtree B is not considered.
- If 5<19<13 is true and the result is not, then the subtree C is not considered.
- To judge whether 13<19<17 is true and the result is not, then the subtree D is not considered.
- To judge whether 17<19 is true and the result is true, then consider the subtree E;
- Because the subtree E is a leaf node, its children are null. Determine whether 19 exists in leaf node E, and the result is there. Find data 19. If 19 does not exist in leaf node E, then the data we are looking for does not exist.
You can find that with B-tree, MySQL can squeeze more data into the tree under the premise of “dumpy” tree, and can enjoy the advantages of improved efficiency of binary tree query. If we use the B tree as the index, the actual data corresponding to the destination key is stored in each node.
3. B + tree
Why not use B trees as an indexed data structure if B trees make MySQL query faster? This is because our B+ tree is an advanced version of a B+ tree.
- The leaf node contains all the keyword information, and the actual data information for these keywords. Keywords for leaf nodes are also incrementally linked. The data at the end of the left holds a pointer to the beginning of the node on the right.
- All non-leaf nodes can be considered indexes. A non-leaf node contains only the largest or smallest keyword in its subtree. It doesn’t point to specific information.
- An intermediate node with k subtrees contains k elements (k-1 elements in the B-tree). Each element does not hold data, but is only used as an index. All data is stored in the leaf node.
- The largest element of the root node is the same thing as the largest element of the entire B plus tree. No matter how many elements are inserted or deleted in the future, always keep the largest element in the root node.
As shown in the figure above, this is a B+ tree. Through such a design, we can find:
- A single node stores more elements, which allows us to make the tree more squat, resulting in fewer IO queries.
- Because the data that appeared in the whole tree appeared in the lowest leaf node, and only in the leaf node will store the target data information, so the query performance is more stable;
- In a leaf node, the left leaf node points through a pointer to the right leaf node. Then all the leaf nodes form an ordered linked list, which is convenient for range query.
4. Why not use Hash
As you can see from the above, if you use a B+ tree as MySQL’s data store, the time complexity will be order log n, which is the height of the tree. However, if we query a specific data as a Hash, the time complexity can reach O(1). So why aren’t MySQL guys thinking about this design? We can look at the following SQL:
SELECT * FROM class WHERE teacher = 'yuann' ORDER BY id DESC
SELECT * FROM class WHERE student_number > 50
The above two SQL addresses sorting and range queries. We know that a Hash is computed to get the target data, and that the computed result is often a point. Obviously, a hash index is not a quick way to handle sorting and range queries. The query will fall back to a full table scan and determine if the criteria are met. Obviously, full table scans are a bad situation, so MySQL uncleers don’t use Hash as indexes.
Why does MySQL index use B+ tree
- The computer reads and writes the hard disk data through the magnetic head search, in the track through the rotation to find the corresponding location of the data, data from the hard disk to the memory has three stages: head search, disk rotation and data transfer. Steps 1 and 2 are particularly time-consuming. So you can save a lot of time by minimizing the number of head moves you can make while reading and writing information. And every time the head moves, it corresponds to every time the B tree goes down to find children. Because the B type tree has a structure of multiple keywords under the same node, the height of the tree can be reduced, so the query efficiency is improved.
- Because B+ tree data exists in leaf nodes, the query efficiency is more stable than B tree.
- For the database, range query and sort are very frequent. B+ tree traversal only needs to traverse leaf nodes compared to B tree traversal. Range query reduces random I/O. At the same time, Hash handling range queries and sorts will fall back to full table scans, which will be inefficient.
This article is licensed under a Creative Commons Attribution – Non-Commercial – Share Like 4.0 international license. When reprinting, please indicate the link of the original text. When using the picture, please keep all the content in the picture. It can be scaled appropriately and attach the link of the article where the picture is in the reference.
Author: Yuann
It’s Design — Why does MySQL use B+ trees for indexing?
Post Date: 2021-04-15