Preface
Why does MySQL use B+ trees as indexes?
If you simply ask, "why does the MySQL database index use a B+ tree?", the answer will always revolve around what B+ trees are and what their advantages are.
(That's not how I started thinking about it. I read a lot of articles that answered from that angle, and the answers disappointed me. Then the other day I asked the five-year veteran programmer sitting next to me, who is always slacking off; his languid reply: "Why do you think it's a tree structure at all?" Hearing that answer suddenly opened my mind. Interesting!)
Let's set aside all the ready-made answers about what a B+ tree is and what it's good for.
(I didn’t want the hard answer shoved down my throat.)
What I want is the why.
Why choose a tree structure when there are so many data structures available for a MySQL index? And among so many tree structures, why the B+ tree?
That’s what I have to figure out! What I want is not just the answer but the context behind it!
I want not only to know what it is, but also to know why.
Of all the data structures, why a tree?
At the logical level, the many data structures can be divided into linear structures and nonlinear structures.
Linear structures: arrays, linked lists, and the structures derived from them such as hash tables (also called hash maps), stacks, queues, and so on.
Nonlinear structures: trees and graphs.
There are also data structures such as skip lists and bitmaps that evolved from the basic ones; each data structure exists to solve the problems of some scenario.
If we want to know which data structure suits an index, we first need to figure out what problems (pain points) the index has to solve and what role it needs to play, and only then look at which data structure fits. The latter is the effect; the former is the cause.
MySQL stores data on disk. Even if the device loses power, data stored on disk is unaffected and is not lost. In other words, MySQL data on disk is persistent.
But storing data on disk comes at a cost: processing speeds are measured in milliseconds, which is sluggish compared with the nanosecond speed of memory.
You may have no feel for these units of time, but a millisecond is a million nanoseconds: disk is orders of magnitude slower than memory.
Although the index is stored on disk, when you search through the index, the index is first read from disk into memory, the in-memory index locates the data's position on disk, and the data read from disk is then also placed in memory.
Indexing lets disk and memory join forces, hitching a ride on memory's nanosecond speed.
However, no matter how optimized the query process is, as long as the data ultimately lives on disk, multiple disk I/Os are inevitable, and the more disk I/Os occur, the more time is consumed.
(The sharp-eyed will have noticed: this is exactly the pain point that needs to be addressed.)
- Do as little I/O on the disk as possible.
But there are still so many data structures to choose from.
The purpose of the index itself determines which data structures are candidates, so we can use it to narrow the field.
Start from the primary purpose for which the index is built:
- Support fast, efficient range (interval) queries.
And beyond its primary purpose, the index should be a dynamic data structure that also supports operations such as insertion and update.
So, besides tree structures, what other data structures can you think of that satisfy these two main conditions?
Hash tables and skip lists.
Hash table
Let's start with the hash table, which we're all familiar with. The physical storage of a hash table is an array, and an array is a contiguous address space in memory. Data is stored as key-value pairs, and a hash table supports exact-match queries in O(1) time.
What makes a hash table so fast is that it computes an array subscript from the Key to find the Value quickly. The simplest method is the remainder method: first compute the Key's HashCode, then take the HashCode modulo the length of the hash table's array. The remainder is the array subscript, and the Key and Value stored at that subscript are then accessed.
But different Keys may compute to the same subscript. For example, HashCode 1010 mod 6 is 2, and HashCode 1112 mod 6 is also 2. Since the hash algorithm can map different HashCodes to the same remainder, identical array subscripts can occur; this is called a hash collision.
Hash collisions are usually resolved with linked lists: when a collision occurs, the array slot stores a pointer to a list, and elements with different Keys but the same subscript are appended to that list. A lookup then traverses the list via the pointers and matches the exact Key.
As the figure above shows, the Key is the string "a fierce seed" and the Value is "don't forget to follow, like, comment". We compute the Key's HashCode as the integer 1010, then take 1010 mod the hash array's length of 6 to get 2; slot 2 holds the element (Key = "a fierce seed", Value = "don't forget to follow, like, comment").
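To make the remainder method and linked-list chaining concrete, here is a minimal sketch in Python. The table length of 6 and the Key/Value strings come from the example above; the class and method names are illustrative assumptions, not MySQL internals:

```python
class Node:
    """One entry in a bucket's chain, used when Keys collide."""
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

class ChainedHashTable:
    def __init__(self, size=6):
        self.buckets = [None] * size      # the underlying contiguous array

    def _index(self, key):
        # Remainder method: HashCode mod array length gives the subscript.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        while node:                       # Key already present? overwrite
            if node.key == key:
                node.value = value
                return
            node = node.next
        # Empty slot or hash collision: prepend a new node to the chain.
        self.buckets[i] = Node(key, value, self.buckets[i])

    def get(self, key):
        node = self.buckets[self._index(key)]
        while node:                       # walk the chain, match the exact Key
            if node.key == key:
                return node.value
            node = node.next
        return None

table = ChainedHashTable()
table.put("a fierce seed", "don't forget to follow, like, comment")
print(table.get("a fierce seed"))         # O(1) equality lookup
```

Note what is missing: the buckets carry no order across slots, which is exactly why a hash index cannot serve the `age < 18` style of query below.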
A hash table thus supports efficient equality queries, such as this SQL:
```sql
select * from weixin where username = 'a fierce seed';
```
But it does not support range queries, such as this SQL:
```sql
select * from weixin where age < 18;
```
So if a hash table is used as an index, a range query means a full scan. That said, NoSQL databases like Redis, whose storage model is KV, are well suited to equality-query scenarios; but that's another topic.
Skip list
Now let's look at the skip list. The skip list may seem like a relatively unfamiliar data structure, but it is one of the more commonly used data structures in Redis. The underlying essence of a skip list is an ordered linked list that supports binary-search-style lookups, achieved by adding index layers on top of the base list. It supports dynamic operations such as insertion and deletion as well as efficient range queries; search, insert, and delete are all O(log n).
To understand the skip list, first consider the linked list. Even if a linked list stores ordered data, querying one element means walking the whole list from the head in the worst case: O(n) time.
As shown in the figure below, a skip list adds an index layer on top of the linked list, which is what enables it to support interval queries.
As the figure shows, if we want to find node 26, the skip list traverses the index layer first. On reaching node 21 in the index layer, it sees that the next index node is 36, so 26 must lie in that interval. We then follow the index node's down pointer into the original list, where only 2 more nodes need to be traversed to reach 26. Where the original list alone would have required traversing 10 nodes, we now traverse only 8.
The picture below is worth a thousand words: when the data volume is large, building up to a five-level index over a linked list with many nodes makes the advantage of the index layers stand out. Note the pattern: each added index layer reduces the number of nodes to traverse and improves query efficiency. (From the user's point of view, the skip list is basically telling the linked list where to start looking, faster.)
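Here is a minimal sketch of that search in Python, with a single hand-built index layer. A real skip list assigns index levels randomly, and the keys here are illustrative, not the figure's exact values:

```python
class SkipNode:
    def __init__(self, key):
        self.key = key
        self.right = None                 # next node on the same level
        self.down = None                  # same key, one level below

def build_level(keys, below=None):
    """Build one sorted level; wire down-pointers into the level below."""
    head = SkipNode(float("-inf"))
    cur = head
    for k in keys:
        cur.right = SkipNode(k)
        cur = cur.right
    a, b = head, below
    while a and below:                    # link each node to its twin below
        while b and b.key != a.key:
            b = b.right
        a.down = b
        a = a.right
    return head

base  = build_level([1, 4, 9, 13, 17, 21, 24, 26, 30, 36])  # original list
index = build_level([1, 9, 17, 21, 36], below=base)          # index layer

def search(head, target):
    node, visited = head, 0
    while node:
        while node.right and node.right.key <= target:
            node = node.right             # move right along this level
            visited += 1
        if node.key == target:
            return True, visited
        node = node.down                  # drop to the denser level
    return False, visited

print(search(index, 26))   # (True, 6): 1,9,17,21 on the index, then 24,26 below
```

Dropping down at 21 visits 6 nodes in total instead of the 8 it takes to scan the base list alone; every extra index layer shaves off more.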
Seeing this, the skip list also looks like a fine data structure for an index. But don't forget the first condition: "do as little I/O on the disk as possible." The skip list clearly fails here: as its data volume grows, the index layers grow too, which increases the number of I/O reads and hurts performance.
So let's return to the question "of all the data structures, why a tree?" We found that hash tables and skip lists don't satisfy both of the main conditions: solving the disk pain point and serving the index's purpose. Now let's see why a tree was chosen.
Tree structure
Let's look at the components of a real-life tree: roots, branches, and leaves. The tree data structure is abstracted the same way: the topmost node is the root, a node's left child heads the left subtree and its right child the right subtree, and the nodes at the ends with no children are called leaf nodes.
A tree's hierarchy has upper and lower levels as well as nodes on the same level. As shown in the figure below, D and E are the child nodes of node B, so node B is their parent; node C, on the same level as node B, is node B's sibling.
The maximum number of levels in a tree is called its height (or depth). The tree in the figure has a height of 3.
Looking at the levels of a tree, isn't the tree structure a bit like the skip list we just saw? The skip list is fast precisely because its index layers let it query efficiently by interval.
The tree's structure means that traversing it naturally supports range queries. On top of that, trees have the advantage of being nonlinear: unlike a linear array, tree nodes don't have to be contiguous in memory, so inserting new data into a tree doesn't incur the cost of shifting every element behind the insertion point the way an array does. That makes the tree better suited to dynamic operations such as insertion and update.
So a data structure based on the tree can satisfy the index's purpose and the other conditions while also easing the disk-query pain point.
Of all the tree structures, why a B+ tree?
Out of all the tree structures, besides the B+ tree, what others come to mind? The binary search tree, the self-balancing binary tree, and the B tree.
Binary tree
Before getting into binary search trees or self-balancing binary trees, we need to briefly know what a binary tree is and what binary search is, because the binary search tree is simply the combination of the two.
First, the binary tree. Its structure dictates that each node may have 0, 1, or 2 child nodes, never more than 2.
There are two special forms of binary tree: the full binary tree and the complete binary tree.
Full binary tree
A full binary tree is one in which every non-leaf node has both a left and a right child, and all leaf nodes sit on the same level.
Complete binary tree
A binary tree is complete if, numbering its nodes level by level from left to right, every node lines up with the corresponding node of a full binary tree of the same depth. See the diagram below.
Binary search tree
Next, a quick look at the binary search tree. The "binary" in its name does double duty: it stands for the binary tree and also for binary search, because the binary search tree fuses the binary tree with binary search.
First, binary search. Binary search avoids traversing an ordered array element by element. We know that finding a number in an unordered array is O(n) in the worst case, so overall query efficiency is low. If the array is ordered, binary search can halve the query range at each step, which is O(log n). As shown in the figure below, and in the sketch that follows.
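A minimal sketch of binary search over an ordered array in Python (the array contents are illustrative):

```python
def binary_search(sorted_arr, target):
    """Halve the query range each step: O(log n) instead of O(n)."""
    lo, hi = 0, len(sorted_arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_arr[mid] == target:
            return mid                    # found: return the subscript
        if sorted_arr[mid] < target:
            lo = mid + 1                  # target lies in the right half
        else:
            hi = mid - 1                  # target lies in the left half
    return -1                             # not found

print(binary_search([3, 9, 17, 26, 35, 48, 65, 87, 90], 26))  # 3
```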
Accordingly, the binary search tree differs from an ordinary binary tree: elements smaller than a node go into its left subtree, and the right subtree holds elements larger than the node. (In plain English, each node acts like the midpoint of its left and right subtrees, smaller on the left and larger on the right. Isn't that exactly the binary search algorithm?)
As shown in the figure, when searching a binary search tree you only compare the target element with each node: if the element is greater than the node, search the right subtree; if smaller, the left subtree; if it equals the node, you have found it. That is what gives the binary search tree its efficient queries.
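Here is a minimal binary search tree sketch in Python (the keys are illustrative): insertion follows the "smaller left, larger right" rule, and search descends one comparison per level:

```python
class TreeNode:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_insert(root, key):
    """Smaller keys go into the left subtree, larger into the right."""
    if root is None:
        return TreeNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root

def bst_search(root, key):
    while root:                           # one comparison per tree level
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

root = None
for k in [35, 17, 65, 9, 26, 48, 87]:    # a balanced insertion order
    root = bst_insert(root, k)
print(bst_search(root, 26))               # True, found in 3 comparisons
```

The insertion order above happens to produce a balanced tree; the next paragraph shows what happens when it doesn't.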
However, the binary search tree has an obvious flaw: in the extreme case where the smallest or the largest element is inserted every time, the tree degenerates into a linked list, and querying falls to O(n), as shown below.
When a binary search tree degenerates into a linked list, queries are not only inefficient, as we all know, they also pile on disk I/O operations, so we have to move on to the next tree data structure.
Self-balancing binary tree
The self-balancing binary tree exists to solve the binary search tree's degeneration into a linked list in the extreme case. The self-balancing binary search tree is also called the balanced binary search tree (AVL tree).
(You can see the evolution from the plain binary tree to the binary search tree to the self-balancing binary tree: a simple thing gradually becomes more complex. If we only knew the final answer, we would miss the whole story.)
In fact, the balanced binary search tree adds one constraint on top of the binary search tree: the height difference between each node's left and right subtrees may not exceed 1. That keeps the left and right subtrees balanced, and querying data is O(log n).
As shown in the figure below, a balanced binary search tree restores its balance after every insertion.
The figure below compares an ordinary, unbalanced binary search tree with a balanced one; the sketch after it does the same in code.
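Reusing `TreeNode` and `bst_insert` from the sketch above, a small checker makes the comparison concrete (the AVL rotations themselves are omitted; this only verifies the height-difference constraint):

```python
def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

def is_balanced(node):
    """AVL constraint: subtree heights differ by at most 1, at every node."""
    if node is None:
        return True
    return (abs(height(node.left) - height(node.right)) <= 1
            and is_balanced(node.left) and is_balanced(node.right))

print(is_balanced(root), height(root))    # True 3, the balanced order above

chain = None
for k in [1, 2, 3, 4, 5]:                 # the degenerate insertion order
    chain = bst_insert(chain, k)
print(is_balanced(chain), height(chain))  # False 5; an AVL tree would rotate to fix this
```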
Of course, there is also the red-black tree, familiar from the Java collection classes, which is likewise a self-balancing binary tree.
However, whether the self-balancing tree is an AVL tree or a red-black tree, its height still grows as more elements are inserted, which means more disk I/O operations and lower overall query efficiency.
B tree
We usually see "B+ tree" and "B-tree" side by side, and it is tempting to read "B-tree" as "B minus tree", but the "-" is just a hyphen: a B-tree is simply a B tree.
A self-balancing binary tree achieves O(log n) lookups, but as noted above it is still a binary tree: each node can have only two children, so as the amount of data and the number of nodes grow, the tree's height grows too (the tree gets deeper), increasing disk I/O and hurting query efficiency.
Looking at the progression of binary trees within the tree family, every time a binary tree solved one problem it created a new bug (or pain point).
As Lu Xun supposedly said, "when one problem is solved, new problems appear."
Seeing this, it is not hard to guess that the B tree emerged to solve the tree-height problem. Unlike the "xxx binary trees" before it, a B tree node is no longer limited to two children; it may have M children (M > 2). Moreover, a single B tree node can store multiple elements, which lowers the height of the tree compared with the binary tree structures above.
Since a B tree node can point to multiple child nodes, the B tree is a multiway tree, and the maximum number of children a node may have is called the order of the B tree. The figure below shows a B tree of order 3.
Each node in the figure is called a page. The page is the basic unit in which MySQL reads data, and a page corresponds to the disk block mentioned earlier. The P entries in a disk block are pointers to child nodes; pointers exist in this tree just as they did in the binary trees before.
Now, looking at the figure above, what happens when this order-3 B tree searches for the element 90?
Start from the root node, disk block 1. Since 90 is greater than 35, follow pointer P3 in disk block 1 to locate disk block 4. In disk block 4, compare again with 65 and 87: 90 is greater than 87, so follow pointer P3 of disk block 4 to disk block 11, where the key matching 90 is found.
Notice that even when this order-3 B tree has to search all the way down to a leaf node, the tree height is only 3, so the search needs at most 3 disk I/O operations.
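The walk-through above can be sketched in Python. The nodes below are hand-built to mirror an order-3 B tree with 17/35 in the root and 65/87 in disk block 4; the remaining keys are illustrative assumptions, since we only know part of the figure:

```python
class BTreeNode:
    """One page / disk block: sorted keys plus child pointers P0..Pn."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []    # len(children) == len(keys) + 1

def btree_search(node, target, io=0):
    io += 1                               # reading one page = one disk I/O
    i = 0
    while i < len(node.keys) and target > node.keys[i]:
        i += 1                            # find the first key >= target
    if i < len(node.keys) and node.keys[i] == target:
        return True, io
    if not node.children:                 # reached a leaf; key is absent
        return False, io
    return btree_search(node.children[i], target, io)

block11 = BTreeNode([88, 90])
block4 = BTreeNode([65, 87], [BTreeNode([40, 50]), BTreeNode([70, 80]), block11])
root = BTreeNode([17, 35], [BTreeNode([5, 10]), BTreeNode([20, 30]), block4])
print(btree_search(root, 90))             # (True, 3): three page reads, tree height 3
```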
With little data the advantage may not show. But when the data volume is large, the number of nodes grows too. A self-balancing binary tree can hang at most two children off each node and can only expand vertically, so its height becomes very large, which means more disk I/O operations. The B tree, by contrast, can expand its nodes horizontally and keep the tree short, so it is naturally more efficient than a binary tree. (To put it bluntly: grow fat, not tall.)
Seeing this, you can surely guess that if the B tree were flawless, the B+ tree would never have appeared.
So the next question: why use a B+ tree rather than a B tree?
On the face of it, the B tree already satisfies our requirements: it reduces disk I/O and supports interval lookups. Note, however, that while the B tree supports range queries, it does not do so efficiently. In the example above, the B tree can efficiently find the value 90 by equality, but it is awkward at returning, say, all values in the range 3 to 10 in one pass: a B tree range query requires an in-order traversal that constantly switches back and forth between parent and child nodes across many pages, which puts a heavy burden on disk I/O.
B+ tree
As the "+" in its name suggests, the B+ tree is an upgraded version of the B tree, and the InnoDB engine in MySQL uses the B+ tree as the underlying data structure of its indexes.
The B+ tree makes one upgrade over the B tree: non-leaf nodes in a B+ tree store no data and serve purely as an index, while the leaf nodes hold all of the tree's data. The leaves are linked in order from small to large, each pointing to its adjacent leaves, so the leaf nodes form an ordered doubly linked list. The structure of a B+ tree is shown below.
(Doesn't the B+ tree look a bit like the skip list from earlier? The data sits at the bottom, and the upper layers form an interval index built from that bottom layer; except instead of the skip list's vertical growth, it is a horizontally expanding "skip list". The benefit: fewer disk I/Os and more efficient range lookups.)
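A minimal sketch of why the leaf chain matters for range queries. The pages and keys are illustrative; a real B+ tree would first descend the index levels to find the leaf holding the lower bound:

```python
class LeafPage:
    """A B+ tree leaf: sorted keys plus a pointer to the next sibling leaf."""
    def __init__(self, keys):
        self.keys = keys
        self.next = None

def link(pages):
    for a, b in zip(pages, pages[1:]):    # chain the leaves small-to-large
        a.next = b
    return pages[0]

def range_scan(leaf, lo, hi):
    """Walk the ordered leaf chain once; no bouncing back up to parent pages."""
    out = []
    while leaf:
        for k in leaf.keys:
            if k > hi:
                return out                # past the interval: stop early
            if k >= lo:
                out.append(k)
        leaf = leaf.next                  # one sibling pointer = one page read
    return out

head = link([LeafPage([1, 4]), LeafPage([7, 9]), LeafPage([13, 17]), LeafPage([21, 26])])
print(range_scan(head, 3, 10))            # [4, 7, 9], the 3-to-10 query from earlier
```

Contrast this with the B tree's in-order traversal, which would have to climb back to parent pages between leaves.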
Next, let's look at insertion and deletion in the B+ tree. The B+ tree has many redundant nodes: as seen above, every key in a parent node also appears in a child node, so a deletion can be made directly at the leaf node, which is faster.
Compared with the B+ tree, the B tree has no redundant nodes, so deleting a node can trigger complex reshaping of the tree; the B+ tree's redundancy avoids such deformation. The same is true for insertion, which in a B+ tree involves at most one branch path of the tree. Nor does the B+ tree need more complex algorithms, such as the red-black tree's rotations, to rebalance itself.
Conclusion
Starting from the question in the title, we did not directly answer why MySQL uses the B+ tree as its index. Instead, we asked ourselves two questions. One was: of all the data structures, why a tree? The other: of all the tree structures, why a B+ tree?
To get the effect, we first needed the cause, and we started from two angles. First, MySQL's data lives on disk, and disk processing is millisecond-level; too many disk I/Os during a query impose a heavy burden, so a query must minimize disk I/O. Second, the primary purpose of the index itself: to support efficient lookups by range (interval).
With the cause in hand, we could explore the effect and answer the first question: of all the data structures, why a tree? Among the other candidates, the logically linear structures are the hash table and the skip list. The hash table, built on an array, queries efficiently but only by equality; it cannot support range queries. The skip list, built on a linked list, achieves efficient range queries through its index layers, but as the data volume grows, so do the index layers. Hence the tree: its structure means traversal naturally supports range queries, and it avoids the insertion overhead of a linear array, making it better suited to dynamic operations such as insertion and update.
Then we asked: of all the trees, why a B+ tree? Starting from the binary tree, a parent node can have at most two children; adding binary search gives the binary search tree, which shows efficient query ability. But the binary search tree degenerates into a linked list in the extreme case. So we stepped up to the self-balancing binary tree, where the height difference between each node's subtrees may not exceed 1. Yet a binary tree can still hold at most two children per node, so a tall tree can cause excessive disk I/O during queries.
That finally brings us to the B tree, a multiway tree, which handles single-value queries efficiently but not range queries. The B+ tree is an upgrade of the B tree: all non-leaf nodes serve only as an index, only the leaf nodes store data, and the leaves form an ordered doubly linked list.
Well, that’s all for today.
Okay, actually, this one took a couple of weeks.
I am a fierce and agile seed. Why fear that truth is infinite, when every inch of progress brings an inch of joy? You are welcome to follow, like, favorite, and comment, and we'll see you next time!
The article is continuously updated; you can search WeChat for "a fierce and agile seed" to read it first.