preface
Whenever a SQL query turns out to be slow, our reflex is to ask whether we should add an index. But have we wondered why an index makes data lookup faster, and what structure the index is stored in under the hood? As the title gives away, the answer is the B+ tree! So why do most storage engines prefer it over a plain linked list or hash table? Today I will lift the veil on the B+ tree. After reading this article, the B+ tree will no longer be mysterious, and it will be a great help for the following high-frequency interview questions:
- Why indexes often use B+ trees as the underlying data structure
- What other indexes do you know besides B+ tree indexes
- Why is it recommended to increment the id as the primary key
- What is page split, page merge
- How do I find row records by index
This article will explain B+ trees from the following aspects
- Define the problem
- Comparison of several common data structures
- What are the issues to consider when creating an index, and how to do it more efficiently
Define the problem
To understand why the B+ tree underlies indexes, we need to look at the problem it solves. Think about the SQL we run most often.
Suppose we have the following user table:
CREATE TABLE `user` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL COMMENT 'name',
  `idcard` varchar(20) DEFAULT NULL COMMENT 'identity card no.',
  `age` tinyint(10) DEFAULT NULL COMMENT 'age',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='user information';
Generally, we have the following requirements:
1. Query user information based on the user ID
select * from user where id = 123;
2. Search for user information based on interval values
select * from user where id > 123 and id < 234;
3. Page through user information in reverse order of ID
select * from user where id < 1234 order by id desc limit 10;
From these common queries we can see that the data structure behind an index must satisfy three conditions:
- Fast, exact lookup by a single value
- Fast range lookup given the lower and upper bounds of an interval
- Index values kept in sorted order, supporting fast forward and reverse traversal
Let's take the primary key index (the ID index) as an example and see how to build it with each candidate data structure.
Comparison of several common data structures
Now let's think about which data structures satisfy these conditions.
1. Hash table
A hash table is a data structure accessed directly by key: a hash function maps each key to a slot in the table, so lookups are extremely fast. A hash index is built on top of a hash table. Suppose we create a hash index on the name column; the lookup proceeds as shown in the figure below:
For each row, the storage engine computes a hash code over the indexed column (the name column in the figure above); the hash code determines the slot in the hash table, and each slot holds a pointer to the row. Because the index stores only the hash values and row pointers rather than the field values themselves, its structure is compact and lookups are very fast. But hash indexes have their drawbacks:
- A hash index is usable only for queries that match all of its columns exactly. For example, with a hash index on columns (A, B), a query on column A alone cannot use the index.
- Hash index entries are not stored in index order, so they cannot be used for sorting or fast range lookup.
- A hash index stores only hash values and row pointers, not field values, so it can never answer a query from the index alone. Since hash indexes mostly live in memory, though, this is rarely a problem in practice.
- Hash indexes support only equality comparisons, such as = and IN(); they cannot handle range predicates such as age > 17.
In summary, hash indexes are only applicable in certain situations, but when they fit, they bring a significant performance boost. For example, InnoDB has a feature called the "adaptive hash index": if InnoDB notices that certain index values are looked up very frequently, it builds a hash index in memory on top of the B+ tree index, giving the B+ tree the benefit of fast hash lookups.
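To make the idea concrete, here is a minimal sketch of a hash index in Python. The table data and helper names are illustrative only, not MySQL internals:

```python
# A toy hash index on the `name` column: maps hash(name) -> row ids.
# Real engines store row pointers; we use primary-key ids for simplicity.
rows = {
    1: ("Alice", "1101", 30),
    2: ("Bob",   "2202", 25),
    3: ("Alice", "3303", 41),  # duplicate name lands in the same bucket
}

hash_index = {}
for row_id, (name, idcard, age) in rows.items():
    hash_index.setdefault(hash(name), []).append(row_id)

def find_by_name(name):
    """O(1) equality lookup; recheck the name to filter out hash collisions."""
    return [rid for rid in hash_index.get(hash(name), [])
            if rows[rid][0] == name]

print(find_by_name("Alice"))  # fast: supports = and IN()
# But `name > "A"` or ORDER BY name cannot use this index:
# hash values preserve no ordering of the original keys.
```

Note how the index holds no names at all, only hash codes and row ids, which is exactly why it is compact but useless for sorting or ranges.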
2. Linked list
A doubly linked list supports both forward and reverse traversal, as shown below.
But it clearly does not support the fast lookup by value or by range that we described. We also know that table data keeps growing, so the index must be updated on every insert, and a plain linked list does not support fast insertion either. Can we modify the linked list so that it supports fast lookup, insertion, and deletion? There is a structure that fits exactly, which brings us to the skip list.
What is a skip list? Simply put, a skip list is a linked list with multiple levels of indexes built on top of it, as shown in the figure below.
Say we want to find the interval from 7 to 13. Instead of scanning from the head, we start at the top-level index in the figure and reach the start of the interval in only three hops; the time complexity is O(log n), which is very fast. In fact a skip list is very similar to a B+ tree, except that the B+ tree evolved from the balanced binary search tree, so let's see step by step how a balanced binary search tree turns into a B+ tree.
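As a sketch, here is the skip-list search idea in Python. The index levels are prebuilt by taking every other key (a static structure for illustration; real skip lists randomize levels on insert):

```python
# Level 0 is the full sorted linked list; each higher level keeps
# every other key as an "express lane".
level0 = [1, 4, 7, 9, 13, 16, 17, 21]
levels = [level0]
while len(levels[-1]) > 2:
    levels.append(levels[-1][::2])

def search(target):
    """Start at the top level; at each level, move right while the next
    key is still <= target, then drop down. O(log n) comparisons."""
    idx = 0
    for li in range(len(levels) - 1, -1, -1):
        level = levels[li]
        while idx + 1 < len(level) and level[idx + 1] <= target:
            idx += 1
        if li > 0:
            idx *= 2  # the same key sits at twice the index one level down
    return levels[0][idx] if levels[0][idx] == target else None

print(search(13))  # found in a few hops instead of a full scan
```

Each drop to a lower level halves the remaining search space, which is where the O(log n) bound comes from.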
First, let's see what a balanced binary search tree is. It has the following properties:
- If the left subtree is not empty, the values of all nodes in the left subtree are less than the values of its root node.
- If the right subtree is not empty, the values of all nodes in the right subtree are greater than or equal to the values of its root node.
- The absolute value (balance factor) of the difference in height between the left and right subtrees of each non-leaf node is at most 1.
Here is a balanced binary search tree
From these properties we can see that looking up a node in a balanced binary search tree takes O(log₂n) time.
Now let’s turn it into a B+ tree
As you can see, the main difference is that all values also appear in the leaf nodes at the bottom, linked together by a doubly linked list. Now, to find the numbers between 15 and 27, we only need to locate 15 (at most log n, here 3, comparisons) and then walk the leaf list until we pass 27. This perfectly supports the three requirements we listed: fast value lookup, range lookup, and forward or reverse-order traversal.
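A minimal Python sketch of that leaf-level range scan (the class and function names are mine for illustration, not InnoDB's):

```python
# Leaf pages of a toy B+ tree, chained with next/prev pointers like the
# doubly linked leaf list in the figure.

class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys within this leaf page
        self.next = None   # pointer to the right sibling
        self.prev = None   # pointer to the left sibling

def chain(pages):
    """Build leaves and link them into a doubly linked list."""
    leaves = [Leaf(p) for p in pages]
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a
    return leaves

leaves = chain([[5, 10, 15], [20, 25, 27], [30, 35, 40]])

def range_scan(start_leaf, lo, hi):
    """After descending to the leaf holding `lo` (descent elided here),
    just walk the leaf chain rightward: no need to climb the tree again."""
    out, leaf = [], start_leaf
    while leaf:
        for k in leaf.keys:
            if lo <= k <= hi:
                out.append(k)
            elif k > hi:
                return out
        leaf = leaf.next
    return out

print(range_scan(leaves[0], 15, 27))  # [15, 20, 25, 27]
```

A reverse-order scan would walk `prev` pointers the same way, which is why the leaf list is doubly linked.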
So with 100 million nodes, how many comparisons does a lookup take? At most log₂(100,000,000), which is about 27. If all 100 million nodes are in memory, 27 comparisons are no problem at all. But a new problem arises: how much memory do those 100 million nodes occupy? A quick calculation: at 16 bytes per node, 100 million nodes take about 1.5 GB of memory! That is a frightening amount for a resource as precious as memory, and this is just one index. We usually define several indexes per table and many tables per database, so memory would fill up quickly. Keeping a B+ tree index entirely in memory is clearly problematic.
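The arithmetic above can be checked quickly (16 bytes per node is the assumption from the text):

```python
import math

nodes = 100_000_000
comparisons = math.ceil(math.log2(nodes))  # worst-case lookups in a balanced binary tree
memory_gb = nodes * 16 / 1024**3           # at the assumed 16 bytes per node

print(comparisons)          # 27
print(round(memory_gb, 1))  # about 1.5 GB
```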
If it doesn't fit in memory, we can put it on disk; disk offers far more space. But here's the problem: memory and disk differ enormously in speed. Memory access is measured in nanoseconds and disk access in milliseconds, so for the same amount of data the gap can be tens of thousands of times. The 27 lookups we computed in the previous step therefore look very bad on disk (reading one node counts as one disk I/O, which means 27 disk I/Os!). Can those 27 lookups be optimized?
Obviously the number of lookups depends on the height of the tree, and the height depends on the number of children per node, which is N for an N-ary tree. Suppose we have 16 numbers and build them into a binary tree and a 5-ary tree; what are the heights?
As you can see, the binary tree needs to traverse 5 levels while the 5-ary tree needs only 3, saving 2 disk I/Os. Going back to the 100 million nodes: how many I/Os does a 100-ary tree need?
As you can see, at most five traversals (in practice the root node usually stays in memory, so it can be counted as four)! Disk I/O drops from 27 to 5, a huge performance gain. Some may ask: five is still too many; can't we turn the 100-ary tree into a 1000-ary or 10000-ary tree and cut the I/O count even further?
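These level counts can be verified with a quick sketch that counts how many levels a complete N-ary tree needs to hold a given number of nodes:

```python
def levels(fanout, count):
    """Levels needed so that 1 + fanout + fanout^2 + ... covers `count` nodes."""
    total, level_size, height = 0, 1, 0
    while total < count:
        total += level_size      # nodes accumulated so far
        level_size *= fanout     # next level is `fanout` times wider
        height += 1
    return height

print(levels(2, 16))             # 5 levels for a binary tree of 16 nodes
print(levels(5, 16))             # 3 levels for a 5-ary tree
print(levels(100, 100_000_000))  # 5 levels for a 100-ary tree of 100 million nodes
```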
Here we need to understand the concept of a page. The operating system reads memory and disk in units of a page (usually 4 KB), and disks also read ahead: each read pulls a chunk of contiguous data into memory, avoiding many small I/Os. This is the famous principle of locality: if a piece of data is used, the data near it is likely to be used soon, so loading them together saves slow I/O. The amount of contiguous data read must be an integer multiple of the operating system page size; that contiguous unit in MySQL is the page, 16 KB by default. So for a B+ tree node, it is best to make the node size equal to the page size (16 KB), so that reading one node costs exactly one I/O.
InnoDB manages page data from disk with an in-memory buffer pool. Pages that are too large would quickly fill the buffer pool, potentially causing pages to be swapped between memory and disk frequently and hurting performance.
Based on the above analysis, it is not hard to guess how to choose N for the N-ary tree: make each node exactly the size of one page (16 KB).
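A back-of-envelope estimate of the resulting fanout, assuming a BIGINT key (8 bytes) and InnoDB's typical 6-byte child page pointer per entry (both figures are common approximations, not exact on-page layout):

```python
PAGE = 16 * 1024             # InnoDB default page size, in bytes
KEY_BYTES, PTR_BYTES = 8, 6  # assumed: BIGINT key + 6-byte child page pointer

fanout = PAGE // (KEY_BYTES + PTR_BYTES)
print(fanout)  # roughly 1170 children per non-leaf node
```

With a fanout on the order of a thousand, just three or four levels are enough to index hundreds of millions of rows, which is why real B+ tree indexes are so shallow.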
Page split and page merge
Now back to the question at the beginning: why is an auto-increment id recommended as the primary key? Some may say the user's ID-card number is unique, so why not use it as the primary key? Suppose we do; what problems arise?
A B+ tree keeps its index ordered by updating the index on every insert or update. Suppose the original B+ tree indexed on the ID-card number looks like this (assume a binary tree; only the first four digits of each ID-card number are shown in the figure):
Now suppose we insert a record whose ID-card prefix is 3604. The index must be updated: 3604 belongs after 3504, but that node would then hold 3 elements, which violates the binary-tree constraint. This triggers a page split, and the node must be adjusted to satisfy the constraint again.
As shown in the figure, the binary-tree condition is met again after the adjustment.
The adjustment caused by a page split inevitably degrades performance, especially with the ID-card number as the primary key: its randomness causes inserts at random positions in the tree, producing a large number of page splits and a sharp drop in performance. With an auto-increment id, every newly inserted value is larger than everything already in the index, so it either joins the current last node (if it still has room) or starts a new node (as shown below). An auto-increment primary key therefore avoids the page-split problem entirely. Recommended!
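The difference between random and sequential primary keys can be illustrated with a toy simulation that counts how many records must be copied during page splits. The tiny page capacity and the "a new maximum just opens a fresh page" fast path are simplifications of what InnoDB actually does:

```python
import bisect
import random

CAP = 4  # toy page capacity (a real InnoDB page holds hundreds of records)

def records_moved(keys):
    """Insert keys into sorted pages of capacity CAP. On overflow, split the
    page in half and count the records moved; a new maximum that overflows
    the last page just opens a fresh page (the auto-increment fast path)."""
    pages = [[]]
    moved = 0
    for k in keys:
        firsts = [p[0] for p in pages if p]
        i = max(bisect.bisect_right(firsts, k) - 1, 0)  # page this key belongs to
        bisect.insort(pages[i], k)
        if len(pages[i]) > CAP:
            if i == len(pages) - 1 and pages[i][-1] == k:
                pages.append([pages[i].pop()])       # no records copied
            else:
                mid = len(pages[i]) // 2
                pages.insert(i + 1, pages[i][mid:])  # half the page moves
                moved += len(pages[i]) - mid
                pages[i] = pages[i][:mid]
    return moved

random.seed(42)
keys = list(range(1000))
print(records_moved(keys))                                # sequential: 0 moved
print(records_moved(random.sample(keys, len(keys))) > 0)  # random: True
```

Sequential keys never force existing records to move; shuffled keys constantly split pages in the middle, which is the cost the article describes.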
When does a page merge occur? When table records are deleted, the corresponding index entries are deleted too, and a page merge can occur, as shown in the figure below.
When we delete the rows with id 7 and 9, the index above must be updated: remove 7 and 9, then merge 8 and 10 into one node; otherwise 8 and 10 would sit on two separate nodes, possibly costing two I/O reads and hurting lookup efficiency. So when exactly should a merge happen? We can set a threshold: for an N-ary tree, when a node's element count falls below N/2, merge it with a neighboring node. Note that the merged node may then hold more than N elements, triggering a page split, and the parent nodes may need adjusting in turn to keep the tree a valid N-ary tree.
How do I find row records by index
After the B+ tree introduction above, you may still have one doubt: how do we get from an index value to the actual row record? In fact, the row records are stored in the leaf nodes at the bottom; once the index value is found, the row record is found with it, as shown in the figure.
As you can see, the non-leaf nodes store only index values; the row records live only in the bottom row of leaf nodes. This greatly shrinks the index, and finding an index value immediately yields the row record, which also improves efficiency.
Such indexes that store an entire row of records on a leaf node are called clustered indexes, and others are called non-clustered indexes.
Summary of B+ trees
To sum up, B+ trees have the following characteristics:
- The number of child nodes of each node cannot exceed N and cannot be less than N/2 (otherwise a page split or page merge is triggered).
- The root node is the exception: it may have fewer than N/2 children (as few as 2).
- Non-leaf nodes store only index values, not actual data; only the leaf nodes in the bottom row store row data.
- The leaf nodes are connected in series through a linked list, which makes it easy to look up by range
conclusion
This article started from everyday SQL and summarized the characteristics of the B+ tree. I believe everyone now has a clearer understanding of B+ tree indexes. This is why we should grasp the underlying principles: having learned the B+ tree, look back at the questions at the beginning and they are no longer difficult. Mastering the unchanging fundamentals lets you cope with endless surface changes.
Finally, you are welcome to follow my WeChat official account "Code Sea" so we can make progress together!
Shoulders of giants
- www.rainybowe.com/blog/2016/0…
- time.geekbang.org/column/arti…