Talk about your personal understanding of the B+ tree

Today is the first day of the rest of your life. Today is the first day of the rest of your life.

preface

Gold three silver four, job seekers more and more active, interviewers ask questions from a more difficult Angle. In a word, ask the underlying implementation. We all know that THE implementation of MySQL index uses B+ tree, this article is mainly to share a little personal understanding of B+ tree.

Why do MySQL indexes use B+ trees?

As we all know, there are many data structures that are more efficient to find, so why use B+ trees? First, briefly analyze several data structures with high query efficiency.

Hash table

The query performance of hash table is very good, the time complexity is O(1). However, hash tables do not support fast lookup of data by interval.

Jump table

A skip list is composed of multiple indexes on top of a linked list. It supports quickly inserting, finding, and deleting data in order (logn) time. Also, the hop table supports fast lookup of data by interval. All you need to do is locate the node in the list that corresponds to the starting point of the interval, and then from that node through the list to the node that corresponds to the end point of the interval, and the data that you get from the traversal is the data that satisfies the interval value.

Ps: Redis’ ordered set is built by the jump table + hash table

It looks as if a skip table could also satisfy the index requirements. Maybe because there was a B+ tree and then there was a skip list, so that’s why I chose B+, right?

Balanced binary lookup trees

Balanced binary search tree query performance is high, the time complexity is O(logn). Furthermore, an in-order traversal of the tree yields an ordered sequence of data from small to large, but this is still not enough to support a quick lookup of data by interval.

And we need to consider the DISK IO, data index is stored on the disk, when the use of the index, the index will be added to the memory, (memory access speed exceeds the speed of the disk access) when the data is large enough, the index is added to the memory, obviously not realistic.

B tree

Now, if you look at the B tree, B tree is more divided than a balanced binary search tree, so it’s a multi-fork balanced search tree. So B has a lower height and more forks. If the binary search tree is compared to a person, then the person is tall and thin, B tree is short and fat, a true portrayal of hua, pierced the heart…

Ps: The disk reads and writes as many times as the tree is tall. And then there’s the mongodb index that uses B.

B + tree

B tree and B+ tree are actually a kind of tree, first from B tree to B+ tree layer by layer, step by step analysis.

The difference between B trees and B+ trees

B tree is a self – balancing multi – fork search tree. It has the following advantages:

Keep the keys in order to iterate in order
Use hierarchical indexes to minimize disk reads
Use partially filled blocks to speed up inserts and deletions
Maintain index balance through elegant traversal algorithm
Minimize space waste by keeping internal nodes at least half full.
You can handle any number of inserts and deletions.

B+ tree is equivalent to the upgraded version of B tree. Compared with B tree, B+ tree makes full use of node space and makes the query speed more stable. Its speed is completely close to binary search (binary search speed is not explained). B+ has the following advantages on the basis of B tree:

B+ tree has fewer layers: B+ tree stores more key words per non-leaf node, and the tree has fewer layers so the query data is faster;
B+ tree query speed is more stable: B+ all keyword data addresses exist on leaf nodes, so the number of times of each search is the same, so the query speed is more stable than B tree;
B+ tree naturally has sorting function: all leaf node data of B+ tree constitute an ordered linked list, which is more convenient to query data within the size range. Data tightness is high, and the cache hit ratio is higher than that of B tree.
B+ tree traversal of all nodes is faster: B+ tree traversal of the whole tree only requires traversal of all leaf nodes, instead of traversal of each layer as B tree does, which is conducive to full table scan of the database.

conclusion

Because of this nature of B+ trees, this data structure is often used in file systems and database indexes. By expanding the storage number of each node, continuous data can be located and accessed quickly, effectively reducing the search time, improving storage space locality, and reducing I/O operations.

B+ tree

The number of neutron nodes per node cannot exceed M and cannot be less than m/2
An exception is that the number of children of a root node may not exceed m/2
The m-tree only stores indexes, it doesn’t really store data, it’s kind of like a skip table
The leaf nodes are linked together in a linked list to facilitate searching by interval
In general, the root node is stored in memory and the other nodes are stored on disk

Q: Why does Mysql use B+ tree instead of B tree

BB for a long time, the key to.

1. Impact of IO on performance

Each node of the B tree stores data, whereas the B+ tree stores data only on the leaf nodes, so for the same amount of data, the B tree has a higher height and more IO (usually height =IO number).

2. For storage and use

Database indexes are stored on disks. If the amount of data is too large, the entire index cannot be loaded into memory. Instead, each disk page (corresponding to the node of the index tree) can be loaded in fragments.

3. Further optimization of mysql base

The leaf node is a two-way linked list, and the head node and the tail node of the list are also circular pointing.

end

Your likes and attention are the biggest support for me, thank you!