It’s time to brag with friends every week. As a rookie, there are a lot of things to learn every day. Today, I want to share with friends an article I saw before.
1. Differences between B trees and B+ trees
B trees and B+ trees are both balanced trees (not balanced binary trees, but balanced multipath lookup trees). A B-tree is also called a B-tree. Each node of a B-tree contains at most K child nodes. K is the order of the B-tree, which is determined by the size of the disk page. A B-tree is a self-balancing tree. After a node is added or deleted, it is balanced according to the rules of the B-tree. Can have a look at this article, figure, can turn a boring thing, very interesting draw, www.jianshu.com/p/8b653423c…
A B+ tree, like a B tree, has the same principle and the same storage structure, except that the non-leaf nodes of a B+ tree don’t store data, they just store indexes of leaf nodes, and all data is stored on leaf nodes. What’s the good of that? There are mainly the following nodes:
-
B+ tree has better disk read capability and better I/O performance
B + tree of all data is stored in a leaf node, not a leaf node only store index data and a pointer, so compare B tree each node to store data, hold more index, index pointer obviously can be stored in disk pages under the condition of the same size, each a leaf node B + tree can hold more index, You can also find the data you need to find faster.
-
B+ tree query efficiency is more robust
B + tree of all data, stored in a leaf node, query any data, all need to fall on a leaf node, compared to B tree data on each node, the performance of B + tree is more stable, and more robust, may in some scenarios below, B tree is better than high performance, but a large number of data samples, B+ trees are more stable than B trees.
-
B+ trees have better full table search capability
All data of a B+ tree is stored on leaf nodes. In a full table scan, only all leaf nodes need to be scanned. Data of a B tree is also stored on non-leaf nodes.
2. Advantages of B+ trees
The main advantage of B+ trees, as mentioned above, is that each non-leaf node only stores index data, so it can store more indexes, reduce the height of the tree, and reduce the number of I/OS.
3, why B+ tree index at ten million level, relatively high efficiency
Let’s start with a concept:
- Page: InnoDB stores and exchanges data in pages. Each page is 16K in size
- Pointers: Mysql Pointers are typically 6 bytes in size
As shown in the figure above, this is an innoDB B+ tree index. The non-leaf nodes of the B+ tree only store indexes and Pointers, and only leaf nodes can store actual data.
Let’s assume that a B+ tree is 2 in height, each index size of mysql is 8 bytes, pointer size is 6 bytes, and the data size of a row is 1K. Number of data that can be stored = number of Pointers to the root node * number of rows recorded on a single leaf node.
A B+ tree of height 2 has 16K Pointers /14 bytes, which is about 1170 Pointers. Each 16K page can hold 16 lines of records, so it can eventually hold 1170*16 = 18720. If you follow this theory and expand to a B+ tree of height 3, It can store 1170 * 1170 * 16 =21902400 records. If it is extended to 4 layers, it becomes 1170 * 1170 * 1170 * 16, which is about 250 million data
When height is 3, each data is retrieved 3 times, and only 3 disk dispatches are needed to get the data you want. Although B+ trees of height 3 and 4 only increase the height by 1, the stored data missing is increased by more than 1000 times. Considering the efficiency of retrieval and the performance of disk IO, mysql performs better at ten million level.
4. Some thinking
The size of the index will affect the number of Pointers stored on each node. If an index is very large, the number of Pointers will be smaller, and the height of the B+ tree may be higher, which will inevitably affect the efficiency of query and IO. Therefore, in many materials or specifications, it is recommended that Mysql index length not set too long, and at the same time, it is recommended to use fixed length index and numerical index, try not to use VARCHAR as the primary key index, the reason should also affect the number and size of Pointers to compare efficiency.
5. Conclusion
I feel that my friends can finish reading this article and gain the support of my friends, which will be my biggest motivation and the biggest motivation to continue to learn and move forward