Index Data Structure of B-tree and B+Tree

Scan the qr code below or search the wechat public account, you can follow the wechat public account, read more Spring source analysis, Java concurrent programming and Netty source code series articles.

The tree

Tree is a very common data structure, according to the number of child nodes, we can divide the tree into binary tree and multi-fork tree. A tree with at most two children of each node is called a binary tree. Typical binary trees include binary search tree, complete binary tree, full binary tree, binary balance tree, red black tree and so on. Trees with more than 2 children are called multi-fork trees. Common multi-fork trees include B trees and B+ trees.

B tree and B+ tree is a kind of multi-path search tree, which is evolved from binary search tree and is often used in the index structure of database. Besides, B+ tree and B tree have many similarities and are easy to be confused. Therefore, this paper puts them together for study and comparison.

B-Tree

B+Tree B+Tree B+Tree B+Tree B+Tree B+Tree B+Tree B+Tree = B+Tree = B+Tree = B+Tree = B+Tree = B+Tree = B+Tree = B+Tree

For tree data structure, there is a concept called degree (also called order) to describe the tree structure, which describes the number of neutron nodes of a node. For example, a binary tree, each node has at most 2 children, so the degree (order) of the binary tree is 2. For B-tree, there is also the concept of order. For example, a b-tree of order 5 means that each node has at most 5 children.

For a B-tree of order M, it has the following properties:

Each node has at most m children;
Every non-leaf node (except root node) contains at least M /2 children.
If the root is not a leaf, the root has at least two children;
For a non-leaf node, it can store m-1 keywords at most (the so-called keywords, we can understand as the data stored on the node);
On each node, all the keywords are in order, from left to right, from small to large;
The value of the left subtree of each keyword is smaller than the current keyword, and the value of the right subtree is larger than the current keyword.
Each node has an index and data (it is important to remember that this is one of the most important differences from B+ Trees described later).

From the above properties, for the root of b-tree, the number of keywords ranges from 1<= k <= m-1; Non-root, the keyword range is m/2 <= k <= m-1. With these properties in mind, let’s take a look at the insert, find, and delete processes of a B-tree.

1. Insert the

When inserting data into a B-tree of order M, in order to ensure the properties of the above B-tree, we need to insert keywords (data) according to the following rules: After inserting keywords into the current node, determine whether the number of keywords in the current node is less than or equal to M-1. If so, the insertion ends. Otherwise, you need to split the current node. How do you split it? Split at m/2 to form the left and right parts, that is, two new children, and then move the keywords at m/2 into the parent (split from the middle).

For a b-tree of order 5 (that is, non-root node, the number of keywords in the range of 2 <= k <= 4), we insert the following data into the Tree: 50,30,40,25. The process is as follows. First, insert 50, 30, 40 and 25 in sequence, and the node states are as follows.

Then, when we insert 20 more, 5 keywords are stored in the current node. Since the current tree is a fifth-order tree, each node can only store 4 keywords at most, so the current node needs to be split. How do you split it? It splits the node from the middle (m/2) into the left and right (rounded up to 3 by 5/2), so it splits the node from where the number 30 is, and then it puts the number 30 into the parent (since the parent is empty, it creates a new node and puts 30 into that node), Then, the two numbers 20 and 25 on the left of 30 form a new node, and the two numbers 40 and 50 on the right form a new node, which respectively point to the left and right sides of keyword 30. The schematic diagram is as follows:

Continue inserting data 15, 10, 13 into the tree. When we insert 13, we find that the number of keywords in the node exceeds M-1 again, so we need to split the node again. Now we split off the middle number 15, put it in the parent node, and the left and right parts form the new node.

Continue to insert data into the tree once more: 18, 60, 55, 45, 26, 17, 8, 3, 5. The structure of the B-tree is shown in the following figure.

If you want to experience the dynamic process of b-tree data insertion, you can go to the following learning website to manually insert data and experience the dynamic process of data insertion. (website: www.cs.usfca.edu/~galles/vis… A very good learning data structure and algorithm website)

2. Look for

The search operation of a B-tree is relatively simple, similar to that of a binary search Tree.

The search starts from the root node, traverses the keywords of the root node in turn, and finds the first keyword that is not less than the data to be searched;
Determine whether the data to be searched is equal to the current keyword, if so, return the data;
If not, it indicates whether the data to be searched is smaller than the current keyword. Therefore, enter the left subtree of the current keyword to search, and the search process is similar to that of the root node. Repeat the above steps.

Using the b-tree data above as an example, find the number 10. The search starts from the root node, and the first keyword not less than 10 is 20. Since the data 10 to be searched is not equal to keyword 20, the search enters the left subtree of keyword 20. At this time, the pointer points to the node where keyword 8 and keyword 15 are located. Therefore, the left subtree search of keyword 15 is entered. At this time, the pointer points to the node where keyword 10 and keyword 13 are located. It is found that keyword 10 in this node is equal to the data to be searched. (As shown in the figure below, red represents the path in the search process)

If you want the number 60, you do the same thing, you start at the root, and you find that all the keywords in the root are less than 60, so you go to the right subtree of the last keyword in the root, that is, the right subtree of keyword 20. Similarly, it is found that all the keywords in the node where keywords 30 and 50 are located are less than 60. Therefore, it enters the right subtree of the last keyword 50 of the current node to search, and finally finds 60 and returns data. As shown in the figure below, red represents the path during the search)

3. Delete

In the process of data insertion of b-tree, nodes will be split in order to meet the properties of B-tree. Similarly, in the process of data deletion, it may be possible to delete a keyword and not meet the properties of B-tree. Therefore, nodes will be merged in the deletion process. The deletion process is relatively complex, but in general, it can be boiled down to the following three scenarios.

If it is a leaf node and the number of keywords in the leaf node is not less than M /2 after the keyword is deleted, delete the keyword directly.
If it is a leaf node, the number of keywords in the leaf node is less than M /2 after the keyword is deleted. In this case, the property of b-tree is not satisfied, so you need to borrow the keyword from the brother node. If the number of keywords in the sibling is greater than m/2, then we can borrow, first move the parent node to the current node, and then move one of the keywords in the sibling to the parent node; If the number of keywords in a sibling node is less than or equal to m/2, and if the sibling node lends a keyword, then it has less than m/2 of its own keywords, which is not in line with the b-tree property, so it cannot borrow at this time, and then it needs to delete the keyword to be deleted, and then move the parent node here. The current node is then merged with its siblings.
If the keyword is deleted from a non-leaf node, delete the current keyword first, then use the smallest keyword in the right subtree to fill the current position, and then delete the newly added keyword from the right subtree. The deletion operation is the deletion operation of the B-tree. (The smallest keyword in the right subtree must be in the leaf node, so the deletion process is to delete the keyword in the leaf node, that is, the flow of scene 1 and scene 2).

The following describes the deletion process in the preceding three scenarios. The following data is used as an example.

Delete keyword 29 from b-tree. Since the node where keyword 29 resides is a leaf node, the number of keywords in the current node is 3 after it is deleted, that is to say, the number of remaining keywords to be deleted is not less than M /2 (5/2=2), which meets the scenario 1 mentioned above. Then the keyword can be deleted directly.

Delete keyword 55 from b-tree, because the node where keyword 55 resides is a leaf node, after deleting 55, the number of remaining keywords in the current node is 1, less than m/2, so you need to borrow the keyword from the sibling node. There are four keywords (40, 45, 47, 49) in the siblings of the current node, which are greater than m/2, so the keywords can be lent, which conforms to the first case of scenario 2. Therefore, delete keyword 55 first, then move keyword 50 from the parent node to the current node, and then move keyword 49 from the sibling node to the parent node, as shown in the diagram below.

Delete keyword 17 from b-tree. Because the node where keyword 17 resides is a leaf node, the number of remaining keywords in the current node is 1, less than m/2, so you need to borrow the keyword from the sibling node. There are two keywords (10 and 13) in the sibling of the current node, which are less than m/2, so the keyword cannot be lent, which conforms to the second case of scenario 2. Therefore, delete keyword 17 first, then move keyword 15 from the parent node to the current node, and then merge the current node with its sibling node (the node where keywords 10 and 13 reside). The schematic diagram is as follows.

Then we find that when we delete keyword 17, we remove a keyword from the parent node (the node where keyword 8 resides), and its parent node only has one keyword left. The parent node does not conform to the properties of b-tree, so we need to continue operation. Let the parent borrow the keyword from its sibling. The parent has no sibling on the left, so it borrows the keyword from the sibling on the right (key 30 and key 49). It turns out that the number of keywords in the right sibling node is not more than m/2. If the sibling node borrows the keyword, it does not meet the property again, so it meets the second case of scenario 2 mentioned above, so it needs to merge the nodes. So the next operation is: move the parent of keyword 8 to the node where keyword 8 is located, and then merge keyword 8 with the node where keyword 30 and 49 are located. The schematic diagram is as follows:

Then delete keyword 20 from b-tree. The node 20 is a non-leaf node, so it meets scenario 3. Therefore, delete keyword 20 directly, and then take out the smallest keyword 25 from the right subtree of 20 and fill it in the position of 20. Finally, delete keyword 25 from the node of the right subtree. The deletion process of keyword 25 can correspond to the deletion scene of the above leaf node. The schematic diagram is as follows:

The above is the process of deleting keywords from b-tree. Compared with the process of inserting and searching, the process of deleting keywords is more complicated. Therefore, it is best to go to this visual website to learn. (Open it on PC at www.cs.usfca.edu/~galles/vis… In this visual website, after removing the leaf node key word, take is the largest of the left subtree of the keyword filling, and this article explain said is take the smallest keywords in the right subtree populated, is essentially no difference, both are in order to meet the nature of the b-tree, and ensure that all the key words on each node is ordered. In the process of learning data structures and algorithms, there is no need to obsess over details, but the important thing is to learn thinking.

conclusion

Summarize the moment. This paper mainly explains the b-tree related properties, combined with the schematic diagram of the detailed introduction of the process of insertion, search, delete. In this example, I deliberately did not add duplicate data to the B-tree. What if I insert duplicate data into the B-tree? If there’s a duplicate number, all we have to do at the time of insertion is decide whether to put the duplicate number in the left subtree or the right subtree, depending on which side we want to put it on. When searching for data, you can’t stop searching immediately after finding a data that meets the requirements. You also need to continue searching until the first data that does not meet the requirements. Similarly, when deleting, you also need to delete all data.

Space is limited, so the next blog post will cover B+Tree and database indexes.

Redo log — How MySQL data does not lose when it is down

Index Data Structure of B-tree and B+Tree

The tree

B-Tree

1. Insert the

2. Look for

3. Delete

conclusion

related

Related Posts

Python Novice crawler 6: Sending messages using the SMS interface

Python real case, Pyecharts module, Python to achieve 5G data visualization

Overview of the Spring Framework and getting started