preface

A new series, “Easy to Understand Data Structures and Algorithms by Looking at Pictures”, is launched, mainly using pictures to describe common data structures and algorithms, easy to read and understand. This series includes dozens of examples of heaps, queues, lists, trees, graphs, sorts, and more.

B tree

B tree is the balanced search tree, generally understood as the balanced multi-path search tree, also known as B- tree, B_ tree. It is a self-balanced tree data structure that can search, insert, and delete stored data in order (log n) time complexity. B trees are usually used in storage systems, such as databases and file systems.

B tree characteristics

  • A B-tree can define an M-value as a predetermined range, that is, an M-way (order) B-tree.
  • Each node has a maximum of M children.
  • Each node has at least CeiL (m/2) children, except for root and leaf nodes.
  • For the root node, the range of the number of subtrees is [2,m], and the range of the number of values in the node is [1,m-1].
  • For non-root nodes, the number of values within the node ranges from [Ceil (m/2)-1,m-1].
  • The root node (non-leaf node) has at least two children.
  • A non-leaf node with K children contains K-1 values.
  • All leaf nodes are in the same layer.
  • The values inside the nodes are arranged in ascending order.
  • Several values of the parent node are divided into multiple subtrees as separated values. The left subtree is less than the corresponding separated value, and the corresponding separated value is less than the right subtree.

Here is a fourth order B tree,

insert

Suppose we now build A fourth-order B-tree and start inserting “A” directly as the root node,

Insert “B”, greater than “A”, to the right,

Insert “C” in order to the end,

Continue to insert “D”, and the result is directly added as shown in the following figure. At this time, the storage capacity of nodes is exceeded. For fourth-order B-tree, each node stores a maximum of three values.

The splitting operation is to select the median value of the node to be split, which is “B” in this case, and then place the median value of “B” in the parent node. Since there is no parent node, a new parent node is created to store “B”. The values smaller than “B” are used as the left subtree, and those values larger than “B” are used as the right subtree.

Continue to insert “E”, “E” is greater than “B”, go to the right child node,

Compare “C” and “D”, greater than them, put them on the far right,

Insert “F”, “F” is greater than “B”, go to the right subtree,

“F” is compared with “C”, “D” and “E” respectively, greater than them, and placed on the far right, which triggers the split operation,

Select the median value “D” of the node to be split, and then place the median value “D” in the parent node. The “B” in the parent node is less than “D”, so it is placed to the right of “B”, and the values less than “D” are used as the left subtree, and those values greater than “D” are used as the right subtree.

Continue to insert “M”, the result is,

Insert “L ‘, greater than” B “, “D”, go right subtree,

“L” is greater than “E” and “F” is less than “M”, so I put it in a third position, and that triggers the split operation,

Select the median value “F” of the node to be split, and then put the median value “F” into the parent node. Both “B” and “D” in the parent node are less than “F”, so put it to the right, and those values less than “F” are used as the left subtree, and those values greater than “F” are used as the right subtree.

Insert “K”, the result is,

Insert “J”, greater than “B” “D” “F”, go to the right subtree,

“J” is less than “K”, “L” and “M”, so it is put in the first position, and then the split operation is triggered,

Select the median value “K” of the node to be split, and then put the median value “K” into the parent node. The “B”, “D” and “F” in the parent node are all less than “K”, so put them into the right, and the values less than “K” are used as the left subtree, and those values greater than “K” are used as the right subtree. The parent node also triggers the split operation,

Select the median value “D” of the node to be split, and place the median value “D” in the parent node. Since there is no parent node, create a new parent node to store “D”. The values smaller than “D” are used as the left subtree, and those values larger than “D” are used as the right subtree.

Insert “I”, greater than “D”, go right subtree,

The right subtree is not a leaf, and as I go down, “I” is greater than “F” and less than “K”, so on the second branch,

“I” is less than “J”, so I put it on the left,

Similarly, insert “H”, the result is as follows,

Insert “G” to the left subtree,

It branches in the middle,

Trigger the split operation,

Select the median value “H” of the node to be split, and then put the median value “H” in the parent node. “H” is greater than “F” in the parent node but less than “K”, so put it in the middle, and the values less than “H” are used as the left subtree, and those values greater than “H” are used as the right subtree.

In summary, the core of the insert operation is the split operation. The case without splitting is relatively simple and can be inserted directly; If the capacity of the node exceeds that of the parent node, which can be customized, you need to split the node. Note that the parent node may need to be split again.

To find the

B tree search is relatively simple, the search process is similar to binary search tree, starting from the root node search, according to the comparison value to find the corresponding branch, continue to search the sub-tree.

If I look for “I”, “I” is greater than “D”, go to the right subtree,

“I” is greater than “F” and “H” but less than “K” when compared with the internal value of the node respectively, and goes to the third branch,

Compare the values in each node to find “I”.

————- Recommended reading ————

Summary of my open Source projects (Machine & Deep Learning, NLP, Network IO, AIML, mysql protocol, Chatbot)

Why to write “Analysis of Tomcat Kernel Design”

My 2017 article summary – Machine learning

My 2017 article summary – Java and Middleware

My 2017 article summary – Deep learning

My 2017 article summary — JDK source code article

My 2017 article summary – Natural Language Processing

My 2017 Article Round-up — Java Concurrent Article


Talk to me, ask me questions:

Welcome to: