preface

So let’s conclude today, B tree, B minus tree, B plus tree, these three trees. For B-tree and B-tree, there are two ways of saying on the Internet. One is that B-tree is a binary search tree, and B-tree is a multi-way search tree. Another way to say it is that a B tree is a B tree, and a B tree is a B tree. After consulting the data, we come to the conclusion that the latter statement is correct.

The following is the definition of B tree in Baidu Encyclopedia:

The way to find a given keyword in a B-tree is to first take the root node, and the keyword K1 contained in the root node… Kn searches for the given keyword (sequential search or binary search). If the keyword equal to the given value is found, the search succeeds; Otherwise, it must be sure that the keyword to be searched is between Ki and Ki+1, and Pi is the pointer pointing to the child root node. In this case, the node pointed to by pointer Pi is taken to continue the search until it is found, or the search fails when pointer Pi is null.

Let’s look at the definition of B-tree in Baidu Encyclopedia:

In computer science, a B-tree is a self-balancing tree that keeps data in order. This data structure allows data lookup, sequential access, insertion, and deletion to occur in logarithmic time. A b-tree, in general, is a general binary search tree that can have more than two children. Unlike the self-balancing binary search tree, the B-tree is optimized for reading and writing large chunks of the system data. B trees speed up access by reducing the intermediate process of locating records. B trees are data structures that can be used to describe external storage. This data structure is often used in the implementation of databases and file systems.

In summary, B-tree is another name for B-tree, and the two refer to one tree.

Before introducing BTree, let’s introduce the familiar binary search tree.

  • If the left subtree of any node is not empty, then the value of all nodes in the left subtree is not greater than the value of its root
  • If the right subtree of any node is not empty, then the value of all nodes in the right subtree is not less than the value of its root.
  • The left and right subtrees of any node are also binary search trees.

The binary search tree will have an extreme case, as shown in the figure below on the right.

Obviously, the search performance on the right is linear, so using a binary search tree is as balanced as possible to keep the structure of the tree on the left.

The actual binary search tree is based on the original binary search tree plus the balance algorithm, that is, “balanced binary tree”.

BTree

BTree is a balanced multi-tree.

To understand the properties of btrees, let’s understand two concepts:

  • Degree: In a tree, the number of children of each node is called the degree of that node
  • Order: The maximum number of children of a node

A b-tree of order M is a balanced M-way search tree with the following properties:

  • The root node has at least two children
  • Chrysene M /2 gp – 1 <= j <= m-1;
  • Chrysene m/2 nodes <= k <= m;
  • All the leaves are at the same level

The following figure shows a BTree of order 3

To simulate the process of finding a single element 17:

  • According to the root node pointer, find the root disk block 1 of the file directory, and import the information in it to the memory [1 disk I/O].
  • At this point there are two files in memory, named 13 and 31, and three that store the data of the other disk page addresses. According to the algorithm, 13<17<31, so find the second pointer stored in the root node
  • According to the second pointer, locate the disk block 3 and import the information into memory [2 disk I/O].
    • At this point in memory, there is a file named 18, and two data store the other disk page address, according to the algorithm, 17<18, so find the first pointer to the node with element 18
  • According to the first pointer, locate the disk block 7 and import the information into memory [3 disk I/O].
  • At this time, there are two files in memory, named 14 and 17. We find file 17 according to the algorithm, and locate the disk address of the file memory

To simulate the process of finding elements 17-35 in the range:

  • First, look top down for the lower limit of the range, element 17
  • To element 18
  • To element 23
  • To element 31
  • To element 35, and the traversal is complete

Above described the BTree data structure query single element and range query element process.

The number of nodes generated by the same number of elements in a BTree is much smaller than the number of nodes in a binary tree. The difference in the number of nodes is equivalent to the number of disk I/ OS.

B+Tree

B+Tree is also a multipath search Tree, which is usually used in databases and file systems of operating systems. It is characterized by its ability to keep data stable and orderly, and its insertion and modification have relatively stable time complexity.

For a B+Tree of order M, it has the following properties:

  • Each node has at most M children
  • Every node except the root node must have at least [m/2] children, and the root node must have at least two children
  • A node with K children must have K keywords

It can be seen from the figure that compared with BTree, all data of B+Tree have leaf nodes, and the leaf nodes form a linked list.

To simulate the process of B+Tree searching for a single element 17:

Let’s look for element 17 again. It traverses the same way as BTree, but it is more efficient than BTree because B+Tree has no data elements, so the same disk page can hold more node elements. This means that the B+Tree structure is shorter and fatter than the BTree structure in the same situation, so there are fewer I/O queries.

To simulate the process of finding elements 17-35 in the B+Tree range

  • From top down, find element 17 at the lower limit of the range
  • Iterates through the list pointer to element 18
  • Iterating through the list pointer to element 23
  • Iterates through the list pointer to element 31
  • Iterates through the list pointer to element 35

The range search of B+Tree can be directly searched through the linked list composed of leaf nodes, and the efficiency is much higher than that of BTree.

BTree and B + Tree

  • With the same amount of data, B+Tree provides better query performance and fewer I/O operations than BTree
  • The search performance of BTree is unstable. In the best case, only the root node is searched, and in the worst case, the leaf node is found. The search for B+Tree will always find the leaf node, so each search is stable
  • In the range query, the query performance of BTree is much lower than that of B+Tree, because BTree does the in-order traversal of the Tree after finding the lower limit of the range. After finding the lower limit of the range, B+Tree directly finds the elements in the range through the linked list pointer of the leaf node.