Introduction to the

A B-tree is a balanced multipath lookup tree. Note that a B-tree is a B-tree, and the “-” is a hyphen, not a minus sign. Most self-balancing search trees, such as AVL and red-black trees, assume that all data is stored in main memory.

So why use b-trees (or why have b-trees at all)?

To explain this, let’s assume that we have hundreds of millions of bytes of data that cannot be stored in main memory. We can only read data in chunks from disk. Disk I/O is time-consuming compared to main memory access time, and the main purpose of b-tree is to reduce disk I/O.

Most operations to balance a tree (find, insert, delete, Max, min, and so on) require O(h) disk access operations, where H is the height of the tree. But for b-trees, the height of the tree is not going to be logn, where n is the number of nodes in the tree, but a height that we can control by adjusting the number of keys that the nodes in the b-tree contain. Keep the height of the B-tree a small value). In general, b-tree nodes contain the same number of keys as the disk block size, ranging from a few to thousands of keys. Because the height of b-trees is controllable (generally much less than LOGn), the disk access time of B-trees is greatly reduced compared to AVL and red-black trees.

We’ve talked about how red black trees are better than AVL trees, so let’s compare red black trees to B-trees and explain the first paragraph with an example.

example

Suppose we now have 838,8608 records. For a red-black tree, the height of the tree is h=log(838,8608)=23, which means it would take 23 disk I/O operations to find the leaf node. In b-trees, however, the situation is different. Assuming that each node can contain eight keys (of course, the real situation is not equal, some nodes may contain more keys than eight, some fewer), then the height of the whole tree will be at most 8log8 =7.8) layers. This means that it takes only eight disk accesses to find a key on a leaf node, which is where the B-tree comes in.

B tree properties

  1. All leaf nodes appear on the same layer and carry no information (they can be considered external nodes or failed nodes, but in fact they do not exist and the pointer to them is null).

  2. The number of keywords contained in each node has upper and lower bounds. These bounds are represented by a fixed integer t≥2 t\geq2 t≥2, called the minimum degree of the B-tree, where t depends on the size of the disk block:

    A. Each node except the root node must contain at least t−1 T-1 T −1 keywords. Therefore, every internal node except the root node has T children. If the tree is not empty, the root node has at least one keyword.

    B. Each node can contain at most 2 T −1 2T −1 keywords.

  3. A node containing x x x keywords has x+1 x+1 x+1 child;

  4. All the keywords in a node are arranged in ascending order, and all the keys of the child node between the two keywords k1, K 1,k k1 and K 2,k_2 k2 are within the range of (K 1, K 2) (k_1,k_2) (k1,k2).

  5. Unlike binary sort trees, b-tree searches start at the root and do multiple branch selection based on the child tree of the node. Binary sort trees do two-way branch selection and perform a disk I/O operation for each decision.

  6. Similar to other flat binary trees, the time complexity of search, insert and delete operations in B-trees is O(L O gn) O(logn) O(logn).

The figure above is a typical B-tree, where the minimum degree t=2 t=2 t=2, the root node contains at least one keyword P, every node other than the root node contains at least one t-1 = 1, and each node contains at most 2T-1 = 3 keywords. The root node containing three ones P has 1 + 1 = 2 children, and the node containing three keys (C, G, L) has four children. All the keywords in the same node are arranged in ascending order. For example, the internal nodes of node (D, E, F) are arranged in ascending order and are located between the keywords C and G in their parent node. All the leaves are empty

B- tree lookup

The search operation of a B-tree is very similar to that of a binary sort tree (BST), except that each node in a B-tree contains multiple keywords. Assuming that the keyword to be searched is K, we start at the root and recursively work down. For each accessed non-leaf node, if the node contains the keyword k to be searched, the node pointer is returned; Otherwise, we recurse to the appropriate child of the node (the child is preceded by the keyword greater than k). Returns null if a leaf node is reached and k is not found.

B- tree search operation demonstration

Let’s take finding the keyword F as an example.

Step 1: Access the root node P and find that the keyword F is less than P, then search the left child of node P.

Step 2: access the left child node [C, G, L] of node P. When a node contains multiple keywords, access is carried out sequentially. First, it is compared with keyword C and found to be larger than C. Compared with keyword G, the keyword F is smaller than G, indicating that the keyword to be searched is in the child between keyword C and keyword G.

Step 3: Access the subgeneration between keyword C and keyword G. The subgeneration node contains three keywords [D, E, F], perform sequential traversal, compare keyword D and F, and F is larger than D

Sequential access keyword E, F is larger than E:

Keyword F is searched in sequence. The search succeeds. Returns a pointer to the nodes [D, E, F].

Here we take a look at a definition of a node in a B-tree:

 int keys; // Store an array of keywords
 int t;  T-1 <= num <= 2t-1
 BTreeNode C; // An array of Pointers to the child node
 int n;  // Record the number of keywords in the current node
 bool leaf; // a flag for a leaf, true if it is a leaf, false otherwise
Copy the code

These are the most critical attributes of a node. The complete definition of the node in the B-tree is:

class BTreeNode 
{ 
 int keys; // Store an array of keywords
 int t;  T-1 <= num <= 2t-1
 BTreeNode C; // An array of Pointers to the child node
 int n;  // Record the number of keywords in the current node
 bool leaf; // a flag for a leaf, true if it is a leaf, false otherwise
public: 
 BTreeNode(int _t.bool _leaf);

 // 
 void traverse(a); 

 // Find a keyword
 BTreeNode search(int k); // If none is present, NULL is returned

// Set friends to access private members of the BTreeNode class
friend class BTree; 
}; 

/ / B - tree
class BTree 
{ 
 BTreeNode root; // A pointer to the b-root node
 int t; / / the minimum degree
public: 
 // initializes a tree as an empty tree.
 BTree(int _t) 
 { root = NULL; t = _t; } 

 // Perform an intermediate traversal
 void traverse(a) 
 { if(root ! =NULL) root->traverse(a); }// find a keyword k in the b-tree
 BTreeNode search(int k) 
 { return (root == NULL)? NULL : root->search(k); }};Copy the code

There may be some C++ basics involved, but you don’t have to worry about algorithms, just the most important property definitions of a b-tree node.

B- tree search operation implementation

// The implementation of the b-tree lookup operation
BTreeNode BTreeNode::search(int k) 
{ 
 // Find the first keyword that is greater than or equal to the keyword k
 int i = 0; 
 while (i < n && k > keys[i]) 
  i++; 

 If the first keyword found is equal to k, return the pointer to the node
 if (keys[i] == k) 
  return this; 

 // Return NULL if the key k is not found and the current node is a leaf node
 if (leaf == true) 
  return NULL; 

 // Recursively access the appropriate children
 return C[i]->search(k); 
}
Copy the code

Middle order traversal of a B-tree

Middle-order traversal of a B-tree is also similar to middle-order traversal of a binary tree. We start with the leftmost child, recursively print the leftmost child, and repeat the same process for the rest of the children and keywords. Finally, recursively print the child on the far right.

The middle-order traversal result of this graph is:

** Be sure to note that there should be 26 letters, but the letter I ** is missing, and we can insert it later when we look at the insert operation.

Sequence traversal implementation code

void BTreeNode::traverse() 
{ 
 // There are n keywords and n+1 children
 // Iterate over n keywords and the first n children
 int i; 
 for (i = 0; i < n; i++) 
 { 
  // If the current node is not a leaf node, before printing key[I],
  // start by traversing the subtree root C[I].
  if (leaf == false) 
   C[i]->traverse(); 
  cout << "" << keys[i]; 
 } 

 // Prints a subtree rooted in the last child
 if (leaf == false) 
  C[i]->traverse(); 
} 
Copy the code

B- tree insertion operations

A newly inserted keyword k is always inserted into the leaf node. Similar to the insertion operation in a binary sort tree, we start at the root, traverse down to the leaf, reach the leaf, and insert the keyword K into the corresponding leaf. Different from BST, we define a value range of the number of keywords that can be contained by a node through the minimum. Therefore, when inserting a keyword, we need to confirm whether the node exceeds the maximum number of keywords that can be contained by the node itself.

Does a node have room for the current node before inserting a keyword k?

We can do this using an operation called splitChild(), which splits the children of a node. In the figure below, node Y, the child of X, is split into two nodes, y and Z. The split operation moves a key I I I up, and splits the key I I I up into the node Y containing the keyword [G, H] and the node Z containing the keyword [J, K]. This process, also known as B-tree growth, is distinguished from the downward growth of BST.

To sum up, when inserting a new keyword K into the B-tree, we access from the root node to the leaf node. Before traversing a node, we first check whether the node is full, that is, it contains 2T-1 keywords. If the node is full, we split it and create a new space. The pseudocode for the insert operation is described as follows:

Insert operation pseudocode

  1. Initialize thexAs the root
  2. whenxIf it is not a leaf node, do the following:
    • findxThe next child node to be accessed fromy
    • ifyIf it is not full, the node will beyAs a newx
    • ifyIt’s full. Split ityThe nodexPointer to the nodeyTwo parts of. ifkyIf the middle keyword is small, theyThe first part as newx, otherwise willyThe second part as newxWhen willyAfter splitting, willyTo move a keyword to its parentxIn the middle.
  3. whenxIf it is a leaf node, the second step ends; And since we’ve looked at all the nodes ahead of time,xThere must be at least one extra keyword space for a simple insert.

In fact, b-tree insertion is an active insertion algorithm, because before we insert the new key, we split all the nodes that are already full, and the nice thing about splitting early is that we don’t have to go back and walk through the nodes twice. If we don’t split a full node first, but only when we insert a new keyword, we might end up traversing all the nodes again from the root, like splitting the leaves when we get to the leaves, Moving one of the keywords up causes the parent node to split (because moving up causes the parent node to exceed the number of keywords that can be stored). After the parent node is split, the new keyword continues to move up may cause the new parent node to split, resulting in a large number of backtracking operations. But with b-trees, which are active insertion algorithms, there is no cascade effect. Of course, the downside of this kind of active insertion is obvious, we may do a lot of unnecessary splitting.

Insert operation case



Let’s insert the keyword in the figure aboveITake as an example. Minimum degreet = 2, a node can be stored at most2t - 1 = 3A node.

Step 1: access the root node. It is found that the insert keyword I is less than P, but the root node is not full. It is not split and accesses its first child node directly.



Step 2: Access the nodePThe first child node of the[C, G, L], find the first child node is full, split the first child node into two:

Step 3: Combine the nodesIInsert into nodeLAmong the first left children, foundLHer first left child[H, J, K]If it is full, split it into two.

Step 4: Insert node I into the first child of node J. If it is found that the first child node [H] of L is not full and is a leaf node, insert I directly.

Insert operation code implementation:

The implementation of the B-tree insertion operation is a little more complicated, involving the movement of Pointers inside each node and the corresponding movement of Pointers in the parent node, but I’m sure you can make sense of it by looking at the diagram and the comments in the code.

// insert a new node into the b-tree k main function
void BTree::insert(int k) 
{ 
 // If the tree is empty
 if (root == NULL) 
 { 
  // Allocate space for the root node
  root = new BTreeNode(t, true); 
  root->keys[0] = k; // insert node k
  root->n = 1; // Update the number of keywords on the root node to 1
 } 
 else  
 { 
  // Grow the b-tree when the root is full
  if (root->n == 2*t- 1) 
  { 
   // Allocate space for the new root
   BTreeNode *s = new BTreeNode(t, false); 

   // The old root is the child of the new root
   s->C[0] = root; 

   // Split the old root into two and move a keyword up to the new root
   s->splitChild(0, root); 

   // The new root node has two children
   // Determine which child will have the newly inserted keyword
   int i = 0; 
   if (s->keys[0] < k) 
    i++; 
   s->C[i]->insertNonFull(k); 

   // The new root is changed to s
   root = s; 
  } 
  else // Call insertNonFull() when the root is not full
   root->insertNonFull(k); }}// insert the keyword k into an unfilled node
void BTreeNode::insertNonFull(int k) 
{ 
 // initialize I as the position of the last keyword in the node
 int i = n- 1; 

 // If the current node is a leaf
 if (leaf == true) 
 { 
  // The following loop does two things:
  // a) find the newly inserted keyword location and insert
  // b) move all values greater than the keyword k back one position
  while (i >= 0 && keys[i] > k) 
  { 
   keys[i+1] = keys[i]; 
   i--; 
  } 

  // Insert a new keyword. The number of keywords in the node is increased by 1
  keys[i+1] = k; 
  n = n+1; 
 } 
 else 
 { 
  // find the first child node where keys[I] > k
  while (i >= 0 && keys[i] > k) 
   i--; 

  // Check whether the child node is full
  if (C[i+1]->n == 2*t- 1) 
  { 
   // If it is full, the split operation is performed
   splitChild(i+1, C[i+1]); 

   // Split C[I];
   // C[I] split is called two child node
   // find the node where the new insert keyword should be inserted
   if (keys[i+1] < k) 
    i++; 
  } 
  C[i+1] - >insertNonFull(k); }}// if y is full, split y
void BTreeNode::splitChild(int i, BTreeNode *y) 
{ 
 // create a new node to store t-1 keywords
 BTreeNode *z = new BTreeNode(y->t, y->leaf); 
 z->n = t - 1; 

 // copy the last t-1 keyword of y into z
 for (int j = 0; j < t- 1; j++) 
  z->keys[j] = y->keys[j+t]; 

 // If y is not a leaf, copy the last t children of y into z
 if (y->leaf == false) 
 { 
  for (int j = 0; j < t; j++) 
   z->C[j] = y->C[j+t];  
 } 

 // Set the number of keywords in y to t-1
 // Because it is full, it is 2t-1, and node z contains t -1
 // A keyword needs to be moved up
 // so y contains the keyword 2t-1 - (t-1) -1
 y->n = t - 1; 

 // Allocate new space to pointer to current node,
 // The parent will have one more child because of the new keyword.
 for (int j = n; j >= i+1; j--) 
  C[j+1] = C[j]; 

 // The next child of the current node is set to z
 C[i+1] = z; 

 // Move back any keyword that is larger than the up-moved keyword in the parent node
 // Find the key position of the upnode
 for (int j = n- 1; j >= i; j--) 
  keys[j+1] = keys[j]; 

 // copy the middle keyword of y to its parent
 keys[i] = y->keys[t- 1]; 

 // The number of keywords in the current node is increased by one
 n = n + 1; 
} 
Copy the code

B- tree deletion operation

The deletion operation of b-tree is more complicated than the insertion operation. It is also very simple to delete the keywords in leaf nodes, but if the deletion is of internal nodes, the children of nodes have to be rearranged.

Similar to b-tree insertion, we must ensure that deletion does not violate b-tree characteristics. Just as the number of keywords per node in an insert cannot exceed 2T-1, the number of keywords per node in a delete operation must be at least T-1 (except the root node can contain fewer keywords than T-1).

The next step is to go through all the possible situations in the delete operation.

The initial B-tree is shown here, with the minimum degreet = 3Each node can contain a maximum of five keywords and at least two keywords (excluding the root node).

1. The keyword k to be deleted is in node X, and X is a leaf node. Delete keyword K

Delete keyword F from b-tree

2. The keyword k to be deleted is in node X, and X is an internal node

Case 1: If the first child node Y before the keyword K in node X has at least T keywords, find the precursor node of K in child node Y, delete the keyword K 0 K_0 k0 recursively, and replace the keyword K in node X with k0 k_0 k0

Delete the keyword G in the B-tree, the previous child node Y of G is [D, E, F], containing three keywords, satisfying case 1, the direct precursor of keyword G is key F, delete F, and then replace G with F.

Situation 2: If y contains less than T keywords, check the number of keywords in node Z, the child of keyword K in node X, if the number of keywords in node Z is at least T, Find k 0 K_0 K0 in z, then delete k 0 K_0 K0, and replace the key k with k 0 K_0 K0.

Delete keyword C in the B-tree, the number of keywords in y is 2, less than t = 3, the next child Z of keyword C in node [C, G, L] is [D, E, F], which contains 3 keywords, and the direct successor of keyword C is D, delete D, And then I’m going to replace C with D.

Case 3: If y and z both contain only t-1 keywords, merge keyword K and all keywords in Z into node Y, node X will lose keyword K and child node Z, y now contains 2T-1 keywords, Free the space of node Z and recursively remove the keyword k from node Y.

To illustrate this situation, we will use the following illustration.

Delete keyword C, node Y contains 2 keywords, node Z contains 2 keywords, both equal to t-1 = 2, merge keyword C and node Z into node Y:

At this point, node Y is a leaf node, and keyword C is directly deleted

3. If the keyword k is not currently in the internal node x, determine the root of the subtree that must contain kx.c(i)(if K is indeed in the B-tree). ifx.c(i)Only t-1 keyword must be processed in one of the following ways :(see here confused)

First we need to confirm what is the current internal node x and x.c(I), as shown below, P is now not the root, but the root of a subtree of the complete B-tree:

Case two: Ifx.c(i)x.c(i)If all of the neighboring brothers ofx.c(i)Merge with a sibling, move one of the key words of X to the newly merged node, make it the middle key of the node, and make the merged node the new X node.

Don’t be surprised why case two comes first and case one comes after, because it helps you understand.

In case 2, the icon above indicates the corresponding x and x.c(I). We take deleting keyword D as an example. At this time, the current internal node x does not contain keyword D, which is confirmed to be the third case. The following node of the subtree where the first child of node x resides is x.c(I), that is, [C, L]. Where node [C, L] and its adjacent siblings [T, W] contain only 2 nodes (t-1), then combine [C, L] with [T, W], and merge the only keyword P in node X into the new node. Then, the merged node is taken as the new X node, and the keyword D is recursively deleted. It is found that D is now in the leaf node Y, and it is directly deleted, which is 1. In the case. (Much clearer now)

Case 1:x.c(i)Contains only t-1 keywords andx.c(i)If a sibling of x contains at least T keywords, a keyword of x is moved down tox.c(i),x.c(i)A keyword in the adjacent left or right sibling of is moved up into x, and the corresponding child pointer in that sibling is moved tox.c(i)In thatx.c(i)Add an extra keyword. (Confused)

To get rid of the confusion, let’s continue with the deleted result of case 2 above:

Take node B in node [A, B] as an example. In the figure above, x.c(I) contains two keywords, namely, t-1 keywords; A sibling node [H, J, K] of x.c(I) contains three keywords (meeting the requirement of at least t keywords). Then, the keyword H in sibling nodes [H, J, K] is moved up to x, and the keyword C in x is moved down to x.c(I). Delete keyword B.

At this point, all major b-tree operations are complete, and the complete engineering code for b-tree lookup, traversal, insertion, and deletion is placed at the end of the article.

Comparison of binary tree and B-tree

B- tree is a multi – path balanced tree with ordered results. Unlike binary trees, where a node in a B-tree can have multiple children, a binary tree can only have two children. The height of the B-tree is (where M is the order of the b-tree, that is, the maximum number of keywords a node can contain, and N is the number of nodes). The height is automatically adjusted with each update. Keywords within nodes in a B-tree are arranged in ascending order from left to right. Inserting a node or keyword into a B-tree is also more complicated than a binary tree.

A binary tree is a typical ordinary tree. Unlike b-trees, nodes in binary trees can have up to two children. The topmost root of a binary tree contains only one left and right subtree. Similar to b-tree, middle-order traversal results are ordered, but the pre-order traversal results and post-order traversal results of binary tree can also be ordered, and the insertion and deletion operations of nodes in binary tree are simple.

N B-Tree Binary Tree
1 In a B-tree, one node can contain at mostMChild node In a binary tree, a node can contain at most two children
2 B- tree is an ordered tree whose middle order traversal results in order A binary tree is not a sort tree and can be sorted by traversal before, middle, and back
3 The height of the B-tree is L ogMN log_MN logMN The height of the binary tree is log2N log2N log2N
4 B-tree loads data from disk Binary trees load data from RAM
5 B-tree application to DBMS(code index, etc.) Binary trees are used in Huffman coding, code optimization, etc
6 Inserting a node or keyword into a B-tree is more complicated Binary tree insertion tree is simple

I wish you a happy study, the article may be a little longer, more patience to have a good look! Come on!!

Complete engineering code

Baidu cloud links: link: pan.baidu.com/s/1HecdeeFh… Extraction code: E3BB