A: background

A B-tree (or B-tree) is a self-balancing Tree that keeps data in order. This data structure enables the search, sequential access, insertion, and deletion of data to be performed in logarithmic time O(l O gn) “role=”presentation”> O(logn). Unlike other generalized binary search trees, it can have more than two children.

B-tree is optimized for reading and writing large chunks of data in the system, reducing the intermediate process of locating records, thus speeding up access. This data structure can be used to describe external storage, so it is often used to implement databases and file systems.

A B-tree has the following properties:

  1. All the leaves are in the same layer;
  2. Each B-tree has a Minimum Degree, called T (the value of t depends on the size of the disk block);
  3. Every node except the root node contains at least T-1 keys. The root node can contain only 1 key.
  4. Each node (including the root node) contains a maximum of 2T-1 keys;
  5. The number of children of a node is equal to the number of keys of the node + 1;
  6. The keys of each node are arranged in ascending order;
  7. For each key, all keys of the child to the left are smaller than it, and all keys of the child to the right are larger than it.

Here is a b-tree with a Minimum Degree of 3:

Two: algorithm process and analysis

First, take a look at the outline of the program:

struct Node { bool leaf; Int n; Int * keys; // Save keys Node ** siblings; // Save child pointer Node(int t, bool leaf) {this->leaf = leaf; this->n = 0; this->keys = new int[2 * t - 1]; this->siblings = new Node*[2 * t]{ 0 }; }}; class BTree { private: int t; // Minimum Degree Node * root; private: void destroy(Node * node); void split_child(Node * parent, int i, Node * child); void insert_non_full(Node * node, int key); bool find_real(Node * node, int key); void merge(Node * node, int i); void erase_real(Node * node, int key); public: BTree(int t); ~BTree(); void insert(int key); bool find(int key); void erase(int key); void print(); };Copy the code

Note that the default Minimum Degree = 3.

2.1 Insert Operations

Let’s take an example to illustrate exactly how insertion is done.

(1)

For an empty tree, just insert it.

(2)

Insert 20,30,40, and 50, at which point the number of keys reaches the upper limit 5 (2t-1 = 6-1 = 5).

(3)

Insert another 60, and the root will have more keys than the upper limit, so what we do is we split it in two by moving the middle key up, applying for a new node, and moving the last T-1 keys into the new node. See figure 4 on the left;

And then I plug in 60. See figure 4 on the right.

(4)

Insert 70 and 80. Now the number of keys in the rightmost child of the root node has reached the upper limit.

(5)

Insert 90, move the middle key (60) to the top, apply a new node to store the last T-1 keys, and insert 90.

The code for the insert operation is as follows:

Void BTree::split_child(Node * parent, int I, Node * child) {Node * z = new Node(t, child->leaf); z->n = t - 1; for (int j = 0; j < t - 1; j++) z->keys[j] = child->keys[j + t]; if (! child->leaf) { for (int j = 0; j < t; j++) z->siblings[j] = child->siblings[j + t]; } child->n = t - 1; for (int j = parent->n; j >= i + 1; j--) parent->siblings[j + 1] = parent->siblings[j]; parent->siblings[i + 1] = z; for (int j = parent->n - 1; j >= i; j--) parent->keys[j + 1] = parent->keys[j]; parent->keys[i] = child->keys[t - 1]; parent->n++; } void BTree::insert_non_full(Node * node, int key) { int i = node->n - 1; if (node->leaf) { while (i >= 0 && node->keys[i] > key) { node->keys[i + 1] = node->keys[i]; i--; } node->keys[i + 1] = key; node->n++; } else { while (i >= 0 && node->keys[i] > key) i--; if (node->siblings[i + 1]->n == 2 * t - 1) { split_child(node, i + 1, node->siblings[i + 1]); if (node->keys[i + 1] < key) i++; } insert_non_full(node->siblings[i + 1], key); } } void BTree::insert(int key) { if (! root) { root = new Node(t, true); root->keys[0] = key; root->n = 1; } else { if (root->n == 2 * t - 1) { Node * s = new Node(t, false); s->siblings[0] = root; split_child(s, 0, root); int i = 0; if (s->keys[0] < key) i++; insert_non_full(s->siblings[i], key); root = s; } else insert_non_full(root, key); }}Copy the code

2.2 Search Operations

The search is simple, just look at the code.

bool BTree::find_real(Node * node, int key)
{
    int i = 0;
    while (i < node->n && node->keys[i] < key)
        i++;

    if (node->keys[i] == key)
        return true;

    if (node->leaf)
        return false;

    return find_real(node->siblings[i], key);
}

bool BTree::find(int key)
{
    if (root)
        return find_real(root, key);

    return false;
}
Copy the code

2.3 Deleting a vm

Let’s say the value we want to delete is key, which exists in the current B-tree and is located at node T.

  • T is the leaf.
    • In this case, you can simply delete it.
  • T is not a leaf. There are three cases, we agree that the child to the left of the key is left, and the child to the right is right,
    • If the number of keys left is greater than t-1, find the “precursors” of the left key, replace the key with “precursors”, and continue to delete the “precursors”.
    • If the number of keys in the right key is greater than t-1, select the successor of the key from the right key.
    • If the number of keys left and right is equal to t-1, merge left, key and right into one node, and continue to delete keys from this node.

If t is a leaf node (not a root node) and the number of keys is exactly equal to t-1, then if we delete the key directly, it will definitely make the b-tree violate property 3: Every node except the root node contains at least t-1 keys. The root node can contain only 1 key. So how do you solve this problem?

Deletion is accompanied by a search, which starts at the root node and moves the pointer to the right or down to find the key’s location. We can determine in advance which node we want to drop (let’s say cur), and if cur has the same number of keys as T -1, we can fill it so that it has more keys than T -1. If there are keys greater than t-1 in cur’s left and right siblings, take a key from there. If they’re both equal to t minus 1, merge the left and right siblings with the key that’s sandwiched between them.

Ok, here are some pictures to illustrate the above steps and practices:

(1) In Figure 1, delete 2.

Starting from the root node A, the node to be dropped is B, and it is found that the number of keys of B is T-1, while the number of keys of its right sibling node C is only T-1. After the merger, see Figure 7 below.

The key of D is t-1, and the key of e is larger than t-1. Then we take a key from E: put 3 into D and 4 into B. At this point, the keys number of node B remains unchanged, D becomes 3, and E loses one, but it is still greater than or equal to t-1. After deleting 2, see Figure 8 below.

If the node to drop has keys equal to T-1, we populate it (either by taking one from its sibling, or simply by merging it) so that it has more keys than T-1. This is useful, and it must be done.

Why “must”?

Look at Figure 1. Assuming that we do not perform the fill operation on the way down, now if we delete 13 in node G, the problem occurs. The number of keys in node G will be less than t-1, which violates b-tree property 3. So how do you solve this problem? Take one of the siblings of H, right? No, h has t minus one keys. Merger? The so-called merge is to combine G, 15, and H into one node, but the number of keys of C is also t-1, so it doesn’t work either. Therefore, it is necessary to fill the node where the number of keys to drop is equal to T-1.

How does that make it “useful”?

After “delete 2” is observed, the change from Figure 1 to Figure 8 shows that the height of the whole B-tree decreases, which means the improvement of search efficiency. The analysis found that it was reduced because of the merge operation. This is the only way to reduce the height of the b-tree: the root node has one key, and the left and right children have t-1 keys, combined into a node with 2T-1 keys.

(2) In Figure 8, delete 4.

4 is in node B, and B is not a leaf. Observe the keys number of the left and right child nodes (D and E) of 4, and find that the keys number of the right child node E is larger than t-1. Therefore, find the successor of 4 in E (i.e. 5), replace 4 with 5, and then delete 5 in E. After deletion, see Figure 9 below.

(3) In Figure 9, delete 5.

5 is in node B, and B is not a leaf. Observe the keys number of the left and right child nodes (D and E) of 5, which are exactly equal to t-1, then merge the nodes D, 5 and E into one node, and delete 5 from this node. After deletion, see Figure 10 below.

Finally, take a look at the code implementation of the delete operation:

void BTree::merge(Node * node, int i) { Node * cur = node->siblings[i]; Node * next = node->siblings[i + 1]; cur->keys[t - 1] = node->keys[i]; for (int j = 0; j < next->n; j++) cur->keys[j + t] = next->keys[j]; if (! cur->leaf) { for (int j = 0; j <= next->n; j++) cur->siblings[j + t] = next->siblings[j]; } for (int j = i + 1; j < node->n; j++) node->keys[j - 1] = node->keys[j]; for (int j = i + 2; j <= node->n; j++) node->siblings[j - 1] = node->siblings[j]; cur->n += next->n + 1; node->n--; delete next; } void BTree::erase_real(Node * node, int key) { int i = 0; While (I < node->n && node->keys[I] < key) // find the first place where key is greater than or equal to I ++; If (I < node->n && node->keys[I] == key) {if (node->leaf) {for (int j = I + 1; j < node->n; j++) node->keys[j - 1] = node->keys[j]; node->n--; } else // If (node->siblings[I]->n > t-1) // If (node->siblings[I]->n > t-1) // if (node->siblings[I]->n > t-1) {Node * left = Node ->siblings[I]; while (! left->leaf) left = left->siblings[left->n - 1]; int precursor_key = left->keys[left->n - 1]; node->keys[i] = precursor_key; erase_real(node->siblings[i], precursor_key); } else if (node->siblings[I + 1]->n > t-1) // If (node->siblings[I + 1]->n > t-1) {Node * right = Node ->siblings[I + 1]; while (! right->leaf) right = right->siblings[0]; int successor_key = right->keys[0]; node->keys[i] = successor_key; erase_real(node->siblings[i + 1], successor_key); } else {merge(node, I); merge(node, I); erase_real(node->siblings[i], key); If (node->leaf) {if (node->leaf) {if (node->leaf) {if (node->leaf) {if (node->leaf); bool flag = (i == node->n) ? true : false; If (node->siblings[I]->n == t-1) // If (node->siblings[I]->n == t-1) = 0 && node->siblings[i-1]->n > t-1) // The number of keys is greater than t-1 {node * cur = node->siblings[I]; Node * pre = node->siblings[i - 1]; for (int j = cur->n - 1; j >= 0; j--) cur->keys[j + 1] = cur->keys[j]; if (! cur->leaf) { for (int j = cur->n; j >= 0; j--) cur->siblings[j + 1] = cur->siblings[j]; cur->siblings[0] = pre->siblings[pre->n]; } cur->keys[0] = node->keys[i - 1]; node->keys[i - 1] = pre->keys[pre->n - 1]; cur->n++; pre->n--; } else if (i ! = node->siblings[I + 1]-> siblings[I]; = node->siblings[I]; Node * next = node->siblings[i + 1]; for (int j = 1; j < next->n; j++) next->keys[j - 1] = next->keys[j]; if (! next->leaf) { for (int j = 1; j < next->n; j++) next->siblings[j - 1] = next->siblings[j]; cur->siblings[cur->n + 1] = next->siblings[0]; } cur->keys[cur->n] = node->keys[i]; node->keys[i] = next->keys[0]; cur->n++; next->n--; } else // If the keys of the left and right siblings are equal to t-1, then merge them {if (I! = node->n) merge(node, i); else merge(node, i - 1); } } if (flag && i > node->n) erase_real(node->siblings[i - 1], key); else erase_real(node->siblings[i], key); } } void BTree::erase(int key) { if (! root) return; erase_real(root, key); if (root->n == 0) { Node * t = root; if (root->leaf) root = nullptr; else root = root->siblings[0]; delete t; }}Copy the code

2.4 Printing Operations

The following implementation is the code for hierarchical traversal.

void BTree::print() { if (root) { queue<Node*> Q; Q.push(root); while (! Q.empty()) { Node * t = Q.front(); Q.pop(); for (int i = 0; i < t->n; i++) cout << t->keys[i] << " "; cout << endl; if (! t->leaf) { for (int i = 0; i < t->n + 1; i++) Q.push(t->siblings[i]); } } cout << endl; }}Copy the code

Three: complete code

/**
 *
 * author : 刘毅(Limer)
 * date   : 2017-09-28
 * mode   : C++
 */

#include <iostream>
#include <queue>

using namespace std;

struct Node
{
    bool leaf;        // 是否是叶子结点
    int n;            // keys 的数目
    int * keys;       // 保存 keys
    Node ** siblings; // 保存孩子指针

    Node(int t, bool leaf)
    {
        this->leaf = leaf;
        this->n = 0;
        this->keys = new int[2 * t - 1];
        this->siblings = new Node*[2 * t]{ 0 };
    }
};

class BTree
{
private:
    int t; // Minimum Degree
    Node * root;
private:
    void destroy(Node * node);
    void split_child(Node * parent, int i, Node * child);
    void insert_non_full(Node * node, int key);
    bool find_real(Node * node, int key);
    void merge(Node * node, int i);
    void erase_real(Node * node, int key);
public:
    BTree(int t);
    ~BTree();
    void insert(int key);
    bool find(int key);
    void erase(int key);
    void print();
};

int main()
{
    BTree btree(3);

    // test "insert"
    btree.insert(1);
    btree.insert(3);
    btree.insert(7);
    btree.insert(10);
    btree.insert(11);
    btree.insert(13);
    btree.insert(14);
    btree.insert(15);
    btree.insert(18);
    btree.insert(16);
    btree.insert(19);
    btree.insert(24);
    btree.insert(25);
    btree.insert(26);
    btree.insert(21);
    btree.insert(4);
    btree.insert(5);
    btree.insert(20);
    btree.insert(22);
    btree.insert(2);
    btree.insert(17);
    btree.insert(12);
    btree.insert(6);
    btree.print();

    // test "find"
    cout << ((btree.find(0) == true) ? 1 : -1) << endl;
    cout << ((btree.find(1) == true) ? 1 : -1) << endl;
    cout << ((btree.find(20) == true) ? 1 : -1) << endl << endl;

    // test "erase"
    btree.erase(6);
    btree.print();
    btree.erase(21);
    btree.print();
    btree.erase(20);
    btree.print();
    btree.erase(19);
    btree.print();
    btree.erase(22);
    btree.print();

    return 0;
}

void BTree::destroy(Node * node)
{
    if (node->siblings[0])
    {
        for (int i = 0; i <= node->n; i++)
            destroy(node->siblings[i]);
    }

    delete node;
}

/* 一分为二 */
void BTree::split_child(Node * parent, int i, Node * child)
{
    Node * z = new Node(t, child->leaf);
    z->n = t - 1;

    for (int j = 0; j < t - 1; j++)
        z->keys[j] = child->keys[j + t];

    if (!child->leaf)
    {
        for (int j = 0; j < t; j++)
            z->siblings[j] = child->siblings[j + t];
    }

    child->n = t - 1;

    for (int j = parent->n; j >= i + 1; j--)
        parent->siblings[j + 1] = parent->siblings[j];

    parent->siblings[i + 1] = z;

    for (int j = parent->n - 1; j >= i; j--)
        parent->keys[j + 1] = parent->keys[j];

    parent->keys[i] = child->keys[t - 1];

    parent->n++;
}

void BTree::insert_non_full(Node * node, int key)
{
    int i = node->n - 1;

    if (node->leaf)
    {
        while (i >= 0 && node->keys[i] > key)
        {
            node->keys[i + 1] = node->keys[i];
            i--;
        }

        node->keys[i + 1] = key;
        node->n++;
    }
    else
    {
        while (i >= 0 && node->keys[i] > key)
            i--;

        if (node->siblings[i + 1]->n == 2 * t - 1)
        {
            split_child(node, i + 1, node->siblings[i + 1]);
            if (node->keys[i + 1] < key)
                i++;
        }

        insert_non_full(node->siblings[i + 1], key);
    }
}

bool BTree::find_real(Node * node, int key)
{
    int i = 0;
    while (i < node->n && node->keys[i] < key)
        i++;

    if (node->keys[i] == key)
        return true;

    if (node->leaf)
        return false;

    return find_real(node->siblings[i], key);
}

void BTree::merge(Node * node, int i)
{
    Node * cur = node->siblings[i];
    Node * next = node->siblings[i + 1];

    cur->keys[t - 1] = node->keys[i];

    for (int j = 0; j < next->n; j++)
        cur->keys[j + t] = next->keys[j];

    if (!cur->leaf)
    {
        for (int j = 0; j <= next->n; j++)
            cur->siblings[j + t] = next->siblings[j];
    }

    for (int j = i + 1; j < node->n; j++)
        node->keys[j - 1] = node->keys[j];

    for (int j = i + 2; j <= node->n; j++)
        node->siblings[j - 1] = node->siblings[j];

    cur->n += next->n + 1;
    node->n--;

    delete next;
}

void BTree::erase_real(Node * node, int key)
{
    int i = 0;
    while (i < node->n && node->keys[i] < key) // 找到第一个大于等于 key 的位置
        i++;

    if (i < node->n && node->keys[i] == key) // 如果在当前结点找到 key
    {
        if (node->leaf) // 当前结点是叶子的话,直接删除
        {
            for (int j = i + 1; j < node->n; j++)
                node->keys[j - 1] = node->keys[j];

            node->n--;
        }
        else // 当前结点不是叶子的话,分为三种情况
        {
            if (node->siblings[i]->n > t - 1) // 如果 key 的左边那个孩子结点 keys 数大于 t-1 ,就用 key 的前驱替换 key,问题转化为删除前驱
            {
                Node * left = node->siblings[i];

                while (!left->leaf)
                    left = left->siblings[left->n - 1];

                int precursor_key = left->keys[left->n - 1];
                node->keys[i] = precursor_key;

                erase_real(node->siblings[i], precursor_key);
            }
            else if (node->siblings[i + 1]->n > t - 1) // 如果 key 的右边那个孩子结点 keys 数大于 t-1 ,就用 key 的后继替换 key,问题转化为删除后继
            {
                Node * right = node->siblings[i + 1];

                while (!right->leaf)
                    right = right->siblings[0];

                int successor_key = right->keys[0];
                node->keys[i] = successor_key;

                erase_real(node->siblings[i + 1], successor_key);
            }
            else // 如果左右两个孩子结点 keys 都等于 t-1,那就进行合并操作,再删除
            {
                merge(node, i);
                erase_real(node->siblings[i], key);
            }
        }
    }
    else // 如果未在当前结点找到 key
    {
        if (node->leaf) // 是叶子的话,说明根本不存在该 key
            return;

        bool flag = (i == node->n) ? true : false;

        if (node->siblings[i]->n == t - 1) // 每次下降的时候,都要检查下降的那个结点的 keys 数是不是等于下限 t-1,如果是就要做填充处理,分为三种情况
        {
            if (i != 0 && node->siblings[i - 1]->n > t - 1) // 左兄弟结点 keys 数大于 t-1
            {
                Node * cur = node->siblings[i];
                Node * pre = node->siblings[i - 1];

                for (int j = cur->n - 1; j >= 0; j--)
                    cur->keys[j + 1] = cur->keys[j];

                if (!cur->leaf)
                {
                    for (int j = cur->n; j >= 0; j--)
                        cur->siblings[j + 1] = cur->siblings[j];

                    cur->siblings[0] = pre->siblings[pre->n];
                }

                cur->keys[0] = node->keys[i - 1];
                node->keys[i - 1] = pre->keys[pre->n - 1];
                cur->n++;
                pre->n--;
            }
            else if (i != node->n && node->siblings[i + 1]->n > t - 1) // 右兄弟结点 keys 数大于 t-1
            {
                Node * cur = node->siblings[i];
                Node * next = node->siblings[i + 1];

                for (int j = 1; j < next->n; j++)
                    next->keys[j - 1] = next->keys[j];

                if (!next->leaf)
                {
                    for (int j = 1; j < next->n; j++)
                        next->siblings[j - 1] = next->siblings[j];

                    cur->siblings[cur->n + 1] = next->siblings[0];
                }

                cur->keys[cur->n] = node->keys[i];
                node->keys[i] = next->keys[0];
                cur->n++;
                next->n--;
            }
            else // 如果左右兄弟结点 keys 数都等于 t-1,则对它们进行合并
            {
                if (i != node->n)
                    merge(node, i);
                else
                    merge(node, i - 1);
            }
        }

        if (flag && i > node->n)
            erase_real(node->siblings[i - 1], key);
        else
            erase_real(node->siblings[i], key);
    }
}

BTree::BTree(int t)
{
    this->t = t;
    this->root = nullptr;
}

BTree::~BTree()
{
    if (root)
        destroy(root);
}

void BTree::insert(int key)
{
    if (!root)
    {
        root = new Node(t, true);
        root->keys[0] = key;
        root->n = 1;
    }
    else
    {
        if (root->n == 2 * t - 1)
        {
            Node * s = new Node(t, false);
            s->siblings[0] = root;
            split_child(s, 0, root);

            int i = 0;
            if (s->keys[0] < key)
                i++;

            insert_non_full(s->siblings[i], key);
            root = s;
        }
        else
            insert_non_full(root, key);
    }
}

bool BTree::find(int key)
{
    if (root)
        return find_real(root, key);

    return false;
}

void BTree::erase(int key)
{
    if (!root)
        return;

    erase_real(root, key);

    if (root->n == 0)
    {
        Node * t = root;

        if (root->leaf)
            root = nullptr;
        else
            root = root->siblings[0];

        delete t;
    }
}

void BTree::print()
{
    if (root)
    {
        queue<Node*> Q;
        Q.push(root);

        while (!Q.empty())
        {
            Node * t = Q.front();
            Q.pop();

            for (int i = 0; i < t->n; i++)
                cout << t->keys[i] << " ";
            cout << endl;

            if (!t->leaf)
            {
                for (int i = 0; i < t->n + 1; i++)
                    Q.push(t->siblings[i]);
            }
        }

        cout << endl;
    }
}
Copy the code

The operation screenshot is as follows:

Iv. References

  • Wikipedia. B tree.
  • GeeksforGeeks. B-Tree | Set 1 (Introduction).
  • GeeksforGeeks. B-Tree | Set 2 (Insert).
  • GeeksforGeeks. B-Tree | Set 3 (Delete).