GitHub source code sharing

Wechat search: code nong StayUp

Home address: goZhuyinglong.github. IO

Source: github.com/gozhuyinglo…

1. Introduction

So far, we have introduced binary Lookup trees and AVL trees, always assuming that you can store the entire data structure in memory. However, if there is too much data to fit in memory, this means that the data must be put on disk, and obviously these data structures are no longer suitable.

The problem is that disk I/O is nowhere near as fast as memory access, but to find an element in a tree, you have to work your way down from the root node, and each search is an I/O operation. To improve performance, you must reduce the number of lookups.

Reducing the height of the tree and increasing the number of elements per node is an effective solution. One way to implement this idea is to use b-trees.

2. The term

There are some terms that will be used to introduce B trees, but here’s a look:

  • Root node: A node that has no parent is called the root node
  • Leaf node: A node with no children is called a leaf node
  • Internal nodes: Nodes other than root and leaf nodes are called internal nodes. They have both parent and child nodes.
  • Keys: The storage elements in the B-tree are keys, which are Pointers to data records. The value of the key is used to store the real data record. You can have multiple keys in a node.
  • Order: The order of the B-tree is the maximum number of child nodes, which is 1 larger than the number of keys. We generally call a B-tree a b-tree of order M, so that the b-tree has at most M child nodes and at most M-1 keys in the nodes.

For more terminology, please see our previously shared tree!

3. B Tree

Binary Tree B Tree is not the same as Binary Tree, you can translate it as Balance Tree, or Bayer Tree.

B tree is a self-balancing tree that can keep data in order. This data structure allows data lookup, sequential access, insertion, and deletion to occur in logarithmic time.

Different from AVL trees, B trees can have more than two child nodes, and each node can have multiple key values. These attributes reduce the intermediate process of locating records and speed up access. B trees are more suitable for reading and writing relatively large data block storage systems, such as disks. This data structure is often used in database and file system implementations.

For an m-order B-tree, it has the following properties:

  • Each node has at mostMChild node; At least one internal node⌉ ⌈ M / 2Child node (⌈x⌉ is rounded up symbol); If the root node is not a leaf node, it has at least two child nodes.
  • withNNon-leaf nodes of child nodes are ownedN-1A key.
  • All leaf nodes must be on the same layer.

Note: The figure below shows two drawing methods of a 5th order B-tree. The most accurate drawing method should be the left B-tree in the figure. However, for convenience, the key empty position in the node is usually omitted, as shown in the right B-tree. Just because the most keys on the right are 3 doesn’t mean that this is a b-tree of order 4, and the order of the B-tree is predefined.

4. B tree operations

During operations on b-trees, b-tree features, such as the minimum number of minor nodes and the minimum number of keys per node, may be violated. To maintain these properties of b-trees, trees may split or merge.

We will describe the search, insert, and delete operations as a fifth-order B-tree. First, let’s take a look at the characteristics of the fifth-order B-tree:

  • An internal node has at least three children (⌈5/2⌉ = 3) and a maximum of five children
  • Each node has at least 2 keys (3-1=2) and a maximum of 4 keys

4.1 the search

The search of a B tree is similar to a binary search tree, traversing the tree recursively from top to bottom, starting at the root node. At each layer of nodes, binary lookup is used to match the target key, or the subtree is determined by the range of the key.

4.2 insert

The insertion of new elements takes place on the leaf node. All inserts start at the root node, search the tree and find the node where the element should be inserted. To insert a new element into the node, do the following:

  • If the number of elements on the node is not full, a new element is inserted into the node, keeping the order of the elements in the node.
  • If the node is full, split the node evenly into two nodes:
    • A median is drawn from the element and the new element in the node
    • Elements less than the median are placed on the left node, and elements greater than the median are placed on the right node, with the median as the separator value.
    • The split value is inserted into the parent (increasing the height of the tree), which may cause the parent to split, which may cause its parent to split, and so on. If the split goes all the way up to the root node, a new root node is created with a split value and two child nodes. (This is why the root node does not have a minimum number of children as the inner node does.)

The figure below is a 5-order B-tree. We observed the node splitting process by inserting 1 to 17 in sequence.

4.3 delete

The deletion of B tree is much more complicated and can be divided into the following situations:

  • Delete an element from a leaf node (1) Search for the element to be deleted (2) Delete it if it is on the leaf node (3) If a drop overflow occurs (the number of keys is less than the minimum), borrow the element from its sibling node. Move the element of the parent node down to the current node and the element of the sibling node up to the parent node (if the left node, move the largest element up; If the right node, move up the smallest element) (4) If the sibling node also reaches the lower limit, merge the sibling node and the split key.

  • Delete the element in the internal node (1) The element in the internal node is the partition value of its left and right child nodes. A new separator should be selected from the maximum element of the left child node or the minimum element of the right child node. The selected separator is removed from the atomic node, replacing the deleted element with the new separator value. (2) In the previous step, if the number of elements of the left and right child nodes reaches the lower limit, the left and right child nodes are merged. (3) If the number of node elements is less than the lower limit after deleting elements, continue merging.

The figure below is a 5-order B-tree. We observe the deletion process by deleting keys 15, 14, 17 and 5 (basically covering all cases).

5. Code implementation

This code realizes B tree search, insert, delete operations, see the details below.

5.1 the key value class

This class is used to store key-value data in each node of the B-tree.

  • Key: A key in a node that points to a data record.
  • Value: indicates the data record pointed to by the key.

The Comparable interface is implemented because the keys are stored sequentially in the node

class Entry implements Comparable<Entry> {

    final int key;
    String value;

    @Override
    public int compareTo(Entry o) {
        returnInteger.compare(key, o.getKey()); }}Copy the code

5.2 node class

Node class is each node in the B tree, which mainly includes:

  • Entrys: for all keys in this node, stored using the List type
  • ChildNodes: indicates all childNodes of this node. The storage type is List
  • ParentNode: indicates the parentNode of the node.

Since childNodes needs to be sorted, the class also implements the Comparable interface.

One thing to understand is that the left and right children of each key in the current node can be determined by the subscript of the List collection. For example, the left and right childNodes of entrys[0] are childNodes[0] and childNodes[1]. The left and right childNodes of entrys[1] are childNodes[1], childNodes[2], and so on.

class Node implements Comparable<Node> {

    private final List<Entry> entrys; // The key-value pair of the current node
    private final List<Node> childNodes; // Use an array to store child nodes
    private Node parentNode; / / the parent node

    public Node(a) {
        entrys = new ArrayList<>();
        childNodes = new ArrayList<>();
    }
   
    public Node add(Entry entry) {
        entrys.add(entry);
        Collections.sort(entrys);
        return this;
    }

    public Node addChild(Node node) {
        childNodes.add(node);
        Collections.sort(childNodes);
        return this;
    }

    @Override
    public int compareTo(Node o) {
        return Integer.compare(entrys.get(0).getKey(), o.getEntrys().get(0).getKey()); }}Copy the code

5.3 B tree class

The implementation of B-tree class is relatively complex, and its main attributes include:

  • M: is the order of B tree
  • Min: is the minimum value of elements in B tree, which can be calculated by order
  • Root: indicates the root node of the tree
class BTree {

    private final int m; // The order of the B tree
    private final int min;// The minimum value of the element
    private Node root; / / the roots

    public BTree(int m) {
        this.m = m;
        this.min = (int) Math.ceil(m / 2.0) - 1;
    }

    public Node getRoot(a) {
        returnroot; }}Copy the code

5.4 the search

B tree search implementation is relatively simple, in the BTree class to add the following two methods.

The binarySearch method is implemented through the binarySearch method in the Collections class. If you are interested, look at the source code. We’ll talk about binary search later, but I won’t go into that.

    / / search
    public Entry searchEntry(int key) {
        return searchEntry(root, key);
    }

    // search - recursive
    private Entry searchEntry(Node node, int key) {
        if (node == null) {
            return null;
        }
        // Use binary to find the legal bit subscript
        int index = Collections.binarySearch(node.getEntrys(), new Entry(key, null));
        if (index >= 0) {
            return node.getEntrys().get(index);
        } else {
            if (node.getChildNodes().size() == 0) {
                return null;
            }
            return searchEntry(node.getChildNodes().get(-index - 1), key); }}Copy the code

5.5 insert

Insert B tree is also easy to implement, add the following three methods to the BTree class.

Important is the split method, if after adding an element, the element value in its node exceeds the limit value, then the split operation, look at the code in detail.


    // Add elements
    public void add(Entry entry) {
        // If root is empty, it can be created directly
        if (root == null) {
            Node node = new Node();
            node.add(entry);
            root = node;
            return;
        }
        add(root, entry);
    }

    // Add elements - recursion
    private void add(Node node, Entry entry) {

        // The current node is leaf node
        if (node.getChildNodes().size() == 0) {

            // If the current node element is not full, add the element directly
            if (node.getEntrys().size() < m - 1) {
                node.add(entry);
                return;
            }
            // If the current node element is full, split the current node
            node.getEntrys().add(entry);
            split(node);
            return;
        }

        // The current node is an internal node.
        // Use binary search to find the branch to insert
        int index = Collections.binarySearch(node.getEntrys(), entry);
        if (index < 0) {
            add(node.getChildNodes().get(-index - 1), entry); }}// Detach the current node
    private void split(Node node) {
        int mid = node.getEntrys().size() / 2;
        / / separated values
        Entry separateEntry = node.getEntrys().get(mid);
        // Separate the left node
        Node leftNode = new Node();
        leftNode.getEntrys().addAll(node.getEntrys().subList(0, mid));
        // Separate the right node
        Node rightNode = new Node();
        rightNode.getEntrys().addAll(node.getEntrys().subList(mid + 1, node.getEntrys().size()));
        // Separate the child node
        if (node.getChildNodes().size() > 0) {
            List<Node> leftChildNode = node.getChildNodes().subList(0, mid + 1);
            for (Node temp : leftChildNode) {
                temp.setParentNode(leftNode);
            }
            leftNode.getChildNodes().addAll(leftChildNode);

            List<Node> rightChildNode = node.getChildNodes().subList(mid + 1, node.getEntrys().size() + 1);
            for (Node temp : rightChildNode) {
                temp.setParentNode(rightNode);
            }
            rightNode.getChildNodes().addAll(rightChildNode);
        }

        // Current node is the root node
        if (node.parentNode == null) {
            Node newRoot = new Node();
            newRoot.add(separateEntry);
            root = newRoot;
            leftNode.parentNode = root;
            rightNode.parentNode = root;
            root.addChild(leftNode).addChild(rightNode);
        } else {
            node.parentNode.add(separateEntry);
            leftNode.parentNode = node.parentNode;
            rightNode.parentNode = node.parentNode;
            node.parentNode.addChild(leftNode).addChild(rightNode);
            node.parentNode.getChildNodes().remove(node);
            // If the parent overflows, continue splitting
            if (node.parentNode.getEntrys().size() > m - 1) { split(node.parentNode); }}}Copy the code

5.6 delete

B tree deletion is the most troublesome, there are too many cases. The nodes to be deleted are mainly internal nodes and leaf nodes. After each deletion, it is determined whether the number of elements in the current node exceeds the lower limit. If the lower limit is exceeded, judge according to the situation.

If a leaf node is deleted, the possible operations are left rotation, right rotation, and merge (merge current node, sibling node, and intermediate value). After the merge, the parent node exceeds the lower limit……

If an internal node is deleted, the possible operations are to merge the left and right siblings, merge the left and right siblings with the median, and borrow elements from the sibling. Similarly, the parent node exceeds the lower limit……

public void remove(int key) {
        // Query the node where the current element resides
        Node node = searchNode(key);
        if (node == null) {
            return;
        }

        // Delete a node
        int keyIndex = Collections.binarySearch(node.getEntrys(), new Entry(key, null));
        node.getEntrys().remove(keyIndex);
        // If the number of keys on the current node is smaller than the threshold, delete the node for post-deletion processing
        if(node.getEntrys().size() < min) { afterDeletingHandler(node, keyIndex); }}/** * Post-deletion processing: If the number of elements on the current node is less than the limit value, merge, rotate left, rotate right ** according to different scenarios@param node
         */
    private void afterDeletingHandler(Node node, int deletedKeyIndex) {

        // This node is an internal node
        if (node.getChildNodes().size() > 0) {
            // Gets the left and right child nodes of the deleted element
            Node leftChideNode = node.getChildNodes().get(deletedKeyIndex);
            Node rightChildNode = node.getChildNodes().get(deletedKeyIndex + 1);

            int leftEntrysSize = leftChideNode == null ? 0 : leftChideNode.entrys.size();
            int rightEntrysSize = rightChildNode == null ? 0 : rightChildNode.entrys.size();
            int maxSize = Math.max(leftEntrysSize, rightEntrysSize);

            // Borrow from left and right child nodes
            if (maxSize <= min) {
                // If the left and right child nodes reach the limit, merge the left and right child nodes
                merge(leftChideNode, rightChildNode);
                return;
            }
            // Borrow from the left child node
            Entry entry;
            if (maxSize == leftEntrysSize) {
                entry = leftChideNode.getEntrys().get(leftChideNode.getEntrys().size() - 1);
                leftChideNode.getEntrys().remove(entry);
            } else {
                // Borrow from the right child
                entry = rightChildNode.getEntrys().get(0);
                rightChildNode.getEntrys().remove(entry);
            }
            node.add(entry);
            return;
        }

        // This node is a leaf node
        Node parentNode = node.parentNode;
        // Subscript of the current node in the parent node
        int nodeIndex = parentNode.getChildNodes().indexOf(node);
        // Left sibling node
        Node leftNode = nodeIndex > 0 ? parentNode.getChildNodes().get(nodeIndex - 1) : null;
        // Right sibling node
        Node rightNode = nodeIndex == parentNode.getChildNodes().size() - 1 ? null : parentNode.getChildNodes().get(nodeIndex + 1);

        int leftEntrysSize = leftNode == null ? 0 : leftNode.entrys.size();
        int rightEntrysSize = rightNode == null ? 0 : rightNode.entrys.size();
        int maxSize = Math.max(leftEntrysSize, rightEntrysSize);

        // The number of elements of the left and right sibling nodes reaches the limit value
        if (maxSize <= min) {
            if(leftNode ! =null) {
                // merge with the left sibling
                merge(node, leftNode, nodeIndex - 1);
            } else if(rightNode ! =null) {
                // merge with the right sibling node
                merge(node, rightNode, nodeIndex);
            }
            return;
        }

        if (maxSize == leftEntrysSize) {
            // Borrow the left node --> rotate right
            rightRotate(node, nodeIndex, leftNode);
        } else {
            // Borrow from the right node --> rotate leftleftRotate(node, nodeIndex, rightNode); }}/** * Merges the current node with its siblings and intermediate values **@paramNode Current node *@paramBrotherNode brotherNode *@paramParentEntryIndex Intermediate value */
    private void merge(Node node, Node brotherNode, int parentEntryIndex) {
        // Create new nodes
        Node parentNode = node.getParentNode();
        Node newNode = new Node();
        newNode.getEntrys().addAll(node.getEntrys());
        newNode.getEntrys().addAll(brotherNode.getEntrys());
        newNode.add(parentNode.getEntrys().get(parentEntryIndex));
        // Delete the original node and relationship
        parentNode.getEntrys().remove(parentEntryIndex);
        parentNode.getChildNodes().remove(node);
        parentNode.getChildNodes().remove(brotherNode);
        if (node.getChildNodes().size() > 0) {
            for(Node childNode : node.getChildNodes()) { newNode.addChild(childNode); childNode.setParentNode(newNode); }}if (brotherNode.getChildNodes().size() > 0) {
            for(Node childNode : brotherNode.getChildNodes()) { newNode.addChild(childNode); childNode.setParentNode(newNode); }}// If the original node is the root node, the current node is the new root node
        if (parentNode.getEntrys().size() == 0) {
            root = newNode;
            return;
        } else {
            parentNode.addChild(newNode);
            newNode.setParentNode(parentNode);
        }


        // After merging, if the parent node has less than the limit value, merge again (in principle, merge with the minimum number of elements node)
        if (parentNode.getEntrys().size() < min) {
            Node grandfatherNode = parentNode.getParentNode();
            if (grandfatherNode == null) {
                return;
            }
            int nodeIndex = Collections.binarySearch(grandfatherNode.getChildNodes(), parentNode);
            // Left sibling node
            Node leftNode = nodeIndex > 0 ? grandfatherNode.getChildNodes().get(nodeIndex - 1) : null;
            // Right sibling node
            Node rightNode = nodeIndex >= grandfatherNode.getChildNodes().size() - 1 ? null : grandfatherNode.getChildNodes().get(nodeIndex + 1);
            int leftEntrysSize = leftNode == null ? 0 : leftNode.entrys.size();
            int rightEntrysSize = rightNode == null ? 0 : rightNode.entrys.size();
            int minSize = Math.min(leftEntrysSize, rightEntrysSize);
            if (minSize > 0) {
                if (leftEntrysSize == minSize) {
                    merge(parentNode, leftNode, nodeIndex - 1);
                } else{ merge(parentNode, rightNode, nodeIndex); }}else if (leftEntrysSize > 0) {
                merge(parentNode, leftNode, nodeIndex - 1);
            } else if (rightEntrysSize > 0) { merge(parentNode, rightNode, nodeIndex); }}}/** * merge two sibling nodes **@paramNode Current node *@paramBrotherNode brotherNode */
    private void merge(Node node, Node brotherNode) {
        Node parentNode = node.getParentNode();
        Node newNode = new Node();
        newNode.getEntrys().addAll(node.getEntrys());
        newNode.getEntrys().addAll(brotherNode.getEntrys());
        Collections.sort(newNode.getEntrys());

        newNode.setParentNode(parentNode);
        parentNode.getChildNodes().remove(node);
        parentNode.getChildNodes().remove(brotherNode);
        parentNode.addChild(newNode);
    }

    /** * Rotate left * (1) remove the intermediate element from the parent node and add it to the current node * (2) Remove the smallest element from the right sibling node and add it to the parent node **@paramNode Current node *@paramNodeIndex Intermediate index *@paramRightNode rightNode */
    private void leftRotate(Node node, int nodeIndex, Node rightNode) {
        Entry parentEntry = node.getParentNode().getEntrys().get(nodeIndex);
        node.add(parentEntry);
        node.getParentNode().getEntrys().remove(parentEntry);
        Entry rightEntry = rightNode.getEntrys().get(0);
        node.getParentNode().add(rightEntry);
        rightNode.getEntrys().remove(rightEntry);
    }

    /** * Rotate right * (1) remove the intermediate element from the parent node and add it to the current node * (2) Remove the largest element from the left sibling and add it to the parent node **@paramNode Current node *@paramNodeIndex Intermediate index *@paramLeftNode the leftNode */
    private void rightRotate(Node node, int nodeIndex, Node leftNode) {
        Entry parentEntry = node.getParentNode().getEntrys().get(nodeIndex - 1);
        node.add(parentEntry);
        node.getParentNode().getEntrys().remove(parentEntry);
        Entry leftEntry = leftNode.getEntrys().get(leftNode.getEntrys().size() - 1);
        node.getParentNode().add(leftEntry);
        leftNode.getEntrys().remove(leftEntry);
    }
Copy the code

6. Complete code

The complete code is too long, welcome to visit my GitHub to view, if it is helpful to you, please also give a Star!

Code address: github.com/gozhuyinglo…

Graphic production is not easy, your attention, praise will be my biggest power! Thank you ~ ~ 🌹 🌹 🌹