AVL

AVL (Balanced binary tree), which is also a binary search tree. It is characterized by no more than 1 difference between the left and right subtrees of each node. An ordinary binary search tree may degenerate into a linked list in a particular case, such as adding elements in order 1,2,3,4,5. At this point, the efficiency of the query degrades from O(logn) to O(n). To solve this particular situation, we need to use balanced binary tree to solve this problem.

The definition of AVL

The left subtree of each node of AVL is less than (greater than) this node, and the right subtree is greater than (less than) this node. The depth difference between the left and right subtrees of each node is less than 1. The difference between the depths of the two branches is called balanced silver.

As shown in figure:

The operation of the AVL

Add, delete, alter, search is similar to binary search trees, but with more consideration for node balance, if a string of numbers are inserted in the order of 3,4,5. Then the tree structure is reduced to a linked list. And then AVL will rotate the tree to balance it, so we know that the rotation will rotate in three places: add, delete, and modify. Rotation operations fall into the following four categories

LL single turn right

As shown in the figure, the left subtree of 8 has degenerated into a linked list, and the two nodes 5 and 8 are no longer balanced. At this time, we first find the unbalanced node 5 with the deepest depth and perform LL rotation operation on node 5. In this case, the structure of the figure on the right is obtained

RR single turn left

As shown in the figure, when the insertion sequence is 8, 3, 10, 13, 15, the structure of the tree becomes the one on the left. At this time, the 10 nodes and the 8 nodes are unbalanced. In order to maintain the balance of AVL, we need to perform RR rotation for the 10 nodes, as shown in the figure on the right

LR first left then right

As shown in figure. 5 and 8 nodes are unbalanced. In this case, balance the 5 nodes. First, rotate the RR of 5 left, and the left node of 7 becomes the right node of 5.

At this time, 7 and 8 are still unbalanced. Rotate 8 to the right, and the right node of 8 also becomes the left node of 8, as shown in the figure.

RL first right then left

As shown in the figure on the left, node 8 and 13 are unbalanced. The figure on the right is obtained by right rotation of node 13

At this point, 8 and 10 are unbalanced. RR is rotated left for 8 nodes to obtain the figure on the right.

That’s how you keep your balance.

insert

The insertion operation is similar to balancing a binary tree, except that the tree is kept balanced after the insertion for the above four cases.

delete

Deletion is similar to balancing a binary tree in that it is possible to rotate the tree to balance when the child node is moved to the deleted node.

The average time complexity of AVL is O(logn). The advantage of AVL over binary search tree is that it can balance the tree structure without degenerating into a linked list

So how do you tell if a node is balanced? Here’s a look at the structure of each node in AVL

private class Node{ public K key; public V value; public Node left, right; public int height; // Current height of node}Copy the code

To judge whether a node is balanced is to calculate the balance factor of a node

    private int getBalanceFactor(Node node){
        if(node == null)
            return0; // Return the balance factor if the result is less than -1. A result greater than 1 indicates left-leaning; Results between -1 and 1 indicate node equilibriumreturn node.left.height - node.right.height;
    }
Copy the code

In general add and remove is more of a consideration than binary search tree is add and remove functionality

2-3 tree

A 2-3 tree is an absolutely balanced tree. Its nodes can have 1 or 2 elements. As shown in the picture, it is a 2-3 tree:

2-3 In the tree, 2 represents two children per node and 3 represents three children per node.

2-3 tree operations

insert

Here’s an example to see how a 2-3 tree can achieve absolute balance. For example, now we want to add the seven elements 1,2,3,4,5,6,7, as shown in the figure

As shown in the figure, the following step by step analysis:

  1. Insert 1 to determine that there is no root node. Encapsulate 1 as the node and set it as the root node
  2. Insert 2. Since node 1 has no children and only has 1 node, 2 is directly added to node 1
  3. Insert 3, like 2, into the root node, which now has 3 elements and needs to change to step 4. It can be understood as extracting the intermediate elements of 1, 2, and 3 into the root node, that is, taking 2 out, 1 as the left child and 3 as the right child
  4. Insert 4,4 is larger than 2, and increase to node 3
  5. Insert 5,5 is larger than 2, and add it to node 3 and 4. Then node 3 and 4 become nodes 3,4 and 5, and extract the middle element as parent node according to the third step. The 4 extracted from node 4 goes back to parent node 2, and node 2 has only one element, so 4 is added to node 2
  6. Insert 6, 6, 2 and 4 greater than the root node, go to the far right, 5 has no children and only one element, add 6 to 5 nodes
  7. Insert 7, at which point 5, 6, and 7 extract 6 from the parent of 6, and the parent of 6 (the root) becomes 2, 4, and 6. Two, four, and six pull out the four to get the final look

In general, the insertion method is similar to binary search trees. However, each node can have 1 or 2 elements. When the number of node elements is 3, it is divided into 3 nodes and merged upward until the merge is complete.

To find the

The search of a 2-3 tree is similar to a binary search tree, but since a node may have 2 elements, the two elements need to be compared. Go to the left, middle, and right children of the node to continue the comparison

delete

2-3 The deletion of trees is a little more complicated. Deletion can be divided into two cases, namely, deletion of leaf nodes and non-leaf nodes.

I’m just talking about the theoretical case, not the code implementation, but actually the code implementation gets complicated just because there are more things to think about.

Delete leaf nodes
  • If the current node contains three nodes, delete the node directly
  • If the current node has two nodes, delete the node and check
    • If the parent is 2 nodes, judge the sibling node
      • A sibling is a 3-node node. Move the sibling to the parent node, and then move another element of the parent node to the current node
      • The sibling node is a 2-node node. Move the middle order traversal of the sibling node to directly drive the sibling node to make the sibling node become a 3-node node, and then delete the sibling node
    • The parent node is 3 nodes, split the parent node to make it 2 nodes, then merge the split key closest to the parent node with the middle child, and take the merged node as the current node
  • If the 2-3 tree is a full binary tree, delete the node, reduce the layer of the 2-3 tree, and merge the sibling node into the parent node. At the same time, merge all the sibling nodes of the parent node into the parent node. If the parent node becomes 4 nodes, the decomposition operation is performed
Delete non-leaf nodes

Overwrite the current node key with the direct successor node key under the sequence traversal, and then delete the successor node key used for overwriting

After understanding 2-3 trees, we can easily understand and implement red-black trees and B-trees

Red and black tree

Before we get into red-black trees, we need to know that red-black trees are not the only way to implement a 2-3 tree, although it’s easier to implement a 2-3 tree. The fact is that any tree that meets all five of these definitions is a red-black tree. Here’s what a red-black tree is:

  • Each node is red or black
  • The root node is black
  • Each leaf node is black
  • If a node is red, its child node is black
  • From any node to the leaf, the black nodes are the same

Each node of a red-black tree has two colors, red and black. As shown in the two figures below, which are two different implementations of red-black trees, we can see that when the last leaf node is added NIL, the leaf node is uniformly black. Figure 2 just does not add the last black leaf node

In fact, the second of the two diagrams above is a red-black tree implemented as a 2-3 tree

The relationship between a 2-3 tree and a red-black tree

As shown in the figure, we can see that the left element of the 3 nodes in the 2-3 tree can be changed into a new node, which is the red node in the red-black tree, and the red nodes are uniformly tilted to the left to get the red-black tree on the right. Such red-black tree is also called left-leaning red-black tree

Now that we understand the relationship between red black trees and 2-3 trees we can see how red black trees are implemented.

Red black tree operation

Node structure of red-black tree

private class Node{ public K key; // Sort by key public V value; public Node left, right; public boolean color; / / redtrue, the black forfalse, the default node is red}Copy the code

In general, for adding a node, the operation logic is the same as for the 2-3 tree, except that the left element of the 3 nodes in the 2-3 tree becomes the new node, which is red and left-leaning.

A red-black tree maintains an insert operation by rotating it left, rotating it right, and flipping the color, as shown below

Because we newly added by default when a node is red, we want to diplomats to satisfy the above five points in the definition of a red-black tree, first of all, we need to figure 2 as AVL tree form rotation is 3 form, and then wanted to AVL right into four state diagram, then balance but reverse colours (the parent node is red, the child node is black), Finally, turn the root node black.

Left rotation

// node x // / T1 x ---------> node T3 // / \ // T2 T3 T1 T2 private node leftRotate(node node){node x  = node.right; // Rotate node.right = x.left; x.left = node; x.color = node.color; node.color = RED;return x;
    }
Copy the code

Right rotation

// x T2 -------> y node // / \ // y T1 T1 T2 private node rightRotate(node node){node x = node.left; // Rotate node.left = x.right; x.right = node; x.color = node.color; node.color = RED;return x;
    }
Copy the code

Color flip

Private void flipColors(Node Node){node.color = RED; node.left.color = BLACK; node.right.color = BLACK; }Copy the code

Why do you need a red black tree with AVL?

Red-black trees do not pursue balance like AVL, which requires that the absolute value of the balance factor of each node must be less than or equal to 1. In this way, compared with AVL, there are fewer rotation operations in red-black trees, such as deleting and inserting nodes. AVL is to balance rotation (O(logn)) of all nodes from deleting and adding nodes to the root node, while red-black trees only need 3 times at most to balance O(1)(although the above red-black tree cannot achieve this, But implementations are possible), so red-black trees are more suitable for scenarios with lots of additions and deletions.

Therefore, the red-black tree is suitable for the scene with many additions and deletions, and the AVL is suitable for the scene with many lookups.

B tree

B trees and 2-3 trees work in the same way. B trees can also be 2-4 trees. 2-m trees have the following definition of B trees, if a B tree has M levels (layers) :

  • The root node has at least two child nodes
  • Each node has m-1 key (each element in the node is called a key) and is arranged in ascending order
  • Excluding the leaf node and the root node, all nodes have at least M/2 children

The figure below is a third-order B-tree:

B tree operations

The corresponding third-order tree can be viewed as a 2-3 tree, and the operation is similar

What’s a B tree good for?

B trees are mostly used on disks to search for disk addresses. Because disk will have a large amount of data, may have no way to once all the data will need to join into memory, so only one page loading disk, each disk page corresponds to a node, and for B tree, the tree B good to reduce the height of the tree, it will decrease The Times of IO query though a loaded into memory data more, But it’s definitely faster than AVL or red black trees.

B + tree

B+ trees are similar to B trees, but with a few more rules

  • The number of Pointers to subtrees of non-leaf nodes is the same as the number of keywords (the number of elements in the node)
  • Nonleaf node subtree pointer P[I], pointing to the subtree whose keyword value belongs to [K[I], K[I +1]) (b-tree is the open interval)
  • All leaf nodes have a chain pointer
  • All the keywords appear in the leaf node
  • Only leaf nodes have Data fields

As shown in figure

The main difference you can see is in the leaf node, which links all the leaf nodes into a linked list through an SQT, and only the leaf node has a data field (the data field is the disk address that the index points to).

B+ tree operations

The advantage of B+ tree and B tree operations is the efficiency of B+ tree search, so let’s look at the query

  • Each node in the B tree has a data field for its keywords, while the B+ tree has only indexes except leaf nodes. That is to say, the same disk page B+ tree can hold more elements.
  • B+ tree range query is more convenient, you can first find the lower limit of range, and then through the linked list of leaf nodes, until you find the upper limit. B tree can only find the lower limit, through the middle order traversal search, until it finds the upper limit.

The other operations are similar to 2-3 trees

Mysql’s Innodb engine uses B+ tree indexing

Why innoDB storage engine in mysql uses B+ tree as index

Why not use AVL or red black trees?

Assuming that a node of a B+ tree can have 100 keywords, a 3-level B tree can hold approximately 1,000,000 keywords (100+101*100+101*101*100). A red-black tree would need at least 20 layers to store that much. So using B-trees versus red-black trees and AVL can reduce IO operations

Why not use arrays or linked lists?

Linked list query is slow. It’s impossible to create contiguous space in an array.

Why not use a hash table?

Innodb can do range queries, hash tables can’t. LIKE, for example, is hard to implement with a hash table

Why not use B tree?

A B+ tree has only leaf nodes for Data, while the other nodes only store indexes, while a B tree has Data fields for each node. So a B+ tree of the same size contains more indexes than a B tree (because each B tree has a Data field).

In addition, the leaf nodes of B+ tree are connected through a linked list, so the interval query can be carried out quickly after finding the lower limit, which is faster than the sequential traversal in B tree

To sum up, mysql’s Innodb engine uses B+ tree indexing

Finally, the implementation of each of the previous data structures github.com/PatrickLee6…