This article is published under a SIGNATURE 4.0 International (CC BY 4.0) license.

404_Not_Found published at juejin.cn/post/704509…

preface

In some knowledge related to trees, we always hear about B tree and red-black tree. You must know that TreeMap and Hashmap in Java use red-black tree. In fact, in the implementation of IO multiplexing epoll, timer management in Ngnix, And front-end setTimeout both use the concept of red-black tree, so what are the advantages of red-black tree compared with other tree structures? Next, red-black trees will be extracted step by step from binary trees.

The directory structure of this article is as follows:

  • Prepare knowledge
  • Binary tree
  • Binary search tree
  • Balanced tree
  • AVL tree
  • The 2-3-4 trees
  • Red and black tree
  • conclusion
  • Reference documentation

Prepare knowledge

Before we look at red-black trees, let’s review some tree concepts.

  1. Precursor node: The precursor node of a node is the node with the largest value less than the median value of the node
  2. Successor node: The precursor node of a node is the smallest of all nodes whose value is greater than this node
  3. Degree: The number of subtrees of a node is the degree of the node
  4. Layer: Starting from the root node, the root node is the first layer, the children of the root are the second layer, and so on
  5. Height: from bottom to top
  6. Depth: from top to bottom
  7. Balance factor: the height (depth) difference between the left and right subtrees of a node is the balance factor of the node

Binary tree

A Binary tree is a tree with at most two branches per node (i.e., no node with a branching degree greater than 2).

traverse

Every data structure has traversal operation, of course, binary tree is no exception, binary tree traversal commonly has the following five types:

  • breadth-first
  • Depth first
  • The former sequence traversal
  • In the sequence traversal
  • After the sequence traversal

Breadth first (hierarchy traversal)

Breadth First Search algorithm (BFS), also translated as width First Search, or horizontal First Search, is a graphic Search algorithm. Simply put, BFS starts at the root node and traverses the tree nodes along the width of the tree. If all nodes are accessed, the algorithm aborts.

Implementation method (GIF demonstration) :

  1. The root node is first put into the queue.

  2. Extract the first node from the queue and verify that it is the target.

    • If the target is found, the search ends and the results are returned.
    • Otherwise, all of its immediate children that have not yet been verified are enqueued.
  3. If the queue is empty, the entire graph has been checked — that is, there is no object in the graph to search for. End the search and return “target not found”.

  4. Repeat Step 2.

Depth first

A depth-first-search algorithm (DFS) is an algorithm used to traverse or Search a tree or graph.

The algorithm searches the branches of the tree as deep as possible. When all edges of node V have been explored, the search goes back to the starting node of the edge where node V was found. This process continues until all nodes reachable from the source node have been discovered. If there are undiscovered nodes, one of them is selected as the source node and the process is repeated until all nodes have been accessed.

Implementation method:

  1. First place the root node on the stack.
  2. Remove the first node from the stack and verify that it is the target. If the target is found, the search ends and the results are returned. Otherwise, add one of its immediate children to the stack that has not yet been verified.
  3. Repeat Step 2.
  4. If there are no undetected direct child nodes. Add the upper-layer node to the stack. Repeat Step 2. (back)
  5. Repeat Step 4.
  6. If the stack is empty, the entire graph has been checked — that is, there is no object in the graph to search for. End search and return “target not found”.

Differences between BFS and DFS: Visualization

  • BFS breadth-first search: Implement FIFO using queues.

  • DFS depth-first search: use stack (recursive) to implement, FILO.

DFS can be further divided by the order in which the root node is accessed relative to the left and right child nodes. If the positions of the left and right nodes are fixed, then the root node is placed to the left of the left node, which is called pre-order. The root node is placed between the left node and the right node, which is called in-order. The root node is placed to the right of the right node and is called post-order.

The former sequence traversal

void pre_order_traversal(TreeNode *root) {
    // Do Something with root
    if(root->lchild ! =NULL)// If one of the subtrees is not empty, the subtrees are read
        pre_order_traversal(root->lchild);
    if(root->rchild ! =NULL)// The subtree on the other side does the same
        pre_order_traversal(root->rchild);
}
Copy the code

In the sequence traversal

void in_order_traversal(TreeNode *root) {
    if(root->lchild ! =NULL)// If one of the subtrees is not empty, the subtrees are read
        in_order_traversal(root->lchild);
    // Do Something with root
    if(root->rchild ! =NULL)// The subtree on the other side does the same
        in_order_traversal(root->rchild);
}
Copy the code

After the sequence traversal

void post_order_traversal(TreeNode *root) {
    if(root->lchild ! =NULL)// If one of the subtrees is not empty, the subtrees are read
        post_order_traversal(root->lchild);
    if(root->rchild ! =NULL)// The subtree on the other side does the same
        post_order_traversal(root->rchild);
    // Do Something with root
}
Copy the code

Binary search tree

A sorted Binary Tree is an empty Tree or a sorted Binary Tree that has the following properties:

  1. If the left subtree of any node is not empty, the values of all nodes in the left subtree are less than the values of its root node.
  2. If the right subtree of any node is not empty, the values of all nodes in the right subtree are greater than the values of its root node.
  3. The left and right subtrees of any node are binary search trees respectively.

Compared with other data structures, binary search tree has the advantage of low time complexity of search and insert. Is O (log n). Binary search trees are basic data structures, which are used to construct more abstract data structures, such as sets, multiple sets, associative arrays, etc.

To find the

The process of finding x in binary search tree B is as follows:

  1. If b is an empty tree, the search fails; otherwise:
  2. If x is equal to the value of the data field of the root node of B, the search succeeds. Otherwise:
  3. If x is less than the value of the data field of the root node b, search the left subtree; Otherwise:
  4. Find the right subtree.

insert

The algorithm of inserting a node S into a binary search tree B is as follows:

  1. If B is an empty tree, insert the node referred to by S as the root node, otherwise:
  2. If s->data is equal to the value of the data field of the root node of B, otherwise:
  3. If s->data is less than the value of the data field of the root node of B, insert the node referred to by S into the left subtree, otherwise:
  4. Insert the node referred to by S into the right subtree. (The newly inserted node is always a leaf node)

delete

Delete a node in the binary search tree and discuss in three cases:

  1. If p is a leaf, PL (left subtree) and PR (right subtree) are both empty trees. Since deleting leaf nodes does not destroy the structure of the whole tree, it is only necessary to modify the Pointers of its parent nodes (no child nodes).
  2. If p has only a left subtree PL or a right subtree PR, then PL or PR can be the left subtree (when P is the left subtree) or the right subtree (when P is the right subtree) of its parent node F. This modification does not break the binary search tree property (only one child).
  3. ifpThe nodeThe left subtreeandThe right subtreeIs not empty. In the deletepThen, in order to keep the relative positions of other elements unchanged, adjustments can be made in order by traversing in middle order. There are two ways:
    • First, let the left subtree of P be the left/right subtree of F (depending on whether P is the left or right subtree of F), s is the lower-right node of the left subtree of P, and the right subtree of P is the right subtree of S.
    • The second is to replace P with its direct predecessors or direct successors (with two child nodes) and then delete its direct predecessors (or direct successors) from the binary search tree.

The following are two examples of removing node 8:

In the figure above, we can see that the right subtree of node 8 is mounted to the precursor node of node 8 after deletion using method A. Of course, you can also point the right subtree pointer of the parent node of 8 to the right subtree of 8, and then mount the left subtree of 8 to the successor node of node 8.

Using method B, the deletion operation can also be completed in the form of finding a precursor (successor) replacement. The main difference between these two operations is that using method A will increase the height of the number, which is not conducive to the search of subsequent nodes.

As we know, the deletion operation will change the structure of the tree. If the nodes are frequently deleted, each node may have only one child node in the worst case. Then, the tree structure will degenerate into a linked list structure, and the time complexity of the query will become O(log n) to O(n). So, how to ensure that the tree query efficiency remains at ORDER (log n) under frequent operations? So that brings us to the concept of a balanced tree.

Balanced tree

A Balance Tree (BT) is one in which the height difference (Balance factor) of the subtrees of any node is less than or equal to 1. Balanced tree is more of a state concept, we can say that the tree is balanced in a certain state and unbalanced in a certain state, and how to transform the unbalanced state into a balanced state is the realization process of our algorithm. Common self-balancing trees are AVL, heap tree, 2-3 tree, red black tree and so on.

AVL tree

AVL Tree (Adelson-Velsky and Landis Tree) is the first self-balanced binary search Tree invented in computer science. In an AVL tree, the maximum height difference between the two subtrees corresponding to any node is 1, so it is also called a height balanced tree. Search, insert, and delete take order (log n) on average and in the worst case. Adding and removing elements may require one or more tree rotations to rebalance the tree.

Rotation is divided into left rotation and right rotation: demo animation

left-handed

Let Q be the right subtree of P. Set the right subtree of P to the left subtree of Q (set the parent of the left subtree of Q to P); Set the left subtree of Q to P(and the parent of P to Q).

Right hand

Let P be the left subtree of Q. Set the left subtree of Q to the right subtree of P (set the parent of the right subtree of P to Q); Set the right subtree of P to Q(and the parent of Q to P).

Let’s look at the following example:

In the figure on the left, the height of the left subtree of node 3 is 2 and the height of the right subtree is 0. At this time, the balance factor is 2, greater than 1, and it is in an unbalanced state. In order to achieve balance, dextral operation is required for node 3, which can be transformed into the equilibrium state in the figure on the right.

In fact, the above rotation is only a relatively simple case, there are roughly the following four cases of rotation:

LL

RR

LR

RL

It can be seen that LL and RR only need one rotation to reach the equilibrium state, while LR and RL need to be transformed into LL or RR first, and then left or right rotation operation.

Disadvantages of AVL trees: Add and remove operations may require one or more rotation operations.

The 2-3-4 trees

A two-three-four tree is a B tree of order four.

According to two important properties of B-trees:

  1. Each node has a maximum of m* child nodes
  2. Each non-leaf node (except root node) has at least ⌈m/2⌉ Child node

As a fourth-order B-tree, we can list all the nodes: 2 nodes (1 keyword), 3 nodes (2 keywords), and 4 nodes (3 keywords). Here’s a two-three-four tree.

Of course, the operation of adding, deleting, changing and searching for 2-3-4 trees is the same as that of B tree:

The query

Start at the root and recursively traverse the tree from top to bottom. At each level, the scope of the search is reduced to a subtree containing the search value. The range of subtree values is determined by the keys of its parent node.

insert

All inserts start at the root node. To insert a new element, first search the tree to find the corresponding node to which the new element should be added. To insert a new element into this node, do the following:

  1. If the node has less than the maximum number of elements, there is room for new elements. Inserts a new element into the node, keeping the elements in the node in order.

  2. Otherwise the node is full, split it evenly into two nodes:

    1. Selects the median from the old and new elements of the node
    2. Elements less than this median go to the left node, and elements greater than this median go to the right node, with the median as the separator value.
  3. The split value is inserted into the parent node, which may cause the parent node to split, which may cause its parent node to split, and so on. If there is no parent node (which is the root node), a new root node is created (increasing the height of the tree).

If the split goes all the way up to the root node, a new root node is created with a split value and two child nodes. This is why the root node does not have the same minimum number of children as the inner node. The maximum number of elements per node is U-1. When a node splits, an element is moved to its parent node, but a new element is added. So the maximum number of elements, U-1, must be able to be split into two legitimate nodes. If U minus 1 is odd, then U is equal to 2L, there are 2L minus 1 elements, one new node has L minus 1 elements, and the other one has L elements, which are all valid nodes. If U minus 1 is even, then U is equal to 2L minus 1, so there are 2L minus 2 elements. Half of it is L minus 1, which is exactly the minimum number of elements a node allows.

delete

Deletion can be divided into the following two cases:

  1. Removal of non-leaf nodes

    The deletion of non-leaf nodes can be achieved through the replacement of precursor or successor nodes. For example, in a 4-order B-tree above, if we want to delete node 9 and 9 as an internal node, if we directly delete 9, then the node where 9 is located has only one keyword [7]. There are three child nodes [(6),(8),(10,11,12)], which obviously cannot meet the requirements of b-tree. At this point, it is only necessary to find the precursor (or successor) of 9, replace 9 with the precursor node 8, and delete the precursor node, thus realizing the transformation from deleting non-leaf nodes to leaf nodes.

  2. Delete the leaf node

    For the deletion of leaf nodes, there are the following two cases:

    1. The number of node keywords exceeds the minimum value

      Can be deleted directly

    2. The number of node keywords is smaller than the minimum value

      Need to adjust

To adjust:

Rebalancing proceeds from the leaf node to the root node until the tree is rebalanced.

If deleting an element in a node causes the number of elements in that node to fall below the minimum, some elements must be reassigned. In general, move an element in a sibling node that has more than the minimum number of elements. If none of the sibling nodes has any extra elements, the missing node must be merged with its sibling. Merging can cause the parent node to lose the separator value, so the parent node may be missing elements and need to be rebalanced. Merging and rebalancing can go all the way to the root node, which becomes the only node missing elements. The algorithm to rebalance the tree is as follows:

  • If the right sibling of the missing element node exists and has extra elements, rotate to the left

    1. Copies the separated value of the parent node to the end of the missing element node (the separated value is moved down; Nodes missing elements now have the minimum number of elements)
    2. Replace the separated value of the parent node with the first element of the right brother (the right brother lost a node but still has the minimum number of elements)
    3. The tree rebalances
  • Otherwise, rotate to the right if the left sibling of the missing element node exists and has extra elements

    1. Copies the separated value of the parent node to the first node of the missing element node (the separated value is moved down; Nodes missing elements now have the minimum number of elements)

    2. Replace the separated value of the parent node with the last element of the left brother (the left brother lost a node but still has the minimum number of elements)

    3. The tree rebalances

  • Otherwise, if both of its immediate siblings have the minimum number of elements, merge it with one of its immediate siblings and their separated values in the parent node

    1. Copy the delimited value to the left node (the left node can be a missing element or a sibling node with the minimum number of elements)

    2. Move all elements from the right node to the left node (the left node now has the maximum number of elements, the right node is empty)

    3. Removes the separated value and empty right subtree from the parent (the parent loses an element)

      • If the parent node is the root node and there are no elements, release it and make the merged node the new root node (tree depth reduced)
      • Otherwise, if the number of elements on the parent node is less than the minimum, the parent node is rebalanced

Because 2-3-4 trees are relatively difficult to implement in most programming languages, operations on trees involve a large number of special cases. A two-three-four tree is the equivalent of a red-black tree (RBT), which is simpler to implement, so it can be used instead.

Red and black tree

Red black tree definition:

  1. Nodes are red or black.
  2. The roots are black.
  3. All leaves are black (leaves are NIL nodes).
  4. Each red node must have two black children. (There cannot be two consecutive red nodes on all paths from each leaf to the root.)
  5. All simple paths from any node to each of its leaves contain the same number of black nodes (black balance).

Equivalence relation between 2-3-4 trees and RBT

Demo Tree

Here is a two-three-four tree with twelve nodes and the corresponding binary tree

Since red-black trees are the equivalent of 2-3-4 trees, the operations of red-black trees are abstracted from B-trees to rotation and coloring.

insert

The main point of discussion is to find the insertion position, and then self-balance (left or right) and the insertion node is red (if yes)

If it is black, there will be one more black node on the current branch, thus breaking the black balance

The tree is an example, and the right subtree is the opposite.

  1. The root node is inserted

    If the first node (root node) is inserted, the red becomes black.

  2. Insert the second node

    If the parent node is black, insert it directly without changing color.

  3. Insert into the third node

    If the parent node is red and there is no uncle node or the uncle node is black (it can only be NIL node), then the grandparent node is right-handed as the fulcrum, and after rotation the original grandparent node turns red and the original parent node turns black (this is the case of LL, note LR).

  4. Insert into four nodes (split)

    If the parent node is red, uncle node is red (this grandpa node must be black), is uncle and father node becomes black, grandpa nodes become red (if grandpa node is the root node, and then into a black), grandpa node at this time need recursive (grandpa node as the newly inserted node compare again).

delete

Similarly, deletion of red-black trees occurs at leaf nodes (non-leaf nodes are replaced by precursor/successor nodes).

For the deletion of red-black tree, the deletion corresponding to 2-3-4 can be classified into three situations: it can be directly deleted (3-node and 4-node), borrow and merge nodes from siblings (the sibling nodes here refer to the sibling nodes in the 2-3-4 tree, rather than the literal sibling nodes)

  1. Delete directly

    If the deleted node corresponds to three or four nodes in a 2-3-4 tree, it can be deleted directly without borrowing the node from the brother or father

    If red nodes are deleted, delete them directly. If the black node is deleted, the red node will replace it and become black

    Can be.

  2. Borrow from the “brother” node

    The premise is to find the true “brother” node.

    Sibling nodes have borrowings (the sibling node must be black, if it is red then it is not a true sibling node

    Point, need to go back to the previous step to find the true sibling node).

    The sibling node has two children (2 children must be red, if black is equivalent to the sibling

    The younger node corresponds to the two-three-four tree is two nodes, so there can’t be any extra elements to borrow.

    If the sibling node has only one child node, rotation is required.

  3. merge

    If you find a “real” sibling node.

    The sibling node has no extra elements to borrow (the sibling node must be black 2 node), and the branch where the sibling node resides is also

    The quickest way to achieve black balance by losing a black node is for the sibling node to become red

    One less black node), and the subtree whose parent is root is balanced again (both sides are one less black than before

    Color). But the tree with the grandparent as root is still unbalanced and requires recursive processing.

conclusion

Above, we know that although the binary search tree can reduce the query complexity from O(n) to O(log N), the binary search tree may degenerate into a structure similar to a linked list after frequent operations, so the AVL tree appears. However, the self-balancing of THE AVL tree may require multiple node rotations. Because the insertion and deletion of 2-3-4 trees will not appear imbalance in most cases, and because it is relatively difficult to achieve in most programming languages, the operation on the tree involves a large number of special cases, generally using his equivalent red-black tree to achieve.

Reference documentation

  1. Wikipedia binary tree

  2. Breadth-first search

  3. Depth-first search

  4. Data structure visualization

  5. Binary search tree

  6. Baidu Encyclopedia balanced tree

  7. Wikipedia balance tree

  8. Wikipedia tree rotation

  9. Wikipedia AVL tree

  10. Wikipedia two-three-four tree

  11. Wikipedia B tree

  12. Wikipedia red black tree

  13. Bilibili Red black tree video

  14. jdk TreeMap