Red-black tree is a complex data structure, and many of you know it by its name, because it takes a lot of work to understand how it works. I wrote this article to better understand the source code for TreeMap in Java.

Before writing, searched the article on the net, to tell the truth, read a little meng, most come up to give you its five properties, and then is an insert, delete, rotation operation, finished, understand up quite difficult.

This article will combine two-three-four trees, step by step to introduce the origin and principle of red black tree, I believe that after reading, you will have a clearer understanding of it. Also, it is important to note that what is described here is a normal red-black tree, not its variant, the Left-leaning red-black tree (LLRB).

Introduction of red-black trees

Binary search tree

The origin of red-black trees starts with binary search trees. A binary search tree is a binary tree in which the value of each node is larger than any node of its left subtree and smaller than any node of its right subtree. It combines the flexibility of linked list insertion with the efficiency of ordered array search (binary search).

For binary tree search algorithms, the running time depends on the shape of the tree, which in turn depends on the order in which nodes are inserted. As shown in the figure above, in the best case, a tree of N nodes is perfectly balanced, with each empty link having a distance of ~lgN from the root node; In the worst case, the search path can have N nodes, degenerating into a linked list.

So, in order to guarantee the running time is always in the log level, in the dynamic construction of bintree, wants to keep its balance, which is to reduce the height of the tree, make it possible for ~ lgN, these are all looking for can end in ~ lgN times more like binary search, this tree is called a balanced binary search trees.

AVL tree

The first self-balanced binary search tree is the AVL tree, which stipulates that the height difference between the left and right subtrees of each node is no more than 1. After nodes are inserted or removed to break the balance, it is rebalanced by one or more tree rotations.

AVL trees are strictly balanced and are suitable for look-intensive applications because they cost too much in tree rotation in scenarios where nodes are frequently inserted or deleted.

Red-black trees, on the other hand, are a compromise. Instead of being perfectly balanced, they are only partially balanced, thus reducing the number of tree rotations during adjustments.

The 2-3-4 trees

When WE talk about red-black trees, we have to mention two-three-four trees, because red-black trees are kind of a special implementation of it, and knowing something about them is very helpful to understand red-black trees.

If you want to keep the balance, you want to reduce the height of the tree, and if you want to generalize the binary search tree to allow a node to hold multiple values, you can also think of it as reducing the height.

To be precise, the nodes in a standard binary search tree are called 2-a-nodes (two children for one value), and now we introduce 3-a-nodes (three children for two values) and 4-a-nodes (four children for three values) to get a 2-A-3-4 tree (also known as a 2-4 tree).

A two-three-four tree is a fourth-order B-tree, where all the data is stored in sort order, and all the leaves are at the same depth. For most programming languages, it is difficult to directly implement 2-3-4 trees, while red-black trees are relatively simple and easy to implement, which is part of the reason why red-black trees are widely used.

Red-black tree is a binary tree, and all nodes are 2-nodes. Therefore, in order to represent 3-node and 4-node, color attributes are introduced for nodes:

  • Black, ordinary node
  • Red indicates that it can be combined with the parent node as a multi-valued node

As you can see in the figure above, if you flatten the red node of a red-black tree with its parent, it has the same structure as the two-three-four tree on the left.

Red and black tree

Now, look at the properties of red-black trees:

  1. Every node is red or black
  2. The root is black ** (red will eventually turn black) **
  3. All leaf nodes are black, and leaf nodes are null nodes, usually NIL
  4. If the node is red, then all its children are black **
  5. Each path from a given node to any of its descendants NIL contains the same number of black nodes ** (converted into 2-4 trees with all leaf nodes at the bottom) **

These properties don’t have to go back, even if you remember it will definitely forget, should be combined with the two-three-four tree understanding memory.

In addition, rotations and color flips in red-black trees are the equivalent of splits and merges in two-three-four trees, and splits and merges of nodes in two-three-four trees are fairly simple to understand. The comparative analysis and understanding of the operation of red-black trees will definitely make your eyes shine.

Tree rotation

Before we dive into insert and delete, let’s understand what tree rotation is. Tree rotation is an operation of adjusting subtrees in binary trees. It is often used to adjust the local balance of the tree. It includes two ways, left rotation and right rotation.

The rotation is easy to understand: a left rotation changes the root from the smaller of the two nodes to the larger one, and a right rotation does the opposite, as shown in the figure above:

  • To the right, you take the smaller L as the root, and you adjust the subtrees of L and P
  • To rotate to the left is to take the larger P as the root, and then adjust the subtrees of P and L

In fact, the rotation of red-black trees is to ensure the one-to-one correspondence of two-three-four trees with the same structure, and to ensure the order and balance of red-black trees.

insert

Next, the insertion of nodes is analyzed by combining 2-4 trees. First, the insertion logic of 2-4 trees is as follows:

  • If it’s a 2-node, just insert it into a 3-node
  • If it’s a 3-node, just insert it to a 4-node
  • If you have a 4-node, you split it, you make it 2-node, and then you insert it

2-4 trees insert leaf nodes, and red-black trees insert red nodes, because in 2-4 trees, the nodes to be inserted are supposed to be inserted into a multi-valued node.

Here, it is assumed that the node to be inserted is N, P is the parent of N, G is the grandfather of N, and U is the uncle of N (i.e., the sibling of the parent), then the red-black tree can be inserted in the following ways:

  1. N is the root, the first node in a red-black tree
  2. The parent of N (P) is black
  3. P is red (not the root), and its sibling U is also red
  4. P is red and U is black

Case 1,2,3

These three cases are relatively simple, but taken together, they don’t involve rotation, they just involve color flipping, in other words, just merging nodes without splitting them.

Cases 1 and 2, which do not affect the properties of the red-black tree and do not break the balance, can be inserted directly:

  • For 2-4 trees, empty tree insertion is a 2-node -> 3-node -> 4-node conversion process
  • A red-black tree is a standard 2-node whose root is black

In case 3, P is red (not the root) and U is also red. The two tree inserts are as follows:

  • In a 2-4 tree, that means it’s a 4-node, it splits into 2-nodes first, and then inserts
  • For a red-black tree, it’s split and discolored: P and U turn black, G turns red, if G is the root, turn black, otherwise recursively check up to see if there’s an imbalance

Taking the input sequence [7, 5, 9, 3] as an example, the two tree construction processes are as follows:

Scenario 4

In case 4, P is red and U is black, and in this case, this node is a 3-node from the point of view of the 2-4 tree, and you insert it directly into a 4-node; For red-black trees, in order to be consistent with the 2-4 tree structure, it will rotate according to the following four possibilities:

  • P is the left child of G. If N is the left child of P, rotate the grandparent G right
  • P is the left child of G, and if N is the right child of P, then P is rotated left, and then the grandparent G is rotated right

On the contrary:

  • P is the right child of G. If N is the right child of P, rotate the grandparent G left
  • P is the right child of G, and if N is the left child of P, then P rotates to the right, and then the grandparent G rotates to the left

Take the input sequence [7, 5, 9, 3, 4] as an example, that is, on the basis of the above figure, insert 4 to demonstrate the tree rotation process where P is left and N is right:

Other cases, the left and right can be exchanged, can try to analyze. Here is the dynamic construction process of red-black trees and 2-3-4 trees provided at the beginning. The input sequence is [7, 5, 9, 3, 4, 8, 10, 11, 12, 13, 14, 15]. The first is the construction of 2-3-4 trees:

There will be a root adjustment during red-black tree construction, please note:

delete

The nodes of the binary search tree have no more than two children, one child and one leaf, among which the deletion logic of M node with two children is as follows:

  1. First, find the node X with the largest left subtree or the smallest right subtree of node M
  2. And then copy the value of X to M
  3. Finally, delete node X, which is either a leaf node or has only one child

So, delete any node of the problem can be simplified: to delete a most have only one child nodes, and all of the delete operations are done in the leaf nodes, delete nodes is no longer just want to delete the node at the beginning, but in the end the value of the node is deleted, and the change of the tree structure compared to simplify the problem, it is not important.

Before we look at the deletion of red-black trees, let’s look briefly at the deletion of two-three-four trees.

Delete the two-three-four tree

It is similar to the deletion of binary search tree. The actual deletion operation is also completed at leaf nodes, but the deletion process involves the merging of nodes. There are mainly three different situations:

  1. If the element K is an internal node, and inside a multi-valued leaf node with at least 2 keys, you simply remove K from the node
  2. If element K is an internal node with left and right children, then: 2.1 If the left child has at least 2 keys, replace K with a maximum value and then delete this maximum value. 2.2 If the right child has at least 2 keys, replace K with a minimum value and then delete this minimum value. 2.3 If both children have only 1 key, then drop K. Merge with its children to form a node with at least two keys, and finally delete K
  3. If the element K is not an internal node and only has one key, then it will eventually be either case 1 or case 2, depending on the following: 3.1 If its sibling node has at least 2 keys, select one to be pushed into the parent node, and then merge the old parent node with K. 3.2 If its sibling node also has only 1 key, then sink the parent node, merge it with its children, and then delete K, so, At this point, the parent must have at least 2 keys, if not then recurse on the parent as in case 3

In these cases, one of the preconditions is that when traversing the nodes to be deleted, you must ensure that all the nodes passing by have at least two keys, otherwise you need to merge the nodes. This is a bit difficult to understand. When inserting, it will split the 4-nodes encountered in the traversal process. In contrast, when deleting, it needs to ensure that the traversal nodes have at least 2 keys, which is equivalent to merging the previously split nodes.

The following illustration illustrates each of the above possible deletions:

In short, the key to understanding the deletion of a 2-3-4 tree is:

  • If the node to be deleted is a multi-valued node, delete it directly
  • Otherwise, an extra node is fetched from the sibling to fill the gap
  • Otherwise, a node is obtained from the parent node to fill the gap. If the parent node has no extra nodes, the problem is recursed to the parent node.

Delete red black tree

The deletion of a red-black tree is also like a binary search tree, but it’s a little bit more difficult to think about balance, which is the color of the nodes.

So first of all, I’m going to say that the leaves of the red and black tree are the same as the leaves of the binary search tree, and if I want to emphasize that the leaves of the red and black tree are NIL I’m going to say it’s going to be a black box.

Assume that the node to be deleted is M, and if there are non-leaf nodes, called C, there are two simple deletion cases:

  1. M is a red node, so it must be a leaf node, so you can just delete it, because if it has a black non-leaf node, then it violates property 5, and the black nodes are not equal if you go left or right through M
  2. M is black and C is red, so you just replace C with M and make it black, or swap C and M and delete C (the first simple case).

Note: M has and only one non-leaf left or right child node, equivalent to case 1 of the 2-3-4 tree deletion.

In both cases, you’re essentially removing a red node, so it doesn’t affect the overall balance. Taking the red-black tree constructed by the [7, 5, 9, 3, 4] input sequence as an example, the above two relatively simple cases are demonstrated:

The more complicated deletion is the case where M and C are both black, in which case M must be a leaf node and C must be NIL. Otherwise, property 5 will be violated.

If a black node is deleted, it will break the balance, and you need to find a node to fill the gap. Let’s say the node to be deleted is M, and then it becomes NIL. For the sake of description, this node is called N, P is the parent of N, S is the brother of N, and if there are left and right children of S, If SL and SR are used respectively, then deletion can occur in the following cases:

Case 1 minus N is the root

After deletion, N becomes the root node, which means there is only one node, M, which can be deleted directly.

Case 2 – P black and S red

S is red, so it must have two children, both black, and P must also be black. At this point, switch the colors of P and S, then rotate P left as follows:

Now, the parent of node N is red and the sibling is SL, and we can proceed with cases 4, 5, 6.

Case 3-P black, S black

P is black, S is black, and S has no non-empty child node. At this point, if S is directly changed to red, there will be one less black node in the path passing through S, and on the whole, there will be one less black node in the path passing through P, and the imbalance state will be transferred from node N to node P. P can be treated according to case 1 until it encounters the root node, thus forming recursion:

Case 4 – P red, S black

P is red, S is black, and S has no non-empty child node. In this case, by switching the colors of P and S, the missing black node will be filled and the state of equilibrium will be restored:

Case 5 – P any S black SL red

P any color, S is black, S’s left child is red, (S has right child is also red). At this point, rotate S right and switch the colors of S and SL:

So we’re essentially taking this case and turning it into case 6.

Case 6 minus P any S is black, SR is red

P any color, S is black, S’s right child is red, (S has left child is also red). At this point, rotate P left, swap the colors of P and S, and turn SR black:

And then the equilibrium, whatever color P was before, N has one more black parent than before. Let’s say P was red, and now it’s black; Let’s say it was black, but now P has a black parent, S, so, anyway, we’re adding a black node to the path through node N.

In each of these six cases, node N is a left child, and if it’s a right child, you just switch left and right. By analogy with the deletion of 2-3-4 trees, the key to understand the deletion of black nodes is:

  • If there are red children, one of the children is chosen to fill the deleted gap
  • Otherwise, one of the siblings is selected to fill the gap
  • Otherwise, a node is selected from the parent node to fill the gap, and the problem is recursed to the parent node for processing

Finally, let’s look at the dynamic deletion process of two-three-four trees and red-black trees. The input sequence is [7, 5, 9, 3, 4, 8, 10, 11, 12, 13, 14, 15, 16]. The deletion order is [16, 13, 11, 7, 15, 14, 8, 4, 9, 10, 5, 3, 12], first is the dynamic deletion process of 2-3-4 trees:

Red black tree dynamic deletion process:

summary

The red-black tree is more complex, the simple analysis of the properties and spinning, meaning is not big, but the 2-3-4 tree is different, its insertion and deletion easier, and the rotation of the red-black tree and discoloration and to eventually and isomorphism is consistent with the 2-3-4 trees, this paper is the combination of analysis, support each other, believe that will be relatively easy to understand.

GIF from the website: www.cs.usfca.edu/~galles/vis… It supports single step debugging, if you are interested in giving it a try.

Search the public account “Epiphany source code” for more source code analysis and build wheels.