Data structures in Java: Why red-black trees

Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

Life is too short to have a dog

preface

HashMap has added a red-black tree to HashMap 1.8. Why is a red-black tree so good?

The basic concept

1. What is a red-black tree

Red black tree is a balanced binary search tree. It has both the properties of binary balanced tree and binary search tree. (Balanced binary tree is born to solve the binary search tree may have only left or right subtree)

2. Characteristics of red-black trees

In red-black trees, each node is associated with an additional attribute: a color of red or black. At the same time, the red-black tree makes a restriction:

  1. Root properties: The root node is black

  2. Node properties: Each node is either black or red

  3. External properties: Every leaf node (NIL) is black

  4. Internal properties: the children of red nodes are black

  5. Depth property: For each node, the number of black nodes along the path from that node to all descendant leaf nodes is the same

    This leads to the lemma: a red-black tree with n nodes is at most 2log(n+1) in height.

    Below is a simple red-black tree.

how old are you

After more than 20 years of development, Java has grown to Java 13, with countless improvements and new features that are dizzyingly confusing. Like red black tree, open JDK1.8 collection source code, feel like everywhere is the shadow of red black tree, let a person can not help but sigh: how old you?

1. Red-black tree in HashMap

First, there is the most familiar HashMap. In addition to the array + list structure used in HashMap before JDK1.8, red-black trees have been added to improve the efficiency of the list query. We can see the differences in JDK1.8 by inserting:

Node<K,V>[] tab; Node<K,V> p; int n, i; if ((tab = table) == null || (n = tab.length) == 0) n = (tab = resize()).length; if ((p = tab[i = (n - 1) & hash]) == null) tab[i] = newNode(hash, key, value, null); else { Node<K,V> e; K k; // Handle hash conflicts: (3) If the node is not a red-black tree, judge whether the length of the current linked list reaches 7. If so, change the original linked list to a red-black tree and insert the node. If not reached 7, according to the list way to insert the if (p.h ash = = hash && ((k = p.k ey) = = key | | (key! = null && key.equals(k)))) e = p; else if (p instanceof TreeNode) e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); else { for (int binCount = 0; ; ++binCount) { if ((e = p.next) == null) { p.next = newNode(hash, key, value, null); if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st treeifyBin(tab, hash); break; } if (e.hash == hash && ((k = e.key) == key || (key ! = null && key.equals(k)))) break; p = e; } } if (e ! = null) { // existing mapping for key V oldValue = e.value; if (! onlyIfAbsent || oldValue == null) e.value = value; afterNodeAccess(e); return oldValue; } } ++modCount; if (++size > threshold) resize(); afterNodeInsertion(evict); return null;Copy the code

In the code above, when dealing with hash conflicts, the list is changed to a red-black tree when its length is greater than or equal to 7. This is because when the linked list length is less than 8, although the linked list has lower query efficiency (worst time complexity O(n)) than red black tree (worst time complexity O(logn)), the insert and delete operation costs are far less than the cost of red black tree to complete the same operation, overall better than red black tree. As the length of the list increases, the overhead of query, insert, and delete increases significantly. The overall efficiency of the red-black tree is higher than that of the linked list. Let’s look at how the red-black tree in HashMap is implemented:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
Copy the code

As you can see, the structure of the red-black tree is as given in the definition, with parent, left child, right child, and precursor (to be considered when deleting), and a Boolean variable indicating whether the node is red.

2. Red-black tree in TreeMap

TreeMap is another important implementation class in Map. TreeMap is described in the source code as follows:

A Red-Black tree based {@link NavigableMap} implementation. The map is sorted according to the {@linkplain Comparable natural ordering} of its keys, or by a {@link Comparator} provided at map creation time, depending on which constructor is used.
Copy the code

You can see that TreeMap is implemented based on red-black trees. At the same time, TreeMap is a naturally ordered set of keys. Of course, we can also specify the sort.

/** * The comparator used to maintain order in this tree map, or * null if it uses the natural ordering of its keys. * * @serial */private final Comparator<? super K> comparator; private transient Entry<K,V> root; /** * The number of entries in the tree */private transient int size = 0; /** * The number of structural modifications to the tree. */private transient int modCount = 0;Copy the code

The member variable comparator is used to specify the sort method. If the member variable comparator is not required, it is set to null when the constructor is used. The default is natural sorting. The member variable root is one of the nodes of the red-black tree. We can see the structure of the specific Entry<K,V> :

static final class Entry<K,V> implements Map.Entry<K,V> { K key; V value; Entry<K,V> left; Entry<K,V> right; Entry<K,V> parent; boolean color = BLACK; . }Copy the code

Let’s look at the logical structure of Entry<K, V>, which is similar to the structure in HashMap, except that the color is black by default and the precursor nodes are removed. The use of red-black trees as implementation logic in TreeMap, I think, avoids the problems of using pure binary search trees. Of course, this is where balanced binary search trees come in. There are many other places in Java where a red-black tree is used as a data structure. Here is not a list, you can look at the JDK source code if you are interested.

Why are you so exceptional

It must be acknowledged that red-black trees are excellent, but why not use other balanced binary search trees such as AVL trees to improve retrieval efficiency? The key lies in the restriction of red-black tree on node coloring. Exist because of any one from the root to the leaves of the limitations of the various nodes on the path to chromatic, red and black tree will ensure that there is no one path than other paths grow twice as much, as a result, the red-black tree is a kind of weak balanced binary tree (because it is a weak balance, as you can see, in the case of the same node, the height of the AVL tree < = the red-black tree). Compared with the height balance of AVL tree (HB (k), that is, its balance factor is 1), red black tree requires fewer rotations to maintain balance. Therefore, red-black trees are more suitable for search, insert, and delete scenarios. Although both are O(logn) in terms of time complexity for query, insert, and delete,

conclusion

After the above exploration, can not help but sigh with emotion, big shot must have its reason. Of course, as we can see from the JDK changes, each data structure has its own scope and characteristics, and we need to use the appropriate data structure for different scenarios to improve the performance of data processing.