preface

For more information on HashMap:

An in-depth look at HashMap in one article will suffice

Self-balancing Binary Search Tree

The running time of binary search tree operation is closely related to the height of the tree. The height of a tree is the length of the longest path that can be reached from the root of the tree. First, set the height of the leaf node to 0, calculate the height of the parent node along the path of the tree, and so on until the height of all nodes in the tree is marked, then the height of the root node is the height of the tree.

Eg:

The figure below shows several BST with calculated height

If the number of nodes in the tree is N, then the height of a BST tree satisfying O(log2n) progressive running time should be close to the largest integer less than log2n.

Of the three trees in the figure above, tree B has the best ratio of height to number of nodes. Tree B has height of 3 and n (number of nodes) is 8, so log28 = 3 is exactly equal to the height of the tree.

The n of tree A is 10, the height is 4, log210 = 3.3219, the largest integer less than 3.3219 is 3, so the ideal height of tree (a) should be 3. We can reduce the height of the tree by moving the farthest node to a non-leaf node in the middle. The ratio of the height of the tree to the number of nodes is optimized.

Tree C has the worst ratio. It has 5 nodes, so log25 = 2.3219, and the ideal height is 2, but it’s actually 4

So our problem is how to ensure that the topology of a BST always maintains the optimal ratio of the height of the tree to the number of nodes. We need to keep the BST balanced after new nodes are inserted, and the BST that is always balanced is a balanced binary search tree.

There are many BST with this self-balancing function, eg:

  • AVL tree
  • Red and black tree
  • 2-3 tree
  • The 2-3-4 trees
  • Stretch the tree
  • B tree
  • … …

This paper mainly introduces two kinds: red black tree and AVL tree

AVL

In 1962, Russian mathematicians G. M. Andel ‘son-vel-skii and E. M. Landis invented the first self-balanced binary search tree, called the AVL tree. The AVL tree must maintain the following equilibrium conditions for each node N:

  • The difference between the height of the left subtree of node N and the height of the right subtree is at most one.

The following figure shows some BST, where (a) and (b) are legal AVL trees and © (d) are not legal because not all nodes in the tree meet the AVL equilibrium property requirements. (The picture is from the Internet



When creating an AVL tree, the focus is on how to ensure the balance of the tree on the basis of BST, no matter how to add/remove nodes to the tree, always maintain the balance of the tree. AVL trees are balanced by “rotation” operations.

When adding a node to the AVL tree, there are two steps:

  • First, the operation of inserting a new node is the same as that of inserting a node in a BST tree. The new node is placed in the appropriate position to meet the BST nature requirements
  • Based on the nature of BST, AVL may be violated at this time. The height of left and right subtrees of each node is checked by traversing the access path. If the height difference between the left and right subtrees of a node is greater than 1, rotation operation is needed to deal with it.

Red black Tree (R-B Tree)

In 1972, Rudolf Bayer, a computer scientist at the Technical University of Munich, created the red-black Tree data structure. In addition to containing data and left and right child nodes, nodes in a red-black tree contain one particular piece of information – color. This color contains only two colors, red and black. Also, red-black trees add a special type of node called NIL node. NIL nodes will appear as pseudo-leaf nodes in red-black trees. That is, all nodes with critical data are called inner nodes, and all other outer nodes point to NIL nodes.

Red-black trees have the following properties:

  • Nodes are red/black
  • The root node must be black
  • Each leaf node (NIL node, empty node) is black
  • Two children of each red node are black (there cannot be two consecutive red nodes on all paths from each leaf to the root)
  • All paths from any node to each of its leaves contain the same number of black nodes

If a new node is inserted into a red and black number, the following operations are performed:

  • The BST insertion algorithm is used to insert nodes to meet the nature requirements of BST
  • By recoloring the elements and rotating the Tree structure to satisfy the properties of r-B Tree

Similarities and differences between AVL and red black trees

We can clear contrast, AVL tree and red and black tree insert node operation have similarities and differences, the same is the need to meet the requirements of the properties of the BST, again to do adjustment according to the property of their respective, AVL by rotation operation to meet their own properties, and the red-black tree to coloring and rotate the two operations to meet the requirements of their own nature.

Let’s compare the performance of AVL and R-B Tree:

The query performance of a red-black tree is slightly inferior to that of an AVL tree because it is slightly more unbalanced than an AVL tree (at most one level), meaning that the query performance of a red-black tree is only compared to an AVL tree with the same content at most one time. However, the performance of red-black trees is better than that of AVL trees in insertion and deletion operations. AVL trees perform a lot of balance calculation for each insertion/deletion operation, and the overhead of coloring and rotation for red-black trees to maintain the properties of red-black trees is much less than that of AVL to maintain strict balance. This is why Hashmap is more widely used than AVL.

So, if you have a program where the number of queries is much greater than insert/delete, go with AVL, and if the number of queries, insert/delete is similar, go with a red-black tree.

Application scenarios

Application scenarios of AVL tree:

  • Windows process address space management

Application scenarios of red-black trees:

  • Implementation of epoll in kernel with red-black tree to manage event blocks (file descriptors)
  • TreeMap implementation of Java
  • Nginx uses a red-black tree to manage timers
  • The Linux process Scheduler Completely Fair Scheduler uses a red-black tree to manage the process control block

summary

This article is some of the author’s insights, if you are interested in Java collections follow this column.