Introduction to ordered tables
The essential difference between an ordered table and a Hash table is that the keys of a Hash table are organized by the Hash function, while the keys of an ordered table are organized sequentially.
In addition to supporting all operations on hash tables, ordered tables also provide other operations that can be implemented due to the orderality of keys. For example, find the Value of the smallest or largest Key. Given a Key, find the Value of the nearest and smaller Key……
Ordered tables have O(logN) time complexity for all operations, which is very efficient.
Many structures can implement ordered tables. For example, Red Black tree, AVL tree, SB tree (Size Balanced Tree), Skip List…… The ordered tables implemented by these data structures all have the same performance metric, O(logN). Due to the different principles of their respective implementation of ordered tables, the difference in time complexity is only a difference in constant time, and the difference in constant time is relatively small.
Therefore, as long as the master of a data structure for ordered table implementation is ok.
In competitions and general problems, SB trees are generally used to achieve ordered tables, not because SB trees are much better than other data structures to achieve ordered tables, but because SB trees are much easier to rewrite than other data structures.
Logical relationship
Different data structures implement ordered tables divided into two major series:
- Balanced search binary tree series:
- Red and black tree
- AVL tree
- SB tree
- Skip table series: skip table
Search binary trees
Balanced search binary tree, first of all, is a search binary tree, and then it involves the problem of balance.
Let’s start with a question: how do you add, delete, and change data in a search binary tree without considering balance?
1. Search process
Search binary tree has a fixed search process, add, delete, change and search all operations need to be based on the search process to complete the specific implementation.
The search process is as follows: The target Key is compared with the root Key. If the target Key is smaller than the root Key, the search continues in the left subtree of the root. If it is larger than the root’s Key, the search continues into the right subtree of the root, and the process continues.
Since AVL trees, SB trees, and red-black trees are binary search trees, they also follow the search process to add, delete, change, and search, but they will automatically adjust if their balance is broken after each operation.
2. Query data
The target Key enters the search process. If the node exists, the system returns. If not, a unified error is returned.
3. Add data
If you want to add a piece of data, you can directly find the corresponding location to add it through the search process.
If you want to add a group of data, you should arrange the data to be organized, find the corresponding position to add through the search process, and then add.
By default, a search binary tree has no duplicate keys, so will the functionality of the search binary tree be lost?
No, because it is possible to add data items as adjoint data (data compression) to each node of the search binary tree, no matter how complex the data needs to be organized. In this way, the data will not be lost and the entire search binary tree will not have duplicate nodes.
4. Update data
If the target Key can find the corresponding node in the tree through the search process, the node data will be updated. If no node can be found, create a node and add it to the corresponding position in the binary tree.
5. Delete data
Deleting data is not so easy. Why?
Because deleting data needs to be discussed in different cases:
-
The node to be deleted has neither left nor right children.
-
The node to be deleted has no left or right child, and its only child can replace it (location and environment).
-
The node to be deleted has both left and right children. Who can replace it to keep the environment stable?
- It can be replaced by the right-most node in the left subtree. If the rightmost node has a left child, let the left child inherit its original position.
- It can be replaced by the leftmost node in the right subtree. If the leftmost node has a right child, let the right child inherit its original position.
Balanced search binary tree
One problem with searching binary trees is that there is no balance, and without balance the cost of searching binary trees cannot be maintained at order (logN).
Search time complexity of the operation of the binary tree depends on the data, assuming that need organization [6] now these data into a binary search tree, then in the binary search tree to add and delete the price will be bigger than O (logN), to O (N), because the search of the binary tree is equivalent to a single table.
Thus, the question arises: how can binary tree search be balanced?
1. The balance
Balance is divided into narrow sense balance and broad sense balance.
- Narrow sense of balance: AVL trees, red black trees, SB trees and other balance trees defined by their own balance. The absolute value of the difference between the height of the left and right subtrees of any node is no more than 1.
- Generalized balance: the size of the left subtree and the size of the right subtree of any node do not differ too much. This volume can be the depth of the tree or the number of nodes.
The narrow sense balance is a concrete realization of the broad sense balance.
As long as the search for binary trees can take care of the general balance, then the operation cost can be maintained at order (logN) level, let alone narrow balance.
2. Left and right rotation
Definition:
Two actions applied to the search binary tree, the search binary tree relies on the use of left and right rotation of the two actions to achieve its own balance adjustment.
L:
A does A left-handed motion, which means A flips to the left, or A flips to the left.
The right child C node of node A replaces the original position of node A, and the left subtree of node C acts as the right subtree of node A (no matter if there is none). The left subtree of node A and the right subtree of node C remain unchanged.
Right:
A does A dextral motion, which means A flips to the right, or A flips to the right.
The left child B node of node A replaces the original position of node A, and the right subtree of node B acts as the left subtree of node A (no matter if there is no one). The right subtree of node A and the left subtree of node B remain unchanged.
Example:
In order to describe the whole process of left-right rotation, the above two cases do not reflect the obvious effect of left-right rotation on balance adjustment.
In the figure above, we take the tree after right-rotation of node A as left-rotation of node B, as shown below:
3. Balanced search binary tree
Balanced search binary trees are also known as search binary trees with self-balanced operations.
Balanced search binary tree is only a model at the core level, which only defines the basic criterion of whether a tree is a balanced search binary tree.
The criteria are as follows: first, the tree is a search binary tree; second, the search binary tree is generalized balanced and includes both left and right rotation actions. However, there are no rules on how to use the left and right rotation movements to adjust the balance, only the two movements to adjust the balance are defined.
AVL trees, red-black trees, and SB trees are all concrete implementations of balanced search binary trees, each of which has a different definition of its own balance and a different way of using left and right rotation to achieve balance. It can be said that they maintain their own definition of different balance through the specific use of left-handedness and right-handedness.
AVL tree
1. The balance
The balance of AVL trees is defined as: the absolute value of the height difference between the left and right subtrees of any node is not more than 1.
2. Four types of imbalance
There are four situations in AVL tree where the balance is broken:
-
LL: The left subtree of the left child of the current node is very deep, so the balance of the node is damaged. You can perform right-handed rotation on the current node.
-
RR: The right subtree of the child on the right of the current node is deep, so the balance of the node is damaged. Perform left rotation on the current node.
-
LR: The right subtree of the left child of the current node is deep, resulting in the balance of the node being damaged.
At this point, the general direction of our adjustment is to adjust the left child of the current node to the right child of the current node. Only in this way can we ensure the overall balance.
The specific method is: first to the left child of the current node to do a left turn, and then to the current node to do a right turn.
-
RL: The left subtree of the right child of the current node is very deep, so the balance of the node is damaged.
At this point, the general direction of our adjustment is to adjust the right child of the current node to the left child of the current node. Only in this way can we ensure the overall balance.
The specific method is: first do a right turn to the right child of the current node, and then do a left turn to the current node.
In all four cases, the adjustment costs are O(1) constant. So even if all the nodes along the path from one location to the root have to be adjusted, the total cost is order logN.
3. Balance check mechanism
Only when nodes are added or deleted, the unbalanced state may appear in balanced search binary tree. The first problem to be solved is how to establish a self-balancing mechanism, without discussing how to balance.
AVL trees have their own set of balance checks.
Overall inspection process:
- After adding to the AVL tree, the balance of each node is checked upward from the newly added node, and left – or right-handed adjustments are performed until the root node is balanced.
- After the AVL tree is deleted
- If the deleted node has no children or only one child, the balance of each node is checked upward, starting with the node that inherits the location of the original deleted node.
- If the deleted node has two children, the balance of each node is checked upward, starting with the parent of the node that inherited the location of the original deleted node.
For each node, if that node is unbalanced, then how do we determine which of the four imbalances it is?
Node check process:
Calculate the depth difference between left and right subtrees of the current node (left subtree depth – right subtree depth).
-
If the depth difference is 2, determine the type LL or LR:
-
If the left child of the current node is not empty, the left child is LL.
-
If the left child of the current node is empty, then it is LR.
-
-
If the depth difference is -2, determine the type RR or RL:
-
If the right child of the current node is not empty, it is RR.
-
If the right child of the current node is empty, it is of type RL, and RL adjustment is performed.
-
SB tree
1. The balance
The balance of SB trees is defined as: no tree has less nodes than its sibling’s subtrees.
As shown in the figure above, given that T1 and T2 subtrees have 10 nodes each, neither T3 nor T4 can have more than 22 nodes if the subtree rooted with node B does not violate the balance of SB tree. In the worst case, T3 and T4 both have 22 nodes, so the size of the left and right subtrees of the tree rooted in node A is at most double.
From the point of view of human natural intelligence, one can sense without proof that if the balance of SB tree is not destroyed, the left and right subtrees must be about the same size, and in the worst case, the left and right subtrees must be twice as big.
2. Four types of imbalance
There are also four cases in the SB tree where the balance is broken:
-
LL: The number of nodes in the left subtree of the left child of the current node is greater than the number of nodes in the right subtree of the current node. As a result, the balance of the node is damaged.
At this point, the general direction of our adjustment is: first let the left child of the current Node go up, and then let the Node whose child number changes call a recursive check operation M(Node).
The adjustment method is: rotate the current node once, then call M(the current node), and finally call M(the original left child of the current node).
-
RR: The number of nodes in the right subtree of the current node is greater than the number of nodes in the left subtree of the current node. As a result, the balance of the node is damaged.
At this point, the general direction of our adjustment is: first let the right child of the current Node go up, and then let the Node whose child Node number changes call a recursive check operation M(Node).
The adjustment method is: rotate the current node once, then call M(the current node), and finally call M(the former right child of the current node).
-
LR: The number of nodes in the right subtree of the left child of the current node is greater than the number of nodes in the right subtree of the current node, resulting in the imbalance of the node.
At this time, the general direction of our adjustment is: just like AVL tree, adjust the left child of the current Node to the right child of the current Node, and then let the Node whose child Node number changes call a recursive check operation M(Node).
In this way, the left child of the current node is left-handed, then the right child of the current node is right-handed, and finally M(the original left child of the current node), M(the current node), and M(the right child of the original left child of the current node) are called.
-
RL: The number of nodes in the left subtree of the right child of the current node is greater than that of the left subtree of the current node. As a result, the balance of the node is damaged.
At this time, the general direction of our adjustment is: just like AVL tree, adjust the right child of the current Node to the left child of the current Node, and then let the Node whose child Node number changes call a recursive check operation M(Node).
To do this, perform a right rotation for the right child of the current node, then perform a left rotation for the current node, and finally call M(the original right child of the current node), M(the current node), and M(the left child of the original right child of the current node).
The adjustment of SB tree is not as straightforward as AVL tree, but the cost is very low, both are O(logN) level, the proof is very complicated, so I will not prove.
3. Balance check mechanism
SB trees also have their own set of balance checks.
The overall check flow of SB tree is the same as that of AVL tree, but the node check flow is different from that of AVL tree.
Node check process:
- If the left child of the current node, the left child of the current node, and the right child of the current node are not empty, and the number of nodes in the left subtree of the left child of the current node is greater than the number of nodes in the right subtree of the current node, the type is LL.
- The left child of the current node, the right child of the left child of the current node, and the right child of the current node are not empty, and the number of nodes in the right subtree of the left child of the current node is greater than that of the right subtree of the current node, then the type is LR, and LR adjustment is performed.
- If the right child of the current node, the right child of the current node, and the left child of the current node are not empty, and the number of nodes in the right subtree of the current node is greater than the number of nodes in the left subtree of the current node, the type is RR.
- The right child of the current node, the left child of the right child of the current node, and the left child of the current node are not empty, and the number of nodes in the left subtree of the right child of the current node is greater than that of the left subtree of the current node, then the type is RL, and RL adjustment is performed.
Red and black tree
Rules of 1.
-
Nodes are either red or black.
-
Root nodes and leaves must be black.
The leaf node in a red-black tree is not a node with no left or right child, but a node with the lowest level of null.
-
No two red nodes can be adjacent.
-
For any subtree, the number of black nodes in each path from the root to the leaf must be the same.
What is the sum total of these four rules?
There are only two things that can happen to all the paths of a red-black tree:
- All nodes are black (short path).
- A red node alternates with a black node (long path).
Balance 2.
Because of the red-black tree configuration, there can only be two cases, one is all black, one is alternating red and black. Rule 4 requires the same number of black nodes in each path, so the worst case is that one of the left and right subtrees of the root is composed of all-black nodes, and the other is composed alternately of red-black nodes. Even so, the left and right subtrees are twice as big.
3. Balance adjustment
Balance tuning is not covered here, because it is too complex and has limited performance, and is purely an intellectual feast. As a result, red-black trees are rarely used today and will be replaced by other structures, such as SB trees.
However, it should be noted that red black trees are very similar to AVL trees and SB trees, but the criteria for checking each node are different, which are still adjusted only by left and right rotation.
Jump table
1. Node structure
public class SkipListNode<K extends Comparable<K>, V> {
// key
public K key;
// Accompanying data
public V value;
// Lower pointer linked list
publicArrayList<SkipListNode<K, V>> nextNodes; . }Copy the code
Since the ordered table is implemented by the hop table, the Key of the hop table node must be able to be compared.
2. Search process
Find the right-most node whose Key is less than or equal to the Key of the node to be operated from the highest level of the hop table, and determine whether the node to be operated can reach the current height:
- If not, jump to the next level.
- If so, the node is operated on.
In one of the search process to accelerate the process, because the finding of the current layer Key less than or equal to after waiting for the right of the Key operation node, if not up to the current search operation node height, you need to jump a layer, and after the jump to the next layer directly start from a layer to find the right node judges whether operation node to the current search height. This avoids the need for each layer to look up the right-most node of the current layer from scratch, thus skipping nodes that are definitely not right-most, saving multiple traversals.
2. Add data
There is a default node initialized with a null pointer to the child node. The Key of the default node has the minimum sense of ordering, meaning that the Key of all nodes that need to be organized is larger than the Key of the default node.
Adding process:
-
Roll Determines the height of the node to be added.
Before adding the data to be organized into the hop table, roll dice is needed to determine the height of the node composed of the data, that is, the way of roll dice is used to determine the number of Pointers to the lower nodes formed by the data. The minimum height of each node is 1, which means that each node is initialized with a pointer to the child node that points to NULL.
Roll a die will roll a 0 50 percent of the time and a 1 50 percent of the time. If you roll to 1, the height of the node increases by 1, and you can continue to roll until you end up at 0.
-
Compare the height of the node to be added with that of the default node. If the default node height is small, the default node height needs to be expanded.
Only the default node can expand height, a normal node does not have to expand height, when a normal node after the roll die, its height is determined.
The height of the default node is always the same as that of the highest common node in the hop table
-
Find the node insertion position by Key sort.
-
Enter the search process:
- If the current search height is reached, the child pointer of the rightmost node in this layer points to the node to be added, and the child pointer of the node to be added in this layer points to the leftmost node in this layer whose Key is greater than the Key of the node to be added.
- If the current search height is not reached, no action is performed. The node jumps to the next layer of the current layer and continues to search for the rightmost node whose Key is less than or equal to the Key of the node to be added.
Example:
Suppose you want to organize data like: [3,5,10,20] and roll the data to heights like: 2,3,1,4.
The diagram below:
3. Query data
Enter the search process:
- If the current search height is not reached, jump to the next layer. If the search height is not reached, the query fails.
- If the current search height is reached and the current layer is found, return directly.
4. Delete data
Enter the search process:
- If the current search height is not reached, jump to the next layer, if the last layer is still not found, the deletion fails.
- If the current search height is reached, the lower pointer of the rightmost node whose Key is less than or equal to that of the node to be deleted points to the leftmost node whose Key is greater than that of the node to be deleted.
5. Time complexity
So we know that there’s a 50% probability of rolling out 1 and a 50% probability of rolling out 0. The probability of rolling 1 x times in a row is 1/2 to the x.
If you need to organize N data, there are N nodes on layer 0, almost N / 2 nodes on layer 1, and almost N / 4 nodes on layer 2……
If N is large enough, then the number of indexes at level 0 up to the top of the table is about the same as the node size of a complete binary tree, so all operations on the table are O(logN) level.
6. Advanced thinking
The reason why skip tables are particularly advanced is that instead of using hard rules to design structures of O(logN) complexity, they rely on probability to do O(logN) complexity.
The order of the data is arbitrary, but the height of each data node is completely random, and the efficiency sum of the hop table is related to the distribution of nodes at the bottom. Thus, a skip table takes advantage of random probability to completely decouple data order from operation costs.
This is why a hoptable is intellectually much more advanced than some old balanced search binary tree.