Chapter 13: Binary Search Tree
Translator: Flying Dragon
License: CC BY-NC-SA 4.0
Proudly using Google Translate
This chapter presents solutions to the previous exercise and then tests the performance of the tree map implementation. I present a problem with this implementation and explain how Java's TreeMap solves it.
13.1 A simple MyTreeMap
In the last exercise, I gave you an outline of MyTreeMap and asked you to fill in the missing methods. Now I’ll show the results, starting with findNode:
private Node findNode(Object target) {
    // some implementations can handle null as a key, but not this one
    if (target == null) {
        throw new IllegalArgumentException();
    }

    // something to make the compiler happy
    @SuppressWarnings("unchecked")
    Comparable<? super K> k = (Comparable<? super K>) target;

    // the actual search
    Node node = root;
    while (node != null) {
        int cmp = k.compareTo(node.key);
        if (cmp < 0)
            node = node.left;
        else if (cmp > 0)
            node = node.right;
        else
            return node;
    }
    return null;
}
findNode is a private method used by containsKey and get; it is not part of the Map interface. The target parameter is the key we are looking for. I explained the first part of this method in the previous exercise:
- In this implementation, null is not a valid key.
- Before we can call compareTo on target, we have to cast it to some kind of Comparable. The "type wildcard" used here is as permissive as possible; that is, it works with any type that implements Comparable and whose compareTo method accepts K or any superclass of K (see the sketch after this list).
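To make the wildcard concrete, here is a hypothetical pair of classes (the names Base and Derived are made up for this illustration and are not part of MyTreeMap):

// Hypothetical classes used only to illustrate the type wildcard.
class Base implements Comparable<Base> {
    int rank;
    Base(int rank) { this.rank = rank; }
    public int compareTo(Base other) {
        return Integer.compare(this.rank, other.rank);
    }
}

class Derived extends Base {
    Derived(int rank) { super(rank); }
}

With K = Derived, the cast to Comparable<? super K> still works: Derived inherits a compareTo that accepts Base, which is a superclass of Derived. A stricter type like Comparable<K> would not, because Derived does not implement Comparable<Derived> directly.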
After that, the actual search is relatively simple. We initialize the loop variable node to refer to the root node. On each iteration, we compare the target to node.key. If the target is less than the current key, we move to the left subtree. If it is greater, we move to the right subtree. If they are equal, we return the current node.
If we reach the bottom of the tree without finding the target, we conclude that it is not in the tree and return null.
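For reference, containsKey and get can be thin wrappers around findNode; this is a sketch consistent with the outline from the previous exercise, not necessarily identical to it:

public boolean containsKey(Object target) {
    // a key is present if findNode locates a node for it
    return findNode(target) != null;
}

public V get(Object key) {
    Node node = findNode(key);
    if (node == null) {
        return null;
    }
    return node.value;
}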
13.2 Searching for values
As I explained in the previous exercise, the runtime of findNode is proportional to the height of the tree, not the number of nodes, because we don't have to search the whole tree. But containsValue has to search by value, not by key; the BST property does not apply to values, so we have to search the whole tree.
My solution is recursive:
public boolean containsValue(Object target) {
    return containsValueHelper(root, target);
}

private boolean containsValueHelper(Node node, Object target) {
    if (node == null) {
        return false;
    }
    if (equals(target, node.value)) {
        return true;
    }
    if (containsValueHelper(node.left, target)) {
        return true;
    }
    if (containsValueHelper(node.right, target)) {
        return true;
    }
    return false;
}
containsValue takes the target value as a parameter and immediately invokes containsValueHelper, passing the root of the tree as an additional parameter.
Here’s how containsValueHelper works:
- The first if statement checks the base case of the recursion. If node is null, that means we have recursed to the bottom of the tree without finding the target, so we should return false. Note that this only means the target did not appear on one path through the tree; it might still be found along another.
- The second case checks whether we have found what we are looking for. If so, we return true. Otherwise, we have to keep going.
- The third case makes a recursive call to search for target in the left subtree. If we find it, we can return true immediately, without searching the right subtree. Otherwise, we keep going.
- The fourth case searches the right subtree. Again, if we find what we are looking for, we return true. Otherwise, having searched the whole tree, we return false.
The method “visits” every node in the tree, so the time it takes is proportional to the number of nodes.
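The call equals(target, node.value) refers to a private two-argument helper, not Object.equals; its definition is not shown in this chapter. A minimal sketch, assuming its only job is to compare values while tolerating null:

private boolean equals(Object target, Object obj) {
    // avoid NullPointerException when the target or the stored value is null
    if (target == null) {
        return obj == null;
    }
    return target.equals(obj);
}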
13.3 Implementing put
The put method is a little more complicated than get because it has to handle two cases: (1) if the given key is already in the tree, it replaces the old value and returns it; (2) otherwise it has to add a new node to the tree, in the right place.
In the last exercise, I provided this starting code:
public V put(K key, V value) {
    if (key == null) {
        throw new IllegalArgumentException();
    }
    if (root == null) {
        root = new Node(key, value);
        size++;
        return null;
    }
    return putHelper(root, key, value);
}
and asked you to fill in putHelper. Here is my solution:
private V putHelper(Node node, K key, V value) {
    Comparable<? super K> k = (Comparable<? super K>) key;
    int cmp = k.compareTo(node.key);

    if (cmp < 0) {
        if (node.left == null) {
            node.left = new Node(key, value);
            size++;
            return null;
        } else {
            return putHelper(node.left, key, value);
        }
    }
    if (cmp > 0) {
        if (node.right == null) {
            node.right = new Node(key, value);
            size++;
            return null;
        } else {
            return putHelper(node.right, key, value);
        }
    }
    V oldValue = node.value;
    node.value = value;
    return oldValue;
}
The first parameter, node, is initially the root of the tree, but each time we make a recursive call, it refers to a different subtree. As in get, we use the compareTo method to figure out which path through the tree to follow. If cmp < 0, the key we are adding is less than node.key, so we go to the left subtree. There are two cases:
- If the left subtree is empty, that is, if node.left is null, we have reached the bottom of the tree without finding key. At this point, we know that key isn't in the tree, and we know where it should go. So we create a new node and add it as the left child of node.
- Otherwise we make a recursive call to search the left subtree.
If cmp > 0, the key we are adding is greater than node.key, so we go to the right subtree and handle the same two cases as in the previous branch. Finally, if cmp == 0, we found the key in the tree, so we replace the value and return the old one.
I wrote this method recursively to make it more readable, but it would be straightforward to rewrite it iteratively, which you might want to do as an exercise; one possible version is sketched below.
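Here is an iterative sketch, assuming the same Node fields (key, value, left, right) and the size variable used above; it is one possible rewrite, not the only one:

private V putHelper(Node node, K key, V value) {
    @SuppressWarnings("unchecked")
    Comparable<? super K> k = (Comparable<? super K>) key;

    while (true) {
        int cmp = k.compareTo(node.key);
        if (cmp < 0) {
            if (node.left == null) {
                // bottom of the tree: attach the new node on the left
                node.left = new Node(key, value);
                size++;
                return null;
            }
            node = node.left;
        } else if (cmp > 0) {
            if (node.right == null) {
                // bottom of the tree: attach the new node on the right
                node.right = new Node(key, value);
                size++;
                return null;
            }
            node = node.right;
        } else {
            // key already present: replace the value and return the old one
            V oldValue = node.value;
            node.value = value;
            return oldValue;
        }
    }
}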
13.4 In-order traversal
The last method I asked you to write is keySet, which returns a Set containing the keys from the tree in ascending order. In other Map implementations, the keys returned by keySet are in no particular order, but one of the attractive features of the tree implementation is that it is simple and efficient to sort the keys. So we should take advantage of that.
Here’s my answer:
public Set<K> keySet() {
    Set<K> set = new LinkedHashSet<K>();
    addInOrder(root, set);
    return set;
}

private void addInOrder(Node node, Set<K> set) {
    if (node == null) return;
    addInOrder(node.left, set);
    set.add(node.key);
    addInOrder(node.right, set);
}
In keySet, we create a LinkedHashSet, which is a Set implementation that keeps the elements in insertion order (unlike most other Set implementations). Then we call addInOrder to traverse the tree.
The first parameter, node, is initially the root of the tree, but as you should expect by now, we use it to traverse the tree recursively. addInOrder performs a classic "in-order traversal" of the tree.
If node is null, that means the subtree is empty, so we return without adding anything to the set. Otherwise we:
- Traverse the left subtree in order.
- Add node.key.
- Traverse the right subtree in order.
Remember that the BST property guarantees that all nodes in the left subtree are less than node.key and all nodes in the right subtree are greater. So we know that node.key is added in the correct order.
Applying the same argument recursively, we know that the elements of the left subtree are in order, and so are the elements of the right subtree. And the base case is correct: if the subtree is empty, no keys are added. So we can conclude that this method adds all the keys in the right order.
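Here is a small usage example; the key strings are made up for illustration, but because keySet does an in-order traversal, the keys come back in ascending order:

MyTreeMap<String, Integer> map = new MyTreeMap<String, Integer>();
map.put("banana", 1);
map.put("apple", 2);
map.put("cherry", 3);
System.out.println(map.keySet());  // prints [apple, banana, cherry]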
Like containsValue, this method visits every node in the tree, so the time it takes is proportional to n.
13.5 Logarithmic-time methods
In MyTreeMap, the get and put methods take time proportional to the height of the tree, h. In the previous exercise, we showed that if the tree is full, that is, if every level of the tree contains the maximum number of nodes, the height of the tree is proportional to log n.
I also claimed that get and put are logarithmic time; in other words, their runtime is proportional to log n. But for most applications, there is no guarantee that the tree is full. In general, the shape of the tree depends on the keys and the order in which they are added.
To see how this works in practice, we’ll test our implementation with two sample data sets: a list of random strings and an ascending list of timestamps.
Here is the code to generate a random string:
Map<String, Integer> map = new MyTreeMap<String, Integer>();
for (int i=0; i<n; i++) {
    String uuid = UUID.randomUUID().toString();
    map.put(uuid, 0);
}
UUID is a class in the java.util package that can generate a random "universally unique identifier". UUIDs are useful for a variety of applications, but in this example we're just using them as an easy way to generate random strings.
I ran the code with n=16384 and measured the running time and height of the final tree. Here is the output:
Time in milliseconds = 151
Final size of MyTreeMap = 16384
log base 2 of size of MyTreeMap = 14.0
Final height of MyTreeMap = 33
I included "log base 2 of size of MyTreeMap" to see how tall the tree would be if it were full. The result indicates that a full tree with height 14 would contain 16,384 nodes.
The actual height of the tree of random strings is 33, which is substantially more than the theoretical minimum, but not too bad. To find any of the 16,384 keys, we only have to make 33 comparisons. Compared to a linear search, that is almost 500 times faster.
This performance is typical when the keys are random strings, or any other keys that are added in no particular order. The final height of the tree might be 2 or 3 times the theoretical minimum, but it is still proportional to log n, which is much less than n. In fact, log n grows so slowly as n increases that it can be hard to distinguish logarithmic time from constant time in practice.
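The chapter doesn't show how the tree height is measured; here is a minimal sketch of a height method (the name and the convention of counting nodes along the longest path are my own choices, so its results might differ by one from the output above):

public int height() {
    return heightHelper(root);
}

private int heightHelper(Node node) {
    if (node == null) {
        return 0;
    }
    // a node's height is one more than the height of its taller subtree
    int left = heightHelper(node.left);
    int right = heightHelper(node.right);
    return Math.max(left, right) + 1;
}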
However, binary search trees don't always behave so well. Let's see what happens when we add keys in increasing order. Here is an example that measures timestamps in nanoseconds and uses them as keys:
MyTreeMap<String, Integer> map = new MyTreeMap<String, Integer>();
for (int i=0; i<n; i++) {
    String timestamp = Long.toString(System.nanoTime());
    map.put(timestamp, 0);
}
System.nanoTime returns an integer of type long that indicates elapsed time in nanoseconds. Each time we call it, we get a somewhat bigger number. When we convert these timestamps to strings, they appear in increasing alphabetical order.
Let’s see what happens when we run it:
Time in milliseconds = 1158
Final size of MyTreeMap = 16384
log base 2 of size of MyTreeMap = 14.0
Final height of MyTreeMap = 16384
The runtime is more than seven times longer. If you wonder why, take a look at the final height of the tree: 16,384!
Figure 13.1: Binary search tree, balanced (left) and unbalanced (right)
If you think about how put works, you can figure out what is going on. Every time we add a new key, it is greater than all the keys already in the tree, so we always go down the right subtree, and always add the new node as the right child of the rightmost node. The result is an "unbalanced" tree that contains only right children.
The height of this tree is proportional to n, not logn, so the performance of get and put is linear, not logarithmic.
Figure 13.1 shows an example of a balanced tree and an unbalanced tree. In the balanced tree, the height is 4 and the total number of nodes is 2^4 - 1 = 15. In the unbalanced tree with the same number of nodes, the height is 15.
13.6 Self-balancing trees
There are two possible solutions to this problem:
- Avoid adding keys to the Map in order. But this is not always possible.
- Make a tree that does a better job of handling keys if they happen to arrive in order.
The second solution is better, and there are several ways to do it. The most common is to augment put so that it detects when the tree is starting to become unbalanced and, if so, rearranges the nodes. Trees with this capability are called "self-balancing trees". Common self-balancing trees include the AVL tree ("AVL" are the initials of the inventors) and the red-black tree, which is what Java's TreeMap uses.
In our example code, if we replace MyTreeMap with Java's TreeMap, the runtimes are about the same for the random strings and the timestamps. In fact, the timestamps run a little faster, even though they are in order, probably because the timestamp keys are shorter strings and take less time to generate and compare.
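For example, only the declaration in the earlier timestamp test needs to change; this sketch assumes java.util.Map and java.util.TreeMap are imported:

Map<String, Integer> map = new TreeMap<String, Integer>();  // Java's TreeMap, a red-black tree
for (int i=0; i<n; i++) {
    String timestamp = Long.toString(System.nanoTime());
    map.put(timestamp, 0);
}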
In summary, binary search trees can implement get and put in logarithmic time, but only if the keys are added in an order that keeps the tree sufficiently balanced. Self-balancing trees avoid this problem by doing some extra work each time a new key is added.
You can read more about self-balancing trees at thinkdast.com/balancing.
13.7 One more exercise
In the previous exercise, you didn't have to implement remove, but you might want to try it. If you remove a node from the middle of the tree, you have to rearrange the remaining nodes to restore the BST property. You can probably figure out how to do that on your own, or you can read the instructions at thinkdast.com/bstdel.
Removing a node and rebalancing a tree are similar operations: if you do this exercise, you will have a better idea of how self-balancing trees work.