The previous two articles covered the source code and theory of hashMap, but today we’ll cover the rest of red-black trees. A good understanding of red-black trees will help us understand hashMaps and other data structures. For example, why did the JDK update the single-linked list in hashMap to a red-black tree?
To understand red-black trees, you need to understand some basic concepts of ordinary binary trees
Parent node and child node, which I’m not going to talk about. I think everyone knows.
If the parent of several child nodes is the same node, they are siblings.
If a node has no parent, it is the root node.
If a node has no children, it is a leaf node
Look at the following binary tree:
Height of node A: is the number of edges from node A to the furthest leaf node. Let’s say the height of node A is 3
The depth of node E: is the number of edges from root node A to node E, which is obviously 2, so the depth of node E is 2
The number of layers of node E: is the depth of node E +1. So here the number of layers of node E is 3.
Look at this picture again:
Figure 1 is called a full binary tree:
Condition 1: All leaf nodes are at the bottom
Condition 2: Each node except the leaf node has left and right child nodes.
Full binary trees are easy to understand, right? I’m not going to say more. Moving on to figure 2,
Figure 2 is called a complete binary tree:
Condition 1: Leaf nodes can only be in the bottom 2 layers.
Condition 2: The leaves of the last layer are arranged to the left.
Condition 3: Except for the last layer, the number of child nodes of other layers must reach the maximum. If the last layer is removed, it must be a full binary tree.
Well, it looks like a complete binary tree is a little bit more complicated, but it doesn’t matter, let’s see a couple of pictures to practice
Look at a few more pictures:
Figure 1: Not a complete binary tree, condition 1 does not meet. Because the second layer has a leaf node
Figure 2: Condition 2 is not met. The last layer of leaf nodes in Figure 2 all move to the right
Figure 3: Condition 3 is not met because it is not a full binary tree when we remove the last layer.
Why do we have a complete binary tree? What does it do? Why does condition 2 require the last layer of leaves to be left? Can’t you go to the right?
First, we need to clarify the concept that binary trees can be implemented using arrays in addition to linked lists.
In fact, the data structure is a data field, and then matched with a left pointer and a right pointer.
So if you do it in arrays, how do you store a binary tree? Take a look at the chart below:
It’s easy to understand. The number is the index of the array. For example, node A is placed at [1], node F at [6], node I at [9], and so on
So after mathematical induction, we can abstract the following rule:
If a binary tree is stored in an array, the following table rules for the nodes:
Rule 1: Subscript 2* I must be the left child node.
Rule 2: Subscript 2* I +1 must be the right child node.
Rule 3: a node with subscript I /2 must be the parent of node I.
Therefore, as long as we put the root node of a binary tree into any position of the array, the whole binary tree can be connected in series using the above rules, and the whole is the array storage method of binary tree.
The advantage of using an array to store binary trees is that you can save the left and right Pointers, which can save memory space
Here’s another picture:
If you look at the binary tree, our array is 1,2,3,4,7,8,9,13.
We wasted 5, 6, 10, 11, 12. We wasted these 5 array Spaces sitting there.
Therefore, the advantage of complete binary tree is that it does not waste any array space, so for complete binary tree, array storage is the most appropriate method, for example, the heap is to use array to do the binary tree storage structure
Sequential traversal: Print the node itself first, then the left subtree, then the right subtree
Middle order traversal: left subtree – node itself – right subtree
Back-order traversal: left subtree – right subtree – node itself
Then these three traversal methods I will not say more, a lot of online search.
With these concepts in mind, we can dig deeper.
What is binary search tree?
Binary search tree: for any node, the value of the left node is less than this node, and the value of the right node is greater than this node.
So, let’s do its lookup, insert operation on a binary search tree. Why do I want to learn this, because for most data structures, it’s a measure of what’s going on
Performance is all about how efficiently you can find and insert data. Arrays and linked lists, for example, are data structures with opposite efficiency of lookup and insertion.
Without further ado, let’s first look at the binary search tree search operation to complete:
In fact, it’s very simple. For a binary search tree, the value of the left node is parent node and the value of the right node.
So to find a value, we simply traverse the root node at the top of the binary search tree:
If the value is larger than the node, the recursion continues in the right subtree, if the value is smaller than the node, the recursion continues in the left subtree. Until I find it.
I’m not going to write this recursive function, but if you know what the algorithm is, you can practice it a little bit.
Binary search tree inserts are a little more complicated:
1. If the value to be inserted is greater than the value of the node, and the right subtree of the node is empty, place it directly on the right node of the node.
2. If it is not empty, continue with step 1 until we find the corresponding node position.
- If the value inserted is smaller than the value of the node, and the left subtree of the node is empty, it is placed directly on the left node of the node. And so on. This is actually the same idea as steps 1-2
It can be seen that recursion is the core idea of binary tree correlation algorithm.
In addition, the most important point, we review the characteristics of the binary search tree, you will find that: if we use the middle-order traversal method print the binary search tree, then the result must be an ordered. Time is order n.
So, binary search tree is also called binary sort tree.
What if there is duplicate data to insert in the binary search tree?
That’s a good question, and here are two solutions that you might be interested in implementing
Plan 1:
Normally, if we have a binary tree implemented as a linked list, then the underlying data structure must be the data field and the left and right Pointers, right? So
We can artificially add a field called the more pointer. What does this pointer do? This pointer is a string of repeated data.
Take the following example:
Isn’t this a bit like the way hash collisions are handled in hashMap?
So, again, what’s the point of inserting duplicate values into a binary search tree?
In actual production, the data structure we store cannot be a single int value, but a single object, so there must be many fields in each object, for example, the object of commodity can have many fields, such as price, name of commodity, picture of commodity, etc. If the price of the item is used as the key value of the binary search tree, then this situation will definitely occur, because the same price of the item can be different goods, such as iPhone and Mate20 are both sold for 7000 yuan, but they are obviously different things
Scheme 2:
When inserting data, if it is found that the value inserted is the same as the value of the node, the value is inserted to the right of the node. (That is, inserting a value greater than a node is the same as inserting a value equal to a node.)
So for this kind of situation, we find the method of is about to change, not find the same value and node withdrew from the recursion, but to find the value of the same node Continue to find his right subtree, find the value of the same and see if there are any values, until a leaf node (note that exit the conditions of the recursive
How efficient is binary search tree? It doesn’t look like a hash table. What’s the point?
In fact, if we look at the binary search tree algorithm, we can get a result that is O(logn) efficient, not nearly as efficient as O(1) efficient. In addition, binary search trees are easy to degenerate into single linked lists in extreme cases, so we all know that the search efficiency of single linked lists is O(N). This one is even slower.
But, even so, binary search trees have some advantages that hash tables don’t:
-
Binary search trees use middle-order traversal to easily sort data in a way that hash tables cannot.
-
The hash table needs to be expanded when it runs into hash conflicts. This expansion operation is time-consuming and the performance is sometimes unstable. Although the performance of binary search trees is also unstable, we have a more special balanced binary search tree that can stabilize the time complexity at O(logn).
-
Binary search trees are pretty simple, the data structure is pretty straightforward, recursively recursively recursively, but hash tables, you know, are super complicated to implement.
So you can see that hashMap and binary search tree have their own advantages. How to use them depends on the actual situation, but you should know what are the advantages and disadvantages of these two data structures and why?
What is this red-black tree that has been called out for so long? How does it relate to the balanced binary search tree we mentioned above?
As we mentioned earlier, ordinary binary search trees degenerate into single linked lists in extreme cases, for example:
If you look at this binary search tree, it’s very unlucky, it’s all on the left, it’s not balanced at all, it looks like a single linked list, it looks like a very slow data structure.
So the so-called balanced binary search tree is to find a way to ensure that in the process of insertion, so that the binary search tree becomes balanced, the so-called balanced is not all on the left or right, the best is to have both sides, and the left and right side of the subtree height is closer to the better the balance. Because only then will our time be stable at order logn.
There are many ways to achieve balanced binary search tree, one of the most famous is the red black tree, for our ordinary writing business code is not professional writing algorithm for the red black tree can almost represent balanced binary search tree, the two are infinitely close.
But we notice that the red-black tree is approximate balance balanced binary search tree, not strict balanced binary tree, interested students can check AVL tree This is strict balanced binary tree, but because this stuff too strict for each insert remove efficiency is too low, so the production environment with few people.
I will not say more about the specific implementation of red black tree, because the implementation of red black tree is more complex, I estimate that unless you are writing algorithms, otherwise most people will never write a red black tree, if you really need to use, in fact, jump table writing is much easier than red black tree. Easy to understand.
So interested students here can search for information about red black trees, open their eyes. For those of you who are not interested, you just need to know that red-black tree is a balanced binary search tree. It is designed to solve the problem that ordinary binary search tree tends to degenerate into a single linked list during data insertion, resulting in a significant decrease in efficiency. His height would be infinitely close to log base 2n, so his search efficiency would be stable at log base n. This is why the implementation of HashMap in older JDKS uses red-black trees instead of single-linked lists to resolve hash collisions.