From Tian Xiaobo’s blog

1. Overview

In this article we will take a look at a collection class that is used constantly in everyday development: HashMap. HashMap first appeared in JDK 1.2, and its underlying implementation is based on hashing. A HashMap allows null keys and null values; when the key is null, its hash is simply taken to be 0. A HashMap does not guarantee the order of its key-value pairs, which means that the order can change after certain operations. It is also important to note that HashMap is not thread-safe, which can be problematic in a multithreaded environment.

In this article, I'll look at the commonly used methods of HashMap, its important fields, and the methods related to them. Keep in mind that there are many points worth analyzing in the HashMap source code; it is impossible to cover them all in one article, so please bear with me.

2. Principle

The previous section mentioned that HashMap is implemented on top of a hash table. Hash tables generally resolve collisions in one of two ways: open addressing (re-probing) or separate chaining. HashMap uses separate chaining, and JDK 1.8 added a red-black tree optimization for long linked lists. The data structure looks like this:

With separate chaining, the data structure is composed of an array and linked lists (or trees). When performing operations such as insertion or deletion, we first locate the bucket in which the element resides and then locate the element within that bucket's list. For example, to check whether the structure above contains the element 35, the steps are as follows:

  1. Locate the bucket containing element 35: index = 35 % 16 = 3
  2. Search the list pointed to by bucket 3 and find 35 in that list (a minimal code sketch of this lookup follows)
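
As a quick illustration of those two steps, here is a toy sketch in Java. It is my own simplified model (plain int elements and hand-built chain nodes), not the actual HashMap code; the capacity of 16 and the element 35 come from the example above.

// Toy model: an array of buckets, each holding a singly linked chain of int values.
class ChainNode {
    int value;
    ChainNode next;
    ChainNode(int value, ChainNode next) { this.value = value; this.next = next; }
}

public class LookupSketch {
    static boolean contains(ChainNode[] buckets, int element) {
        // Step 1: locate the bucket, e.g. 35 % 16 = 3 (HashMap uses (n - 1) & hash instead)
        int index = element % buckets.length;
        // Step 2: walk the chain in that bucket until the element is found
        for (ChainNode node = buckets[index]; node != null; node = node.next) {
            if (node.value == element) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ChainNode[] buckets = new ChainNode[16];
        buckets[3] = new ChainNode(19, new ChainNode(3, new ChainNode(35, null)));
        System.out.println(contains(buckets, 35));  // true
    }
}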

This is how the underlying data structure of HashMap works. The basic operations of HashMap are wrappers around the basic operations of separate chaining. The difference is that JDK 1.8 introduced the red-black tree, so the underlying data structure changed from array + linked list to array + linked list + red-black tree. That covers the principles; now on to the source code.

3. Source code analysis

The source version analyzed in this article is JDK 1.8. Compared with JDK 1.7, JDK 1.8 optimizes HashMap in several ways: red-black trees were introduced to address the inefficiency of long linked lists, the resize method was rewritten, the alternative-hashing-related methods were removed, recomputation of key hashes is avoided, and so on. This article does not analyze each of these optimizations individually; instead it focuses on the commonly used methods of HashMap plus some important fields and their related methods. If you are interested in red-black trees, you can read my other article, a detailed analysis of red-black trees.

3.1 Construction method

3.1.1 Analysis of construction methods

HashMap has only four constructors, and they do relatively little work: they initialize important fields such as loadFactor and threshold, while the underlying data structure is lazily initialized, deferred until key-value pairs are first inserted. The HashMap constructors are as follows:

/** Constructor 1 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
}

/** Constructor 2 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/** Constructor 3 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

/** Constructor 4 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

Of the four constructors above, the first is probably the one used most often. It is very simple and merely sets loadFactor to its default value. Constructor 2 delegates to constructor 3, and constructor 3 still only sets a few fields. Constructor 4 copies the entries of another Map into its own storage and is not used very often.
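
For reference, this is how the four constructors are typically invoked; the capacities and the load factor below are arbitrary example values, not recommendations.

import java.util.HashMap;
import java.util.Map;

public class ConstructorDemo {
    public static void main(String[] args) {
        Map<String, Integer> m1 = new HashMap<>();          // constructor 1: all defaults
        Map<String, Integer> m2 = new HashMap<>(32);        // constructor 2: custom initial capacity
        Map<String, Integer> m3 = new HashMap<>(32, 0.5f);  // constructor 3: custom capacity and load factor
        Map<String, Integer> m4 = new HashMap<>(m3);        // constructor 4: copy another map
    }
}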

That is a brief introduction to the constructors. There is not much more to them, so let's move on to the fields they initialize.

3.1.2 Initial Capacity, Load factor, and Threshold

In general, we create a HashMap with the no-argument constructor. But when we have specific time or space requirements, the defaults may not be suitable and we need to tune the parameters ourselves. The HashMap constructors let us adjust two of them, initialCapacity and loadFactor, and through these two parameters we indirectly influence the threshold. Note, however, that the initial threshold is computed from initialCapacity alone, via a shift-based method. The roles of the three are as follows:

  initialCapacity: the initial capacity of the HashMap
  loadFactor: the load factor
  threshold: the maximum number of key-value pairs the HashMap can currently hold; once the size exceeds this value, the map must be expanded

The relevant code is as follows:

/** The default initial capacity - MUST be a power of two. */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

/** The load factor used when none specified in constructor. */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/** The load factor for the hash table. */
final float loadFactor;

/** The next size value at which to resize (capacity * load factor). */
int threshold;

If you look at the source code, you will notice that initialCapacity is not stored in any HashMap field. As the parameter name suggests, it represents an initial capacity and is used only once, in the constructor, so there is no need to keep a field for it. But if you look closely at the constructors above, you will also see that the data structure that stores the key-value pairs is not initialized there at all. This raises a question: if it is called the initial capacity but the data structure is never initialized in the constructor, what is the point of passing this parameter? I won't answer that here; consider it a bit of suspense to be resolved later.

By default, a HashMap has an initial capacity of 16 and a load factor of 0.75. There is no default threshold because the threshold can be computed from the capacity and loadFactor (as the comment says), i.e. threshold = capacity * loadFactor. But if you look closely at constructor 3, you will see that the threshold is not computed by that formula; it is computed by a method. Does that mean the comment on threshold is wrong? Or is this just special handling, with the formula used everywhere else? I won't answer this question yet either; it will be explained after we analyze the resize method. For now, let's see what the threshold-initialization method looks like. Its source is as follows:

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

The code above looks a bit cryptic; I did not know what it was doing the first time I saw it either. After working it out on paper, though, its purpose boils down to one sentence: find the smallest power of two that is greater than or equal to cap. Why it does this will be explained later. Let's look at a worked example of tableSizeFor:

The figure above shows how tableSizeFor computes its result for cap = 536,870,913 = 2^29 + 1. After the successive shifts and ORs, n + 1 = 1,073,741,824 = 2^30. The purpose of the method should be easy to see from the example, so I won't dwell on it.
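
If you want to verify the behaviour yourself, a small test like the one below prints the computed sizes. The method body is copied from the source shown above; the sample inputs are my own.

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from the HashMap source shown above
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(16));         // 16 (already a power of two)
        System.out.println(tableSizeFor(17));         // 32
        System.out.println(tableSizeFor(536870913));  // 1073741824, i.e. 2^30, as in the example above
    }
}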

Having covered the calculation of the initial threshold, let's talk about the load factor. For a HashMap, the load factor is an important parameter that reflects how full the bucket array is allowed to get (assuming the key-value pairs are evenly distributed). By adjusting it, a HashMap can trade time complexity against space complexity. When we lower the load factor, the HashMap holds fewer key-value pairs before resizing; when the pairs are re-stored into the new, larger bucket array, collisions between keys decrease and the lists become shorter, so operations such as insertion, deletion, and lookup become more efficient. This is a typical case of trading space for time. Conversely, if we increase the load factor (it can even be greater than 1), the number of key-value pairs the HashMap holds before resizing increases, giving higher space utilization but also a higher collision rate; lists get longer and operations become less efficient, trading time for space. How to tune the load factor depends on the use case; in general, the default value is fine.
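
As a small illustration (the numbers are example choices of mine, not recommendations): with a capacity of 16, lowering or raising the load factor simply moves the point at which the map resizes.

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Capacity 16, load factor 0.5: resizes after 8 entries, fewer collisions, more unused slots
        Map<String, Integer> sparse = new HashMap<>(16, 0.5f);
        // Capacity 16, load factor 1.0: resizes after 16 entries, better space use, longer chains
        Map<String, Integer> dense = new HashMap<>(16, 1.0f);
        // The two maps behave identically apart from when they resize internally.
    }
}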

3.2 Lookup

The HashMap lookup operation is relatively simple. The procedure is the same as described in the principles section: first locate the bucket in which the key-value pair resides, then search the linked list or red-black tree in that bucket. The code for this operation is as follows:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // 1. Locate the bucket where the key-value pair resides
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // If first is of type TreeNode, the bucket holds a red-black tree,
            // so look the key up in the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // 2. Otherwise search the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

The core lookup logic is encapsulated in the getNode method, to which I have added some comments; it should not be hard to follow. Let's look at the first step of the lookup, determining the bucket position, which is implemented as follows:

// index = (n - 1) & hash
first = tab[(n - 1) & hash]

The bucket position in the bucket array is computed as (n - 1) & hash. Since the size of the bucket array in a HashMap is always a power of two, (n - 1) & hash is equivalent to taking the hash modulo the array length. But the modulo operation is not as efficient as a bitwise AND, so (n - 1) & hash is also a small optimization. For example, suppose hash = 185 and n = 16. The calculation is shown below:

The above calculation is not complicated, so I won’t go into it here.
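
To convince yourself that (n - 1) & hash matches hash % n when n is a power of two, a quick check such as the following works; the hash value 185 and n = 16 are taken from the example above.

public class IndexDemo {
    public static void main(String[] args) {
        int n = 16;          // bucket array length, always a power of two in HashMap
        int hash = 185;
        System.out.println(hash % n);        // 9
        System.out.println((n - 1) & hash);  // 9 as well: 0b1111 & 0b10111001 keeps the low 4 bits
    }
}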

In the source code above, besides the lookup logic itself, there is also a method that computes the hash. Its source is as follows:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

At first glance this method just recomputes the hash with a couple of bit operations. Why do that here? Why not simply use the hash produced by the key's hashCode method? Think about it for a moment; the answer follows below.

There are two benefits to doing this; let me explain briefly. Look again at the figure above, where the hash is the value produced by the key's hashCode. When taking the remainder, because n is small, only the low 4 bits of the hash take part in the computation and the high bits are effectively ignored. As a result, the bucket index depends only on the low-order bits; the high-order bits contribute nothing. To address this, we can XOR the high 4 bits of the hash in the figure with the low 4 bits, i.e. hash ^ (hash >>> 4). This mixes the high-order bits into the low-order bits, increasing the randomness of the low bits and letting the high bits participate in the computation indirectly. The process is illustrated below:

In Java, the hashCode method produces a hash of type int, 32 bits wide. The first 16 bits are high, the last 16 bits are low, so we have to move 16 bits to the right.

That is one benefit of recomputing the hash; another is that it increases the complexity of the hash. When we override hashCode, we might write an implementation with a poor distribution, leading to a high collision rate. The shift and XOR make the hash more complex and improve its distribution. This is why a HashMap does not use the key object's original hash directly.
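
The effect of this mixing step can be observed with a small experiment. The two sample hash codes below are my own, constructed so that they differ only in their high 16 bits.

public class HashSpreadDemo {
    // The same mixing step HashMap applies: XOR the high 16 bits into the low 16 bits
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                 // bucket count
        int h1 = 0x00010001;        // example hash codes that differ only in their high 16 bits
        int h2 = 0x00020001;

        System.out.println((n - 1) & h1);          // 1
        System.out.println((n - 1) & h2);          // 1  -> collision, the high bits were ignored

        System.out.println((n - 1) & spread(h1));  // 0
        System.out.println((n - 1) & spread(h2));  // 3  -> the high bits now affect the index
    }
}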

3.3 Traversal

Traversal, like find, is one of the most frequently used operations. To traverse a HashMap, we typically do the following:

for(Object key : map.keySet()) {
    // do something
}

or

for(HashMap.Entry entry : map.entrySet()) {
    // do something
}

As the snippets above show, we generally iterate over the key set or the entry set of a HashMap. The foreach loop over keySet() in the snippet above is translated into iterator-based traversal at compile time, equivalent to:

Set keys = map.keySet();
Iterator ite = keys.iterator();
while (ite.hasNext()) {
    Object key = ite.next();
    // do something
}

If you traverse a HashMap several times, you will find that the result order is the same each time; but that order generally does not match the insertion order. What causes this behaviour? Think about why for a moment. Here is the traversal-related code:

public Set<K> keySet() {
    Set<K> ks = keySet;
    if (ks == null) {
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}

/** Key set */
final class KeySet extends AbstractSet<K> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    public final Iterator<K> iterator()     { return new KeyIterator(); }
    public final boolean contains(Object o) { return containsKey(o); }
    public final boolean remove(Object key) {
        return removeNode(hash(key), key, null, false, true) != null;
    }
}

/** Key iterator */
final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

abstract class HashIterator {
    Node<K,V> next;        // next entry to return
    Node<K,V> current;     // current entry
    int expectedModCount;  // for fast-fail
    int index;             // current slot

    HashIterator() {
        expectedModCount = modCount;
        Node<K,V>[] t = table;
        current = next = null;
        index = 0;
        if (t != null && size > 0) { // advance to first entry
            // Find the first bucket that contains a list node reference
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Node<K,V> nextNode() {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
        if ((next = (current = e).next) == null && (t = table) != null) {
            // The current list is exhausted; find the next non-empty bucket
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }

    // omit some code
}

To iterate over all keys, we first obtain the KeySet object and then traverse it through its iterator. The KeyIterator class inherits from HashIterator, where the core logic is encapsulated. That logic is not complicated: at initialization, HashIterator finds the first bucket that contains a list node reference and then iterates through the list that bucket points to. When that list is exhausted, it looks for the next bucket containing a node reference and continues, until no such bucket remains and traversal ends. For example, suppose we traverse the structure in the following figure:

When the HashIterator is initialized, it first iterates through the bucket array to find the bucket that contains the list node reference, which is bucket number 3. The nextNode method then iterates through the list to which the bucket points. After iterating through bucket 3, the nextNode method continues to find the next bucket that is not empty, which corresponds to bucket 7 in the figure. The process is similar until the last bucket is iterated. This is the flow of the core logic of HashIterator, as shown below:

The final result is 19 -> 3 -> 35 -> 7 -> 11 -> 43 -> 59. The test code is as follows:

import java.util.HashMap;

import org.junit.Test;

/**
 * Should be run under JDK 1.8; the result may differ on other versions.
 */
public class HashMapTest {

    @Test
    public void testTraversal() {
        HashMap<Integer, String> map = new HashMap(16);
        map.put(7, "");
        map.put(11, "");
        map.put(43, "");
        map.put(59, "");
        map.put(19, "");
        map.put(3, "");
        map.put(35, "");

        System.out.println("traversal result: ");
        for (Integer key : map.keySet()) {
            System.out.print(key + " -> ");
        }
    }
}

The traversal results are as follows:

To finish this section, two questions for you. JDK 1.8 introduced the red-black tree optimization to avoid the performance impact of overly long linked lists, yet the traversal code above contains no red-black-tree-specific logic. Why is that? And how are lists that have been converted to red-black trees traversed? Think about it, and then check the source code or the later sections of this article for the answer.

3.4 Insertion

3.4.1 Insertion logic analysis

From the analysis of the previous two sections, you should have a good grasp of the underlying data structure of HashMap, and even without my telling you, you can probably guess what the insertion process looks like: first locate the bucket to which the key-value pair belongs, then check whether the bucket is empty. If it is empty, store the key-value pair there; if it is not, either append the new node to the end of the list or update an existing key-value pair. That is the basic insertion flow. But don't celebrate yet: this is a simplified version, and the real process is more involved. First, a HashMap is a variable-length collection, so resizing must be considered. Second, JDK 1.8 introduced the red-black tree optimization for long lists, so we also need to consider when a long list should be optimized and how that optimization works. With these two concerns added, a once-simple operation becomes noticeably more complex. In this section I first analyze the source of the insert operation; resizing, treeification, and the other tree-related operations are analyzed afterwards in two separate subsections. Let's look at the insert source first:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Initialize the bucket array table; table is lazily initialized on the first insertion
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket contains no key-value node reference, put the new node into the bucket
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // If the hash and key of the first node in the list equal those of the
        // key-value pair being inserted, let e point to that node
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the reference in the bucket is of type TreeNode, call the
        // red-black tree insertion method
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Walk the linked list and count its length
            for (int binCount = 0; ; ++binCount) {
                // The list contains no node with the same key, so append the
                // new node to the end of the list
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list length is greater than or equal to the
                    // treeification threshold, treeify the list
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // The list already contains the key being inserted; stop traversing
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }

        // Determine whether the key-value pair to insert already exists
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // onlyIfAbsent determines whether the existing value should be updated
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the number of key-value pairs exceeds the threshold, expand the capacity
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

The entry point of the insert operation is put(K, V), but the core logic lives in putVal(int, K, V, boolean, boolean). The putVal method does the following:

  1. If the bucket array table is empty, initialize it by calling the resize method
  2. Check whether the key-value pair being inserted already exists; if it does, decide whether to replace the old value with the new one
  3. If it does not exist, link the new node into the list, and decide based on the list length whether to convert the list into a red-black tree
  4. Check whether the number of key-value pairs exceeds the threshold; if it does, expand the capacity

That’s the logic for HashMap inserts, and it’s not very complicated, so I won’t go into it here. Next, let’s analyze the expansion mechanism.

3.4.2 Capacity Expansion Mechanism

In Java, arrays have a fixed length, which means they can only store a fixed amount of data. But during development we often do not know in advance how big an array should be: make it too small and it will not hold enough, make it too big and space is wasted. It would be nice to have a variable-length array that allocates space on demand. Fortunately, we do not have to implement one ourselves; the Java collections framework already provides array-based variable-length structures such as ArrayList and HashMap. For such structures, resizing is a very important operation, so let's talk about the HashMap expansion mechanism.

Before going into detail, here’s some background information about capacity expansion:

In a HashMap, the length of the bucket array is always a power of 2, and the threshold is the product of the bucket array length and the load factor. When the number of key-value pairs in the HashMap exceeds the threshold, the map is expanded.

The expansion mechanism of HashMap differs from that of other variable-length collections. HashMap expands to twice the current bucket array length, and the threshold is also doubled (if the threshold overflows to zero during the shift, it is recomputed using the threshold formula). After expansion, the key-value pairs are remapped and moved to their proper positions. That is the general process; let's look at the implementation:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the table capacity exceeds the maximum capacity, no further expansion is possible
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        /*
         * When threshold > 0, assign it to newCap.
         * HashMap uses the threshold variable to temporarily save the value
         * computed from the initialCapacity parameter.
         */
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        /*
         * When the no-argument constructor was used, the bucket array capacity
         * is the default capacity, and the threshold is the default capacity
         * multiplied by the default load factor.
         */
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }

    // When newThr is 0, compute it using the threshold formula
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Create the new bucket array; this is where table is initialized
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // If the old bucket array is not empty, traverse it and map the
        // key-value pairs into the new bucket array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // Split the red-black tree
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Traverse the list and group its nodes, keeping their original order
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);

                    // Store the grouped lists into the new buckets
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

The source above is a bit long; I hope you have the patience to work through its logic. It does three things:

  1. Calculates the capacity of the new bucket array, newCap, and the new threshold, newThr
  2. Create a new bucket array based on the calculated newCap. This is where the bucket array table is initialized
  3. Remap the key-value nodes into the new bucket array. If a node is of type TreeNode, the red-black tree needs to be split; ordinary list nodes are grouped while keeping their original order.

Of the three points listed above, creating the new bucket array is a single line of code and needs no explanation. Let's discuss the first and third points, starting with how newCap and newThr are calculated. This calculation corresponds to the first and second conditional branches of the resize source, as follows:

// The first conditional branch
if (oldCap > 0) {
    // nested branch
    if (oldCap >= MAXIMUM_CAPACITY) {...}
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY) {...}
}
else if (oldThr > 0) {...}
else {...}

// The second conditional branch
if (newThr == 0) {...}

These conditional branches distinguish the different situations and compute the corresponding capacity and threshold values. They cover the following cases:

Branch 1:

  1. oldCap > 0: the bucket array table has already been initialized.
  2. oldThr > 0 and the bucket array is not yet initialized: this happens when the HashMap(int) or HashMap(int, float) constructor was used. In this case newCap = oldThr, and newThr is computed in the second conditional branch.
  3. oldCap == 0 && oldThr == 0: the bucket array is not initialized and threshold is 0. This happens when the HashMap() constructor was used.

Let's single out the oldThr > 0 case. Here oldThr is assigned to newCap, which is equivalent to newCap = threshold = tableSizeFor(initialCapacity). So the initialCapacity parameter we passed at construction time is carried through threshold and finally handed to newCap. This answers the earlier question: initialCapacity is never stored in a field of its own, yet it still takes part in initializing the bucket array, by way of threshold.

Nested branch (inside branch 1):

  1. oldCap >= 2^30: the bucket array capacity has reached the maximum capacity of 2^30, so no further expansion is performed.
  2. newCap < 2^30 && oldCap >= 16: the new capacity is below the maximum and the old capacity is at least the default initial capacity of 16. In this case the new threshold is newThr = oldThr << 1, and the shift may overflow.

A brief note on the overflow caused by the shift: when the decimal part of loadFactor is 0 and its integer part is divisible by 2 and greater than or equal to 8, newThr may overflow to zero during one of these calculations, as the small demonstration below shows.
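
To see how repeated left shifts on an int can end up at zero, here is a standalone demonstration of the arithmetic. It only illustrates the int overflow behaviour, not the exact sequence of resizes inside HashMap.

public class ShiftOverflowDemo {
    public static void main(String[] args) {
        // Start from a large threshold, as can happen with a very large load factor,
        // and keep doubling it the way resize() does with newThr = oldThr << 1.
        int thr = 1 << 30;   // 1073741824
        thr <<= 1;           // overflows to Integer.MIN_VALUE (-2147483648)
        System.out.println(thr);
        thr <<= 1;           // the sign bit is shifted out: the value is now 0
        System.out.println(thr);
    }
}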

Branch 2:

  1. newThr == 0: the first conditional branch set newCap without computing newThr, or the nested branch's shift caused newThr to overflow to zero.

Having covered how newCap and newThr are calculated, let's move on to how the key-value nodes are remapped.

In JDK 1.8, remapping takes the node type into account. Tree nodes need to have their red-black tree split as part of the remapping, while list nodes are grouped before being mapped. Note that after grouping, the relative order of the nodes within each group is unchanged. The red-black tree splitting logic is explained in the next section; let's first look at how linked lists are grouped.

We all know that when a node is inserted into the underlying structure, the bucket position is usually computed with a modulo-style operation and the node is then placed in that bucket. In fact, remapping can be viewed as an insertion, and in JDK 1.7 that is exactly what happens. JDK 1.8, however, optimizes this process, which makes the logic slightly more complex. Before going into detail, let's review the hash-modulo process:

In the figure above, the bucket array size is n = 16, and hash1 is not equal to hash2. But because only the low 4 bits participate in the computation, the results are the same. When the bucket array is expanded and n grows from 16 to 32, the same hashes map differently:

After expansion, the number of bits involved in the modulo has grown from 4 to 5. Since the 5th bits of the two hashes differ, the two results are now different. The calculation above is not hard to follow, so let's move on.
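
The regrouping rule that the resize code applies can be written down compactly: a node whose hash & oldCap is 0 keeps its old index, otherwise it moves to old index + oldCap. Here is a small sketch of my own, reusing keys from the earlier traversal example.

public class RegroupDemo {
    // Where a node lands after the table grows from oldCap to oldCap * 2
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        // If the newly significant bit of the hash is 0 the node stays put,
        // otherwise it moves up by exactly oldCap slots.
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        System.out.println(newIndex(43, oldCap));  // 43 & 15 = 11, 43 & 16 == 0 -> stays at index 11
        System.out.println(newIndex(59, oldCap));  // 59 & 15 = 11, 59 & 16 != 0 -> moves to 11 + 16 = 27
    }
}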

Now assume the bucket array in the figure above is expanded and that the capacity after expansion is n = 16. The remapping then proceeds as follows:

Traverse the linked list and compute hash & oldCap for each node, as shown in the figure below:

If the value is 0, point loHead and loTail at the node; if a later node's hash & oldCap is also 0, append it to the list that loHead points to and move loTail to it. If the value is non-zero, do the same with hiHead and hiTail. Once the traversal is complete, we may end up with two linked lists, and the grouping is done:

Finally, store the two lists into their corresponding buckets, and the expansion is complete. See the figure below:

As the figure shows, the relative order of the nodes in the two lists is unchanged after remapping; they keep the order they had before expansion. That is the JDK 1.8 HashMap expansion code. Expansion in JDK 1.8 is also more efficient than in earlier versions. If you look at the JDK 1.7 source, you will see that it introduces a random seed into the hash computation to mitigate denial-of-service attacks caused by hash collisions, increasing hash randomness so that the key-value pairs are distributed evenly across the bucket array. During expansion, the code decides, based on the capacity, whether a new random seed is needed and then recomputes the hash of every node. In JDK 1.8, this mechanism was replaced by the introduction of red-black trees, which avoids the repeated hash computation and makes expansion more efficient.

That’s all for this section. Next, we’ll talk about the conversion process between linked lists and red-black trees.

3.4.3 Treeifying linked lists, splitting red-black trees, and converting them back to lists

JDK 1.8 improved the HashMap implementation, and the biggest improvement is the introduction of red-black trees to handle buckets whose lists have grown long because of frequent collisions, at the cost of extra code complexity. Previously only linked-list operations had to be implemented; with red-black trees, tree-related operations are needed as well. A red-black tree is a self-balancing binary search tree. This article does not cover red-black trees themselves; it only discusses the considerations around treeifying linked lists. For a detailed introduction to red-black trees, see my other article, a detailed analysis of red-black trees.

Before expanding on this, here is the treeification-related code:

static final int TREEIFY_THRESHOLD = 8;

static final int MIN_TREEIFY_CAPACITY = 64;

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
}

/**
 * Convert the ordinary linked list in the bucket into a red-black tree
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // If the bucket array capacity is less than MIN_TREEIFY_CAPACITY,
    // resize instead of treeifying
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // hd points to the head of the TreeNode list, tl to its tail
        TreeNode<K,V> hd = null, tl = null;
        do {
            // Replace each ordinary node with a TreeNode
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            // Turn the TreeNode list into a red-black tree
            hd.treeify(tab);
    }
}

TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
    return new TreeNode<>(p.hash, p.key, p.value, next);
}

Treeification happens only when two conditions are met:

  1. The length of the linked list must be greater than or equal to TREEIFY_THRESHOLD
  2. The capacity of the bucket array must be greater than or equal to MIN_TREEIFY_CAPACITY

The first condition is easy to understand, so I won't discuss it. As for why the second condition exists, I personally think the reasons are as follows:

When the bucket array is small, hash collisions between key-value pairs are more likely, which leads to long lists. In that situation it is better to resize first rather than treeify immediately: the high collision rate is caused mainly by the small bucket array, so expanding it first avoids some unnecessary treeification. In addition, a small bucket array means frequent resizes, and each resize requires splitting and remapping any red-black trees. So turning a long list into a red-black tree while the bucket array is still small is a thankless job.
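
Here is that decision summarized as a conceptual sketch of my own; it is a simplification, not the JDK implementation (the real checks live in putVal and treeifyBin, shown elsewhere in this article).

// Conceptual sketch (not JDK code) of the decision made when a chain grows long:
// resize first unless the table is already reasonably large.
public class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    static String actionFor(int chainLength, int tableLength) {
        if (chainLength < TREEIFY_THRESHOLD)
            return "keep the plain linked list";
        if (tableLength < MIN_TREEIFY_CAPACITY)
            return "resize the table instead of treeifying";
        return "convert the chain to a red-black tree";
    }

    public static void main(String[] args) {
        System.out.println(actionFor(8, 16));  // resize the table instead of treeifying
        System.out.println(actionFor(8, 64));  // convert the chain to a red-black tree
    }
}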

Returning to the source above, let's look at the treeifyBin method. Its main job is to convert an ordinary linked list into a list of TreeNode nodes and finally call treeify to turn that list into a red-black tree. TreeNode inherits from Node, so a TreeNode still carries a next reference, and the order of the nodes in the original list is preserved through it. Assume the list before treeification looks like this:

HashMap was not originally designed with the red-black tree optimization in mind, so it does not require key classes to implement the Comparable interface or to supply a comparator, as TreeMap does. But treeification needs to compare the sizes of two key objects, and comparing keys whose classes do not implement Comparable becomes a tricky problem. To solve it, HashMap compares two keys in three steps, as follows:

  1. Compare the hashes of the two keys; if they are equal, continue to the next step
  2. Check whether the key class implements the Comparable interface; if it does, call compareTo to compare the keys
  3. If the keys still cannot be ordered, fall back to tieBreakOrder to arbitrate

In other words, tieBreakOrder acts as a final tie-breaker, which is where its name comes from.
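
For reference, the JDK 1.8 tieBreakOrder method looks roughly like the following. I am quoting it from memory, so treat it as a sketch rather than the authoritative source; the idea is to fall back to class names and then identity hash codes so that some consistent order is always produced.

// Sketch of HashMap.tieBreakOrder as found in JDK 1.8 (reproduced from memory)
static int tieBreakOrder(Object a, Object b) {
    int d;
    if (a == null || b == null ||
        (d = a.getClass().getName()
             .compareTo(b.getClass().getName())) == 0)
        // Last resort: compare identity hash codes so that some consistent order always exists
        d = (System.identityHashCode(a) <= System.identityHashCode(b) ? -1 : 1);
    return d;
}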

With these three comparisons we can always decide which key counts as larger, and the red-black tree can then be built. The finished tree looks like this:

The orange arrows represent the TreeNode next references; the prev references are not drawn due to space limitations. As you can see, after the list is converted into a red-black tree, the original list order is still recorded through these references (the root of the red-black tree is moved to the head of the list), and we can still traverse the red-black tree the same way we traverse a list. This structure paves the way for splitting the red-black tree and for converting it back into a linked list, which we discuss next.

Red-black tree splitting

After expansion, ordinary nodes need to be remapped, and red-black tree nodes are no exception. The obvious approach would be to convert the tree back into a linked list first and then remap that list, but that sacrifices some efficiency. HashMap takes a cleverer approach. As mentioned in the previous section, when converting a list into a red-black tree, HashMap preserves the original node order through the two extra references, next and prev. So when a red-black tree is remapped, it can be treated just like a linked list, which avoids an explicit tree-to-list conversion and quietly improves efficiency.

That is the logic of red-black tree splitting; now let's look at the concrete implementation:

// Threshold below which a tree bin is converted back to a linked list
static final int UNTREEIFY_THRESHOLD = 6;

final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink into lo and hi lists, preserving order
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    /*
     * Red-black tree nodes still keep their next references, so the tree can be
     * traversed like a linked list and its nodes grouped in the same way as before.
     */
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }

    if (loHead != null) {
        // If the loHead list is no longer than UNTREEIFY_THRESHOLD,
        // convert it back into an ordinary linked list
        if (lc <= UNTREEIFY_THRESHOLD)
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            /*
             * If hiHead == null, all nodes stay in their original bucket after the
             * expansion and the tree structure is unchanged; otherwise re-treeify.
             */
            if (hiHead != null)
                loHead.treeify(tab);
        }
    }
    // Similar to the above
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}

As the source shows, the logic of remapping a red-black tree is essentially the same as that of remapping a linked list. The difference is that after remapping, the tree is split into two TreeNode lists. If a list contains no more than UNTREEIFY_THRESHOLD nodes, it is converted back into an ordinary linked list; otherwise the TreeNode list is re-treeified when the conditions require it. For example, if the red-black tree in the figure above is remapped after expansion, the result looks like this:

Converting a red-black tree back to a linked list

As mentioned earlier, a red-black tree still preserves the original list order of its nodes. With that in mind, converting a red-black tree back to a linked list is straightforward: just convert the TreeNode list into a list of Node objects. The relevant code is as follows:

final Node<K,V> untreeify(HashMap<K,V> map) {
    Node<K,V> hd = null, tl = null;
    // Traverse the TreeNode list and replace each node with a plain Node
    for (Node<K,V> q = this; q != null; q = q.next) {
        // Replace the node type
        Node<K,V> p = map.replacementNode(q, null);
        if (tl == null)
            hd = p;
        else
            tl.next = p;
        tl = p;
    }
    return hd;
}

Node<K,V> replacementNode(Node<K,V> p, Node<K,V> next) {
    return new Node<>(p.hash, p.key, p.value, next);
}

The code above is not complicated and should be easy to follow, so I won't go into it further. That wraps up the expansion-related content; I hope it is clear.

3.5 Deletion

If you have made it through to this section, you can relax; the rest is easier, provided, of course, that you skip the red-black tree deletion logic. Red-black trees are not the focus of this article, however, and this section will not cover them, so don't worry.

Deletion in a HashMap is not complicated and takes only three steps: first locate the bucket, then traverse the linked list and find the node with an equal key, and finally delete the node. The relevant source is as follows:

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        // 1. Locate the bucket
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // If the key equals the key of the first node in the list, let node point to it
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            // If the node is of type TreeNode, look the node up in the red-black tree
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // 2. Traverse the list and find the node to delete
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // 3. Delete the node and repair the linked list or red-black tree
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

The delete operation itself is not complicated and, with the groundwork laid earlier, should be easy to follow, so there is nothing more to add here.

3.6 Other Details

The previous sections looked at the common operations of HashMap and their source code; this section adds a few more details.

The table field is declared transient

If you read the HashMap source carefully, you will notice that the bucket array table is declared transient. The word transient means ephemeral; in Java, fields marked with this keyword are skipped by the default serialization mechanism. This raises a question: the bucket array is the core data structure underlying HashMap, so if it is not serialized, how can anyone restore the map afterwards?

Briefly: HashMap does not rely on the default serialization mechanism; it customizes serialization by implementing the readObject/writeObject methods. There is a good reason for this. What does a HashMap store? Key-value pairs, of course. So as long as the key-value pairs themselves are serialized, the HashMap can be rebuilt from them. Some readers may wonder: wouldn't serializing the table directly be simpler, since it could then be restored in one step? That sounds reasonable, but serializing the table has two problems:

  1. In most cases the table is far from full, so serializing the unused portion wastes space
  2. The same key-value pair may land in different bucket positions under different JVMs, so deserializing the table under a different JVM could cause errors

Of the two problems above, the first is easy to understand, so let me explain the second. The first step of HashMap's get/put/remove methods is to find the bucket position from the key's hash. If the key class does not override hashCode, the hash computation eventually calls Object's hashCode, which is a native method: different JVMs may implement it differently and produce different hashes. In other words, the same key may hash differently on different platforms, and in that case operating on a table built elsewhere could go wrong.

Given the above, you can see why HashMap does not serialize tables.
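
To make the idea concrete, here is a minimal sketch of the "serialize only the key-value pairs" approach, applied to a hypothetical wrapper class of my own. It is not the actual HashMap readObject/writeObject code, which additionally writes the capacity and size fields of the real map and performs validation.

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper used only to illustrate the idea of custom serialization.
public class PairOnlyMap<K, V> implements Serializable {
    private transient HashMap<K, V> table = new HashMap<>(); // not serialized by default

    public void put(K k, V v) { table.put(k, v); }
    public V get(K k)         { return table.get(k); }

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeInt(table.size());
        for (Map.Entry<K, V> e : table.entrySet()) {  // write only the key-value pairs
            s.writeObject(e.getKey());
            s.writeObject(e.getValue());
        }
    }

    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        table = new HashMap<>();
        int size = s.readInt();
        for (int i = 0; i < size; i++) {              // re-insert, recomputing bucket positions locally
            K k = (K) s.readObject();
            V v = (V) s.readObject();
            table.put(k, v);
        }
    }
}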

3.7 Summary

This chapter took a close look at the code behind the common HashMap operations and added some extra details at the end. The insertion section is the longest, mainly because insertion touches the most points, each one leading to the next, including, but not limited to, table initialization, expansion, and treeification. Overall, analyzing the insertion path is the hardest part; fortunately, the analysis is done.

Although this chapter is fairly long, it does not cover every aspect of HashMap, for example the insertion, deletion, and lookup operations on red-black trees. In my opinion, however, the analysis above is sufficient: as users of the library rather than its designers, we do not have to understand every detail. So skip the details you don't need; for everyday development purposes, knowing the basics of how HashMap works is enough.

Well, that’s the end of this chapter.

4. Final words

Finishing this article is a relief; it took me more than a week to write. Before writing it, my understanding of HashMap was limited to the conceptual level, without much depth. At first I thought there wasn't much to write about something as familiar as HashMap, but once I started reading the source I realized I was wrong: it is not that there is nothing to write, but that there are too many points to cover. The JDK 1.8 implementation of HashMap is much more complex than earlier versions, and many of its details are hard to grasp. Understanding them myself is one thing; writing them down clearly is harder, and this article was essentially completed while reading the source. Given limited time and ability, and the length of the article, I cannot guarantee that the analysis and the illustrations are free of mistakes. If you find any errors, please point them out and I will correct them promptly; thank you in advance.

Well, that’s all for this article, thank you for reading!
