HashMap uses a linked list method to avoid hash collisions (with the same hash value) and converts the list to a red-black tree when its length is greater than TREEIFY_THRESHOLD (default: 8). When the value is less than or equal to UNTREEIFY_THRESHOLD (the default value is 6), the system degenerates back to the linked list to achieve performance balancing. The following figure shows the data structure of HashMap (array + linked list + red-black tree)


The data structure of the HashMap

Problems with HashMap concurrency

1. Multithreading put may cause element loss

The main problem is the new Entry (hash, key, value, e) of the addEntry method. If both threads get E at the same time, then their next element will be E, and then one will succeed and the other will lose when assigning to the table element.

2. Multithreading put may cause an infinite GET loop

The cause of the dead loop is that the expansion of HashMap (resize function) is triggered when the multi-thread performs the PUT operation, and the two nodes of the linked list form a closed loop, resulting in the dead loop. The following figure shows the PUT operation process in JDK8. For details, see the source code.


The PUT operation of JDK8

Single-threaded resize process


The Resize process for a HashMap

First let’s post the key code of transfer() in resize:

while(null ! = e) { Entry<K,V> next = e.next; if (rehash) { e.hash = null == e.key ? 0 : hash(e.key); } int i = indexFor(e.hash, newCapacity); e.next = newTable[i]; newTable[i] = e; e = next; }Copy the code

And we can simplify it even more, because the I in the middle is where the new table is, so we can skip it. Simplified code:

while(null ! = e) { Entry<K,V> next = e.next; e.next = newTable[i]; NewTable [I] = e; newTable[I] = e; e = next; }Copy the code

Remove some redundant code from this process and the meaning is clear:

  1. Entry

    next = e.next; // If you want to transfer the head pointer, you must save the next node, otherwise the list will be lost after the transfer
    ,v>
  2. e.next = newTable[i]; // e is inserted at the head of the list, so use e.next to point to the first element of the new Hash. Because it’s order N.
  3. newTable[i] = e; // Now the new Hash table’s head pointer still points to the first element before e was moved, so we need to change the new Hash table’s head pointer to e
  4. e = next; // move e to the next node

Resize process during concurrency

  1. Let’s say that as soon as we finish our threade.next = newTable[i];It was suspended by dispatch


Concurrent resize process

And our thread two execution is done. So we have something like this.

Because Thread1’s E points to key(3) and next points to key(7), it points to thread 2’s rehash of the linked list. We can see that the order of the linked list is reversed.

  1. Thread one is scheduled back for execution.
    1. Run newTalbe[I] = e.
    2. Executing e = next causes e to point to key(7).
    3. Next = e.next causes next to point to key(3).


Concurrent resize process

  1. Everything is fine. The thread resumed work. Remove key(7) and place it first in newTable[I], then move e and next down.


Concurrent resize process

  1. Circular links appear. E.ext = newTable[I] causes key(3). Next points to key(7). Note that key(7).next already points to key(3), and the circular list appears.


Concurrent resize process

So, when our thread calls hashtable.get (11), the tragedy occurs — Infinite Loop.

Optimization of HashMap in JDK8

1. Optimization of long linked lists

In JDK8, if the length of the list is greater than or equal to 8, the list is converted to a red-black tree; When the length is less than or equal to 6, it becomes a linked list again

The average search length of a red-black tree is log(n). At length 8, the average search length is 3. If you continue to use a linked list, the average search length is 8/2=4, which makes it necessary to convert to a tree. If the list length is less than or equal to 6, 6/2=3, it is also very fast, but the time to convert to tree structure and spanning tree is not too short. There are also options 6 and 8, with a difference of 7 between them to prevent frequent transitions between lists and trees. For example, if the number of linked lists is more than 8, the linked list will be converted into a tree structure; if the number of linked lists is less than 8, the tree structure will be converted into a linked list. If a HashMap keeps inserting and deleting elements, and the number of linked lists is around 8, it will frequently convert tree to linked list and linked list to tree, and the efficiency will be very low.

Note that when the array length is less than 64 (constant), double the array length, otherwise convert the list to a tree. It’s not always directly tree-like.

In addition, in JDK7, operands are moved to the right during hash calculation, which is complicated. The purpose of the calculation is to include high values in the calculation and reduce hash collisions. In JDK8, linked lists can be turned into red-black trees, making it easy to hash. The following diagram shows hash and index calculations in JDK8.


Hash and index calculations for JDK8

2. Optimization of hash collision insertion mode

When a hash collision occurs, JDK7 inserts at the head of the list and JDK8 inserts at the tail. The header insertion method is the fastest, and you can find the insert location directly. However, before JDK8, the hashMap insertion method would have an infinite loop in concurrent scenarios. The JDK8 starts the hashMap list as a red-black tree after the length of the node reaches 8. This allows for a much smaller number of iterations as the length of the node increases behind the array (otherwise you would have to iterate through all of them at a time), and also avoids the previous problem of looping lists. And if it’s a red-black tree, it’s impossible to do a head plug.

3. Optimization of capacity expansion mechanism

In JDK7, all linked lists are rehash; In JDK8, I = (n-1) &hash is used to find the position of the element in putVal. Let’s assume that n bits are left after the key is mod before capacity expansion. After the capacity is expanded, the capacity is doubled, so the key mod has n+1 bits. In this n+1 bit, if the first bit is 0, then the position of the key will remain the same before and after the expansion. If the n+1 bit starts with a 1, then it’s not the same as before, then the key should be where it was plus the length of the entire array. This reduces the cost of moving all the data. (Read it slowly for two times and figure it out. It’s easier to understand without looking at the picture.)


Rehash of JDK8

Common FAQ

HashMap Capacity expansion conditions

Look at the JDK7 source code for the PUT function, then jump to the addEntry function. Then you can see that in JDK7 there are two conditions for hashMap expansion:

  1. The current number of data stores (size()) must be greater than or equal to the threshold
  2. Whether the current added data has hash conflicts

Because of these two conditions, there are two cases:

  1. That is, when hashMap is storing values (default size: 16, load factor: 0.75, threshold: 12), it may reach the end of storing 16 values, and the expansion phenomenon will occur only when the 17th value is stored, because each of the first 16 values occupies a position in the underlying array, and no hash collision occurs.
  2. Of course, it is possible to store more values (up to 16 values, up to 26 values) without expanding. Principle: Before eleven values all hash collision, save to the array of the same position (the element number 12 is less than the threshold, not expansion), behind all deposited in the value of 15 to array the remaining 15 position (the element number greater than or equal to the threshold, but each deposit element and hash collision does not occur, so will not increase). The first 11+15=26, so the above two conditions are met when the 27th value is saved, and then the expansion phenomenon occurs.

In JDK8, the only condition for capacity expansion is that the current capacity is greater than the threshold (the threshold is equal to the current hashMap maximum capacity times the load factor).

HashMap extends the method of evaluating new indexes in JDK7

Elements from the old array are copied to the new array using the Transfer method, which involves releasing object references from the old Entry, recalculating hash values if necessary, and then evaluating indexes based on the Indexfor () method. The index value can be calculated by h & (Length-1), which is the hash value calculated by HashCode and the array length.

Java’s %, / operations are generally considered about 10 times slower than &, so using & instead of H % length improves performance.

HashMap method of evaluating indexes in JDK8

In addition, the resize process evenly distributes the conflicting nodes to the new bucket, since it is considered random whether the new bit is 0 or 1. This is the new optimization point for JDK1.8. In JDK1.7, when the old list is migrated to the new list, if the new list is in the same array index position, the list element will be inverted. However, as can be seen from the following figure, JDK1.8 will not be inverted, using the tail method.


Rehash of JDK8

Why wrapper classes like String and Integer in HashMap are suitable for key keys


The choice of Key

What methods should be implemented if the key in a HashMap is of type Object?


The importance of rewriting methods

Oh, if I lose my card. I can still be found by searching “Engineer Xiaohui” on wechat