preface

We all know that HashMap is thread-unsafe and is not recommended in a multi-threaded environment, but this article will try to decipher the main thread unsafe aspects of it.

1. HashMap in JDK1.7

In jdK1.7, there are several optimizations for HashMap. The following are some of the optimizations for HashMap: About the Java project collated 100+Java project video + source + notes, address: 100+Java project video + source + notes

public class HashMapTest { public static void main(String[] args) { HashMapThread thread0 = new HashMapThread(); HashMapThread thread1 = new HashMapThread(); HashMapThread thread2 = new HashMapThread(); HashMapThread thread3 = new HashMapThread(); HashMapThread thread4 = new HashMapThread(); thread0.start(); thread1.start(); thread2.start(); thread3.start(); thread4.start(); } } class HashMapThread extends Thread { private static AtomicInteger ai = new AtomicInteger(); private static Map<Integer, Integer> map = new HashMap<>(); @Override public void run() { while (ai.get() < 1000000) { map.put(ai.get(), ai.get()); ai.incrementAndGet(); }}}Copy the code

The above code is relatively simple: multiple threads are constantly put, and both HashMap and AtomicInteger are shared globally. After running the code a few more times, the following loop occurs:

In a few cases, arrays are out of bounds:

Here we focus on the analysis of why there is an infinite loop, through the JPS and jstack name to check the infinite loop, the results are as follows:

From the stack information, we can see the location where the dead loop occurs. From this information, we can clearly know that the dead loop occurs in the expansion function of HashMap, and the root is in the transfer function of HashMap. The transfer function of HashMap in JDK1.7 is as follows:

void transfer(Entry[] newTable, boolean rehash) { int newCapacity = newTable.length; for (Entry<K,V> e : table) { while(null ! = e) { Entry<K,V> next = e.next; if (rehash) { e.hash = null == e.key ? 0 : hash(e.key); } int i = indexFor(e.hash, newCapacity); e.next = newTable[i]; newTable[i] = e; e = next; }}}Copy the code

To summarize the main functions of this function:

After the expansion of table to newTable, the original data needs to be transferred to newTable. Notice lines 10-12 here, it can be seen that in the process of transferring elements, the header method is used, that is, the order of the linked list will be reversed, which is also the key point of forming an infinite loop. The following is a detailed analysis.

1.1 Analysis process of Dead loop caused by Capacity Expansion

Prerequisites:

It is assumed that

1. Hash algorithm for simple linked list size with key mod.

2. If the initial hash table size=2 and key=3,7,5 are in table[1].

3. Then resize to make size 4.

The data structure before resize is as follows:

In a single-threaded environment, the final result is as follows:

I won’t go into detail here, but it should be easy to understand what the Transfer function is doing, its transfer process, and how to reverse the linked list.

And then in A multithreaded environment, let’s say we have two threads A and B both doing put. Thread A hangs at line 11 of the Transfer function, which is so important to analyze here that it is posted again.

The result of running in thread A is as follows:

After thread A is suspended, thread B can execute normally and complete the resize operation. The result is as follows:

A special note here: since thread B has finished executing, newTable and Entry in the table are now the latest values in main memory according to the Java memory model: 7.next=3, 3.next=null.

When thread A is suspended, the value in memory is as follows: e=3, next=7, newTable[3]=null. The code execution process is as follows:

newTable[3]=e ----> newTable[3]=3
e=next ----> e=7
Copy the code

The result is as follows:

Continue the loop:

E =7 next= e.ext ----> next=3 [Value from main memory] E.ext =newTable[3] ----> E.ext =3 [value from main memory] newTable[3]=e ----> newTable[3]=7 e=next ----> e=3Copy the code

The results are as follows:

Loop again:

E =3 next= e.ext ----> next=null e.ext =newTable[3] ----> e.ext =7 3.next=7 newTable[3]=e ----> newTable[3]=3 e=next ----> e=nullCopy the code

Note that this loop: e.ext =7, whereas in the last loop 7.next=3, the circular list appears and e=null completes the loop.

The results are as follows:

This is where an infinite loop will occur whenever polling the data structure of the HashMap is involved in subsequent operations, causing tragedy.

1.2 Analyzing Data Loss caused by Capacity Expansion

Following the above analysis process, initially:

Thread A and thread B put, and thread A suspends:

The running result of thread A is as follows:

Thread B has obtained the CPU time slice and completed the resize operation:

Also note that since thread B has completed, newTable and table are the latest values: 5. Next =null.

Switch to thread A at this point, when thread A is suspended: e=7, next=5, newTable[3]=null.

If newTable [I]=e, place **7 at table[3]** and next=5. Then proceed to the next loop:

E =5 next= e.ext ----> next=null, value from main memory e.ext =newTable[1] ----> e.ext =5, Values from the main memory newTable [1] = e -- -- -- -- > newTable [1] = 5 e = next -- -- -- -- > e = nullCopy the code

5 is placed at table[1], at which point e=null loop ends, 3 elements are lost, and a circular linked list is formed. And creates an infinite loop for subsequent hashMap operations.

2. HashMap in JDk1.8

In jdk1.8, HashMap is optimized to insert the end of a linked list directly into a hash collision, so there is no looped list, but it is still unsafe in multi-threaded situations.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; if ((tab = table) == null || (n = tab.length) == 0) n = (tab = resize()).length; If ((p = TAB [I = (n-1) & hash]) == null) // Insert elements directly if there is no hash collision TAB [I] = newNode(hash, key, value, null); else { Node<K,V> e; K k; if (p.hash == hash && ((k = p.key) == key || (key ! = null && key.equals(k)))) e = p; else if (p instanceof TreeNode) e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); else { for (int binCount = 0; ; ++binCount) { if ((e = p.next) == null) { p.next = newNode(hash, key, value, null); if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st treeifyBin(tab, hash); break; } if (e.hash == hash && ((k = e.key) == key || (key ! = null && key.equals(k)))) break; p = e; } } if (e ! = null) { // existing mapping for key V oldValue = e.value; if (! onlyIfAbsent || oldValue == null) e.value = value; afterNodeAccess(e); return oldValue; } } ++modCount; if (++size > threshold) resize(); afterNodeInsertion(evict); return null; }Copy the code

This is the main function for the PUT operation in HashMap in JDK1.8. Note line 6 that the element is inserted directly if there is no hash collision. If thread A and thread B perform A put operation at the same time, the hash value of the two different data is the same, and the data is null, so thread A and thread B both go to line 6. Assume that thread A is suspended before data insertion after entering, thread B performs normally and inserts data normally. Then thread A obtains the CPU time slice, and thread A does not need to make hash judgment any more. The problem occurs: Thread A overwrites the data inserted by thread B, causing thread insecurity.

This is just a brief analysis of the thread insecurity problem with HashMap in JDK1.8. We will summarize the Collection framework in Java later.

conclusion

First of all, HashMap is thread unsafe, which mainly reflects:

1. In JDK1.7, in a multi-threaded environment, capacity expansion will cause ring chain or data loss.

2. In JDK1.8, in a multi-threaded environment, data overwriting occurs.