Why do read operations on ConcurrentHashMap not need to be locked?

As we know, ConcurrentHashMap (JDK 1.8) is thread-safe. Yet if you look at the source code of its get operation, you will see that it takes no lock at all. This post discusses the question: why doesn't get need a lock?

An overview of ConcurrentHashMap

In JDK 1.7, ConcurrentHashMap is built from Segment + HashEntry + ReentrantLock. JDK 1.8 abandons the bloated Segment design and uses Node + CAS + synchronized instead to guarantee thread safety.

  • The JDK 1.8 implementation reduces lock granularity. In JDK 1.7 a lock covers a whole Segment, which contains multiple HashEntry buckets; in JDK 1.8 a lock covers a single bin (the head Node of one bucket).

  • The JDK 1.8 data structure is simpler and its operations are clearer. Because synchronized is used for synchronization, Segment locks are no longer needed, and neither is the Segment data structure. On the other hand, the finer lock granularity makes the implementation more complex.

  • JDK 1.8 uses red-black trees to optimize long linked lists. Traversing a long linked list takes O(n) time, while a red-black tree lookup takes O(log n). When a bin's list reaches a threshold (TREEIFY_THRESHOLD, 8 nodes), it is converted into a tree, provided the table has at least MIN_TREEIFY_CAPACITY (64) slots; otherwise the table is resized instead. Each structure is thus used where it works best.
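The three ingredients named above (volatile Node fields, CAS on an empty bin, synchronized on a bin's head node) can be illustrated with a minimal, hypothetical sketch. MiniBinMap and its methods are invented for this illustration and are far simpler than ConcurrentHashMap: fixed capacity, no resizing, no treeification, no removal. Writers lock at most one bin, and get takes no lock at all.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical illustration only: fixed capacity, no resize, no treeify, no remove.
class MiniBinMap<K, V> {
    static final class Node<K, V> {
        final int hash;
        final K key;
        volatile V val;           // lock-free readers always see the latest value
        volatile Node<K, V> next; // lock-free readers see nodes appended by writers

        Node(int hash, K key, V val) {
            this.hash = hash;
            this.key = key;
            this.val = val;
        }
    }

    // Element get/set/compareAndSet on AtomicReferenceArray have volatile semantics.
    private final AtomicReferenceArray<Node<K, V>> bins;

    MiniBinMap(int capacityPowerOfTwo) {
        bins = new AtomicReferenceArray<>(capacityPowerOfTwo);
    }

    // Mix the high bits into the low bits and clear the sign bit, so hashes are >= 0.
    private static int spread(int h) {
        return (h ^ (h >>> 16)) & 0x7fffffff;
    }

    /** Lock-free read: a volatile read of the bin head, then volatile next/val reads. */
    public V get(Object key) {
        int h = spread(key.hashCode());
        for (Node<K, V> e = bins.get(h & (bins.length() - 1)); e != null; e = e.next) {
            if (e.hash == h && key.equals(e.key)) {
                return e.val;
            }
        }
        return null;
    }

    public void put(K key, V val) {
        int h = spread(key.hashCode());
        int i = h & (bins.length() - 1);
        while (true) {
            Node<K, V> head = bins.get(i);
            if (head == null) {
                // Empty bin: install the first node with CAS; no lock is taken.
                if (bins.compareAndSet(i, null, new Node<>(h, key, val))) {
                    return;
                }
                continue; // another writer filled the bin first; retry
            }
            synchronized (head) { // lock only this bin, not the whole map
                if (bins.get(i) != head) {
                    continue; // head was replaced (cannot happen in this sketch; kept for the pattern)
                }
                for (Node<K, V> e = head; ; e = e.next) {
                    if (e.hash == h && key.equals(e.key)) {
                        e.val = val; // volatile write: visible to concurrent get()
                        return;
                    }
                    if (e.next == null) {
                        e.next = new Node<>(h, key, val); // volatile write publishes the node
                        return;
                    }
                }
            }
        }
    }
}
```

For example, after `m.put("a", 1); m.put("a", 3);`, `m.get("a")` returns 3. Because a writer locks only the head node of its own bin, two threads writing to different bins never block each other.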

Get operation source code

  1. Compute the hash, locate the bin index in the table, and return the value if the head node matches.

  2. If the head node's hash is negative, it is a special node. During a resize it is a ForwardingNode, which marks a bin that has already been migrated, and its find method looks the key up in nextTable; it can also be a TreeBin, whose find method searches the red-black tree. Return the value if a matching node is found.

  3. Otherwise, traverse the linked list and return the value if a node matches; if none does, return null.

    public V get(Object key) {
        Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
        int h = spread(key.hashCode()); // compute the hash
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) { // read the head node of the bin
            if ((eh = e.hash) == h) {
                // the head node matches: return its value
                if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                    return e.val;
            }
            // A negative hash marks a special node:
            // eh == -1: a ForwardingNode; the bin is being migrated during a resize,
            //           so ForwardingNode.find looks the key up in nextTable.
            // eh == -2: a TreeBin; TreeBin.find searches the red-black tree. Because the
            //           tree may be rotating or recoloring, find uses a read/write lock state.
            // eh >= 0:  the bin is a linked list (handled by the loop below).
            else if (eh < 0)
                return (p = e.find(h, key)) != null ? p.val : null;
            // neither the head node nor a special node: traverse the list
            while ((e = e.next) != null) {
                if (e.hash == h &&
                    ((ek = e.key) == key || (ek != null && key.equals(ek))))
                    return e.val;
            }
        }
        return null;
    }
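To see this behavior from the outside, the following sketch runs a writer thread that calls put while a reader thread calls get with no external lock. The class LockFreeReadDemo is hypothetical; it uses only the public ConcurrentHashMap API. A run like this cannot prove thread safety, but it illustrates the contract: the reader sees either null (key not inserted yet) or the fully written value, never a wrong one, even while the writer's inserts trigger resizes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

class LockFreeReadDemo {
    /** Returns true if the reader never saw a wrong value and all n entries are present at the end. */
    static boolean run(int n) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        AtomicBoolean inconsistent = new AtomicBoolean(false);

        // The writer's puts also trigger several resizes (the default capacity is 16).
        Thread writer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                map.put(i, 2 * i);
            }
        });
        // The reader calls get() with no external lock while the writer is running.
        Thread reader = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                Integer v = map.get(i);
                // null means the writer has not reached key i yet; any other value must be 2*i
                if (v != null && v != 2 * i) {
                    inconsistent.set(true);
                }
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !inconsistent.get() && map.size() == n;
    }
}
```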

If get is not locked, how does ConcurrentHashMap ensure that it does not read stale (dirty) data?

Enter volatile

Java provides the volatile keyword to guarantee visibility and ordering, but not atomicity. An ordinary shared variable does not guarantee visibility: after it is modified, it is uncertain when the new value will be written back to main memory, so another thread that reads it may still get the old value.
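A minimal sketch of the visibility guarantee, using a hypothetical StopFlag class: one thread spins on a flag and another thread clears it. Because running is volatile, the spinning thread is guaranteed to observe the update and exit.

```java
class StopFlag {
    // Without 'volatile', the JIT compiler may hoist the read out of the loop,
    // in which case the worker might never see the update and spin forever.
    private volatile boolean running = true;

    void spin() {
        while (running) {
            // busy-wait; each check is a volatile read
        }
    }

    void stop() {
        running = false; // volatile write: guaranteed to become visible to spin()
    }

    /** Starts a spinning worker, stops it, and reports whether it exited within the timeout. */
    static boolean demo(long timeoutMillis) {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(flag::spin);
        worker.setDaemon(true); // never keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(50); // let the worker start spinning
            flag.stop();
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```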

  • For primitive-typed fields, volatile guarantees that every thread reads the most recently written value. For reference types such as arrays and entity beans, it only guarantees the visibility of the reference itself, not of the contents it points to.

  • Instruction reordering around volatile reads and writes is prohibited.

Background: to improve speed, the processor does not operate on memory directly. It first reads data from system memory into its internal caches (L1, L2, or others) and operates on it there, and it is not known when the result will be written back to memory.

  • When a volatile variable is written, the JVM issues an instruction that makes the processor write the cache line holding the variable back to system memory. However, even after the write reaches memory, other processors may still hold the old value in their caches, and computations that use it would be wrong.

  • To keep the caches of a multiprocessor consistent, a cache coherence protocol is implemented. When a CPU writes data and finds that the variable is shared, it notifies the other CPUs that their cache lines for that variable are invalid. When another CPU later reads the variable and finds its cache line invalid, it reloads the data from main memory.

To sum up:

  • First: volatile forces a modified value to be written to main memory immediately.

  • Second: when thread 2 modifies a volatile variable, the cache line holding that variable in thread 1's working memory (i.e., the CPU's L1 or L2 cache) is invalidated.

  • Third: because that cache line is invalid, thread 1 reads the variable again from main memory.
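At the language level, the Java Memory Model states these guarantees as a happens-before rule: a write to a volatile field happens-before every subsequent read of that field, so everything a thread wrote before the volatile write is visible to a thread that reads the new value. The hypothetical class below uses this rule to publish an ordinary field safely through a volatile flag.

```java
class VolatilePublication {
    private int payload;            // ordinary field
    private volatile boolean ready; // volatile flag guarding the payload

    void publish(int value) {
        payload = value; // (1) ordinary write
        ready = true;    // (2) volatile write; (1) cannot be moved after it
    }

    /** Returns the payload if ready has been observed, otherwise -1. */
    int tryRead() {
        if (ready) {        // a volatile read that sees (2) ...
            return payload; // ... is guaranteed to see (1) as well
        }
        return -1;
    }

    static int demo() {
        VolatilePublication box = new VolatilePublication();
        new Thread(() -> box.publish(42)).start();
        int seen;
        while ((seen = box.tryRead()) == -1) {
            Thread.yield();
        }
        return seen;
    }
}
```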

Is volatile applied to the array?

    /**
     * The array of bins. Lazily initialized upon first insertion.
     * Size is always a power of two. Accessed directly by iterators.
     */
    transient volatile Node<K,V>[] table;

We know that volatile can modify an array, but it does not mean what it seems to. For example, in volatile int[] array = new int[10]; only the array reference is volatile, not the values of its elements.
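The difference can be sketched as follows (the class VolatileElements is hypothetical). A volatile array reference gives no guarantee for element accesses, whereas java.util.concurrent.atomic.AtomicReferenceArray makes every element read and write volatile. In JDK 8, ConcurrentHashMap obtains the same per-element effect internally through its tabAt and setTabAt helpers, which use sun.misc.Unsafe getObjectVolatile and putObjectVolatile; the sketch uses the public AtomicReferenceArray instead.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

class VolatileElements {
    // Only the reference is volatile: 'plain = new String[8]' is a volatile write,
    // but 'plain[i] = x' and reads of 'plain[i]' are ordinary, non-volatile accesses.
    static volatile String[] plain = new String[4];

    // Every element access is volatile here: get(i) is a volatile read and
    // set(i, v) is a volatile write, similar in effect to tabAt/setTabAt.
    static final AtomicReferenceArray<String> bins = new AtomicReferenceArray<>(4);

    /** A writer thread fills slot 2; this thread spins on volatile reads of that slot. */
    static String demo() {
        new Thread(() -> bins.set(2, "node")).start();
        String seen;
        while ((seen = bins.get(2)) == null) {
            Thread.yield(); // the volatile read eventually observes the writer's set
        }
        return seen;
    }
}
```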

Nodes with volatile fields

get needs no lock because the Node fields val and next are declared volatile. In a multithreaded environment, when thread A modifies val or appends a new Node, the change is visible to thread B.

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        // both val and next are volatile
        volatile V val;
        volatile Node<K,V> next;

        Node(int hash, K key, V val, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.val = val;
            this.next = next;
        }

        public final K getKey()       { return key; }
        public final V getValue()     { return val; }
        public final int hashCode()   { return key.hashCode() ^ val.hashCode(); }
        public final String toString(){ return key + "=" + val; }
        public final V setValue(V value) {
            throw new UnsupportedOperationException();
        }

        public final boolean equals(Object o) {
            Object k, v, u; Map.Entry<?,?> e;
            return ((o instanceof Map.Entry) &&
                    (k = (e = (Map.Entry<?,?>)o).getKey()) != null &&
                    (v = e.getValue()) != null &&
                    (k == key || k.equals(key)) &&
                    (v == (u = val) || v.equals(u)));
        }

        /**
         * Virtualized support for map.get(); overridden in subclasses.
         */
        Node<K,V> find(int h, Object k) {
            Node<K,V> e = this;
            if (k != null) {
                do {
                    K ek;
                    if (e.hash == h &&
                        ((ek = e.key) == k || (ek != null && k.equals(ek))))
                        return e;
                } while ((e = e.next) != null);
            }
            return null;
        }
    }

If volatile on the array does not make its elements volatile, what is the volatile on table for?

It makes the new Node array visible to other threads when the table is expanded.
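A hypothetical, single-writer sketch of why the array reference itself must be volatile during expansion. The class ResizeDemo is invented for illustration and does not follow ConcurrentHashMap's real transfer algorithm, which migrates bins incrementally and marks them with ForwardingNode. The writer fills a larger array completely and then assigns it to the volatile field, so any reader that sees the new reference also sees its contents. Without volatile, the Java Memory Model would allow a reader to see the new reference before the element stores, that is, a partly zero-filled array.

```java
import java.util.Arrays;

class ResizeDemo {
    // Volatile reference to the current array (ConcurrentHashMap's table is declared the same way).
    private volatile int[] table = {1, 2};

    /** Single writer: build the larger array completely, then publish it with one volatile write. */
    void grow() {
        int[] old = table;
        int[] bigger = Arrays.copyOf(old, old.length * 2);
        for (int i = old.length; i < bigger.length; i++) {
            bigger[i] = i + 1; // fill the new slots before publishing
        }
        table = bigger; // volatile write: every store above happens-before a read that sees 'bigger'
    }

    /** Lock-free reader: read the reference once, then sum that array's elements. */
    int sum() {
        int[] t = table; // volatile read
        int s = 0;
        for (int x : t) {
            s += x;
        }
        return s;
    }

    /** Returns true if the reader only ever saw fully built arrays (sums 3, 10 or 36). */
    static boolean demo() {
        ResizeDemo d = new ResizeDemo();
        new Thread(() -> { d.grow(); d.grow(); }).start(); // 2 -> 4 -> 8 slots
        boolean onlyCompleteArrays = true;
        int s;
        do {
            s = d.sum();
            if (s != 3 && s != 10 && s != 36) {
                onlyCompleteArrays = false; // would mean a half-built array was visible
            }
            Thread.yield();
        } while (s != 36);
        return onlyCompleteArrays;
    }
}
```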

Conclusion

  • In JDK 1.8, the get operation of ConcurrentHashMap takes no lock at any point. This is one of the reasons it is both safe and more efficient than other thread-safe maps such as Hashtable or a HashMap wrapped with Collections.synchronizedMap().

  • get needs no lock because the Node fields val and next are volatile, and so is the table array reference.

  • The main purpose of volatile on the table array is to make the array visible to other threads when it is expanded.