Thread safety of HashMap

HashMap is thread-unsafe.

In order to use a thread-safe data structures, multi-threaded conditions, can use the Collections. SynchronizedMap method constructed a synchronized Map, or directly using a thread safe ConcurrentHashMap.

Java 7 segmented locking based ConcurrentHashMap

Note: This chapter is based on JDK 1.7.0_67

The data structure

The underlying data structures of ConcurrentHashMap in Java 7 are still arrays and linked lists.

Unlike a HashMap, the outer layer of a ConcurrentHashMap is not a large array, but an array of segments.

Each Segment contains a linked list array similar to a HashMap data structure. The overall data structure is shown below.

Addressing mode

When reading or writing a Key, the hash value of the Key is first obtained.

It modulates the number of segments by the high N bits of the hash value to find which Segment the Key should belong to, and then operates on the Segment as if it were a HashMap.

To ensure that different values are evenly distributed across different segments, hash values need to be calculated using the following method.

private int hash(Object k) {
  int h = hashSeed;
  if ((0! = h) && (kinstanceof String)) {
    return sun.misc.Hashing.stringHash32((String) k);
  }
  h ^= k.hashCode();
  h += (h <<  15) ^ 0xffffcd7d;
  h ^= (h >>> 10);
  h += (h <<   3);
  h ^= (h >>>  6);
  h += (h <<   2) + (h << 14);
  return h ^ (h >>> 16);
}
Copy the code

In order to improve the efficiency of modular operation, ssize is the smallest power 2 to the N greater than concurrencyLevel by the following calculation, and segmentMask is 2^ n-1.

This is consistent with the method used to calculate the length of the array.

For a hash of a Key, just move the segmentShift bit to the right to get the sshift bit higher, and then take and operate the segmentMask to get its index in the Segment array.

int sshift = 0;
int ssize = 1;
while (ssize < concurrencyLevel) {
  ++sshift;
  ssize <<= 1;
}
this.segmentShift = 32 - sshift;
this.segmentMask = ssize - 1;
Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
Copy the code

synchronously

Segments inherit from ReentrantLock, so we can easily lock each Segment.

For read operations, you need to ensure visibility when obtaining the Segment where the Key resides (see How to Ensure visibility in multithreaded conditions).

The implementation can use the volatile keyword or lock. But using locks was expensive, as was using volatile, which invalidated all CPU caches with each write.

ConcurrentHashMap uses the following methods to ensure visibility and get the latest Segment.

Segment<K,V> s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)
Copy the code

A similar method is used to get a HashEntry in a Segment

HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
  (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE)
Copy the code

For write operations, it is not required to lock all segments at the same time, as that would lock the entire Map.

It obtains the key-value lock on the Segment. After obtaining the lock, it can operate on the Segment just like a normal HashMap and guarantee the security of the Segment.

Because locks on other segments are not acquired, concurrent read-write concurrencyLevel (equal to the number of segments) is theoretically supported.

When acquiring a lock, you do not use lock directly because it will hang if the method fails to acquire the lock.

In fact, it uses a spin lock, and if tryLock fails to acquire the lock, the lock is being held by another thread, at which point it loops through again to tryLock the lock.

Resets the retry times if the Key’s linked header is changed during the loop. If the retry times exceed a certain value, apply for a lock using the lock method.

Spinlocks are used here because they are more efficient, but they consume more CPU resources, so they are switched to mutex when the number of spins exceeds the threshold.

The size operation

The put, remove, and get operations only care about one Segment, while the size operation needs to traverse all segments to figure out the size of the entire Map.

A simple solution is to lock all segments and then unlock them after the calculation is complete.

However, in this way, the Map cannot be written or read during the size operation, which is not conducive to parallel operations on the Map.

To support concurrent operations, ConcurrentHashMap computs the size of each Segment 3 times without locking it. The modCount of the Segment is equal every time the Segment is modified, indicating that there is no update operation in the two calculation processes. Therefore, the total size calculated by the two calculation processes is equal, which can be directly returned as the final result.

If the Map is updated during the three calculations, lock all segments and recalculate Size.

The calculation method code is as follows

public int size(a) {
  final Segment<K,V>[] segments = this.segments;
  int size;
  boolean overflow; // true if size overflows 32 bits
  long sum;         // sum of modCounts
  long last = 0L;   // previous sum
  int retries = -1; // first iteration isn't retry
  try {
    for (;;) {
      if (retries++ == RETRIES_BEFORE_LOCK) {
        for (int j = 0; j < segments.length; ++j)
          ensureSegment(j).lock(); // force creation
      }
      sum = 0L;
      size = 0;
      overflow = false;
      for (int j = 0; j < segments.length; ++j) {
        Segment<K,V> seg = segmentAt(segments, j);
        if(seg ! =null) {
          sum += seg.modCount;
          int c = seg.count;
          if (c < 0 || (size += c) < 0)
            overflow = true; }}if (sum == last)
        break; last = sum; }}finally {
    if (retries > RETRIES_BEFORE_LOCK) {
      for (int j = 0; j < segments.length; ++j) segmentAt(segments, j).unlock(); }}return overflow ? Integer.MAX_VALUE : size;
}
Copy the code

The difference

ConcurrentHashMap has the following differences compared with HashMap

  1. ConcurrentHashMap is thread-safe, while HashMap is not thread-safe

  2. HashMap allows the Key and Value to be null, whereas ConcurrentHashMap does not

  3. A HashMap does not allow Iterator traversal to be modified through a HashMap, whereas ConcurrentHashMap allows this behavior, and the update is visible to subsequent traversals

Java 8 Cas-based ConcurrentHashMap

Note: Code in this chapter is based on JDK 1.8.0_91

The data structure

Java 7 introduces the Segment structure to achieve parallel access, and implements segmental locking. Theoretically, the maximum concurrency is equal to the number of segments.

To further improve concurrency, Java 8 eliminates the piecewise locking scheme and instead uses a large array.

In order to improve the addressing performance under hash collisions, Java 8 converts linked lists (O(N)) to red-black trees (O(log(N)) when the list length exceeds a certain threshold (8).

Its data structure is shown in the figure below:

Addressing mode

Java 8’s ConcurrentHashMap also determines the index of the Key in the array by modulating the hash value of the Key and the array length.

Also to avoid a bad hashCode design for keys, it calculates the final hash value of the Key as follows.

In contrast, the author of ConcurrentHashMap for Java 8 believes that the addressing efficiency is high enough after the introduction of red-black trees, so the author does not design too much in the calculation of hash values. Just xor the Key’s hashCode value 16 bits above it and ensure that the highest bit is 0 (thus ensuring that the final result is a positive integer).

static final int spread(int h) {
  return (h ^ (h >>> 16)) & HASH_BITS;
}
Copy the code

synchronously

For the PUT operation, if the array element corresponding to the Key is NULL, the CAS operation sets it to the current value.

If the array element corresponding to the Key (that is, the head of the list or the root of the tree) is not null, the synchronized keyword is used to lock the element and then the operation is performed.

If the PUT operation causes the current list length to exceed a certain threshold, the list is converted into a tree to improve addressing efficiency.

For reads, arrays are not visible because they are decorated with the volatile keyword.

At the same time, each element is a Node instance (in Java 7, each element is a HashEntry), and its Key and hash values are fixed by final, so there is no need to worry about their visibility after modification.

The Value and reference to the next element are volatile, and visibility is guaranteed.

static class Node<K.V> implements Map.Entry<K.V> {
  final int hash;
  final K key;
  volatile V val;
  volatile Node<K,V> next;
}
Copy the code

Visibility for array elements corresponding to keys is guaranteed by unsafe.getobjectVolatile ().

static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) {
  return (Node<K,V>)U.getObjectVolatile(tab, ((long)i << ASHIFT) + ABASE);
}
Copy the code

The size operation

Both the PUT and remove methods maintain the Map’s size through the addCount method.

The size method uses sumCount to get the size of the Map maintained by the addCount method.

public int size(a) {
    long n = sumCount();
    return ((n < 0L)?0 :
            (n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
            (int)n);
}
Copy the code

summary

HashMap and List are two of my favorite data structures, simple and useful.

ConcurrentHashMap improves itself in different JDK versions for better performance, how about you and me?

Hope you found this article helpful, and if you have any other ideas, please share them in the comments section.

All geeks’ likes, favorites and forwarding are the biggest motivation for Lao Ma to continue writing!