Understand the implementation principles of HashMap and ConcurrentHashMap

Offer to come, dig friends take it! I am participating in the 2022 Spring Recruit Punch card activity. Click here for details.

Abstract

As Java developers, we often use containers in programming, such as List, Map, and Set. Today we will learn the most common HashMap and ConcurrentHashMap. Compare HashMap and ConcurrentHashMap, find out their similarities and differences, and why ConcurrentHashMap is thread-safe.

Source code based on JDK8 version

HashMap

HashMap is one of the backbone members of Map family. It inherits AbstractMap abstract class and implements Map interface. It is a data structure stored by key-value pair.

Features of HashMap:

There can be only one NULL key, and the subscript of the null key is 0
The key cannot be repeated
Is thread unsafe
Based on array, linked list and red black tree implementation

Definition of HashMap in Java:

public class HashMap<K.V> extends AbstractMap<K.V>
    implements Map<K.V>, Cloneable.Serializable {}Copy the code

Structure of HashMap:

JAVA7 differs from JAVA8 in structure by adding a red-black tree, and the time complexity of searching changes from O(n) to O(logn).

Put method

A HashMap stores a value from a collection using the PUT (K key, V value) method

public V put(K key, V value) {
        return putVal(hash(key), key, value, false.true);
    }
Copy the code

putVal:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the array is not created or its length is 0, the resize() method is used to initialize it
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // If the node under the subscript of key is null, create a new node and store it
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else { // If hash conflicts
            Node<K,V> e; K k;
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                e = p; // If the hash and key are the same and the original node is replaced, the original value will be overwritten
            else if (p instanceof TreeNode) // If the node is a treeNode, execute putTreeVal
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) { // If the next node of the node is null, the new node is placed on the next node
                        p.next = newNode(hash, key, value, null);
                        // If the list length is greater than or equal to 8, convert to a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st 
                            treeifyBin(tab, hash);
                        break;
                    }
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        break; p = e; }}// If there are nodes with the same key
            if(e ! =null) { // existing mapping for key 
                V oldValue = e.value; 
                // Check whether the no override switch is enabled. If the switch is enabled, the original value cannot be overwritten. If the switch is not enabled, the new value can be set
                if(! onlyIfAbsent || oldValue ==null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue; //
            }
        }
        ++modCount; // The number of operations is increased by one
        if (++size > threshold) // Check whether the element in the array reaches the capacity threshold
            resize();
        afterNodeInsertion(evict);
        return null;
    }
Copy the code

The get method

Get (Object key): The get method is used to pass in a key to obtain the corresponding value

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
Copy the code

The getNode() method is used to obtain the value, and the hash() method computes the hash value of the key

 final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if((tab = table) ! =null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) ! =null) {
            if (first.hash == hash && // always check first node((k = first.key) == key || (key ! =null && key.equals(k))))
                return first; // If the subscript is found by key, and the node corresponding to the subscript has the same key as the key
            if((e = first.next) ! =null) {// If the first node is not found, check to see if it is on the conflicting chain
                if (first instanceof TreeNode) // If the head node is treeNode, it represents a red-black tree
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {// Since the next node of first is already assigned to e, use the do while loop.
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        return e; // Return if found in the list
                } while((e = e.next) ! =null); }}return null; // Return null if none is found
    }
Copy the code

The hash method

static final int hash(Object key) {
        int h;
        return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
    }
Copy the code

The hash of the null value is 0

Hash (key) = (array length -1) + hash(key) = (array length -1) + hash(key) Here’s a reference to an image from the web to make it clearer:

The resize method

The resize method is relatively complex, mainly has the following points

During Capacity expansion, newCap and newThr are calculated. These are the abbreviations of two words, one is Capacity and the other is Threshold.
NewCap array bucket for innovation New Node[newCap];
With the expansion, the elements that were originally stored in linked lists and red-black trees due to hash collisions need to be split and placed in new locations.

Expansion Opportunity:

When the size of the array is larger than the threshold threshold, the array is expanded. When the table is null for the first time, resize() is executed. In this case, the value of threshold is initialCapacity*DEFAULT_LOAD_FACTOR. DEFAULT_LOAD_FACTOR The default load factor is 0.75. That is, if the array is initialized to 16, 16*0.75=12, and if the array is used to a size greater than 12, it needs to be expanded. The expanded array is twice the size of the original array, which is 16<<1 = 32.

InitialCapacity is the length of the array to be initialized. You can set the size of the array by calling the constructor parameter.

The following method is used to find the smallest power of 2 greater than or equal to initialCapacity (if initialCapacity is a power of 2, it will return the same number) and assign the value to threshold, which is the threshold for expansion.

static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
Copy the code

When to turn a linked list into a red-black tree:

If the length of a subscript list is greater than or equal to 8, it does not necessarily turn into a red-black tree, but the length of the array must be greater than or equal to 64. When the collision chain is 8, it will determine whether the current array size is less than 64. If it is less than 64, it will expand the array size and hash all keys again. When the array size is not less than 64, it will turn the list into a red-black tree.

Why is HashMap thread unsafe?

When two threads hash, they both put a new node into it, but they don’t always succeed. One of them might be overwritten.

In addition, when multiple threads expand capacity, only the array created by the last thread is effective. The array created by the previous thread is overwritten. If the previous thread stores values in the array after capacity expansion, these values will be lost.

ConcurrentHashMap

ConcurrentHashMap is a container commonly used in Concurrent Java programming. It inherits AbstractMap class and implements ConcurrentMap interface. Different from HashMap, ConcurrentHashMap is a multithread safe container.

Definition in Java:

public class ConcurrentHashMap<K.V> extends AbstractMap<K.V>
    implements ConcurrentMap<K.V>, Serializable {}Copy the code

Its underlying data structure, add, delete and query implementation is similar to HashMap, but with a thread-safe implementation.

Features:

NullPointerException is thrown because null keys and values are not allowed
thread-safe

How to achieve thread safety?

JDK 1.7

Using a segmented lock (ReentrantLock + Segment + HashEntry) is equivalent to dividing a HashMap into segments and assigning one lock to each Segment. This supports multi-threaded access.

Lock granularity: Based on Segment and contains multiple Hashentries.

ConcurrentHashMap is an array of segments that can be locked by inheritingReentrantLock. Therefore, every Segment that needs to be locked is locked. Global thread safety is achieved.

JDK 1.8

CAS + synchronized + Node + red-black tree

Lock granularity: Node (the first Node). Lock granularity has been reduced.

The ConcurrentHashMap Node differs from the HashMap Node in that it is volatile on vaule and next to maintain visibility.

static class Node<K.V> implements Map.Entry<K.V> {
    // List data structure
    final int hash;    / / the hash value of the key
    final K key;       //key
    // Both val and next will change during capacity expansion, so volatile is used to maintain visibility and disallow reordering
    volatile V val;   // Get does not need to be locked at all because Node val is volatile
    volatile Node<K,V> next;     
}
Copy the code

The array is also volatile to ensure visibility during expansion.

 transient volatile Node<K,V>[] table;
Copy the code

Let’s take a look at

static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) {
        return (Node<K,V>)U.getObjectVolatile(tab, ((long)i << ASHIFT) + ABASE);
    }
Copy the code

Thread synchronization mechanism:

Addressing sizeCtl and location-sensitive nodes, the unsafe method is always used for concurrency safety
When a node needs to be set at a certain location, the node is locked through the synchronization mechanism.
When expanding the array, use the step size and FWD node to ensure concurrent security by setting the hash value to MOVED
When a node at a certain location is copied to the expanded table, the Synchronized synchronization mechanism is also used to ensure the security of the present program

conclusion

This is the end of the study of HashMap and ConcurrentHashMap. Most of the above content is a summary of my own learning, and I refer to some online blogs. If there is anything wrong, please point out and correct it.

In my opinion, learning requires reviewing the old and learning the new. The learned knowledge is not easy to forget after a period of time, so repeated learning and understanding are needed to deepen memory.

Reference:

www.zhihu.com/question/20…

Blog.csdn.net/weixin_4318…