It was never really clear to me what the root cause of HashMap's thread-unsafety is. That would not do, so I went hunting for information and worked through the source with my own hands — you remember things much better that way! Off-topic: March and April this year are unlike any before; the things I handle most now are the thermometer gun, the health code, and masks. Wearing a mask all day at work feels suffocating. I really hope the epidemic passes quickly.
An introduction to HashMap
A HashMap is the Java collection that stores key-value pairs. It is backed by an array of buckets, each holding a linked list (red-black trees were added in 1.8 for long chains). It permits one null key and any number of null values.
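As a quick illustration of the null handling mentioned above (a minimal sketch; the class and method names are mine):

```java
import java.util.HashMap;

public class NullKeysDemo {
    // HashMap permits one null key and any number of null values
    static HashMap<String, String> build() {
        HashMap<String, String> map = new HashMap<>();
        map.put(null, "value-for-null-key"); // the null key always hashes to bucket 0
        map.put("k1", null);                 // null values are fine too
        map.put("k2", null);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(build().get(null)); // prints value-for-null-key
    }
}
```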
HashMap in version 1.7
The reason behind the problem
In 1.7, concurrent resizing can produce infinite loops and lost data. Note that 1.7 inserts elements at the head of a bucket's linked list (head insertion), which is what makes the resize race destructive.
The relevant code
The transfer method
```java
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    // Move every chain in the old table into newTable
    for (Entry<K,V> e : table) {
        while (null != e) {
            // Save e's next node
            Entry<K,V> next = e.next;
            // Recompute the hash if required
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // Find the slot in the new table
            int i = indexFor(e.hash, newCapacity);
            // Head insertion: e's next now points at the current head of the new bucket
            e.next = newTable[i];
            // e becomes the new head of the bucket
            newTable[i] = e;
            // Move on to the next node
            e = next;
        }
    }
}
```
Where the transfer method above can go wrong under multiple threads:

- `e.next = newTable[i];` is harmless on a single thread. But suppose thread A is suspended right after reading `next`, and thread B then runs the entire resize to completion. Because of head insertion, thread B reverses the order of the bucket's chain. When thread A resumes its while loop, the links it saved now point backwards, so it rewires the nodes into a ring — an infinite loop on the next lookup — and any nodes further down the original chain become unreachable, so data is lost. (1.8 avoids this: it uses two head/tail pairs of temporary variables to build the new chains, without disturbing the old structure until the new lists are complete.)
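The interleaving described above can be replayed deterministically on a single thread. The sketch below (the `Node` class and all names are mine, not JDK code) mimics two threads running `transfer()` on a shared two-node chain: "thread 1" is suspended just after reading `next`, "thread 2" reverses the chain with head insertion, and thread 1's resumed loop then wires the shared nodes into a ring:

```java
import java.util.IdentityHashMap;

public class Jdk7ResizeRace {
    // Stand-in for the JDK 7 HashMap.Entry: just a key and a next pointer
    static class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Replays the classic interleaving and reports whether the bucket ends up cyclic
    static boolean simulateRace() {
        // Old bucket: A -> B, and both keys land in the same new bucket
        Node b = new Node("B", null);
        Node a = new Node("A", b);

        // Thread 1 runs transfer() just past "Entry<K,V> next = e.next;"
        // and is then suspended, holding stale references
        Node e1 = a;
        Node next1 = e1.next; // next1 == B

        // Thread 2 now runs the whole transfer. Head insertion reverses
        // the chain into thread 2's table: B -> A
        Node table2 = null;
        for (Node e = a; e != null; ) {
            Node next = e.next;
            e.next = table2;
            table2 = e;
            e = next;
        }

        // Thread 1 resumes with its saved e/next and its own empty bucket,
        // unaware that thread 2 has already rewired the shared nodes
        Node table1 = null;
        Node e = e1, next = next1;
        while (e != null) {
            e.next = table1; // head insertion again
            table1 = e;
            e = next;
            next = (e != null) ? e.next : null;
        }
        // Result: A.next == B and B.next == A, so the bucket is a ring;
        // a later get() on this bucket would loop forever
        return hasCycle(table1);
    }

    static boolean hasCycle(Node head) {
        IdentityHashMap<Node, Boolean> seen = new IdentityHashMap<>();
        for (Node n = head; n != null; n = n.next)
            if (seen.put(n, Boolean.TRUE) != null) return true;
        return false;
    }

    public static void main(String[] args) {
        System.out.println("cycle formed: " + simulateRace());
    }
}
```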
HashMap in version 1.8
The reason behind the problem
The 1.8 HashMap reworks the 1.7 code. It no longer suffers the 1.7 infinite loop, but it introduces a new problem: data overwriting. Note also that 1.8 switches to tail insertion for new elements.
The relevant code
The putVal method
```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is empty, create it via resize() and take its length
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the computed bucket is empty, insert directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // The bucket already holds data
        Node<K,V> e; K k;
        // If the head node has the same key (by == or equals()),
        // remember it so its value can be overwritten below
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the bucket is a red-black tree, insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise walk the list; if its length reaches the threshold,
            // convert the list to a red-black tree
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null); // tail insertion
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Stop if an existing node in the bucket has the same key
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // e != null means the list already contains the key:
        // just overwrite the value and return oldValue
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // A key-value was inserted: record the structural modification
    // (used by the fail-fast mechanism)
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
Examples of where the putVal method above is not thread-safe:

- `if ((p = tab[i = (n - 1) & hash]) == null)` — suppose thread A executes this check, sees an empty bucket, and is then suspended because its time slice runs out. Thread B, whose key hashes to the same index, runs next and completes a normal insert into that bucket. When thread A is rescheduled, it has already passed the collision check, so it does not re-examine the bucket and inserts directly, overwriting the entry thread B just wrote. Thread B's data is silently lost.
- `++size` — this increments the map's element count, but `++` is a read-modify-write, not an atomic operation. Suppose the current size is 10 and threads A and B both perform a put. Thread A reads size = 10 from main memory, prepares to add 1, and then loses the CPU. Thread B runs its whole put, also reads 10, and writes size = 11 back to main memory. Thread A then resumes and likewise writes size = 11. Two puts were performed, but size grew by only 1: one update was overwritten.
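The `++size` lost update is easy to replay step by step. The snippet below is a single-threaded simulation of that interleaving (not real concurrency; names are mine), spelling out the separate read and write halves of `++`:

```java
public class SizeLostUpdate {
    // Deterministic replay of the "++size" race: ++ is a read-modify-write,
    // so if both threads read before either writes, one increment is lost.
    static int simulateRace() {
        int size = 10;      // current size in main memory
        int readA = size;   // thread A reads 10, then loses its time slice
        int readB = size;   // thread B also reads 10
        size = readB + 1;   // thread B writes 11
        size = readA + 1;   // thread A resumes and writes 11 again
        return size;        // two puts, but size grew by only 1
    }

    public static void main(String[] args) {
        System.out.println("size after two 'puts': " + simulateRace()); // 11, not 12
    }
}
```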
The resize method
```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // Taken on every expansion except the first (the first happens on the first put)
    if (oldCap > 0) {
        // Capacities are powers of two so that hash & (cap - 1) works as a mask.
        // Integer.MAX_VALUE (2^31 - 1) is not a power of two, so growing past
        // MAXIMUM_CAPACITY (2^30) would waste half the index space; instead,
        // just stop resizing once 2^30 is reached.
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // Reached when the map was constructed with an initial capacity:
    // mainly recomputes the threshold
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // Reached when the map was constructed with no arguments
    else { // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Recompute the threshold if it was not set above
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Allocate the new table
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // Copy the data into the new table
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single node is moved directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A tree bin is split by the red-black tree code
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    // Two head/tail pairs, because after doubling, each element's
                    // index either stays the same or becomes old index + oldCap
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Walk the bucket's list
                    do {
                        next = e.next;
                        // Index unchanged
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;      // first node of the lo list
                            else
                                loTail.next = e; // append at the tail
                            loTail = e;
                        }
                        // Index moves to j + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // The list that keeps the old index
                    if (loTail != null) {
                        // Cut the stale link: the tail may still point at a node
                        // that was routed to the other list
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // The list that moves to the new index
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```
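The lo/hi split works because of a bit-arithmetic identity: after the capacity doubles, the new index `hash & (newCap - 1)` equals the old index plus, at most, the single extra bit `hash & oldCap`. A small check (class and method names are mine) verifies the shortcut against the full mask computation:

```java
public class SplitIndexDemo {
    // After doubling, the new index depends on one extra hash bit (hash & oldCap):
    // 0 -> index unchanged, 1 -> index + oldCap. This is why resize() only needs
    // the lo/hi head-tail pairs instead of recomputing the index for every node.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash = 0; hash < 64; hash++) {
            // the shortcut always matches the full mask computation
            if (newIndex(hash, oldCap) != (hash & (newCap - 1)))
                throw new AssertionError("mismatch at hash " + hash);
        }
        System.out.println("lo/hi split matches hash & (newCap - 1) for all hashes");
    }
}
```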
An example of where the resize method above can lose data under multiple threads:

- Threads A and B both trigger a resize at the same time, each building its own new table from the shared old one. Suppose thread A finishes its resize first and then inserts a new entry. Thread B, still copying, has already iterated past the slot involved, so the table thread B eventually installs does not contain thread A's insert. When thread B's resize completes and replaces `table`, thread A's update is lost.
Conclusion
The thread-unsafety of HashMap mainly shows up in two ways:
- In JDK 1.7, concurrent resizing can produce circular linked lists (infinite loops) and data loss.
- In JDK 1.8, concurrent put operations can overwrite each other's data.
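Since the takeaway is that a plain HashMap must not be written from multiple threads, the natural follow-up is what to use instead: `ConcurrentHashMap` (preferred) or `Collections.synchronizedMap(new HashMap<>())`. A minimal sketch (class and method names are mine) showing that ConcurrentHashMap keeps every put even under contention:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeAlternativesDemo {
    // N threads each put distinct keys into a shared ConcurrentHashMap;
    // unlike plain HashMap, no insert is ever lost
    static int concurrentPuts(int threads, int perThread) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    map.put(id + "-" + i, i); // keys are distinct per thread
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return map.size(); // always threads * perThread
    }

    public static void main(String[] args) {
        System.out.println(concurrentPuts(8, 1000)); // prints 8000
    }
}
```

The same loop over a plain HashMap typically ends with fewer entries than expected, for exactly the reasons discussed above.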
If you have other ideas, please leave a comment below — I'll look into them carefully. Not going out on the weekend also saves a mask 😃.
Why is HashMap thread unsafe in JDK1.7 and JDK1.8?