Symptoms of a problem

Our Java code used to use a HashMap thing for some reason, but back then the program was single-threaded and everything was fine. Later, we had performance problems with our program, so we needed to make it multithreaded, and when we went online, we found that the program was often using 100% of the CPU. When we looked at the stack, we found that the program was stuck on hashmap.get (), and when we restarted the program, the problem disappeared. But they’ll come back in a little while. Also, this problem may be difficult to reproduce in a test environment.

Hash table data structure

A HashMap usually uses an array of Pointers (let’s say table[]) to scatter all the keys. When a key is added, the Hash algorithm will calculate the index I of the array using the key, and then insert the <key, value> into the table[I]. If two different keys count in the same I, this is called a collision, and this will form a linked list on the table[I].

We know that if table[] is small in size, say 2 keys, and 10 keys are to be put in, then collisions are so frequent that an O(1) lookup algorithm becomes O(n) traversal, which is the drawback of Hash tables.

Therefore, the size and capacity of the Hash table are very important. In general, when the Hash table container has data to insert, the thredhold is checked to see if it exceeds the thredhold. If it does, the size of the Hash table is increased, but the entire Hash element needs to be recalculated. It’s called rehash, and it’s a pretty big cost.

HashMap rehash source code

Put a Key and Value pair into the Hash table:

public V put(K key, V value)
{.../ / calculate the Hash value
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    // If the key has already been inserted, replace the old value (link operation)
    for(Entry<K,V> e = table[i]; e ! =null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // The key does not exist and a node needs to be added
    addEntry(hash, key, value, i);
    return null;
}
Copy the code

Check whether the capacity exceeds the limit

void addEntry(int hash, K key, V value, int bucketIndex)
{
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // Check whether the current size exceeds the threshold we set. If so, resize is required
    if (size++ >= threshold)
        resize(2 * table.length);
}
Copy the code

Create a larger hash table and migrate the data from the old hash table to the new hash table

void resize(int newCapacity)
{
    Entry[] oldTable = table;
    intoldCapacity = oldTable.length; .// Create a new Hash Table
    Entry[] newTable = new Entry[newCapacity];
    // Migrate data from Old Hash Table to New Hash Table
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}

void transfer(Entry[] newTable)
{
    Entry[] src = table;
    int newCapacity = newTable.length;
    // The following code means:
    // Take an element from OldTable and place it in NewTable
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if(e ! =null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while(e ! =null); }}}Copy the code

Normal ReHash process

  • I assumed that our hash algorithm would simply mod the table size (that is, the length of the array) with a key.
  • Hash table[1]; hash table[1]; hash table[1]
  • The next three steps are to resize the Hash table to 4 and then rehash all

    .
    ,value>

Concurrent Rehash

1. Suppose we have two threads. I’ve highlighted it in red and light blue.

Here’s another detail from the Transfer code:

do {
    Entry<K,V> next = e.next; // <-- suppose the thread is scheduled to suspend as soon as it reaches this point
    int i = indexFor(e.hash, newCapacity);
    e.next = newTable[i];
    newTable[i] = e;
    e = next;
} while(e ! =null);
Copy the code

And our thread two execution is done. So we have something like this.

Note that since e of Thread1 points to key(3) and next points to key(7), it points to the reassembled list of thread 2 after thread 2 rehash. We can see that the order of the linked list is reversed.

2. Thread 1 is scheduled for execution.

  • NewTalbe [I] = e
  • And then e = next, causing e to point to key(7).
  • Next = e.next causes next to point to key(3).

3. Everything is fine

The thread resumed work. Remove key(7) and place it first in newTable[I], then move e and next down.

4. Circular links appear

E.next = newTable[I] key(3).next points to key(7).

So, when our thread calls hashtable.get (11), the tragedy occurs — Infinite Loop.