
Java HashMap and its optimizations are a staple of interviews. Interviews are about selection, and the questions that require real logical reasoning are the ones that separate candidates. So rather than list facts, I want to reason through the key points and pitfalls with you. Once the reasoning is done, I hope the age-old HashMap questions will no longer be difficult.

How is HashMap implemented?

The basic structure of a HashMap is simple:

  • Each bucket is a singly linked list of entries whose keys hash to the same index
  • A table array holds the buckets

The classic diagram shows a table array whose slots each point to a singly linked list of entries.

The put process

  • put first hashes the key to locate a bucket
  • If the key already exists there, its value is overwritten
  • If not, addEntry inserts a new entry
  • Finally, put decides whether the table needs to grow (a sketch follows below)
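
A minimal sketch of that flow, modeled on the JDK 1.7 source (hash, indexFor, putForNullKey, and addEntry stand in for the real private helpers):

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);          // null keys live in bucket 0
        int hash = hash(key);                     // hash the key
        int i = indexFor(hash, table.length);     // locate the bucket
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;             // key already present: overwrite
                e.value = value;
                return oldValue;
            }
        }
        addEntry(hash, key, value, i);            // key absent: insert a new entry
        return null;                              // addEntry decides whether to resize
    }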

Expansion and its trigger

  • The load factor decides when to expand: once size crosses capacity × loadFactor
  • An expansion doubles the bucket array's capacity
  • Existing entries are rehashed into the new array (sketched below)
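
In code, the trigger looks roughly like this (a simplified sketch of JDK 1.7's addEntry; threshold is capacity × loadFactor, and the default load factor is 0.75):

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if (size >= threshold) {                        // load factor exceeded
            resize(2 * table.length);                   // double the bucket array
            bucketIndex = indexFor(hash, table.length); // recompute the target bucket
        }
        createEntry(hash, key, value, bucketIndex);
    }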

Put into an occupied bucket

  • The new Entry becomes the head of the bucket
  • The existing list is linked in as its next (shown below)
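
Head insertion is just two pointer writes, as in JDK 1.7's createEntry (sketched):

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];                     // current head (may be null)
        table[bucketIndex] = new Entry<>(hash, key, value, e); // new entry points at the old head
        size++;
    }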

Null key processing

If the key is null, the put/get paths define null's hash to be 0, so null keys always land in bucket 0.
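
Concretely, the routing looks like this (a sketch; hash and indexFor are the 1.7 helpers):

    int hash = (key == null) ? 0 : hash(key);   // null key: hash is defined as 0
    int i = indexFor(hash, table.length);       // so null keys always land in bucket 0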

The get process

get computes the hash of the key, locates the bucket, and walks the singly linked list until an equal key is found. If it reaches the end of the list without a match, the key does not exist and null is returned.

  • get first hashes the key to locate a bucket
  • It then walks the bucket's list, comparing keys
  • It returns the value as soon as an equal key is found (sketched below)
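
A minimal sketch of the lookup, again modeled on the JDK 1.7 source:

    public V get(Object key) {
        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null; e = e.next) {            // walk the bucket's list
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e.value;                  // equal key found
        }
        return null;                             // reached the end: key not present
    }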

Red-black trees to optimize search efficiency

  • Java 1.8 adds red-black trees
  • Even long buckets stay efficient: an overlong list is converted to a tree
  • Query efficiency in such a bucket drops from O(N) to O(log N) (thresholds below)
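
The thresholds in the JDK 8 source:

    static final int TREEIFY_THRESHOLD = 8;     // treeify a bucket once its list grows past 8
    static final int UNTREEIFY_THRESHOLD = 6;   // convert back to a list at 6 during resize
    static final int MIN_TREEIFY_CAPACITY = 64; // below 64 buckets, resize instead of treeifying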

What’s the problem? Infinite loop

This is fine as long as you use HashMap in a single thread. Introduce multi-threaded concurrency, however, and you may find CPU usage pinned at 100%. Looking at the stack, you'll be surprised to see threads hanging in HashMap's get method, and the problem disappears after the service restarts.

Why is that?

That's where the reasoning comes in.

As mentioned above, a HashMap uses an Entry array to hold its key-value pairs; when hashes collide, the entries in a bucket are chained into a linked list. During expansion, a larger array is created and the elements are transferred into it. The transfer logic is also clear: iterate over each bucket's linked list in the old table and rehash each element to find its home in the new newTable.

Here is the code for the Java 1.7 implementation:

   void resize(int newCapacity) { // Pass in a new capacity
        Entry[] oldTable = table;  // Reference the Entry array before expansion
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) { // The size of the array before expansion is (2^30)
            threshold = Integer.MAX_VALUE;  // Change the threshold to the maximum value of int (2^31-1), so that there is no future expansion
            return;
        }

        Entry[] newTable = new Entry[newCapacity];  // Initialize a new Entry array
        transfer(newTable); // Move all the elements of the original table into newTable
        table = newTable;  // Assign newTable to table
        threshold = (int)(newCapacity * loadFactor);// Change the threshold
    } 
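
resize delegates the actual move to transfer. Here is a sketch of that method, consistent with the single-argument call above (the JDK 1.7 version adds a rehash flag, but the head-insertion loop is the same):

    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;              // remember the rest of the list
                    int i = indexFor(e.hash, newCapacity); // rehash into the new table
                    e.next = newTable[i];                  // head insertion: reverses the list
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }

Keep an eye on the two local variables e and next: the infinite loop is all about what happens when a thread is suspended between reading them and using them.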

So why does this implementation have problems?

Assume that threads 1 and 2 both trigger an expansion during a put. Thread 1 is suspended inside transfer just after reading its local variables: e points to A and next points to B.

The CPU time slice then switches to thread 2, which completes the whole move. Assuming A, B, and C still hash to the same bucket, thread 2's new list for that bucket is C → B → A.

A, B, and C are still in the same bucket, but the order of these entries in the list is reversed.

When thread 1 is scheduled again and resumes, its stale e and next cause its newTable to form a ring:

In thread 1, e is A and next is B, but because thread 2 has already moved the list, B's next is now A. Following the transfer logic above, thread 1 head-inserts A, then B (whose next still points at A), then follows B.next back to A and head-inserts A again: A and B now point at each other, and the bucket's list contains a ring.

Assume thread 1's expansion then completes: its newTable is published as table. Any subsequent get that hashes into this bucket looking for a key other than A or B walks the ring forever without finding the end of the list: an infinite loop.

In short, the infinite loop is what pins the CPU, and it is a defect of the HashMap expansion implementation. To summarize:

  • Concurrent transfer of entries is unsafe
  • A new bucket can end up containing a ring
  • A get for a key that isn't there then loops forever

Fixing the problem: the 1.8 implementation

The root cause is head insertion: each insertion goes to the head of the list, and transfer also moves entries from the head, reversing the list on every expansion, so two interleaved expansions can cross-link it into a ring. Java 1.8 optimizes this so that each new element is inserted at the tail of the list.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null); // tail insertion
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

The key line is the tail insertion:

p.next = newNode(hash, key, value, null); // tail insertion
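
Tail insertion in putVal is only half of the fix: 1.8's resize also rebuilds each bucket without reversing it. This excerpt from the JDK 8 resize splits one old bucket into a "lo" list that stays at index j and a "hi" list that moves to j + oldCap, appending at the tail in both cases:

    Node<K,V> loHead = null, loTail = null;   // entries staying at index j
    Node<K,V> hiHead = null, hiTail = null;   // entries moving to index j + oldCap
    Node<K,V> next;
    do {
        next = e.next;
        if ((e.hash & oldCap) == 0) {         // the extra hash bit is 0: index unchanged
            if (loTail == null) loHead = e;
            else loTail.next = e;             // appended at the tail: order preserved
            loTail = e;
        } else {                              // the extra hash bit is 1: moves up by oldCap
            if (hiTail == null) hiHead = e;
            else hiTail.next = e;
            hiTail = e;
        }
    } while ((e = next) != null);
    if (loTail != null) { loTail.next = null; newTab[j] = loHead; }
    if (hiTail != null) { hiTail.next = null; newTab[j + oldCap] = hiHead; }

Because entries are appended at the tail, the relative order within each bucket is preserved, so the cross-linking that produced the 1.7 ring cannot occur. Note that HashMap is still not thread-safe in 1.8; the change only removes this particular infinite-loop failure mode.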