Java source code parsing HashMap

We believe that the underlying principle of HashMap is more or less understood, here is a brief mention, jdK1.8 version after the internal HashMap mainly through array + linked list + red-black tree data structure to store elements, this article we will analyze several main methods of HashMap, look at the source code of HashMap is specifically how to achieve. And what optimizations the JDK has made to improve performance.

The Node Node

static class Node<K.V> implements Map.Entry<K.V> {
    / / hash value
    final int hash;
    // Element key
    final K key;
    // The element value
    V value;
    // point to the next node
    Node<K,V> next;

    // Omit other code here...
}
Copy the code

The Node class is an internal class of the HashMap. It represents the element to be added to the HashMap. The next field represents a pointer to the next Node

Put method

public V put(K key, V value) {
    return putVal(hash(key), key, value, false.true);
}
Copy the code

The put method calls putVal directly, and you can see that the first argument is passed a hash(key), so let’s look at the hash method

static final int hash(Object key) {
    int h;
    return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code

This hash method is used to calculate the hash value of the element put into the map method. This code (h = key.hashcode ()) ^ (h >>> 16) obtains the hash value of the key by using key.hashcode (). Then xor (h >>>16) computs the final hash value. H >>>16 moves the hash value 16 bits to the right

If you are not familiar with right shift >>> and xor ^, you may be confused by this line of code. For example:

H value: 1111 1111 1111 1111 1010 0111 1100

H >>>16 value: 0000 0000 0000 1111 1111 1111 1111

H ^ (h >>> 16) Value: 1111 1111 1111 1111 0000 0101 1000 0011

The final value computed by h ^ (h >>> 16) is the hash value obtained by the hash method. In order to reduce hash collisions, we will explain why we can reduce hash collisions later when we talk about the position of the elements in the array. We only need to know that xor operation can preserve the high and low 16 bits of the original hash and reflect them in the final hash value

Now that we’re done with the hash method, let’s take a look at the putVal method, which is a bit longer and has more complicated branch logic, so let’s look at it bit by bit

 final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        //1. If the array is empty, call resize to generate an array
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        //2. Locate the element in the array first. If the element is empty, build a Node and put it in the current location
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            //3. An element already exists in the current location with the same key, overwriting the current element directly
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                e = p;
            //4. If the current element is a tree node, attach the new element to the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            //5. Attach the element to the list
            else {
                //6. Iterate through the list elements
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        //7. Construct a node. Here we use the next pointer to link the node
                        p.next = newNode(hash, key, value, null);
                        //8. If the list has more than 8 elements, it becomes a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    //9. If the current element has the same key as the new element, overwrite the node directly on the list
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        break; p = e; }}if(e ! =null) { // existing mapping for key
                V oldValue = e.value;
                if(! onlyIfAbsent || oldValue ==null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
Copy the code

In the above code, I have added comments on each important code, which should be easy to understand. Let’s look at some important points:

1. Locate elements: p = TAB [I = (n-1) & hash]

Once you get the hash value, (n-1) &hash is used to locate the element in the array by subtracting the length of the hash and the array by 1. N is guaranteed to be a power of 2 in the code. For example, 16, 32, 64, and so on. After looking at this method, you can see why it is a power of 2. Similarly, let’s look at this operation:

The n value is assumed to be 16, so n-1 is 15, and the binary value is :1111

The hash value is 1111 1111 1111 1111 1111 0000 0101 1000 0011, and the result is 11, so the element will be at the index of 11. We use the and instead of modulo the array, again for performance reasons.

So why do xor, let’s say the first element is called a, the original hash is 1111 1111 1111 1111 1010 0111 1100, There’s another element b that has the original hash value of 1111 1111 1111 1110 1111 1010 0111 1100, and notice the difference from the previous element, only the 16th bit is different, a is 1, b is 0

If there’s no shift and xor, then it’s going to have the same (n minus 1) hash value as 16, so it’s going to clash. So let’s look at b’s hash value after the shift and xor

The h value of B is 1111 1111 1110 1111 1010 0111 1100

H >>>16 value: 0000 0000 0000 1111 1111 1111 1110

H ^ (h >>> 16) Value: 1111 1111 1111 1111 0000 0101 1000 0010

So the final hash value of B is 1111 1111 1111 1111 1111 0000 0101 1000 0010, which is 10 after (n-1) &hash, which is the position of the array with index 10, and it does not conflict with A, feel that? That’s what shift and xOR do. Reduces the probability that elements will collide in array positions.

2. Convert linked list to red-black tree: treeifyBin(TAB, hash); This method is simply to link the list into a red black tree, the internal implementation is still very complex, interested can go to see the source code.

The resize method

Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
    next = e.next;
    //1. Hash value & old array length equals 0
    if ((e.hash & oldCap) == 0) {
        if (loTail == null)
            loHead = e;
        else
            loTail.next = e;
        loTail = e;
    }
    //2. Hash value & old array length not equal to 0
    else {
        if (hiTail == null)
            hiHead = e;
        elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
if(loTail ! =null) {
    loTail.next = null;
    //3. Put it in the original position of the new array
    newTab[j] = loHead;
}
if(hiTail ! =null) {
    hiTail.next = null;
    //4. Place (old array length + old position) in a new position
    newTab[j + oldCap] = hiHead;
}
Copy the code

When a HashMap is expanded, it creates a new array twice the size of the old one, and copies the old elements into the new array. The code above is taken from the resize method to determine the location of the new elements. If it is 0, put it in the old position. If it is not 0, put it in the new position. The new position is determined by the old array length + the old position.

Suppose the hash is 1111 1111 1111 1111 0000 0101 1000 0011, the array expands from 16 to 32, and the value evaluated by E.hash & oldCap is 0, so the position on both the old and new arrays is 11

The hash element 1111 1111 1111 1111 0000 0101 1001 0011 expands the array from 16 to 32, evaluates to 1 by e.hash & oldCap, so the position on the new array becomes 11 + 16 = 27 instead of 11

conclusion

This article mainly describes the following implementation of HashMap: 1. Calculate hash value (shift, xOR operation) 2. 3. Implementation of put() method (linked list and linked list to red black tree) 4. Implementation of resize method (Rehash position in a new array)

Above, I hope it will be helpful for you to understand HashMap. If you have different views or questions, please feel free to discuss with me

The Node Node

Put method

The resize method

conclusion

Related Posts

Jdk8 -CopyOnWriteArrayList and CopyOnWriteArraySet

9- There is a bug with TDengine’s low version offset page. How to smooth version upgrade and migrate data

Nodejs preliminary