preface

HashMap is a non-synchronized implementation of the Map interface based on a hash table. It provides all of the optional map operations and permits null values and the null key. The class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
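As a quick illustration of the null-tolerance just mentioned, the sketch below (the class name is ours, chosen for the example) stores one null key and one null value:

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key"); // a single null key is permitted
        map.put("k", null);                  // null values are permitted too
        System.out.println(map.get(null));   // prints "value for null key"
        System.out.println(map.size());      // prints 2
    }
}
```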

The data structure of the HashMap

In the Java language there are two basic storage structures: the array and the simulated pointer (reference). All data structures can be built from these two, and HashMap is no exception. A HashMap is in fact a "hash of linked lists": a combination of an array and linked lists.

A textual description is best accompanied by a diagram. The structure of a HashMap is shown in the figure below.

As the figure shows, the underlying structure of a HashMap is an array, and each slot in the array holds either a linked list or a red-black tree. In JDK 1.8 the array itself is not allocated when the HashMap is created, but lazily, when it is first needed.

Let’s take a quick look at the core members of the HashMap.

```java
public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {

    // The default initial capacity: 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

    // The maximum capacity: 2^30
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // The default load factor: 0.75
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // A bin is converted to a tree when it holds at least this many nodes
    static final int TREEIFY_THRESHOLD = 8;

    // A tree bin is converted back to a linked list when it shrinks to this size
    static final int UNTREEIFY_THRESHOLD = 6;

    /*
     * The minimum table capacity at which bins may be treeified.
     * If a bin grows too large while the table is still smaller than
     * MIN_TREEIFY_CAPACITY, a resize is performed instead of treeification.
     * MIN_TREEIFY_CAPACITY must be at least 4 * TREEIFY_THRESHOLD.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    static class Node<K,V> implements Map.Entry<K,V> {
        //...
    }

    // The table, resized as necessary; its length is always a power of two
    transient Node<K,V>[] table;

    // Cached entry-set view
    transient Set<Map.Entry<K,V>> entrySet;

    // The number of key-value mappings
    transient int size;

    /**
     * The number of structural modifications. A structural modification is a
     * change to the number of mappings or to the internal structure, such as
     * a rehash. It drives the fail-fast behavior of HashMap: a structural
     * modification during traversal throws ConcurrentModificationException.
     */
    transient int modCount;

    // The next size value at which to resize
    int threshold;

    // The load factor; after a resize the threshold grows to capacity * loadFactor
    final float loadFactor;
}
```
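As a quick sanity check on how these fields relate, the sketch below (a standalone helper of ours, not HashMap's own code) computes the default resize threshold, 16 * 0.75 = 12:

```java
public class ThresholdDemo {
    // threshold = capacity * loadFactor, truncated to an int
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the defaults, the 13th insertion is the first to trigger a resize
        System.out.println(threshold(1 << 4, 0.75f)); // prints 12
    }
}
```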

Initialization of HashMap

```java
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields keep their default values
}
```


The table array is not initialized in the constructor; it is only allocated when the first put is performed. A HashMap that is created but never written to therefore costs no table memory.

Storage operations for HashMap

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }


Let's look at how HashMap determines the array index for a key, how put works, and how resizing works.

Hash to determine the array index position

Whether we are adding, deleting, or looking up a key-value pair, locating the right bucket in the hash array is the critical first step. As noted earlier, a HashMap combines an array with linked lists, so ideally the elements should be distributed as uniformly as possible across the array, with each slot holding only a single element. Then the hash computation immediately yields the element's position, with no need to traverse a list, which greatly speeds up lookups. How the hash maps a key to an array index therefore directly determines the quality of the hash scheme.

Take a look at the source code implementation:

```java
// JDK 1.8
static final int hash(Object key) {
    int h;
    // XOR the high 16 bits of hashCode() into the low 16 bits: h ^ (h >>> 16)
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

XORing the high 16 bits of hashCode() into the low 16 bits, (h = key.hashCode()) ^ (h >>> 16), is a trade-off between speed, utility, and quality: even when the table length is small, the high bits still take part in the index calculation, and the operation costs almost nothing.

key.hashCode() calls the hash function of the key's type and returns an int. In principle this hash value could be used directly as an index into the main array, but a signed 32-bit int ranges from -2147483648 to 2147483647, about 4 billion possible values. As long as the hash function maps keys evenly and loosely, collisions would be rare in most applications. The problem is that an array of 4 billion slots cannot fit in memory; before any resize, the initial array length is only 16. So the hash value cannot be used directly: it must be reduced modulo the array length to produce a valid index. In JDK 1.7 this modulo step was done by the indexFor() function.

```java
bucketIndex = indexFor(hash, table.length);

// JDK 1.7; in JDK 1.8 the same computation is inlined as (n - 1) & hash
static int indexFor(int h, int length) {
    return h & (length - 1);
}
```

This, by the way, also explains why a HashMap's array length is always a power of two: (length - 1) then acts exactly as a "low-bit mask". The AND operation zeroes out the high bits of the hash and keeps only the low bits as the array index. Taking the initial length 16, 16 - 1 = 15, which is 0000 0000 0000 1111 in binary. ANDing a hash value with it truncates everything but the lowest four bits:

```
  10100101 11000100 00100101
& 00000000 00000000 00001111
----------------------------
  00000000 00000000 00000101  // only the low four bits are kept
```
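The masking trick is easy to verify. The small sketch below (our own helper, mirroring the JDK 1.7 indexFor) shows that for a power-of-two length, h & (length - 1) behaves exactly like the non-negative remainder of h modulo length, while costing only a single AND:

```java
public class IndexMask {
    // Equivalent to Math.floorMod(h, length) when length is a power of two
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h = 0b10100101_11000100_00100101;
        System.out.println(indexFor(h, 16));  // prints 5: the low four bits, 0101
        System.out.println(indexFor(-17, 16)); // prints 15: works for negative hashes too
    }
}
```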

But here a problem appears: even with a loose, well-spread hash, taking only the last few bits still produces plenty of collisions. Worse, if the hash function itself is poor, for example producing values in an arithmetic progression whose low bits repeat in a regular pattern, the collisions become painful. This is where the perturbation function shows its value; the diagram below makes the idea clear.

Hash process

Shifting right by 16 bits, exactly half of 32 bits, and XORing the high half with the low half mixes the high and low bits of the original hash code, increasing the randomness of the low bits. The mixed low bits now also carry some features of the high bits, so the high-bit information is preserved in disguise.

The putVal method

The execution flow of HashMap's put method can be understood from the figure below; readers who are interested can study it against the source code.

Source code and explanation as follows:

```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table has not been initialized, or has length 0, resize first
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket for this hash holds no data, insert a new node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The first node matches the put element (same hash, and equals returns true)
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket holds TreeNodes, i.e. a red-black tree resolving hash
        // conflicts: insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Walk the linked list
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list length reaches TREEIFY_THRESHOLD, treeify the bin
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // The key already exists: stop traversing
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // An existing mapping for the key was found: replace its value
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the new size exceeds the threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
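The return contract of putVal is visible through the public put method: it returns the previous value for the key, or null if there was none. A short usage sketch:

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> m = new HashMap<>();
        System.out.println(m.put("a", 1)); // prints "null": no previous mapping
        System.out.println(m.put("a", 2)); // prints "1": the old value is returned
        System.out.println(m.get("a"));    // prints "2"
    }
}
```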

Expansion mechanism

The HashMap resize mechanism is cleverly designed to expand at minimal cost: after rehashing, an element either stays at its original index or moves to the original index plus the old capacity. That is, if the old capacity was 16, the new capacity is 16 + 16 = 32, and an element at index 7 will land at either 7 or 7 + 16 after the resize.

Let's explain how the JDK 1.8 expansion mechanism works. n is the length of the table. Figure (a) shows how the index positions of key1 and key2 are determined before expansion; figure (b) shows how they are determined after expansion.

After the element's hash is recomputed, the mask n - 1 covers one more bit at the high end (shown in red), so the new index changes as follows:

Therefore, when expanding a HashMap, JDK 1.8 does not need to recompute the index the way JDK 1.7 did; it only needs to check whether the newly significant bit of the hash is 0 or 1. If it is 0 the index is unchanged; if it is 1 the new index is the old index + oldCap. The figure below shows a resize from 16 to 32:

To check that bit, the hash only needs to be ANDed with the old capacity: since the capacity is a power of two, its binary representation has a single 1 in the high position and 0s below it, such as 10000 for 16. The source code performs exactly this test with e.hash & oldCap.
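The relocation rule can be checked in a few lines. The helper below is our own illustration, not JDK code; it encodes "stay put if the oldCap bit is 0, otherwise move up by oldCap" and shows that this reproduces the direct computation hash & (newCap - 1):

```java
public class ResizeIndexDemo {
    // New index after doubling the capacity, using only the e.hash & oldCap test
    static int newIndex(int h, int oldCap) {
        int oldIndex = h & (oldCap - 1);
        return (h & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int h1 = 0b00111; // bit for oldCap (bit 4) is 0
        int h2 = 0b10111; // bit for oldCap (bit 4) is 1
        // Both collide at index 7 in the old table:
        System.out.println((h1 & (oldCap - 1)) + " " + (h2 & (oldCap - 1))); // prints "7 7"
        // After the resize they separate:
        System.out.println(newIndex(h1, oldCap)); // prints 7
        System.out.println(newIndex(h2, oldCap)); // prints 23, i.e. 7 + 16
    }
}
```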

This design is indeed very clever: it saves the time of recomputing hash values, and because the new bit can be considered randomly 0 or 1, a resize spreads the nodes that previously collided evenly across the new buckets. This is one of the optimizations in JDK 1.8. Note also that in JDK 1.7, list elements that mapped to the same index in the new table had their order inverted during a resize; JDK 1.8 preserves their order. The JDK 1.8 resize() source follows:

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        // Compute the new capacity and the next resize threshold
        if (oldCap > 0) {
        // If the old capacity already hit the maximum, stop growing and let collisions happen
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Otherwise double the capacity
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // Compute the new resize threshold
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            // Move every bucket into the new table
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // If the slot holds no element, leave it null
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // A single node: recompute its index in the new table
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // A tree bin: split the red-black tree
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // A linked list
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // The new high bit is 0: the node joins the "low" list (same index)
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // The new high bit is 1: the node joins the "high" list (index + oldCap)
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // The low list stays at the original index
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // The high list moves to the original index + the original capacity
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }