Preface

HashMap is a non-synchronized, hash table-based implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key. This class makes no guarantees as to the order of the mappings; in particular, it does not guarantee that the order will remain constant over time.
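To make the preface concrete, here is a minimal sketch (my own example, not from any source) showing that a null key and null values are accepted, and that iteration order is unspecified:

    import java.util.HashMap;
    import java.util.Map;

    public class NullAndOrderDemo {
        public static void main(String[] args) {
            Map<String, String> map = new HashMap<>();
            map.put(null, "a null key is allowed (at most one)");
            map.put("k", null);   // null values are allowed too
            map.put("a", "1");
            map.put("b", "2");
            // The iteration order is unspecified and may change after a rehash
            System.out.println(map);
        }
    }

Because HashMap is not synchronized, wrap it with Collections.synchronizedMap or use ConcurrentHashMap when multiple threads modify it concurrently.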

The data structure of HashMap

In the Java programming language, there are two basic constructs: arrays and simulated pointers (references). All data structures can be built from these two constructs, and HashMap is no exception. A HashMap is in fact a "linked-list hash" structure, that is, a combination of an array and linked lists.

A textual description is always clearer with a picture, so here is the structure of a HashMap.

As you can see from the figure above, the underlying structure of a HashMap is an array, where each slot holds a linked list or a red-black tree. Note that creating a new HashMap does not allocate the array right away; as we will see below, it is initialized lazily.

Let’s take a look at some of the core members of HashMap.

    public class HashMap<K,V> extends AbstractMap<K,V>
            implements Map<K,V>, Cloneable, Serializable {

        // Default initial capacity: 16. Must be a power of two.
        static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

        // Maximum capacity: 2^30.
        static final int MAXIMUM_CAPACITY = 1 << 30;

        // Default load factor.
        static final float DEFAULT_LOAD_FACTOR = 0.75f;

        // A bucket's linked list is converted to a red-black tree
        // once it holds at least this many nodes.
        static final int TREEIFY_THRESHOLD = 8;

        // A tree is converted back to a linked list once it shrinks
        // to this many nodes during a resize.
        static final int UNTREEIFY_THRESHOLD = 6;

        /* Minimum hash table capacity for which bins may be treeified.
         * If the table is smaller than MIN_TREEIFY_CAPACITY, a resize is
         * performed instead when a bin gets too many nodes.
         * The value of MIN_TREEIFY_CAPACITY is at least 4 times TREEIFY_THRESHOLD. */
        static final int MIN_TREEIFY_CAPACITY = 64;

        // A basic bin node; most entries are instances of this class.
        static class Node<K,V> implements Map.Entry<K,V> {
            // ...
        }

        // The table; allocated lazily on first use, length always a power of two.
        transient Node<K,V>[] table;

        // Cached entrySet() view of the map's key-value pairs.
        transient Set<Map.Entry<K,V>> entrySet;

        // The number of key-value mappings in this map.
        transient int size;

        /* Number of structural changes, i.e. changes that alter the number of
         * mappings or the internal structure (such as a rehash). Used for
         * HashMap's fail-fast behavior: a structural change during traversal
         * throws ConcurrentModificationException. */
        transient int modCount;

        // The size at which the next resize occurs: capacity * loadFactor.
        int threshold;

        // The load factor for the hash table.
        final float loadFactor;
    }

Initialization of the HashMap

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields are left at their default values
    }

Note that the table array is not initialized here; it is only allocated during the first put operation. Why do this? My guess is to avoid a HashMap that is created but never used still taking up memory, lol.
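A quick way to observe this lazy allocation is to peek at the private table field with reflection. This is only a diagnostic sketch (class name is mine); on JDK 9+ it may additionally require --add-opens java.base/java.util=ALL-UNNAMED:

    import java.lang.reflect.Field;
    import java.util.HashMap;

    public class LazyTableDemo {
        public static void main(String[] args) throws Exception {
            HashMap<String, String> map = new HashMap<>();
            Field table = HashMap.class.getDeclaredField("table");
            table.setAccessible(true);
            System.out.println(table.get(map));                      // null: nothing allocated yet
            map.put("k", "v");
            System.out.println(((Object[]) table.get(map)).length);  // 16: allocated on first put
        }
    }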

Storage operations for a HashMap

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

Let’s take a look at how HashMap determines the array index position, the put process, and the resize mechanism.

Hash calculation to determine the array index position

Locating the hash bucket is the critical first step in adding, deleting, and looking up key-value pairs. As said earlier, HashMap's data structure is a combination of an array and linked lists, so we naturally want the elements to be distributed across the array as uniformly as possible, ideally with only one element per slot. Then, when the hash algorithm gives us a location, we immediately have the corresponding element without traversing a list, which greatly improves query efficiency. How HashMap locates the array index directly determines how well the hash method disperses keys.

Take a look at the source code implementation:

    static final int hash(Object key) { // JDK 1.8
        int h;
        // XOR the high 16 bits into the low 16 bits: h ^ (h >>> 16)
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

The hash is implemented by XOR-ing the high 16 bits of hashCode() into the low 16 bits: (h = key.hashCode()) ^ (h >>> 16). This is mainly a trade-off among speed, utility, and quality: it ensures that both the high and low bits take part in the hash calculation even when the table array is fairly short, without adding much overhead.

As you know, key.hashCode() in the code above calls the key type's own hash function and returns an int hash value. In theory, a hash value is an int. If we used it directly as an index into the main array, then, considering that a 32-bit signed int ranges from -2147483648 to 2147483647, the hash could address about four billion slots. As long as the hash function maps evenly and loosely, ordinary applications would hardly ever see a collision. The problem is that an array four billion entries long cannot fit in memory. Remember, before any resizing the initial size of the HashMap array is just 16. So the hash value cannot be used directly; it must first be taken modulo the array length, and the remainder is used as the array index. In the JDK 1.7 source, this modulo operation is done in the indexFor() function.

    bucketIndex = indexFor(hash, table.length);

    // JDK 1.7 source; JDK 1.8 inlines the equivalent expression (n - 1) & hash
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

This, by the way, explains why the length of a HashMap array must be a power of 2: (array length - 1) is then exactly a "low-bit mask". The result of the AND operation is that all the high bits of the hash are zeroed out and only the low bits are kept for indexing into the array. Take the initial length 16 as an example: 16 - 1 = 15, which in binary is 00000000 00000000 00001111. ANDing with it simply keeps the lowest four bits.

      10100101 11000100 00100101
    & 00000000 00000000 00001111
    ----------------------------
      00000000 00000000 00000101   // all high bits zeroed; only the low 4 bits survive
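To see the same thing in running code, here is a small sketch (class name is mine) confirming that for a power-of-two length, h & (length - 1) equals the remainder of dividing by the length, with no division needed:

    public class IndexMaskDemo {
        public static void main(String[] args) {
            int length = 16;           // a power of two
            int h = 0xA5C425;          // 10100101 11000100 00100101, the hash from above
            System.out.println(h & (length - 1)); // 5: keeps only the low 4 bits
            System.out.println(h % length);       // 5: same result for a non-negative h
        }
    }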

But that’s exactly where the problem comes in: even if my hash is loosely distributed, if I only ever take the last few bits, collisions will be frequent. Worse, if the hash function itself is poorly written, say with a flaw that makes the low bits repeat in a regular pattern, things get really painful. This is where the perturbation function shows its value, and I think you can now see why. Look at the figure below.

(Figure: hash calculation process)

The right shift is 16 bits, exactly half of 32 bits, and XOR-ing a value's own high half with its low half mixes the high and low bits of the original hash code, increasing the randomness of the low bits. Moreover, the mixed low bits now carry some characteristics of the high bits, so the high-bit information is preserved in disguise.
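Here is a small sketch of the effect (the spread method reproduces the JDK 1.8 perturbation; the sample values are my own). The two hash codes differ only in their high 16 bits, so without the perturbation they collide in a 16-slot table, while with it they land in different buckets:

    public class HashSpreadDemo {
        // The same perturbation as HashMap.hash() in JDK 1.8
        static int spread(int h) {
            return h ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int n = 16;                           // small table: only the low 4 bits pick the bucket
            int h1 = 0x00010001, h2 = 0x00020001; // differ only in the high 16 bits
            System.out.println((h1 & (n - 1)) + " " + (h2 & (n - 1)));                 // 1 1: collision
            System.out.println((spread(h1) & (n - 1)) + " " + (spread(h2) & (n - 1))); // 0 3: spread apart
        }
    }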

The putVal method

The execution flow of HashMap's put method can be understood from the following figure. If you are interested, compare it with the source code to understand it more clearly.

The source code and explanation are as follows:

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the table has not been initialized, or its size is 0, resize (allocate) it
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // If the bucket for this hash holds no data, create a new node in place
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // The first node is the same as the element being put
            // (same hash, and equals returns true)
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // The node in the bucket is a TreeNode, i.e. the bucket holds
            // a red-black tree; insert into the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // Otherwise the bucket holds a linked list; traverse it
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // If the list length reaches the threshold (TREEIFY_THRESHOLD),
                        // convert the list into a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // The key already exists; stop traversing
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // An existing mapping for the key was found; replace the value
            if (e != null) {
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // If the new size exceeds the threshold, resize
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
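At the usage level, the behavior above can be seen in a short sketch (my own example): put returns the previous value when the key already exists, and putIfAbsent reaches putVal with onlyIfAbsent = true, so an existing value is kept:

    import java.util.HashMap;

    public class PutDemo {
        public static void main(String[] args) {
            HashMap<String, Integer> map = new HashMap<>();
            System.out.println(map.put("a", 1));          // null: there was no previous mapping
            System.out.println(map.put("a", 2));          // 1: the old value is returned
            System.out.println(map.putIfAbsent("a", 3));  // 2: onlyIfAbsent keeps the existing value
            System.out.println(map.get("a"));             // 2
        }
    }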

Expansion mechanism

HashMap’s expansion mechanism is clever, achieving growth at minimal performance cost. Each resize doubles the capacity, and the initial capacity is 16, so after the rehash an element either stays at its original index or ends up at its original index plus the old capacity: if the table was 16 long, the next table is 16 + 16 = 32 long, and an element at index 7 will, after the resize, sit at either index 7 or index 7 + 16.

Let’s explain how Java 8’s expansion mechanism works. n is the length of the table. Figure (a) shows key1 and key2 determining their index positions before the resize; figure (b) shows key1 and key2 determining their index positions after the resize.

After the resize, because n doubles, the mask n - 1 is one bit wider in the high position (shown in red), so the new index changes like this:

Therefore, when we resize the HashMap, we do not need to recompute the hash as the JDK 1.7 implementation does. We only need to look at whether the newly significant bit of the hash is 1 or 0: if it is 0, the index is unchanged; if it is 1, the index becomes "old index + oldCap". Take a look at the following diagram of a resize from 16 to 32:

To check whether that newly significant bit is 1, you simply AND the hash with the old capacity: since the capacity is a power of 2, its binary form is a single 1 followed by 0s (for example, 10000 for 16), so the AND isolates exactly that bit. The source code does this with e.hash & oldCap.
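A worked example of this bit check (a sketch with constants of my own choosing): take oldCap = 16 and a hash of 23 (binary 10111), whose oldCap bit happens to be set:

    public class ResizeIndexDemo {
        public static void main(String[] args) {
            int oldCap = 16, newCap = 32;
            int hash = 0b10111;                        // 23: bit 4 (the oldCap bit) is set
            System.out.println(hash & (oldCap - 1));   // 7: index before the resize
            System.out.println(hash & (newCap - 1));   // 23 = 7 + 16: index after the resize
            System.out.println((hash & oldCap) != 0);  // true: the very test resize() performs
        }
    }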

This design is clever: it saves the time of recomputing hash values, and since the new bit can be regarded as random (0 or 1), the resize process disperses previously colliding nodes evenly into the new buckets. This is an optimization new in JDK 1.8. In JDK 1.7, when old-list entries mapped to the same index in the new table, the list order was inverted; as the figure above shows, JDK 1.8 keeps the list in order. The JDK 1.8 resize source code is very well written, as follows:

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            // Already at maximum capacity: stop growing and let collisions happen
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Otherwise double the capacity, and double the threshold
            // if the old capacity was at least the default
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // Compute the new resize threshold
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            // Move each bucket into the new table
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // A single node: recompute its slot directly
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // A red-black tree: split it between the two new buckets
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // a linked list: preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // The oldCap bit is 0: the node keeps its original index
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // The oldCap bit is 1: the node moves to original index + oldCap
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // Put the low list at the original index in the new table
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // Put the high list at the original index + the old capacity
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }