Focus on system architecture, high availability, high performance, high concurrency technology sharing

In JDK 1.7, a HashMap is an array plus linked lists. JDK 1.8 added a red-black tree structure: when a bucket's list grows beyond 8 nodes and the table capacity is at least 64, the list is converted into a red-black tree, as shown in the following figure:



The elements of the array are called hash buckets and are defined as follows:

```java
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
```

You can see that each hash bucket node contains four fields: hash, key, value, and next, where next points to the next node in the list.

The reason JDK 1.8 added red-black trees is that an overly long list seriously degrades HashMap performance: lookup in a list is O(n), while a red-black tree supports insertion, deletion, and lookup in O(log n), which effectively solves the problem of slow operations when the list grows too long.

Interview analysis

The above outlines the structure of a HashMap, but interviewers may want to know more than that. Here are a few more HashMap-related interview questions:

  • What are the optimizations for JDK 1.8 HashMap expansion?
  • Why is the loading factor 0.75?
  • How does a HashMap find and confirm elements when there are hash conflicts?
  • What are the important methods in the HashMap source?
  • How does HashMap cause an infinite loop?

Knowledge extension

  • 1. Source code analysis of HashMap

Note: Unless otherwise stated, this series uses the current mainstream JDK version, 1.8, for its source code analysis.

The HashMap source code defines the following important fields:

```java
// Default initial capacity: 1 << 4 = 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Maximum capacity: 1 << 30 = 1073741824
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor (expansion factor)
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// List length at which a bucket's list is converted to a red-black tree
static final int TREEIFY_THRESHOLD = 8;
// Node count at which a tree bucket is converted back to a linked list
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum table capacity before any bucket may be treeified
static final int MIN_TREEIFY_CAPACITY = 64;
```
  • What is a load factor? Why is the loading factor 0.75?

If the load factor were 0.5 and the initial capacity of the HashMap were 16, the HashMap would be expanded once it contained 16 * 0.5 = 8 elements. So why is the load factor 0.75 rather than 0.5 or 1.0?

This is a trade-off between capacity and performance:

  • When the load factor is larger, the expansion threshold is higher and expansion happens less frequently, so less space is used; but the chance of hash conflicts increases, more complex structures are needed to store the colliding elements, operations on elements take longer, and efficiency drops.
  • When the load factor is smaller, the expansion threshold is lower and expansion happens more frequently, so more space is occupied; elements are stored sparsely, the chance of hash conflicts is low, and operations are fast.
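The trade-off above comes down to the expansion threshold, which is simply capacity times load factor. The sketch below is a small arithmetic demo of that relationship, not JDK internals; the `threshold` helper is a hypothetical illustration mirroring what HashMap computes internally:

```java
import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    // threshold = capacity * loadFactor (the point at which HashMap expands)
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.5f));  // 8: expands early, wastes space
        System.out.println(threshold(16, 0.75f)); // 12: the default trade-off
        System.out.println(threshold(16, 1.0f));  // 16: expands late, more conflicts

        // Both initial capacity and load factor can be passed explicitly
        Map<String, String> m = new HashMap<>(16, 0.5f);
        m.put("k", "v");
        System.out.println(m.get("k")); // v
    }
}
```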

Therefore 0.75, the midpoint between 0.5 and 1.0, was chosen as the load factor.

There are three important methods in the HashMap source code: query, insertion, and expansion. First look at the query source code:

```java
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The table is non-empty and the bucket for this hash holds a node
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Always check the first node: is it the element we are looking for?
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // If the first node is a tree node, search the red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>) first).getTreeNode(hash, key);
            // Otherwise walk the linked list; return the node whose
            // hash and key both match
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```
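That key-equality check is what lets a HashMap tell apart two keys that land in the same bucket. Here is a minimal sketch with a deliberately bad hashCode, purely for illustration (the `BadKey` class is hypothetical, not JDK code):

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // A key whose hashCode is constant: every instance collides,
    // so lookups must fall back on equals()
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2); // same bucket, different key
        System.out.println(map.get(new BadKey("a"))); // 1
        System.out.println(map.get(new BadKey("b"))); // 2
        System.out.println(map.size());               // 2
    }
}
```

Even though both keys hash to bucket 42's slot, getNode walks the list and distinguishes them via equals().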

As can be seen from the source code above, when hash conflicts occur we compare keys for equality to determine whether a node is the element we want. The second important method of HashMap is the add (put) method; its source code is as follows:

```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the hash table is empty, create it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the target bucket is empty, insert the new node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The key already exists: remember the node so its value
        // can be overwritten below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket is a red-black tree: insert the key/value pair there
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise walk the linked list, ready to insert at the tail
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list length reaches TREEIFY_THRESHOLD,
                    // convert it to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // The key already exists: stop and overwrite the value
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Expand the table if the size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
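One detail of putVal worth noting is its return value: put returns the previous value mapped to the key, or null if the key was absent, and putIfAbsent corresponds to calling putVal with onlyIfAbsent = true. A quick sketch:

```java
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("k", 1)); // null: key was absent
        System.out.println(map.put("k", 2)); // 1: the old value is returned
        System.out.println(map.get("k"));    // 2: the value was overwritten
        // putIfAbsent maps to onlyIfAbsent = true in putVal
        map.putIfAbsent("k", 3);
        System.out.println(map.get("k"));    // still 2
    }
}
```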

The execution flow of the add (put) method is shown in the figure below:



The third important method of HashMap is the expansion (resize) method; its source code is as follows:

```java
final Node<K,V>[] resize() {
    // The table before expansion
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // The old capacity already reached the maximum:
        // stop expanding and just raise the threshold
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double the current capacity and the threshold
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // The table is empty but an initial capacity was stored in threshold
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {
        // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new threshold if it has not been set yet
    if (newThr == 0) {
        float ft = (float) newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
                  (int) ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[]) new Node[newCap];
    // Switch to the new, larger table
    table = newTab;
    if (oldTab != null) {
        // Move every bucket of the old table into the new one
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single node: recompute its index directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A tree bucket: split the red-black tree
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>) e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // A linked list (JDK 1.8): split it into a "lo" list
                    // that stays at index j and a "hi" list that moves
                    // to index j + oldCap
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // (e.hash & oldCap) == 0: the index is unchanged
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Otherwise the node moves to index j + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Put the "lo" list back at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the "hi" list at the original index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```

During this process, JDK 1.8 does not recalculate each element's position the way JDK 1.7 does. Instead it uses a high-order bit test, (e.hash & oldCap), to determine whether an element needs to move. The information of key 1 is as follows:

  • key1.hash = 10 0000 1010
  • oldCap = 16   0001 0000

The result of e.hash & oldCap for key 1 is 0, because the examined high-order bit is 0. When the result is 0, the element's position does not change during expansion. The information of key 2 is as follows:

  • key2.hash = 10 0001 0001
  • oldCap = 16 0001 0000

When the result is non-zero, the element's position changes during expansion: the new index equals the original index plus the original array length, as shown in the figure below:



The red dotted line represents the position of element movement during expansion.
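This rule can be checked directly with the two example hashes above. The sketch below assumes the hash values shown in the bit patterns (522 and 529 in decimal); `newIndex` is a hypothetical helper, not a JDK method:

```java
public class ResizeIndexDemo {
    // Decide where a node lands after doubling a table of size oldCap:
    // (hash & oldCap) == 0 -> index unchanged, otherwise index + oldCap
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;                // 0001 0000
        int key1Hash = 0b10_0000_1010;  // the examined high bit is 0
        int key2Hash = 0b10_0001_0001;  // the examined high bit is 1

        System.out.println(key1Hash & oldCap); // 0  -> key 1 stays put
        System.out.println(key2Hash & oldCap); // 16 -> key 2 moves

        System.out.println(newIndex(key1Hash, oldCap)); // 10 (unchanged)
        System.out.println(newIndex(key2Hash, oldCap)); // 17 = 1 + 16

        // Sanity check: the full recomputation against the doubled table
        // gives the same answers
        System.out.println(key1Hash & (2 * oldCap - 1)); // 10
        System.out.println(key2Hash & (2 * oldCap - 1)); // 17
    }
}
```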

  • 2. HashMap infinite loop analysis

Using JDK 1.7 as an example, assume the default size of the HashMap is 2, the HashMap already holds one element key(5), and two threads operate on it: T1 adds the element key(3) and T2 adds the element key(7). After key(3) and key(7) are added to the HashMap, an expansion begins; thread T1 executes `Entry<K,V> next = e.next;` and is then suspended, giving up the CPU. The source code is as follows:

```java
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next; // thread T1 is suspended here
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            // Head insertion: each node is pushed onto the front of the
            // new bucket, which reverses the list
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
```

At this point, e in thread T1 points to key(3) and next points to key(7). After thread T2 completes its rehash, the order of the list has been reversed and it becomes key(5) → key(7) → key(3), where "→" denotes the next pointer. When T1 resumes and executes newTable[i] = e, the next pointer of key(3) is set to key(7), while the next element of key(7) is already key(3). A circular reference between key(3) and key(7) is thus formed, which leads to an infinite loop, as shown in the figure below:



Of course, the root cause of the loop is that JDK 1.7 inserts nodes at the head of the list during transfer, reversing its order. JDK 1.8 fixed this by switching to tail insertion.
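The order reversal caused by head insertion is easy to reproduce single-threaded with a stripped-down model of transfer(). The `Node` class and helper methods below are a hypothetical simplification for illustration, not the JDK's:

```java
public class HeadInsertDemo {
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // JDK 1.7-style head insertion into a new bucket: each node is
    // pushed onto the front, so the list comes out reversed
    static Node transferHeadInsert(Node head) {
        Node newHead = null;
        for (Node e = head; e != null; ) {
            Node next = e.next;
            e.next = newHead; // head insertion
            newHead = e;
            e = next;
        }
        return newHead;
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node list = new Node("key(5)",
                    new Node("key(7)",
                    new Node("key(3)", null)));
        System.out.println(render(list));                    // key(5) -> key(7) -> key(3)
        System.out.println(render(transferHeadInsert(list))); // key(3) -> key(7) -> key(5)
    }
}
```

Run single-threaded this is merely a reversal; it is only when two threads interleave inside transfer() that the reversed pointers form a cycle.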

Someone reported this problem to Sun, but Sun did not consider it a bug, because HashMap itself is not thread-safe; for multi-threaded use, ConcurrentHashMap is recommended instead. However, this question is still very likely to be asked in an interview, so it deserves a special note here.
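As a sketch of the recommended alternative, ConcurrentHashMap absorbs concurrent writes safely. The thread counts and key ranges here are arbitrary choices for the demo, loosely mirroring the T1/T2 scenario above:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();

        // Two writer threads, each inserting a disjoint range of keys
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) map.put(i, i);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 1000; i < 2000; i++) map.put(i, i);
        });
        t1.start(); t2.start();
        t1.join(); t2.join();

        System.out.println(map.size()); // 2000: no lost updates, no loop
    }
}
```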
