HashMap is one of the most commonly used collection types in Java and one of the most frequently asked interview topics: it spans many knowledge points and sits at the heart of Java fundamentals, so it comes up in interviews again and again.
How is the underlying HashMap implemented? What optimizations have been made in JDK 1.8?
The typical answer
In JDK 1.7, a HashMap is an array plus linked lists. JDK 1.8 added a red-black tree structure: when a list grows longer than 8 nodes and the table capacity is at least 64, the linked list is converted into a red-black tree, as shown in the following figure:
The elements of the array are called hash buckets and are defined as follows:
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
You can see that each hash bucket node contains four fields: hash, key, value, and next, where next points to the next node in the linked list.
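Because Node implements Map.Entry, these are exactly the objects handed back when iterating a map's entry set. A minimal sketch (the class name and sample values are my own):

import java.util.HashMap;
import java.util.Map;

public class EntryDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        // Each Map.Entry below is backed by one of the Node objects shown above
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue()); // uses Node.getKey()/getValue()
        }
    }
}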
The reason JDK 1.8 added red-black trees is that an overly long linked list seriously hurts HashMap performance: searching a list costs O(n), while a red-black tree keeps search, insertion, and deletion at O(log n), which effectively solves the slowdown caused by long lists.
Interview analysis
The above outlines the structure of a HashMap, but interviewers may want to know more than that. Here are a few more HashMap-related interview questions:
- What are the optimizations to HashMap expansion in JDK 1.8?
- Why is the load factor 0.75?
- How does a HashMap find and confirm an element when there are hash conflicts?
- What are the important methods in the HashMap source?
- How can a HashMap cause an infinite loop?
Knowledge extension
1. Source code analysis of HashMap
Note: unless otherwise stated, the source-code analysis below uses JDK 1.8, the current mainstream version.
The HashMap source code contains the following attributes:
// Default initial capacity of the table
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// Maximum capacity of the table
static final int MAXIMUM_CAPACITY = 1 << 30; // 1073741824
// Default load factor (expansion factor)
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// When a list grows beyond this length (and the table capacity is at least 64),
// the linked list is converted to a red-black tree
static final int TREEIFY_THRESHOLD = 8;
// Conversion threshold in the other direction: when the number of nodes in a bin
// drops below this value, the red-black tree is converted back to a linked list
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum table capacity for treeification
static final int MIN_TREEIFY_CAPACITY = 64;
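One detail worth noting alongside these constants: the table capacity is always kept a power of two, which is what lets the source compute bucket indexes with (n - 1) & hash instead of a modulo, as getNode below shows. A standalone sketch mirroring JDK 1.8's tableSizeFor, which rounds a requested capacity up to the next power of two (the demo class and sample values are my own):

public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Mirrors JDK 1.8's tableSizeFor: rounds cap up to the next power of two
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}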
What is the load factor? Why is the load factor 0.75?
The load factor determines when a HashMap expands: a resize is triggered once the number of entries exceeds capacity × load factor. For example, with a load factor of 0.5 and an initial capacity of 16, the HashMap expands once it holds 16 × 0.5 = 8 elements.
So why is the load factor 0.75 rather than 0.5 or 1.0?
This is a trade-off between space and performance:
- With a larger load factor, the expansion threshold is higher, so resizes happen less often and less memory is wasted; but hash conflicts become more likely, chains grow longer, and operations on elements get slower.
- With a smaller load factor, the expansion threshold is lower, so resizes happen sooner and more memory is occupied; elements are stored more sparsely, hash conflicts are less likely, and operations stay fast.
A value of 0.75 therefore splits the difference between 0.5 and 1.0, balancing space overhead against collision probability.
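To make the threshold arithmetic concrete, here is a small sketch (the class name is illustrative; threshold itself is a private field inside HashMap, so this just reproduces the capacity × load factor calculation):

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        int capacity = 16;
        for (float lf : new float[] {0.5f, 0.75f, 1.0f}) {
            // threshold = capacity * loadFactor: the size at which a resize is triggered
            int threshold = (int) (capacity * lf);
            System.out.printf("capacity=%d, loadFactor=%.2f -> resize after %d entries%n",
                    capacity, lf, threshold);
        }
        // A custom load factor can also be passed to the constructor:
        Map<String, String> map = new HashMap<>(16, 0.5f);
        map.put("k", "v");
    }
}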
There are three important methods in the HashMap source code: query, add, and data expansion.
First look at the query source:
public V get(Object key) {
    Node<K,V> e;
    // Hash the key, then look up the node
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Non-null check on the table and the target bucket
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Check whether the first element is the one being queried
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // The first node has a successor
        if ((e = first.next) != null) {
            // If the first node is a tree node, fetch directly via getTreeNode
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do { // Non-tree structure: walk the linked list
                // If the hash matches and the key is equal, return this node
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
As the source above shows, when a hash conflict occurs, comparing hash values alone is not enough: the key must also be compared (first by ==, then by equals) to confirm that a node really holds the element being looked up.
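The point is easy to demonstrate with a key class that deliberately collides. In this sketch (BadKey is a hypothetical class of mine, not from the JDK), every key hashes to the same bucket, yet get() still finds the right entry because equals() distinguishes the keys:

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class that forces every instance into the same bucket
class BadKey {
    final String name;
    BadKey(String name) { this.name = name; }
    @Override public int hashCode() { return 42; } // constant hash -> guaranteed collisions
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && Objects.equals(name, ((BadKey) o).name);
    }
}

public class CollisionDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2);
        // Both keys land in the same bucket; get() walks the chain and
        // uses equals() to pick the right node, as getNode() shows above
        System.out.println(map.get(new BadKey("b"))); // 2
    }
}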
The second important method of HashMap is the add method; its source is as follows:
public V put(K key, V value) {
    // Hash the key, then insert
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Create the table if the hash table is empty
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Compute the array index i to insert at from the key's hash
    if ((p = tab[i = (n - 1) & hash]) == null)
        // If tab[i] is null, insert directly
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // If the key already exists at the head, remember the node so its value can be overwritten
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Otherwise check whether the bin is a red-black tree
        else if (p instanceof TreeNode)
            // Red-black tree: insert the key-value pair directly
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Linked-list structure: loop to find the insertion point
            for (int binCount = 0; ; ++binCount) {
                // The next element is empty: append the new node
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Convert to a red-black tree if the list is long enough
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Key already exists: break, its value is overwritten below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Expand the table when the size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
The execution flow of the put method is shown in the figure below:
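A quick usage note on put's return value, matching the oldValue branch in putVal above (the class name is illustrative):

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("k", 1)); // null: no previous mapping existed
        System.out.println(map.put("k", 2)); // 1: the old value is returned, then overwritten
        System.out.println(map.get("k"));    // 2
    }
}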
The third important method of HashMap is the expansion method; its source is as follows:
final Node<K,V>[] resize() {
    // The array before expansion
    Node<K,V>[] oldTab = table;
    // Size and threshold of the array before expansion
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    // Predefine the size and threshold of the new array
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the maximum capacity is already reached, do not expand
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the capacity, but do not exceed MAXIMUM_CAPACITY
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // No data in the current array: use the initial capacity stored in threshold
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else { // zero initial threshold signifies using defaults
        // If no initial values were set, use the default capacity
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // If the new threshold is still 0, compute it from the new capacity
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    // Start the expansion by installing the new table
    table = newTab;
    // The old table is not empty: copy its data into the new table
    if (oldTab != null) {
        // Loop over the old array and copy every non-empty bucket into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single node: assign it directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // Red-black tree handling
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // Linked-list copy: the JDK 1.8 expansion optimization
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Stays at the original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Moves to original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Put the original-index list into its bucket
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the original index + oldCap list into its bucket
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
Unlike JDK 1.7, which recomputes each element's bucket index during a resize, JDK 1.8 only tests one high-order bit with (e.hash & oldCap) to determine whether an element needs to move. Take key 1 as an example:
- key1.hash = 10 0000 1010
- oldCap = 16 = 0001 0000

The result of e.hash & oldCap is 0 because the bit of key1.hash at oldCap's position is 0; when the result is 0, the element's position does not change during expansion. The information for key 2 is as follows:

- key2.hash = 10 0001 0001
- oldCap = 16 = 0001 0000
Here the tested bit is 1, so the result is nonzero, which means the element's position changes during expansion: the new index equals the original index plus the original array length, as shown in the figure below:
The red dotted line represents the position of element movement during expansion.
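This bit trick is easy to verify in isolation. A sketch using the two example hashes from the text (the demo class is my own):

public class ResizeBitDemo {
    public static void main(String[] args) {
        int oldCap = 16; // 0001 0000
        int key1Hash = 0b10_0000_1010; // bit at oldCap's position is 0 -> index unchanged
        int key2Hash = 0b10_0001_0001; // bit at oldCap's position is 1 -> index + oldCap
        for (int hash : new int[] {key1Hash, key2Hash}) {
            int oldIndex = hash & (oldCap - 1);
            // (hash & oldCap) == 0 means the element stays at its old index after doubling
            int newIndex = ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.printf("hash=%s oldIndex=%d newIndex=%d%n",
                    Integer.toBinaryString(hash), oldIndex, newIndex);
        }
    }
}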
2. HashMap infinite loop analysis
Using JDK 1.7 as an example: assume the HashMap's capacity is 2 and it already holds one element, key(5), and two threads run concurrently, T1 adding key(3) and T2 adding key(7). Both insertions trigger a resize. Suppose that after key(3) and key(7) are added, thread T1 executes Entry<K,V> next = e.next; and then yields the CPU. The source is as follows:
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next; // Thread T1 is suspended here
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i]; // head insertion reverses the list order
            newTable[i] = e;
            e = next;
        }
    }
}
At this moment, e in thread T1 points to key(3) and next points to key(7). After thread T2 completes its rehash, the list order is reversed and becomes key(5) → key(7) → key(3), where → denotes the next pointer.
When T1 resumes, its head insertion (e.next = newTable[i]; newTable[i] = e) links key(3) to key(7), while key(7) already points to key(3) after T2's reversal. Circular references between key(3) and key(7) are thus formed, which leads to an infinite loop on the next traversal, as shown in the figure below:
Of course, the root cause of the loop is that JDK 1.7 uses head insertion, which reverses the list during transfer. JDK 1.8 improved this by switching to tail insertion.
Some people reported this issue to Sun, but Sun did not consider it a bug, because HashMap is not thread-safe by design; for multi-threaded use, ConcurrentHashMap is recommended instead. Nevertheless, this question is still very likely to be asked in an interview, so it deserves a special note here.
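For completeness, a minimal sketch of the recommended alternative (the class name and counts are illustrative): two threads write to a ConcurrentHashMap concurrently, which never corrupts its internal structure the way the JDK 1.7 HashMap resize can:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Runnable writer = () -> {
            for (int i = 0; i < 10_000; i++) {
                map.put(i, i); // thread-safe: concurrent puts and resizes are coordinated internally
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(map.size()); // 10000: both threads wrote the same keys
    }
}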
Summary
This article introduced the underlying data structure of HashMap: in JDK 1.7 it consisted of an array plus linked lists, and JDK 1.8 added a red-black tree. It also walked through HashMap's three important methods, query, add, and expand, and explained why JDK 1.7's resize() can cause an infinite loop in a concurrent environment.