Java's HashMap and its optimizations are staple interview topics. Interviews are for selection, and most questions never get into real logical reasoning; the points that do are the ones that open a gap between candidates. So I want to reason through the key points and difficulties with you. Once the reasoning is done, I hope the age-old HashMap questions will no longer feel difficult.
How is HashMap implemented?
The basic idea behind HashMap is simple:
- Hash each key into a bucket, where each bucket is a singly linked list
- A table array holds the buckets
The classic diagram is as follows:
The put process
- put first computes the key's hash
- If the key already exists, its value is overwritten
- If it does not, addEntry is called
- addEntry then decides whether to expand
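To make these steps concrete, here is the 1.7-era put, lightly paraphrased from the JDK source (hash, indexFor, putForNullKey and addEntry are the JDK's internal helpers):

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);      // null keys are special-cased (see below)
        int hash = hash(key);                 // compute the key's hash
        int i = indexFor(hash, table.length); // locate the bucket
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;         // key exists: overwrite
                e.value = value;
                return oldValue;
            }
        }
        addEntry(hash, key, value, i);        // key absent: add a new entry
        return null;
    }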
Expansion and its trigger
- The load factor determines when expansion happens
- Expansion doubles the number of buckets
- Existing entries are rehashed into the new table
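The trigger itself lives in addEntry; a paraphrase of the 1.7 source:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);                   // double the capacity
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length); // re-locate the bucket in the bigger table
        }
        createEntry(hash, key, value, bucketIndex);
    }

threshold is capacity * loadFactor, so with the default load factor of 0.75 a 16-bucket table expands once the 12th entry arrives.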
Inserting into a bucket
- A new Entry becomes the bucket head
- The original list hangs off its next pointer
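This head insertion is essentially a one-liner in the 1.7 source:

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];                     // current head (may be null)
        table[bucketIndex] = new Entry<>(hash, key, value, e); // new entry points at the old head
        size++;
    }

Remember this head insertion; it is exactly what makes the concurrent expansion bug below possible.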
Null key handling
If the key is null, both put and get treat its hash as 0, so the null key always lives in bucket 0.
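In other words, a null key is perfectly legal:

    import java.util.HashMap;
    import java.util.Map;

    Map<String, Integer> map = new HashMap<>();
    map.put(null, 42);                  // stored in bucket 0
    System.out.println(map.get(null)); // prints 42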
The get process
get computes the key's hash, locates the bucket, and walks the singly linked list until it finds an equal key. If it reaches the end of the list without a match, the key does not exist and null is returned.
- get first computes the key's hash
- It finds the bucket and iterates over its entries
- It returns once it hits an equal key
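The 1.7-era lookup, lightly paraphrased from the JDK source:

    final Entry<K,V> getEntry(Object key) {
        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;  // found an equal key
        }
        return null;       // walked off the end of the list: key absent
    }

Note that the loop terminates only when it finds the key or hits a null next pointer. That detail matters for the infinite loop discussed below.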
Red-black trees improve search efficiency
- Java 8 adds red-black trees
- Long chains stay efficient
- Lookup in a tree bin is O(log N)
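For reference, the constants that govern treeification in the 1.8 source:

    static final int TREEIFY_THRESHOLD = 8;     // a chain this long becomes a tree...
    static final int MIN_TREEIFY_CAPACITY = 64; // ...but only once the table has at least 64 buckets
    static final int UNTREEIFY_THRESHOLD = 6;   // a shrinking tree bin reverts to a list on resize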
What's the problem? An infinite loop
HashMap is fine in a single thread. Introduce multi-threaded concurrency, though, and you may find CPU usage pinned at 100%. Looking at the stack, you will be surprised to see threads hanging in HashMap's get method, and the problem disappears after the service restarts.
Why is that?
That is exactly what we are going to reason through.
As mentioned above, a HashMap uses an Entry array to hold its keys and values, and when hashes collide the entries in a bucket are chained into a linked list. During expansion, a larger array is created and the elements are transferred. The transfer logic is also clear: iterate over the linked list in each bucket of the old table, and rehash each element to find its home in the new newTable.
Here is the resize code from the Java 1.7 implementation:
    void resize(int newCapacity) {                   // pass in the new capacity
        Entry[] oldTable = table;                    // reference the Entry array before expansion
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {       // the table is already at its maximum size (2^30)
            threshold = Integer.MAX_VALUE;           // raise the threshold to int's maximum (2^31-1) so it never expands again
            return;
        }
        Entry[] newTable = new Entry[newCapacity];   // initialize a new Entry array
        transfer(newTable);                          // move all elements of the old table into newTable
        table = newTable;                            // publish newTable as table
        threshold = (int)(newCapacity * loadFactor); // recompute the threshold
    }
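The element-moving itself happens in transfer. Paraphrasing the pre-1.8 source (the single-argument variant matching the resize above):

    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;              // (1) remember the rest of the old list
                    int i = indexFor(e.hash, newCapacity); // (2) rehash into the new table
                    e.next = newTable[i];                  // (3) head insertion into the new bucket
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }

Note the head insertion at step (3): moving a list this way reverses its order, and the local variables e and next are exactly the ones in the race below.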
So why does this implementation run into trouble?
Suppose thread 1 and thread 2 both trigger an expansion during a put. Thread 1 enters transfer and is suspended right after reading next at step (1), so its local variable e points at A and next points at B.
The CPU time slice switches to thread 2, which completes the whole move. Assuming A, B, and C still hash into the same bucket, the result is as follows.
A, B, and C are still in the same bucket, but because of head insertion the order of the list is reversed: C → B → A, so B.next now points at A.
At this point thread 1 is scheduled again, and its newTable ends up containing a ring, as follows.
In thread 1, e is A and next is B. Because thread 2 has already moved the entries, B.next is A. Following the transfer logic above, thread 1 head-inserts A, then B (whose next still points at A), then follows B.next back to A and inserts A once more, setting A.next to B. Now A and B point at each other: the bucket's list contains a ring.
Suppose thread 1's expansion then completes, so its newTable is published as table. A subsequent get that hashes into this bucket walks the list looking for an equal key or a null next pointer; if the key is neither A nor B, it finds neither, and the loop runs forever.
To summarize: the infinite loop is what pins the CPU, and it stems from the pre-1.8 HashMap expansion implementation:
- Moving elements concurrently is dangerous
- A ring can form in a new bucket's chain
- A get for a key not on the ring then loops forever
Fixing the problem: the 1.8 implementation
The root cause is that every insertion goes to the head of the list and the move also takes elements from the head, which reverses the order and makes a ring possible. Java 1.8 is optimized so that each new element is inserted at the tail of the list.
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null); // tail insertion
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
The line to notice is the tail insertion:

    p.next = newNode(hash, key, value, null); // tail insertion: append after the last node instead of becoming the new head
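The 1.8 resize applies the same order-preserving idea when it moves a bucket: each chain is split into a low list (which stays at index j) and a high list (which moves to index j + oldCap), appending at the tail. An excerpt, lightly paraphrased from the 1.8 source:

    Node<K,V> loHead = null, loTail = null; // entries staying at the old index
    Node<K,V> hiHead = null, hiTail = null; // entries moving to oldIndex + oldCap
    Node<K,V> next;
    do {
        next = e.next;
        if ((e.hash & oldCap) == 0) {        // one hash bit decides old vs. new index
            if (loTail == null) loHead = e;
            else                loTail.next = e; // append at the tail: order is preserved
            loTail = e;
        } else {
            if (hiTail == null) hiHead = e;
            else                hiTail.next = e;
            hiTail = e;
        }
    } while ((e = next) != null);

Because nodes are appended rather than head-inserted, the reversal that produced the ring in 1.7 can no longer happen. HashMap is still not thread-safe, but this particular infinite loop is gone.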