As mentioned in the previous article, ThreadLocalMap resolves hash conflicts with open addressing, while HashMap handles them with the linked-list method (separate chaining). What is the linked-list method?
In the hash table, each "bucket" or "slot" holds a linked list, and all elements whose keys hash to the same slot are placed in that slot's linked list.
JDK7 differs from JDK8 in that it has no red-black trees: the array only mounts linked lists. In JDK8, when the array capacity is at least 64 and a list holds 8 or more nodes, that list is converted to a red-black tree; when a tree shrinks to 6 or fewer nodes, it is converted back to a linked list.
Insert
When inserting, compute the slot with the hash function and insert the entry into that slot's linked list or red-black tree. If the number of elements exceeds the threshold, the table is expanded and all entries are rehashed.
Find or delete
Compute the slot with the hash function, then traverse the linked list (or search the tree) to find or delete the node.
Why do linked lists become red-black trees?
As mentioned in the previous article, the load factor determines how full the table may get. Once it is exceeded, the HashMap dynamically expands to double its original size (the initial capacity is 16, i.e., the slot array has 16 entries). However, no matter how well the load factor and hash function are chosen, over-long linked lists cannot be ruled out entirely, and once a list grows too long, searching and deleting become slow and hurt HashMap's performance. Therefore JDK8 adds an optimization: when a list reaches length 8, it is converted into a red-black tree, whose worst-case lookup, insert, and delete are all O(log n). When the node count drops to 6 or fewer, the tree is converted back into a linked list, because with so few elements the balancing work of a red-black tree brings no visible performance advantage over a list, while the code is considerably more complex.
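To see the benefit of treeification, one can force every key into the same bucket. The sketch below is illustrative (the BadKey class is my own, not part of HashMap); it makes all keys collide, which on JDK8+ turns the bucket into a red-black tree and keeps lookups fast:

```java
import java.util.HashMap;
import java.util.Map;

public class CollidingKeyDemo {
    // Every BadKey hashes to the same value, so all entries land in one bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        // Comparable lets the red-black tree order keys that all collide.
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++)
            map.put(new BadKey(i), i);
        // On JDK 8+ this overloaded bucket has been treeified, so the lookup
        // is O(log n) rather than walking a 10,000-node list.
        System.out.println(map.get(new BadKey(1234))); // 1234
    }
}
```

On JDK7 the same program still works, but the lookup degrades to a linear scan of the whole chain.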
Source code analysis
Constructor and important properties
public HashMap(int initialCapacity, float loadFactor);
public HashMap(int initialCapacity);
public HashMap();
In the constructors of HashMap you can specify the initial capacity (bucket count) and the load factor. If unspecified, they default to 16 and 0.75, respectively. HashMap has the following important properties:
// Initial capacity; must be a power of 2
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// List length at which a bucket's linked list is converted to a red-black tree
static final int TREEIFY_THRESHOLD = 8;
// Minimum table capacity required before any list is treeified
static final int MIN_TREEIFY_CAPACITY = 64;
// A red-black tree with 6 or fewer nodes is converted back to a linked list
static final int UNTREEIFY_THRESHOLD = 6;
// Resize threshold, used to decide whether to expand (threshold = capacity * loadFactor)
int threshold;
// Load factor
final float loadFactor;
// Linked-list node
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}
// The array that holds the data
transient Node<K,V>[] table;
// Red-black tree node
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
}
The table above is the array that stores the data (its entries can be called buckets or slots), and each slot mounts a linked list or a red-black tree. It is worth noting that the array is not allocated when the HashMap is constructed, but lazily, when an element is first put.
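As a quick illustration of these defaults (the class and helper names below are mine, not from the JDK), the following sketch exercises both constructors; the backing table is only allocated on the first put:

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapConstructionDemo {
    // Puts and reads back a value; the first put is what triggers the lazy
    // allocation of the backing Node[] table.
    static Integer putAndGet(Map<String, Integer> map, String key, Integer value) {
        map.put(key, value);
        return map.get(key);
    }

    public static void main(String[] args) {
        // Default constructor: capacity 16 and load factor 0.75 apply.
        Map<String, Integer> defaults = new HashMap<>();
        // Explicit sizing: a requested capacity of 100 is rounded up internally
        // to the next power of two (128).
        Map<String, Integer> sized = new HashMap<>(100, 0.75f);

        System.out.println(putAndGet(defaults, "a", 1)); // 1
        System.out.println(putAndGet(sized, "b", 2));    // 2
    }
}
```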
Design of hash functions
int h;
int hash = (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
int index = hash & (tab.length - 1);
The index could be written as key.hashCode() % tab.length, but modulo is relatively expensive. Because the table length is always a power of two, A % B can be replaced by the equivalent bitwise A & (B - 1), which gives index = hash & (tab.length - 1).
This design uses the division-remainder method of hashing, which helps reduce collisions: divide the key K by some number p no larger than the table length m and use the remainder as the address. Powers of two make the arithmetic cheap; for example, x / 8 = x >> 3, i.e., shifting x right by 3 bits yields the quotient of x / 8, and the masked-off low bits are the remainder.
The hash itself is computed as (h = key.hashCode()) ^ (h >>> 16). Why shift right by 16? Wouldn't key.hashCode() & (tab.length - 1) suffice? The problem is that tab.length is usually far smaller than the range of hash values, so only the low bits would participate in the index and the high bits would do nothing, increasing the risk of hash collisions.
However, hashCode() itself is a 32-bit integer. XOR-ing it with its own high 16 bits mixes the high bits into the low bits, so the resulting hash reflects both halves and is much better distributed. The index then obtained by the division-remainder step has a lower probability of collision.
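The spreading step can be reproduced outside HashMap. The helper names in this sketch are my own; the two methods mirror the XOR fold and the mask described above:

```java
public class HashSpreadDemo {
    // The same spreading HashMap applies: XOR the high 16 bits into the low 16.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // For a power-of-two length, hash & (length - 1) equals hash % length
    // for non-negative hashes, but is much cheaper.
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int h = "hello".hashCode();
        // Without spreading, a 16-slot table only ever sees the low 4 bits.
        System.out.println(indexFor(h, 16));
        // With spreading, the high bits of hashCode also influence the slot.
        System.out.println(indexFor(spread(h), 16));
    }
}
```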
Insert data
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // 1. If the array is not initialized, initialize it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // 2. If the target slot is empty (no collision), insert a new node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // 3. If the first node has the same key, remember it for overwriting
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            // 4. Collision on a tree bucket: insert into the red-black tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                // 5. Tail insertion; if the list reaches the threshold, treeify
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // 6. The same key already exists somewhere in the list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // 7. Replace the old value with the new value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // 8. Expand if the size exceeds the threshold
    if (++size > threshold)
        resize();
    // Hook implemented by LinkedHashMap
    afterNodeInsertion(evict);
    return null;
}
Briefly, the logic of put consists of the following steps:
- If the array is uninitialized, initialize it with the default capacity of 16
- Compute the slot with hash & (n - 1); if that position is empty there is no collision and the node is inserted directly
- If there is a collision and the first node has the same key, the value is overwritten in the later step
- If the colliding bucket is a tree, the node is inserted into the red-black tree
- If the colliding bucket is a linked list, the node is appended at the tail; when the list length reaches 8, the bucket is treeified
- If the same key is found while walking the list, the value is overwritten in the later step
- If a node with the same key was found, its old value is replaced with the new value and returned
- If the map size (number of stored elements) exceeds the threshold, the capacity is doubled
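The overwrite-on-same-key behavior in these steps shows through the public API. A small sketch (the map contents are arbitrary); putIfAbsent corresponds to calling putVal with onlyIfAbsent = true:

```java
import java.util.HashMap;
import java.util.Map;

public class PutSemanticsDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        // First put of a key returns null (no previous mapping).
        System.out.println(map.put("k", 1));         // null
        // A second put with the same key overwrites and returns the old value.
        System.out.println(map.put("k", 2));         // 1
        // putIfAbsent (onlyIfAbsent = true) does not overwrite an existing value.
        System.out.println(map.putIfAbsent("k", 3)); // 2
        System.out.println(map.get("k"));            // 2
    }
}
```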
Expansion (resize)
The resize() method initializes the array if it has not been allocated yet. Otherwise it doubles the array size and rehashes the data from the old array into the new one.
// The code that doubles the array size is omitted.
if (oldTab != null) {
    // Iterate over the old array
    for (int j = 0; j < oldCap; ++j) {
        Node<K,V> e;
        if ((e = oldTab[j]) != null) {
            oldTab[j] = null;
            // 1. A single node (no collision): move it to the new array directly
            if (e.next == null)
                newTab[e.hash & (newCap - 1)] = e;
            else if (e instanceof TreeNode)
                // 2. Tree bucket: split it (remount on the expanded array,
                //    possibly converting back to a linked list)
                ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
            else { // preserve order
                // 3. Linked-list bucket: rehash while preserving node order
                Node<K,V> loHead = null, loTail = null;
                Node<K,V> hiHead = null, hiTail = null;
                Node<K,V> next;
                do {
                    next = e.next;
                    // 4. Element keeps its index after expansion
                    //    (a clever check, analyzed below)
                    if ((e.hash & oldCap) == 0) {
                        if (loTail == null)
                            loHead = e;
                        else
                            loTail.next = e;
                        loTail = e;
                    }
                    // 5. Element moves to a new index
                    else {
                        if (hiTail == null)
                            hiHead = e;
                        else
                            hiTail.next = e;
                        hiTail = e;
                    }
                } while ((e = next) != null);
                // 6. Link the unchanged-index elements into the new array
                if (loTail != null) {
                    loTail.next = null;
                    newTab[j] = loHead;
                }
                // 7. Link the moved elements into the new array
                if (hiTail != null) {
                    hiTail.next = null;
                    // 8. New index = j + oldCap
                    newTab[j + oldCap] = hiHead;
                }
            }
        }
    }
}
The rehash process shown above iterates over the old array's elements and handles each slot as follows:
- If a slot holds a single node (no linked list or red-black tree mounted), the element is assigned directly into the new array at index = e.hash & (newCap - 1)
- If there is a collision and the node is a tree node, the tree is split (remounted on the expanded array, possibly converting back to a linked list)
- If there is a collision and the node is a linked list, rehashing preserves the original order of the nodes
- Whether an element keeps its index after expansion is decided by (e.hash & oldCap) == 0, where oldCap is the array size before expansion
- When an element moves to a new index, the hiHead and hiTail pointers are updated
- Elements whose index is unchanged are linked into the new array at index j
- Elements whose index changed are linked into the new array at index j + oldCap
Points 4 and 5 split the list into two parts inside the do...while loop: the elements whose index is unchanged after rehashing, and the elements whose index changed. Each part uses a pair of pointers to track its head and tail nodes.
For example, with oldCap = 8 the list 1 -> 9 -> 17 is mounted at tab[1]; after expansion, 1 -> 17 remains at tab[1] while 9 moves to tab[9].
So how do we determine whether the index changes after the rehash, and what is the new index?
The design here is very clever. Remember that a HashMap's array size is always a power of two? The index is computed as e.hash & (tab.length - 1). Consider expanding the array from size 8 to size 16.
tab.length - 1 = 7    0 0 1 1 1
e.hashCode            x x x x x
=============================
hash & 7              0 0 y y y
You can see that before the expansion, the index is determined by the low three bits of the hashCode. What about after expansion?
tab.length - 1 = 15   0 1 1 1 1
e.hashCode            x x x x x
=============================
hash & 15             0 z y y y
After expansion, the index is determined by the low four bits, and the low three are the same as before the expansion. Whether the index changes is therefore decided solely by the new high bit: we only need to test the hash against that one bit.
That new high bit is exactly the bit set in oldCap: binary 15 is 1111 and binary 8 is 1000, so the extra bit of (newCap - 1) coincides with oldCap. Hence the result of e.hash & oldCap tells us whether the index changes.
Similarly, when the index does change, the new index differs from the old one by exactly that bit's value: newIndex = oldIndex + oldCap. That is why moved elements land at j + oldCap.
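This bit trick can be checked with a small sketch; the helper newIndex is my own name, and the keys 1, 9, 17 with oldCap = 8 match the earlier example:

```java
public class RehashIndexDemo {
    // After doubling from oldCap to 2 * oldCap, an entry either keeps its
    // index or moves to index + oldCap, decided by one extra hash bit.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 8;
        // Keys 1 and 17 keep index 1; key 9 moves to 1 + 8 = 9.
        System.out.println(newIndex(1, oldCap));  // 1
        System.out.println(newIndex(9, oldCap));  // 9
        System.out.println(newIndex(17, oldCap)); // 1
        // Sanity check: same result as recomputing hash & (newCap - 1).
        System.out.println(newIndex(9, oldCap) == (9 & (2 * oldCap - 1))); // true
    }
}
```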
At this point the resize method is done and every element sits in its new place.
get()
The get() method is relatively simple; the core of it is locating where the key is stored:
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // 1. Locate the slot with (n - 1) & hash
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // 2. Check whether the first node is the one we are looking for
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // 3. If the bucket is a tree, search the red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // 4. Otherwise walk the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
remove()
The node-lookup part of remove() is the same as in get(); here we focus on what happens once the target node has been found:
if (node != null && (!matchValue || (v = node.value) == value ||
                     (value != null && value.equals(v)))) {
    // 1. A tree node is removed via the red-black tree, which may
    //    rebalance or relocate nodes afterwards
    if (node instanceof TreeNode)
        ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
    // 2. The node is the head of the list: promote the second node
    else if (node == p)
        tab[index] = node.next;
    // 3. The node is in the middle of the list; p is its predecessor
    else
        p.next = node.next;
    ++modCount;
    --size;
    afterNodeRemoval(node);
    return node;
}
The most complex part of remove is removeTreeNode: after a red-black tree node is deleted, the tree may need to degenerate back into a linked list, or move nodes around to restore the red-black properties. The code is long, so it is not reproduced here, but this complexity is also a reason why linked lists are not replaced by red-black trees across the board.
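The return-value contract of this deletion path is visible through the public API. A brief sketch (map contents are arbitrary); remove(key, value) corresponds to the matchValue branch above:

```java
import java.util.HashMap;
import java.util.Map;

public class RemoveDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        // remove(key) returns the removed value, or null if the key is absent.
        System.out.println(map.remove("a"));     // 1
        System.out.println(map.remove("a"));     // null
        // remove(key, value) is the matchValue variant: deletes only on a match.
        System.out.println(map.remove("b", 99)); // false
        System.out.println(map.remove("b", 2));  // true
    }
}
```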
The infinite-loop problem during JDK7 expansion
/**
 * Transfers all entries from current table to newTable.
 */
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                // Suppose thread B pauses at this point
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                // Head insertion: the list order is reversed after expansion
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}
This code can easily produce an infinite loop during expansion. How? Suppose threads A and B both execute it, and the array grows from size 2 to size 4, with tab[1] = 1 -> 5 -> 9 before the expansion.
Thread B yields its timeslice right after executing next = e.next; thread A then runs the entire method but has not yet published the new newTable as the internal table.
When thread A finishes, the elements mounted at tab[1] are 9 -> 5 -> 1; note that head insertion has reversed the order. For thread B, e = 1 and next = 5. During thread A's loop, tab[i] evolved as: (1) tab[i] = 1; (2) tab[i] = 5 -> 1; (3) tab[i] = 9 -> 5 -> 1.
Now analyze thread B iteration by iteration:
- After the first iteration: newTable[i] = 1, e = 5
- After the second iteration: newTable[i] = 5 -> 1, e = 1
- In the third iteration, node 1 has no successor, so next is null. Executing e.next = newTable[i] links 1 -> 5, which together with the existing 5 -> 1 forms the ring 1 -> 5 -> 1; after newTable[i] = e, newTable[i] = 1 -> 5 -> 1
Any subsequent get that hashes to this slot then traverses the ring forever, driving CPU usage to 100%.
JDK8 does not have this problem: its resize keeps head and tail pointers for each split list and preserves the original element order. Of course HashMap is still not thread-safe, so ConcurrentHashMap is recommended under multithreaded concurrency.
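A minimal sketch of the recommended alternative, assuming a simple concurrent counting workload (the thread and key counts here are arbitrary):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    // Each of nThreads threads increments every key once; merge() makes the
    // read-modify-write atomic, so no updates are lost.
    static Map<Integer, Integer> concurrentCount(int nThreads, int nKeys)
            throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < nKeys; i++)
                    map.merge(i, 1, Integer::sum);
            });
            threads[t].start();
        }
        for (Thread t : threads)
            t.join();
        return map;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> counts = concurrentCount(2, 1000);
        System.out.println(counts.get(0));  // 2: one increment per thread
        System.out.println(counts.size()); // 1000 distinct keys
    }
}
```

With a plain HashMap the same workload could lose updates (and on JDK7 could even trigger the resize loop above).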
Your "like" is the biggest support for me, and following me is even better.