It was never really clear to me what the root cause of HashMap's thread-unsafety is. That would not do, so I went hunting for information and worked through the source with my own hands — you remember things much better that way! Off-topic: March and April this year are unlike any before; the things I handle most now are the thermometer gun, the health code, and masks. Wearing a mask all day at work feels suffocating. I really hope the epidemic passes quickly.
An introduction to HashMap
A HashMap is the Java collection that stores key-value pairs. It is backed by an array of buckets, each holding a linked list (red-black trees were added in 1.8 for long chains). It permits one null key and any number of null values.
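As a quick illustration of the null handling mentioned above (a minimal sketch; the class and method names are mine):

```java
import java.util.HashMap;

public class NullKeysDemo {
    // HashMap permits one null key and any number of null values
    static HashMap<String, String> build() {
        HashMap<String, String> map = new HashMap<>();
        map.put(null, "value-for-null-key"); // the null key always hashes to bucket 0
        map.put("k1", null);                 // null values are fine too
        map.put("k2", null);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(build().get(null)); // prints value-for-null-key
    }
}
```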
HashMap in version 1.7
The reason behind the problem
In 1.7, concurrent resizing can produce infinite loops and lost data. Note that 1.7 inserts elements at the head of a bucket's linked list (head insertion), which is what makes the resize race destructive.
The relevant code
The transfer method
```java
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    // Move every chain in the old table into newTable
    for (Entry<K,V> e : table) {
        while (null != e) {
            // Save e's next node
            Entry<K,V> next = e.next;
            // Recompute the hash if required
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // Find the slot in the new table
            int i = indexFor(e.hash, newCapacity);
            // Head insertion: e's next now points at the current head of the new bucket
            e.next = newTable[i];
            // e becomes the new head of the bucket
            newTable[i] = e;
            // Move on to the next node
            e = next;
        }
    }
}
```
Where the transfer method above can go wrong under multiple threads:

- `e.next = newTable[i];` is harmless on a single thread. But suppose thread A is suspended right after reading `next`, and thread B then runs the entire resize to completion. Because of head insertion, thread B reverses the order of the bucket's chain. When thread A resumes its while loop, the links it saved now point backwards, so it rewires the nodes into a ring — an infinite loop on the next lookup — and any nodes further down the original chain become unreachable, so data is lost. (1.8 avoids this: it uses two head/tail pairs of temporary variables to build the new chains, without disturbing the old structure until the new lists are complete.)
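The interleaving described above can be replayed deterministically on a single thread. The sketch below (the `Node` class and all names are mine, not JDK code) mimics two threads running `transfer()` on a shared two-node chain: "thread 1" is suspended just after reading `next`, "thread 2" reverses the chain with head insertion, and thread 1's resumed loop then wires the shared nodes into a ring:

```java
import java.util.IdentityHashMap;

public class Jdk7ResizeRace {
    // Stand-in for the JDK 7 HashMap.Entry: just a key and a next pointer
    static class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Replays the classic interleaving and reports whether the bucket ends up cyclic
    static boolean simulateRace() {
        // Old bucket: A -> B, and both keys land in the same new bucket
        Node b = new Node("B", null);
        Node a = new Node("A", b);

        // Thread 1 runs transfer() just past "Entry<K,V> next = e.next;"
        // and is then suspended, holding stale references
        Node e1 = a;
        Node next1 = e1.next; // next1 == B

        // Thread 2 now runs the whole transfer. Head insertion reverses
        // the chain into thread 2's table: B -> A
        Node table2 = null;
        for (Node e = a; e != null; ) {
            Node next = e.next;
            e.next = table2;
            table2 = e;
            e = next;
        }

        // Thread 1 resumes with its saved e/next and its own empty bucket,
        // unaware that thread 2 has already rewired the shared nodes
        Node table1 = null;
        Node e = e1, next = next1;
        while (e != null) {
            e.next = table1; // head insertion again
            table1 = e;
            e = next;
            next = (e != null) ? e.next : null;
        }
        // Result: A.next == B and B.next == A, so the bucket is a ring;
        // a later get() on this bucket would loop forever
        return hasCycle(table1);
    }

    static boolean hasCycle(Node head) {
        IdentityHashMap<Node, Boolean> seen = new IdentityHashMap<>();
        for (Node n = head; n != null; n = n.next)
            if (seen.put(n, Boolean.TRUE) != null) return true;
        return false;
    }

    public static void main(String[] args) {
        System.out.println("cycle formed: " + simulateRace());
    }
}
```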
HashMap in version 1.8
The reason behind the problem
The 1.8 HashMap reworks the 1.7 code. It no longer suffers the 1.7 infinite loop, but it introduces a new problem: data overwriting. Note also that 1.8 switches to tail insertion for new elements.
The relevant code
The putVal method
```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is empty, create it via resize() and take its length
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the computed bucket is empty, insert directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // The bucket already holds data
        Node<K,V> e; K k;
        // If the head node has the same key (by == or equals()),
        // remember it so its value can be overwritten below
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the bucket is a red-black tree, insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise walk the list; if its length reaches the threshold,
            // convert the list to a red-black tree
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null); // tail insertion
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Stop if an existing node in the bucket has the same key
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // e != null means the list already contains the key:
        // just overwrite the value and return oldValue
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // A key-value was inserted: record the structural modification
    // (used by the fail-fast mechanism)
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
Examples of where the putVal method above is not thread-safe:

- `if ((p = tab[i = (n - 1) & hash]) == null)` — suppose thread A executes this check, sees an empty bucket, and is then suspended because its time slice runs out. Thread B, whose key hashes to the same index, runs next and completes a normal insert into that bucket. When thread A is rescheduled, it has already passed the collision check, so it does not re-examine the bucket and inserts directly, overwriting the entry thread B just wrote. Thread B's data is silently lost.
- `++size` — this increments the map's element count, but `++` is a read-modify-write, not an atomic operation. Suppose the current size is 10 and threads A and B both perform a put. Thread A reads size = 10 from main memory, prepares to add 1, and then loses the CPU. Thread B runs its whole put, also reads 10, and writes size = 11 back to main memory. Thread A then resumes and likewise writes size = 11. Two puts were performed, but size grew by only 1: one update was overwritten.
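The `++size` lost update is easy to replay step by step. The snippet below is a single-threaded simulation of that interleaving (not real concurrency; names are mine), spelling out the separate read and write halves of `++`:

```java
public class SizeLostUpdate {
    // Deterministic replay of the "++size" race: ++ is a read-modify-write,
    // so if both threads read before either writes, one increment is lost.
    static int simulateRace() {
        int size = 10;      // current size in main memory
        int readA = size;   // thread A reads 10, then loses its time slice
        int readB = size;   // thread B also reads 10
        size = readB + 1;   // thread B writes 11
        size = readA + 1;   // thread A resumes and writes 11 again
        return size;        // two puts, but size grew by only 1
    }

    public static void main(String[] args) {
        System.out.println("size after two 'puts': " + simulateRace()); // 11, not 12
    }
}
```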
The resize method
```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // Taken on every expansion except the first (the first happens on the first put)
    if (oldCap > 0) {
        // Capacities are powers of two so that hash & (cap - 1) works as a mask.
        // Integer.MAX_VALUE (2^31 - 1) is not a power of two, so growing past
        // MAXIMUM_CAPACITY (2^30) would waste half the index space; instead,
        // just stop resizing once 2^30 is reached.
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // Reached when the map was constructed with an initial capacity:
    // mainly recomputes the threshold
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // Reached when the map was constructed with no arguments
    else { // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Recompute the threshold if it was not set above
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Allocate the new table
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // Copy the data into the new table
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single node is moved directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A tree bin is split by the red-black tree code
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    // Two head/tail pairs, because after doubling, each element's
                    // index either stays the same or becomes old index + oldCap
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Walk the bucket's list
                    do {
                        next = e.next;
                        // Index unchanged
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;      // first node of the lo list
                            else
                                loTail.next = e; // append at the tail
                            loTail = e;
                        }
                        // Index moves to j + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // The list that keeps the old index
                    if (loTail != null) {
                        // Cut the stale link: the tail may still point at a node
                        // that was routed to the other list
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // The list that moves to the new index
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```
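The lo/hi split works because of a bit-arithmetic identity: after the capacity doubles, the new index `hash & (newCap - 1)` equals the old index plus, at most, the single extra bit `hash & oldCap`. A small check (class and method names are mine) verifies the shortcut against the full mask computation:

```java
public class SplitIndexDemo {
    // After doubling, the new index depends on one extra hash bit (hash & oldCap):
    // 0 -> index unchanged, 1 -> index + oldCap. This is why resize() only needs
    // the lo/hi head-tail pairs instead of recomputing the index for every node.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash = 0; hash < 64; hash++) {
            // the shortcut always matches the full mask computation
            if (newIndex(hash, oldCap) != (hash & (newCap - 1)))
                throw new AssertionError("mismatch at hash " + hash);
        }
        System.out.println("lo/hi split matches hash & (newCap - 1) for all hashes");
    }
}
```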
An example of where the resize method above can lose data under multiple threads:

- Threads A and B both trigger a resize at the same time, each building its own new table from the shared old one. Suppose thread A finishes its resize first and then inserts a new entry. Thread B, still copying, has already iterated past the slot involved, so the table thread B eventually installs does not contain thread A's insert. When thread B's resize completes and replaces `table`, thread A's update is lost.
Conclusion
The thread-unsafety of HashMap mainly shows up in two ways:
- In JDK 1.7, concurrent resizing can produce circular linked lists (infinite loops) and data loss.
- In JDK 1.8, concurrent put operations can overwrite each other's data.
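Since the takeaway is that a plain HashMap must not be written from multiple threads, the natural follow-up is what to use instead: `ConcurrentHashMap` (preferred) or `Collections.synchronizedMap(new HashMap<>())`. A minimal sketch (class and method names are mine) showing that ConcurrentHashMap keeps every put even under contention:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeAlternativesDemo {
    // N threads each put distinct keys into a shared ConcurrentHashMap;
    // unlike plain HashMap, no insert is ever lost
    static int concurrentPuts(int threads, int perThread) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    map.put(id + "-" + i, i); // keys are distinct per thread
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return map.size(); // always threads * perThread
    }

    public static void main(String[] args) {
        System.out.println(concurrentPuts(8, 1000)); // prints 8000
    }
}
```

The same loop over a plain HashMap typically ends with fewer entries than expected, for exactly the reasons discussed above.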
If you have other ideas, please leave a comment below — I'll look into them carefully. Not going out on the weekend also saves a mask 😃.
Why is HashMap thread unsafe in JDK1.7 and JDK1.8?