This article walks through the thread-safety problems of using HashMap under multi-threaded concurrency, then digs into the core techniques behind ConcurrentHashMap.
Abstract:
- HashMap
- ConcurrentHashMap core techniques
- Practical application of segmented locking
Blog address: sourl.cn/r3RVY8
HashMap
In JDK 1.7 and earlier, HashMap used a combination of an array and linked lists, as shown in the following figure:
A new element is positioned in the Table array by taking its hash modulo the array length, then inserted at the head of that slot's list so it can be found quickly on the next access.
Reading data works the same way: locate the array slot by modulo, then walk the linked list to find the matching element.
If a HashMap holds too many elements, a single slot's list can grow long, and the original O(1) lookup can degrade to O(N), seriously hurting search efficiency.
To avoid this, HashMap automatically expands and redistributes its elements once the element count meets the following condition:
```java
// size: the actual number of elements in the HashMap
// capacity: the HashMap capacity, i.e. the length of the Table array (default 16)
// loadFactor: the load factor (default 0.75)
size >= capacity * loadFactor
```
When this happens, HashMap doubles its capacity and migrates the elements from the old array to the new one:
```java
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            // The next two lines are what cause the dead link:
            e.next = newTable[i];
            // insert at the list head
            newTable[i] = e;
            e = next;
        }
    }
}
```
Migrating elements from the old array to the new one still uses head insertion, which leaves each new list in reverse order.
When multiple threads expand the map concurrently, the linked lists can form a dead link (a ring). Any later lookup on that list sends the thread into an infinite loop, and CPU usage spikes.
There are already many detailed analyses online of how the dead link forms, so it is not repeated here; see the article by **@Chen Hao** instead.
Article address: coolshell.cn/articles/96…
JDK 1.8 improvements
JDK 1.8 completely reworked the underlying structure of HashMap, using an array plus linked lists/red-black trees.
A new element still finds its Table slot by modulo, but is now appended to the tail of the list. Once a list grows beyond 8 elements (and the table holds at least 64 buckets), it is converted into a red-black tree, which further improves lookup efficiency.
Interview question: why a red-black tree here, rather than some other binary tree? Because a red-black tree is self-balancing, its worst-case lookup stays O(log N); an ordinary binary search tree can degenerate into a linked list, and stricter balanced trees such as AVL pay more for rebalancing on every insert and delete.
A related question: why can JDK 1.8 no longer form dead links during concurrent expansion? Because its lists use tail insertion, which preserves node order during migration and avoids the ring that concurrent head insertion could create.
So is HashMap concurrency-safe in JDK 1.8?
No. Under multi-threaded concurrency, a HashMap can still lose data.
Here is the JDK1.8 test code:
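The original test listing was not preserved, so here is a minimal reproduction sketch of the experiment (class and method names are mine): two threads write disjoint key ranges into one shared HashMap, then we count how many entries survived.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a data-loss demo for HashMap under concurrency (JDK 1.8+).
// Two threads insert 10_000 distinct keys each; a correct map would end
// up with 20_000 entries, but racy puts are often silently overwritten.
public class HashMapLossDemo {

    static int run() {
        Map<Integer, Integer> map = new HashMap<>();
        int perThread = 10_000;

        Thread t0 = new Thread(() -> write(map, 0, perThread));
        Thread t1 = new Thread(() -> write(map, perThread, perThread * 2));
        t0.start();
        t1.start();
        try {
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return map.size();
    }

    static void write(Map<Integer, Integer> map, int from, int to) {
        for (int i = from; i < to; i++) {
            try {
                map.put(i, i);           // racy: assignments may be overwritten
            } catch (RuntimeException e) {
                // concurrent misuse can even throw internally; ignore for the demo
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("expected 20000, actual " + run());
    }
}
```

Note the result is nondeterministic: on some runs nothing is lost, on most runs under load the final size comes back smaller than 20,000.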
On my machine, the final count came back smaller than the number of elements inserted: data had been lost.
From the source code's point of view, data can be lost during concurrent use for the following reasons:
Concurrent assignments being overwritten
Under concurrency, one thread's assignment may be overwritten by another thread's, so an object is silently lost.
The size calculation problem
Each time an element is added, size is incremented with ++size; the ++ operator is not atomic, and is therefore not thread-safe.
There are many possible causes of lost objects; these are just two obvious ones.
JDK 1.7, of course, has data-loss problems too, for similar reasons.
Once a dead link occurs and the machine's CPU spikes, we can easily catch it through system monitoring.
Data loss is much harder to spot. It accumulates slowly as the system runs, produces no dirty data, and usually surfaces only through odd symptoms, by which point the investigation is difficult.
SynchronizedMap
For concurrent use, the JDK provides SynchronizedMap as a thread-safe wrapper.
SynchronizedMap is a private inner class of Collections, and instances can only be created like this:
```java
Map m = Collections.synchronizedMap(new HashMap<>());
```
SynchronizedMap source code:
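The source follows one simple pattern; here is a compilable sketch of that pattern with a reduced interface (the real SynchronizedMap implements all of java.util.Map):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the pattern used by Collections.synchronizedMap: every call is
// wrapped in a synchronized block on one shared mutex object.
class SimpleSynchronizedMap<K, V> {
    private final Map<K, V> m;   // the backing map
    private final Object mutex;  // the single lock shared by every operation

    SimpleSynchronizedMap(Map<K, V> backing) {
        this.m = backing;
        this.mutex = this;
    }

    V get(Object key) {
        synchronized (mutex) { return m.get(key); }
    }

    V put(K key, V value) {
        synchronized (mutex) { return m.put(key, value); }
    }

    int size() {
        synchronized (mutex) { return m.size(); }
    }
}
```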
Every method is guarded by the synchronized keyword to ensure concurrency safety.
Because all threads share a single lock, only one thread can read or write at a time while the others wait, which greatly reduces performance.
In most business scenarios, reads do not conflict with each other, yet SynchronizedMap serializes them anyway, severely limiting read throughput.
So SynchronizedMap is rarely used in multi-threaded concurrent scenarios.
ConcurrentHashMap
Sharing one lock across threads hurts performance. So imagine splitting the data across multiple locks to spread the threads out, reduce lock conflicts, and raise concurrency.
ConcurrentHashMap uses exactly this approach to keep data safe and efficient under concurrency.
JDK1.7
The JDK1.7 ConcurrentHashMap data structure is as follows:
Segment is an inner class of ConcurrentHashMap with the same underlying structure as HashMap; in addition, Segment extends ReentrantLock. The class diagram is as follows:
When a new element is added to ConcurrentHashMap, the target Segment is first located from the key's hash, and that Segment is then locked. Once the lock is acquired, the remaining steps are the same as in HashMap.
Thanks to the lock, all operations on a Segment are concurrency-safe, and since the other Segments are untouched, up to concurrencyLevel (the number of Segments, 16 by default) threads can read and write concurrently and safely.
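The design can be sketched in miniature as follows (illustrative only; the real class works on raw hash tables and is far more involved, but the locking shape is the same):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of the JDK 1.7 idea: each Segment is a ReentrantLock
// guarding its own small table, so writes to different segments never contend.
class MiniSegmentedMap<K, V> {
    static final class Segment<K, V> extends ReentrantLock {
        final Map<K, V> table = new HashMap<>();
    }

    private final Segment<K, V>[] segments;

    @SuppressWarnings("unchecked")
    MiniSegmentedMap(int concurrencyLevel) {
        segments = new Segment[concurrencyLevel];
        for (int i = 0; i < concurrencyLevel; i++) segments[i] = new Segment<>();
    }

    private Segment<K, V> segmentFor(Object key) {
        // strip the sign bit so the index is never negative
        return segments[(key.hashCode() & 0x7fffffff) % segments.length];
    }

    V put(K key, V value) {
        Segment<K, V> s = segmentFor(key);
        s.lock();                 // only this segment is blocked
        try {
            return s.table.put(key, value);
        } finally {
            s.unlock();
        }
    }

    V get(K key) {
        Segment<K, V> s = segmentFor(key);
        s.lock();
        try {
            return s.table.get(key);
        } finally {
            s.unlock();
        }
    }
}
```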
Size statistics problem
Although ConcurrentHashMap's segmented locks solve the multi-threaded concurrency problem, they introduce new complexity: counting the elements becomes complicated.
Since the elements are spread across the Segments, the only way to get an exact count is to iterate over the Segment array and sum the counts.
To guarantee accuracy, every Segment would have to be locked during this process and unlocked afterwards; but doing so blocks all writes and degrades ConcurrentHashMap's performance.
So the ConcurrentHashMap#size statistic method is optimized.
Every write or removal on a Segment increments that Segment's modCount. If two consecutive passes compute the same total modCount across all Segments, no writes or removals happened in between, and the computed size can be returned directly.
If three attempts in a row disagree, there is no choice but to lock every Segment and recount.
Note that even this result may not be 100% accurate: the moment the Segments are unlocked, another thread may start writing, so the returned number can differ from the actual one.
Still, that is more acceptable than blocking all writes just to take a count.
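The retry-then-lock strategy can be sketched as follows (simplified from the JDK 1.7 approach; Seg is an illustrative stand-in for the real Segment, and only the counting logic is shown):

```java
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch of the JDK 1.7 ConcurrentHashMap#size strategy.
class Seg extends ReentrantLock {
    volatile int count;     // elements in this segment
    volatile int modCount;  // incremented on every write/remove
}

class SegmentedSize {
    static final int RETRIES_BEFORE_LOCK = 2;

    static long size(Seg[] segments) {
        long sum;
        long lastMods = -1;
        for (int retries = 0; ; retries++) {
            if (retries > RETRIES_BEFORE_LOCK) {
                // unlocked passes kept disagreeing: lock everything and recount
                for (Seg s : segments) s.lock();
                try {
                    sum = 0;
                    for (Seg s : segments) sum += s.count;
                } finally {
                    for (Seg s : segments) s.unlock();
                }
                return sum;
            }
            // optimistic pass: sum counts and modCounts without locking
            sum = 0;
            long mods = 0;
            for (Seg s : segments) {
                sum += s.count;
                mods += s.modCount;
            }
            if (mods == lastMods) return sum;  // no writes between two passes
            lastMods = mods;
        }
    }
}
```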
Performance issues
Imagine the extreme case in which every write hashes to the same Segment. ConcurrentHashMap then degenerates into a SynchronizedMap, with all threads contending for one lock.
JDK 1.8 improvements
After JDK 1.8, ConcurrentHashMap dropped the segmented-lock design, further reducing lock conflicts, and introduced the red-black tree structure to further improve lookup efficiency.
The data structure is as follows:
Each head Node in the Table array can serve as a lock, which avoids the Segment-level degradation described above.
Moreover, whenever ConcurrentHashMap is expanded the Table array grows, so the number of locks grows with it and concurrency increases further.
The source code for writing an element is complicated; the flow chart below summarizes it.
In general, JDK 1.8 uses CAS plus synchronized to ensure concurrency safety.
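That CAS-plus-synchronized write path can be sketched roughly as follows. This is an illustrative mini-map, not the JDK's code: it uses AtomicReferenceArray instead of the internal Unsafe table accesses, assumes a power-of-two capacity, and omits resizing, treeification, and size counting.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of the JDK 1.8 write strategy: CAS into an empty bin,
// synchronized on the bin's head node otherwise.
class MiniCHM<K, V> {
    static final class Node<K, V> {
        final K key;
        volatile V val;
        volatile Node<K, V> next;
        Node(K k, V v) { key = k; val = v; }
    }

    private final AtomicReferenceArray<Node<K, V>> table;

    MiniCHM(int capacity) {  // capacity must be a power of two
        table = new AtomicReferenceArray<>(capacity);
    }

    V put(K key, V value) {
        int i = (table.length() - 1) & key.hashCode();
        for (;;) {
            Node<K, V> head = table.get(i);
            if (head == null) {
                // empty bin: lock-free CAS insert; retry on contention
                if (table.compareAndSet(i, null, new Node<>(key, value))) return null;
            } else {
                // non-empty bin: lock only this bin's head node
                synchronized (head) {
                    if (table.get(i) != head) continue;  // head changed, retry
                    for (Node<K, V> e = head; ; e = e.next) {
                        if (e.key.equals(key)) {
                            V old = e.val;
                            e.val = value;
                            return old;
                        }
                        if (e.next == null) {
                            e.next = new Node<>(key, value);
                            return null;
                        }
                    }
                }
            }
        }
    }

    V get(K key) {
        int i = (table.length() - 1) & key.hashCode();
        for (Node<K, V> e = table.get(i); e != null; e = e.next)
            if (e.key.equals(key)) return e.val;
        return null;
    }
}
```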
Size method optimization
To understand the JDK 1.8 ConcurrentHashMap#size method, two variables matter:
baseCount
CounterCell[] counterCells
baseCount records the number of elements and is updated with CAS after every change.
When multiple threads add elements concurrently and the CAS on baseCount conflicts, counterCells comes into play: each thread instead CASes its delta into a slot of the counterCells array, reducing contention.
If CAS on a given slot keeps failing, several threads must be hitting that same slot, so counterCells is expanded to spread them out further.
With all this in place, counting the elements is simple and lock-free: the size is baseCount plus the sum of every counterCells entry.
On closer inspection, counterCells reduces multi-threaded contention with much the same idea as segmented locking.
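This baseCount/counterCells scheme is the same striped-counting idea that java.util.concurrent.atomic.LongAdder packages up: contended increments are spread across cells, and sum() totals the base plus every cell. A quick demonstration:

```java
import java.util.concurrent.atomic.LongAdder;

// LongAdder applies the same idea as baseCount + counterCells.
public class StripedCountDemo {

    static long count(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) adder.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return adder.sum();  // exact once all writers have finished
    }
}
```

Unlike AtomicLong, which forces every thread to CAS the same variable, LongAdder trades a little read-time work (summing the cells) for far less write-time contention.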
Practical application of segmented locking
ConcurrentHashMap uses segmented locking to reduce lock granularity and improve concurrency. We can borrow this design to solve some hot-data update problems.
For example, if we have a payment system now, every time a user makes a payment, the merchant’s account balance increases accordingly.
When a large number of users pay at the same time, the same merchant account balance is updated concurrently.
To keep the data safe, the database takes a row lock on each update: under concurrent updates only one transaction at a time can hold the lock and change the row while the others wait. This can make other threads wait so long to update the balance that the transaction times out and the update fails.
This is a typical hot data update problem.
The actual cause of this problem is that multiple threads concurrently rob row locks. If there are multiple row locks, can lock conflicts be reduced?
Yes. Borrowing ConcurrentHashMap's segmented-lock design, we create multiple shadow accounts under the merchant's account.
Each balance update then picks one shadow account at random and updates it.
In theory we can create any number of shadow accounts, which means the write concurrency can be scaled up almost without limit.
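A hypothetical in-memory sketch of the idea, where AtomicLong shards stand in for shadow-account rows in the database and all names are illustrative:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of shadow accounts: one merchant balance split across N shards;
// each deposit picks a random shard, so concurrent deposits rarely contend.
// AtomicLong stands in for a database row guarded by a row lock.
class ShardedBalance {
    private final AtomicLong[] shards;

    ShardedBalance(int n) {
        shards = new AtomicLong[n];
        for (int i = 0; i < n; i++) shards[i] = new AtomicLong();
    }

    void add(long amount) {
        int i = ThreadLocalRandom.current().nextInt(shards.length);
        shards[i].addAndGet(amount);  // in a real system: UPDATE one shadow row
    }

    long total() {
        // the price of sharding: statistics must sum every shadow account
        long sum = 0;
        for (AtomicLong s : shards) sum += s.get();
        return sum;
    }
}
```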
Thanks to **@why** for proposing the shadow-account concept. If you are interested, search for and follow his public account: WHY Technology.
The introduction of new solutions in architectural design means the introduction of new complexity, and we must consider these issues and balance the design.
Although shadow accounts solve the hot-data problem, computing the merchant's total balance becomes troublesome: we now have to sum the balances of all sub-accounts.
Also, in real business scenarios the merchant balance is not only credited but may also be debited. It is then possible that the merchant's total balance is sufficient while the randomly chosen shadow account's balance is not.
How to handle that? I'll leave it for you to think about. Have you run into similar problems? Feel free to leave a comment and discuss.
If you are interested, we can talk about hot account solutions in more detail in the following articles.
Conclusion
HashMap can form dead links and lose data when used concurrently by multiple threads, so it is not suitable for concurrent scenarios; it is fine as a method-local variable.
SynchronizedMap, while thread-safe, is also a poor fit for multi-threaded use because of its coarse lock granularity and poor performance.
ConcurrentHashMap suits multi-threaded scenarios: it uses multiple locks to lower the probability of contention and so raises concurrency.
Last but not least, don't reach for ConcurrentHashMap just because the environment is multi-threaded. If a Map is a global variable that is loaded once at startup and never changed afterwards, an immutable collection is recommended instead: Collections#unmodifiableMap or Guava's ImmutableMap. Immutable collections have the advantage of preventing other threads from sneaking in changes that could cause business problems.
The classic idea of ConcurrentHashMap segmental locking can be applied to hot update scenarios to improve update efficiency.
It is important to remember, however, that when we introduce new solutions to solve problems, new complexity is introduced, leading to other problems. The process is to think through these questions first, and then make some trade-offs.
References
- "Code Out Efficiency": Java Development Manual
- www.jasongj.com/java/concur…
One last word (please follow)
If you've read this far, tap "follow" and "like" — not "next time", my friend. Writing articles is hard work and needs some positive feedback.
If you spot anything wrong, please leave a comment pointing it out so I can fix it.
Thanks for reading. I stick to original writing, and I'm very grateful for your attention.
Welcome to follow my public account for daily technical posts; if you're interested in my topics, you can also follow my blog: studyidea.cn