In the previous chapter, we saw that using HashMap can cause a number of problems under high concurrency. So what do we do when we need a key-value storage collection like HashMap?

1. Concurrent containers

In Java’s collection container framework, there are four main categories: List, Set, Queue and Map. The well-known collection classes ArrayList, LinkedList and HashMap are all non-thread-safe.

Problems occur when multiple threads concurrently access these containers. As a result, writing programs in multithreaded environments requires the programmer to manually synchronize wherever these containers are accessed, making it very inconvenient to use them.

Concurrent containers are designed specifically for multithreaded access. Using lock-striping techniques, only the portion being operated on is locked, while other portions remain accessible to other threads, improving the program's throughput.

Thread safety is guaranteed by a combination of CAS operations and, in parts of the code, synchronized locks.

2. ConcurrentHashMap implementation analysis

2.1 Implementation under 1.7

2.1.1 Data structure

ConcurrentHashMap is composed of a Segment array and HashEntry arrays. Segment extends ReentrantLock and acts as the lock in ConcurrentHashMap; HashEntry stores the key-value pair data. A ConcurrentHashMap contains an array of Segments, each of which has an array-plus-linked-list structure similar to a HashMap: each Segment contains a HashEntry array, each element of that array is the head of a linked list, and each Segment guards its own HashEntry array. The data structure is as follows:

2.1.2 Construction method

ConcurrentHashMap is initialized using initialCapacity, loadFactor, and concurrencyLevel (the user's estimated concurrency level; the default is DEFAULT_CONCURRENCY_LEVEL = 16). The constructor initializes the segments array, the segment offset segmentShift, the segment mask segmentMask, and the HashEntry array inside each Segment.

The concurrency level can be interpreted as the maximum number of threads that can simultaneously update the ConcurrentHashMap without lock contention. It is, in effect, the number of Segment locks in the ConcurrentHashMap. The default concurrency level is 16, but the user can also set it in the constructor. When the user sets the concurrency level, ConcurrentHashMap uses the smallest power of 2 greater than or equal to that value as the actual concurrency level (if the user sets it to 17, the actual value is 32).

If the concurrency level is set too low, serious lock contention occurs. If it is set too high, accesses that would have fallen in the same Segment are spread across different Segments, lowering the CPU cache hit ratio and degrading performance.

The length ssize of the segments array is calculated from concurrencyLevel. The segments array must have a power-of-two size so that the segment index can be located with a bitwise AND. So ssize is the smallest power of 2 greater than or equal to concurrencyLevel. If concurrencyLevel is 14, 15, or 16, ssize is 16, which means the number of locks in the container is also 16.
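This rounding can be sketched as follows; it mirrors the doubling loop in the JDK 1.7 constructor (the class and method names here are illustrative):

```java
// Sketch of how the JDK 1.7 constructor rounds concurrencyLevel up to a
// power of two to obtain the segments array length ssize.
public class SegmentSizing {
    // Smallest power of two >= concurrencyLevel.
    static int ssizeFor(int concurrencyLevel) {
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ssize <<= 1; // double until we reach or pass the requested level
        }
        return ssize;
    }
}
```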

The input parameter initialCapacity is the initial capacity of the ConcurrentHashMap, and loadFactor is the load factor for each Segment; both are needed to initialize each Segment. cap is the length of the HashEntry array in each Segment: with c = initialCapacity / ssize, cap is the smallest power of 2 greater than or equal to c.

By default, ssize = 16, initialCapacity = 16, loadFactor = 0.75f, cap = 1, threshold = (int)(cap * loadFactor) = 0.

Since ConcurrentHashMap uses Segment locks to protect the data in different segments, a hash algorithm must locate the Segment when inserting and retrieving elements. ConcurrentHashMap first re-hashes the element's hashCode using a variant of the Wang/Jenkins hash.

ConcurrentHashMap allows concurrent read operations without locking. The implementation guarantees this by making HashEntry nearly immutable and declaring key fields volatile.

2.1.3 get operation

The get operation re-hashes once, then uses the hash to locate the Segment (using the high bits of the hash) and then the bucket within the Segment's table (using the full hash). The entire get process takes no lock; volatile reads ensure that get always sees the latest value.
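The two-level indexing described above can be sketched like this, assuming the default of 16 segments (segmentShift = 28, segmentMask = 15); the class and method names are illustrative:

```java
// Illustrative two-level indexing in JDK 1.7: the high bits of the
// (re-hashed) code select the Segment, the full hash selects the bucket.
public class SegmentIndexing {
    // High bits of the hash pick the segment.
    static int segmentIndex(int hash, int segmentShift, int segmentMask) {
        return (hash >>> segmentShift) & segmentMask;
    }

    // Full hash picks the bucket within a power-of-two table.
    static int tableIndex(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }
}
```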

2.1.4 put operation

The constructor initializes only the first slot, segment[0]; the other slots are initialized lazily, on first insertion, via ensureSegment.

The ensureSegment method accounts for concurrency: multiple threads may simultaneously try to initialize the same slot segment[k], but only one can succeed.

The concrete implementation is

The put method attempts to acquire the lock with tryLock(). If the lock is obtained, node is null and execution enters the try block. If not, the scanAndLockForPut method spins waiting for the lock.

While trying to obtain the lock, the scanAndLockForPut method iterates through the linked list for the corresponding hash code. If no HashEntry node with the same key is found, a HashEntry is created in advance for the subsequent put. If the lock still cannot be obtained after a certain number of tryLock attempts, it falls back to a blocking lock() call.

After acquiring the lock, the Segment iterates through the list, updating the HashEntry value if a HashEntry node has the same key.

Otherwise, create a new HashEntry node and insert it at the head of the list: the new node becomes the head and the original head becomes its next node. If the total number of nodes (including the new HashEntry) exceeds threshold, the rehash() method is called to expand the Segment before the new HashEntry is written into the array.

2.1.5 rehash operations

Expansion creates a new array, migrates the data, and finally assigns newTable to the table field.

To avoid copying all nodes: since expansion doubles a power-of-two capacity, if a HashEntry's index in the old array is i and the old capacity is capacity, its index in the new array can only be i or i + capacity. Many HashEntry nodes can therefore keep their index across expansion.

Given that the original table was 4 in length, the distribution of elements in the table looks like this

After expansion, the length of the table becomes 8, so the distribution of elements in the table becomes:

As can be seen, the indices of 34 and 56 remain unchanged, while the indices of 15, 23, and 77 move up by exactly 4 from their original positions. This allows fast relocation and reduces the number of nodes that must be moved.
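This index-preservation property can be checked directly with the masking arithmetic (a small illustrative check, not JDK code):

```java
// For a power-of-two table, index = hash & (capacity - 1). Doubling the
// capacity adds one mask bit, so an entry either keeps its index i or
// moves to i + oldCapacity, depending on that one extra hash bit.
public class RehashIndex {
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }
}
```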

Concurrency is not considered in this method because the lock has been acquired before the method is executed.

2.1.6 remove operation

Similar to the put method, the lock must be acquired before the operation to ensure thread safety.

2.1.7 Weak consistency of ConcurrentHashMap

During a get, the linked list is traversed to find a node with the same key and obtain its value. However, because other threads may modify the list structure during the traversal, get and containsKey may return outdated data; this is the weak consistency of ConcurrentHashMap. If strong consistency is required, you have to use the Collections.synchronizedMap() method instead.

2.1.8 size and containsValue

These methods operate over the entire ConcurrentHashMap and work in a similar way: loop through all Segments without locking, summing their counts and modCounts. The modCount variable is incremented before elements are modified in the put, remove, and clean methods. If the modCount sum is equal across two consecutive passes, the ConcurrentHashMap was not modified by other threads in between, and the summed value is returned.

If the number of retries exceeds a predefined value, all Segments are locked, and unlocked after the result is obtained. So in general, you should avoid using the size and containsValue methods in multithreaded environments.
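A simplified sketch of this optimistic retry strategy; the per-segment counts and modCounts are modeled here as plain arrays, and the final locked pass is only indicated in a comment (the real code locks every segment and re-sums):

```java
// Optimistic size(): sum counts without locking; trust the result only if
// the modCount sum is unchanged between two consecutive passes.
public class OptimisticSize {
    static final int RETRIES_BEFORE_LOCK = 2;

    static long sizeOf(int[] counts, int[] modCounts) {
        long lastModSum = -1;
        for (int retries = 0; ; retries++) {
            long sum = 0, modSum = 0;
            for (int i = 0; i < counts.length; i++) {
                sum += counts[i];
                modSum += modCounts[i];
            }
            if (modSum == lastModSum) {
                return sum; // stable across two passes: no concurrent changes seen
            }
            if (retries >= RETRIES_BEFORE_LOCK) {
                return sum; // real implementation locks every segment and re-sums here
            }
            lastModSum = modSum;
        }
    }
}
```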

2.2 Implementation under 1.8

2.2.1 improvement

Improvement 1: Eliminate the segments field, store data directly in transient volatile Node<K,V>[] table, and use the table array elements themselves as locks, reducing lock granularity and further reducing the probability of concurrent conflicts. CAS + synchronized is used extensively to ensure concurrency safety.

Improvement 2: Change the original table array + singly linked list structure to table array + singly linked list + red-black tree. The core job of a hash table is to distribute key hashes evenly across the array; with a uniform hash, each bucket list has a length of about 0 or 1. Reality is not always so ideal: although ConcurrentHashMap's default load factor is 0.75, with large data volumes or bad luck some lists still grow too long. With a singly linked list, node lookup takes O(n); so for lists longer than 8 (the default), JDK 1.8 switches to a red-black tree, reducing lookup to O(log n) and improving performance.

Node (the counterpart of HashEntry in 1.7) is the data node of the linked list and still contains the key, value, hash, and next fields. The red-black tree case uses TreeNode (which extends Node).

By checking whether the first node in a bucket is a plain Node or a TreeNode, you can determine whether that bucket is a linked list or a red-black tree.

A threshold used to determine whether a linked list needs to be converted to a red-black tree

A threshold used to determine whether a red-black tree needs to be converted to a linked list
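For reference, the relevant constants as defined in JDK 1.8's ConcurrentHashMap (collected here in an illustrative holder class):

```java
// Tree-conversion thresholds from java.util.concurrent.ConcurrentHashMap.
public class TreeThresholds {
    static final int TREEIFY_THRESHOLD = 8;     // list -> tree when a bin grows past 8
    static final int UNTREEIFY_THRESHOLD = 6;   // tree -> list when a bin shrinks to 6
    static final int MIN_TREEIFY_CAPACITY = 64; // below this, resize instead of treeify
}
```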

2.2.2 Core data structures and attributes

Node

Node is the core inner class that wraps key-value pairs.

The definition is basically the same as HashEntry in 1.7. The map itself holds an array of Node types

A find method was added to assist the map.get() method. It’s just iterating through the list, and subclasses override this method.

The Segment class is still defined in the map, but only for serialization compatibility with earlier versions.

TreeNode

Tree node class, another core data structure. When the list becomes too long, it is converted to a TreeNode.

Different from HashMap in 1.8:

1. Instead of converting to a red-black tree directly, it places the nodes in a TreeBin object, which wraps the red-black tree.

2. TreeNode extends the Node class of ConcurrentHashMap rather than HashMap's LinkedHashMap.Entry<K,V>, which means a TreeNode has a next pointer.

TreeBin

TreeBin manages TreeNode nodes and stands in for the root of the tree: in the actual ConcurrentHashMap "array", it is the TreeBin object that is stored, not a TreeNode. This class also carries a read-write lock mechanism.

ForwardingNode

A special node with a hash value of -1 that stores a reference to nextTable. During table expansion, a ForwardingNode is used as a placeholder to indicate that the slot is null or has already been migrated.

sizeCtl

Used to control table initialization and expansion operations.

A negative value indicates that initialization or capacity expansion is in progress

  • -1 indicates initialization
  • -N indicates that N-1 threads are performing capacity expansion

0 is the default value, indicating that the table has not been initialized

A positive number indicates that the initial size or the number of elements in the Map has reached the point where expansion is needed.

2.2.3 Core methods

2.2.4 Construction method

Notice that creating a new map instance does not create the array or other internal components; it only performs simple property settings. As in 1.7, the table size must be a power of 2.

The actual initialization happens when an element is inserted into the ConcurrentHashMap, e.g., when the put, computeIfAbsent, compute, or merge methods are called; each of them first checks whether table == null.

2.2.5 get operation

The get method is relatively simple. To find the value for a given key, two conditions must hold: the keys are equal and the hash values are equal. Depending on whether the node may be in a linked list or a tree, the search proceeds accordingly.

2.2.6 put operation

In summary, the put method works like HashMap's put: compute the position i in the table from the hash of the newly inserted key. If position i is empty, insert directly; otherwise, if the node at i is a tree node, insert according to the tree's rules; otherwise append the new node to the end of the list at i.

Throughout the process, neither a null key nor a null value is allowed in the table. For each value put into the table, the spread method performs a hash calculation on the key's hashCode to determine its position in the table.
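The spread step can be sketched as follows; it mirrors ConcurrentHashMap.spread() in JDK 1.8, which mixes high bits into the low bits and keeps the result non-negative (negative hashes are reserved for special nodes such as ForwardingNode). The class name here is illustrative:

```java
public class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff; // usable bits of a normal node hash

    // Mixes the higher bits of the hashCode downward and forces a
    // non-negative result.
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    // Bucket index in a power-of-two table.
    static int tableIndex(int spreadHash, int tableLength) {
        return spreadHash & (tableLength - 1);
    }
}
```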

If the location is empty, it is placed directly, and no lock operation is required.

If there is already a node at the position, a hash collision has occurred, and the node's type is checked first. If it is a linked-list node, the node found is the head of the list of nodes with the same hash; iterate backwards to determine where the new value goes. If a node with the same hash and key is found, only the value needs to be updated; otherwise iterate to the end of the list and insert there. If the list length exceeds 8 after the addition, the list is converted to a red-black tree. If the node is already a tree node, the tree insertion method is called to insert the new value.
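A minimal sketch of this put strategy (hypothetical, not the real implementation: no treeify, no resizing, a fixed 16-slot table, illustrative names), showing the two paths described above: lock-free CAS into an empty bin, and synchronized on the bin head otherwise:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class MiniPut {
    static final class Node {
        final int hash;
        final Object key;
        volatile Object val;
        volatile Node next;
        Node(int hash, Object key, Object val, Node next) {
            this.hash = hash; this.key = key; this.val = val; this.next = next;
        }
    }

    final AtomicReferenceArray<Node> table = new AtomicReferenceArray<>(16);

    Object put(Object key, Object val) {
        int h = key.hashCode() & 0x7fffffff;  // keep hash non-negative
        int i = h & (table.length() - 1);     // bucket index
        for (;;) {
            Node head = table.get(i);
            if (head == null) {
                // empty bin: lock-free CAS insert; retry the loop on contention
                if (table.compareAndSet(i, null, new Node(h, key, val, null))) {
                    return null;
                }
            } else {
                synchronized (head) {         // lock only this bin's head
                    if (table.get(i) != head) {
                        continue;             // head changed under us, retry
                    }
                    for (Node e = head; ; e = e.next) {
                        if (e.hash == h && e.key.equals(key)) {
                            Object old = e.val; // same key: update the value
                            e.val = val;
                            return old;
                        }
                        if (e.next == null) {   // reached tail: append new node
                            e.next = new Node(h, key, val, null);
                            return null;
                        }
                    }
                }
            }
        }
    }
}
```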

2.2.7 initialization

As mentioned earlier, there is no actual initialization in the constructor; the actual initialization occurs when an element is inserted into the ConcurrentHashMap. The implementing method is initTable().

2.2.8 transfer

When the capacity of the ConcurrentHashMap is insufficient, the table is expanded. The basic idea is similar to HashMap, but it is much more complicated because concurrent expansion is supported. We will not walk through the source code here, only the general principle.

Why concurrent capacity expansion? Expansion always involves copying from one "array" to another, so if this can be done concurrently, multiple threads can share the work and reduce the time cost of expansion.

The capacity expansion operation is divided into two parts:

The first part is to build a nextTable, which is twice as large.

The second part is copying elements from the original table into nextTable, which allows multithreading.

The entire capacity expansion process is traversal and replication:

If a slot is null or has already been processed, it is set to a ForwardingNode. A thread joining the expansion that finds a ForwardingNode skips that slot and continues looking for unprocessed slots.

If the slot holds a plain Node (fh >= 0), it is a linked list: it is split into two lists (built partly in reverse order), which are placed at positions i and i + n of nextTable respectively.

If the slot holds a TreeBin (fh < 0), the tree is likewise split in two, checking whether each half needs to be converted back to a linked list, and the results are placed at positions i and i + n of nextTable respectively.

After all nodes have been replicated, make nextTable the new table and update sizeCtl to 0.75 times the new size.

In fact, concurrent expansion divides the data migration into many small tasks. In the implementation, the variable stride is used as the step size, and each thread claims and migrates one portion at a time.

2.2.9 remove method

The basic flow of remove is similar to that of put, except that the operation changes from inserting data to removing data, and, where a red-black tree is involved, there is an extra step checking whether the tree needs to be converted back to a linked list. The details are not repeated here.

2.2.10 treeifyBin method

Used to convert a long linked list into a TreeBin object. It is not a direct conversion: the capacity is checked first, and if it does not meet the conversion requirement, the table is simply expanded instead and the method returns. If the criteria are met, the list structure is converted into a TreeBin. Unlike HashMap, it does not put TreeNodes directly into a red-black tree; it wraps all the TreeNodes in a small container called TreeBin.

2.2.11 size method

In 1.8, the size is maintained incrementally by addCount(), rather than computed on demand as the JDK 1.7 size() method does. Computing an exact size for a concurrent collection is of limited value anyway, because the size changes in real time.

In the implementation, the core method for computing the size is sumCount().

As you can see, the variables baseCount and counterCells are used to track the count. baseCount records the number of elements, while counterCells records updates that failed to CAS into baseCount under high concurrency. Both variables are updated in the addCount() method, which roughly does the following:

1. Try to CAS the delta onto baseCount.

2. If concurrency causes the baseCount CAS to fail, fall back to counterCells.

3. If the counterCells CAS also fails, the fullAddCount method loops until it succeeds.
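The baseCount + counterCells scheme is essentially the LongAdder idea. A minimal sketch using a fixed stripe array (illustrative names; the real fullAddCount also grows the cell array and retries with a per-thread probe):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

public class StripedCounter {
    final AtomicLong base = new AtomicLong();
    final AtomicLongArray cells = new AtomicLongArray(8); // fixed stripes for the sketch

    void add(long x) {
        long b = base.get();
        // 1. cheap path: CAS onto base
        if (!base.compareAndSet(b, b + x)) {
            // 2. contention: fall back to a stripe chosen by thread id
            int i = (int) (Thread.currentThread().getId() & (cells.length() - 1));
            cells.addAndGet(i, x);
        }
    }

    // Mirrors sumCount(): base plus all cells.
    long sum() {
        long s = base.get();
        for (int i = 0; i < cells.length(); i++) {
            s += cells.get(i);
        }
        return s;
    }
}
```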

3. HashTable

HashTable simply adds synchronized to every method of a HashMap-like structure to ensure thread safety. This makes HashTable very inefficient under thread contention: when one thread is in a synchronized method of the HashTable, any other thread calling a synchronized method of the same HashTable enters a blocked or polling state. For example, while thread 1 is adding an element with put, thread 2 can neither add an element with put nor retrieve one with get. The fiercer the contention, the lower the efficiency.

4. FAQs

Q: What is the difference between HashMap and HashTable?

(1) HashMap is not thread-safe; HashTable is thread-safe.

(2) HashTable is less efficient than HashMap precisely because it is thread-safe;

(3) HashMap allows one null key and any number of null values; HashTable allows neither null keys nor null values.

(4) The default array size of HashMap is 16, and the default array size of HashTable is 11.

(5) HashMap recalculates the hash value, whereas HashTable uses the object's hashCode directly.

Q: What other thread-safe class in Java is very similar to HashMap? How does it differ from HashTable in thread synchronization?

A: The ConcurrentHashMap class (a thread-safe and efficient HashMap implementation provided in java.util.concurrent).

HashTable uses the synchronized keyword to lock the entire object.

For ConcurrentHashMap, segmented locking is used in JDK 1.7. JDK 1.8 directly adopts CAS (lock-free techniques) + synchronized, which amounts to per-bucket locking and greatly reduces the lock granularity.

Q: What is the difference between HashMap and ConcurrentHashMap?

A: There’s not much difference in principle except locking.

In addition, HashMap allows null keys and values, but ConcurrentHashMap does not.

In data structure, ConcurrentHashMap additionally has the red-black-tree-related node classes (TreeNode, TreeBin).

Q: Why is ConcurrentHashMap more efficient than HashTable?

A: HashTable uses a single lock to handle concurrency; multiple threads compete for one lock and block easily.

For ConcurrentHashMap:

JDK 1.7 uses ReentrantLock + Segment + HashEntry, which is equivalent to dividing a HashMap into segments and assigning one lock to each Segment to support multithreaded access. Lock granularity: one Segment, which contains multiple HashEntry nodes.

JDK 1.8 uses CAS + synchronized + Node + red-black tree. Lock granularity: the first Node of each bucket (Node implements Map.Entry<K,V>). The lock granularity is reduced.

Q: What is the lock mechanism for ConcurrentHashMap (JDK 1.7 VS JDK 1.8)?

In JDK 1.7, Segment locking is used to implement concurrent update operations. The underlying storage uses array + linked list, with two core static inner classes: Segment and HashEntry.

① Segment extends ReentrantLock and acts as a lock; each Segment object guards several buckets of the hash table;

② HashEntry encapsulates the key-value pairs of the mapping table;

③ Each bucket is a linked list of HashEntry objects.

In JDK 1.8, Node + CAS + synchronized is used to ensure concurrency safety. The Segment class is removed, and the table array stores key-value pairs directly. When the length of a bucket's list exceeds TREEIFY_THRESHOLD, the list is converted to a red-black tree to improve performance. The underlying structure becomes array + linked list + red-black tree.

Q: ConcurrentHashMap In JDK 1.8, why use synchronized instead of ReentrantLock?

A:

1. The JVM development team made a number of performance optimizations for synchronized in 1.8, and JVM-based optimization of synchronized has more headroom and is more natural.

2. Under heavy data operations, the API-based ReentrantLock consumes more memory and adds memory pressure on the JVM.

Q: Can you outline the ConcurrentHashMap implementation (JDK 1.8)?

A:

① Important constants:

private transient volatile int sizeCtl;

If the value is negative, -1 indicates that initialization is underway, and -N indicates that N-1 threads are expanding.

When the value is 0, the table is not initialized.

If the value is another positive value, it indicates the initial size or the next capacity expansion size.

② Data structure:

Node is the basic unit of the storage structure; it implements Map.Entry and is used to store data.

TreeNode inherits Node, but its structure becomes a binary (red-black) tree, used for storing data in the tree case.

TreeBin is a container that encapsulates TreeNodes and provides the conditions and lock control for converting to a red-black tree.

③ The put() flow:

1. If the table is not initialized, the initTable() method is called to initialize it.

2. If there is no hash conflict at the slot, the node is inserted with a lock-free CAS.

3. If an expansion is in progress, the current thread helps with the expansion first.

4. If there is a hash conflict, a lock is taken to ensure thread safety, with two cases: either iterate to the end of the linked list and insert, or insert according to the red-black tree structure.

5. If the list length exceeds the threshold of 8, the list structure is converted into a red-black tree, and the loop is exited with break.

6. If the addition succeeds, the addCount() method is called to update the size and check whether expansion is needed.

transfer(): The default capacity is 16; during expansion, the capacity becomes twice the original.

helpTransfer(): multiple worker threads are called in to help with expansion, which is more efficient.

④ The get() flow:

1. Compute the hash, locate the table index, and if the first node matches, return its value.

2. If an expansion is in progress, call the find() method of ForwardingNode to look up the node, and return its value on a match.

3. Otherwise, traverse the list downward; return the value on a match, or null if nothing matches.

Q: What is the concurrency of ConcurrentHashMap?

A: In 1.7, it is the maximum number of threads that can simultaneously update the ConcurrentHashMap without lock contention. The default is 16, and it can be set in the constructor. When the user sets it, ConcurrentHashMap uses the smallest power of 2 greater than or equal to that value as the actual concurrency level (if the user sets it to 17, the actual value is 32).

In 1.8, the concurrency level no longer has much practical significance. Its main effect is that when the initial capacity is less than the concurrency level, the initial capacity is raised to the concurrency level.

5. Summary

This chapter introduced the source code of ConcurrentHashMap, its differences from HashMap, and why we don't usually use HashTable. We hope you have learned the design idea of the segmented table and the application of CAS operations, understand these containers and their methods, and know how to choose the most appropriate container and method for each scenario.

——————————————————————

The next section introduces other concurrent containers and is the final article on concurrency. I will continue to update related topics as I learn new things, and I will update older posts to make them more detailed and accessible to every developer.