Introduction
- ConcurrentHashMap is a thread-safe and efficient implementation of HashMap provided in Java and is widely used; classic open source frameworks such as Spring rely on ConcurrentHashMap in their underlying data structures.
- ConcurrentHashMap comes up frequently in concurrent programming scenarios. This article analyzes its implementation principle (JDK 1.7).
- It is a step up from its thread-safe predecessor, HashTable: its locking is finer-grained than the synchronized lock HashTable adds to almost every method, which inevitably hurts performance.
Implementation principle
As we all know, the hash table is a very efficient data structure, with O(1) access complexity. In Java development, HashMap and HashTable are the most common and frequently used implementations, but neither is a reasonable choice in concurrent scenarios with fierce thread competition.
- HashMap: Let’s start with HashMap. HashMap is not thread-safe. In a concurrent environment it can form a cyclic linked list (possibly during resizing), causing the CPU to spin during get operations, so using HashMap in a concurrent environment is dangerous.
- HashTable: The implementation of HashTable is almost the same as HashMap’s. The differences are:
- HashTable does not allow null keys and values;
- HashTable is thread-safe.
- All get/put related operations are synchronized, which is equivalent to putting one big lock on the entire HashTable: as soon as one thread accesses or operates on the object, all other threads can only block. This serializes every operation and leads to poor performance in contended concurrent scenarios.
The main reason for HashTable’s poor performance is that all operations compete for the same lock. If the container instead holds multiple locks, each guarding one segment of its data, threads accessing different segments never contend, which effectively improves concurrency. This is the “segmented locking” idea used by ConcurrentHashMap, illustrated by the toy sketch below.
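As a rough illustration of the idea (a toy sketch only, not the real implementation; the class name StripedMap and its details are ours):

```java
import java.util.HashMap;
import java.util.Map;

// Toy "segmented locking": N locks, each guarding one sub-map, so
// threads that touch different segments never contend.
public class StripedMap<K, V> {
    private static final int SEGMENTS = 16;
    private final Object[] locks = new Object[SEGMENTS];
    private final Map<K, V>[] tables;

    @SuppressWarnings("unchecked")
    public StripedMap() {
        tables = new Map[SEGMENTS];
        for (int i = 0; i < SEGMENTS; i++) {
            locks[i] = new Object();
            tables[i] = new HashMap<>();
        }
    }

    // Power-of-2 segment count: high bits of the hash pick the segment
    private int segmentFor(Object key) {
        return (key.hashCode() >>> 28) & (SEGMENTS - 1);
    }

    public V put(K key, V value) {
        int i = segmentFor(key);
        synchronized (locks[i]) { // lock only this segment, not the whole map
            return tables[i].put(key, value);
        }
    }

    public V get(Object key) {
        int i = segmentFor(key);
        synchronized (locks[i]) {
            return tables[i].get(key);
        }
    }
}
```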
The backbone of ConcurrentHashMap is an array of segments.
```java
final Segment<K,V>[] segments;
```
- A Segment extends ReentrantLock, so each Segment is itself a reentrant lock.
- In ConcurrentHashMap, a Segment is a sub hash table: each Segment maintains a HashEntry array, and in a concurrent environment operations on different Segments do not contend for locks.
The default concurrencyLevel is 16, which in theory allows up to 16 threads to write concurrently.
Thread synchronization only matters for operations on the same Segment; operations on different Segments need no synchronization, as the sketch below shows.
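A minimal usage sketch (the class name SegmentDemo is ours): two threads write disjoint key sets; because the keys spread across different Segments, the writes rarely contend for the same lock, and no updates are lost:

```java
import java.util.concurrent.ConcurrentHashMap;

public class SegmentDemo {
    public static void main(String[] args) throws InterruptedException {
        // initialCapacity = 16, loadFactor = 0.75f, concurrencyLevel = 16
        ConcurrentHashMap<String, Integer> map =
                new ConcurrentHashMap<>(16, 0.75f, 16);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) map.put("a" + i, i);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) map.put("b" + i, i);
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(map.size()); // always 2000: no lost updates
    }
}
```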
A Segment is similar to a HashMap in that it maintains a HashEntry array:

```java
transient volatile HashEntry<K,V>[] table;
```
HashEntry is the smallest logical processing unit mentioned so far: a ConcurrentHashMap maintains a Segment array, and each Segment maintains a HashEntry array.
```java
static final class HashEntry<K,V> {
    final int hash;
    final K key;
    volatile V value;
    volatile HashEntry<K,V> next;
    // Other members omitted
}
```
Since a Segment is essentially a small hash table, some of its attributes are similar to HashMap’s (loadFactor, threshold, etc.):
```java
public Segment(float lf, int threshold, HashEntry<K,V>[] tab) {
    this.loadFactor = lf;       // load factor
    this.threshold = threshold; // resize threshold
    this.table = tab;           // the backbone array is a HashEntry array
}
```
Let’s look at the constructor of ConcurrentHashMap
```java
public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    // MAX_SEGMENTS = 1 << 16 = 65536, i.e. the maximum concurrency level is 65536
    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;
    // 2^sshift = ssize; e.g. ssize = 16 gives sshift = 4, ssize = 32 gives sshift = 5
    int sshift = 0;
    // ssize is the length of the segments array, computed from concurrencyLevel
    int ssize = 1;
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    // segmentShift and segmentMask are used later to locate a Segment
    this.segmentShift = 32 - sshift;
    this.segmentMask = ssize - 1;
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // Compute cap, the length of each Segment's HashEntry array;
    // cap must also be a power of 2
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    int cap = MIN_SEGMENT_TABLE_CAPACITY;
    while (cap < c)
        cap <<= 1;
    // Create the segments array and initialize the first Segment;
    // the remaining Segments are initialized lazily
    Segment<K,V> s0 =
        new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
                         (HashEntry<K,V>[])new HashEntry[cap]);
    Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
    UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
    this.segments = ss;
}
```
The constructor takes three parameters; defaults are used if the caller does not specify them: initialCapacity is 16, loadFactor is 0.75, and concurrencyLevel is 16.
The size ssize of the Segment array is determined by concurrencyLevel, but is not necessarily equal to it: ssize is the smallest power of 2 that is greater than or equal to concurrencyLevel.
For example, with the default concurrencyLevel of 16, ssize is 16; if concurrencyLevel is 14, ssize is still 16; if concurrencyLevel is 17, ssize is 32.
Why must the size of the Segment array be a power of 2? Because then a Segment’s index can be located with a bitwise AND rather than a modulo operation, as the sketch below shows.
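A minimal sketch of the rounding logic described above (the class name SegmentSizeDemo and the example concurrencyLevel are ours):

```java
public class SegmentSizeDemo {
    public static void main(String[] args) {
        int concurrencyLevel = 17;
        int sshift = 0;
        int ssize = 1;
        // Round ssize up to the smallest power of 2 >= concurrencyLevel
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        System.out.println(ssize);  // 32 (2^5)
        System.out.println(sshift); // 5
        // A power-of-2 length lets a bitwise AND replace modulo:
        // hash & (ssize - 1) covers exactly the range 0..ssize-1
    }
}
```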
Important attributes
Let’s look at a couple of important attributes; those already covered for HashMap are skipped, and the focus here is sizeCtl. (Strictly speaking, sizeCtl belongs to the JDK 1.8 implementation of ConcurrentHashMap.) It is a highly visible attribute because it is a control flag with different uses in different places, and different values carry different meanings:
A negative value indicates that initialization or resizing is in progress:
-1 means initialization is in progress; -N means N-1 threads are resizing.
A positive value or 0 indicates that the hash table has not been initialized; the value is the initial size or the size of the next resize, similar to a resize threshold. As you will see later, it is always 0.75 times the current capacity of the ConcurrentHashMap, which corresponds to loadFactor.
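A small hedged sketch of these states (describeSizeCtl is our own helper, not a JDK method; the -N reading follows the description above):

```java
public class SizeCtlDemo {
    // Decode the sizeCtl states described above (illustrative only)
    static String describeSizeCtl(int sizeCtl) {
        if (sizeCtl == -1)
            return "table is being initialized";
        if (sizeCtl < -1)
            return (-sizeCtl - 1) + " thread(s) are resizing";
        if (sizeCtl == 0)
            return "table not initialized; default capacity will be used";
        return "initial size or next resize threshold: " + sizeCtl;
    }

    public static void main(String[] args) {
        System.out.println(describeSizeCtl(-1)); // table is being initialized
        System.out.println(describeSizeCtl(-3)); // 2 thread(s) are resizing
        System.out.println(describeSizeCtl(12)); // next resize threshold: 12
    }
}
```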
Next, let’s look at the put method
```java
public V put(K key, V value) {
    Segment<K,V> s;
    // ConcurrentHashMap allows neither null keys nor null values
    // (a null key throws NPE inside hash(key))
    if (value == null)
        throw new NullPointerException();
    // hash() rehashes the key's hashCode to defend against poor-quality
    // hash codes and keep the distribution uniform
    int hash = hash(key);
    // Unsigned-right-shift the hash by segmentShift, then AND with the
    // segment mask to locate the Segment
    int j = (hash >>> segmentShift) & segmentMask;
    if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
         (segments, (j << SSHIFT) + SBASE)) == null) // in ensureSegment
        s = ensureSegment(j);
    return s.put(key, hash, value, false);
}
```
The main logic of put is just two steps:
- Locate the Segment and ensure the located Segment is initialized;
- Call that Segment’s put method.
About segmentShift and segmentMask:

```java
int j = (hash >>> segmentShift) & segmentMask;
```

- segmentMask: the segment mask. If the segments array length is 16, the mask is 16 - 1 = 15; if the length is 32, the mask is 32 - 1 = 31. Because the length is a power of 2, every bit of the mask is 1, which helps preserve the uniformity of the hash.
- segmentShift: 2 to the power sshift equals ssize, and segmentShift = 32 - sshift. If the segments length is 16, segmentShift = 32 - 4 = 28; if the length is 32, segmentShift = 32 - 5 = 27. The computed hash value is at most 32 bits, so the unsigned right shift by segmentShift keeps only the high bits (the rest are not needed), which are then ANDed with segmentMask to locate the Segment. A worked example follows.
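A worked example of the locating arithmetic (the class name LocateSegmentDemo and the example hash value are ours):

```java
public class LocateSegmentDemo {
    public static void main(String[] args) {
        int segmentShift = 28;  // segments length 16, so sshift = 4
        int segmentMask = 15;   // binary 1111
        int hash = 0xA3C2F1B7;  // an arbitrary example hash value
        // Keep only the top 4 bits, then mask them:
        // the top 4 bits of 0xA3C2F1B7 are 1010 = 10
        int j = (hash >>> segmentShift) & segmentMask;
        System.out.println(j);  // 10: the key lands in segments[10]
    }
}
```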
The get and put methods
The get method
```java
public V get(Object key) {
    Segment<K,V> s;
    HashEntry<K,V>[] tab;
    int h = hash(key);
    long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
    // Locate the Segment, then the HashEntry
    if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
        (tab = s.table) != null) {
        for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                 (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
             e != null; e = e.next) {
            K k;
            if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                return e.value;
        }
    }
    return null;
}
```
The get method needs no locking, because the shared variables it touches are volatile, which guarantees memory visibility, so stale data is never read.
For put, ConcurrentHashMap delegates to the Segment’s put method, which does take the lock; the lock granularity is simply finer:
```java
final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    // If tryLock fails, scanAndLockForPut traverses the list at the
    // HashEntry slot (warming the CPU cache) and creates the HashEntry
    // if the key is not found. After tryLock has been retried a certain
    // number of times (determined by MAX_SCAN_RETRIES), it blocks on
    // lock(). If another thread changes the list head during traversal,
    // the traversal starts over.
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
        int index = (tab.length - 1) & hash;
        // Locate the HashEntry. Note that the same hash value is used
        // both to locate the Segment and to locate the HashEntry within
        // it; only the high bits pick the Segment.
        HashEntry<K,V> first = entryAt(tab, index);
        for (HashEntry<K,V> e = first;;) {
            if (e != null) {
                K k;
                if ((k = e.key) == key ||
                    (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        e.value = value;
                        ++modCount;
                    }
                    break;
                }
                e = e.next;
            }
            else {
                if (node != null)
                    node.setNext(first);
                else
                    node = new HashEntry<K,V>(hash, key, value, first);
                int c = count + 1;
                // If c exceeds the threshold, resize and rehash. The new
                // capacity is twice the current one, which lets most
                // already-placed entries avoid moving; resizing and
                // rehashing is resource-intensive.
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node);
                else
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}
```
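The onlyIfAbsent flag is what distinguishes putIfAbsent from put; a quick usage example (the class name PutDemo is ours):

```java
import java.util.concurrent.ConcurrentHashMap;

public class PutDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> m = new ConcurrentHashMap<>();
        m.put("k", 1);
        m.putIfAbsent("k", 2);          // onlyIfAbsent = true: existing value kept
        System.out.println(m.get("k")); // 1
        m.put("k", 3);                  // onlyIfAbsent = false: value replaced
        System.out.println(m.get("k")); // 3
    }
}
```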
Conclusion
ConcurrentHashMap is a thread-safe and efficient hash table solution; in particular, its “segmented locking” design delivers a significant performance improvement over HashTable’s whole-table lock.