preface
Why do read operations on ConcurrentHashMap not need to be locked?
The first time I saw this problem I was really confused. I know how to make ConCurrentHashMap thread safe for put, but I haven’t really looked at how to make get thread safe for get
Analysis of the
How does ConcurrentHashMap ensure thread-safety? What methods do I use to ensure thread-safety from initialization and new elements
I met ConcurrentHashMap
First of all, let’s look at what ConcurrentHashMap is used in that scenario. From the name, we know that the first class should be in the java.util.concurrent package (JUC), an important package for common thread-safe collections, and HashMap.
Data structure comparison
HashMap we know that a hash table is a collection of two columns. Is a data structure that can quickly locate a value based on its key value. Since we talk about this here, then we should be clear about why there is such a data structure, it is reasonable to exist, then there must be a need to produce such a data structure, then compare the performance of common data structure in the time complexity of adding and deleting
- Array: Array is to use a contiguous address space to store the elements, you can specify a subscript to quickly find value, the time complexity is O (1), but by the given value to look up, the time complexity is O (n), because want to traverse the entire array to do match, if is orderly array We can check it on the reduced dichotomy method such as time, It’s O(long n), but for normal insert and delete operations, it’s possible to move an array element at that time and this is O(n) on average.
- List: Linked list storage structure is associated with an end node, so this time don’t need a contiguous address space and arrays, this way or do arrays and lists the contrast, the array to store the need to open up a contiguous address space to store data, but for some big objects Will be assigned a contiguous address space, It’s hard to allocate if the time space is tight, but for a storage structure like a linked list, it’s ok because it doesn’t have to create a contiguous address space, it just has to do it piece by piece, with the head and tail nodes. However, the structure of linked list also has certain defects, that is, it takes up more space to maintain the head and tail nodes, and it is easy to produce space debris. For this structure optimization, we can from Redis quicklist to some ideas, interested in yourself can see, powerful Redis is how to deal with the underlying data storage. I’ll blog about it later when I have time. Linked lists are faster to insert and delete because of their storage structure, O(1), but when you query, you have to start from scratch, O(n).
- Binary tree: If it is a balanced binary tree, the average time complexity of insertion, deletion and query is O(long n). HashMap and ConcurrentHashMap use linked lists and red-black trees when storing hash conflicting lists. When the length of the list is larger than 8, it changes to red-black trees. When the element of the red-black tree is less than 6, it will transform the linked list. I believe that the source partner should know why the specific is 8,6, which must be balanced by the efficiency of the insert and query considerations ~
- Hash table: Compared with the above data structures, hash tables have an advantage in this respect, where the insert, delete and query methods are O(1), of course, without considering the hash conflict or from the conflicting list to compare the query. Therefore, it is important to calculate the hash function of the slot position in the hash table. Whether this function is good or not directly determines the probability of hash conflict.
How do we calculate the slot position in several common hash tables
- HashTable: [(key.hashCode() & 0x7fffff) % tab.length]; // 0x7FFFFFFF means that the maximum value of int in binary notation is all 1 and the first is 0, which is the sign bit, I did the bit and operation with the hash value to ensure that the first value must be 0 and I got a positive number. You can look it up on the Internet for specific reasons. It is mainly for the range problem.
- [(h = key.hashcode ()) ^ (h >>> 16)]] [(n-1) & hash], why does h move 16 bits to the right of the comment? It is said above that it is in order to make the influence of the high mask downward as much as possible, and reduce hash conflicts.
- ConcurrentHashMap: [(h=key.hashCode() ^ (h >>> 16)) &0x7fffffff] calculate index: [(n-1) &hash)]
A HashMap and ConcurrentHashMap
First of all, both classes implement the Map interface and inherit AbstractMap abstract classes, so we don’t have to worry about switching from HashMap to ConcurrentHashMap because the methods are almost the same. The reason to use ConcurrentHashMap must be because HashMap is not safe in multiple threads, which is actually crap. Expansion card slot inside when the value of the position is to recalculate the redistribution, but the multithreaded situations may you calculate good location When such as card slot Expansion will cause you to have taken place in this time the location of the store has a problem, under the new expansion after the array Again according to the corresponding value key may be can not find a way, Of course, this is just one case, and there are many more. At this point we use the ConcurrentHashMap in JUC.
ConcurrentHashMap is the thread-safe equivalent of a HashMap. Let’s take a look at how ConcurrentHashMap keeps threads safe.
ConcurrentHashMap
InitTable initialization
In my last post on the HashMap capacity setting, I said that the array initialization of a HashMap actually happens on the first PUT. So let’s see what ConcurrentHashMap looks like. What’s the code
private static final int MAXIMUM_CAPACITY = 1 << 30;
/** * Creates a new, empty map with the default initial table size (16). */
public ConcurrentHashMap(a) {}public ConcurrentHashMap(int initialCapacity) {
if (initialCapacity < 0)
throw new IllegalArgumentException();
int cap = ((initialCapacity >= (MAXIMUM_CAPACITY >>> 1))? MAXIMUM_CAPACITY : tableSizeFor(initialCapacity + (initialCapacity >>>1) + 1));
this.sizeCtl = cap;
}
private static final int tableSizeFor(int c) {
int n = c - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
Copy the code
If initialCapacity is 10, initialCapacity >>> 1 means to move right 1 bit. The binary of 10 is: 1010. The decimal value of 0101 is: 5 so tableSizeFor is going to be 10+5+1=16, tableSizeFor and what that means is I mentioned in the last blog is that you compute the smallest power of two that is greater than the value that’s passed in and if you pass in 7 it’s going to be 8 and if you pass in 17 it’s going to be 32.
We know that array initialization is always done the first time we add an element, so let’s look at the initialization method in put
/** * Initializes table, using the size recorded in sizeCtl. */
private final Node<K,V>[] initTable() {
Node<K,V>[] tab; int sc;
while ((tab = table) == null || tab.length == 0) {// Spin handling guarantees some initialization success
if ((sc = sizeCtl) < 0)< 0 indicates that a thread is initializing to release CPU resources
Thread.yield(); // lost initialization race; just spin
else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
try {
if ((tab = table) == null || tab.length == 0) {
int n = (sc > 0)? sc : DEFAULT_CAPACITY;@SuppressWarnings("unchecked")
Node<K,V>[] nt = (Node<K,V>[])newNode<? ,? >[n]; table = tab = nt; sc = n - (n >>>2); }}finally {
sizeCtl = sc;
}
break; }}return tab;
}
Copy the code
So the first thing we see here is that we’re using a While here to determine whether or not the TAB is initialized, which is this one which is a spin handler, to make sure it’s initialized. SizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1, sizeCtl = -1 SIZECTL values have been changed to match the expected values, so the next poll will be conducted. In this case, TAB may be initialized and the loop will exit. If TAB is not initialized, the condition sc is less than 0 and the CPU resources will be allocated again until the initialization is complete.
Let’s talk about array initialization. Once we initialize the table, we’re still checking whether or not it should be initialized. So here’s a double Check.
One of the cases where sc is greater than 0 is when we initialize initialCapacity, we get sizeCtl and we use that to initialize the array size.
If initialCapacity is not passed in, the default value DEFAULT_CAPACITY is used, which is 16.
See here a simple piece of code, see someone else write more strictly, spin is not empty spin also made a timely release of CPU resources, see the benefits of source code is to learn excellent code!!
With that done, let’s briefly summarize what ConcurrentHashMap does to ensure thread-safety
- The spin guarantees certain initialization
- CAS guarantees that only one thread initializes the array at a time
- The secondary check ensures that the table state is correct when the array is initialized
Thread safety of PUT
Ok, so with the initialization out of the way let’s look at how the core puts elements in.
Take a look at the code:
/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
if (key == null || value == null) throw new NullPointerException();
int hash = spread(key.hashCode());// Calculate the hash value
int binCount = 0;
for (Node<K,V>[] tab = table;;) {// This is also a spin operation to ensure the success of the new element
Node<K,V> f; int n, i, fh;
if (tab == null || (n = tab.length) == 0)
tab = initTable();// Perform initialization
else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {<! This is also a CAS operation where the current node value isnullWhen you can put elements in the nic slot -->if (casTabAt(tab, i, null.new Node<K,V>(hash, key, value, null)))
break; // no lock when adding to empty bin
}
else if ((fh = f.hash) == MOVED)//MOVED is a fixed value to indicate that the node is being expanded. The node has been MOVED and cannot be operated. You need to wait until the node is expanded
tab = helpTransfer(tab, f);
else {
V oldVal = null;
synchronized (f) { // Synchronized lock is used
//Hash post-conflict processing
}
if(binCount ! =0) {
if (binCount >= TREEIFY_THRESHOLD)// If the length of the list is greater than equal 8, it will be converted to a red-black tree
treeifyBin(tab, i);
if(oldVal ! =null)
return oldVal;
break;
}
}
}
addCount(1L, binCount);// Determine whether to expand the number
return null;
}
Copy the code
Some of the specific operations I have added comments, here is just with the same means of initialization table to ensure the safety of multiple threads
- Spin Spin guarantees that the element will be put in successfully
- The CSA CAS operation ensures that the current node is null when data is inserted into the table. If another thread manipulates the value of the slot during this time, the value of the I position in the TABLE in the CAS must not be null, which is inconsistent with the expected value, then the CAS execution fails. The next loop finds that the slot already has a value and goes to the Synchronized method block after the Hash collision. This ensures that adding the first element to a new slot is thread-safe
- Synchronized locks ensure thread-safe execution
Transfer Thread safety during expansion
ConcurrentHashMap and HashMap have the same timing for expansion operation. In both cases, they judge whether the critical value is reached after adding the original value. If so, they expand the capacity by two times
/** * Moves and/or copies the nodes in each bin to new table. See * above for explanation. */
private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
int n = tab.length, stride;
if ((stride = (NCPU > 1)? (n >>>3) / NCPU : n) < MIN_TRANSFER_STRIDE)
stride = MIN_TRANSFER_STRIDE; // Set the node step size of the CPU operable array to 16<! NextTab is the new array -->if (nextTab == null) { // initiating
try {
@SuppressWarnings("unchecked")
Node<K,V>[] nt = (Node<K,V>[])newNode<? ,? >[n <<1];// The value of n << 1 equals n*2
nextTab = nt;
} catch (Throwable ex) { // try to cope with OOME
sizeCtl = Integer.MAX_VALUE;
return;
}
nextTable = nextTab;// A transition table during expansion
transferIndex = n;// transferIndex is the node of the first card slot to be transferred during expansion. The operation on the original array is reversed.
}
int nextn = nextTab.length;
/** * ForwardingNode inherited Node but the hash value of all nodes has been set to MOVED, indicating that the current Node is in expansion * above for explanation. */
ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);
boolean advance = true;
boolean finishing = false; // to ensure sweep before committing nextTab
for (int i = 0, bound = 0;;) {
Node<K,V> f; int fh;
while(advance) { <! NextIndex is the slot position where the copy starts from the tail each time -1-- - >intnextIndex, nextBound; <! -- -- I >= bound bound is the last value to be executed each time finishingif (--i >= bound || finishing)
advance = false; <! Index is copied backwards to the beginning, so it is less than or equal0Is the end of the calculation -->else if ((nextIndex = transferIndex) <= 0) {
i = -1;
advance = false;
}
else if (U.compareAndSwapInt
(this, TRANSFERINDEX, nextIndex,
nextBound = (nextIndex > stride ?
nextIndex - stride : 0))) {
bound = nextBound;
i = nextIndex - 1;// Subtract one at a time
advance = false; }}if (i < 0 || i >= n || i + n >= nextn) {
int sc;
if (finishing) {
nextTable = null;// Assign the temporary table to null
table = nextTab;// The new table is assigned after the end
sizeCtl = (n << 1) - (n >>> 1);//
return;
}
if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
return;
finishing = advance = true;
i = n; // recheck before commit}}else if ((f = tabAt(tab, i)) == null)
advance = casTabAt(tab, i, null, fwd);
else if ((fh = f.hash) == MOVED)
advance = true; // already processed
else {
synchronized (f) {
if (tabAt(tab, i) == f) {
Node<K,V> ln, hn;
if (fh >= 0) {
// it is a linked list
}
else if (f instanceof TreeBin) {
// red black tree
}
}
}
}
}
}
Copy the code
The code for expansion is a bit verbose and takes a while to understand. The general idea is as follows:
- The copy starts at the end of the original array
- Synchronized will be used to lock slots of the original array when copying the array, so that data in the original array cannot be modified to ensure thread safety. After successfully copying to the new array, slots of the original array will be set as move node.
- If there’s a put in there and the current slot is a move, it’s going to wait, and we talked about that in put
- Until all the nodes of the original array are copied into the new array, then the new array is assigned to the array container to complete the copy
To sum up, during array expansion, Synchronized locks are mainly used to lock slots and prevent other threads from operating them. After a slot is successfully copied, it will be identified as a transfer node. In this way, when a new PUT operation comes, the slot status is move, and the put operation will wait until the expansion is complete.
get
Finally, let’s look at the get method
public V get(Object key) {
Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
int h = spread(key.hashCode());// Get the hash value
if((tab = table) ! =null && (n = tab.length) > 0 &&
(e = tabAt(tab, (n - 1) & h)) ! =null) {<! Hash hash = hash hash = hash hash = hash hash = hash hash = hash hash = hash hashif ((eh = e.hash) == h) {
if((ek = e.key) == key || (ek ! =null && key.equals(ek)))
returne.val; } <! -- The hash value of the node is less than0A transfer node or a red-black tree -->else if (eh < 0)
return(p = e.find(h, key)) ! =null ? p.val : null; <! Check if it is a linked list.while((e = e.next) ! =null) {
if(e.hash == h && ((ek = e.key) == key || (ek ! =null && key.equals(ek))))
returne.val; }}return null;
}
Copy the code
After looking at the whole get method, we see that this method is no different from the get method in our HashMap. There are no lock CAS and so on. How does that work
Let’s go ahead and look at the code
**
* The array of bins. Lazily initialized upon first insertion.
* Size is always a power of two. Accessed directly by iterators.
*/
transient volatile Node<K,V>[] table;
/** * The next table to use; non-null only while resizing. */
private transient volatile Node<K,V>[] nextTable;
/** * Base counter value, used mainly when there is no contention, * but also as a fallback during table initialization * races. Updated via CAS. */
private transient volatile long baseCount;
/** * Table initialization and resizing control. When negative, the * table is being initialized or resized: -1 for initialization, * else -(1 + the number of active resizing threads). Otherwise, * when table is null, holds the initial table size to use upon * creation, or 0 for default. After initialization, holds the * next element count value upon which to resize the table. */
private transient volatile int sizeCtl;
/** * The next table index (plus one) to split while resizing. */
private transient volatile int transferIndex;
/** * Spinlock (locked via CAS) used when resizing and/or creating CounterCells. */
private transient volatile int cellsBusy;
/** * Table of counter cells. When non-null, size is a power of 2. */
private transient volatile CounterCell[] counterCells;
Copy the code
Did you notice that the member variables of these classes have the special keyword volatile, which I’m sure you all know what it means to ensure the visibility of memory between multiple threads, which means that thread A changes the value of the variable and thread B immediately knows the change and gets the latest, How to implement this is mainly to use the memory barrier, cache consistency and other technologies, not the focus of this article, we will talk about later.
But we know that the volatile modifier, if it’s the base value type that guarantees the visibility of the value type that’s being modified by multiple threads but here we have the array container table which is an array object which is a reference type which guarantees the visibility of the array object, There is no way to know if the element inside the array object has changed
static class Node<K.V> implements Map.Entry<K.V> {
final int hash;
final K key;
volatile V val;
volatile Node<K,V> next;
Node(int hash, K key, V val, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.val = val;
this.next = next; }}Copy the code
When I look at this I know that the val node and the next node are used with the volatitle keyword, which means that when we change the value of the node and when we insert the value, the value is visible between threads. In this way, when we use the GET method, we can get the latest value without any lock, which completes the synchronization in multi-thread and ensures the security in multi-thread.
When used correctly, semi-volatitle is a perfect lock substitute for concurrency to improve performance in multi-threading.
conclusion
ConcurrentHashMap uses CAS+ spin +synchronized+ multiple check to ensure thread safety during initialization, addition and expansion. When data is read, volatitle is used to make element nodes visible between multiple threads. To get the latest value!
Finished another wahoo !!!! Keep it up