CPU bus lock
Synchronized blocks guarantee that only one thread can enter the contended critical section at a time. They also guarantee that all variables inside the block are read from main memory on entry, and that updates to all variables are flushed to main memory when a thread exits the block, whether or not those variables are volatile.
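A minimal sketch of these guarantees (the class and method names here are only illustrative):

```java
public class SyncCounter {
    // Deliberately not volatile: the synchronized block alone guarantees
    // that reads inside it see main memory and writes are flushed on exit.
    private int count = 0;

    public void increment() {
        synchronized (this) {   // only one thread enters at a time
            count++;            // the read-modify-write is now safe
        }                       // exiting flushes count to main memory
    }

    public int get() {
        synchronized (this) {   // entering re-reads count from main memory
            return count;
        }
    }
}
```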
To implement such exclusive access at the hardware level, the processor can use a bus lock. A bus lock uses the LOCK# signal provided by the processor: while one processor asserts this signal on the bus, requests from other processors are blocked, and that processor has exclusive use of the shared memory. Because a bus lock locks the communication between the CPUs and memory, other processors cannot operate even on data at unrelated memory addresses during the lock period. The cost of a bus lock is therefore high.
CPU MESI protocol + memory barrier
- The MESI protocol (ensuring CPU cache consistency), which the CPU follows by default
In a multi-core CPU, data in memory may have copies in the caches of several cores; when one core modifies its copy, the copies become inconsistent. A cache consistency protocol is used to keep shared data consistent across the CPU caches.
When a CPU reads data from its cache, it checks whether the caches of other CPUs hold the same address. If they do, it checks the state of the cache line to determine whether its data is still valid; if not, it obtains the data from main memory, or via a cache-to-cache transfer, as the consistency protocol dictates.
- Memory barriers (ensuring visibility and ordering), which volatile instructions explicitly insert:
  - Disallow instruction reordering at the JVM and CPU execution levels.
  - Flush the current processor's cache line of data to system main memory immediately.
Visibility means that in a multithreaded environment, operations on shared variables are visible to every thread: each thread always reads the latest value of a volatile variable, and each thread's writes to volatile variables are flushed directly to main memory.
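A small sketch of this visibility guarantee (names are illustrative). Note that volatile gives visibility and ordering, not atomicity, which is why CAS is needed in the next section:

```java
public class StopFlag {
    // volatile: a write by one thread is immediately visible to all others
    private volatile boolean stopped = false;

    public void runWorker() {
        while (!stopped) {   // always reads the latest value from main memory
            // ... do work ...
        }
    }

    public void stop() {
        stopped = true;      // flushed straight to main memory
    }
}
```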
Reference: www.jianshu.com/p/2643c9ea1…
Principle of CAS shared-variable updates
CAS builds on the volatile keyword and uses lightweight spinning to solve the atomicity problem that volatile alone cannot. Together this realizes all three properties of concurrency safety: atomicity, visibility, and ordering.
CAS operation semantics at each level, from top to bottom:

- Java language level: e.g. the getAndAddInt method
- Virtual machine bytecode: the compareAndSwapInt native method
- Operating system / assembly instructions: the compare-and-swap atomic instruction
- The CPU hardware
The Java language level thus calls the CPU instruction almost directly, with no switch from user mode to kernel mode; the cost is instead the CPU time spent polling (spinning).
```java
public final int getAndAddInt(Object var1, long var2, int var4) {
    int var5;
    do {
        var5 = this.getIntVolatile(var1, var2);
    } while (!this.compareAndSwapInt(var1, var2, var5, var5 + var4));
    return var5;
}
```
As seen in this code, the key call is:

```java
unsafe.compareAndSwapInt(this, valueOffset, expect, update)
```
CAS is the abbreviation of Compare-And-Swap, that is, compare and replace. CAS requires three operands: the memory address V, the old expected value A, and the target value B to be written.
When the CAS instruction executes, the value at memory address V is changed to B if and only if it equals the expected value A; otherwise nothing is done. The entire compare-and-replace is a single atomic operation.
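The semantics can be sketched in Java, with synchronized standing in for the hardware's atomicity (a simulation only; a real CAS is a single CPU instruction, not a lock):

```java
public class SimulatedCAS {
    private int value; // plays the role of memory address V

    // Simulates the atomic compare-and-swap; synchronized merely
    // mimics the atomicity that hardware provides in one instruction.
    public synchronized boolean compareAndSwap(int expected, int newValue) {
        if (value == expected) { // value at V equals the expected value A?
            value = newValue;    // then write the target value B
            return true;
        }
        return false;            // otherwise do nothing
    }
}
```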
In getAndAddInt: if the value currently stored at offset var2 in object var1 still equals var5, then no other thread has changed the variable, and it is updated to var5 + var4, where var4 is the increment. Otherwise the update is not performed and the CAS retries in a spin (essentially a while loop); each individual compare-and-swap attempt is itself atomic.
Example analysis:
- Suppose the AtomicInteger's original value is A; from the Java memory model, thread 1 and thread 2 each hold a copy of the value A.
- Thread 1 obtains value A via getIntVolatile(var1, var2) and is then suspended.
- Thread 2 also obtains value A via getIntVolatile(var1, var2), calls compareAndSwapInt, finds the memory value equal to A, and successfully changes the memory value to B.
- When thread 1 resumes and executes compareAndSwapInt, it finds that the value A it holds is inconsistent with the value B in memory, meaning another thread modified the value first.
- Thread 1 re-executes getIntVolatile(var1, var2) to fetch the value again. Since value is volatile, thread 1 always sees other threads' changes to it, and it keeps retrying compareAndSwapInt until it succeeds.
The method above takes several important parameters:
(1) this: here, the object that contains the value field (the AtomicInteger instance itself), used together with the offset to locate value in memory.
(2) valueOffset: the memory offset address of the value field within that object.
(3) expect: the value the caller expects to find in memory.
(4) update: the new value to be written.
If the current value in the atomic variable equals expect, it is updated to update and true is returned; otherwise false is returned.
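A quick demonstration using AtomicInteger.compareAndSet, which delegates to this call:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(1);

        // Memory value is 1 and we expect 1: the swap succeeds.
        System.out.println(value.compareAndSet(1, 2)); // true
        // Memory value is now 2 but we still expect 1: nothing happens.
        System.out.println(value.compareAndSet(1, 3)); // false
        System.out.println(value.get());               // 2
    }
}
```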
Of course, if you are interested, you can continue to learn more about which hardware CPU instructions are used.
The underlying hardware guarantees atomicity by implementing the multiple operations in CAS's semantics as a single processor instruction. Instructions of this kind include:
(1) Test-and-Set
(2) Fetch-and-Increment
(3) Swap
(4) Compare-and-Swap
(5) Load-Linked / Store-Conditional
Most processors already implement the first three; the last two are newer additions in modern processors. The exact instructions also differ significantly from one architecture to another.
Problems with CAS
- ABA problem
Suppose two threads A and B share the data i = 1. Thread A reads i; thread B then reads i, performs +1 and then -1, and writes the result back to memory. Thread A's CAS finds the data is still 1 and carries on, unaware that the value was changed and changed back in the meantime (a sketch of the standard fix follows this list).
- The polling (spin) time problem.
A spinning thread does not release the CPU while it waits, so under heavy concurrency, if the shared-data update keeps failing to go through, the long idle spinning wastes CPU resources.
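For the ABA problem, the JDK offers AtomicStampedReference, which pairs the value with a version stamp so that an A→B→A sequence no longer looks unchanged. A minimal single-threaded sketch of the idea:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // value 1 with initial stamp 0
        AtomicStampedReference<Integer> ref = new AtomicStampedReference<>(1, 0);

        int stampSeenByA = ref.getStamp(); // "thread A" records stamp 0

        // "Thread B": 1 -> 2 -> 1, bumping the stamp on every change
        ref.compareAndSet(1, 2, ref.getStamp(), ref.getStamp() + 1);
        ref.compareAndSet(2, 1, ref.getStamp(), ref.getStamp() + 1);

        // "Thread A": the value is 1 again, but the stamp has moved on,
        // so this CAS fails and the A-B-A change is detected.
        boolean ok = ref.compareAndSet(1, 3, stampSeenByA, stampSeenByA + 1);
        System.out.println(ok); // false
    }
}
```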
Reference: bbs.huaweicloud.com/blogs/13615…
Principle of the AQS utility class
AQS (AbstractQueuedSynchronizer) is a framework for building locks and synchronizers. Using AQS, many widely used synchronizers can be constructed simply and efficiently.
ReentrantLock, Semaphore, and others such as ReentrantReadWriteLock, SynchronousQueue, FutureTask are all based on AQS.
Of course, we can also use AQS to construct a synchronizer for our own needs quite easily.
There are two important members of AQS:
- The member variable state, which indicates the current state of the lock. It is volatile, and updates to state use CAS. A state of 0 means no thread holds the lock; a thread that acquires the lock increments state by one and decrements it by one when releasing, so multiple (reentrant) holds and releases add and subtract multiple times.
- A doubly linked list in which every node except the head records a waiting thread's information. It is a FIFO list.
```java
private volatile int state; // shared variable; volatile ensures visibility and ordering
```
The core idea of AQS: if the requested shared resource is idle, the thread requesting it is set as the valid worker thread and the shared resource is set to the locked state; if the requested shared resource is occupied, a mechanism is needed for blocking and waiting threads and for allocating the lock when they are woken. AQS implements this mechanism with a CLH queue lock, which temporarily queues the threads that cannot acquire the lock.
The CLH (Craig, Landin, and Hagersten) queue is a virtual doubly linked queue (virtual meaning there is no queue instance, only the link relationships between nodes). AQS encapsulates each thread requesting the shared resource as a node of the CLH lock queue in order to allocate the lock.
Schematic diagram of AQS (AbstractQueuedSynchronizer): [figure omitted]
Different custom synchronizers compete for the shared resource in different ways. A custom synchronizer only needs to implement how the shared-resource state is acquired and released; the maintenance of the specific thread wait queue (enqueuing on failed acquisition, waking queued threads, etc.) is already implemented by AQS at the top level.
In general, a custom synchronizer is either exclusive or shared, and it only needs to implement one pair of methods: tryAcquire/tryRelease or tryAcquireShared/tryReleaseShared, as the sketch below shows.
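A minimal exclusive-mode sketch, the classic non-reentrant Mutex often used to demonstrate AQS (simplified; a production lock would need more checks):

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// A non-reentrant mutual-exclusion lock: state 0 = unlocked, 1 = locked.
public class Mutex {
    private static class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int acquires) {
            // CAS state from 0 to 1; on failure AQS queues the thread.
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }

        @Override
        protected boolean tryRelease(int releases) {
            setExclusiveOwnerThread(null);
            setState(0); // AQS then wakes the next node in the CLH queue
            return true;
        }
    }

    private final Sync sync = new Sync();

    public void lock()   { sync.acquire(1); }
    public void unlock() { sync.release(1); }
}
```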
AQS defines two resource sharing modes
In the synchronization queue there are two modes, exclusive mode and shared mode; the difference between them lies in whether AQS propagates the wakeup to subsequent nodes when it wakes a queued thread node. The two modes correspond to exclusive locks and shared locks respectively.
- Exclusive: only one thread can execute at a time, e.g. ReentrantLock. Exclusive locks can further be divided into fair and unfair locks:
  - Fair lock: threads acquire the lock in the order in which they queued; first come, first served.
  - Unfair lock: a thread that wants the lock simply tries to grab it, regardless of the queue order.
- Share: multiple threads can execute simultaneously, e.g. Semaphore/CountDownLatch. Semaphore, CountDownLatch, CyclicBarrier, and ReadWriteLock are all covered later.
ReentrantReadWriteLock can be considered a composite of the two, since it is a read/write lock that allows multiple threads to read a resource simultaneously.
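A quick illustration of the shared mode using Semaphore, where multiple permits let several threads proceed at once (the thread count and sleep time are arbitrary):

```java
import java.util.concurrent.Semaphore;

public class SharedModeDemo {
    public static void main(String[] args) {
        // 3 permits: up to 3 threads may hold the resource simultaneously
        Semaphore semaphore = new Semaphore(3);

        for (int i = 0; i < 5; i++) {
            new Thread(() -> {
                try {
                    semaphore.acquire(); // shared acquire
                    try {
                        System.out.println(Thread.currentThread().getName() + " in");
                        Thread.sleep(100); // simulate work
                    } finally {
                        semaphore.release(); // shared release, wakes waiters
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}
```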
References:
juejin.cn/post/684490…
zhuanlan.zhihu.com/p/48295486
Principle of COW container (copy-on-write)
A CopyOnWrite container is a container that copies on write. The popular understanding is: when we add an element, we do not add it directly to the current container; instead, we copy the current container to create a new one, add the element to the new container, and then point the container's reference at the new container.
The advantage of this is that we can read the CopyOnWrite container concurrently without locking, because no elements are ever added to the container being read. The CopyOnWrite container thus embodies the idea of read-write separation: reads and writes operate on different containers.
Reads need no lock, and if another thread is adding data to the CopyOnWriteArrayList while a read is in progress, the read still returns the old data, because the read operates on the old array, which was fixed when the read began.
The CopyOnWrite concurrent container suits concurrent scenarios with many reads and few writes, such as whitelists and blacklists.
The CopyOnWrite container has many advantages, but it also has two problems: memory footprint and data consistency. The CopyOnWrite container only guarantees final data consistency, not real-time data consistency. So if you want to write data that is immediately readable, don’t use the CopyOnWrite container.
```java
/**
 * Appends the specified element to the end of this list.
 *
 * @param e element to be appended to this list
 * @return {@code true} (as specified by {@link Collection#add})
 */
public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}

/**
 * {@inheritDoc}
 *
 * @throws IndexOutOfBoundsException {@inheritDoc}
 */
public E get(int index) {
    return get(getArray(), index);
}
```
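A short usage sketch of the snapshot behavior this code produces (the list contents are illustrative):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowDemo {
    public static void main(String[] args) {
        List<String> whitelist = new CopyOnWriteArrayList<>();
        whitelist.add("alice");
        whitelist.add("bob");

        // The iterator snapshots the underlying array when created...
        for (String user : whitelist) {
            // ...so adding during iteration neither blocks nor throws,
            // and the new element is invisible to this loop.
            whitelist.add("carol");
            System.out.println(user); // prints only alice, bob
        }

        System.out.println(whitelist.size()); // 4: both adds took effect
    }
}
```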