While brushing up on interview "eight-part essay" material recently, I found that my grasp of Java's synchronization mechanisms, and of the low-level ideas behind them, was only fragmentary. So I took some time to sort things out, and today I summarize them here.
Why is concurrency needed?
If you have read my previous article on Java network programming, you will know that concurrency was introduced to make better use of the CPU. This includes, but is not limited to: scheduling another thread while the current one blocks on I/O to increase system throughput, exploiting multi-core CPUs, splitting a task into separate computations, and letting programs of different priorities work together.
Concurrency is not parallelism, as many books and articles have said. Parallelism is truly simultaneous execution, while concurrency merely alternates between tasks over the same period. Parallelism implies concurrency, but concurrency alone cannot achieve parallelism's true "running together."
What problems does concurrency bring? How do we solve them?
Concurrency brings problems, the biggest of which is synchronization; there are also scheduling problems. Resource allocation is essentially a synchronization problem. Let's leave scheduling aside, because the scheduling algorithm depends on what the system is trying to accomplish: interactive systems use round-robin time slicing, priority scheduling, shortest-job-first, and so on.
Now let's focus on synchronization. Synchronization operates on multiple threads, with the aim of giving each thread safe, ordered access to critical-section resources. I don't want to pile on jargon here; in plain terms, it means "make sure each thread reads and writes the shared data correctly."
The solution to the synchronization problem is, in essence, locking, but there is also a communication-based approach, such as Golang's design model, the CSP (Communicating Sequential Processes) model, which is implemented on top of channels. Its design philosophy is "share memory by communicating; don't communicate by sharing memory." Java has a similar pipe mechanism, but real-world development uses locks, so we will focus on those.
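As a taste of the communication style, here is a minimal sketch using Java's built-in pipes (java.io.PipedInputStream/PipedOutputStream); the class name PipeDemo is mine, for illustration:

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);   // connect the two ends

        Thread writer = new Thread(() -> {
            try {
                out.write("hello".getBytes());
                out.close();                               // close so the reader sees end-of-stream
            } catch (IOException ignored) {
            }
        });
        writer.start();

        int b;
        while ((b = in.read()) != -1) {                    // read() blocks until data arrives
            System.out.print((char) b);
        }
        writer.join();
    }
}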
Synchronization primitives
The definition of synchronization has just been given, so what exactly does it mean?
Here I'll use the simplest and most common example, ++i:
private static int count = 0;

public static void main(String[] args) {
    Runnable runnable = new Runnable() {
        @Override
        public void run() {
            for (int i = 0; i < 10000; ++i) {
                ++count;
            }
        }
    };
    new Thread(runnable).start();
    new Thread(runnable).start();
    // Give the threads some time to run
    LockSupport.parkNanos(Duration.ofMillis(50).toNanos());
    System.out.println(count);
}
In all likelihood, the output on your machine will be less than 10000*2; if it happens to be exactly 20,000, go buy a lottery ticket. My machine is fairly fast; readers can make the pause interval a bit larger. Unless your machine is truly ancient, 100ms is more than enough.
Why is the output less than 20,000? It comes down to the JVM memory model, which you can read about in my article on understanding the JVM in depth: concurrency, threads, and locks.
In short, incrementing a variable consists of three sub-steps: read, +1, and write. The read itself comprises read => load => use; the write comprises assign => store => write. None of these steps, including the sub-steps, is atomic, so other operations can be interleaved between them. Two threads may therefore read the same value and each add 1, losing one of the +1 updates.
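Written out step by step, one interleaving that loses an update looks like this:

// count == 0 initially
// Thread A: read count        -> sees 0
// Thread B: read count        -> sees 0
// Thread A: add 1, write back -> count == 1
// Thread B: add 1, write back -> count == 1  (A's +1 is lost)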
synchronized
The solution to the above problem is simple; let's add a lock:
private static int count = 0;

public static void main(String[] args) {
    Runnable runnable = new Runnable() {
        @Override
        public void run() {
            // Notice here
            synchronized (this) {
                for (int i = 0; i < 10000; ++i) {
                    ++count;
                }
            }
        }
    };
    new Thread(runnable).start();
    new Thread(runnable).start();
    LockSupport.parkNanos(Duration.ofMillis(50).toNanos());
    System.out.println(count);
}
At this point the output is a stable 20,000, no longer a matter of luck, because we use the synchronized keyword to restrict access to the variable.
In addition to Java's built-in synchronized keyword, we also have a bunch of lock utility classes available under the JDK's java.util.concurrent package.
volatile
Just as an aside: if we only want to enforce a strict order on reads and writes of a variable, or, put differently, want updates to the variable to be immediately visible to other threads, we can use the volatile keyword. It does not belong to the category of synchronization. For how to understand the strict read/write ordering, and why volatile cannot achieve synchronization, you can still refer to the article I mentioned above. To put it simply:
volatile keeps read => load => use and assign => store => write from being split apart, so a completed write must be seen by subsequent reads.
Again, you need to understand the JVM memory model and its main-memory/working-memory diagram. Here is some demo code:
import java.time.Duration;
import java.util.concurrent.locks.LockSupport;

/**
 * @author CodeWithBuff
 */
public class Solution {

    public static void main(String[] args) throws InterruptedException {
        TestClass testClass = new TestClass();
        Thread thread = new Thread(testClass);
        thread.start();
        LockSupport.parkNanos(Duration.ofMillis(10).toNanos());
        testClass.setFlag();
        LockSupport.parkNanos(Duration.ofMillis(10).toNanos());
        System.out.println("main done");
    }

    private static class TestClass implements Runnable {

        private
        // volatile
        boolean flag = true;

        public void setFlag() {
            flag = !flag;
        }

        @Override
        public void run() {
            int a = 0;
            while (flag) {
                a++;
            }
            System.out.println("get: " + a);
        }
    }
}
In this demo code, the flag checked in the while loop is read from the thread's working memory first, and in all likelihood it stays true there. Even when a new value is set later, it may not be written back to main memory in time, or may simply never be re-read, so the flag seen inside the while loop remains true and the loop never ends. Uncomment volatile and the loop terminates.
In addition, volatile also disallows reordering: operations on volatile variables execute in the order in which they were written in the source. This matters in the double-checked-locking singleton, among other places. Here's an example:
import java.time.Duration;
import java.util.concurrent.locks.LockSupport;

/**
 * @author CodeWithBuff
 */
public class Solution {

    public static void main(String[] args) throws InterruptedException {
        T1 t1 = new T1();
        T2 t2 = new T2();
        for (int i = 0; i < 10000; ++i) {
            Thread thread1 = new Thread(t1);
            Thread thread2 = new Thread(t2);
            thread1.start();
            thread2.start();
            thread1.join();
            thread2.join();
            if (x == 0 && y == 0) {
                System.err.println("reordered: " + x + "," + y);
            }
            a = b = x = y = 0;
        }
        LockSupport.parkNanos(Duration.ofMillis(10).toNanos());
    }

    private
    // volatile
    static int a = 0;

    private
    // volatile
    static int b = 0;

    private
    // volatile
    static int x = 0;

    private
    // volatile
    static int y = 0;

    private static class T1 implements Runnable {
        @Override
        public void run() {
            // Busy-wait so that T1 and T2 run together; tune this to your CPU
            for (long i = 0; i < 130000; ++i) {
                ;
            }
            a = 12;
            x = b;
        }
    }

    private static class T2 implements Runnable {
        @Override
        public void run() {
            b = 21;
            y = a;
        }
    }
}
Unsurprisingly, you will occasionally see the "reordered" line printed.
At first you might think this is impossible, because the outcome should be x=21, y=0, or x=0, y=12, or x=21, y=12. So why does x=0, y=0 happen?
Because the instructions were reordered: the actual order might be x=b, y=a, a=12, b=21, so x and y are both 0. Try adding volatile and this no longer happens, because volatile disallows the relevant reordering.
The double-checked-locking singleton has the same problem: another thread can end up using a not-yet-initialized instance, and the solution is volatile. synchronized only guarantees mutual exclusion within the decorated block of code; it does not forbid reordering inside it.
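For reference, a minimal sketch of the double-checked-locking singleton with the volatile fix applied (the class name Singleton is illustrative):

public class Singleton {

    // volatile forbids reordering of "allocate -> initialize -> publish",
    // so no thread can see a non-null but uninitialized instance
    private static volatile Singleton instance;

    private Singleton() {
    }

    public static Singleton getInstance() {
        if (instance == null) {                  // first check, without the lock
            synchronized (Singleton.class) {
                if (instance == null) {          // second check, under the lock
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}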
Now, the principles behind volatile.
Visibility, which I just explained, is achieved through atomic reads and atomic writes, so that the three sub-operations cannot be split apart; every write must go to main memory and every read must come from main memory, which is what makes updates visible.
Forbidding reordering is achieved with memory barriers.
A memory barrier (sometimes called a memory fence) is a CPU instruction that controls reordering and memory visibility under certain conditions. The Java compiler also refrains from reordering across them according to the memory-barrier rules.
Memory barriers can be classified into the following types:
- 1. LoadLoad barrier: for the sequence Load1; LoadLoad; Load2, it ensures the data read by Load1 is loaded before Load2 and any subsequent reads are performed.
- 2. StoreStore barrier: for the sequence Store1; StoreStore; Store2, it ensures the write of Store1 is visible to other processors before Store2 and any subsequent writes.
- 3. LoadStore barrier: for the sequence Load1; LoadStore; Store2, it ensures the data read by Load1 is loaded before Store2 and any subsequent writes are flushed out.
- 4. StoreLoad barrier: for the sequence Store1; StoreLoad; Load2, it ensures the write of Store1 is visible to all processors before Load2 and any subsequent reads are performed. Its overhead is the largest of the four; on most processor implementations it is a universal barrier that doubles as the other three.
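As a concrete illustration, under the JMM's conservative strategy (in the spirit of the JSR-133 cookbook; a sketch, not actual emitted code), the barriers around a volatile access look roughly like this:

// Volatile write of v:
//     StoreStore barrier    -- earlier ordinary writes become visible first
//     v = newValue;
//     StoreLoad barrier     -- the write is visible before any later read
//
// Volatile read of v:
//     int r = v;
//     LoadLoad barrier      -- later reads cannot move before this read
//     LoadStore barrier     -- later writes cannot move before this read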
Synchronized
There are two kinds of synchronization in Java: the synchronized keyword, and the classes under java.util.concurrent. Let's look at the first one.
synchronized can be used in three ways:
- 1. On a code block
- 2. On an instance method
- 3. On a static method
Heavyweight lock
When synchronized operates on a code block, you can choose the lock object, usually one you provide yourself; when it applies to an instance method, the lock object is the current instance; when it applies to a static method, the lock object is the class's Class object.
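A minimal sketch of the three usages and their lock objects (class and method names are mine, for illustration):

public class SyncDemo {

    private static final Object LOCK = new Object();

    // 1. On a code block: the lock object is whatever you supply
    public void onBlock() {
        synchronized (LOCK) {
            // critical section
        }
    }

    // 2. On an instance method: the lock object is `this`
    public synchronized void onInstanceMethod() {
        // critical section
    }

    // 3. On a static method: the lock object is SyncDemo.class
    public static synchronized void onStaticMethod() {
        // critical section
    }
}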
Now let's talk about the implementation at the bytecode level. Methods, whether instance or static, are handled by adding the ACC_SYNCHRONIZED access flag. Code blocks are handled through the monitorenter and monitorexit instructions.
Either way, it all boils down to a monitor operation on the lock object: acquiring the lock means acquiring the monitor, re-entering increments the monitor's counter, and releasing decrements it until it reaches 0. Now let's look more closely at objects and the monitor.
The first thing you need to know is the memory layout of a Java object. In the object header, the Mark Word of a heavyweight-locked object holds a mutex pointer that points to an ObjectMonitor, which is the implementation of the monitor.
The JVM uses an ObjectMonitor object to represent the monitor bound to the lock object, as follows:
ObjectMonitor() {
    _header       = NULL;
    _count        = 0;        // Number of times the owner has acquired the lock
    _waiters      = 0,
    _recursions   = 0;        // Reentrancy count
    _object       = NULL;
    _owner        = NULL;     // Indicates which thread holds the lock
    _WaitSet      = NULL;     // Threads in the waiting state are added to _WaitSet
    _WaitSetLock  = 0;
    _Responsible  = NULL;
    _succ         = NULL;
    _cxq          = NULL;
    FreeNext      = NULL;
    _EntryList    = NULL;     // Threads blocked waiting for the lock are added to this list
    _SpinFreq     = 0;
    _SpinClock    = 0;
    OwnerIsThread = 0;
}
With this structure in mind, it is easy to understand how the wait queue and the blocking queue are implemented and used.
A thread that wants the lock first enters the _EntryList; once it has acquired the lock, a call to wait() moves it into the _WaitSet until it is woken up and re-enters the _EntryList. That is the general flow; the actual acquire and release operations are implemented by the operating system.
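A classic wait/notify exchange maps directly onto these two queues (a sketch; the Mailbox class is made up for illustration):

public class Mailbox {

    private final Object lock = new Object();
    private String message;

    public void put(String msg) {
        synchronized (lock) {       // compete for the monitor via the _EntryList
            message = msg;
            lock.notify();          // move one waiter from the _WaitSet back toward the _EntryList
        }
    }

    public String take() throws InterruptedException {
        synchronized (lock) {
            while (message == null) {
                lock.wait();        // release the monitor and join the _WaitSet
            }
            String m = message;
            message = null;
            return m;
        }
    }
}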
ObjectMonitor is implemented in cooperation with the operating system: ObjectMonitor provides the resources, and an OS mutex implements the operations on those resources. Different ObjectMonitors have no relationship with one another; each delegates its lock operations to its own mutex, so different resources operate independently.
That is the implementation of synchronized's heavyweight lock: synchronized => lock object => ObjectMonitor => operating-system mutex.
Now let's look at the optimizations the JVM makes for synchronized. Because heavyweight locks are delegated to the operating system, inevitably involving thread context switches and system calls, synchronized's heavyweight lock is quite expensive.
However, much of the time we don't need a full lock/unlock, or there isn't enough contention to justify a heavyweight lock every time, so the JVM has three optimizations: spin locks, biased locks (deprecated since JDK 15), and lightweight locks. All three, despite being named locks, are essentially lock-free; that is, they don't use heavyweight mutex operations to lock.
Adaptive spin lock
Wait for the lock by spinning for a certain amount of time, so the thread waits a little while instead of blocking immediately. On top of this we can improve further: learn from the duration of previous spins and dynamically adjust the spin count. That is the adaptive spin lock.
Biased locking
If a thread acquires the lock, the lock becomes biased in its favor; when that thread acquires it again, a quick check of the current bias lets it continue directly. Recording which thread held the lock last time eliminates the second locking step for the same thread. A biased lock is never released as such: it is not a lock, just a record. The bias is only revoked when another thread requests the lock.
Rebiasing
Instead of clearing the record as soon as a second thread acquires the lock, the JVM can keep biasing, now toward the new thread. But this is not unconditional: bias revocations are counted, and when the count reaches one threshold, rebiasing is triggered; when it reaches a higher threshold, biased locking is abandoned altogether.
Lightweight lock
When threads execute alternately rather than simultaneously, there is no need for a mutex: instead, the thread uses CAS to install a pointer to its lock record into the Mark Word. By recording which thread is currently inside the synchronized block, the current thread can enter it without a mutex.
To summarize the use of the three types of locks:
- 1. Adaptive spin lock: when the lock is unavailable, pause the current thread with a short busy-wait rather than blocking outright, reducing the performance cost of thread scheduling; the spin time adapts dynamically. Spinning can be inserted into the wait phase of the three locks below.
- 2. Biased lock: suited to the single-thread scenario. Re-locking is avoided by recording the thread that last ran the synchronized code. As soon as a second thread appears, it automatically becomes invalid, because it only fits the one-thread case. Without it, every acquisition would be a CAS on the thread pointer (lightweight lock) or a mutex operation (heavyweight lock), which is clearly wasteful when there is only one thread.
- 3. Lightweight lock: suited to alternating execution where threads do not actually contend; the resource is locked by a simple CAS that swaps in the thread's lock-record pointer. Once concurrent contention occurs, it becomes invalid, because a lightweight lock cannot block other threads.
- 4. Heavyweight lock: suited to fiercely contended scenarios; mutex + monitor implement the heavyweight operations of blocking, waiting, and waking.
A lock can only be upgraded, never downgraded, and only in the order 2 => 3 => 4. When a lock no longer matches its scenario, it is automatically upgraded.
As for how each lock is implemented in detail and how the upgrades happen, it gets complicated. Here is a hand-drawn summary of mine (drawn late at night, so the handwriting is ugly):
Here are a few articles for your reference:
An article that suddenly made it click: understanding Java's biased locks, lightweight locks, and heavyweight locks
In-depth understanding of Java synchronization implementation principles
Why do lightweight locks and heavyweight locks inflate each other? – Adsf’s answer – Zhihu
java.util.concurrent
JUC provides a bunch of ready-to-use lock utility classes, which tend to be more powerful while performing about as well as synchronized; before the synchronized optimizations were introduced, JUC was faster. Let's look at two examples: ReentrantLock and Semaphore.
AQS
AQS (AbstractQueuedSynchronizer) is the most fundamental class for implementing lock functionality; all the JUC locks are built on it.
Essentially, it implements atomic operations through CAS calls on Unsafe, which is implemented in C++, with x86 (or other CPU) instructions underneath.
Look at acquire(int):
public final void acquire(int arg) {
    if (!tryAcquire(arg))
        acquire(null, arg, false, false, false, 0L);
}

final int acquire(Node node, int arg, boolean shared,
                  boolean interruptible, boolean timed, long time) {
    ...
}
And release(int):
public final boolean release(int arg) {
    if (tryRelease(arg)) {
        signalNext(head);
        return true;
    }
    return false;
}
As you can see from the above, every request for or release of the resource first goes through a tryXxx operation; the result then determines whether to proceed to the blocking-queue handling.
As for the blocking queue, AQS internally maintains a virtual queue. It is not a traversable linked list; rather, it is formed by the threads that failed to acquire the resource and were suspended, each node linking to the next.
When a request fails, the thread calls the acquire(...) method, which handles it in a loop:
final int acquire(AbstractQueuedSynchronizer.Node node, int arg, boolean shared,
                  boolean interruptible, boolean timed, long time) {
    Thread current = Thread.currentThread();
    byte spins = 0, postSpins = 0;
    boolean interrupted = false, first = false;
    AbstractQueuedSynchronizer.Node pred = null;
    for (;;) {
        if (first || pred == null) {
            // When woken up (either because resources are available or because
            // the thread was interrupted), try again and respond to interruption:
            // if the attempt fails, go back to sleep; if interrupted, cancel
            // do work
        }
        if (node == null) {
            // Construct a node, exclusive or shared; a node is simply the current
            // thread + status + prev node + next node, ready to join the queue
        } else if (pred == null) {
            // Enqueue
        } else if (first && spins != 0) {
            // Spin-wait for a while, an optimization strategy
        } else if (node.status == 0) {
            // Set the status
        } else {
            // Use LockSupport to park the thread; on interrupt, break out of the loop
        }
    }
    return cancelAcquire(node, interrupted, interruptible);
}
The above is a simplified version, but it is essentially acquire's actual workflow. When the thread is interrupted, or the resource becomes available, the thread resumes, loops again, and the first if branch fires.
release()'s signalNext is simple: take the head of the queue and wake the first queued thread with LockSupport.unpark.
The tryXxx methods are left to AQS subclasses, which put their own acquire/release logic there. If an acquire attempt fails, or a release succeeds and a thread should be woken, the tryXxx call is followed by acquire(...)/signalNext() for the subsequent processing.
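To get a feel for how a subclass plugs in its tryXxx logic, here is a minimal non-reentrant mutex in the spirit of the example in the AQS javadoc (SimpleMutex is an illustrative name of mine):

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

public class SimpleMutex {

    private static class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int arg) {
            return compareAndSetState(0, 1);   // CAS state 0 -> 1 wins the lock
        }

        @Override
        protected boolean tryRelease(int arg) {
            setState(0);                       // only the holder calls this
            return true;                       // true => AQS wakes a successor
        }
    }

    private final Sync sync = new Sync();

    public void lock() {
        sync.acquire(1);                       // failed tryAcquire -> enqueue and park
    }

    public void unlock() {
        sync.release(1);                       // tryRelease succeeds -> signalNext
    }
}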
In addition, there is the tryXxxShared family of methods for shared mode. The main difference between exclusive mode and shared mode is whether multiple threads are allowed to run at once; the typical examples are the reentrant lock and the semaphore. As long as the semaphore has enough permits left for the next thread's request, one more thread may run, rather than exactly one thread as in exclusive mode.
ReentrantLock
ReentrantLock implements exclusive mode. Its lock operation is AQS's acquire(1), and its unlock operation is AQS's release(1). tryLock does not go into AQS's acquire(...) at all.
The call hierarchy is easy to understand from a diagram:
Lock:
Unlock:
For the interruptible variant, the interrupt-response flag is enabled in AQS's acquire.
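A minimal usage sketch (the comments map each call to its AQS counterpart; CounterWithLock is an illustrative name):

import java.util.concurrent.locks.ReentrantLock;

public class CounterWithLock {

    private final ReentrantLock lock = new ReentrantLock();
    private int count = 0;

    public void increment() {
        lock.lock();           // internally: AQS acquire(1)
        try {
            ++count;
        } finally {
            lock.unlock();     // internally: AQS release(1)
        }
    }
}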
Semaphore
Semaphore implements the shared mode. As usual, let's look directly at the call hierarchy.
Acquiring resources:
Forget it, I don't want to write any more. I'm off to memorize the eight-part essay.
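Still, a minimal usage sketch of the shared mode (illustrative only, assuming two permits):

import java.util.concurrent.Semaphore;

public class SemaphoreDemo {

    public static void main(String[] args) {
        Semaphore permits = new Semaphore(2);       // at most two threads inside at once

        Runnable task = () -> {
            try {
                permits.acquire();                  // shared mode: tryAcquireShared underneath
                try {
                    System.out.println(Thread.currentThread().getName() + " working");
                    Thread.sleep(100);
                } finally {
                    permits.release();              // returns a permit and signals waiters
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        for (int i = 0; i < 5; ++i) {
            new Thread(task).start();
        }
    }
}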
References
Simplest and understandable example of volatile keyword in Java
Analysis of Volatile Memory Barriers and Implementation Principles (JMM and MESI)
Study on Java memory access reordering