An overview
Before formally starting this chapter, I'd like to introduce the multi-core architecture of modern x86 processors, as shown here:
L1 serves as the per-core data and instruction cache; L2 is also private to each core, fast but small in capacity; L3 is a cache shared by all cores and is slower. All of them are faster than main memory.
The reason for this layout is the multi-level storage hierarchy of modern computer architectures, which I won't explain here, except to note that each CPU core has its own small cache that the other cores do not see directly.
With multithreading, the OS usually schedules different threads onto different cores, and each core works out of its own cache first for performance. So how do you ensure that the caches of all cores hold a consistent view of a shared variable? This is where a cache coherence protocol is needed. There are a number of such protocols (MESI is a well-known one). The basic idea is that if a variable is cached by multiple cores, then when one core changes its value, the protocol lets the other cores invalidate their copies so that they re-read the latest value.
With that background in place, let's look at the Java memory model.
Java Memory Model
Main memory and working memory
The main purpose of the Java memory model is to define the rules for accessing variables in a program: how they are read from and written to memory. The variables it covers are those that can be shared between threads, such as instance fields, static fields, and array elements, not thread-private data such as local variables and method parameters.
The Java memory model has the following requirements for thread read/write access variables:
- 1️⃣ All variables must be stored in main memory.
- 2️⃣ Each thread has its own working memory; all reads and writes of a variable must go through the working memory. A thread cannot operate on main memory directly.
- 3️⃣ The working memory of each thread is private to that thread and cannot be accessed by other threads.
Intermemory operations
The JVM defines eight behaviors for how memory interactions between main memory and working memory occur:
operation | scope | description |
---|---|---|
lock | Main memory | Marks a variable as exclusively owned by one thread |
unlock | Main memory | Releases a variable from the locked state so other threads can lock it |
read | Main memory | Transfers the value of a variable from main memory to working memory |
load | Working memory | Puts the value obtained by read into the working-memory copy of the variable |
use | Working memory | Passes the value of a variable in working memory to the execution engine |
assign | Working memory | Assigns a value received from the execution engine to a variable in working memory |
store | Working memory | Transfers the value of a variable in working memory to main memory |
write | Main memory | Puts the value obtained by store into the variable in main memory |
The JVM requires that read and load, and store and write, be performed in order (read before load, store before write), but only in order, not necessarily consecutively: other operations may be interleaved between them.
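As a rough illustration (hypothetical class and field names), here is how a single `count++` conceptually maps onto these operations; the comments follow the JMM's model rather than what a real JIT actually emits:

```java
public class Counter {
    private int count = 0; // shared variable, lives in main memory

    public void increment() {
        // count++ is not one operation; conceptually the JMM breaks it into:
        // read  : transfer the value of count from main memory
        // load  : place that value into this thread's working-memory copy
        // use   : hand the value to the execution engine for the + 1
        // assign: put the result back into the working-memory copy
        // store : transfer the new value toward main memory
        // write : write it into the variable in main memory
        // Another thread may interleave between these steps, which is why
        // count++ is not thread-safe without synchronization.
        count++;
    }
}
```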
In addition, there are the following rules:
- 1️⃣ read and load, and store and write, must appear in pairs; neither operation of a pair may appear on its own.
- 2️⃣ A thread is not allowed to discard its most recent assign operation: a variable changed in working memory must be synchronized back to main memory.
- 3️⃣ A thread is not allowed to synchronize a variable from working memory back to main memory without a preceding assign.
- 4️⃣ A new variable can only be created in main memory; working memory may not use a variable that has not been initialized by load or assign.
- 5️⃣ A variable may be locked by only one thread at a time, but that thread may lock it repeatedly (lock is reentrant); it must then perform the same number of unlock operations.
- 6️⃣ Performing lock on a variable clears its value from the executing thread's working memory, so it must be loaded or assigned again before use.
- 7️⃣ unlock must be preceded by a lock by the same thread; a thread cannot unlock a variable locked by another thread.
- 8️⃣ Before performing unlock on a variable, the thread must first synchronize it back to main memory (store and write).
Special rules for volatile
Java has a separate set of rules for volatile variables.
What properties does a variable have when it is defined as volatile?
- Visible to all threads: means that when any thread makes a change to it, the other threads immediately see the update.
- Reordering is forbidden: instruction-reordering optimizations around this variable are disabled, which preserves ordering not only within a single thread but also as observed across threads (see below if this is unclear).
Consider visibility first. Each thread reads a variable from its working memory and writes to its working memory first. We also know that the three steps of an ordinary variable's read (read->load->use) or write (assign->store->write) may be interleaved with other operations, so a thread may not reach main memory in time, or may only touch its own working memory, and updates therefore may not be seen or propagated promptly.
For a volatile variable, read->load->use and assign->store->write must each be performed as a continuous, inseparable group: no other operations may come between them, and the group cannot be deferred. This is how visibility is achieved.
Many people (myself included, at first) mistakenly think this makes compound operations on the variable thread-safe. If you think about it, though, volatile only guarantees that any read that begins after a write completes will see the written value; it says nothing about read-modify-write sequences.
Take increment as an example. Increment actually breaks down into three steps: read the value, add 1, and assign the result back. If two threads read at the same time, they must read the same value, and both of their reads precede both writes, so neither can see the value the other thread is about to write; one update is lost. That is the root cause: volatile guarantees the visibility of individual reads and writes, but not the atomicity of compound operations such as increment.
So a safe rule of thumb is to rely on volatile alone only when the value being written does not depend on the variable's current value.
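A small demonstration of the argument above (hypothetical class name and iteration counts): the volatile counter can lose updates under concurrent increments, while an AtomicInteger, which uses CAS internally, does not:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileIncrement {
    static volatile int volatileCount = 0;                   // visibility only
    static AtomicInteger atomicCount = new AtomicInteger();  // atomic read-modify-write

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                volatileCount++;               // read, +1, assign: three steps that can interleave
                atomicCount.incrementAndGet(); // single CAS-based atomic update
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // volatileCount is usually less than 20000; atomicCount is always 20000
        System.out.println(volatileCount + " / " + atomicCount.get());
    }
}
```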
As for the second property (forbidding reordering), imagine the following situation:
```java
boolean flag = false;

// Runs on thread A
public void f() {
    // ... do some time-consuming work ...
    flag = true;
}

// Runs on thread B
public void ff() {
    while (!flag) {
        // ... do other work ...
    }
}
```
In the code above, ff depends on the state of flag. But because of instruction reordering (reordering is done in pursuit of pipeline utilization: the compiler and CPU may rearrange instructions as long as the result is unchanged, which is only guaranteed within a single thread), the assignment flag = true may be executed before the time-consuming work finishes. Then ff might proceed before the work is actually done. The reordering certainly doesn't affect thread A's own result, but it does matter across threads.
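Given that reasoning, declaring the flag volatile is the usual fix; a minimal sketch of the corrected version:

```java
public class FlagExample {
    private volatile boolean flag = false; // volatile: visible to B, not reordered past the work

    // Runs on thread A
    public void f() {
        // ... do some time-consuming work ...
        flag = true; // becomes visible only after the work above has completed
    }

    // Runs on thread B
    public void ff() {
        while (!flag) {
            // ... do other work while waiting ...
        }
    }
}
```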
The prohibition of reordering is implemented through memory barriers. How they work is a matter of CPU and hardware detail; the JMM defines its own set of memory barriers, which map onto instructions supported by modern processors.
We can roughly summarize the special rules the JVM imposes on operations on volatile variables:
- Unlike an ordinary variable, a volatile variable's assign must be followed continuously by store and write. This ensures that every write reaches main memory immediately.
- Unlike an ordinary variable, a volatile variable's use must be preceded continuously by read and load. This ensures that every read comes from main memory.
- A write to a volatile variable invalidates the copies held in other threads' working memory, forcing them to re-read from main memory.
- Memory barriers are inserted so that operations around a volatile variable cannot be reordered across it.
To sum up:
- Assignments to volatile must be written back to main memory immediately.
- Each time a volatile variable is read, it must be read from main memory so that the latest value can be read.
- Volatile variables are not reordered.
Atomicity, visibility, orderliness
Atomicity: reads and writes of basic data types can generally be considered atomic (with the historical exception of non-volatile long and double on some 32-bit JVMs), and synchronized blocks also provide atomicity.
Visibility: when one thread modifies a variable, the change is immediately visible to other threads. Ordinary variables do not guarantee this; volatile, final, and synchronized blocks do. For synchronized, recall the JMM requirement mentioned earlier that a variable must be synchronized back to main memory before unlock.
Ordering: besides volatile, which forbids instruction reordering, synchronized also provides ordering. Note, however, that instructions inside a synchronized block may still be reordered among themselves; its kind of ordering differs from volatile's. synchronized guarantees that threads enter the critical section one at a time, in order, while volatile guarantees that reads and writes of the variable itself are not reordered.
The happens-before principle
With the happens-before principle, it is easy to answer whether two operations from different threads can conflict: two operations are safe with respect to each other if one happens-before the other, whether directly by a rule or by transitive deduction. Let's look at the "natural" happens-before rules that exist in Java.
- Program order rule: within a single thread, operations that come earlier in control-flow order happen-before later ones. ⚠️ This refers to control-flow order, not source-line order, because of constructs such as if-else, for, and while.
- Monitor lock rule: an unlock of a lock happens-before every subsequent lock of that same lock ("subsequent" in time order).
- volatile variable rule: a write to a volatile variable happens-before every subsequent read of that variable; a read that begins after the write completes is guaranteed to see it.
- Thread start rule: Thread::start happens-before every other action of the started thread.
- Thread termination rule: every action of a thread happens-before another thread's detection that it has terminated (for example via Thread::join returning or Thread::isAlive returning false).
- Thread interruption rule: a call to Thread::interrupt happens-before the interrupted thread's code detecting the interruption.
- Object finalization rule: the completion of an object's construction happens-before the start of its finalize() method.
- Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
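A brief illustration of the thread start and termination rules (hypothetical class name): a write made before start() is visible inside the new thread, and the thread's writes are visible after join() returns, with no volatile or lock needed:

```java
public class HappensBeforeDemo {
    static int data = 0; // intentionally not volatile

    public static void main(String[] args) throws InterruptedException {
        data = 42; // happens-before t.start(), so the thread is guaranteed to see 42
        Thread t = new Thread(() -> {
            data = data + 1; // sees 42, writes 43
        });
        t.start();
        t.join(); // thread termination happens-before join() returning
        System.out.println(data); // guaranteed to print 43
    }
}
```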
Java and threads
Java threads are implemented on top of kernel threads; each kernel thread is an independent scheduling entity of the OS rather than just a slice of a process. However, it is too expensive for programs to use kernel threads directly, so a higher-level interface, the lightweight process (LWP), was built on top of them. Each lightweight process is backed by exactly one kernel thread, giving the following relationship:
Since Java thread implementations are kernel threads, thread scheduling, allocation, waking, and blocking are left entirely to the OS.
Java defines six states for a thread, and a thread can only be in one of them at a time:
Waiting, Blocked, and Timed Waiting need a little explanation here. A thread enters Waiting when it calls wait(), join(), or LockSupport.park() without a timeout; it is Blocked while waiting to acquire a monitor lock; and it enters Timed Waiting when it calls Thread.sleep(), wait(timeout), join(timeout), or LockSupport.parkNanos()/parkUntil().
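A small sketch (class name and durations are arbitrary) showing two of these states via Thread.getState():

```java
public class ThreadStates {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();

        Thread sleeping = new Thread(() -> {
            try { Thread.sleep(10_000); } catch (InterruptedException ignored) { }
        });
        Thread waiting = new Thread(() -> {
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException ignored) { }
            }
        });

        sleeping.start();
        waiting.start();
        Thread.sleep(100); // give both threads time to reach their states

        System.out.println(sleeping.getState()); // TIMED_WAITING (sleep with a timeout)
        System.out.println(waiting.getState());  // WAITING (wait() with no timeout)

        // clean up so the JVM can exit
        sleeping.interrupt();
        waiting.interrupt();
    }
}
```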
Thread safety
First, the definition of thread safety: an object is thread-safe if, when multiple threads access it simultaneously, correct results are obtained without having to consider how the runtime schedules and interleaves those threads, and without the callers needing any additional synchronization or other coordination.
Thread safety in Java
Before discussing thread safety in Java, let's classify the data shared between threads by degree of thread safety:
- Immutable. The purest and simplest form of thread safety: immutable objects, and final-modified variables of basic types, cannot be changed once visible and are therefore always thread-safe.
- Absolutely thread-safe. A class meeting this requirement is safe no matter how callers use it, with no extra coordination at all; very few classes in Java achieve this, and most "thread-safe" classes are only relatively thread-safe.
- Relatively thread-safe. Each individual operation on the object is thread-safe, but a compound sequence of operations may still need caller-side synchronization. This is what most thread-safe classes in the Java class library provide, such as Vector and Hashtable (see the sketch after this list).
- Thread-compatible. The object itself is not thread-safe, but it can be used safely in a concurrent environment if the caller adds proper synchronization. Most classes not declared thread-safe fall here, such as HashMap and ArrayList.
- Thread-hostile. Code that cannot be used safely in a multithreaded environment no matter what synchronization the caller adds; this is rare in Java and should be avoided.
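As an illustration of relative thread safety (a minimal sketch; the loop bounds and class name are just for demonstration): Vector's individual methods are synchronized, yet an "iterate while another thread removes" pattern can still fail without caller-side locking:

```java
import java.util.Vector;

public class VectorTest {
    private static final Vector<Integer> vector = new Vector<>();

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) vector.add(i);

        Thread removeThread = new Thread(() -> {
            for (int i = 0; i < vector.size(); i++) {
                vector.remove(i); // each call is individually synchronized, but...
            }
        });
        Thread getThread = new Thread(() -> {
            for (int i = 0; i < vector.size(); i++) {
                vector.get(i); // ...may hit an index already removed by the other thread
            }
        });
        removeThread.start();
        getThread.start();
        // Without a synchronized (vector) block around each loop body, this can throw
        // ArrayIndexOutOfBoundsException even though Vector itself is "thread-safe".
    }
}
```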
Thread safe implementation
Mutex synchronization
Mutual exclusion is the means, synchronization is the goal. In Java, the most basic means of mutex synchronization is the synchronized keyword, along with ReentrantLock and a set of utility classes under the java.util.concurrent.locks package. Another of my articles covers their usage in detail; you can find it on my home page.
These are heavyweight locks and should be avoided in general.
Non-blocking synchronization (no locking)
Mutex synchronization is a pessimistic strategy: the locks it uses are pessimistic locks, which assume contention will occur on every access. That raises a question: if contention is rare, or absent altogether, isn't locking an unnecessary burden? Is there a better approach?
The answer is a lock-free approach, also called non-blocking synchronization. Here we only discuss Java's CAS (Compare-And-Swap, sometimes Compare-And-Set). CAS, backed by a single x86 instruction, takes three operands: the memory location of the variable, the expected old value, and the new value to set. If and only if the value at the memory location equals the expected old value is it replaced with the new value. If multiple threads attempt the operation simultaneously, only one succeeds; the operation returns immediately either way, with true if the set succeeded and false if it failed (meaning another thread changed the value first).
Since the operation returns immediately with only true or false, we can write a loop that retries whenever false is returned, so we keep attempting to set the new value until we succeed. The thread never blocks, yet exclusive update of the variable is still achieved.
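A minimal sketch of that retry loop, using AtomicInteger.compareAndSet, which is backed by the CPU's CAS instruction:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    public int increment() {
        int oldValue;
        int newValue;
        do {
            oldValue = value.get();  // read the current value
            newValue = oldValue + 1; // compute the desired new value
            // compareAndSet succeeds only if the value is still oldValue;
            // if another thread won the race, it returns false and we retry.
        } while (!value.compareAndSet(oldValue, newValue));
        return newValue;
    }
}
```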
The downside is that under heavy contention, or when updates take a long time to succeed, CPU is wasted: the retry loop is essentially a spin lock, idling the CPU while it polls.
Also, CAS is subject to the ABA problem: the value changes from A to B and back to A, so a CAS cannot tell it ever changed. Most of the time ABA does not affect correctness; when it does, a versioned (stamped) reference or falling back to a mutex lock is the usual remedy.
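For completeness, a small sketch (hypothetical values) of how a stamped reference detects an A→B→A change that a plain CAS would miss, using java.util.concurrent.atomic.AtomicStampedReference:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaExample {
    public static void main(String[] args) {
        // Value plus a stamp (version number): A -> B -> A still advances the stamp,
        // so a CAS that also checks the stamp can detect the intermediate change.
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);

        int stamp = ref.getStamp();
        ref.compareAndSet("A", "B", stamp, stamp + 1);     // A -> B
        ref.compareAndSet("B", "A", stamp + 1, stamp + 2); // B -> A again

        // A plain CAS on the value alone would succeed here; the stamped CAS fails
        // because the stamp has moved on from the one we originally observed.
        boolean swapped = ref.compareAndSet("A", "C", stamp, stamp + 1);
        System.out.println(swapped); // false
    }
}
```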
No-synchronization schemes
The most typical is ThreadLocal, which makes variables unique to each thread.
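A minimal sketch (hypothetical class name and counts) of ThreadLocal giving each thread its own copy:

```java
public class ThreadLocalDemo {
    // Each thread gets its own independent copy; nothing is shared, so no synchronization.
    private static final ThreadLocal<Integer> counter = ThreadLocal.withInitial(() -> 0);

    public static void main(String[] args) {
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                counter.set(counter.get() + 1); // touches only this thread's copy
            }
            System.out.println(Thread.currentThread().getName() + ": " + counter.get());
        };
        new Thread(task, "t1").start(); // each thread prints 1000
        new Thread(task, "t2").start();
    }
}
```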
The other is reentrant code: if a piece of code, called any number of times and from any number of threads, always produces the same result for the same input, we can say the code is reentrant, and it is naturally thread-safe.
Lock optimization
Spin locks and adaptive spin locks
Spinning means letting the CPU idle in a loop while waiting for the lock to be released. Why do that? Because a synchronized block is sometimes very short: the waiting thread may only need a few microseconds before the lock becomes free. Blocking the thread and waking it up again would cost more, because thread scheduling is itself expensive. So instead the thread busy-loops for a short while and then tries to acquire the lock again.
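To make the idea concrete, here is a toy spin lock, assuming Java 9+ for Thread.onSpinWait(); this is only an illustration of spinning, not how HotSpot implements its internal spin waiting:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait (spin) until we manage to flip false -> true.
        // Cheap if the lock is held only briefly; wasteful if held for long.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU that we are in a spin loop (Java 9+)
        }
    }

    public void unlock() {
        locked.set(false);
    }
}
```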
An adaptive spin lock builds on this by using the history of previous spins on the same lock to decide how long to spin, making the waiting time better "suited" to the situation.
Lock elimination
Lock elimination relies primarily on escape analysis: if the JIT can prove that a lock object cannot escape to, and therefore can never be accessed by, other threads, the synchronization on that object can be removed entirely.
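A typical candidate (a sketch; whether the JIT actually elides the lock depends on the VM and its flags): the StringBuffer below never escapes the method, so its internal synchronization can be removed:

```java
public class LockElision {
    // sb never escapes this method, so after escape analysis the JIT can
    // remove the synchronization inside StringBuffer.append() entirely.
    public String concat(String s1, String s2, String s3) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1); // append() is synchronized, but the lock can never be contended here
        sb.append(s2);
        sb.append(s3);
        return sb.toString();
    }
}
```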
Lock coarsening
We usually try to keep synchronized regions as small as possible, but when several synchronized blocks on the same lock appear back to back (for example in a loop), it can be cheaper to merge them into one larger block, since repeated locking and unlocking has its own cost.
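A sketch of the classic case (whether coarsening actually happens is up to the JIT): repeated synchronized append() calls in a loop that the JIT may merge under a single lock:

```java
public class LockCoarsening {
    private final StringBuffer sb = new StringBuffer();

    public void appendAll(String[] parts) {
        // Each append() locks and unlocks sb separately; the JIT may coarsen
        // these adjacent lock regions into one lock/unlock around the whole loop.
        for (String part : parts) {
            sb.append(part);
        }
    }
}
```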
Biased locking
In low-contention scenarios, the same lock is very often re-acquired by the same thread; this observation is the rationale behind biased locking.
Understanding how biased, lightweight, and heavyweight locks transition into one another requires knowing how the locking mechanism works, so let's first review the Java object header layout:
- Mark Word: holds the GC generational age, the identity hashCode (generated by the JVM), the lock state, and other runtime data.
- Class (type) pointer, plus the array length for array objects: points to the object's type metadata in the method area.
When we lock, we are really locking on an object: acquiring the lock means acquiring that object's lock, and the lock information lives in the object header.
The object header also records which kind of lock currently protects the object and whether it is biased.
When a thread acquires a biased lock, the resource object records the thread it is biased toward. A biased lock is not actively released; each time the thread enters the synchronized block again, it simply checks whether the object is biased toward itself, and if so it needs no further operations and can enter the synchronized code directly.
Biased lock acquisition process:
- 1️⃣ Check the Mark Word: is the biased-lock flag set to 1 and the lock flag bits set to 01, i.e. is the object in a biasable or biased state?
- 2️⃣ If the state is biasable, test whether the thread ID in the Mark Word points to the current thread; if yes, go to step 5️⃣, otherwise go to step 3️⃣.
- 3️⃣ If the thread ID does not point to the current thread, compete for the lock via a CAS operation. If the CAS succeeds, set the thread ID in the Mark Word to the current thread and go to step 5️⃣; if it fails, go to step 4️⃣.
- 4️⃣ A failed CAS for the biased lock indicates contention. When a safepoint is reached, the thread that originally owned the biased lock is suspended, the biased lock is upgraded to a lightweight lock, and the thread that was blocked at the safepoint then resumes and continues into the synchronized code.
- 5️⃣ Execute the synchronized code.
Lightweight lock
Lightweight lock acquisition process:
- 1️⃣ When execution enters the synchronized block, if the synchronization object is not currently locked (lock flag bits 01), the virtual machine first creates a space called the Lock Record in the current thread's stack frame, to store a copy of the lock object's current Mark Word; this copy is officially called the Displaced Mark Word. The state of the thread stack and object header at this point is as follows:
- 2️⃣ After the copy succeeds, the VM uses a CAS operation to try to update the object header's Mark Word to a pointer to the Lock Record, and points the owner pointer in the Lock Record at the object's Mark Word. If the update succeeds, go to step 3️⃣; otherwise go to step 4️⃣.
- 3️⃣ If the update succeeds, the thread owns the object's lock, and the lock flag bits in the object's Mark Word are set to "00", meaning the object is in the lightweight-locked state. The state of the thread stack and object header is then as follows:
- 4️⃣ If the update fails, the VM first checks whether the object's Mark Word already points to the current thread's stack frame. If so, the current thread already owns this object's lock and this is a reentrant acquisition: another Lock Record is pushed, with its Displaced Mark Word set to NULL, so it effectively acts as a reentry counter. The figure below shows the lock records after three reentries: on the left is the lock object, on the right the current thread's stack frame. The reentrant acquisition then completes and execution proceeds directly into the synchronized block.
- 5️⃣ If the Mark Word does not point to the current thread's stack frame, the lock object has been taken by another thread, i.e. multiple threads are competing for the lock. The thread then spins, waiting for the lock; if it still cannot acquire the lock after a certain number of attempts, real contention has occurred and the lock must be inflated into a heavyweight lock.
Lightweight lock unlocking process:
- 1️⃣ The thread attempts, via a CAS operation, to replace the object's current Mark Word with the Displaced Mark Word copied into its Lock Record.
- 2️⃣ If the replacement succeeds, the whole synchronization process is complete.
- 3️⃣ If the replacement fails, another thread has tried to acquire the lock in the meantime (the lock has been inflated), so while releasing the lock the thread must also wake up the suspended threads.
Finally, the lock upgrade is one-way, which can be: biased lock -> lightweight lock -> heavyweight lock.
Finally, my own understanding here is limited; I came across a good article explaining lock optimization and link it here for readers who want to go deeper.