1 Source
- Source: Java High Concurrency Programming in Detail: Multithreading and Architecture Design, by Wang Wenjun
- Chapters: 12 and 13
This article is a compilation of notes from those two chapters.
2 CPU Cache
2.1 Cache Model
All computation in a computer is done by the CPU, and executing instructions involves reading and writing data. However, the CPU can only access data in main memory, and memory speed lags far behind CPU speed. Hence the cache model: a buffer layer inserted between the CPU and main memory. Modern CPU caches are generally divided into three levels, called the L1, L2, and L3 caches. The schematic diagram is as follows:
- L1 cache: the fastest of the three levels, but the smallest. The L1 cache is further split into a data cache (L1d, d for data) and an instruction cache (L1i, i for instruction)
- L2 cache: slower than L1 but larger. In modern multi-core CPUs, the L2 cache is generally private to each core
- L3 cache: the slowest of the three levels, but the largest. In modern CPUs the L3 cache is typically shared by multiple cores, as in the Zen 3 architecture
Caches exist to compensate for how slow it is for the CPU to access memory directly. To perform a computation, the CPU copies the needed data from main memory into the cache; since the cache is much faster than memory, the computation only needs to read from and write its results to the cache, and the results are flushed back to main memory when the operation completes. This greatly improves efficiency. The overall interaction is briefly shown below:
2.2 Cache Consistency
Although the advent of caching greatly improved throughput, it also introduced a new problem: cache inconsistency. Take the simplest i++ operation. It requires copying the data from memory into the cache; the CPU reads the cached value, updates it, writes the new value to the cache first, and flushes it back to memory after the operation completes. The specific process is:
- Read i from memory into the cache
- The CPU reads the value of i from the cache
- Add one to i
- Write the result back to the cache
- Flush the data back to main memory
This i++ operation is not a problem in a single thread, but in a multithreaded program each thread has its own working memory (also called local memory; effectively the thread's own cache), so the variable i has a copy in the local memory of each thread. If two threads perform the i++ operation:
- Call the two threads A and B, and let the initial value of i be 0
- Thread A reads i from memory into its cache, and thread B does the same; both cached values are 0
- The two threads increment at the same time; in both thread A's cache and thread B's cache, the value of i is now 1
- Both threads write i back to main memory, so 1 is assigned to i in main memory twice
- The final result is that i has the value 1
This is a typical cache inconsistency problem. The main solutions are:
- Bus locking
- Cache consistency protocol
2.2.1 Bus locking
This is a pessimistic implementation: the processor issues a lock instruction that locks the bus, and the bus, upon receiving the instruction, blocks requests from other processors until the processor holding the lock completes its operation. Only the processor that grabs the bus lock can run, which makes this approach inefficient: once one processor holds the lock, the others can only block and wait, hurting the performance of multi-core processors.
2.2.2 Cache Consistency Protocol
The illustration is as follows:
The best known of the cache consistency protocols is the MESI protocol, which ensures that copies of shared variables used in each cache are consistent. The idea is that when the CPU operates on data in the cache and finds that the variable is a shared variable, it does the following:
- Read: nothing special is done; the data is simply read from the cache into a register
- Write: a signal is sent to tell the other CPUs to set that variable's cache line to the invalid state (Invalid); to read the variable again, the other CPUs must fetch it from main memory
Specifically, MESI marks a cache line with one of four states:
- M (Modified): modified
- E (Exclusive): exclusive
- S (Shared): shared
- I (Invalid): invalid
Detailed MESI implementations are beyond the scope of this article.
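As a rough illustration only, the state transitions above can be sketched as a toy model. This is my own simplification, not code from the book: it tracks a single cache line in isolation and ignores bus snooping, write-backs, and the Exclusive-on-sole-reader transition that a real protocol performs.

```java
// Toy model of MESI states for one cache line; hypothetical, for illustration.
public class Mesi {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    static class CacheLine {
        State state = State.INVALID;

        // Local read: an INVALID line must be fetched from main memory.
        // (Simplification: we always go to SHARED, never EXCLUSIVE.)
        void localRead() {
            if (state == State.INVALID) {
                state = State.SHARED;
            }
        }

        // Local write: this core's copy becomes MODIFIED, and a signal
        // would be broadcast telling other cores to invalidate theirs.
        void localWrite() {
            state = State.MODIFIED;
        }

        // Another CPU wrote this line: our copy becomes INVALID, so the
        // next read must fetch the fresh value from main memory.
        void remoteWrite() {
            state = State.INVALID;
        }
    }
}
```

For example, if core A holds the line as SHARED and core B writes it, A's copy drops to INVALID while B's becomes MODIFIED, which is exactly the read/write behavior described above.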
3 JMM
Having looked at the CPU cache, we turn to the JMM, the Java Memory Model, which specifies how the JVM works with the computer's main memory. It also determines when writes to shared variables by one thread become visible to other threads. The JMM defines an abstract relationship between threads and main memory, as follows:
- Shared variables are stored in main memory and can be accessed by each thread
- Each thread has private working memory, or local memory
- Working memory stores only the thread’s copy of the shared variable
- Threads cannot operate on main memory directly; they must first operate on working memory before writing back to main memory
- Working memory, like the rest of the JMM, is an abstract concept that does not literally exist; it covers caches, registers, compile-time optimizations, and hardware
The schematic diagram is as follows:
Similar to MESI, if one thread changes a shared variable and flushes it to main memory, other threads discover that their working-memory copy is invalid and read the variable from main memory into working memory again.
The following diagram shows the relationship between the JVM and computer hardware allocation:
4 Three Features of Concurrent Programming
We're halfway through the article and still haven't reached volatile? Bear with me: let's first look at three important features of concurrent programming that will help you understand volatile correctly.
4.1 Atomicity
Atomicity means that for one or more operations:
- either all of the operations are performed, without interruption by any factor
- or none of the operations are performed
A typical example is a transfer between two people. Say A transfers 1000 yuan to B. This involves two basic operations:
- 1000 yuan was deducted from A’s account
- B’s account is increased by 1000 yuan
Either both succeed or both fail: you cannot have 1000 yuan deducted from A's account while B's account stays unchanged, and you cannot have A's account unchanged while B's account gains 1000 yuan.
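The transfer above can be sketched in Java. The Account class, its balance field, and the amounts are illustrative placeholders, not from the original text; synchronizing on a shared lock is one standard way to make the two operations one atomic unit.

```java
// Minimal sketch: a synchronized block makes debit + credit atomic,
// so no thread can observe the deducted-but-not-yet-credited state.
public class Transfer {
    static class Account {
        private long balance;
        Account(long balance) { this.balance = balance; }
        long getBalance() { return balance; }
    }

    private static final Object LOCK = new Object();

    static void transfer(Account from, Account to, long amount) {
        synchronized (LOCK) {
            from.balance -= amount;
            to.balance += amount;
        }
    }

    public static void main(String[] args) {
        Account a = new Account(5000);
        Account b = new Account(0);
        transfer(a, b, 1000);
        System.out.println(a.getBalance() + " " + b.getBalance()); // prints 4000 1000
    }
}
```

A single global lock is the simplest correct choice here; per-account locks would scale better but require careful lock ordering to avoid deadlock.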
Note that two atomic operations combined are not necessarily atomic, such as i++. Essentially, i++ involves three operations:
- get i
- i + 1
- set i
All three operations are atomic, but taken together (i++) are not.
4.2 Visibility
Another important feature is visibility, which means that if one thread makes a change to a shared variable, other threads can immediately see the updated value.
A simple example is as follows:
import java.util.concurrent.TimeUnit;

public class Main {
    private int x = 0;
    private static final int MAX = 100000;

    public static void main(String[] args) throws InterruptedException {
        Main m = new Main();
        Thread thread0 = new Thread(() -> {
            while (m.x < MAX) {
                ++m.x;
            }
        });
        Thread thread1 = new Thread(() -> {
            while (m.x < MAX) {
            }
            System.out.println("finish");
        });
        thread1.start();
        TimeUnit.MILLISECONDS.sleep(1);
        thread0.start();
    }
}
Thread1 will run forever: it reads x into its working memory and keeps checking that working-memory value. Thread0's changes to x happen in thread0's working memory and are not visible to thread1, so thread1 never prints finish.
4.3 Ordering
Ordering refers to the order in which code executes. Due to JVM optimizations, the order in which code is written is not necessarily the order in which it runs, as in the following four statements:
int x = 10;
int y = 0;
x++;
y = 20;
It is possible that y = 20 executes before x++; this is instruction reordering. In general, to improve efficiency, the processor may optimize the incoming instruction stream and not execute code strictly in the written order, while still guaranteeing that the final result matches what the code expects. Reordering also follows certain rules: it must strictly respect data dependencies between instructions, and dependent instructions cannot be reordered. For example:
int x = 10;
int y = 0;
x++;
y = x+1;
Here y = x+1 cannot execute earlier than x++, because it depends on the result of x++.
Reordering in a single thread does not change the expected value, but in multithreading, if orderliness is not guaranteed, it can be very problematic:
private boolean initialized = false;
private Context context;

public Context load() {
    if (!initialized) {
        context = loadContext();
        initialized = true;
    }
    return context;
}
If initialized = true is reordered before context = loadContext(), and two threads A and B access load() simultaneously while loadContext() takes some time, then:
- Thread A passes the check, sets the boolean flag to true, and then performs the loadContext() operation
- Thread B, seeing the flag already set to true, directly returns the not-yet-initialized context
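The standard fix for this pattern is double-checked locking with a volatile field (volatile itself is covered in the next section). This sketch is mine, with Context and loadContext() as placeholders for the snippet above:

```java
// Double-checked locking: volatile forbids reordering the publication of
// the reference before loadContext() completes, so no thread can observe
// a half-initialized context.
public class ContextLoader {
    static class Context { }

    private volatile Context context;

    public Context load() {
        Context local = context;          // one volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = context;          // re-check under the lock
                if (local == null) {
                    local = loadContext();
                    context = local;      // volatile write publishes safely
                }
            }
        }
        return local;
    }

    private Context loadContext() {
        return new Context(); // stand-in for the expensive loading work
    }
}
```

Without volatile, the double check alone does not help: the write to context could still be reordered before initialization finishes.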
5 volatile
So, finally, volatile. Everything so far has been building toward understanding volatile once and for all. This section is divided into five parts:
- The semantics of volatile
- How volatile ensures ordering and visibility
- The implementation principle
- Usage scenarios
- The differences from synchronized
Let’s start with the semantics of volatile.
5.1 Semantics
Instance variables or class variables that are volatile have two levels of semantics:
- It guarantees visibility of operations on shared variables between threads
- It disallows instruction reordering
5.2 How to Ensure Visibility and Ordering
First, the conclusions:
- volatile guarantees visibility
- volatile guarantees ordering
- volatile does not guarantee atomicity
Each is discussed in turn below.
5.2.1 Visibility
In Java, visibility can be ensured by:
- volatile: when a variable is declared volatile, reads of the shared variable are effectively performed against main memory (more precisely, reads still go through working memory, but the value must be re-read from main memory if another thread has changed it). Writes modify working memory first but are flushed to main memory immediately after the modification
- synchronized: synchronized also ensures visibility. It guarantees that only one thread at a time acquires the lock and executes the synchronized code, and that changes to variables are flushed to main memory before the lock is released
- explicit locks (Lock): the lock method of a Lock guarantees that only one thread at a time can acquire the lock and execute the guarded code, and changes to variables are flushed to main memory before the lock is released
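The third option can be sketched with ReentrantLock. This is my own minimal example of the pattern, not code from the book: the unlock in one thread flushes its writes, and a later lock in another thread sees them.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// A shared counter guarded by an explicit Lock: both mutual exclusion
// and visibility of count are guaranteed by the lock()/unlock() pair.
public class LockCounter {
    private final Lock lock = new ReentrantLock();
    private int count;

    public void inc() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock(); // always release in finally, even on exceptions
        }
    }

    public int get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }
}
```

Note that even the read path takes the lock: without it, a plain read of count would have no visibility guarantee.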
To be specific, take a look at the previous example:
import java.util.concurrent.TimeUnit;

public class Main {
    private int x = 0;
    private static final int MAX = 100000;

    public static void main(String[] args) throws InterruptedException {
        Main m = new Main();
        Thread thread0 = new Thread(() -> {
            while (m.x < MAX) {
                ++m.x;
            }
        });
        Thread thread1 = new Thread(() -> {
            while (m.x < MAX) {
            }
            System.out.println("finish");
        });
        thread1.start();
        TimeUnit.MILLISECONDS.sleep(1);
        thread0.start();
    }
}
If x were declared volatile, thread1 would terminate and print finish, because thread0's changes to x would be visible to thread1.
5.2.2 Ordering
The JMM allows compile-time and processor reordering of instructions, which can be problematic in multithreaded situations. Java provides three mechanisms to ensure ordering:
- volatile
- synchronized
- explicit locks (Lock)
Another thing worth mentioning about ordering is the happens-before principle. It states that if the execution order of two operations cannot be deduced from the rules below, then ordering is not guaranteed and the JVM or processor may reorder them at will. The purpose is to maximize the parallelism of the program. The rules are as follows:
- Program order rule: within a thread, code executes in the written order; operations written earlier happen-before operations written later
- Lock rule: an unlock operation happens-before a subsequent lock operation on the same lock
- volatile variable rule: a write to a volatile variable happens-before subsequent reads of it
- Transitivity rule: if operation A happens-before B, and B happens-before C, then A happens-before C
- Thread start rule: a Thread object's start() method happens-before every action in the started thread
- Thread interrupt rule: a call to a thread's interrupt() method happens-before the interrupted thread detects the interrupt; in other words, if an interrupt signal is observed, interrupt() must have been called earlier
- Thread termination rule: all operations in a thread happen-before the detection of that thread's termination
- Object finalization rule: the completion of an object's initialization happens-before its finalize() method
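Two of the rules above, thread start and thread termination, can be demonstrated together in a short sketch of my own (join() is how a thread's termination is commonly detected): writes made before start() are visible inside the new thread, and the thread's writes are visible after join() returns.

```java
// Thread start rule + termination rule: no volatile or lock is needed here,
// because start() and join() establish the happens-before edges.
public class HappensBefore {
    static int data = 0;

    public static void main(String[] args) throws InterruptedException {
        data = 42; // happens-before worker.start()
        Thread worker = new Thread(() -> {
            // guaranteed to see data == 42 (start rule)
            data = data + 1;
        });
        worker.start();
        worker.join(); // termination rule: the worker's write is visible now
        System.out.println(data); // prints 43
    }
}
```

Without the join(), reading data in main after start() would race with the worker and could print either 42 or 43.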
For volatile, instruction reordering across the variable is directly prohibited, but instructions that have no dependence on the volatile variable can still be reordered freely among themselves. For example:
int x = 0;
int y = 1;
// z is a volatile field: private volatile int z;
z = 20;
x++;
y--;
The definitions of x and y may be reordered with each other, as long as x equals 0 and y equals 1 by the time z = 20 executes; similarly, x++ and y-- may be reordered with each other, as long as both execute after z = 20.
5.2.3 Atomicity
In Java, reads and assignments of variables of primitive data types are atomic, as are reads and assignments of reference-type variables, but:
- Assigning one variable to another is not atomic, because it involves reading one variable and writing another; two atomic operations combined are not atomic
- Multiple atomic operations together are not atomic, for example i++
- The JMM only guarantees atomicity for basic reads and assignments, nothing else. If atomicity is required, use synchronized, Lock, or the atomic operation classes under the JUC package
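The last option, the JUC atomic classes, can be sketched as follows. This counter is my own example: AtomicInteger.incrementAndGet() performs the read-modify-write as one atomic operation (via CAS), so no lock is needed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Lock-free counter: incrementAndGet() is an atomic ++x.
public class AtomicCounter {
    private final AtomicInteger x = new AtomicInteger(0);

    public void inc() {
        x.incrementAndGet();
    }

    public int get() {
        return x.get();
    }
}
```

Run under the same 10-threads-times-1000-increments workload as the volatile example below, this counter always ends at 10000.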
That is, volatile does not guarantee atomicity, as shown in the following example:
import java.util.concurrent.CountDownLatch;
import java.util.stream.IntStream;

public class Main {
    private volatile int x = 0;
    private static final CountDownLatch latch = new CountDownLatch(10);

    public void inc() {
        ++x;
    }

    public static void main(String[] args) throws InterruptedException {
        Main m = new Main();
        IntStream.range(0, 10).forEach(i -> {
            new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    m.inc();
                }
                latch.countDown();
            }).start();
        });
        latch.await();
        System.out.println(m.x);
    }
}
The final output of x will generally be less than 10000, and it differs from run to run. To see why, consider two threads A and B:
- 0–t1: thread A reads x into its working memory; x = 0
- t1–t2: thread A's time slice ends; the CPU schedules thread B, which reads x into its working memory; x = 0
- t2–t3: thread B increments x and updates its working-memory copy
- t3–t4: thread B's time slice ends; the CPU schedules thread A, which increments x in its working memory
- t4–t5: thread A writes the value in its working memory back to main memory; x = 1
- After t5: thread A's time slice ends; the CPU schedules thread B, which also writes its working-memory value back, assigning 1 to x in main memory again
In other words, under multithreading there were two increments but effectively only one update. The fix is easy: to make the output reach 10000, wrap the increments in synchronized:
new Thread(() -> {
synchronized (m) {
for (int j = 0; j < 1000; j++) {
m.inc();
}
}
latch.countDown();
}).start();
5.3 Implementation Principles
We already know that volatile guarantees ordering and visibility. How is that implemented?
The answer is the lock; instruction prefix, which effectively acts as a memory barrier. The memory barrier provides the following guarantees for instruction execution:
- Instruction reordering cannot move code that follows the barrier to before it
- Instruction reordering cannot move code that precedes the barrier to after it
- When the barrier instruction executes, all preceding code has completed
- Changes to values in the thread's working memory are forcibly flushed to main memory
- If the operation is a write, the corresponding cached data in other threads' working memory is invalidated
5.4 Application Scenarios
A typical usage scenario is using a volatile switch to stop a thread, as in the following example:
public class ThreadTest extends Thread {
    private volatile boolean started = true;

    @Override
    public void run() {
        while (started) {
        }
    }

    public void shutdown() {
        this.started = false;
    }
}
If the boolean variable were not volatile, the new value might never be flushed to main memory or re-read by the spinning thread, and the thread would never terminate.
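A short usage sketch of the pattern follows. The Worker class reproduces the ThreadTest class from the text so the example is self-contained; the sleep is hypothetical, just to let the worker spin briefly before we stop it.

```java
// Driving the volatile-flag shutdown pattern end to end.
public class ShutdownDemo {
    static class Worker extends Thread {
        private volatile boolean started = true;

        @Override
        public void run() {
            while (started) {
                // busy-wait until shutdown() flips the flag
            }
        }

        public void shutdown() {
            this.started = false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Worker worker = new Worker();
        worker.start();
        Thread.sleep(50);
        worker.shutdown(); // the volatile write is visible to the spinning thread
        worker.join();     // returns because the loop observed started == false
        System.out.println("worker stopped");
    }
}
```

In production code an interrupt or an ExecutorService shutdown is usually preferable to a raw spin loop, but the flag version isolates exactly what volatile contributes.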
5.5 Differences from synchronized
- Usage: volatile can only modify instance variables or class variables; it cannot modify methods, method parameters, or local variables, and the variable it modifies may be null. synchronized cannot modify variables, only methods or code blocks, and its monitor object cannot be null
- Atomicity: volatile cannot guarantee atomicity, but synchronized can
- Visibility: both guarantee visibility, but synchronized does so through the JVM instructions monitorenter/monitorexit; on monitorexit, all shared variables are flushed to main memory. volatile relies on the lock; machine instruction, which forces other threads' working-memory copies to be invalidated so they must reload from main memory
- Ordering: volatile prohibits reordering by the JVM and the processor, whereas synchronized guarantees ordering through serial execution of the program; code inside a synchronized block can still undergo instruction reordering
- Other differences: volatile does not block threads, but synchronized does