This article exists because some readers are not entirely clear about the happens-before rule for volatile. It should take about 7 minutes for readers who already understand atomicity, ordering, visibility, and the other concepts of the JMM memory model, but whose understanding of volatile is still vague.
Start with a question: how does ReentrantLock implement the same memory visibility semantics as a synchronized lock? With synchronized, modifications to shared variables made inside the critical section are guaranteed to be visible to other threads as soon as the lock is released. Can ReentrantLock guarantee this, and if so, how?
Of course it can; otherwise it would not be a synchronization lock at all. How it does so is exactly what this article is about.
Misconception: some readers believe that a volatile write to a shared variable only guarantees visibility of that one variable, i.e. it flushes the variable's cache line back to main memory and invalidates the copies in other CPUs' caches.
A few necessary background explanations
- Atomicity: in Java, reads and assignments of 32-bit primitive types are atomic even on a 32-bit JVM, meaning they are uninterruptible and either happen completely or not at all. (In that environment, volatile additionally guarantees atomicity for reads and assignments of 64-bit long and double values. Note that read and assign here are single-step operations; compound operations such as auto-increment decompose into multiple steps and are not covered. See the sketch after the code below.)
- Instruction reordering: both the compiler and the processor may reorder instructions that have no data dependence on each other, and such reordering can break a multithreaded program. In the code below, statements 1 and 2 may be reordered, as may 3 and 4: as an optimization, the processor may compute a * a first, then evaluate the if condition, and only then decide whether to assign i. Without synchronization, the result may well be wrong.
class ReorderExample {
    int a = 0;
    boolean flag = false;

    public void writer() {
        a = 1;             // 1
        flag = true;       // 2
    }

    public void reader() {
        if (flag) {        // 3
            int i = a * a; // 4
        }
    }
}
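As an aside on the atomicity note above, here is a minimal sketch (my own illustration, class name hypothetical) of what volatile does and does not make atomic:

class AtomicityDemo {
    // On a 32-bit JVM a plain long may be read/written as two 32-bit halves,
    // so a concurrent reader could observe a torn value
    long plainCounter = 0L;

    // volatile guarantees that reads and writes of this 64-bit value are atomic
    volatile long safeCounter = 0L;

    volatile int n = 0;

    void increment() {
        n++; // NOT atomic even though n is volatile: this is a read, an add,
             // and a write, so concurrent increments can lose updates
    }
}

For atomic compound updates you would reach for java.util.concurrent.atomic.AtomicInteger or AtomicLong instead.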
- CPU cache model
1) CPU lookup order: L1 cache -> L2 cache -> L3 cache -> main memory (there are also store/load buffers in front of L1, explained briefly below)
2) Cache lines: the cache is composed of cache lines, typically 64 bytes each; the CPU accesses the cache in units of cache lines, its smallest unit of operation (a false-sharing sketch follows this list)
3) Because of this cache structure, without a sound cache coherence protocol, multiple threads would suffer memory visibility problems
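As a hypothetical side note on why the 64-byte cache line matters (this example is mine, not from the original list): two independent fields that land on the same cache line can still slow each other down, a problem known as false sharing.

class FalseSharingDemo {
    // a and b are likely to sit on the same 64-byte cache line, so a thread
    // writing only a still invalidates the cache line holding b on other cores
    volatile long a;
    volatile long b;

    // A common (if crude) mitigation is padding so each hot field owns its
    // own cache line; the JDK itself uses an @Contended annotation internally
    volatile long c;
    long p1, p2, p3, p4, p5, p6, p7; // padding: 7 * 8 bytes
    volatile long d;
}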
- JVM memory model
1) Each thread keeps copies of variables (including shared variables) in its working memory (often drawn as local memory); without memory synchronization (a cache coherence protocol), the results of different threads operating on shared variables are nondeterministic (illustrated by the sketch after this list)
2) Thread working memory is only a conceptual model of the JVM (designed to accommodate different machine architectures and operating systems). Java threads are implemented on top of operating system threads, one JVM thread per OS thread, and working memory is really an abstraction over CPU registers and caches
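To make the working-memory point concrete, here is a minimal sketch (class name mine) of the classic visibility bug; without volatile, the reader thread may keep using its cached copy of stop and never terminate:

public class VisibilityDemo {
    static boolean stop = false; // fix: declare as `static volatile boolean stop`

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) {
                // spin: without volatile, the JIT may hoist the read of stop
                // out of the loop, so this thread may never see the update
            }
            System.out.println("reader saw stop == true");
        });
        reader.start();
        Thread.sleep(100);
        stop = true; // this write may stay invisible to the reader thread
    }
}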
- Cache coherence
1) Cache inconsistency is solved in one of two ways: by locking the bus (inefficient) or by a cache coherence protocol
2) Core idea of the protocol: when a CPU writes data and finds that the variable is shared, i.e. copies exist in other CPUs' caches, it signals the other CPUs to mark their cache line for that variable as invalid; when another CPU later reads the variable and finds its cache line invalid, it reloads the value from main memory
3) The cache coherence protocol alone cannot fully guarantee memory visibility, because structures added for performance, such as the store buffer and the invalidate queue, get in the way of visibility; this is what leads to memory barriers
4) Without going into the details (which I don't fully understand myself), search for MESI and why memory barriers are needed
- Memory barriers
1) Memory barrier: a technique that makes the memory state of one CPU visible to other processing units and prevents the reordering of memory reads and writes
2) Store memory barrier: blocks until the values in the current store buffer have been written back to main memory
3) Load memory barrier: blocks until the processor has drained its invalidate queue
Note: the above is the JMM's abstract memory barrier specification; the JVM inserts different instructions on different platforms to achieve the required barrier effect
4) LoadLoad: ensures that the data read by Load1 is loaded before Load2 and all subsequent load instructions. Processors that perform speculative loads and/or out-of-order execution generally need an explicit LoadLoad barrier, because on such processors waiting load instructions can bypass waiting stores; on processors that always preserve load ordering, this barrier amounts to a no-op
5) StoreStore: ensures that Store1's data is visible to other processors (e.g. flushed to main memory) before the data of Store2 and all subsequent store instructions. In general, a StoreStore barrier is needed on processors that cannot otherwise guarantee that data is flushed from write buffers and/or caches to other processors and main memory in order
6) LoadStore: ensures that Load1's data is loaded before the data of Store2 and all subsequent store instructions is flushed. A LoadStore barrier is required on out-of-order processors where waiting store instructions can bypass loads
7) StoreLoad: ensures that Store1's data is visible to other processors before the data of Load2 and subsequent load instructions is loaded. A StoreLoad barrier prevents a subsequent load from incorrectly using Store1's stale data value rather than a more recent value written to the same memory location by another processor
8) On x86, only the StoreLoad barrier maps to an actual instruction; the underlying implementation of the other three barriers is a no-op
9) On x86, a volatile write inserts a StoreLoad barrier, implemented in assembly as lock addl $0x0,(%rsp) (start the JVM with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:MaxInlineSize=0 to inspect the generated assembly)
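Putting the four barriers together with volatile: the sketch below (following the JSR-133 cookbook's conservative barrier placements; comments are mine) shows where the JMM conceptually inserts barriers around volatile accesses:

class BarrierPlacement {
    int a = 0;
    volatile boolean v = false;

    void write() {
        a = 1;
        // StoreStore barrier: the write to a is flushed before the volatile store
        v = true;
        // StoreLoad barrier: the volatile store becomes visible to other
        // processors before any subsequent load on this processor
    }

    void read() {
        boolean r = v;
        // LoadLoad barrier: the volatile load completes before later loads
        // LoadStore barrier: the volatile load completes before later stores
        int i = a; // therefore sees a == 1 whenever r is true
    }
}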
Summary: sources of multithreaded disorder
- Disorder caused by instruction reordering, which is solved by using memory barriers to forbid the reordering
- Disorder caused by the caching inside modern processors, where data changes do not reach main memory promptly; this is solved by the cache coherence protocol together with memory barriers
To ensure that multithreaded programs execute correctly, the JMM defines the happens-before rules, which any reordering must respect
Happens-before rules
- Program order rule: within a single thread, each action happens-before every subsequent action in program order. (When only a single thread executes, instructions may still be reordered, but the thread's final result is consistent with program order, because only instructions with no data dependence are reordered; a single-threaded program therefore appears to execute in order. This rule alone does not guarantee correctness under multithreaded execution.)
- Monitor lock rule: an unlock on a monitor happens-before every subsequent lock on that same monitor. (Whether one thread or many are executing, once a lock is held, other threads can compete for it again only after the holder releases it.)
- Volatile variable rule: a write to a volatile field happens-before every subsequent read of that same volatile. (A thread always immediately sees the last write to a volatile variable, whether made by itself or by another thread.)
- Transitivity rule: if A happens-before B and B happens-before C, then A happens-before C
The underlying implementation of volatile enforces the happens-before rules with the help of memory barriers and the cache coherence protocol
Going back to the original question, how does ReentrantLock implement the memory visibility semantics of locks?
- The ReentrantLock synchronization pattern is: call lock() to acquire the lock, enter the critical section, and finally exit by calling unlock() in a finally block
- Both lock() and unlock() operate on the volatile int state field of AbstractQueuedSynchronizer
- Under the hood, lock() calls compareAndSwapInt() of the Unsafe class; this CAS is atomic and carries the read and write memory semantics of volatile, so other threads immediately see the change to state
- unlock() decrements state by one; since state is volatile and the assignment is a single atomic write, other threads immediately see the change to state
- By the happens-before rules (program order, volatile variable, and transitivity), memory changes made inside the critical section are also immediately visible to other threads once the lock is released, which implements the memory visibility semantics of the lock (a minimal sketch appears after the final code example below)
- Similarly, in a multithreaded environment, reader() in the example below is guaranteed to see x == 42
class VolatileExample {
    int x = 0;
    volatile boolean v = false;

    public void writer() {
        x = 42;   // happens-before the volatile write (program order rule)
        v = true; // volatile write, happens-before any later read of v
    }

    public void reader() {
        if (v) {  // volatile read; if true, the write to x is visible
            // uses x - guaranteed to see 42 (transitivity rule)
        }
    }
}
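Finally, a minimal sketch (class name and values are mine) of the same visibility guarantee obtained through ReentrantLock: the plain field shared is published by the volatile write to AQS state in unlock() and picked up by the volatile read/CAS in lock().

import java.util.concurrent.locks.ReentrantLock;

class LockVisibilityExample {
    private final ReentrantLock lock = new ReentrantLock();
    private int shared = 0; // deliberately NOT volatile

    public void writer() {
        lock.lock();       // CAS on AQS state: volatile read/write semantics
        try {
            shared = 42;   // plain write inside the critical section
        } finally {
            lock.unlock(); // volatile write to state publishes shared
        }
    }

    public void reader() {
        lock.lock();       // volatile read of state: sees the writer's unlock
        try {
            System.out.println(shared); // prints 42 once writer() has completed
        } finally {
            lock.unlock();
        }
    }
}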