Hello everyone, I'm South Orange. I've been at Java for two and a half years now, growing from a complete beginner who didn't even understand basic data structures into a slightly less complete beginner, and I've learned a lot along the way. Knowledge only becomes more valuable the more it is shared, so over this period I've summarized (partly drawing on, and quoting, work by more experienced people) what I consider the key points from my daily study and work, in the hope that they help you too.

For those who haven't read them, here are the three previous articles in this JVM series:

  • Overcoming the JVM — JVM Objects and Their Access location (PART 1)
  • Garbage Collection for JVM (part 2)
  • JVM garbage Collector (part 3)

If you need them, you can follow my official account, where the latest articles appear first; you can also ask me there for the mind map.

As you may know, the Alibaba Java Development Manual requires that when a lazily initialized singleton is implemented under concurrency, the target field must be declared volatile.

The volatile keyword is used in Java to ensure visibility of variables and prevent instruction reordering.

1. Ensure visibility of variables

Before looking at how volatile ensures visibility, we need to understand why memory is not always visible across threads. There are two reasons:

1. The CPU runs much faster than main memory can be read or written. To keep the CPU from stalling while it waits on memory, modern CPUs put a cache (in practice, several levels of cache) between the CPU and main memory, as shown below:

While running, a thread copies the data it uses from main memory into its own cache. Suppose thread B loads a variable into its cache and thread A then modifies that variable: if thread B does not reload the variable from main memory and keeps reading its cached copy, it will never see thread A's update.

2. On a multiprocessor, a cache coherence protocol keeps each processor's cache consistent. Each processor sniffs the data traffic on the bus to check whether its cached values have gone stale; when a processor discovers that the memory address behind one of its cache lines has been modified elsewhere, it marks that cache line invalid, and the next time it needs the data it reads it back from system memory into its cache. Volatile is the mechanism by which every thread obtains the latest value of a variable. But remember that volatile guarantees visibility, not atomicity: if multiple threads read-modify-write the shared variable at the same time, each thread sees other threads' committed changes, yet two threads can still read the same old value and then overwrite each other's result.
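To make both halves of that concrete, here is a small sketch (class and field names are my own): a volatile flag gives visibility, so the spinning thread is guaranteed to see the update, but volatile counter++ is still a read-modify-write race and can lose increments.

```java
public class VolatileDemo {
    static volatile boolean ready = false;  // visibility: the spinning reader will see this update
    static volatile int counter = 0;        // volatile does NOT make counter++ atomic

    // Two threads race on counter++; each ++ is a read-modify-write,
    // so increments can be lost even though counter is volatile.
    public static int race() {
        counter = 0;
        Runnable inc = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter++;
            }
        };
        Thread t1 = new Thread(inc);
        Thread t2 = new Thread(inc);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter;  // often less than 20_000, never more
    }

    public static void main(String[] args) {
        Thread waiter = new Thread(() -> {
            while (!ready) { }  // without volatile, this spin might never see the update
        });
        waiter.start();
        ready = true;           // volatile write: flushed to main memory, visible to waiter
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println("counter after race: " + race());
    }
}
```

The final count is bounded above by 20,000 but frequently lands below it, which is exactly the "overwrite each other's result" problem described above.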

2. Prevent instruction reordering

Let's first look at what reordering is.

1. Definition

Instruction reordering means that during program execution, the compiler and CPU may reorder instructions for performance reasons.

Before introducing instruction reordering, let's first introduce the eight instructions the JMM defines for memory interaction. A virtual machine implementation must ensure that each of these operations is atomic and indivisible (on some platforms, exceptions are allowed for the load, store, read, and write operations on double and long variables).

  • lock: acts on a main-memory variable; marks the variable as exclusively held by one thread.
  • read: acts on a main-memory variable; transfers the variable's value from main memory into the thread's working memory, for use by a subsequent load.
  • load: acts on a working-memory variable; puts the value obtained by read into the working-memory copy of the variable.
  • use: acts on a working-memory variable; passes the variable's value to the execution engine. Performed whenever the VM reaches a bytecode instruction that needs the variable's value.
  • assign: acts on a working-memory variable; assigns a value received from the execution engine to the working-memory copy. Performed whenever the VM executes a bytecode instruction that assigns to the variable.
  • store: acts on a working-memory variable; transfers its value to main memory, for use by a subsequent write.
  • write: acts on a main-memory variable; puts the value obtained by store into the main-memory variable.
  • unlock: acts on a main-memory variable; releases a locked variable so that other threads may lock it.
As shown in the figure:

Since one operation can be broken into many steps, multiple instructions are not necessarily executed strictly one after another: finishing each instruction completely before starting the next would be inefficient. Just as we learned as children to overlap tasks like cooking and boiling water, the processor schedules its work to overlap.

I wanted to show you a concrete reordering example, but I couldn't get my machine to produce one, whether with my own code or other people's. Still, we know instruction reordering does exist (CPUs do reorder; the reordering just couldn't be observed or controlled here).

In general, reordering falls into three types:

  • 1. Compiler-optimization reordering. The compiler may rearrange the execution order of statements as long as it does not change the semantics of a single-threaded program.
  • 2. Instruction-level parallel reordering. Modern processors use instruction-level parallelism to overlap the execution of multiple instructions; if there is no data dependency, the processor may change the order in which the corresponding machine instructions execute.
  • 3. Memory-system reordering. Because the processor uses caches and read/write buffers, load and store operations can appear to be performed out of order.
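For the record, the canonical test people use to hunt for the effect looks like the sketch below (names are my own). As noted above, it can run for a long time without ever showing r1 == 0 and r2 == 0 on a given machine, so treat it as an experiment, not a guarantee; raise the iteration count if you want to hunt longer.

```java
public class ReorderTest {
    static int x, y, r1, r2;

    // One iteration of the classic litmus test: if each thread's store were
    // never reordered with its later load, (r1, r2) == (0, 0) would be impossible.
    static int[] once() {
        x = 0; y = 0; r1 = -1; r2 = -1;
        Thread a = new Thread(() -> { x = 1; r1 = y; });
        Thread b = new Thread(() -> { y = 1; r2 = x; });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[]{r1, r2};
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            int[] r = once();
            if (r[0] == 0 && r[1] == 0) {
                System.out.println("reordering observed at iteration " + i);
                return;
            }
        }
        System.out.println("no reordering observed (it can still happen)");
    }
}
```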

2. Principle

Let's compare the generated code with and without volatile, using the double-checked-locking code recommended by the Alibaba manual. We compiled it once without volatile and once with it, then dumped the assembly for each.
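The Java source being compiled in both cases is the standard double-checked-locking singleton from the Alibaba manual; a sketch of it (the class name is my own):

```java
public class Singleton {
    // volatile forbids reordering inside "instance = new Singleton()":
    // without it, the reference could be published before the constructor
    // finishes, and another thread could observe a half-built object.
    private static volatile Singleton instance;

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {                  // first check, without the lock
            synchronized (Singleton.class) {
                if (instance == null) {          // second check, under the lock
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}
```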

Without the volatile modifier

  0x000000010d29e93b: mov    %rax,%r10
  0x000000010d29e93e: shr    $0x3,%r10
  0x000000010d29e942: mov    %r10d,0x68(%rsi)
  0x000000010d29e946: shr    $0x9,%rsi
  0x000000010d29e94a: movabs $0xfe403000,%rax
  0x000000010d29e954: movb   $0x0,(%rsi,%rax,1) 
                                                

With the volatile modifier

  0x0000000114353959: mov    %rax,%r10
  0x000000011435395c: shr    $0x3,%r10
  0x0000000114353960: mov    %r10d,0x68(%rsi)
  0x0000000114353964: shr    $0x9,%rsi
  0x0000000114353968: movabs $0x10db6e000,%rax
  0x0000000114353972: movb   $0x0,(%rsi,%rax,1)
  0x0000000114353976: lock addl $0x0,(%rsp)    

The extra instruction is lock addl $0x0,(%rsp). It acts as a memory barrier: instructions cannot be reordered past it. The lock prefix makes the operation atomic, forces the local CPU's cache to be written back to memory, and causes other CPUs to invalidate the corresponding cache lines. So this otherwise no-op addition makes the preceding volatile write visible to other CPUs.

From the hardware point of view, instruction reordering means the CPU dispatches instructions to its execution units not necessarily in program order, but not arbitrarily either: the CPU must still produce the results the program specifies. By the time lock addl $0x0,(%rsp) synchronizes its changes to memory, all preceding memory operations must already have completed, which creates the effect that instructions cannot be reordered across the memory barrier.

Memory barrier

Since both the reordering restriction and visibility come from the lock-prefixed instruction, which acts as a memory barrier, let's look at what a memory barrier is.

1. Definition

Memory barrier: an instruction that ensures the reads/writes before the barrier complete before the reads/writes after it. For volatile, this means every read is performed from main memory and every write is synchronously flushed to main memory.

At the hardware level, memory barriers come in two basic kinds. The write barrier (Store Memory Barrier, also called smp_wmb) forces every write before it to be flushed to the cache before any subsequent write proceeds. The read barrier (Load Memory Barrier, also called smp_rmb) forces the processor's invalidate queue to be drained, so that reads after the barrier see the values invalidated by earlier volatile writes.

  • LoadLoad barrier (Load1; LoadLoad; Load2): ensures that Load1's data is loaded before Load2 and all subsequent load instructions.
  • StoreStore barrier (Store1; StoreStore; Store2): ensures that Store1's data is flushed to memory (and thus visible to other processors) before Store2 and all subsequent store instructions.
  • LoadStore barrier (Load1; LoadStore; Store2): ensures that Load1's data is loaded before Store2 and all subsequent store instructions flush data to memory.
  • StoreLoad barrier (Store1; StoreLoad; Load2): ensures that Store1's data is flushed to memory before Load2 and all subsequent load instructions; it makes every memory access (store or load) before the barrier complete before any memory access after it.

2. Principle

Memory barriers in Java:

  • 1. After a volatile read, no subsequent reads or writes of any variable can be reordered before it.
  • 2. All preceding volatile reads and writes complete before a volatile read.
  • 3. After a volatile write, no subsequent volatile reads or writes can be reordered before it.
  • 4. All preceding reads and writes of any variable complete before a volatile write.

Combining the JMM rules with the memory-barrier analysis above gives the following barrier insertions:

  • 1. Insert a StoreStore barrier before each volatile write. This guarantees that all preceding normal writes are flushed to memory before the volatile write executes.
  • 2. Insert a StoreLoad barrier after each volatile write. This prevents the volatile write from being reordered with volatile reads or writes that may follow.
  • 3. Insert a LoadLoad barrier after each volatile read. This prevents the volatile read from being reordered with subsequent plain reads.
  • 4. Insert a LoadStore barrier after each volatile read. This prevents the volatile read from being reordered with subsequent plain writes.

As shown below:
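The four insertion rules can also be written out as comments on a writer/reader pair (a conceptual sketch with names of my own; the JIT emits the actual barriers, the comments only mark where the JMM requires them):

```java
public class BarrierSketch {
    int a;            // plain field
    volatile int v;   // volatile field

    void writer() {
        a = 1;        // plain write
                      // StoreStore barrier: a's write is flushed before the volatile write
        v = 2;        // volatile write
                      // StoreLoad barrier: the volatile write cannot pass later reads/writes
    }

    int reader() {
        int r1 = v;   // volatile read
                      // LoadLoad barrier: later plain reads cannot move before the volatile read
                      // LoadStore barrier: later plain writes cannot move before it either
        int r2 = a;   // if r1 == 2, this read is guaranteed to see a == 1
        return r1 + r2;
    }
}
```

This is exactly why double-checked locking works: the volatile write of the singleton reference cannot be reordered with the plain writes that initialize the object.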

3. As-if-serial semantics

But volatile inevitably costs some performance. So apart from volatile, when is reordering forbidden anyway? This is where the as-if-serial semantics come in.

The as-if-serial semantics mean that no matter how much reordering is done (by the compiler and processor, for the sake of parallelism), the execution result of a single-threaded program must not change.

If two operations access the same variable and at least one of them is a write, the two operations have a data dependence. There are three cases: read after write, write after write, and write after read; all three involve data dependence. Because reordering such a pair would change the final result, the compiler and processor respect data dependence when reordering: they never change the execution order of two operations that have a data dependence on each other.

int a = 1;      // no dependence on b
int b = 2;      // no dependence on a: may be reordered with the line above
int c = a + b;  // depends on both a and b: can never be moved before them

The as-if-serial semantics protect single-threaded programs. Compilers, runtimes, and processors that adhere to as-if-serial together create an illusion for programmers writing single-threaded code: the program appears to execute in order. In the snippet above, for example, a single thread behaves as if the code runs line by line, yet the first two assignments have no data dependence on each other and may well be reordered, i.e., a and b are not necessarily assigned in order. As-if-serial frees programmers from worrying about reordering, or about memory visibility, within a single thread.

In the end, as-if-serial is just a baseline guarantee of the architecture, something you can take as given, much like the roughly 21% oxygen in Earth's atmosphere.

Reordering can be divided into two categories:

Reordering that alters the results of program execution.

Reordering that does not change the results of program execution.

The JMM takes a different approach to these two different types of reordering.

  • The JMM requires that the compiler and processor must forbid reordering that changes the results of program execution.
  • The JMM places no requirements on the compiler or processor for reordering that does not change the result of program execution (the JMM allows such optimizing reordering).

Volatile's memory semantics give reads and writes of a volatile variable both ordering and visibility. It guarantees what we call visibility across threads, but it cannot make compound updates safe across threads, because full synchronization requires atomicity in addition to ordering and visibility.

Happens-before rule

In the Java memory model, ordering can be ensured with volatile and synchronized, but if all ordering had to be achieved with these two keywords alone, some operations would become very cumbersome. We don't feel that burden when writing Java code, because the Java language has the happens-before principle. So what exactly is happens-before?

The concept of happens-before was first proposed by Leslie Lamport in his seminal paper "Time, Clocks, and the Ordering of Events in a Distributed System". JSR-133 (the Java Memory Model and Thread Specification, produced by the JSR-133 expert group) uses happens-before to specify the ordering between two operations.

1. Definition

Happens-before means that the result of one operation is visible to a subsequent operation; it is the JMM's way of expressing memory visibility between threads. So in the JMM, if the result of one operation needs to be visible to another, there must be a happens-before relationship between the two operations.

The specific definition is:

  • 1. If one action happens-before another, then the first action's result is visible to the second, and the first is ordered before the second.

  • 2. The existence of a happens-before relationship between two operations does not mean the Java platform must execute them in that order. If a reordering produces the same result as executing in happens-before order, the reordering is legal (that is, the JMM allows it).

2. The eight happens-before rules

The eight rules are defined as follows:

  • 1. Program order rule: every action in a thread happens-before any subsequent action in that thread. (Within one thread, code executes with an ordered result.)
  • 2. Monitor lock rule: the unlock of a monitor happens-before every subsequent lock of that monitor. (Unlock before the next lock.)
  • 3. Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that variable. (The write and read are not reordered, and the write's result must be visible to the reading thread.)
  • 4. Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
  • 5. Start rule: if thread A executes ThreadB.start() (starting ThreadB), then A's ThreadB.start() call happens-before every action in ThreadB.
  • 6. Join rule: if thread A executes ThreadB.join() and it returns successfully, then every action in ThreadB happens-before A's successful return from ThreadB.join().
  • 7. Interrupt rule: a call to Thread.interrupt() happens-before the interrupted thread's code detects the interrupt (for example via Thread.interrupted() or isInterrupted()).
  • 8. Finalizer rule: the completion of an object's constructor happens-before the start of its finalize() method.
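The start() and join() rules are what make even plain, non-volatile fields safe to hand between threads; a minimal sketch (class and field names are my own):

```java
public class HappensBeforeDemo {
    static int data = 0;  // deliberately NOT volatile

    public static int run() {
        data = 0;
        Thread t = new Thread(() -> data = 42);  // plain write in the child thread
        // Start rule: everything this thread did before t.start()
        // is visible inside t.
        t.start();
        try {
            // Join rule: everything t did is visible once join() returns,
            // so reading the plain field afterwards is safe.
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return data;  // guaranteed to be 42 by the join() rule
    }

    public static void main(String[] args) {
        System.out.println(HappensBeforeDemo.run());
    }
}
```

Without the join() (say, polling data in a loop instead), there would be no happens-before edge and the main thread could read a stale 0 indefinitely.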

The JMM imposes as few constraints on the compiler and processor as possible. As the analysis above shows, the JMM follows one basic principle: the compiler and processor may optimize however they like, as long as the execution result of the program (meaning a single-threaded program or a correctly synchronized multithreaded program) does not change. For example, if the compiler determines after careful analysis that a lock can only ever be reached by a single thread, it can eliminate that lock; likewise, if it determines that a volatile variable can only be accessed by a single thread, it can treat that variable as a normal one. These optimizations improve execution efficiency without changing the program's result.

3. Happens-before and JMM

Each happens-before rule corresponds to one or more compiler and processor reordering rules. For Java programmers, happens-before is the simpler abstraction: it spares us from reasoning about the underlying reordering rules directly.

Conclusion

To write this article I read a lot of articles and watched a lot of videos over this period, and many of them contradict each other. Since I couldn't vouch for many of the claims, I ended up reading the JSR-133 document myself; having finished it, I'd honestly recommend you spare yourself (orz). Learning is like rowing upstream: if you don't advance, you fall back. If you don't have the courage to step out of your comfort zone (including learning genuinely hard things), then learning stops there.