
Preface

  • Concurrent programming begins with a holistic understanding of the underlying workings of an operating system

In the last article we looked at the hardware and operating-system underpinnings of concurrent programming, from the bottom up. In this article we continue with the JMM model and the essential interview questions about the volatile keyword.

What is the JMM model?

The Java Memory Model (JMM) is an abstract concept that does not physically exist; it describes a set of rules or specifications that define how the variables in a program (including instance fields, static fields, and the elements that make up array objects) may be accessed. The JVM’s unit of execution is the thread, and when each thread is created the JVM also creates a working memory (sometimes called stack space) for it, used to store data private to that thread. The Java memory model dictates that all variables are stored in main memory, a shared region accessible to all threads, but a thread’s operations on variables (reads, assignments, and so on) must be carried out in its working memory: the thread first copies the variable from main memory into its own working-memory space, operates on that copy, and only then writes the variable back to main memory; it cannot operate directly on variables in main memory. Working memory therefore stores copies of the variables in main memory. Because working memory is the private data area of each thread, different threads cannot access each other’s working memory; communication between threads (passing values) must go through main memory.

The JMM is different from the JVM’s memory-region division

The JMM and the partitioning of memory regions in the JVM sit at different conceptual levels. More precisely, the JMM describes a set of rules that govern how variables are accessed in shared and private data regions, and it revolves around atomicity, ordering, and visibility. The only similarity between the JMM and the Java memory regions is the split into shared and private data: in the JMM, main memory belongs to the shared data region, which roughly corresponds to the heap and the method area, while working memory is a thread-private data region, roughly corresponding to the program counter, the virtual machine stack, and the native method stack.

The interaction between threads, working memory, and main memory (based on the JMM specification) is shown below:

Main memory

Main memory primarily stores Java instance objects. Instance objects created by any thread are stored in main memory, regardless of whether the instance object is referenced by a member variable or by a local variable in a method; main memory also holds shared class information, constants, and static variables. Because this data area is shared, multiple threads accessing the same variable can run into thread-safety issues.

Working memory

Working memory primarily stores all the local-variable information of the method currently executing (working memory holds copies of the variables in main memory). Each thread can access only its own working memory, that is, a thread’s local variables are invisible to other threads; even if two threads execute the same code, each creates local variables belonging to itself in its own working memory. Working memory also includes the bytecode line-number indicator and information related to native methods. Note that since working memory is private to each thread and threads cannot access each other’s working memory, data stored in working memory has no thread-safety problems.

In a member method of an instance object, local variables of primitive types (byte, short, char, int, long, float, double, boolean) are stored directly in the stack frame of working memory, while the object instances that local references point to are stored in main memory (the shared data region, the heap). A member variable of an instance object, however, is stored in the heap regardless of whether it is a primitive type, a wrapper type (Integer, Double, etc.), or a reference type. Static variables and information about the class itself are stored in main memory.
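To make these storage locations concrete, here is a small annotated sketch (the class and field names are made up for illustration):

class StorageDemo {
    static int staticVar = 1;      // static variable: main memory, shared by all threads
    int memberVar = 2;             // member variable: stored on the heap (main memory), even though it is a primitive
    Integer boxedVar = 3;          // wrapper-type member: also on the heap

    void method() {
        int localPrimitive = 4;            // primitive local variable: stack frame in working memory
        Object localRef = new Object();    // the reference lives in the stack frame; the object itself lives on the heap
    }
}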

It is important to note that instance objects in main memory can be shared by multiple threads. If two threads call the same method of the same object at the same time, each thread copies the data it operates on into its own working memory and flushes it back to main memory only after the operation completes. The model is shown in the figure below:

The relationship between Java memory model and hardware memory architecture

From the preceding discussion of the hardware memory architecture, the Java memory model, and the implementation of Java multithreading, we should realize that multithreaded execution is ultimately mapped onto the hardware processor, but the Java memory model and the hardware memory architecture do not correspond exactly. The hardware has only the concepts of registers, caches, and main memory, with no notion of working memory (thread-private data area) versus main memory (heap memory). This means the JMM’s division of memory imposes nothing on the hardware, because the JMM is an abstract concept, a set of rules that does not physically exist: both working-memory data and main-memory data end up in the computer’s physical main memory, and may of course also sit in CPU caches or registers. So, in general, the Java memory model and the hardware memory architecture are interrelated: an abstract conceptual division intersecting with real physical hardware. (Note that the same is true of the JVM’s memory-region partitioning.)

The need for the JMM

Having covered the Java memory-region partitioning, the hardware memory architecture, the implementation of Java multithreading, and the relationship between them and the Java memory model, let us now discuss why the Java memory model needs to exist at all.

Because the JVM’s unit of execution is the thread, and each thread on creation gets a working memory (sometimes called stack space) for storing thread-private data, a thread’s operations on variables in main memory must be performed indirectly through its working memory: the thread copies the variable from main memory into its own working-memory space, operates on it, and writes it back to main memory when the operation completes. If two threads operate on the same instance-object variable in main memory at the same time, thread-safety problems may arise.

Suppose there is a shared variable x in main memory, currently 1, and two threads, A and B, each holding a copy of x in its working memory. Suppose thread A wants to change the value of x to 2, while thread B wants to read the value of x. Does thread B read the value 2 that thread A updated, or the value 1 from before the update?

The answer is: it is not certain. Thread B may read the pre-update value 1, or it may read the updated value 2. This is because working memory is each thread’s private data area: when thread A operates on x, it first copies the variable from main memory into its working memory, operates on the copy, and writes x back to main memory only when the operation completes; thread B does the same. This can produce an inconsistency between main memory and working memory. If thread B reads before thread A writes back, it sees x=1; if thread A has already written x=2 back to main memory when thread B starts to read, thread B sees x=2.
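A minimal sketch of the scenario above (class and thread names are illustrative); depending on timing, thread B may print either value:

public class SharedXRace {
    static int x = 1; // shared variable in main memory

    public static void main(String[] args) {
        Thread a = new Thread(() -> x = 2, "A");                  // updates its working copy, then writes back
        Thread b = new Thread(() -> System.out.println(x), "B");  // may read 1 or 2
        a.start();
        b.start();
    }
}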

The case is shown in the figure below.

The Java memory model defines the following eight operations as the detailed protocol for the interaction between main memory and working memory, that is, how a variable is copied from main memory into working memory and synchronized from working memory back to main memory.

The eight atomic operations of data synchronization

  1. Lock: acts on a main-memory variable, marking it as exclusively owned by one thread;
  2. Unlock: acts on a main-memory variable, releasing a locked variable so that it can be locked by another thread;
  3. Read: acts on a main-memory variable, transferring the variable’s value from main memory to the thread’s working memory for use by the subsequent load;
  4. Load: acts on a working-memory variable, putting the value obtained by the read operation into the working-memory copy of the variable;
  5. Use: acts on a working-memory variable, passing the variable’s value in working memory to the execution engine;
  6. Assign: acts on a working-memory variable, assigning a value received from the execution engine to the working-memory variable;
  7. Store: acts on a working-memory variable, transferring the variable’s value in working memory to main memory for use by the subsequent write;
  8. Write: acts on a main-memory variable, putting the value transferred by the store operation into the main-memory variable.
  • To copy a variable from main memory into working memory, the read and load operations must be performed in order.
  • To synchronize a variable from working memory back to main memory, the store and write operations must be performed in order.

But the Java memory model only requires that these operations be performed in order; it does not require them to be performed consecutively.
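As a rough, informal illustration (these operations are internal to the JVM, not a Java API you can call), a single increment of a shared variable maps onto the operations above roughly as follows:

// Conceptual mapping of "x = x + 1" executed by one thread:
// read   - fetch x's value from main memory
// load   - put that value into this thread's working-memory copy of x
// use    - pass the copy to the execution engine, which computes x + 1
// assign - put the result back into the working-memory copy
// store  - transfer the new value toward main memory
// write  - install the value into x in main memory
// (lock/unlock participate only when synchronization is involved)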

Synchronization Rule Analysis

  1. A thread is not allowed to synchronize data from working memory back to main memory for no reason (that is, without having performed any assign operation);
  2. A new variable can only be born in main memory; working memory is not allowed to use a variable that has not been initialized (by load or assign). In other words, assign or load must be performed on a variable before use or store is applied to it;
  3. A variable may be locked by only one thread at a time, but the lock operation may be performed multiple times by the same thread; after locking multiple times, the variable is unlocked only after the same number of unlock operations. Lock and unlock must come in pairs;
  4. Performing lock on a variable clears that variable’s value from working memory; before the execution engine uses the variable, a load or assign must be performed again to initialize it;
  5. Unlock must not be performed on a variable that has not previously been locked by a lock operation, nor on a variable locked by another thread;
  6. Before an unlock operation can be performed on a variable, the variable must first be synchronized back to main memory (via the store and write operations).
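Rule 6 is, in effect, why synchronized publishes writes: leaving the block forces store and write before the unlock. A minimal sketch (class and field names are made up):

class SyncRuleDemo {
    private int value;
    private final Object monitor = new Object();

    void update(int v) {
        synchronized (monitor) { // lock: clears the working copy, forcing a fresh load before use (rule 4)
            value = v;           // assign happens in this thread's working memory
        }                        // unlock: value must be stored and written to main memory first (rule 6)
    }

    int read() {
        synchronized (monitor) { // lock again: guarantees we load the value just written back
            return value;
        }
    }
}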

Visibility, atomicity and orderliness of concurrent programming

Atomicity

Atomicity means that an operation cannot be interrupted: even in a multithreaded environment, once an operation starts, it will not be disturbed by other threads.

In Java, reads and assignments of variables of primitive data types are atomic operations (for primitives byte, short, int, float, boolean, and char, reads and writes are atomic). Note the exception: on 32-bit systems, reads and writes of long and double are not atomic, so if two threads read and write a long or double concurrently they can interfere with each other. For a 32-bit virtual machine, each atomic read/write is 32 bits wide, while long and double occupy 64-bit storage units: if thread A has completed writing only the first 32 bits when thread B reads, B may read back a value that is neither the original value nor the value thread A wrote — a “half variable” produced by the 64-bit data being accessed in two halves by two threads. But do not worry too much: reading “half a variable” is rare, and in today’s commercial virtual machines reads and writes of 64-bit data are almost always performed as atomic operations, so it is enough to know what is going on.

x = 10;        // atomic (directly assigns a numeric literal to a variable)
y = x;         // assignment between variables: not atomic (read x, then write y)
x++;           // computation on a variable: not atomic (read, add, write back)
x = x + 1;     // not atomic, same as x++

Visibility

Visibility is easy to understand once you understand the instruction-reordering phenomenon. Visibility refers to whether, when one thread modifies the value of a shared variable, other threads can immediately see the modified value. For serial programs visibility is a non-issue: after we change a variable’s value in one operation, every subsequent operation reads the new value.

In a multithreaded environment, however, this is not necessarily true. As analyzed earlier, a thread operates on a shared variable by copying it into its working memory and writing it back to main memory afterwards, so thread A may have modified the value of shared variable x without yet writing it back to main memory while thread B operates on the same shared variable x in main memory; at that moment, the value of x in thread A’s working memory is not visible to thread B. This synchronization delay between working memory and main memory causes visibility problems. Instruction reordering and compiler optimizations can also cause visibility problems: as analyzed before, reordering by both compiler optimizations and processor optimizations can lead to out-of-order execution and thus visibility problems in multithreaded environments.

Orderliness

Orderliness concerns the execution order of code. For a single thread we always regard code as executing in order, and for a single thread that view is indeed correct. In a multithreaded environment, however, out-of-order execution may appear, because after a program is compiled into machine-code instructions the instructions may be reordered, and the reordered instruction sequence need not match the original order. The key point is: within a Java thread, all operations appear ordered from that thread’s own perspective; in a multithreaded environment, observed from one thread, all the operations of another thread appear unordered. The first half of that sentence refers to the serial-semantics consistency guaranteed within a single thread; the second half refers to instruction reordering and the synchronization delay between working memory and main memory.
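A classic sketch of the second half of that sentence (field names are illustrative): without volatile or locking, the reader thread may observe the writer’s operations out of order:

class ReorderDemo {
    int a = 0;
    boolean flag = false;

    void writer() {   // run in thread 1
        a = 1;        // (1)
        flag = true;  // (2) may be reordered before (1), or its write-back may become visible first
    }

    void reader() {   // run in thread 2
        if (flag) {                  // observes (2)...
            System.out.println(a);   // ...but may still print 0 instead of 1
        }
    }
}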

How does the JMM address atomicity, visibility, and orderliness

Atomicity problem

In addition to the atomicity the JVM itself provides for reads and writes of primitive data types, atomicity can be achieved through synchronized and Lock: both ensure that at any one moment only one thread executes the code block.
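A minimal sketch of both approaches (class and method names are made up):

import java.util.concurrent.locks.ReentrantLock;

class AtomicityDemo {
    private int count = 0;
    private final ReentrantLock lock = new ReentrantLock();

    // Only one thread at a time can enter a synchronized method on the same instance
    synchronized void incrementWithSynchronized() {
        count++;
    }

    // The explicit Lock API gives the same mutual exclusion
    void incrementWithLock() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock(); // always release in finally
        }
    }
}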

Visibility problem

The volatile keyword guarantees visibility: when a shared variable is declared volatile, a modified value is guaranteed to be immediately visible to other threads, that is, the value is flushed to main memory at once, and when another thread needs to read it, it reads the new value from main memory. Synchronized and Lock also guarantee visibility, because they ensure that only one thread can access the shared resource at any one time and that modified variables are flushed to main memory before the lock is released.

Order problem

In Java, the volatile keyword can be used to guarantee a certain degree of “orderliness” (it forbids instruction reordering). Synchronized and Lock also guarantee orderliness: since they ensure that only one thread executes the synchronized code at any one time, threads effectively execute that code serially, which naturally makes it ordered.

Java memory model

Each thread has its own working memory, and all operations on variables by the thread must be done in the working memory, not directly in the main memory. And each thread cannot access the working memory of other threads. The Java memory model has some innate “orderliness,” that is, orderliness that can be guaranteed without any means, which is often referred to as the happens-before principle. If the order of two operations cannot be deduced from the happens-before principle, they are not guaranteed to be ordered, and the virtual machine can reorder them at will.

Instruction reordering

The Java language specification stipulates that the JVM maintain sequential semantics within a thread. That is, as long as the final result of the program equals the result of strictly sequential execution, the order in which instructions execute may differ from the order of the code; this is called instruction reordering.

What is the significance of instruction reordering? The JVM can appropriately reorder machine instructions according to processor characteristics (CPU multi-level caches, multiple cores, etc.) so that the instructions better match the CPU’s execution characteristics and the machine’s performance is maximized.

Below is a schematic diagram of the sequence of instructions from source to final execution:

As-if-serial semantics

The as-if-serial semantics mean that no matter how much reordering is done (by the compiler and processor, to improve parallelism), the execution result of a (single-threaded) program must not change. The compiler, runtime, and processor must all comply with the as-if-serial semantics.

To comply with the as-if-serial semantics, the compiler and processor do not reorder operations that have data dependencies because such reordering changes the execution result. However, if there are no data dependencies between the operations, they can be reordered by the compiler and processor.
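A tiny sketch of the data-dependence rule (illustrative only):

class AsIfSerialDemo {
    int compute() {
        int a = 1;      // (1)
        int b = 2;      // (2) no dependence on (1): compiler/processor may swap (1) and (2)
        return a + b;   // (3) depends on both (1) and (2): never hoisted above them
    }
}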

Happens-before principle

Writing concurrent programs relying solely on the synchronized and volatile keywords for atomicity, visibility, and orderliness can be cumbersome. Fortunately, starting with JDK 5, Java uses the new JSR-133 memory model, which provides the happens-before principle to help guarantee the atomicity, visibility, and orderliness of program execution; it is the basis for judging whether a data race exists and whether a thread is safe. The happens-before rules are as follows:

  1. Program order rule: within a single thread, semantic serialization must be guaranteed, that is, code executes in program order.
  2. Locking rule: an unlock operation happens-before a subsequent lock operation on the same lock. That is, if a lock is acquired after having been released, the acquisition must come after the release of that same lock.
  3. Volatile variable rule: a write to a volatile variable happens-before subsequent reads of it, which guarantees the visibility of volatile variables. Simply put, every time a thread accesses a volatile variable it is forced to read the variable’s value from main memory, and when the variable changes, the latest value is forced to be flushed to main memory; at any moment, different threads can always see the variable’s latest value.
  4. Thread start rule: a thread’s start() method happens-before every action of that thread. That is, if thread A modifies the value of a shared variable before calling thread B’s start() method, then thread A’s changes to the shared variable are visible to thread B once it starts.
  5. Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
  6. Thread termination rule: all operations of a thread happen-before the termination of that thread. The purpose of Thread.join() is to wait for the target thread to terminate; assuming thread B modifies a shared variable before terminating, once thread A returns successfully from thread B’s join() method, thread B’s changes to the shared variable are visible to thread A.
  7. Thread interruption rule: a call to a thread’s interrupt() method happens-before the interrupted thread’s code detects the interrupt event. Thread.interrupted() can be used to detect whether an interrupt has occurred.
  8. Object finalization rule: the completion of an object’s constructor happens-before the start of its finalize() method.

finalize() is a method on Object that is called before the garbage collector reclaims the object’s memory. When an object is judged dead by the virtual machine, finalize() is called first, letting the object handle its last affairs before it dies (at this point the object can even “rescue” itself from collection).
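Rules 1, 3, and 5 combine in the most common volatile idiom. A minimal sketch (field names are illustrative):

class HappensBeforeDemo {
    int data = 0;
    volatile boolean ready = false;

    void writer() {      // thread A
        data = 42;       // (1) normal write; program order rule: (1) happens-before (2)
        ready = true;    // (2) volatile write
    }

    void reader() {      // thread B
        if (ready) {                   // (3) volatile read; volatile rule: (2) happens-before (3)
            System.out.println(data);  // (4) by transitivity, (1) happens-before (4): guaranteed to see 42
        }
    }
}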

Volatile memory semantics

Volatile is a lightweight synchronization mechanism provided by the Java virtual machine. The volatile keyword does two things:

  1. It ensures that a volatile shared variable is always visible to all threads, meaning that when one thread changes the value of a volatile shared variable, the new value is immediately known to the other threads.
  2. It disables instruction-reordering optimizations.

Visibility of volatile

Regarding the visibility of volatile, what we mean is that a volatile variable is always immediately visible to all threads: every write to a volatile variable is immediately visible to other threads.

Example: thread B changes the initFlag field, and thread A senses the change immediately

package com.niuh.jmm;

import lombok.extern.slf4j.Slf4j;

/**
 * @description: -server -Xcomp -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:CompileCommand=compileonly,*Jmm03_CodeVisibility.refresh
 * -Djava.compiler=NONE
 **/
@Slf4j
public class Jmm03_CodeVisibility {

    private static boolean initFlag = false;

    private volatile static int counter = 0;

    public static void refresh() {
        log.info("refresh data.......");
        initFlag = true;
        log.info("refresh data success.......");
    }

    public static void main(String[] args) {
        // Thread A
        Thread threadA = new Thread(() -> {
            while (!initFlag) {
                // System.out.println("running");
                counter++;
            }
            log.info("Thread:" + Thread.currentThread().getName()
                    + " sniffed the state change of initFlag");
        }, "threadA");
        threadA.start();

        // Sleep 500 ms in between
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        // Thread B
        Thread threadB = new Thread(() -> {
            refresh();
        }, "threadB");
        threadB.start();
    }
}

Combined with the eight atomic operations of data synchronization introduced above, let us analyze it:

After thread A starts:

  • Step 1: Perform the read operation, which acts on main memory: the variable initFlag is copied from main memory; at this point it is not yet in working memory but in transit on the bus, as shown in the figure below.
  • Step 2: Perform the load operation, which acts on working memory, placing the variable copied in the previous step into working memory.
  • Step 3: Perform the use operation, which acts on working memory, passing the variable in working memory to the execution engine. Thread A’s engine checks whether initFlag is true; it is not, so the loop keeps going.

The execution process is shown as follows:

After thread B starts:

  • Step 1: Perform the read operation, which acts on main memory, copying the initFlag variable from main memory; it is not in working memory yet, so this prepares for the load;
  • Step 2: Perform the load operation, which acts on working memory, placing the copied variable into working memory;
  • Step 3: Perform the use operation, which acts on working memory, passing the working-memory variable to the execution engine, which uses it while executing refresh();
  • Step 4: Perform the assign operation, which acts on working memory, assigning the value received from the execution engine to the working-memory variable, that is, setting initFlag = true;
  • Step 5: Perform the store operation, which acts on working memory, transferring the working-memory variable initFlag = true toward main memory;
  • Step 6: Perform the write operation, which acts on main memory, writing the variable into main memory.

Volatile does not guarantee atomicity

// Example
public class VolatileVisibility {
    public static volatile int i = 0;

    public static void increase() {
        i++;
    }
}

In a concurrent scenario, any change to the variable i is immediately visible to the other threads, but if multiple threads call increase() at the same time, a thread-safety problem still arises, because the i++ operation is not atomic: it reads the value and then writes back a new value equal to the original plus 1, done in two steps. If a second thread reads i in between the first thread reading the old value and writing back the new one, the second thread sees the same value as the first and performs the same increment, and an update is lost — a thread-safety failure. Therefore the increase method must be marked synchronized to ensure thread safety. Note that once synchronized is used, since synchronized itself provides visibility just as volatile does, the volatile modifier on the variable can be dropped in this case.

Example: start 10 threads, each incrementing the counter 1,000 times; 10 threads should produce 10,000

package com.niuh.jmm;

/**
 * volatile guarantees visibility, not atomicity
 */
public class Jmm04_CodeAtomic {

    private volatile static int counter = 0;
    static Object object = new Object();

    public static void main(String[] args) {

        for (int i = 0; i < 10; i++) {
            Thread thread = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    // synchronized (object) { counter++; } would make the increment atomic
                    counter++; // three steps: read, add, write back
                }
            });
            thread.start();
        }
        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println(counter);
    }
}

When we run it, the actual result is less than 10,000 because the concurrent increments race with each other. Note that counter is declared with the volatile keyword here — does volatile guarantee atomicity?

private volatile static int counter = 0;

We find that the result is still not 10,000, which shows that volatile does not guarantee atomicity.

Each thread performs only one operation, counter++. Why can atomicity still not be guaranteed?

In fact counter++ is not done in one step; it takes multiple steps. Let us illustrate with the diagram below: thread A loads the variable into working memory via read and load, and passes it to the execution engine via use; the engine executes counter++. Thread B starts and does the same, loading the variable via read and load and passing it to the execution engine via use; the result is then written back via assign, store, and write. What looks like one simple statement is actually n steps.

When thread B executes store and writes the data back to main memory, thread A is notified and must discard its in-progress counter++ result: counter had already been incremented by 1 in thread A’s working memory, but the incremented value is thrown away, so the total ends up 1 short.
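Besides synchronized, java.util.concurrent offers an atomic alternative. A sketch of fixing the counter with the standard JDK class AtomicInteger (the class name below is hypothetical):

import java.util.concurrent.atomic.AtomicInteger;

public class Jmm04_AtomicFix {
    private static final AtomicInteger counter = new AtomicInteger(0);

    public static void increase() {
        counter.incrementAndGet(); // one atomic read-modify-write (CAS under the hood), no lost updates
    }
}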

Volatile disallows reordering optimizations

Another function of the volatile keyword is to disable instruction reordering, so as to avoid out-of-order execution in multithreaded environments. We have already analyzed instruction reordering above; here we briefly explain how volatile disables it, starting with the concept of memory barriers.

Memory barriers at the hardware layer

Intel hardware provides a series of memory barriers, mainly including:

  1. lfence: a load barrier;
  2. sfence: a store barrier;
  3. mfence: an all-purpose barrier with the capabilities of both lfence and sfence;
  4. the lock prefix: lock is not a memory barrier per se, but it performs a similar function. It locks the CPU bus and cache and can be understood as a lock at the level of CPU instructions. It can be followed by instructions such as ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG.

Memory barriers for the JVM

Different hardware implements memory barriers in different ways; the Java memory model shields these differences between underlying hardware platforms and has the JVM generate the appropriate machine code for each platform. The JVM defines four classes of memory-barrier instructions:

Barrier type | Instruction example | Description
LoadLoad | Load1; LoadLoad; Load2 | Ensures that Load1’s read completes before Load2 and all subsequent read operations execute
StoreStore | Store1; StoreStore; Store2 | Ensures that Store1’s write has been flushed to main memory before Store2 and subsequent write operations execute
LoadStore | Load1; LoadStore; Store2 | Ensures that Load1’s read completes before Store2 and subsequent write operations are flushed to main memory
StoreLoad | Store1; StoreLoad; Load2 | Ensures that Store1’s write has been flushed to main memory before Load2 and subsequent read operations execute

A memory barrier, also known as a memory fence, is a CPU instruction that serves two purposes:

  1. One is to ensure the execution sequence of specific operations;
  2. The second is to ensure memory visibility for certain variables (using this feature to achieve memory visibility for volatile).

Because both the compiler and the processor can perform instruction-reordering optimizations, inserting a memory barrier between instructions tells the compiler and the CPU that no instruction may be reordered across that barrier. That is, inserting a barrier forbids reordering optimizations between the instructions before and after it.

Another effect of a memory barrier is to force the various CPU caches to be flushed, so that any thread on any CPU can read the latest version of the data.

In short, it is through memory barriers that volatile variables implement their in-memory semantics, namely visibility and prohibition of reordering optimizations.

Let’s look at a very typical example that relies on volatile to disable reordering optimization — double-checked locking (DCL), shown below:

public class DoubleCheckLock {

    private static DoubleCheckLock instance;

    private DoubleCheckLock() {}

    public static DoubleCheckLock getInstance() {
        // First check
        if (instance == null) {
            // Synchronize
            synchronized (DoubleCheckLock.class) {
                if (instance == null) {
                    // Where problems can occur in multithreaded environments
                    instance = new DoubleCheckLock();
                }
            }
        }
        return instance;
    }
}

The above code is the classic double-checked singleton. It is fine in a single-threaded environment, but in a multithreaded environment it may have thread-safety problems. This is because a thread performing the first check may read a non-null instance whose referenced object has not yet been initialized.

For more information about singleton patterns, see Design Pattern Series-Singleton Patterns.

This is because instance = new DoubleCheckLock(); can be decomposed into the following 3 steps (pseudocode):

memory = allocate();   // 1. Allocate the object's memory space
instance(memory);      // 2. Initialize the object
instance = memory;     // 3. Point instance at the allocated memory; now instance != null

Since steps 2 and 3 may be reordered, the order can become:

memory = allocate();   // 1. Allocate the object's memory space
instance = memory;     // 3. Point instance at the allocated memory; instance != null, but the object is not yet initialized!
instance(memory);      // 2. Initialize the object

This reordering optimization is allowed because steps 2 and 3 have no data dependency, and in a single thread the program’s execution result does not change whether or not the reordering happens. However, instruction reordering only guarantees consistency with sequential semantics within a single thread; it does not care about semantic consistency across threads. So when one thread sees a non-null instance, the instance may not have been initialized yet, causing a thread-safety problem. The fix is simple: declare instance volatile to forbid the reordering optimization of the instructions that assign it.

// Disable instruction reordering optimization
private volatile static DoubleCheckLock instance;

Implementation of volatile memory semantics

As mentioned earlier, reordering is divided into compiler reordering and processor reordering. To implement volatile memory semantics, the JMM restricts both types of reorder separately.

The table below lists the volatile reordering rules the JMM specifies for compilers.

First operation | Second operation: normal read/write | Second operation: volatile read | Second operation: volatile write
Normal read/write | can reorder | can reorder | cannot reorder
Volatile read | cannot reorder | cannot reorder | cannot reorder
Volatile write | can reorder | cannot reorder | cannot reorder

For example, the last cell in the “normal read/write” row means: if the first operation is a read or write of an ordinary variable and the second operation is a volatile write, the compiler cannot reorder the two operations.

As can be seen from the table above:

  • When the second operation is a volatile write, no reordering is allowed regardless of what the first operation is. This rule ensures that operations before a volatile write are not reordered by the compiler to after it.
  • When the first operation is a volatile read, no reordering is allowed regardless of what the second operation is. This rule ensures that operations after a volatile read are not reordered by the compiler to before it.
  • When the first operation is a volatile write and the second operation is a volatile read or write, no reordering is allowed.

To implement the memory semantics of volatile, when generating bytecode the compiler inserts memory barriers into the instruction sequence to forbid particular types of processor reordering. It is nearly impossible for the compiler to find an optimal arrangement that minimizes the total number of inserted barriers, so the JMM takes a conservative approach. The following is the JMM’s memory-barrier insertion strategy based on the conservative policy:

  • Insert a StoreStore barrier before each volatile write;
  • Insert a StoreLoad barrier after each volatile write;
  • Insert a LoadLoad barrier after each volatile read;
  • Insert a LoadStore barrier after each volatile read;

The above memory barrier insertion strategy is conservative, but it ensures that volatile memory semantics are correct in any program on any processor platform.

The figure below is a schematic of the instruction sequence generated after memory barriers are inserted for a volatile write under the conservative strategy. The StoreStore barrier before the volatile write ensures that all normal writes preceding the volatile write are visible to any processor before the volatile write, because the StoreStore barrier guarantees those normal writes are flushed to main memory before the volatile write.

What is interesting here is the StoreLoad barrier placed after the volatile write. Its purpose is to prevent the volatile write from being reordered with volatile reads or writes that may follow. Because the compiler often cannot accurately determine whether a StoreLoad barrier is needed after a volatile write (for example, the method may return immediately after the volatile write), the JMM conservatively chooses to insert a StoreLoad barrier either after each volatile write or before each volatile read. For overall execution efficiency, the JMM ultimately chose to insert it after each volatile write, because the common usage pattern of volatile write-read memory semantics is one writer thread writing a volatile variable and multiple reader threads reading the same volatile variable; when reader threads greatly outnumber writer threads, inserting the StoreLoad barrier after volatile writes brings a considerable performance gain. This reflects one of the JMM’s implementation characteristics: ensure correctness first, then pursue execution efficiency.

The figure below is a schematic of the instruction sequence generated after memory barriers are inserted for a volatile read under the conservative strategy. The LoadLoad barrier prevents the processor from reordering the volatile read above it with normal reads below it; the LoadStore barrier prevents the processor from reordering the volatile read above it with normal writes below it.

The above memory barrier insertion strategy for volatile writes and volatile reads is very conservative. During actual execution, the compiler can omit unnecessary barriers as long as the memory semantics of volatile write-read are not changed.

The following example code is used to illustrate.

class VolatileBarrierExample {
    int a;
    volatile int v1 = 1;
    volatile int v2 = 2;

    void readAndWrite() {
        int i = v1;    // First volatile read
        int j = v2;    // Second volatile read
        a = i + j;     // Normal write
        v1 = i + 1;    // First volatile write
        v2 = j * 2;    // Second volatile write
    }
}

For the readAndWrite() method, the compiler can make the optimizations shown in the figure when generating bytecode. Note that the final StoreLoad barrier cannot be omitted: the method returns immediately after the second volatile write, and the compiler cannot accurately determine whether a volatile read or write will follow, so to be safe it usually inserts a StoreLoad barrier there.

The optimization above works on any processor platform, but because different processors have memory models of different “looseness,” the insertion of memory barriers can be further optimized for a specific processor memory model. On X86 processors, for example, all barriers in the figure above are omitted except the final StoreLoad barrier.

Volatile reads and writes under the earlier conservative strategy can be optimized on X86 processors as shown in the figure below. X86 only reorders write-read operations; it does not reorder read-read, read-write, or write-write operations, so the memory barriers for those three kinds of operations are omitted on X86. On X86, the JMM can correctly implement the memory semantics of volatile write-read simply by inserting a StoreLoad barrier after volatile writes, which also means that on X86 processors volatile writes are much more expensive than volatile reads (because of the cost of executing the StoreLoad barrier).

References

  • The Art of Concurrent Programming

PS: The above code is submitted to Github: github.com/Niuh-Study/…

This article has been included in GitHub: Org_Hejianhui/JavaStudy.