Preface

Writing a double-checked-lock singleton takes only a minute, but the interesting question is why the instance field must be declared volatile. The keyword exists mainly to solve two problems: memory visibility of variables and prohibition of instruction reordering. Of the three pillars of concurrent programming, volatile covers the two that are not atomicity, so this small keyword carries more than you might think.

Memory visibility and the JMM

// Program Listing 1.1
public class Test {
    // Notice that I did not use volatile here
    private static boolean tag = false;

    public static void main(String[] args) {

        new Thread(() -> {
            System.out.println("Child thread begin");

            while (!tag) { }
            System.out.println("Child thread end");
        }).start();

        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        tag = true;
    }
}

Look at the listing above: I define a tag that defaults to false and start a child thread that spins while tag is false, so "Child thread end" should print as soon as the main thread sets tag to true. Run the code, however, and no matter how many times you try, "Child thread end" is never printed. The reason lies in a concept called the Java Memory Model (JMM).

  • The main goal of the Java Memory Model is to define the access rules for the variables in a program: the low-level details of how the virtual machine reads variables from main memory and writes them back.

  • The JMM states that all variables are stored in main memory, and that each thread has its own working memory, which holds copies of the main-memory variables the thread uses.

  • All of a thread's operations on variables must go through its working memory. A thread cannot read main-memory variables directly, nor can it access another thread's working memory; data is passed between threads through main memory.

  • "Variables" here does not include local variables or method parameters, which are private to the thread, are never shared, and therefore cannot race.

Because of this visibility gap in the Java Memory Model, in the listing above the main thread's write of true to tag may not be flushed to main memory promptly, and even if it is, the child thread's working memory may not refresh its copy in time. The child thread therefore keeps seeing the stale false, and the expected "Child thread end" is never printed.

Eight atomic operations in the JMM

Moving data between main memory and working memory can be broken down into eight atomic operations in the JMM:

  • Read: reads a variable's value from main memory
  • Load: puts the value read from main memory into the thread's working-memory copy
  • Use: passes the working-memory value to the execution engine for computation
  • Assign: writes a value produced by the execution engine back to working memory
  • Store: transfers the working-memory value to main memory
  • Write: puts the transferred value into the shared variable in main memory
  • Lock: marks a main-memory variable as exclusively held by one thread
  • Unlock: releases a locked main-memory variable
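As a rough illustration, a plain read-modify-write of a shared field can be annotated with these operations. The mapping below is conceptual only (you cannot observe the individual JMM operations from Java code), and the class name is mine:

```java
public class JmmOpsSketch {
    static int count = 0; // the shared variable, conceptually living in main memory

    public static void main(String[] args) {
        // Conceptually, count++ decomposes into:
        //   read   - fetch count's value from main memory
        //   load   - place the fetched value into this thread's working memory
        //   use    - hand the working-memory copy to the execution engine for the add
        //   assign - write the incremented result back to working memory
        //   store  - transfer the working-memory value toward main memory
        //   write  - commit the transferred value to the shared variable in main memory
        count++;
        System.out.println(count); // prints 1
    }
}
```

Because each of these steps is separately interruptible between threads, a single `count++` is not atomic, which the later listings demonstrate.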

Solving the memory visibility problem

Adding the volatile modifier to the tag variable solves the memory visibility problem in Listing 1.1. A volatile variable has two properties:

  • It guarantees the variable is visible to all threads: when one thread modifies the value, the change is immediately visible to the others. For example, if thread A modifies the variable and writes the new value back to main memory, then when thread B subsequently reads from main memory it is guaranteed to see A's new value.

  • It forbids instruction-reordering optimizations. The Java code you write is not necessarily executed in the order you wrote it; the JVM may reorder it according to a series of sophisticated optimizations. If two lines of code have no dependency on each other, they may be reordered, i.e., executed in the opposite order. Reordering causes no problems in a single thread, but in concurrent scenarios it produces strange bugs, and in cases such as the double-checked-lock singleton it must be prevented.
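Applying the fix to Listing 1.1 is a one-word change. A sketch of the corrected program (the class is renamed here to avoid clashing with the listing, and the 2-second sleep is kept from the original):

```java
public class VolatileTagTest {
    // volatile makes the main thread's write to tag visible to the child thread
    private static volatile boolean tag = false;

    public static void main(String[] args) throws InterruptedException {
        Thread child = new Thread(() -> {
            System.out.println("Child thread begin");
            while (!tag) { } // now observes the main thread's write and exits
            System.out.println("Child thread end");
        });
        child.start();

        Thread.sleep(2000);
        tag = true;   // flushed to main memory; the child's spin loop terminates
        child.join(); // "Child thread end" is printed before the program exits
    }
}
```

With the volatile modifier in place, the program reliably prints both lines and terminates.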

So, with memory visibility solved, do we always get the correct value?

// Program Listing 1.2
public class Test {

    private static volatile int sum = 0;

    public static void main(String[] args) {

        for (int i = 0; i < 20; i++) {
            Thread myThread = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    accumulation();
                }
            });
            myThread.start();
        }

        while (Thread.activeCount() > 1) {
            Thread.yield();
        }

        System.out.println(sum + "");
    }

    static void accumulation() {
        sum++;
    }
}

In the code above I marked sum as volatile, so memory visibility is solved: a change to sum in one thread is seen by all the others, and by my reading the final value should be 20000. Run it, though, and the results vary: sometimes 20000, sometimes 18951 or some other number. Why? The Java code we write is compiled into a class file, and the JVM ultimately executes bytecode instructions. A tool such as jclasslib lets us inspect that bytecode; the bytecode of the accumulation method is shown in Listing 1.3.

// Program Listing 1.3
0 getstatic #11 <Test.sum>
3 iconst_1
4 iadd
5 putstatic #11 <Test.sum>
8 return

The getstatic instruction pushes sum onto the top of the operand stack, and volatile guarantees the value is fresh at this step. But while the thread executes the iconst_1 and iadd instructions, i.e., pushing the constant and adding, other threads may already have executed those same instructions and incremented sum, so the final value falls short of what we expect. The example shows that volatile does solve memory visibility, but it does not make the variables it modifies thread-safe.
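If the goal really is a correct concurrent counter, the increment itself must be atomic; volatile alone cannot provide that. One common option is java.util.concurrent.atomic (a sketch, not the only fix; synchronized on accumulation would also work). I also replace the Thread.activeCount() polling with join(), which waits for exactly the threads we started:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicSumTest {
    // incrementAndGet() is a single atomic read-modify-write (a hardware CAS loop),
    // so no update is lost between threads
    private static final AtomicInteger sum = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[20];
        for (int i = 0; i < 20; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    sum.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // wait for all 20 workers to finish
        }
        System.out.println(sum.get()); // always 20000
    }
}
```

Unlike Listing 1.2, this version prints 20000 on every run, because the read, add, and write happen as one indivisible step.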

Why the double-checked-lock singleton needs volatile

/**
 * @author by tengfei on 2021/11/16.
 * @description Listing 1.4 A double-checked-lock singleton
 */
public class Singleton {

    private static volatile Singleton singleton;

    private Singleton() { }

    public static Singleton getSingleton() {
        if (singleton == null) {
            synchronized (Singleton.class) {
                if (singleton == null) {
                    singleton = new Singleton();
                }
            }
        }
        return singleton;
    }
}

The synchronized lock prevents multiple threads from creating the Singleton object more than once. The problem is that without volatile, another thread can observe a "half-initialized" object produced by the new operation; in short, initialization appears complete when it is not. The bytecode makes clear why the object can be half-initialized.

// Program Listing 1.5
10 monitorenter
11 getstatic #2 <Singleton.singleton>
14 ifnonnull 27 (+13)
17 new #3 <Singleton>
20 dup
21 invokespecial #4 <Singleton.<init>>
24 putstatic #2 <Singleton.singleton>
27 aload_0
28 monitorexit

The core lies in instructions 17, 20, 21 and 24:

  1. The new instruction uses the #3 symbolic reference to find the Singleton class in the constant pool and allocates memory for the object, but at this point the object is just a shell; the constructor has not run.

  2. The dup instruction duplicates the new object's reference on the operand stack (one copy is consumed by the constructor call, the other by the assignment).

  3. The invokespecial instruction calls the <init> method, which runs the constructor.

  4. The putstatic instruction assigns the object reference to the singleton variable.

Creating an object with new follows essentially these four steps, and the problem lies in steps 3 and 4. The invokespecial and putstatic instructions have no direct data dependency on each other, so they may be reordered, giving the execution order 1, 2, 4, 3. If step 4 executes before step 3 and another thread calls getSingleton at that moment, it sees a non-null singleton whose <init> has not yet run: fields assigned in the constructor still hold their default values, and using the object can produce null pointer errors. To prevent this, volatile is used to disable the instruction reordering.
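For completeness, the JVM's class-initialization guarantees offer an alternative that needs neither volatile nor an explicit lock: the initialization-on-demand holder idiom. A sketch (renamed LazySingleton here to avoid clashing with Listing 1.4):

```java
public class LazySingleton {
    private LazySingleton() { }

    // The JVM guarantees class initialization is both thread-safe and lazy:
    // Holder is not initialized until getSingleton() first touches it,
    // and no other thread can observe a half-constructed INSTANCE.
    private static class Holder {
        static final LazySingleton INSTANCE = new LazySingleton();
    }

    public static LazySingleton getSingleton() {
        return Holder.INSTANCE;
    }
}
```

Every call returns the same fully constructed instance; the class-loading machinery plays the role that volatile plus double-checking plays in Listing 1.4.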

The control bus and bus sniffing

First, what is a control bus? A control bus uses binary signals to synchronize the behavior of all the components connected to the system bus.

Modern CPUs are extremely fast, and fetching data from main memory often cannot keep up with them, wasting CPU resources. To avoid this waste, a cache that can keep pace with the CPU is placed between main memory and the CPU; data in memory is copied into the cache, and the CPU works with the cache directly, which clearly improves efficiency. With multiple CPUs, however, cache consistency becomes a problem: when several CPUs' computations touch the same memory region, their cached data can diverge. To avoid that, a "cache coherence protocol" is needed.

A cache coherence protocol makes processors follow certain rules when reading and writing data: when a CPU writes a value and finds that other CPUs also hold copies of that data, it notifies them that their copies are out of date. The other CPUs mark the data invalid and re-read it from memory the next time they need it.

Each processor sniffs the data propagating on the bus to check whether its own cached values are stale. If a processor finds that the memory address backing one of its cache lines has been modified, it marks that cache line invalid, and the next time it operates on that data it re-reads it from main memory into its cache. (Adapted from other articles; see the references at the end of this article.)

Atomicity of volatile

Atomicity means an operation completes in one indivisible step, free from interference. As the singleton example already showed, volatile is not atomic: it only gives us memory visibility and forbids instruction reordering. Listing 1.2 used sum++ to show that volatile is not thread-safe when an operation decomposes into several JVM steps that can be interleaved, whereas simple reads and writes of a volatile variable, as in Listing 1.1, are themselves atomic. So plain assignment to a volatile variable is safe, but compound operations such as i++ or flag = !flag do not get atomicity, and thread-safety problems naturally follow.

Instruction reordering problem

Under the as-if-serial rule (reordering must not change the result of a single-threaded program), the compiler and processor reorder instructions to improve execution efficiency; under concurrency, however, the different execution orders can produce different results.

// Program code 2-1
public void test() {
    int a = 1;
    int b = 8;
    System.out.println(a + b);
}

// Program code 2-2
public void test2() {

    int a = 0;

    int b = a;
}

In code 2-1, the assignments to a and b are independent; only the final println uses both, so there is no dependency between the two assignments and they may be reordered to improve execution efficiency. In 2-2, b depends on a, since b takes its value from a, so the two statements will not be reordered. A simple generalization: two instructions with a data dependency between them are never reordered. The following table summarizes the cases in which reordering will not occur.

| Name | Code sample | Description |
| ---- | ----------- | ----------- |
| Read after write | a = 1; b = a | Write a variable, then read it |
| Write after write | a = 1; a = 2 | Write a variable, then write it again |
| Write after read | a = b; b = 1 | Read a variable, then write it |

happens-before

A happens-before relationship describes the ordering of two operations, which may lie within one thread or across different threads. Through happens-before, the JMM gives programmers a cross-thread memory-visibility guarantee: if A happens-before B, then the results of A are visible to B. In a nutshell, happens-before is a set of visibility rules between operations. The specific rules are as follows:

1. Program order rule: within a single thread, the results of a piece of code are as if it executed in program order. The JVM may reorder instructions, but whatever it does, the observable results match the order of our code.

2. Monitor lock rule: for the same lock, after one thread unlocks it, another thread that subsequently locks it can see the first thread's results. (A monitor is a generic synchronization primitive; synchronized is its implementation in Java.)

3. Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that variable, so the result of the write is visible to the reading thread.

4. Thread start rule: if main thread A starts child thread B, then A's modifications to shared variables made before starting B are visible to B.

5. Thread termination rule: if main thread A waits for child thread B to terminate, then B's modifications to shared variables made before terminating are visible to A. Also called the Thread.join() rule.

6. Thread interrupt rule: a call to interrupt() happens-before the interrupted thread's code detects the interrupt (for example via Thread.interrupted()).

7. Transitivity rule: happens-before is transitive; if hb(A, B) and hb(B, C), then hb(A, C).

8. Object finalization rule: this one is also simple; the completion of an object's initialization, i.e., the end of its constructor, happens-before the start of its finalize() method.
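The start and termination rules are easy to demonstrate: after join() returns, the child thread's writes are guaranteed visible with no volatile at all. A minimal sketch (the class name is mine):

```java
public class HappensBeforeJoin {
    // deliberately NOT volatile: visibility comes from start() and join()
    private static int shared = 0;

    public static void main(String[] args) throws InterruptedException {
        shared = 1;                // happens-before child.start() (start rule)
        Thread child = new Thread(() -> {
            shared = shared + 41;  // child is guaranteed to see 1, writes 42
        });
        child.start();
        child.join();              // child's writes happen-before join() returns
        System.out.println(shared); // prints 42
    }
}
```

Remove the join() and the main thread could legally read either 1 or 42, because no happens-before edge would order the child's write before the read.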

How volatile is implemented (memory barriers)

If you read the HotSpot source code (which I admit I have not), you find that the JVM treats volatile specially: when the current variable is volatile, the generated assembly for a write is prefixed with a lock instruction, which acts as a memory barrier at execution time. This lock prefix is the key to how volatile achieves visibility and disallows instruction reordering.

1. The memory barrier prevents instruction reordering: instructions before the barrier cannot be moved after it, and instructions after it cannot be moved before it.

2. The lock instruction also forces the current cache line to be written back to system memory immediately. That write-back invalidates the data cached by other CPUs; when they find their copies are stale, they re-read the data from system memory into their own caches.

The Java memory model divides memory barriers into the following types:

| Barrier type | Instruction sequence | Description |
| ------------ | -------------------- | ----------- |
| LoadLoad | Load1; LoadLoad; Load2 | Ensures Load1 loads its data before Load2 and all subsequent load instructions |
| StoreStore | Store1; StoreStore; Store2 | Ensures Store1 flushes its data to memory (making it visible to other processors) before Store2 and all subsequent store instructions |
| LoadStore | Load1; LoadStore; Store2 | Ensures Load1 loads its data before Store2 and all subsequent store instructions flush data to memory |
| StoreLoad | Store1; StoreLoad; Load2 | Ensures Store1 flushes its data to memory before Load2 and all subsequent load instructions; all memory accesses (stores and loads) before the barrier complete before any memory access after it |

1. A volatile write inserts barriers both before and after it: a StoreStore barrier before the write and a StoreLoad barrier after it.

2. A volatile read inserts two barriers after it: a LoadLoad barrier and a LoadStore barrier.

3. The StoreLoad barrier is an all-purpose barrier that combines the effects of the other three. That also makes it the most expensive to execute, because it forces the processor to flush its buffered writes to memory.

References

1. The Alibaba interviewer didn't expect I could talk with him about volatile for half an hour

2. How volatile ensures thread visibility with bus locks and the cache coherence protocol

3. Do you really understand the volatile keyword?

4. In-depth Understanding of the Java Virtual Machine

5. The happens-before rules

6. Volatile and memory barriers

7. Understanding memory barriers