preface

Through this article you will learn:

1. What thread synchronization is and why it is needed
2. Why adding a print statement lets the loop exit
3. Why volatile does not guarantee atomicity
4. Why volatile is needed
5. The principles behind volatile's visibility and ordering guarantees
6. Where volatile is used

Concept analysis

Volatile is closely associated with thread synchronization. Can volatile achieve thread synchronization? Before answering, two questions need to be settled: what is thread synchronization, and why do we need it?

  1. We know that in modern computing, program execution centers on the interaction between the central processing unit (CPU), which performs the calculations, and the memory (MEM), which stores the data (the reality is far more complicated, but this simplification suffices here).

A thread can be thought of as a piece of code running on a CPU. On a single-core CPU, only one thread executes at any moment. Suppose there are 5 threads waiting to execute on the CPU in turn, and a variable x stored in MEM; each thread's job is to increase x by 1.

Thread 1 fetches x=0 from MEM to the CPU and computes x = x+1 → x = 1; thread 2 fetches x=1 and computes x = x+1 → x = 2; …; thread 5 fetches x=4 and computes x = x+1 → x = 5.

As you can see, on a single-core CPU multiple threads share the same variable x and all operate on it; when one thread changes x, the other threads see the new value. We can regard this as the threads being synchronized with respect to x.

2. We know that CPU processing speed is much faster than memory read/write speed. While thread 1's computed x is being written back to memory, the CPU sits idle. To make full use of CPU resources, a small memory much faster than MEM, called the cache, is added between the CPU and MEM. When x is read from MEM it is also placed in the cache; the next read of x looks in the cache first and, on a hit, never touches MEM, which greatly improves read/write efficiency. However, a single-core CPU can still execute only one thread at a time; to achieve true parallelism, multi-core CPUs were born. Thread 1 runs on CPU1, thread 2 runs on CPU2, and the variable x is stored in MEM. How do thread 1 and thread 2 update x?

1. Assume the initial value is x=1, and both thread 1 and thread 2 execute x = x + 1.
2. Thread 1 loads the value of x into cache1; thread 2 loads the value of x into cache2.
3. Thread 1 computes x = x + 1 → x = 2 in cache1 and writes x=2 back to MEM.
4. Thread 2, still holding x=1 in cache2, also computes x = x + 1 → x = 2 and writes x=2 back to MEM.

Since thread 1 and thread 2 modify x at the same time, there is a risk that thread 2 is unaware that thread 1 is modifying x, and thread 1 is likewise unaware that thread 2 is modifying x. The two threads each do their own work, and MEM ends up with x=2 instead of the expected x=3.

Take an everyday example: father (thread 1) and mother (thread 2) agree that Xiao Ming may have only 1 yuan of pocket money per day. One day father gives Xiao Ming 1 yuan; mother, not knowing father has already given it, also gives Xiao Ming 1 yuan. Xiao Ming now has 2 yuan, violating the parents' rule. This is a scenario that genuinely requires thread synchronization.

Now that we’ve explained what thread synchronization is and why, let’s look at whether volatile can do thread synchronization.

private volatile static int x = 0;

public static void main(String[] args) {
    Runnable runnable = new Runnable() {
        @Override
        public void run() {
            for (int i = 0; i < 1000; i++) {
                x++;
            }
        }
    };
    Thread thread1 = new Thread(runnable);
    Thread thread2 = new Thread(runnable);
    thread1.start();
    thread2.start();
    try {
        thread1.join();
        thread2.join();
        System.out.println("x=" + x);
    } catch (Exception e) {
    }
}

thread1 and thread2 each increment x 1000 times, and x is declared volatile. Running the program several times, the printed results vary from run to run and fall short of the expected 2000, suggesting that volatile does not guarantee thread synchronization in this scenario.

Three elements of thread concurrency

atomicity

One or more operations either all execute, without interruption, or none execute at all. For example, x++ looks like one operation but is actually three: read x, compute x+1, write the result back.

visibility

Within a single thread, the result of code executed earlier is visible to code executed later. Across threads, once any thread modifies a shared variable, the other threads can promptly see the modified value.

ordering

Within a single thread, the program executes in the order the code is written.

As you can see, if you do nothing special, a multithreaded environment can run into atomicity and visibility problems: no thread knows that another thread is operating on the shared variable, and the result of that operation is not seen immediately. For multithreaded code to be safe, all three properties must hold. Which of them does volatile satisfy?

Verify volatile visibility

【note 1】 juejin.im/post/5c6b99…

private static int sharedVariable = 0;
private static final int MAX = 10;

public static void main(String[] args) {
    new Thread(() -> {
        int oldValue = sharedVariable;
        while (sharedVariable < MAX) {
            if (sharedVariable != oldValue) {
                System.out.println(Thread.currentThread().getName()
                        + " watched the change : " + oldValue + "->" + sharedVariable);
                oldValue = sharedVariable;
            }
        }
        System.out.println(Thread.currentThread().getName() + " stop run");
    }, "t1").start();

    new Thread(() -> {
        int oldValue = sharedVariable;
        while (sharedVariable < MAX) {
            System.out.println(Thread.currentThread().getName()
                    + " do the change : " + sharedVariable + "->" + (++oldValue));
            sharedVariable = oldValue;
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        System.out.println(Thread.currentThread().getName() + " stop run");
    }, "t2").start();
}

The purpose of the above code: there is a shared variable sharedVariable, and two threads t1 and t2 are started. t2 increments sharedVariable repeatedly, while t1 continuously checks sharedVariable for changes and prints whenever it sees one. Based on what we learned earlier, we can guess that t1 may not learn of sharedVariable's changes in real time. Run the code to verify the guess.

The run output conveys two messages:

1. t1 never prints a change message, i.e. it does not detect t2's modifications to sharedVariable. 2. After t2 stops, t1 keeps running, and the program never terminates.

This matches our expectation: t1 is not promptly informed of t2's modification of the shared variable, so t1 spins forever without detecting anything. Now, what happens if we add volatile to sharedVariable?

With volatile added, the output conveys two messages:

1. t1 promptly detects every modification t2 makes to sharedVariable. 2. Both t1 and t2 stop, and the program terminates.

With this pair of positive and negative examples (with vs. without volatile), it seems we have proved that changes to a volatile variable are visible across threads. But is the proof sound? Look again at t1's code:

while (sharedVariable < MAX) {
    if (sharedVariable != oldValue) {
        System.out.println(Thread.currentThread().getName()
                + " watched the change : " + oldValue + "->" + sharedVariable);
        oldValue = sharedVariable;
    }
}

t1 keeps testing sharedVariable != oldValue; when the test succeeds, it prints the change and sets oldValue = sharedVariable. To find out why t1 never reacts when volatile is absent, add a line of print inside t1's loop:

new Thread(() -> {
    int oldValue = sharedVariable;
    while (sharedVariable < MAX) {
        System.out.println("sharedVariable:" + sharedVariable + " oldValue:" + oldValue);
        if (sharedVariable != oldValue) {
            System.out.println(Thread.currentThread().getName()
                    + " watched the change : " + oldValue + "->" + sharedVariable);
            oldValue = sharedVariable;
        }
    }
    System.out.println(Thread.currentThread().getName() + " stop run");
}, "t1").start();

Results:

To our surprise, t1 now detects sharedVariable's changes even without the volatile modifier, and all we added was one print line. Could it be that the if (sharedVariable != oldValue) condition simply never becomes true? Let's add a print in the else branch to verify:

new Thread(() -> {
    int oldValue = sharedVariable;
    while (sharedVariable < MAX) {
        if (sharedVariable != oldValue) {
            System.out.println(Thread.currentThread().getName()
                    + " watched the change : " + oldValue + "->" + sharedVariable);
            oldValue = sharedVariable;
        } else {
            System.out.println("sharedVariable:" + sharedVariable + " oldValue:" + oldValue);
        }
    }
    System.out.println(Thread.currentThread().getName() + " stop run");
}, "t1").start();

The results are as follows:

t1 can still detect the changes in sharedVariable. To be able to see all of t1's output, we also add a sleep(200):

We see that t1 is aware of every change t2 makes to sharedVariable. The variable we introduced is the added print line, so next we remove the print from the else branch (leaving the sleep in place):

new Thread(() -> {
    int oldValue = sharedVariable;
    while (sharedVariable < MAX) {
        if (sharedVariable != oldValue) {
            System.out.println(Thread.currentThread().getName()
                    + " watched the change : " + oldValue + "->" + sharedVariable);
            oldValue = sharedVariable;
        } else {
            try {
                // System.out.println("sharedVariable:" + sharedVariable + " oldValue:" + oldValue);
                Thread.sleep(200);
            } catch (Exception e) {
            }
        }
    }
    System.out.println(Thread.currentThread().getName() + " stop run");
}, "t1").start();

Results:

Now the only remaining variable is the single sleep(200) line. Commenting it out as well, the result:

The output returns to the original behavior: t1 cannot detect the change in sharedVariable. From these phenomena we can draw a conclusion:

Adding a print or a sleep to t1 enables t1 to detect changes in sharedVariable.

You might ask: then why do we need volatile at all? Can println and sleep really provide visibility? I looked at the println and sleep methods; there is nothing special in them. Confident that there is nothing wrong with our code, I began to suspect that the compiler or the JVM is doing something behind our backs. Let's see how they optimize the code when there is no println or sleep. 【note 2】 www.cnblogs.com/dzhou/p/954… (a picture is quoted from this article).

There are two places where the code might be optimized: 1) the compiler, when compiling source into .class files; 2) the interpreter & JIT code generator inside the JVM. First look at the .class file. Before compilation:

After compilation:

As you can see, there is no difference before and after compilation, so javac did not optimize our loop. Next, look at how the JVM optimizes. The interpreter interprets bytecode into machine instructions one at a time; the JIT saves time by compiling repeatedly executed (hot-spot) code into native machine code that then runs directly. Since the compiler did not optimize, the suspicion falls on the JVM, and in fact it is the JIT. 【note 3】 hllvm-group.iteye.com/group/topic… This article explains how the JIT optimizes loops. Briefly, take t1's code as an example: our intention is to keep monitoring sharedVariable while sharedVariable < 10 and print any change.

while (sharedVariable < 10) {
    if (sharedVariable != oldValue) {
        System.out.println(Thread.currentThread().getName()
                + " watched the change : " + oldValue + "->" + sharedVariable);
        oldValue = sharedVariable;
    }
}

The JIT sees the repeated test sharedVariable != oldValue and optimizes the loop to read the value of sharedVariable only the first time, never fetching the latest value again, so the loop spins forever and the program never exits. Adding println or sleep makes the JIT abandon this optimization. (Note: this conclusion is inferred from the example in the linked article, not from reading the compiled native code; if you have a different view, please comment.)
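A quick way to probe the JIT hypothesis is to disable the JIT entirely. Below is a minimal, self-contained sketch (class and field names are mine, purely illustrative; assumes a HotSpot JVM): run it normally and the reader thread may spin forever; run it with java -Xint (interpreter-only mode, JIT disabled) and the reader typically exits, which supports the loop-optimization explanation.

public class JitVisibilityDemo {
    private static int flag = 0;  // deliberately NOT volatile

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (flag == 0) {
                // empty loop: the JIT may hoist the read of flag out of the loop
            }
            System.out.println("reader saw flag=" + flag);
        });
        reader.start();
        Thread.sleep(1000);
        flag = 1;  // writer side: update the shared variable
        reader.join();
    }
}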

Cache coherence protocol

Back to the phenomenon: without volatile, t1 can detect sharedVariable's changes as long as a println/sleep defeats the JIT optimization. Why is the change visible at all? Because, to keep data consistent across the caches of different CPUs, hardware implements a cache coherence protocol (Cache-Coherence Protocol).

The basic unit of data in a cache is called a cache line (64 bytes, 128 bytes, and so on); data moves between cache and MEM in whole cache lines. Since cache1, MEM, and cache2 must stay consistent, they need to cooperate effectively. A cache line can be in one of four states:

Modified (M), Exclusive (E), Shared (S), and Invalid (I)

Hence the cache coherence protocol is also called the MESI protocol. How do cache1, MEM, and cache2 communicate? Six kinds of messages are defined between them:

1. Read (contains the address of the variable to be read)
2. Read Response (the reply to a Read message, carrying the data; it may come from MEM or from another cache)
3. Invalidate (contains an address; asks other caches to invalidate the corresponding cache line)
4. Invalidate Acknowledge (the reply to an Invalidate message)
5. Read Invalidate (a Read and an Invalidate combined in one message)
6. WriteBack (writes a cache line back to MEM)
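Before walking through the process, here is a minimal Java sketch (the enum names are mine, purely illustrative) that models the four cache-line states and the six messages, just to keep them straight in the steps below:

// A toy model of the MESI states and protocol messages;
// real hardware is far more involved than this.
enum CacheLineState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

enum ProtocolMessage {
    READ,                    // ask for the data at an address
    READ_RESPONSE,           // carries the data; sent by MEM or another cache
    INVALIDATE,              // ask other caches to invalidate their copy
    INVALIDATE_ACKNOWLEDGE,  // reply confirming the invalidation
    READ_INVALIDATE,         // READ + INVALIDATE combined
    WRITEBACK                // write a modified line back to MEM
}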

Simplified communication process

We can observe how MESI works through a sequence of reads and writes of a shared variable x. Initially x is stored in MEM with value 0, and neither cache1 nor cache2 holds it. CPU1 reads x:

1. CPU1 sends a Read message. 2. MEM and cache2 both receive it; cache2 has no copy of x, so MEM sends the Read Response (containing x). 3. CPU1 receives the Read Response and fills a cache line with x.

CPU2 then reads x:

1. CPU2 sends a Read message. 2. MEM and cache1 both receive it; cache1 holds x, so cache1 sends the Read Response (containing x). 3. CPU2 receives the Read Response and fills its cache line with x.

At this point the cache lines for x in both cache1 and cache2 are in state S (Shared). Now CPU1 needs to write x = 1:

1. CPU1 sends an Invalidate message. 2. CPU2 receives the Invalidate message, finds the corresponding cache line, sets it to I, and sends back an Invalidate Acknowledge. 3. CPU1 receives the acknowledgement and updates x=1 into its cache line, whose state becomes M.

CPU2 now reads x again:

1. CPU2 sends a Read message (its own copy of x is invalid). 2. CPU1 receives the Read message, writes its cache line back to MEM (WriteBack), updates the line's state, and sends a Read Response. 3. CPU2 receives the Read Response and refills the corresponding cache line.

At this point x=1 has been written back to MEM, and cache1, MEM, and cache2 all hold x=1: a fully consistent round of communication is complete.

Squeezing CPU performance

If the CPU strictly followed the above protocol, efficiency would suffer, because many steps block on replies.

Store Buffer

Suppose x is in cpu2's cache but not in cpu1's, the initial value is x=0, and CPU1 executes x=1:

1. CPU1 sends a Read Invalidate message. 2. CPU2 receives it, replies with a Read Response (containing x) and an Invalidate Acknowledge, and invalidates its own cache line. 3. CPU1 receives the replies and fills the corresponding cache line. 4. Only then can CPU1 write x=1 into the cache line.

As the steps show, CPU1 must wait for CPU2's responses before it can proceed, and the wait is time-consuming. Therefore a Store Buffer is inserted between the CPU and its cache. The process becomes:

1. CPU1 sends a Read Invalidate message. 2. CPU1 places x=1 into its Store Buffer and continues with other work. 3. CPU2 receives the message and replies with a Read Response and an Invalidate Acknowledge. 4. CPU1 receives the replies and fills the corresponding cache line. 5. CPU1 flushes x=1 from the Store Buffer into the cache line.

With a single variable, the Store Buffer may look useless. But if CPU1 needs to update several variables such as x, y, and z at once, writing them all into the Store Buffer first and flushing them to the cache later is clearly efficient, because CPU1 no longer stalls on each write.

Invalidate Queue

When CPU1 sends a Read Invalidate message, it must wait for CPU2's Invalidate Acknowledge. If CPU2 is busy and the Invalidate Acknowledge is delayed, CPU1 has to wait. So the Invalidate Queue is introduced: CPU2 receives the Invalidate message, places it into its Invalidate Queue, immediately sends the Invalidate Acknowledge, and processes the actual invalidation later. With that, both the Store Buffer and the Invalidate Queue are in place.

Memory barriers

The Store Buffer and Invalidate Queue successfully improve CPU efficiency, but do they have side effects? Start with the Store Buffer. Suppose x is in cache2 and y is in cache1, CPU1 executes performByCpu1(), and CPU2 executes performByCpu2():

int x = 0;
boolean y = false;

void performByCpu1() {
    x = 1;
    y = true;
}

void performByCpu2() {
    while (!y) continue;
    assert x == 1;
}

1. CPU1 executes x=1; x is not in cache1, so CPU1 puts x=1 into its Store Buffer and sends a Read Invalidate message. 2. CPU2 executes while(!y); y is not in cache2, so CPU2 sends a Read message for y. 3. CPU1 executes y=true; y is already in cache1, so y=true is written straight into the cache line. 4. CPU1 receives CPU2's Read message for y and replies with a Read Response carrying y=true. 5. CPU2 receives y=true, exits the loop, reads x from its own still-valid cache line, and finds x=0: the assert fails. 6. Only afterwards does CPU1 flush x=1 from the Store Buffer into the cache line.

The assert fails because CPU1 had not yet written x=1 from the Store Buffer into the cache line. Why can this happen?

  • After CPU1 sends the Read Invalidate and puts x=1 into the Store Buffer, it moves on and executes y=true. If CPU2's Read request for y arrives during this window, CPU2 receives y=true from CPU1 while CPU1 has not yet received the Read Response or Invalidate Acknowledge for x, so the Store Buffer cannot yet be flushed into the cache, and CPU2 reads the pre-modification value of x.

How do we solve this? Add a memory barrier, which tells the CPU to make sure the Store Buffer has been flushed into the cache before executing the next step:

int x = 0;
boolean y = false;

void performByCpu1() {
    x = 1;
    smp_mb();  // write barrier: flush the Store Buffer before the following store
    y = true;
}

void performByCpu2() {
    while (!y) continue;
    assert x == 1;
}
  • When CPU1 executes smp_mb, it waits until x=1 has been flushed from the Store Buffer into the cache before it executes y=true. So by the time CPU2 reads x, the Read Response returns x=1 and assert x == 1 holds. The problem introduced by the Store Buffer is resolved.

Now let's look at the problem introduced by the Invalidate Queue, using the barrier-free code again:

1. CPU1 executes x=1 and sends a Read Invalidate message (x lives in cpu2's cache). 2. CPU2 puts the Invalidate message into its Invalidate Queue and immediately acknowledges; CPU1 completes x=1 and then executes y=true. 3. CPU2 reads y and receives y=true from CPU1. 4. CPU2 updates y in the cache line and breaks out of the loop. 5. CPU2 reads x from its own cache line, which has not been invalidated yet, and gets x=0: the assert fails. 6. Only later does CPU2 process the Invalidate Queue and invalidate its cache line for x.

Again CPU2 failed to pick up x=1. The reason:

  • CPU2 processes the messages in its Invalidate Queue at some indeterminate later time; until then it keeps reading the stale cache line.

How do we solve this? Add a memory barrier, which tells the CPU to make sure the messages in the Invalidate Queue have been processed before executing the next step:

int x = 0;
boolean y = false;

void performByCpu1() {
    x = 1;
    smp_mb();  // write barrier: flush the Store Buffer
    y = true;
}

void performByCpu2() {
    while (!y) continue;
    smp_mb();  // read barrier: drain the Invalidate Queue before reading x
    assert x == 1;
}
  • When CPU2 reaches smp_mb, it is forced to process the messages waiting in its Invalidate Queue, invalidating its cache line for x. When it then accesses x, the line is invalid, so it fetches the latest value from CPU1, and assert x == 1 holds. The problem introduced by the Invalidate Queue is resolved.

【note 4】 For the detailed message flow with memory barriers, see: zhuanlan.zhihu.com/p/55767485

Summary

The CPU implements the cache coherence protocol to keep data consistent between caches as far as possible. But to exploit CPU performance fully, the Store Buffer and Invalidate Queue were added, which can make the effects of instructions appear out of order, looking like reordering; adding read/write memory barriers avoids the problem. Back to our earlier question: "without a volatile modifier on the shared variable, adding println/sleep cancels the JIT optimization and lets t1 detect changes in sharedVariable" — this works because the CPU implements the cache coherence protocol. And back to the original question of how to verify that volatile variables are visible across threads: t1's endless loop was caused not by t1 failing to observe sharedVariable's changes, but by the JIT optimizing the code. Once we added a print and cancelled the JIT optimization, t1 could effectively observe the changes, because cache coherence keeps the cached data consistent as far as possible.

The role of volatile

visibility

MESI keeps caches consistent as far as possible, but the Store Buffer and Invalidate Queue open windows of inconsistency, which memory barrier instructions close. So when are the barriers inserted? When the JVM compiles code down to assembly and sees that a variable is volatile, it emits a LOCK-prefixed instruction that serves as the memory barrier. This is the first reason why, even with the MESI protocol, we still need volatile: memory barrier instructions are inserted around reads and writes of volatile variables, so changes propagate between caches promptly, and the shared variable is visible across threads. In other words, volatile variables are visible across threads.
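As a hedged illustration of the LOCK prefix (assumptions: an x86 CPU, a HotSpot JVM, and the hsdis disassembler plugin installed; exact output varies by platform and JVM version), dumping the JIT-compiled code for a volatile store with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly typically shows something like:

mov    %edx,0x68(%rsi)     ; the volatile store itself
lock addl $0x0,(%rsp)      ; lock-prefixed no-op add: acts as the StoreLoad barrier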

ordering

We know that within a single thread, the program executes in the order the code is written. Consider the earlier example again:

int x = 0;
boolean y = false;

void performByCpu1() {
    x = 1;
    y = true;
}

void performByCpu2() {
    while (!y) continue;
    assert x == 1;
}

Look first at performByCpu1(): x and y have no data dependency, so the compiler may optimize and reorder the instructions, moving y = true ahead of x = 1. In a single thread this changes nothing, but looking at performByCpu2(), the reordering makes the multithreaded result wrong. This is instruction reordering at the compiler level; marking the variable volatile disables that optimization. This is the second reason why, even with the MESI protocol, we still need volatile.

Does volatile guarantee atomicity?

As the first example demonstrated, volatile does not guarantee atomicity. Why not? Threads t1 and t2 both perform x++ on x at the same time; assume t1 runs on CPU1 and t2 on CPU2. x++ is a compound operation consisting of three steps:

1. Read the value of x. 2. Compute x + 1. 3. Assign the result of x + 1 back to x.
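To see these three steps concretely, here is roughly what javap -c prints for x++ when x is a static int field (a sketch; the constant-pool index #2 is illustrative):

getstatic     #2   // Field x:I  -- step 1: read the value of x
iconst_1           // push the constant 1
iadd               // step 2: compute x + 1
putstatic     #2   // Field x:I  -- step 3: write the result back to x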

The interleaved execution of the instructions on the two CPUs can be simplified as follows:

1. CPU1 reads x=0 from the cache into a register. 2. CPU2 reads x=0 from the cache into a register. 3. CPU1 computes x+1, tmp=1. 4. CPU1 writes x=1 into the cache, and via the coherence protocol cache2's copy of x is updated. 5. CPU2 computes x+1; although its cache now reflects x=1, the old value x=0 is already in CPU2's register, so it computes directly on the stale value and gets 1. 6. CPU2 writes x=1 into the cache, and finally x=1, not the expected x=2, is written to MEM.

Thus volatile does not guarantee atomicity.
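If the counter genuinely needs atomic increments, a standard fix (a sketch, not from the original article) is java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet() performs the read-compute-write sequence as one atomic operation:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    private static final AtomicInteger x = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                x.incrementAndGet();  // atomic read-modify-write via CAS
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("x=" + x.get());  // always prints x=2000
    }
}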

Volatile usage scenarios

In summary, volatile guarantees visibility and ordering but not atomicity. In what situations, then, is volatile used? Consider the two guarantees in turn. For ordering, we rely on volatile to prohibit instruction reordering; the typical use is double-checked locking (DCL) in a singleton:

    static volatile CheckManager instance;
    public static CheckManager getInstance() {
        if (instance == null) {
            synchronized (CheckManager.class) {
                if (instance == null) {
                    instance = new CheckManager();
                }
            }
        }
        
        return instance;
    }

instance = new CheckManager() is a compound operation; the normal steps decompose as follows:

1. Allocate memory for the CheckManager object. 2. Initialize the object (run the constructor). 3. Point the instance reference at the allocated memory.

Because of instruction reordering optimizations, step 3 might execute before step 2; another thread could then see a non-null instance whose initialization is not yet complete, which causes problems. Declaring instance volatile forbids the reordering and avoids this.

For visibility: when one thread modifies a shared variable and other threads only read it, declaring the variable volatile lets the reader threads see the change promptly; t1 and t2 reading and writing x earlier is exactly this case, and a minimal sketch of the pattern follows below. In fact, CAS, locks, and other synchronization mechanisms also use volatile internally to guarantee visibility; check the source code if you are interested. If you have doubts or corrections, please comment ~
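A minimal sketch of that read-mostly flag pattern (the Worker class and its names are hypothetical):

// One writer thread flips the flag; the worker thread only reads it.
// Declaring it volatile guarantees the worker sees the change promptly.
class Worker implements Runnable {
    private volatile boolean running = true;

    public void stop() {  // called from another thread
        running = false;
    }

    @Override
    public void run() {
        while (running) {
            // do work; the volatile read ensures stop() is noticed promptly
        }
        System.out.println("worker exited");
    }
}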

References

【note 1】 juejin.im/post/5c6b99…
【note 2】 www.cnblogs.com/dzhou/p/954…
【note 3】 hllvm-group.iteye.com/group/topic…
【note 4】 zhuanlan.zhihu.com/p/55767485

This article is based on JDK 1.8.

If you like this article, please like and follow; your encouragement is my motivation to keep going.

More to come as we systematically and deeply study Android/Java together.

For more content, follow the official account [small fish love programming].