Author: Valar. If you need to reprint, please keep a link to the original.

Some pictures are from Baidu; if there is any infringement, please contact me and I will delete them.

Related recommendations:

Java multithreading Synchronized

Volatile for Java multithreading

Directory:

  • What is volatile, and what should you be careful about when using it?
  • The relationship between volatile and atomicity, visibility, and ordering
  • How volatile works (memory barriers, CPU caching, MESI protocol)
  • The difference between volatile and synchronized

1. What is volatile?

The English word volatile means "changeable" or "unstable". In Java, volatile is a keyword used to modify variables.

  1. In the JMM (Java Memory Model), there is main memory, and each thread also has its own working memory (for example, registers and caches). For performance, a thread keeps a copy of the variables it accesses in its own working memory.

  2. As a result, at any given moment the value of the same variable in one thread's working memory may be inconsistent with the value in another thread's working memory or in main memory.

  3. Declaring a variable volatile tells the JVM that the variable may be changed by other threads, and that each thread must read its latest value every time the variable is used.

Java memory model diagram:

When using volatile, note:

  1. Whether volatile modifies an instance variable or a static variable, the keyword must be placed before the data type, e.g. before String or int.
  2. volatile and final cannot both modify the same variable. volatile ensures that a write to the variable is visible to other threads, while final prevents the variable from ever being written again, so the two are contradictory.

2. The relationship between volatile and atomicity, visibility, and ordering

Atomicity, visibility, and ordering were introduced in a previous article (see the related recommendations above).

2.1 Does volatile Guarantee atomicity?

No, it cannot.

  • We know that atomicity means an operation cannot be split into multiple steps: one or more operations either all execute without interruption by any factor, or none of them execute at all.
  • Declaring a variable volatile does not make operations on it atomic; if the operation itself is not atomic, volatile does not help.

For example, we often encounter the i++ problem.

```java
i = 1; // an atomic assignment; even without volatile there is no thread-safety issue here
```

```java
volatile int i = 0;
i++; // not an atomic operation
```

Suppose we start 200 threads that execute i++ concurrently, each thread executing it exactly once. If volatile guaranteed atomicity, the final value of i should be 200; in fact, we find that the value can be less than 200. Why is that?

```java
// i++ can be split into three steps:
// 1. read i
// 2. temp = i + 1
// 3. i = temp
```
  1. Suppose i = 5, and thread A and thread B read the value of i at the same time.
  2. Thread A executes temp = i + 1; note that i has not changed yet. Thread B then also executes temp = i + 1. Now both threads hold i = 5 and temp = 6.
  3. Thread A executes i = temp. The value of i is immediately flushed to main memory, and the copies of i held by other threads are marked invalid. Thread B therefore reads i again, so the copy of i held by thread B becomes 6.
  4. However, thread B still holds temp = 6, and then executes i = temp (6), so the result is 1 less than expected.
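Under stated assumptions (the class and method names are mine), the lost-update scenario above can be sketched like this:

```java
// Sketch of the lost-update problem: 200 threads each execute i++ once
// on a volatile int. volatile makes each write visible, but i++ is still
// a non-atomic read-modify-write, so the final value can be below 200.
public class VolatileIncrementDemo {
    static volatile int i = 0;

    static int run() {
        i = 0;
        Thread[] threads = new Thread[200];
        for (int t = 0; t < 200; t++) {
            threads[t] = new Thread(() -> i++); // read i, add 1, write i
            threads[t].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println("final value: " + run());
    }
}
```

With only 200 short increments the race window is small, so the result may well be 200 on a given run; the point is that nothing guarantees it.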

Reference: www.cnblogs.com/simpleDi/p/…

So how can i++ be thread-safe?

  1. Use the synchronized keyword or a Lock. For the reason why, see Synchronized and atomicity.

```java
synchronized (object) {
    i++;
}
```

  2. Use a class that supports atomic operations, such as java.util.concurrent.atomic.AtomicInteger. It uses the CAS (compare-and-swap) algorithm and is more efficient than the first option.
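A minimal sketch of the second option (class name is illustrative): the same 200-thread increment rewritten with AtomicInteger always ends at 200, because incrementAndGet() performs the read-modify-write as a single CAS retry loop.

```java
import java.util.concurrent.atomic.AtomicInteger;

// The same 200-thread increment, now using AtomicInteger:
// incrementAndGet() retries via CAS until its update wins, so no update is lost.
public class AtomicIncrementDemo {
    static int run() {
        AtomicInteger i = new AtomicInteger(0);
        Thread[] threads = new Thread[200];
        for (int t = 0; t < 200; t++) {
            threads[t] = new Thread(i::incrementAndGet);
            threads[t].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return i.get();
    }

    public static void main(String[] args) {
        System.out.println("final value: " + run()); // always 200
    }
}
```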

2.2 Volatile and visibility

A volatile write forces the value to be synchronized from the cache to main memory and invalidates other threads' cached copies; those threads then re-read from main memory, which ensures visibility.
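A small sketch of this visibility guarantee (class name is mine): one thread spins on a volatile boolean flag while another sets it; the volatile write is guaranteed to become visible, so the spinning thread stops.

```java
// Sketch: a volatile boolean used as a stop flag.
// The writer's update to `stop` must become visible to the spinning reader.
public class VisibilityDemo {
    static volatile boolean stop = false;

    static boolean run() {
        Thread reader = new Thread(() -> {
            while (!stop) {
                // spin until the volatile write becomes visible
            }
        });
        reader.start();
        stop = true; // volatile write: visible to the reader thread
        try {
            reader.join(5000); // should return almost immediately
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !reader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("reader stopped: " + run());
    }
}
```

If `stop` were not volatile, the JIT could hoist the flag read out of the loop and the reader might spin forever.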

2.3 Volatile and ordering

Volatile disallows instruction reordering and therefore guarantees ordering.

What is Instruction Reorder?

In the Java memory model, the compiler and processor are allowed to reorder instructions as long as the result of single-threaded execution is unchanged; there is no such guarantee for concurrent execution across multiple threads.

For example, without instruction reordering the following code executes in the order 1->2->3->4. In practice it may execute as 1->2->4->3, 2->1->3->4, or some other order, but it is guaranteed that 1 comes before 3 and 2 comes before 4. All of these orders still end with a = 10 and b = 20.

```java
int a = 0; // statement 1
int b = 1; // statement 2
a = 10;    // statement 3
b = 20;    // statement 4
```

But with multiple threads, suppose another thread runs the following code. If the order above is reordered to 1->2->4->3 and thread 1 has just executed b = 20 (statement 4) when execution switches to thread 2, thread 2 will print that a is already 10 even though a is still 0.

```java
if (b == 20) {
    System.out.print("a is already 10.");
}
```

3. How volatile works

3.1 Memory barriers and instruction reordering

  • To understand how volatile disallows instruction reordering, you first need to understand the concept of a memory barrier.

Memory barriers, also known as memory fences, are CPU instructions; they are used in languages such as Java, C++, and C.

  • A barrier forces the CPU or compiler to perform memory operations in a strict order: instructions before and after the barrier cannot be reordered across it by system optimizations.

A memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier. — Wikipedia

3.1.1 The four memory barriers in the JVM

  1. LoadLoad barrier:

```
// Abstract scene:
Load1;
LoadLoad;
Load2;
```

Load1 and Load2 represent two read instructions. Ensure that the data to be read by Load1 is completed before the data to be read by Load2 is accessed.

  2. StoreStore barrier:

```
// Abstract scene:
Store1;
StoreStore;
Store2;
```

Store1 and Store2 represent two write instructions. Before the Store2 write executes, the Store1 write is guaranteed to be visible to other processors.

  3. LoadStore barrier:

```
// Abstract scene:
Load1;
LoadStore;
Store2;
```

Ensure that the data to be read by Load1 is read before Store2 is written.

  4. StoreLoad barrier:

```
// Abstract scene:
Store1;
StoreLoad;
Load2;
```

Ensure that writes to Store1 are visible to all processors before Load2 reads are executed. The StoreLoad barrier has the largest overhead of the four.

3.1.2 Relationship between Volatile and memory barriers

After a variable is volatile, the JVM does two things for us:

  1. The StoreStore barrier is inserted before each volatile write and the StoreLoad barrier is inserted after each volatile write.
  2. A LoadLoad barrier and a LoadStore barrier are inserted after each volatile read.

Again, use the above example:

This time, the variable b is modified with volatile:

```java
int a = 0;          // statement 1
volatile int b = 1; // statement 2

// executed in thread 1
a = 10; // statement 3
b = 20; // statement 4

// executed in thread 2
if (b == 20) {
    System.out.print("a is already 10.");
}
```

The statements in thread 1 will look like this after compilation:

```java
a = 10; // statement 3
// ----------- StoreStore barrier -----------
b = 20; // statement 4
// ----------- StoreLoad barrier -----------
```

Because of the barriers, statements 3 and 4 cannot be reordered, which guarantees that when b == 20, a has already been assigned 10. This program has no thread-safety issue.
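A runnable sketch of the fixed program (the class name is mine): because b is volatile, the write a = 10 happens-before the volatile write b = 20, which happens-before any read that observes b == 20, so a reader that sees b == 20 must also see a == 10.

```java
// Sketch: volatile b establishes happens-before between statements 3 and 4
// and the read in the second thread.
public class OrderingDemo {
    static int a = 0;
    static volatile int b = 1;

    static int run() {
        Thread writer = new Thread(() -> {
            a = 10; // statement 3: cannot be reordered after the volatile write below
            b = 20; // statement 4: volatile write
        });
        writer.start();
        while (b != 20) {
            // spin until the volatile write is observed
        }
        return a; // guaranteed by happens-before to be 10
    }

    public static void main(String[] args) {
        System.out.println("a = " + run()); // prints a = 10
    }
}
```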

3.1.3 Performance Impact of memory barriers

Memory barriers prevent the CPU from applying optimizations that hide memory-operation latency, so their performance cost must be considered. For best performance, modularize the problem so the processor can execute tasks in units, and place the required memory barriers only at the boundaries of each task unit; within a unit the processor can then execute without restriction.

Reference: ifeve.com/memory-barr…

3.2 How does Volatile achieve visibility?

To understand how volatile ensures visibility, you need to understand the concept of CPU caching.

3.2.1 CPU cache

We know that the CPU computes far faster than memory can be read and written, so memory cannot keep up with the CPU; hence the CPU cache. The cache is a buffer for data exchange between the CPU and memory. Common CPUs have three levels of cache, usually referred to as L1, L2, and L3.

Below is a conceptual model of the caches of an Intel Core i7 processor (image from Computer Systems: A Programmer's Perspective).

When the system is running, the CPU performs calculations as follows:

  1. Programs and data are loaded into main memory
  2. Instructions and data are loaded into the CPU cache
  3. The CPU executes instructions and writes the results to the cache
  4. Data in the cache is written back to main memory

Under the cache model above, problems tend to occur when multiple cores execute a task concurrently, e.g.:

  1. Core 0 reads variable A from memory.
  2. Core 3 also reads variable A from memory.
  3. Core 0 modifies variable A and synchronizes it back to main memory.
  4. Core 3 then uses variable A, but its cached value is stale.

To solve these problems, the MESI protocol for cpus emerged.

3.2.2 The MESI protocol

In early CPUs, this was handled by locking the bus with the LOCK# signal (a bus lock). When one CPU operated on data in its cache, it asserted a LOCK signal on the bus; all other CPUs that received the signal refrained from operating on the corresponding data in their own caches. When the operation finished and the lock was released, all CPUs fetched the latest data from memory.

But this was too expensive, so Intel developed a cache coherence protocol known as MESI. It stores a flag in each CPU cache line that has four states:

  • M (Modified): the cache line has been modified by the current CPU, i.e. it is inconsistent with the data in memory.
  • E (Exclusive): the cache line is consistent with memory, and no other processor holds a copy.
  • S (Shared): the cache line is consistent with memory, and multiple caches may hold a shared copy of the same memory address at the same time.
  • I (Invalid): the cache line is no longer usable.

CPU reads follow the following points:

  • If the cache line's state is I, the CPU reads from memory; otherwise it reads directly from the cache.
  • If a CPU holds a cache line in the M or E state and another CPU reads it, the owning CPU writes the line back to memory and sets its state to S.
  • A CPU can modify cached data only when the line's state is M or E; after the modification, the state becomes M.

A common example is:

When a CPU writes data and finds that the variable being operated on is shared, i.e. a copy exists in other CPUs' caches, it signals the other CPUs to set the cache line holding that variable to the invalid state. When another CPU later uses the variable, it sniffs for changes, discovers its cache line is invalid, and re-reads the variable from memory.

3.2.3 Visibility of Volatile

With this in mind, it’s easy to understand how volatile is implemented.

  1. A write to a shared variable modified with the volatile keyword is compiled into assembly with a lock-prefixed instruction.
  2. When the CPU encounters this instruction, it immediately does two things:
      1. It writes the current core's cache line back to memory immediately.
      2. Through the MESI protocol, it invalidates the copies cached in other cores, so that other threads must re-read the data from memory.

Reference:

Crowhawk. Making. IO / 2018/02/10 /… Blog.csdn.net/nch_ren/art…

4. The difference between volatile and synchronized

We have covered volatile in some depth; finally, let's look at how it differs from synchronized.

To learn more about synchronized, click here.

  1. volatile is a variable modifier, while synchronized acts on a block of code or a method.
  2. volatile only synchronizes the value of a single variable between thread working memory and main memory, whereas synchronized synchronizes the values of all variables by locking and unlocking a monitor; synchronized clearly consumes more resources than volatile.
  3. volatile does not guarantee atomicity, but does guarantee visibility and ordering (implemented via memory barriers); synchronized can guarantee atomicity, visibility, and ordering. When discussing this with an interviewer, be clear about the details, such as where the ordering guarantee comes from and how it is implemented; otherwise the interviewer may press you with follow-up questions.

Conclusion

This concludes the article on volatile, but if there are any errors or other important points about volatile that are not covered, feel free to leave a comment in the comments section.