1. A volatile code demo, which raises the question: what is the difference between code with volatile and code without it?
```java
public class App {
    // With volatile the program terminates promptly; without it the reader
    // thread may spin forever because it never sees stop change.
    public volatile static boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread thread = new Thread(() -> {
            int i = 0;
            while (!stop) { // keeps looping until the write to stop becomes visible
                i++;
            }
        });
        thread.start();
        Thread.sleep(1000);
        stop = true; // main thread sets the flag to true
    }
}
```
2. What does volatile do?
Volatile guarantees the visibility of shared variables in a multithreaded environment. So what is visibility? In a single-threaded environment, if thread A changes the value of a variable and then, without any outside interference, reads it again, it gets the value it just wrote. In a multithreaded environment, however, the read and the write happen in different threads, and the reading thread may not see the latest value written by the other thread in time. This is the visibility problem.
How does Java implement visibility?
To make memory writes by one thread visible to reads in other threads, one of several mechanisms has to be used; volatile is one such mechanism.
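For instance, both of the following give a reader thread visibility of a writer's update (a rough sketch; the class name is only for illustration):

```java
public class VisibilityOptions {
    // Mechanism 1: volatile — a write to stopA by one thread is visible
    // to later reads of stopA by any other thread.
    private volatile boolean stopA = false;

    // Mechanism 2: synchronized — entering and exiting the monitor gives the
    // same visibility guarantee (plus mutual exclusion) as long as the
    // reader and the writer use the same lock.
    private boolean stopB = false;

    public synchronized void requestStop() { stopB = true; }

    public synchronized boolean shouldStop() { return stopB; }
}
```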
How does volatile guarantee visibility?
We can use the HSDIS tool to inspect the assembly instructions generated for the demo code above (see the tool's documentation for how to set it up). Run the demo with JVM options along the lines of -server -Xcomp -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:CompileCommand=compileonly,*App.* and search the output for the lock instruction: the write to the volatile variable comes with an extra lock-prefixed instruction. lock is a control instruction; in a multiprocessor environment, the lock assembly instruction achieves visibility through either a bus lock or a cache lock.
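As a minimal sketch of the invocation (assuming the hsdis disassembler library has been installed into the JDK's native library path and that the demo class is named App, as above):

```
java -server -Xcomp \
     -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly \
     -XX:CompileCommand=compileonly,*App.* \
     App
```

On x86 the barrier typically shows up as something like lock addl $0x0,(%rsp) right after the store to stop.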
3. What exactly is visibility?
To better understand the nature of visibility, we need to start from the hardware level.
A computer's core components are the CPU, memory, and I/O devices, and they differ greatly in processing speed: the CPU is the fastest, memory comes next, and I/O devices such as disks and SSDs are the slowest. Most programs access memory, and some also access I/O devices. To improve computing performance, CPUs went from single-core to multi-core and even added hyper-threading to squeeze out more processing power. But improving the CPU alone is not enough: if memory and I/O cannot keep up, the overall computing efficiency is still bounded by the slowest device.
To balance the speed gap between the three and make the best possible use of the CPU, a lot of optimization has been done in hardware, the operating system, and the compiler. What do these optimizations look like?
- The CPU added caches.
- The operating system added processes and threads, keeping the CPU busy by switching between time slices.
- The compiler optimizes (reorders) instructions to make better use of the CPU cache.
Each of these optimizations, however, brings its own problems, and those problems are the root cause of thread-safety issues. To understand the nature of the visibility problem mentioned earlier, we need to look at these optimizations one by one.
CPU caches

A thread is the smallest unit of CPU scheduling, and threads exist so that the computer's processing power can be used as fully as possible. But hardly any computational task can be completed by the processor's "calculation" alone: the processor also has to interact with memory, for example to read operands and store results, and such accesses are hard to avoid. Because the speed gap between the computer's storage devices and the processor is so large, modern computer systems insert a layer of cache, whose read and write speed is as close as possible to the processor's, between the processor and memory: the data needed for a computation is copied into the cache so the computation can run quickly, and when it finishes the result is synchronized from the cache back to memory.
A schematic of the CPU / cache / main-memory hierarchy appears here in the original post.
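As a side note, here is a rough way to feel the effect of the cache from Java (a sketch only; the class name CacheEffect is illustrative and the exact numbers are machine dependent): reading a large array once sequentially is cache friendly, while reading the same elements in sixteen strided passes pulls every cache line from memory sixteen times and is noticeably slower.

```java
public class CacheEffect {
    public static void main(String[] args) {
        int[] data = new int[32 * 1024 * 1024]; // 128 MB, far larger than any CPU cache
        long sum = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < data.length; i++) {   // sequential: each cache line is used fully
            sum += data[i];
        }
        long sequential = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int offset = 0; offset < 16; offset++) {      // 16 passes, one int per cache line
            for (int i = offset; i < data.length; i += 16) {
                sum += data[i];
            }
        }
        long strided = System.nanoTime() - t0;

        System.out.println("sequential: " + sequential / 1_000_000 + " ms, "
                + "strided: " + strided / 1_000_000 + " ms (sum=" + sum + ")");
    }
}
```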
Interacting with storage through a cache nicely resolves the speed mismatch between processor and memory, but it also makes the computer system more complex, because it introduces a new problem: cache consistency.
What does cache consistency mean?
Once caches exist, each CPU works like this: the data needed for a computation is first loaded into the CPU's cache; during the computation the CPU reads it directly from the cache; when the computation finishes, the result is written back to the cache; and only after the whole operation completes is the cached data synchronized back to main memory.
On a multi-core CPU, each thread may run on a different core, and each core has its own cache, so the same piece of data may be cached by several CPUs at once. If threads running on different cores see inconsistent cached values for the same memory location, the program misbehaves, so a lot of work is done at the CPU level to keep the caches consistent. Two main mechanisms are provided:
- Bus lock
- Cache lock
Bus locks and cache locks
A bus lock works like this: in a multi-CPU system, when one processor wants to operate on shared memory, it asserts a LOCK# signal on the bus, and while that signal is asserted no other processor can access the shared data through the bus. The bus lock therefore locks the communication between the CPUs and memory, which means that during the lock the other processors cannot operate on data at any other memory address either. That makes bus locking very expensive, so this mechanism is obviously not ideal.
How do you optimize it?
The best approach is to control the granularity of what the lock protects: we only need to guarantee that, for the same data cached by multiple CPUs, all the copies stay consistent. This is what the cache lock does, and its core mechanism is the cache coherence protocol. To keep data access consistent, each processor follows a protocol when accessing its cache and reads and writes data according to that protocol. Common protocols include MSI, MESI, and MOSI; the most common is MESI, so here is a brief introduction to it.
MESI stands for the four states a cache line can be in:
- M (Modified): the data is cached only in the current CPU's cache and has been modified, so the cached data is inconsistent with the data in main memory.
- E (Exclusive): the data is cached only in the current CPU's cache and has not been modified.
- S (Shared): the data may be cached by multiple CPUs, and every cached copy matches the data in main memory.
- I (Invalid): the cache line is invalid.
In the MESI protocol, each cache's controller not only knows about its own reads and writes but also snoops on the reads and writes of the other caches:
- S state: the controller must watch for requests from other caches to invalidate the line or take it exclusively; when it sees one, it must set its own copy to I.
- M state: the controller must watch for any attempt by another CPU to read the corresponding main-memory address; when it sees one, it must write its modified data back to main memory before that read proceeds, and the line drops to S.
- E state: the controller must watch for other CPUs reading the same address; when it sees one, the line changes to S.
For CPU reads and writes, the MESI protocol follows these rules. Read request: a cache line in the M, E, or S state can be read directly; in the I state the CPU can only read the data from main memory. Write request: a cache line in the M or E state can be written directly; writing a line in the S state first requires invalidating the copies held in other CPUs' caches. With the bus-lock and cache-lock mechanisms, the CPU's operations on memory can be abstracted as the CPU, its cache, and main memory cooperating under this protocol, which is what achieves cache consistency. A toy sketch of the state machine follows.
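Here is a toy Java model of these MESI transitions for a single cache line (purely illustrative; real hardware implements this per cache line and drives it through bus snooping, and the names used here are invented for the sketch):

```java
// Toy model of the MESI states of one cache line, as described above.
enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class CacheLine {
    MesiState state = MesiState.INVALID;

    // The local CPU reads this line.
    void localRead() {
        if (state == MesiState.INVALID) {
            // Must be fetched from main memory (or another cache);
            // assume no other CPU holds a copy, so it becomes exclusive.
            state = MesiState.EXCLUSIVE;
        }
        // M, E, S: the read hits the cache and the state is unchanged.
    }

    // The local CPU writes this line.
    void localWrite() {
        // S or I: the copies in other caches must be invalidated first
        // (the "invalidate" message discussed later in the text).
        state = MesiState.MODIFIED;
    }

    // Another CPU reads the same address (seen by snooping the bus).
    void remoteRead() {
        if (state == MesiState.MODIFIED) {
            // Write the dirty data back to main memory before the other CPU reads it.
        }
        if (state != MesiState.INVALID) {
            state = MesiState.SHARED;
        }
    }

    // Another CPU wants to write the same address (snooped invalidate request).
    void remoteWrite() {
        state = MesiState.INVALID;
    }
}
```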
The nature of visibility

Because of CPU caches, a visibility problem can appear when multiple CPUs cache the same shared data at the same time: CPU A modifies the value in its local cache, and the new value is not visible to CPU B. If CPU B then writes to the data, it is working with a stale (dirty) value, and the final result of the data becomes unpredictable.
This situation is hard to simulate directly: we cannot pin a Java thread to a particular CPU, since that is decided by the operating system and lies outside the JVM's control, and, more importantly, we cannot predict when a CPU cache will propagate a value to main memory; the window may be so short that it is impossible to observe.
Finally, there is the thread execution order: with multiple threads you cannot control which thread's code runs right after another thread's code. So we can only accept this as an objective fact. At this point you should have a question: didn't we just say that the cache coherence protocol, or a bus lock, already achieves cache consistency?
Why is the volatile keyword needed? Or why is there a visibility problem?
The MESI optimization itself introduces visibility issues. MESI achieves cache consistency, but it has a problem: the state of each CPU's cache lines is coordinated by message passing. If CPU0 wants to write to a shared variable held in its cache, it first has to send an invalidate message to the other CPUs that have the data cached and then wait for their acknowledgements, and CPU0 is blocked during that wait. To avoid wasting resources on this blocking, store buffers were introduced in the CPU.
What do store buffers do? CPU0 simply writes the shared data into its store buffer, sends the invalidate message, and continues executing other instructions. Only when it has received an invalidate acknowledge from every other CPU does it move the data from the store buffer into the cache line, and eventually the cache line is synchronized to main memory.
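A toy Java model of the idea (illustration only, with invented names; a real store buffer lives in hardware and a JVM program cannot manipulate it like this):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Models one core's store buffer sitting in front of the shared cache line / memory.
class StoreBufferModel {
    // The shared location, as every other core sees it.
    static int sharedValue = 3;

    // This core's pending writes, invisible to the other cores.
    private final Deque<Integer> storeBuffer = new ArrayDeque<>();

    // A write goes into the store buffer and returns immediately,
    // instead of waiting for the other cores' invalidate acknowledgements.
    void write(int v) {
        storeBuffer.addLast(v);
    }

    // This core reads its own newest pending write if there is one
    // ("store forwarding"); other cores never see the buffer.
    int readOwn() {
        return storeBuffer.isEmpty() ? sharedValue : storeBuffer.peekLast();
    }

    // Later, once all invalidate acknowledgements have arrived,
    // the buffered writes are drained into the shared location.
    void flush() {
        while (!storeBuffer.isEmpty()) {
            sharedValue = storeBuffer.pollFirst();
        }
    }

    public static void main(String[] args) {
        StoreBufferModel cpu0 = new StoreBufferModel();
        cpu0.write(10);
        System.out.println("cpu0 sees " + cpu0.readOwn());  // 10, read from its own buffer
        System.out.println("cpu1 sees " + sharedValue);     // still 3, not yet flushed
        cpu0.flush();
        System.out.println("cpu1 sees " + sharedValue);     // 10 after the flush
    }
}
```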
But this optimization brings two problems.
First, it is uncertain when the data will actually be committed, because the commit waits for the other CPUs' responses — it has become an asynchronous operation. Second, once store buffers are introduced, the processor first tries to read a value from its own store buffer: if the store buffer holds the data it reads it from there, otherwise it reads from the cache line. Let's look at an example.
```java
int value = 3;
boolean isFinish = false;

void exeToCPU0() {
    value = 10;      // (S -> M) goes into the store buffer; invalidate messages are
                     // sent to the other CPUs and CPU0 waits for their acknowledgements
    isFinish = true; // (E) exclusive line, no communication with other CPUs needed,
                     // so CPU1 can immediately see isFinish == true
}

void exeToCPU1() {
    if (isFinish) {
        assert value == 10; // may fail: value can still be 3 here
    }
}
```
exeToCPU0() and exeToCPU1() run on two different CPUs. Suppose CPU0's cache holds the shared variable isFinish in the (E) state, while value may be in the (S) state. CPU0 first puts the write value = 10 into its store buffer and notifies the other CPUs that cache value; while waiting for their acknowledgements it goes on to execute isFinish = true, and because CPU0 caches isFinish in the exclusive state it can modify it directly. CPU1 may then read isFinish as true while value is still not 10. This can be viewed as out-of-order execution by the CPU, or as a reordering, and it is exactly what causes the visibility problem.
Because the dependencies that exist at the software level are hard for the hardware to know about, the CPU provides memory barriers and lets the software layer decide where to insert them; a memory barrier is a CPU instruction that, among other things, flushes the store buffer.

CPU-level memory barriers
What is a memory barrier? Roughly speaking, a memory barrier forces the writes sitting in the store buffer out to memory, so that they become visible to other threads accessing the same shared memory. On x86 the memory-barrier instructions are lfence (load fence), sfence (store fence), and mfence (full fence).
A store (write) memory barrier tells the processor to synchronize all data sitting in the store buffer before the barrier to main memory; in short, it makes the results of instructions before the write barrier visible to the reads and writes that come after it. A load (read) memory barrier ensures that the read operations issued after it really execute after the barrier and are not pulled ahead of it; combined with a write barrier, memory updates made before the write barrier become visible to reads made after the read barrier.
A full memory barrier ensures that the results of the reads and writes before the barrier are committed to memory before any read or write after the barrier executes. With a full memory barrier we can change the example above so that the visibility problem no longer occurs, as sketched below.
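Here is a sketch of what that could look like from Java, using the VarHandle fence methods available since Java 9 (these are the closest Java-level stand-ins for the CPU barriers described above; the class name is illustrative, and simply declaring isFinish volatile would achieve the same effect, which is where the discussion is heading):

```java
import java.lang.invoke.VarHandle;

public class FenceExample {
    static int value = 3;
    static boolean isFinish = false;

    static void exeToCPU0() {
        value = 10;
        // Full barrier: the write above must leave the store buffer and become
        // visible before the flag below is published.
        VarHandle.fullFence();
        isFinish = true;
    }

    static void exeToCPU1() {
        if (isFinish) {
            // Read barrier: reads after this point are not pulled ahead of it,
            // so value is read after isFinish was observed as true.
            VarHandle.acquireFence();
            assert value == 10; // holds once isFinish is seen as true
        }
    }
}
```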
In general, memory barriers keep shared data visible during parallel execution by preventing the CPU from accessing memory out of order. But how do we add such a barrier? Go back to the volatile code at the beginning: it generated a lock assembly instruction, which plays the role of a memory barrier. And here the problem appears: memory barriers, reordering, and the like seem to depend on the platform and the hardware architecture, whereas Java, as a "write once, run anywhere" language, should not expose platform-specific issues, and these so-called memory barriers should not be the programmer's concern. So let's look next at how visibility is guaranteed at the JMM level.
— — — — — — — —
Copyright notice: this article was originally published by the CSDN blogger "17610229712" under the CC 4.0 BY-SA license. Please include the original source link and this statement when reposting. Original link: blog.csdn.net/weixin_3950…