The keyword volatile is a familiar one. It is widely used in concurrent programming, and it is hard to claim you understand concurrent programming in Java without understanding how volatile works.
This article covers the underlying logic and implementation of volatile from the following angles:
- The underlying implementation of the CPU
- JMM memory model
- Three basic problems with concurrent programming
- The volatile principle and memory semantics
- About instruction rearrangement and memory barriers
- Summary
1. The underlying implementation of the CPU
Multiple CPUs
Modern computers typically have two or more CPUs running at the same time; if multiple programs or processes run on only one CPU, context switching can be costly.
Multi-core CPUs. In addition to its processor cores, a modern CPU also contains registers, L1/L2/L3 caches, floating-point and integer arithmetic units, other auxiliary units, and an internal bus. What is the advantage of a multi-core CPU, that is, multiple processor cores on a single chip? If a multithreaded program runs on a machine with several single-core CPUs, threads scheduled onto different CPUs must constantly communicate over the external bus and must also deal with data inconsistencies between the caches of the different CPUs. In this scenario, putting multiple cores on one CPU helps greatly: the cores communicate over the internal bus and share the same cache.
The process by which a CPU reads data from memory
1. Registers: the CPU reads the value directly from the register.
2. L1 cache: lock the cache line and read the data.
3. L2 cache: look in L1 first; if the data is not there, go to L2; if it is in L2, lock the line, copy it up to L1, then perform the L1 read.
4. L3 cache: look in L1 and L2 first; if the data is not there, go to L3; if it is in L3, lock the line, copy it up through L2 to L1, then read as above.
5. Main memory: notify the memory controller to occupy bus bandwidth, notify memory to lock, initiate a memory read, and wait for the response; the response is saved into L3 -> L2 -> L1, and finally the bus lock is released.
1.1 Cache Consistency
In a multiprocessor system, each CPU has its own cache, and they share the same main memory. Introducing caches between the CPU and main memory resolves the speed mismatch between them, but it also introduces a new problem: cache consistency. When multiple processors work on the same region of main memory, their cached data may become inconsistent. If that happens, whose data should prevail? To resolve this, each processor must follow a protocol when reading and writing its cache. This article mainly uses the MESI protocol to illustrate.
1.1.1 The four states of a cache line:
- M (Modified): the line has been modified in this cache and differs from main memory; it must be written back before other CPUs read it.
- E (Exclusive): the line exists only in this cache and is consistent with main memory.
- S (Shared): the line may exist in several caches and is consistent with main memory.
- I (Invalid): the line's contents are stale and must be re-read before use.

The process of modifying data:
- 1. CPU-A reads the data from main memory, and its cache line state is E.
- 2. Some other CPUs read the same data, and everyone's state changes to S.
- 3. CPU-A needs to modify the data, so it sends a message on the bus, and the copies cached by the other CPUs are set to I (invalid).
- 4. The modified data is written back to main memory through the bus.
We can see that if the volatile keyword is used heavily in a program and a large number of concurrent modifications are made, each write-back from one thread invalidates the caches of many other CPUs. Those CPUs then consume a large amount of bus bandwidth re-reading from main memory. We can call this a bus storm.
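A rough sketch (class name and loop counts are arbitrary) of the kind of workload that triggers this: many threads hammering one volatile field, so every write broadcasts a MESI invalidate.

```java
public class BusStormSketch {
    // Every write to this volatile field invalidates the corresponding cache
    // line on all other cores (MESI), which must then re-read it; under heavy
    // contention this produces the bus traffic described above.
    static volatile long sharedCounter;

    public static void main(String[] args) throws InterruptedException {
        Thread[] writers = new Thread[8];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new Thread(() -> {
                for (long j = 0; j < 1_000_000L; j++) {
                    sharedCounter = j; // each write broadcasts an invalidate message
                }
            });
        }
        long start = System.nanoTime();
        for (Thread t : writers) t.start();
        for (Thread t : writers) t.join();
        System.out.printf("8 threads hammering one volatile: %d ms%n",
                (System.nanoTime() - start) / 1_000_000);
        // Compare with thread-local accumulation (e.g. java.util.concurrent.atomic.LongAdder),
        // which avoids this contention by spreading writes across cache lines.
    }
}
```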
There are two situations where MESI cannot help and the bus itself is locked instead: when the modified data spans more than one cache line, MESI does not take effect and the bus is locked (inefficient); and when the CPU simply does not support a cache consistency protocol (older, low-end processors).
2. The JMM memory model
The Java Memory Model (JMM) is not a concrete structure like the JVM's runtime memory layout; it is only an abstract concept.
The JMM is a mechanism and specification that shields the access differences between various kinds of hardware and operating systems and guarantees that a Java program sees consistent memory-access behavior on every platform. Its goal is to solve the atomicity, visibility (cache consistency), and ordering problems that arise when multiple threads communicate through shared memory.
The entity that runs a program in the JVM is a thread, and when each thread is created the JVM also creates a working memory for it (sometimes called the thread's stack space) to store the thread's private data. The Java memory model stipulates that all variables are stored in main memory, a shared region that all threads can access. However, operations on variables (reads, assignments, and so on) must take place in working memory: a thread first copies the variable from main memory into its own working memory, operates on the copy, and writes the variable back to main memory when the operation completes; it may not operate directly on variables in main memory. (As the analysis of the CPU's underlying implementation showed, the CPU really only deals with registers and the L1 cache.)
Working memory stores a copy of the data in the main memory, which is the private data area of each thread. Therefore, different threads cannot access each other’s working memory, and communication between threads must be realized through the main memory.
2.1 Working Memory
Each thread can only access its own working memory; local variables in one thread are invisible to other threads. Even if two threads execute the same piece of code, each creates its own local variables in its own working memory. Working memory also holds the thread's bytecode line-number indicator and information about native methods.
Because working memory is private to each thread, it poses no thread-safety issues.
According to the JVM specification, main memory and working memory store data as follows. For the local variables of an instance method: if a local variable is a primitive type (boolean, byte, short, char, int, long, float, double), it is stored directly in the stack frame in working memory; if it is a reference type, the reference itself is stored in the stack frame in working memory, while the object instance it points to is stored in main memory (the shared heap). Member variables of an instance object, however, are stored in the heap regardless of whether they are primitive types, wrapper types, or reference types. Static variables and information about the class itself are also stored in main memory.
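A minimal sketch (all names are illustrative) of where each kind of variable lives under these rules:

```java
public class StorageDemo {
    static int staticVar = 1;        // main memory: static variables live with the class
    int memberPrimitive = 2;         // main memory (heap), even though it is a primitive
    Object memberRef = new Object(); // member reference and its object are both in the heap

    void work() {
        int localPrimitive = 3;          // stack frame in this thread's working memory
        Object localRef = new Object();  // reference lives in the stack frame,
                                         // the Object instance lives in the heap
    }
}
```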
Note that instance variables in main memory can be shared by multiple threads. If two threads call the same method on the same object, each thread copies the shared data into its own working memory, operates on the copy, and refreshes it back to main memory when the operation is complete.
Now take a look at the relationship between the JMM and the actual hardware.
In fact, there is no notion of "working memory" at the hardware level; there are only registers and levels of cache. The JMM is therefore not a real, physical structure, just an abstraction.
2.2 The Eight Synchronization Operations of the JMM
Suppose there is a shared variable X with value 1 in main memory. Thread A wants to change X to 2, and thread B wants to read X. Does thread B read the 2 that thread A wrote, or the 1 from before the update? The answer is: it could be either. Because working memory is private to each thread, when thread A modifies X it first copies X into its own working memory, operates on the copy, and then writes it back to main memory. The value that B reads from main memory at any given instant is therefore indeterminate.
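A minimal sketch of this scenario (class and field names are illustrative):

```java
public class StaleRead {
    static int x = 1; // the shared variable X in main memory; not volatile

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> x = 2);                // thread A: copy x, set to 2, write back
        Thread b = new Thread(() ->
                System.out.println("B read x = " + x));    // thread B: may print 1 or 2
        a.start();
        b.start();
        a.join();
        b.join();
        // Declaring x as volatile guarantees that once A's write completes,
        // B's read observes it; without volatile the result is indeterminate.
    }
}
```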
The Java memory model defines eight operations that specify, in detail, how a variable is copied from main memory to working memory and how working memory is synchronized back to main memory.
(1) lock: acts on a variable in main memory; marks the variable as exclusively held by one thread.
(2) unlock: acts on a variable in main memory; releases a locked variable so that other threads can lock it.
(3) read: acts on a variable in main memory; transfers the variable's value from main memory into the thread's working memory for the subsequent load.
(4) load: acts on a variable in working memory; puts the value obtained by the read operation into the working-memory copy of the variable.
(5) use: acts on a variable in working memory; passes the working-memory value of the variable to the execution engine.
(6) assign: acts on a variable in working memory; assigns a value received from the execution engine to the working-memory variable.
(7) store: acts on a variable in working memory; transfers the value of the working-memory variable to main memory for the subsequent write.
(8) write: acts on a variable in main memory; puts the value obtained by the store operation into the main-memory variable.
To copy a variable from main memory to working memory, read and load must be performed in that order; to synchronize a variable from working memory back to main memory, store and write must be performed in that order. The Java memory model only requires that each pair execute in order, not consecutively: other instructions may be interleaved between them.
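As an illustration (the mapping in the comments is a sketch of how a simple statement decomposes into these operations; the class is hypothetical):

```java
public class EightOps {
    static int x = 10; // shared variable in main memory

    void increment() {
        int r = x;  // read (main memory) -> load (working-memory copy) -> use (execution engine)
        x = r + 1;  // assign (working-memory copy) -> store -> write (main memory)
    }
}
```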
Synchronization rule analysis:
1) A thread is not allowed to synchronize data from working memory back to main memory for no reason, that is, without having performed any assign operation.
2) A new variable can only be born in main memory; working memory may not use an uninitialized variable directly. In other words, before use or store is performed on a variable, assign and load must have been performed on it first.
3) A variable may be locked by only one thread at a time, but the same thread may lock it repeatedly; after locking it multiple times, the variable is unlocked only after the same number of unlock operations. Lock and unlock must come in pairs.
4) If a thread locks a variable, the variable's value is cleared from that thread's working memory; before the execution engine uses the variable, a load or assign must be re-executed to initialize its value.
5) A thread may not unlock a variable that has not been locked by a lock operation, nor may it unlock a variable that is locked by another thread.
6) Before unlocking a variable, a thread must first synchronize the variable back to main memory (perform store and write).
3. Three basic problems of concurrent programming
As mentioned above, the JMM exists precisely to address the three basic problems of concurrent programming: atomicity (an operation either completes entirely or not at all), visibility (changes one thread makes to a shared variable can be seen by other threads), and ordering (the order in which operations appear to execute).
4. Volatile and memory semantics
4.1 Memory semantics for volatile
- Ensures that a volatile shared variable is visible to all threads. That is, changes made by one thread to a volatile variable are immediately perceived by other threads. Concretely this is achieved through the cache consistency protocol, which is transparent to the programmer; the MESI protocol described earlier is a good mental model.
- Forbids instruction reordering.
4.2 Visibility of Volatile
Volatile guarantees only the visibility of variables between threads, not the atomicity of each thread's operations on them. A simple example: two threads simultaneously perform i++ on a variable i. The i++ operation itself is not atomic; it is a read-modify-write sequence (in JMM terms, use and assign followed by store and write). Although thread A's write to i can be immediately perceived by thread B, it is uncertain whether thread B's store happens before or after thread A's write. If thread B read i before thread A's write and then writes its own incremented value back, one of the two increments is lost: the later write simply overwrites the earlier one.
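A minimal demonstration (class name and iteration counts are arbitrary), contrasting a volatile counter with an atomic one:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LostUpdate {
    static volatile int i = 0;                  // visible, but i++ is still not atomic
    static final AtomicInteger safe = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int n = 0; n < 10_000; n++) {
                i++;                            // read-modify-write: updates can be lost
                safe.incrementAndGet();         // atomic CAS loop: no lost updates
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println("volatile i = " + i);           // often less than 20000
        System.out.println("atomic     = " + safe.get());  // always 20000
    }
}
```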
Something to think about: if thread B has already read the value of i into its working memory but has not yet used it, can it still overwrite thread A's write? I think not. Under the MESI protocol, thread B's core snoops the bus; when thread A wants to modify i, it first sends a message on the bus declaring its intent, and B's cached copy of i is marked invalid. When B then wants to operate on i, it must fetch the latest value of i from memory again.
4.3 Volatile disables reordering optimization
First, we need to understand what instruction reordering is and when it occurs. A high-level language like Java is first compiled into bytecode; at run time the JVM reads in the .class file and interprets the bytecode, handing hot code to the JIT compiler, which compiles bytecode into machine code.
Typically, javac compiles the program's source code into Java bytecode, and the JVM interprets the bytecode into the corresponding machine instructions, reading and interpreting them one at a time. Obviously, interpreted execution is bound to be slower than running an already-compiled binary, so JIT technology was introduced to improve speed.
At execution time the JIT saves the translated machine code for future use, so in theory execution speed can approach that of pure compilation.
When a CPU runs an instruction, execution is divided into instruction cycles; we can abstract them simply as three stages: fetch, decode, and execute (in reality, to improve pipeline efficiency, CPUs divide execution into more stages). Modern CPUs are pipelined: once the first instruction finishes its fetch and enters decode, the next instruction can begin its fetch. If the second instruction depends on the result of the first, the two instructions are data-dependent, and the pipeline must stall for some machine cycles (hardware optimizations such as data forwarding exist, but pipeline throughput still suffers). To improve CPU efficiency, if a third instruction does not depend on the result of the first, and executing it ahead of the second does not affect the final result, the compiler may reorder the instructions for us. This process is what we call instruction reordering; essentially, it exists to keep the CPU pipeline flowing smoothly.
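A classic sketch of reordering becoming observable across threads (the class is illustrative, and the surprising outcome may take many runs to appear):

```java
public class ReorderDemo {
    static int a, b, r1, r2;

    public static void main(String[] args) throws InterruptedException {
        for (int run = 0; run < 100_000; run++) {
            a = 0; b = 0; r1 = 0; r2 = 0;
            Thread t1 = new Thread(() -> { a = 1; r1 = b; });
            Thread t2 = new Thread(() -> { b = 1; r2 = a; });
            t1.start(); t2.start();
            t1.join();  t2.join();
            // If neither thread's two statements were reordered, at least one of
            // r1, r2 should end up as 1. Seeing (0, 0) means a store and a load
            // were reordered by the compiler or the CPU.
            if (r1 == 0 && r2 == 0) {
                System.out.println("reordering observed on run " + run);
                break;
            }
        }
    }
}
```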
How does volatile prevent instruction reordering? Through memory barriers. A memory barrier is a CPU instruction that guarantees the order in which certain operations execute and the memory visibility of certain variables (necessary because both the compiler and the processor can perform reordering optimizations). Inserting a memory barrier between instructions tells the compiler and the processor that no instruction may be reordered across the barrier; in other words, the barrier prevents reordering optimizations from moving instructions from before it to after it, or vice versa. Another effect of a memory barrier is that it forces CPUs to flush their cached data, so that any thread on any CPU can read the latest version of the data. In short, volatile implements its memory semantics precisely through memory barriers.
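The code discussed below is the classic double-checked-locking singleton; a minimal sketch (before volatile is applied) looks like this:

```java
public class DoubleCheckLock {
    private static DoubleCheckLock instance; // not yet volatile

    private DoubleCheckLock() {}

    public static DoubleCheckLock getInstance() {
        if (instance == null) {                       // first check, without locking
            synchronized (DoubleCheckLock.class) {
                if (instance == null) {               // second check, with the lock held
                    instance = new DoubleCheckLock();
                }
            }
        }
        return instance;
    }
}
```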
This double-checked code has no problems in a single-threaded program, but in a multithreaded environment it can have thread-safety issues: a thread that reads instance and finds it non-null may get a reference to an object that has not yet been initialized. The reason is that instance = new DoubleCheckLock() can be divided into three steps:
```
memory = allocate()   // 1. allocate the object's memory space
instance(memory)      // 2. initialize the object
instance = memory     // 3. point instance at the allocated memory; instance != null from here on
```
Since reordering may occur between step 2 and step 3, the actual order may become the following:
```
memory = allocate()   // 1. allocate the object's memory space
instance = memory     // 3. point instance at the allocated memory; instance != null, but the object is not yet initialized
instance(memory)      // 2. initialize the object
```
Because there is no data dependency between step 2 and step 3, reordering them does not change the result of a single-threaded execution, so this reordering is allowed. But with multiple threads, a thread may observe that instance is not null while the instance has not yet been initialized, which causes the thread-safety problem. To solve it, this instruction reordering must be disabled:
```java
private volatile static DoubleCheckLock instance;
```
4.4 Implementation of volatile memory semantics
To implement volatile's memory semantics, the JMM restricts compiler reordering and processor reordering respectively. The following is the table of volatile reordering rules that the JMM specifies for compilers.
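(Reproduced here as the standard JMM volatile reordering rule table; "NO" marks a reordering that is forbidden.)

| First operation \ Second operation | Normal read/write | Volatile read | Volatile write |
| --- | --- | --- | --- |
| Normal read/write | allowed | allowed | NO |
| Volatile read | NO | NO | NO |
| Volatile write | allowed | NO | NO |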
From the table above, we can see that:
- 1. When the second operation is a volatile write, no matter what the first operation is, it cannot be reordered. This rule ensures that operations before a volatile write are not reordered by the compiler to after it.
- 2. When the first operation is a volatile read, no matter what the second operation is, it cannot be reordered. This rule ensures that operations after a volatile read are not reordered by the compiler to before it.
- 3. When the first operation is a volatile write and the second operation is a volatile read, they cannot be reordered.
To implement the memory semantics of volatile, the compiler inserts memory barriers into the instruction sequence when generating bytecode, preventing particular types of processor reordering. For the compiler, it is almost impossible to find an optimal arrangement that minimizes the total number of barriers inserted, so the JMM takes a very conservative approach:
- Insert a StoreStore barrier before each volatile write
- Insert a StoreLoad barrier after each volatile write
- Insert a LoadLoad barrier after each volatile read
- Insert a LoadStore barrier after each volatile read
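An annotated sketch of where these conservative barriers conceptually land around a volatile write and a volatile read (the class is illustrative; the comments mark barrier positions, and the actual instructions emitted depend on the CPU):

```java
public class BarrierPlacement {
    int a;                  // normal field
    volatile boolean flag;  // volatile field

    void writer() {
        a = 1;          // normal write
                        // StoreStore barrier: a = 1 cannot be reordered below flag = true
        flag = true;    // volatile write
                        // StoreLoad barrier: the volatile write is flushed before any later read
    }

    void reader() {
        if (flag) {     // volatile read
                        // LoadLoad barrier: later reads cannot float above the volatile read
                        // LoadStore barrier: later writes cannot float above the volatile read
            int r = a;  // guaranteed to observe a == 1, written before flag = true
            System.out.println(r);
        }
    }
}
```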