
Java code is compiled into Java bytecode, which the class loader loads into the JVM. The JVM executes the bytecode, which ultimately must be translated into assembly instructions that run on the CPU. Java's concurrency mechanisms therefore depend both on how the JVM is implemented and on the instructions the CPU provides.

The CPU can operate directly on its own cache without frequent round trips to main memory, which keeps its computation highly efficient.

I. Applications of volatile

Both synchronized and volatile play important roles in multithreaded concurrent programming; volatile guarantees the visibility of shared variables in multiprocessor development: when one thread modifies a shared variable, another thread can read the changed value. This article takes an in-depth look at how processors implement volatile at the hardware level.

II. The definition and implementation principle of volatile

The Java programming language allows threads to access shared variables. To ensure that a shared variable is updated accurately and consistently, a thread would normally acquire an exclusive lock around every access. As an alternative, Java provides volatile, which in some cases is more convenient than locking: if a field is declared volatile, the Java thread memory model ensures that all threads see a consistent value for that variable.
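To make the comparison concrete, a minimal sketch (the class and field names are illustrative, not from the original): both versions guarantee that a reader thread sees the latest value, but the volatile version avoids taking a lock.

```java
// Two ways to guarantee that threads see an up-to-date value of a shared field.
public class StatusHolder {
    // Option 1: an exclusive lock around every access.
    private boolean runningLocked;

    public synchronized void setRunningLocked(boolean value) {
        runningLocked = value;
    }

    public synchronized boolean isRunningLocked() {
        return runningLocked;
    }

    // Option 2: volatile — lighter weight, same visibility guarantee
    // for plain reads and writes of this single field.
    private volatile boolean running;

    public void setRunning(boolean value) {
        running = value;
    }

    public boolean isRunning() {
        return running;
    }
}
```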

The CPU terminology used in describing how volatile is implemented:

[Figure: table of CPU terms and their descriptions]

To see how volatile guarantees visibility, we can look at the assembly instructions the compiler generates and observe what the CPU does when it writes to a volatile variable.

[Figure: Java source containing a volatile write and the corresponding X86 assembly]
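As a concrete way to reproduce this, here is the classic double-checked-locking singleton often used for this experiment (a sketch; the class itself is mine, not from the original figure). On a typical x86 HotSpot JVM, printing the JIT output with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which needs the hsdis disassembler) shows the volatile store to instance followed by a lock-prefixed instruction such as lock addl $0x0,(%rsp).

```java
// Classic double-checked locking; the volatile store to "instance"
// is what produces the lock-prefixed instruction on x86.
public class Singleton {
    private static volatile Singleton instance;

    private Singleton() {}

    public static Singleton getInstance() {
        if (instance == null) {                 // first check, no lock
            synchronized (Singleton.class) {
                if (instance == null) {         // second check, under the lock
                    instance = new Singleton(); // volatile write
                }
            }
        }
        return instance;
    }
}
```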

When a shared variable decorated with volatile is written, an extra line of assembly appears: an instruction with a Lock prefix. On multicore processors, a Lock-prefixed instruction does two things:

1) It causes the data in the processor's cache line to be written back to memory. The processor asserts its LOCK# signal while executing the instruction; in a multiprocessor environment, this signal guarantees the processor exclusive use of any shared memory for the duration. The LOCK# signal usually locks the cache rather than the bus: the bus is locked only if the accessed memory region is not already cached in the processor. If the region is cached, the processor locks that cache line, writes it back to main memory, and relies on the cache coherence mechanism to guarantee atomicity. This operation is called cache locking.

2) Writing one processor's cache back to memory invalidates the corresponding data cached by other processors. Cache coherence prevents two or more processors from simultaneously modifying data from the same region of memory. Processors use the MESI (Modified, Exclusive, Shared, Invalid) protocol to keep their internal caches consistent with those of other processors. On a multicore system, each processor sniffs the bus to observe other processors' accesses to system memory and to their internal caches, and uses this sniffing to keep its internal cache, system main memory, and the other processors' caches consistent. If sniffing reveals that another processor intends to write to a memory address currently in the shared state, the sniffing processor invalidates its cache line and performs a cache line fill the next time the same address is accessed.

[Figure: Java memory model]

To improve processing speed, the processor does not communicate with memory directly; it first reads data from system memory into its internal cache (L1, L2, or other) before operating on it, but it does not know when that data will be written back to memory. When a volatile variable is written, however, the JVM sends the processor a Lock-prefixed instruction that writes the variable's cache line back to system memory. Even so, the copies of that value in other processors' caches may still be stale. So, to keep every processor's cache consistent, multiprocessors implement a cache coherence protocol: each processor sniffs the data propagating on the bus to check whether its cached values have expired. When a processor finds that the memory address backing one of its cache lines has been modified, it marks that cache line invalid; the next time it operates on that data, it re-reads it from system main memory into its cache.

The JMM describes the transfer between main memory and a thread's working memory with a set of operations:

Load: writes a value read from main memory into the working memory.
Use: reads a value from the working memory for computation.
Assign: assigns the computed value back to the working memory.
Store: transfers the variable's value from the working memory toward main memory.
Write: assigns the value passed by store to the variable in main memory.
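Mapped onto a concrete statement, the sequence looks roughly like this. This is a conceptual trace of the JMM's abstract operations, not literal bytecode:

```java
// How a simple increment of a shared field decomposes into the
// JMM operations listed above (conceptual, not literal bytecode).
public class Counter {
    private int i = 0;

    public void increment() {
        i = i + 1;
        // read/load: copy i from main memory into this thread's working memory
        // use      : pass the working-memory value of i to the execution engine
        // assign   : put the result i + 1 back into working memory
        // store    : transfer the new value from working memory toward main memory
        // write    : set the variable i in main memory to the stored value
    }
}
```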





III. Memory semantics of volatile

Reads and writes of a volatile variable are atomic; however, compound operations such as volatile++, or sequences of multiple volatile operations, are not atomic as a whole.

3.1. Volatile variables themselves have the following properties:

1) Visibility: a read of a volatile variable always sees the last write to it (by any thread);
2) Atomicity: reads and writes of any single volatile variable are atomic, but compound operations such as volatile++ are not (see the sketch below);
3) Orderliness: to improve execution efficiency, compilers and processors sometimes reorder instructions. Reordering is permitted only as long as it does not violate the happens-before principle; orderings required by happens-before cannot be reordered.
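A short sketch of the atomicity pitfall in 2): both threads increment a volatile counter 100,000 times, yet updates can be lost because ++ is a read-modify-write compound. The class and names are illustrative; AtomicInteger is shown as the standard fix.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileIncrement {
    private static volatile int count = 0;            // visible, but ++ is not atomic
    private static final AtomicInteger safe = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                count++;                               // read-modify-write: updates can be lost
                safe.incrementAndGet();                // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println("volatile count = " + count);      // often < 200000
        System.out.println("atomic count   = " + safe.get()); // always 200000
    }
}
```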

3.2. Memory semantics of volatile writes: when a volatile variable is written, the JMM flushes the value of the shared variable from the thread's local memory to main memory.

[Figure: memory semantics of a volatile write]

After thread A writes the flag variable, the values of the two shared variables that thread A updated in local memory A are flushed to main memory. At that point, the shared variables in local memory A and in main memory have the same values.

Memory semantics of volatile reads: when a volatile variable is read, the JMM invalidates the thread's local memory; the thread must then read the shared variable from main memory.

[Figure: memory semantics of a volatile read]

After thread B reads the flag variable, the values held in local memory B are marked invalid, so thread B must read the shared variable from main memory. This read makes the values of the shared variable in local memory B and in main memory consistent.

Summary:

1) Thread A writing a volatile variable is, in essence, thread A sending a message to any thread that will subsequently read that volatile variable;
2) Thread B reading a volatile variable is, in essence, thread B receiving that message, i.e. the changes made to shared variables before the volatile write;
3) Thread A writing a volatile variable and thread B then reading it amounts to thread A sending a message to thread B through main memory (sketched below).
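The flag example the figures above describe, as a minimal sketch (the field names a and flag follow the usual JMM illustration and are assumptions on my part):

```java
// Volatile write/read as message passing between threads.
public class VolatileExample {
    private int a = 0;
    private volatile boolean flag = false;

    public void writer() {     // runs on thread A
        a = 1;                 // ordinary write
        flag = true;           // volatile write: flushes A's local memory to main memory
    }

    public void reader() {     // runs on thread B
        if (flag) {            // volatile read: invalidates B's local memory
            int i = a;         // guaranteed to see a == 1 (happens-before)
            System.out.println(i);
        }
    }
}
```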

3.3. Implementation of volatile memory semantics

Reordering is divided into compiler reordering and processor reordering. To implement volatile's memory semantics, the JMM restricts each type of reordering separately.

To implement the memory semantics of volatile, the compiler inserts memory barriers into the instruction sequence when generating bytecode, to prevent particular kinds of processor reordering. The JMM's barrier insertion strategy described below is the conservative one:

1) Insert a StoreStore barrier before each volatile write;
2) Insert a StoreLoad barrier after each volatile write;
3) Insert a LoadLoad barrier after each volatile read;
4) Insert a LoadStore barrier after each volatile read.
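Applied to a flag-style class, the conservative strategy places barriers as the comments below indicate. This is a sketch for illustration; the barriers are conceptual markers inserted by the JIT, not code you write:

```java
public class BarrierExample {
    int a;
    volatile boolean v;

    void write() {
        a = 1;
        // StoreStore barrier  <- inserted before the volatile write
        v = true;              // volatile write
        // StoreLoad barrier   <- inserted after the volatile write
    }

    void read() {
        boolean b = v;         // volatile read
        // LoadLoad barrier    <- inserted after the volatile read
        // LoadStore barrier   <- inserted after the volatile read
        int i = a;
    }
}
```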

Schematic of the instruction sequence generated when memory barriers are inserted for a volatile write:

[Figure: volatile write with a StoreStore barrier before it and a StoreLoad barrier after it]

The StoreStore barrier ensures that all ordinary writes before the volatile write are visible to any processor before the volatile write itself, because it guarantees that all ordinary writes are flushed to main memory before the volatile write.

The StoreLoad barrier prevents the volatile write from being reordered with volatile reads or writes that may follow it. A StoreLoad barrier could be inserted either after each volatile write or before each volatile read; taking the conservative approach, the JMM inserts one after each volatile write.

Schematic of the instruction sequence generated when memory barriers are inserted for a volatile read:

[Figure: volatile read with LoadLoad and LoadStore barriers after it]

The LoadLoad barrier prevents the processor from reordering the volatile read above with ordinary reads below it. The LoadStore barrier prevents the processor from reordering the volatile read above with ordinary writes below it.

LoadLoad barrier: Load1; LoadLoad; Load2 — ensures that Load1's data is loaded before Load2 and all subsequent load instructions.
StoreStore barrier: Store1; StoreStore; Store2 — ensures that Store1's data is flushed back to main memory and visible to other CPUs before Store2 and subsequent store instructions.
LoadStore barrier: Load1; LoadStore; Store2 — ensures that Load1's data is loaded before Store2 and subsequent store instructions are flushed.
StoreLoad barrier: Store1; StoreLoad; Load2 — ensures that Store1's data is flushed back to main memory and visible to other CPUs before Load2 and subsequent load instructions execute.
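For reference, since Java 9 the JDK exposes fences along these same lines through java.lang.invoke.VarHandle's static methods. The method names below are real JDK APIs; the writer/reader pairing is illustrative only, and adjacent fences are shown together purely to demonstrate the mapping (one of each pair would suffice):

```java
import java.lang.invoke.VarHandle;

// Hedged sketch: mapping the four barrier types onto JDK 9+ VarHandle fences.
public class FenceDemo {
    static int data;
    static boolean ready;

    static void writer() {
        data = 42;
        VarHandle.storeStoreFence(); // StoreStore: data is flushed before ready
        ready = true;
        VarHandle.fullFence();       // full fence; the only one that implies StoreLoad
    }

    static void reader() {
        boolean r = ready;
        VarHandle.loadLoadFence();   // LoadLoad: read of ready completes before read of data
        VarHandle.acquireFence();    // acquire = LoadLoad + LoadStore combined
        int x = data;
        System.out.println(r + " " + x);
    }
}
```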