The hardware layer

Cache lock:

A cache line in the MESI protocol is in one of four states:

Modified — modified by this CPU, not yet written back

Exclusive — held by this CPU only

Shared — held in the caches of multiple CPUs

Invalid — invalidated because another CPU modified the data

However, data that cannot be cached, or data that spans multiple cache lines, still requires a bus lock

Cache line

The CPU reads memory in units of a cache line. Most current implementations use 64-byte cache lines, and MESI operates on an entire cache line at a time

For example, two unrelated ints that happen to sit on the same cache line and are written by two different CPUs will keep invalidating each other's cached copies; this is the false sharing problem

The solution is to keep the hot field on a cache line of its own. The Disruptor does this by padding before and after its ring-buffer cursor:

 public long p1, p2, p3, p4, p5, p6, p7;  // cache line padding
 private volatile long cursor = INITIAL_CURSOR_VALUE;
 public long p8, p9, p10, p11, p12, p13, p14; // cache line padding
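The same padding trick can be demonstrated directly. The sketch below (all class and field names are hypothetical) sets up two counters that likely share one 64-byte cache line versus two counters separated by padding; two threads hammer each pair. The padded layout is typically several times faster, but exact timings depend on the hardware, so none are asserted.

```java
public class FalseSharingDemo {
    // In Unpadded, a and b very likely share one 64-byte cache line.
    static class Unpadded { volatile long a, b; }

    // In Padded, seven long fields push b onto a different cache line.
    static class Padded {
        volatile long a;
        long p1, p2, p3, p4, p5, p6, p7; // cache line padding
        volatile long b;
    }

    // Runs the two workers on separate threads and returns elapsed nanoseconds.
    static long time(Runnable w1, Runnable w2) {
        Thread t1 = new Thread(w1), t2 = new Thread(w2);
        long start = System.nanoTime();
        t1.start(); t2.start();
        try { t1.join(); t2.join(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int N = 5_000_000;
        Unpadded u = new Unpadded();
        Padded p = new Padded();
        long shared = time(() -> { for (int i = 0; i < N; i++) u.a++; },
                           () -> { for (int i = 0; i < N; i++) u.b++; });
        long padded = time(() -> { for (int i = 0; i < N; i++) p.a++; },
                           () -> { for (int i = 0; i < N; i++) p.b++; });
        System.out.println("shared-line ns: " + shared + ", padded ns: " + padded);
    }
}
```

Each field is written by only one thread, so the counts themselves are deterministic; only the cache-line traffic differs between the two layouts.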

Processor out-of-order execution

CPU cache structure

L3 is a cache shared by all CPU cores

L1 and L2 are private caches inside each core

To improve efficiency, the CPU may start executing later instructions while an earlier instruction is still in flight, as long as the instructions do not depend on each other

Write combining: writes 1, 2, and 3 may be merged and flushed to L1 together. Independent reads may also be reordered: if read 1 and read 2 do not conflict, read 2 may complete before read 1

How to keep order

The memory barrier

x86 CPUs provide memory-barrier primitives: instructions on the two sides of a barrier cannot be reordered across it

sfence: writes issued before the sfence instruction must complete before any write issued after it

lfence: reads issued before the lfence instruction must complete before any read issued after it

mfence: all reads and writes issued before the mfence instruction must complete before any memory access issued after it, ensuring every prior memory access finishes before subsequent ones

The Lock instruction

While a lock-prefixed instruction executes, the memory it operates on cannot be modified by other CPUs
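In the JDK, the atomic classes ultimately rely on such instructions: on x86, HotSpot typically compiles AtomicLong.incrementAndGet() down to a lock xadd. A minimal sketch (class name hypothetical):

```java
import java.util.concurrent.atomic.AtomicLong;

public class LockPrefixDemo {
    // Two threads each perform 100,000 atomic increments; no updates are lost.
    static long count() {
        AtomicLong counter = new AtomicLong();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.incrementAndGet(); // on x86 this is typically a lock xadd
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(count()); // prints 200000
    }
}
```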

The Java memory model (JMM)

The three principles

visibility

Data written by CPU1 must be visible to CPU2: CPU2 knows the data has been modified by CPU1 and must re-read it from main memory

atomic

While CPU1 is modifying some data, other CPUs are not allowed to modify that value at the same time; this is what locking mechanisms provide
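In Java, atomicity for a compound action such as count++ can be obtained with a lock. A minimal sketch (class name hypothetical):

```java
public class SynchronizedCounter {
    private long count;

    // synchronized makes the read-modify-write atomic: no other thread
    // can interleave between the read of count and the write back
    public synchronized void increment() { count++; }

    public synchronized long get() { return count; }

    // Two threads each increment 100,000 times; the lock prevents lost updates.
    static long demo() {
        SynchronizedCounter c = new SynchronizedCounter();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) c.increment();
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 200000
    }
}
```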

sequential

When executing statements, the CPU may optimize the corresponding instructions, for example by reordering instructions that have no dependency on each other. Cache locks and memory barriers are used to ensure the required execution order is not broken by such reordering

Main memory

Main memory is a common area that all CPUs can access. When a CPU reads data from main memory, it copies the data into its private area, the working memory

Working memory

When executing instructions, the CPU first reads data from main memory into its own working memory, and then performs its reads and writes there

Memory barriers in the JMM

There are many ways to understand a memory barrier. I generally think of it as a signal to the CPU that instructions on the two sides of the barrier must not be reordered past it, even when they have no dependency on each other

LoadLoad barrier (between two reads):

  Load1
  LoadLoad
  Load2

Guarantees that Load1 loads its data before Load2 and all subsequent loads.

StoreStore barrier (between two writes):

  Store1
  StoreStore
  Store2

Guarantees that Store1's write is flushed and visible before Store2 and all subsequent writes.

LoadStore barrier (read then write):

  Load1
  LoadStore
  Store1

Guarantees that Load1 finishes loading before Store1 and all subsequent writes.

StoreLoad barrier (write then read):

  Store1
  StoreLoad
  Load1

Guarantees that Store1's write is visible to all processors before Load1 and all subsequent reads.
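Since Java 9 these barriers are exposed directly as static fence methods on VarHandle (fullFence, acquireFence, releaseFence, loadLoadFence, storeStoreFence). A sketch of the publish pattern with an explicit StoreStore fence (class and field names hypothetical; the join supplies the happens-before needed for the final check):

```java
import java.lang.invoke.VarHandle;

public class FenceDemo {
    static int data;
    static boolean ready;

    static boolean publishAndCheck() {
        data = 0; ready = false;        // reset so the method is repeatable
        Thread writer = new Thread(() -> {
            data = 42;                   // Store1
            VarHandle.storeStoreFence(); // data is flushed before ready is set
            ready = true;                // Store2
        });
        writer.start();
        try { writer.join(); }           // join makes both writes visible here
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return ready && data == 42;
    }

    public static void main(String[] args) {
        System.out.println(publishAndCheck()); // prints true
    }
}
```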

volatile

In bytecode, a volatile field carries the access flag ACC_VOLATILE (0x0040)

Volatile is implemented by the JVM

volatile guarantees ordering and visibility of reads and writes, but it does not guarantee atomicity

The ordering of volatile reads and writes is enforced with memory barriers:

  ...            // preceding operations
  volatile read
  LoadLoad       // later ordinary reads cannot be reordered before this volatile read
  LoadStore      // later ordinary writes cannot be reordered before this volatile read
  ...            // following operations

Volatile writes

  ...            // preceding operations
  StoreStore     // earlier ordinary writes must complete before the volatile write
  volatile write
  StoreLoad      // the volatile write must complete before any later read
  ...            // following operations

Guaranteed visibility

Visibility is implemented on top of the MESI protocol with the lock-prefixed instruction: the JVM puts a lock prefix on the volatile write, so as soon as the write happens the data is flushed to main memory and other CPUs' cached copies are invalidated. lock here is a CPU instruction prefix
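A small sketch of the visibility guarantee: a writer publishes data behind a volatile flag, and a reader spinning on that flag is guaranteed to observe everything written before the flag was set (class and field names hypothetical):

```java
public class VolatileVisibilityDemo {
    static int payload;             // ordinary field
    static volatile boolean ready;  // volatile flag guarding the payload

    static int run() {
        payload = 0; ready = false; // reset so the method is repeatable
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) { }      // spin until the writer's volatile store is seen
            seen[0] = payload;      // guaranteed to observe 7: the volatile read of
                                    // ready happens-after the volatile write below
        });
        reader.start();
        payload = 7;                // ordinary write
        ready = true;               // volatile write publishes payload as well
        try { reader.join(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 7
    }
}
```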

happens-before

Happens-before means that the result of the previous operation is visible to subsequent operations. It is a way of expressing the visibility of memory between multiple threads. So we can assume that in the JMM, if the result of one operation needs to be visible to another, there must be a happens-before relationship between the two operations. The two operations can be on the same thread or on different threads
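Two concrete happens-before rules are Thread.start() and Thread.join(): everything done before start() is visible inside the new thread, and everything the thread did is visible after join() returns. A sketch (names hypothetical):

```java
public class HappensBeforeDemo {
    static int before;  // written before start(): visible inside the thread
    static int inside;  // written inside the thread: visible after join()

    static int run() {
        before = 1;
        Thread t = new Thread(() -> {
            inside = before + 1; // start() happens-before the thread body: sees before == 1
        });
        t.start();
        try { t.join(); }        // the thread body happens-before join() returning
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return inside;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2
    }
}
```

Note that neither field is volatile: the visibility here comes entirely from the start/join happens-before edges.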