CPU-level caching
We know that compiled code becomes instructions that execute on the CPU, and those instructions read and write data in main memory. Because the CPU executes instructions far faster than it can read and write main memory, CPU designers added a cache: data is copied from main memory into the cache, operated on there, and written back to main memory at certain points after it changes. As shown in the diagram below:
This design improves the efficiency of instruction execution, but it also introduces a data-synchronization problem when multiple CPUs (threads) cooperate.
For example, suppose CPU1 and CPU2 both cache the same value from main memory. Each modifies its own cached copy and then writes it back; whichever write lands last overwrites the other, so one CPU's computation is lost.
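The lost update described above can be sketched deterministically in Java by simulating each CPU's cache as a local variable holding a stale copy of main memory (a single-threaded illustrative sketch, not real hardware behavior; all names are invented for the example):

```java
// Single-threaded simulation of the lost update: each "CPU cache" is just a
// local variable that copies main memory, modifies the copy, and writes back.
public class LostUpdate {
    static int mainMemory = 0;

    public static void main(String[] args) {
        int cpu1Cache = mainMemory; // CPU1 caches 0
        int cpu2Cache = mainMemory; // CPU2 caches the same stale 0
        cpu1Cache += 1;             // CPU1 computes 1 in its cache
        cpu2Cache += 1;             // CPU2 also computes 1 in its cache
        mainMemory = cpu1Cache;     // CPU1 writes back 1
        mainMemory = cpu2Cache;     // CPU2 overwrites with 1: one update is lost
        System.out.println(mainMemory); // prints 1, not 2
    }
}
```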
To solve this problem, the CPU layer provides two mechanisms: the bus lock and the cache lock.
- Bus locks are crude and inefficient. When a CPU accesses data in main memory, it asserts a LOCK# signal on the bus (the communication channel between the CPUs and main memory), preventing other CPUs from accessing main memory at all.
- Cache locks are based on the cache coherence protocol. When one CPU writes data back to main memory, copies of that data cached by other processors are invalidated.
Cache Consistency Protocol (MESI)
MESI names the four states a cache line can be in:
- M (Modified): the data is cached only in the current CPU and has been modified, so it differs from main memory.
- E (Exclusive): the data is cached only in the current CPU and has not been modified.
- S (Shared): the data is cached in multiple CPUs, and each cached copy is consistent with main memory.
- I (Invalid): the cache line is invalid.
In the MESI protocol, each CPU not only knows about its own reads and writes but also snoops on the reads and writes of other CPUs (the sniffing mechanism), following these rules:
- CPU read request: the cache can serve reads in the M, E, and S states. In the I state, the CPU must read the data from main memory.
- CPU write request: the cache can be written directly only in the M or E state. To write in the S state, the corresponding cache lines in other CPUs must first be set to invalid.
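As a rough illustration, the read/write rules above can be modeled as a toy state machine for a single cache line. The class, enum, and method names below are invented for the sketch; a real coherence protocol runs in hardware, not in Java:

```java
// Toy model of MESI for one cache line (illustrative only).
public class MesiLine {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    State state = State.INVALID;

    // M, E, and S can serve reads from the cache; I must go to main memory.
    boolean canReadFromCache() {
        return state != State.INVALID;
    }

    // Writes hit the cache directly only in M or E; in S the copies held by
    // other CPUs must first be invalidated, and in I the line must be reloaded.
    boolean canWriteDirectly() {
        return state == State.MODIFIED || state == State.EXCLUSIVE;
    }

    // Snooping: another CPU wrote this line, so our copy becomes invalid.
    void snoopRemoteWrite() {
        state = State.INVALID;
    }

    public static void main(String[] args) {
        MesiLine line = new MesiLine();
        line.state = State.EXCLUSIVE;
        System.out.println(line.canWriteDirectly()); // true: E allows a direct write
        line.snoopRemoteWrite();
        System.out.println(line.canReadFromCache()); // false: I forces a memory read
    }
}
```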
Memory barriers
MESI solves cache coherence, but it is still inefficient: for example, the current CPU must wait until it receives acknowledgements that all other CPUs have invalidated their copies before it can continue. To avoid blocking during this wait, store buffers were introduced.
Before writing to main memory, the CPU writes the data into its store buffer, sends the invalidate message, and then continues processing other instructions (without blocking) until it receives acknowledgements that all other CPUs have invalidated their caches. The data in the store buffer is then written into the cache line, and finally synchronized from the cache line to main memory. However, this out-of-order execution can cause visibility problems, which is why memory barriers were introduced.
Memory barriers include Load barriers and Store barriers. Based on their combination, there are four types of memory barriers:
- LoadLoad barrier: for a sequence Load1; LoadLoad; Load2, ensures that the Load1 read completes before Load2 and all subsequent reads.
- StoreStore barrier: for a sequence Store1; StoreStore; Store2, ensures that the Store1 write is visible before Store2 and all subsequent writes.
- LoadStore barrier: for a sequence Load1; LoadStore; Store2, ensures that the Load1 read completes before Store2 and all subsequent writes.
- StoreLoad barrier: for a sequence Store1; StoreLoad; Load2, ensures that the Store1 write is visible before Load2 and all subsequent reads.
Note: the reads and writes above refer to reads from and writes to main memory.
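On the JVM, these barriers are exposed (since Java 9) through the static fence methods on java.lang.invoke.VarHandle. A minimal single-threaded sketch of how a StoreStore/LoadLoad pairing maps onto them (the class and field names are invented for the example):

```java
import java.lang.invoke.VarHandle;

public class FenceDemo {
    static int data = 0;
    static boolean ready = false;

    static void writer() {
        data = 42;                   // Store1
        VarHandle.storeStoreFence(); // Store1 is visible before Store2 below
        ready = true;                // Store2
    }

    static boolean reader() {
        boolean r = ready;           // Load1
        VarHandle.loadLoadFence();   // Load1 completes before Load2 below
        int d = data;                // Load2
        return r && d == 42;
    }

    public static void main(String[] args) {
        writer();
        System.out.println(reader()); // true in this single-threaded run
    }
}
```

In a concurrent setting, the pair of fences guarantees that a reader which observes `ready == true` also observes `data == 42`.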
The internals of volatile
While volatile is known to guarantee visibility and forbid reordering, internally it relies on MESI and memory barriers.
A volatile access compiles down to an instruction preceded by a lock prefix. Initially this meant a bus lock; because bus locks are too expensive, cache locks were adopted later.
- The Intel manual explains the lock prefix as follows: it ensures that a read-modify-write operation on memory is performed atomically. On the Pentium and earlier processors, an instruction with a lock prefix locked the bus during execution, temporarily preventing other processors from accessing memory through the bus, which is obviously expensive. Starting with the Pentium 4, Intel Xeon, and P6 processors, Intel significantly optimized this: if the memory area accessed by the lock-prefixed instruction is already cached in the processor (that is, the cache line containing it is in the exclusive or modified state) and the area fits entirely within a single cache line, the processor executes the instruction directly. Because that cache line stays locked for the duration of the instruction, other processors cannot read or write the memory area it touches, which preserves atomicity. This is called cache locking, and it greatly reduces the overhead of lock-prefixed instructions; however, the bus is still locked when there is heavy contention among processors, or when the instruction accesses a memory address that is not aligned within a single cache line.
- It forbids reordering this instruction with earlier and later read and write instructions.
- Flushes all data in the write buffer into memory.
Memory semantics of volatile
- A volatile read is followed by a LoadLoad barrier and a LoadStore barrier.
- A volatile write is preceded by a StoreStore barrier and followed by a StoreLoad barrier.
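A minimal sketch of these semantics in action (the field names are invented for the example): the StoreStore barrier before the volatile write prevents the ordinary write to payload from being reordered past it, so a reader that observes the flag also observes the payload.

```java
public class VolatileFlag {
    static volatile boolean ready = false; // volatile write/read carries the barriers
    static int payload = 0;                // ordinary variable

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            payload = 42;  // ordinary write...
            ready = true;  // ...cannot be reordered after this volatile write
        });
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the volatile write becomes visible */ }
            System.out.println(payload); // guaranteed to print 42
        });
        reader.start();
        writer.start();
        writer.join();
        reader.join();
    }
}
```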
Java Memory Model (JMM)
Relationship between the JMM and the hardware layer
Eight operations for the Java memory model
The Java memory model defines the following eight operations to accomplish the specific interaction protocol between main memory and working memory, i.e. how a variable is copied from main memory to working memory and how it is synchronized from working memory to main memory:
- Lock: acts on a main-memory variable; marks the variable as exclusively occupied by one thread.
- Unlock: acts on a main-memory variable; releases a locked variable so that other threads can lock it.
- Read: acts on a main-memory variable; transfers the variable's value from main memory into the thread's working memory for the subsequent load action.
- Load: acts on a working-memory variable; puts the value obtained by the read operation into the working-memory copy of the variable.
- Use: acts on a working-memory variable; passes the value of the variable in working memory to the execution engine. It is performed whenever the virtual machine encounters a bytecode instruction that needs the variable's value.
- Assign: acts on a working-memory variable; assigns a value received from the execution engine to the variable in working memory. It is performed whenever the virtual machine encounters a bytecode instruction that assigns to the variable.
- Store: acts on a working-memory variable; transfers the variable's value from working memory to main memory for the subsequent write operation.
- Write: acts on a main-memory variable; puts the value obtained by the store operation into the main-memory variable.
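The eight operations can be sketched as a toy model, with two maps standing in for main memory and one thread's working memory. This is purely illustrative; a real JVM does not expose these operations, and all names below are invented:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the JMM's eight operations for one variable.
public class JmmModel {
    static Map<String, Integer> mainMemory = new HashMap<>();
    static Map<String, Integer> workingMemory = new HashMap<>();

    static void readLoad(String var) {                   // read + load
        int v = mainMemory.get(var);                     // read: fetch from main memory
        workingMemory.put(var, v);                       // load: place into working copy
    }
    static int use(String var) {                         // use: hand value to engine
        return workingMemory.get(var);
    }
    static void assign(String var, int v) {              // assign: engine -> working copy
        workingMemory.put(var, v);
    }
    static void storeWrite(String var) {                 // store + write
        mainMemory.put(var, workingMemory.get(var));     // back to main memory
    }

    public static void main(String[] args) {
        mainMemory.put("i", 0);
        readLoad("i");
        assign("i", use("i") + 1);  // i++ performed on the working copy
        storeWrite("i");
        System.out.println(mainMemory.get("i")); // 1
    }
}
```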
The Java memory model also specifies that the following rules must be met when performing the eight basic operations described above:
- To copy a variable from main memory to working memory, read and load must be performed in order; to synchronize a variable from working memory back to main memory, store and write must be performed in order. The Java memory model only requires that these pairs be performed in order, not that they be performed consecutively.
- Read and load, and store and write, must appear in pairs: neither operation of a pair is allowed to appear alone.
- A thread is not allowed to discard its most recent assign operation; that is, a variable changed in working memory must be synchronized back to main memory.
- A thread is not allowed to synchronize data from working memory back to main memory for no reason (i.e., without any assign having occurred).
- A new variable can only be created in main memory; working memory may not use a variable that has not been initialized (by load or assign). In other words, a load or assign must be performed on a variable before use or store can be performed on it.
- Only one thread may lock a variable at a time, but the same thread may lock it repeatedly. After multiple lock operations, the variable is unlocked only after the same number of unlock operations. Lock and unlock must be paired.
- Locking a variable clears its value from working memory; before the execution engine can use it, a load or assign must re-initialize the value.
- A variable that has not been locked may not be unlocked, and a thread may not unlock a variable locked by another thread.
- Before unlocking a variable, a thread must synchronize it back to main memory (perform store and write).
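These lock/unlock rules are what synchronized relies on: entering the block performs lock (the working copy is invalidated and re-read) and exiting performs unlock (store and write back to main memory), which is why the counter below always ends at 20000 (a minimal sketch; names are invented):

```java
public class SyncCounter {
    static int count = 0;
    static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int n = 0; n < 10_000; n++) {
                synchronized (lock) { // lock: working copy invalidated, re-read
                    count++;
                }                     // unlock: store + write back to main memory
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(count); // always 20000
    }
}
```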
Atomicity, visibility, and ordering
- Atomicity
An atomic operation cannot be interrupted: once started, it runs to completion without switching to another thread.
Here is an example:
```java
i = 2; j = i; i++; i = i + 1;
```
i = 2 has only one step, "assign to i", so it is atomic. j = i has two steps, "read the value of i" and "assign it to j", so it is not atomic. i++ and i = i + 1 are equivalent and consist of three steps, "read the value of i", "add 1", and "assign the result to i", so they are not atomic.
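Because i++ is not atomic, java.util.concurrent.atomic.AtomicInteger provides an atomic equivalent: incrementAndGet() performs the read-add-assign as one indivisible operation, so no increments are lost even under contention. A minimal sketch:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger i = new AtomicInteger(0);
        Runnable task = () -> {
            for (int n = 0; n < 10_000; n++) {
                i.incrementAndGet(); // atomic i++
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(i.get()); // always 20000, no lost updates
    }
}
```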
- Visibility
A change to a variable is visible when it is immediately synchronized to main memory, so that other threads reading the variable get the new value from main memory. With ordinary variables this breaks down under multithreading: suppose i starts at 0 and two threads each perform i++. As analyzed above, i++ takes three steps, so thread 1 can fetch i from main memory and the scheduler can switch to thread 2 before thread 1's +1 result is synchronized back; thread 2 then also fetches the stale i and adds 1, and one increment is lost:
```java
Thread thread1 = new Thread(new Runnable() {
    @Override
    public void run() {
        i++;
    }
});

Thread thread2 = new Thread(new Runnable() {
    @Override
    public void run() {
        i++;
    }
});
```
- Ordering
The virtual machine allows the compiler and processor to reorder instructions, but they must respect data dependencies: two operations with a data dependency on each other are never reordered. This is the as-if-serial semantics, which makes execution appear serial to the developer. In addition, there are the happens-before rules:
Specific rules:
- Rule of program order: Every action in a thread happens before any subsequent action in that thread.
- Monitor lock rule: The unlocking of a lock happens before the subsequent locking of the lock.
- The volatile variable rule: a write to a volatile field happens before any subsequent reads to that field.
- Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.
- The start() rule: if thread A performs ThreadB.start() (which starts ThreadB), then A's ThreadB.start() action happens-before every action in ThreadB.
- The join() rule: if thread A performs ThreadB.join() and it returns successfully, then every action in ThreadB happens-before A's successful return from ThreadB.join().
- The program interruption rule: a call to the interrupt() method happens-before the interrupted thread's code detects the interrupt (e.g. via Thread.interrupted()).
- The object finalizer rule: the completion of an object's initialization (the end of its constructor) happens-before the start of its finalize() method.
In the snippet below, if thread 1's writes are reordered so that isInit = true is set before the Printer object is fully constructed, the scheduler can switch to thread 2, which sees isInit == true, calls mPrinter.print(), and hits a null pointer:
```java
Thread thread1 = new Thread(new Runnable() {
    @Override
    public void run() {
        mPrinter = new Printer();
        isInit = true;
    }
});

Thread thread2 = new Thread(new Runnable() {
    @Override
    public void run() {
        if (isInit) {
            mPrinter.print();
        }
    }
});
```
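A sketch of one possible fix: declaring isInit volatile forbids reordering the Printer construction past the flag write (the StoreStore barrier before the volatile write), and the volatile variable rule makes the write happen-before the read. The Printer class below is a stand-in invented for the sketch, and the spin-wait replaces the original `if` so the example terminates deterministically:

```java
public class SafeInit {
    // Stand-in for the Printer class from the text (hypothetical).
    static class Printer {
        void print() { System.out.println("printed"); }
    }

    static Printer mPrinter;
    static volatile boolean isInit = false; // volatile forbids the reordering

    public static void main(String[] args) throws InterruptedException {
        Thread thread1 = new Thread(() -> {
            mPrinter = new Printer(); // cannot be reordered past the line below
            isInit = true;            // volatile write
        });
        Thread thread2 = new Thread(() -> {
            while (!isInit) { }       // spin until the volatile write is visible
            mPrinter.print();         // mPrinter is guaranteed non-null here
        });
        thread1.start();
        thread2.start();
        thread1.join();
        thread2.join();
    }
}
```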