Computer Principles
Modern CPU instruction speed far outpaces the speed of memory access, so modern computer architectures introduce a cache, with access speed close to the CPU's, as a buffer between the CPU and main memory:
- Copy the data needed for the operation from main memory into the cache
- Read the data from the cache into the CPU for calculation
- Save the CPU's calculation results to the cache
- Write the contents of the cache back to main memory
Problems with the physical computer model
Cache coherence problem
While caches are a good solution to the speed mismatch between fast CPUs and slow memory, they add a new level of complexity: the cache coherence problem.
In a multi-CPU system, each CPU maintains its own cache while all CPUs share the same main memory.
When multiple CPUs operate on the same region of main memory, their caches can become inconsistent with one another.
Modern CPUs also use write buffers to temporarily hold data on its way from the CPU to memory. A write buffer lets the instruction pipeline keep running without the CPU stalling while a slow write to memory completes; it also batches writes and merges repeated writes to the same memory block, reducing wasted memory traffic.
As useful as this looks, asynchronously buffering and merging writes creates a hard problem: the order in which the CPU issues reads and writes is not necessarily the order in which they actually happen in memory.
False sharing
A cache line is the smallest unit of the CPU cache. The CPU does not fetch individual bytes; it fetches whole cache lines. Therefore, when multiple CPUs simultaneously modify different variables that sit in the same cache line, they invalidate each other's cached copies and degrade each other's performance, even though they never touch the same variable.
This can be avoided by data padding, a space-for-time trade-off in which a single piece of data is padded out to occupy an entire cache line, as sketched below.
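A minimal sketch of manual padding, assuming 64-byte cache lines (the common size on x86); the class and field names are illustrative. The filler fields are meaningless and exist only so that the hot field does not share a line with anything else. Note that the JVM may reorder fields, which is why the JDK's own classes rely on the internal @Contended annotation instead; this hand-rolled version is the classic approximation.

```java
public class PaddedCounter {
    long p1, p2, p3, p4, p5, p6, p7; // padding before the hot field
    public volatile long value;      // the frequently written field
    long q1, q2, q3, q4, q5, q6, q7; // padding after the hot field
}
```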
Java Memory Model (JMM)
- Main memory: the memory area shared by all threads
- Working memory: each thread's own private memory area
- Main memory and working memory interact through load and save (store) operations
- None of these memory areas is physical; they are abstractions, just like the virtual machine's runtime data areas
Problems with the JMM
Visibility problem
Threads A and B both do the same thing: read the variable a = 0 from main memory into working memory, add one, and write the result back to working memory. If a = 1 is not flushed to main memory immediately after thread A performs a++, the update is not visible to thread B.
In a multi-threaded environment, when a thread reads a shared variable for the first time, it fetches the variable from main memory and stores it in working memory; afterwards it only reads the working-memory copy. Likewise, when the variable is modified, the new value is written to working memory first and flushed to main memory later. When the latest value reaches main memory is not guaranteed; it is generally very quick, but the exact moment is unknown.
This problem can be solved by declaring the variable with the volatile keyword.
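A minimal sketch of the problem and its fix (the class and field names are illustrative): without volatile on ready, the reader thread may spin forever on a stale working-memory copy; with volatile, the write is flushed to main memory and the reader's copy is invalidated.

```java
public class VisibilityDemo {
    private static volatile boolean ready = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the write becomes visible */ }
            System.out.println("saw ready = true");
        });
        reader.start();
        Thread.sleep(100);
        ready = true; // volatile write: immediately visible to the reader
    }
}
```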
Race condition problem
If threads A and B execute one after the other, the result is a = 2. However, if threads A and B both copy a = 0 into working memory and each performs the ++ operation on its own copy, then both flush a = 1 back to main memory and one increment is lost.
This problem can be solved with synchronized blocks, as sketched below.
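A minimal sketch (the class name is illustrative): the synchronized block makes the read-modify-write of a a single indivisible unit, so two increments always yield a = 2.

```java
public class Counter {
    private int a = 0;

    public void increment() {
        synchronized (this) { // only one thread at a time runs this block
            a++;              // read, add, write happen as one unit
        }
    }

    public synchronized int get() { // the same monitor guards the read
        return a;
    }
}
```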
Reordering
Compilers and processors reorder instructions to make execution more efficient
- Compiler reordering: the compiler may reorder statements as long as single-threaded semantics are preserved
- Instruction-level parallel reordering: the processor may reorder instructions that have no data dependencies
- Memory-system reordering: caches and write buffers make reads and writes appear to happen out of order
Data dependency
If two operations access the same variable and at least one of them is a write, the two operations have a data dependency.
as-if-serial
No matter how the program is reordered, the result of execution in a single thread must not change.
Therefore, two operations with a data dependency are never reordered with each other.
Note that data dependency here refers to two operations on a single processor or within a single thread; it says nothing about operations across multiple processors or threads.
Control dependence
For example
```java
int a = 0;
boolean flag = false;

public void init() {
    a = 1;             // 1
    flag = true;       // 2
}

public void use() {
    if (flag) {        // 3
        int i = a * a; // 4
    }
}
```
In the code above, operation 3 controls whether operation 4 executes: this is a control dependence. If use() observes flag == true while a has not yet been written by init(), the result is wrong.
Looking at the code, operations 1 and 2 have no data dependency within a single thread, so they may be reordered; operations 3 and 4 likewise have no data dependency, so they may be reordered too.
Control dependence limits instruction-level parallelism, so compilers and processors overcome it with speculation: the processor executing use() can read the value of a in advance, compute a * a, and stash the result in a reorder buffer; if the condition at 3 turns out to be true, the result is committed from the buffer. This speculation effectively reorders operations 3 and 4.
Because it doesn’t change the result in a single thread, reordering is allowed
But with multiple threads, if thread A runs init() and thread B runs use(), thread A's write at 1 may not be visible to thread B's read at 4.
Case: operations 1 and 2 are reordered (allowed), or operations 3 and 4 are reordered (also allowed). Thread A sets flag to true, thread B reads the value of a, and thread A has not yet written to a, so thread B computes with the stale a = 0.
So in a multi-threaded program, reordering around a control dependence can produce inconsistent results.
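A minimal sketch of the fix, reusing the example above (the class name is illustrative): declaring flag volatile forbids reordering operation 1 below the volatile write at 2, and operation 4 above the volatile read at 3, so a read at 4 is guaranteed to see a == 1.

```java
class Example {
    int a = 0;
    volatile boolean flag = false;

    public void init() {
        a = 1;       // 1: cannot be reordered below the volatile write
        flag = true; // 2: volatile write
    }

    public void use() {
        if (flag) {        // 3: volatile read
            int i = a * a; // 4: guaranteed to see a == 1
        }
    }
}
```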
Memory barriers
When generating the instruction sequence, the compiler inserts memory barriers at appropriate points to prevent specific kinds of reordering.
Instructions cannot be reordered across a memory barrier, and a barrier forces the write buffer to be flushed to memory.
- LoadLoad (Load1; LoadLoad; Load2): ensures Load1's data is loaded before Load2 and all subsequent load instructions
- StoreStore (Store1; StoreStore; Store2): ensures Store1's data is flushed to memory and visible before Store2 and all subsequent store instructions
- LoadStore (Load1; LoadStore; Store2): ensures Load1's data is loaded before Store2 and all subsequent store instructions are flushed to memory
- StoreLoad (Store1; StoreLoad; Load2): ensures all load and store instructions before the barrier have completed before any instruction after the barrier runs
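Java code does not emit these barriers directly, but since Java 9 the VarHandle class exposes explicit fences that roughly correspond to them. A rough mapping, as an illustration rather than the JMM specification:

```java
import java.lang.invoke.VarHandle;

public class Fences {
    static void examples() {
        VarHandle.loadLoadFence();   // ~ LoadLoad: earlier loads complete before later loads
        VarHandle.storeStoreFence(); // ~ StoreStore: earlier stores visible before later stores
        VarHandle.acquireFence();    // ~ LoadLoad + LoadStore
        VarHandle.releaseFence();    // ~ LoadStore + StoreStore
        VarHandle.fullFence();       // strongest: includes StoreLoad ordering
    }
}
```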
Happens-before
The happens-before rules guarantee correct results across multiple threads:
- Program order rule: every action in a thread happens-before every subsequent action in that thread
- Monitor lock rule: unlocking a lock happens-before every subsequent locking of that same lock
- Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that variable
- Transitivity: if A happens-before B and B happens-before C, then A happens-before C
- Start rule: if thread A starts thread B, then the call to B.start() happens-before every action in thread B
- Join rule: if thread A calls thread B's join() method and it returns successfully, then all operations of thread B happen-before A's return from join()
- Interruption rule: a call to Thread.interrupt() happens-before the point where the interrupted thread's code detects the interrupt
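A minimal sketch of the start and join rules (the class name is illustrative): writes made before t.start() are visible inside t, and writes made inside t are visible after t.join() returns.

```java
public class HappensBeforeDemo {
    static int x;

    public static void main(String[] args) throws InterruptedException {
        x = 1; // start rule: happens-before everything in t
        Thread t = new Thread(() -> {
            System.out.println(x); // guaranteed to print 1
            x = 2;
        });
        t.start();
        t.join(); // join rule: all of t's actions happen-before this return
        System.out.println(x); // guaranteed to print 2
    }
}
```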
volatile
- Visibility: a read of a volatile variable always sees the most recent write to it
- Atomicity: a single read or write of a volatile variable is atomic, but compound operations (such as a++) are not
Memory semantics for volatile
When a volatile variable is written, the JMM flushes the thread's local copy of the shared variable to main memory.
When a volatile variable is read, the JMM invalidates the copy in the thread's working memory and re-reads the variable from main memory.
Why volatile does not guarantee thread safety
Suppose two threads perform i++ on a volatile i that starts at 0. Thread 1 reads i = 0 into the CPU, then its time slice ends and control passes to another thread; because nothing has been written back to memory, no cache is invalidated. Thread 2 now reads i = 0, increments it, and writes i = 1 back to memory. When control returns to thread 1, it increments the 0 it already loaded and also writes i = 1. Two increments have produced i = 1: each individual read and write was visible and atomic, but the read-modify-write sequence as a whole was not.
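A minimal sketch of the lost update, plus the usual fix (the class and field names are illustrative): AtomicInteger performs the read-modify-write atomically via CAS, which volatile alone cannot do.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileCounterDemo {
    static volatile int unsafeCount = 0;
    static final AtomicInteger safeCount = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCount++;               // three steps: lost updates possible
                safeCount.incrementAndGet(); // atomic read-modify-write (CAS)
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // unsafeCount is often < 200000; safeCount is always 200000
        System.out.println(unsafeCount + " vs " + safeCount.get());
    }
}
```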
Reordering rules for volatile
- When the second operation is a volatile write, no reordering is allowed, whatever the first operation is
- When the first operation is a volatile read, no reordering is allowed, whatever the second operation is
- When the first operation is a volatile write and the second operation is a volatile read, no reordering is allowed
The barriers inserted to enforce these rules:
- Before each volatile write: a StoreStore barrier
- After each volatile write: a StoreLoad barrier
- After each volatile read: a LoadLoad barrier
- After each volatile read: a LoadStore barrier
synchronized
Memory semantics of locks
When the lock is released, the shared variables in working memory are flushed to main memory.
When the lock is acquired, the shared variables in working memory are invalidated and re-read from main memory.
Implementation principle
In the JVM, synchronized is implemented by entering and exiting a monitor object; both method synchronization and code-block synchronization rest on this, although the implementation details differ. Code blocks are implemented with paired monitorenter and monitorexit instructions.
For a synchronized block, a monitorenter instruction is inserted at the start of the block. When execution reaches monitorenter, the thread attempts to acquire ownership of the object's monitor, that is, the object's lock; monitorexit is inserted at the end of the block and at every exception exit. The JVM guarantees that every monitorenter has a matching monitorexit.
For a synchronized method, decompilation shows no monitorenter or monitorexit instructions. Instead, the method carries the ACC_SYNCHRONIZED access flag. When the method is invoked, the invocation instruction checks whether ACC_SYNCHRONIZED is set; if so, the executing thread acquires the monitor before running the method body and releases it when the method completes. While the method is executing, no other thread can acquire the same monitor object.
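A minimal class to see this yourself (the class name is illustrative): compile it, then run `javap -c` to see the monitorenter/monitorexit pair in blockSync(), and `javap -v` to see the ACC_SYNCHRONIZED flag on methodSync().

```java
public class SyncDemo {
    private int n;

    public void blockSync() {
        synchronized (this) { // compiles to monitorenter ... monitorexit
            n++;
        }
    }

    public synchronized void methodSync() { // marked ACC_SYNCHRONIZED, no monitor instructions
        n++;
    }
}
```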
Lock states
- Unlocked
- Biased lock
- Lightweight lock
- Heavyweight lock
Locks can be upgraded but not degraded, to improve the efficiency of acquiring and releasing locks.
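The lock state is recorded in the lock flag bits of the object header's Mark Word. A hedged sketch for observing it, assuming the OpenJDK JOL tool (org.openjdk.jol:jol-core) is on the classpath; which states actually appear depends on the JVM version and flags (biased locking, for instance, is disabled by default in recent JDKs).

```java
import org.openjdk.jol.info.ClassLayout;

public class LockStateDemo {
    public static void main(String[] args) {
        Object lock = new Object();
        // prints the object header, including the lock flag bits
        System.out.println(ClassLayout.parseInstance(lock).toPrintable());
        synchronized (lock) {
            // while held, the header typically shows a thin (lightweight) lock
            System.out.println(ClassLayout.parseInstance(lock).toPrintable());
        }
    }
}
```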
Biased locking
Biased lock acquisition
1. Check whether the Mark Word's biased-lock bit is 1 and the lock flag bits are 01, i.e., whether the object is in the biasable state
2. If it is biasable, test whether the thread ID in the Mark Word points to the current thread; if so, go to step 5, otherwise go to step 3
3. Since the thread ID does not point to the current thread, use CAS to try to set it to the current thread; if the CAS succeeds, go to step 5, otherwise go to step 4
4. The CAS failure means there is contention: when the global safe point is reached, the thread holding the biased lock is suspended and the biased lock is upgraded to a lightweight lock
5. Execute the synchronized code
Biased lock release
A thread never actively releases a biased lock. Revocation waits until a global safe point is reached, at which point the biased lock either reverts to the unlocked state or is upgraded to a lightweight lock.
Lightweight lock
Lightweight lock acquisition
1. If the synchronization object is unlocked and not biasable, the VM first creates a space called a lock record in the current thread's stack frame, used to store a copy of the lock object's current Mark Word, officially called the Displaced Mark Word.
2. The VM copies the object's Mark Word into the lock record.
3. After the copy succeeds, the VM uses a CAS operation to try to update the object's Mark Word to a pointer to the lock record, and to set the owner pointer in the lock record to the object's Mark Word. If the update succeeds, go to step 4; otherwise, go to step 5.
4. The thread now owns the lock, and the object's Mark Word lock flag bits are set to 00, indicating the object is in the lightweight-locked state.
5. If the update fails, the VM first checks whether the object's Mark Word points into the current thread's stack frame. If it does, the current thread already owns the lock and can simply enter the synchronized block and continue. Otherwise, multiple threads are competing for the lock: when a competing thread's repeated attempts to take the lightweight lock fail, the lightweight lock inflates into a heavyweight lock. The lock flag bits change to 10, the Mark Word stores a pointer to the heavyweight lock (a mutex), and threads waiting for the lock block until the holder releases it and wakes them.