CPU three-level cache architecture and the Java memory model
CPU three-level cache structure
The following uses a Windows machine's CPU (Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz) as an example.
This CPU has 4 cores and 8 logical processors; each core has its own L1 and L2 caches, while the L3 cache is shared by all cores.
The three-level cache can be abstracted as shown below.
Given the following main function, how does the CPU execute it?
public static void main(String[] args) {
    int a = 1;
    a = a + 1;
    System.out.println(a);
}
We all know that when the JVM runs, it stores our variables and method data in memory, and that the CPU must copy data into its registers before it can operate on it. So how does the CPU get the data?
The CPU first looks for the variable a in the L1 cache. If it is not there, it queries the L2 cache, and then the L3 cache. If the variable is not in any cache level, the CPU fetches the data from main memory over the bus and loads it, level by level, into the cache lines of the CPU caches. Finally, the value is computed in a register and the result is assigned back.
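The effect of this cache hierarchy is observable from Java. The sketch below (my own illustration, not from the original example) sums the same array twice: once sequentially, which is cache-friendly, and once with a 16-int stride, so that almost every access lands on a different 64-byte cache line. On typical hardware the strided pass is noticeably slower, though exact timings vary by machine.

```java
// Illustration of cache-line effects: same work, different access patterns.
public class CacheDemo {
    public static void main(String[] args) {
        int[] data = new int[1 << 23]; // 32 MB of ints, larger than a typical L3
        java.util.Arrays.fill(data, 1);

        long t0 = System.nanoTime();
        long seq = 0;
        for (int i = 0; i < data.length; i++) {
            seq += data[i];            // consecutive addresses: mostly cache hits
        }
        long seqNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        long strided = 0;
        for (int j = 0; j < 16; j++) {
            // stride of 16 ints = 64 bytes = one cache line per access
            for (int i = j; i < data.length; i += 16) {
                strided += data[i];
            }
        }
        long stridedNs = System.nanoTime() - t0;

        System.out.println("sums equal: " + (seq == strided));
        System.out.println("sequential ns: " + seqNs + ", strided ns: " + stridedNs);
    }
}
```

Both passes visit every element exactly once, so the sums must match; only the order of memory accesses differs.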
Java memory model
The JMM (Java Memory Model) is an abstraction defined by a set of rules and specifications. It governs how a program accesses variables, including instance fields, static fields, and the elements of array objects.
Suppose the following code exists:
public static boolean flag = true;

public static void main(String[] args) {
    Thread thread_A = new Thread(new Runnable() {
        @Override
        public void run() {
            while (flag) {}
            System.out.println("Flag changed status, recognized by thread A");
        }
    });
    thread_A.start();

    Thread thread_B = new Thread(new Runnable() {
        @Override
        public void run() {
            if (flag) {
                flag = false;
                System.out.println("Flag has been modified");
            }
        }
    });
    thread_B.start();
}
After the main function starts, threads A and B are started as well.
Thread A performs a read of the flag attribute from main memory across the bus, then loads it into its thread-private working memory and uses it in thread A's while condition. Because the condition stays true, thread A waits in the loop.
Thread B likewise reads the flag attribute from main memory, loads it into its private working memory, and changes it to false on thread B. It then assigns the new value to working memory, copies it toward main memory via store, and finally writes it into main memory.
The eight atomic operations of JMM data access
- Lock: acts on a main-memory variable; marks the variable as exclusively owned by one thread
- Unlock: acts on a main-memory variable; releases a locked variable so that it can be locked by another thread
- Read: acts on a main-memory variable; transfers the variable's value from main memory into the thread's working memory for the subsequent load action
- Load: acts on a working-memory variable; puts the value obtained by the read operation into the working-memory copy of the variable
- Use: acts on a working-memory variable; passes the value of the variable in working memory to the execution engine
- Assign: acts on a working-memory variable; assigns a value received from the execution engine to the variable in working memory
- Store: acts on a working-memory variable; transfers the value of the variable in working memory toward main memory for the subsequent write operation
- Write: acts on a main-memory variable; puts the value transferred by the store operation into the variable in main memory
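As a rough illustration of how these operations map onto real code, here is the earlier a = a + 1 example traced step by step. The comments are my own annotation, following the JMM's conceptual model rather than actual bytecode.

```java
// Hypothetical trace: where each JMM operation conceptually occurs for a = a + 1.
public class JmmTrace {
    static int a = 1;              // the shared variable lives in main memory

    public static void main(String[] args) {
        // read  : transfer a's value from main memory toward working memory
        // load  : place that value into the working-memory copy of a
        // use   : hand the copy to the execution engine for the + 1
        int tmp = a + 1;
        // assign: put the engine's result into the working-memory copy of a
        // store : transfer the copy's value toward main memory
        // write : put the transferred value into a in main memory
        a = tmp;
        System.out.println(a);
    }
}
```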
Visibility, atomicity, and ordering problems under concurrency
Visibility
Visibility refers to whether, when one thread changes the value of a shared variable, other threads can immediately see the changed value.
There are no visibility issues in serially executed code: after any operation changes a variable's value, subsequent operations that read the variable always see the new, changed value.
This is not necessarily the case in a multi-threaded environment, where, according to the JMM, each thread operates only on the data in its private working memory. For example, after Thread_B changes the flag attribute and writes it back to shared memory, Thread_A does not notice the change, because Thread_A still holds a copy of flag in its private working memory that it considers valid. Thread_A does not know that another thread has changed flag; in other words, changes to flag are not visible across threads.
How to resolve visibility issues: the volatile keyword guarantees visibility. When a shared variable is declared volatile, thanks to the CPU's multi-level cache coherence protocol, the flag modified by Thread_B is immediately written to shared memory, and Thread_A is notified to mark the flag in its private working memory as invalid. Thread_A then re-reads the data from memory.
public static volatile boolean flag = true;
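A complete, runnable version of the fix can be sketched as follows (the 100 ms sleep and the 2-second join timeout are arbitrary choices for the demo, not part of the original article):

```java
// Visibility demo: the reader thread spins until the writer's change becomes visible.
public class VolatileVisibility {
    // Without volatile, the reader's while loop may never observe the change
    // and this demo can spin forever.
    static volatile boolean flag = true;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (flag) {}   // busy-wait on the shared variable
            System.out.println("Flag changed status, recognized by thread A");
        });
        reader.start();

        Thread.sleep(100);    // give the reader time to start spinning
        flag = false;         // volatile write: published to shared memory immediately
        reader.join(2000);    // with volatile, the reader terminates promptly
        System.out.println("reader alive: " + reader.isAlive());
    }
}
```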
Atomicity
Atomicity means that an operation is uninterruptible: even in a multi-threaded environment, once an operation has started, it will not be interfered with by other threads.
In Java, reads and writes of primitive data types are atomic (exception: on 32-bit systems, long and double are not atomic. A 32-bit system reads and writes 32 bits at a time, so a single 32-bit access is atomic; but a long/double is 64 bits wide, so it is possible for a thread to read or write only one half of the value while another thread reads or writes the other half).
public static volatile int num = 0;
public static void main(String[] args) throws InterruptedException {
    for (int i = 0; i < 20; i++) {
        Thread thread = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 1000; i++) {
                    num++;
                }
            }
        });
        thread.start();
    }
    Thread.sleep(1000);
    System.out.println(num);
}
With volatile solving the visibility problem, num should in theory print 20000, but in practice we often get a value less than 20000.
The reason is that num++ is really num = num + 1, and this breaks down into several steps: read the current value of num, compute num + 1, and assign the result back to num. This read-modify-write sequence is not atomic: two threads can both read the same value of num, both compute num + 1, and both write back the same result, so one of the two increments is lost.
Atomicity can be ensured with synchronized or Lock, which guarantee that only one thread executes the code block at a time, so no atomicity problem can arise.
synchronized (this) {
    num++;
}
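Besides synchronized/Lock, the JDK's java.util.concurrent.atomic package solves this particular counter case with lock-free CAS operations. Here is a sketch of the earlier example rewritten with AtomicInteger, joining the threads instead of sleeping, which is more reliable than the Thread.sleep(1000) in the original:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Counter demo: incrementAndGet() performs the read-modify-write atomically via CAS.
public class AtomicCounter {
    static final AtomicInteger num = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[20];
        for (int t = 0; t < 20; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    num.incrementAndGet(); // atomic ++, no lost updates
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();                      // wait for all workers to finish
        }
        System.out.println(num.get());
    }
}
```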
Ordering
For a single thread, we think of code as executing in program order. For multiple threads, however, execution can appear out of order: after the program is compiled into machine instructions, the instructions may be rearranged and may not run in the same order as the original statements.
How to solve the ordering problem: a degree of "ordering" can be guaranteed with the volatile keyword, which prohibits instruction reordering.
Volatile
Introduction
volatile is the lightest-weight synchronization mechanism provided by the Java virtual machine. When a volatile variable is modified, every thread holding a copy of it is notified to fetch the latest data, preventing "dirty reads" and ensuring the visibility of the shared variable.
How is that achieved?
If you disassemble the JIT-compiled code with HSDIS, you can see that writes to volatile shared variables are emitted with a lock prefix. According to the IA-32 Architecture Software Developer's Manual, a lock-prefixed instruction locks the bus, ensuring that the modification of the volatile shared variable is atomic.
The original text of the IA-32 Architecture Software Developer’s Manual
Early processors always asserted a bus lock whenever they executed a lock-prefixed instruction. Later processors introduced the cache lock, which relies on the CPU multi-level cache coherence protocol (MESI): if the CPU supports a cache coherence protocol and the modified data occupies only a single cache line, a cache lock is triggered instead of the far more expensive bus lock.
The four states of the CPU multi-level cache coherence protocol (MESI)
- M (Modified): the shared variable in this cache line has been modified.
- E (Exclusive): the shared variable is held by only one core's cache.
- S (Shared): the shared variable is held by the caches of multiple cores.
- I (Invalid): the shared variable in the current cache line is invalid.
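To make the state transitions concrete, here is a toy single-cache-line model of MESI. This is purely illustrative: the real protocol runs in hardware, and the bus messages that drive it are reduced here to three hypothetical callback methods.

```java
// Toy model of MESI state transitions for one core's copy of a cache line.
public class MesiDemo {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    // State after this core reads the line.
    static State onLocalRead(State s, boolean otherCoreHasCopy) {
        if (s == State.INVALID) {                    // miss: fetch the line
            return otherCoreHasCopy ? State.SHARED : State.EXCLUSIVE;
        }
        return s;                                    // M/E/S hit: state unchanged
    }

    // State after this core writes the line (a shared line also invalidates peers).
    static State onLocalWrite(State s) {
        return State.MODIFIED;
    }

    // State after another core's write to the same line is sniffed on the bus.
    static State onRemoteWrite(State s) {
        return State.INVALID;
    }

    public static void main(String[] args) {
        State a = onLocalRead(State.INVALID, false); // core A loads flag alone -> EXCLUSIVE
        State b = onLocalRead(State.INVALID, true);  // core B loads it too -> SHARED
        a = State.SHARED;                            // A's copy is downgraded to SHARED
        b = onLocalWrite(b);                         // B writes flag = false -> MODIFIED
        a = onRemoteWrite(a);                        // A sniffs the write -> INVALID
        System.out.println(a + " " + b);
    }
}
```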
The workflow of Volatile
Suppose the following code exists
public static volatile boolean flag = true;

public static void main(String[] args) {
    Thread thread_A = new Thread(new Runnable() {
        @Override
        public void run() {
            while (flag) {}
            System.out.println("Flag changed status, recognized by thread A");
        }
    });
    thread_A.start();

    Thread thread_B = new Thread(new Runnable() {
        @Override
        public void run() {
            if (flag) {
                flag = false;
                System.out.println("Flag has been modified");
            }
        }
    });
    thread_B.start();
}
The following figure shows the state of the flag variable on both CPUs and how it changes. A brief explanation:
- After the main function starts, thread A starts as well and loads the flag variable from main memory into its cache. Since thread B is not yet running at this point, flag's cache-line state is E (exclusive). while (flag) is always true, so thread A enters the spin loop.
- Thread B then starts and loads a copy of the flag variable from main memory into its own cache. flag's state becomes S (shared), and the copy in thread A's working memory is also set to S.
- Thread B loads flag from the cache line into a register and modifies it (flag = false), changing the state of its copy to M (modified). The new value is placed in the store buffer (so the CPU is free to schedule other work) while thread B waits for the "invalid" acknowledgements.
- At the same time, thread B issues a local-write message on the bus. Thread A's bus-sniffing mechanism detects thread B's local write to flag, sets the flag state in its own cache line to I (invalid), and responds on the bus with an "invalidated" acknowledgement.
- Thread B sniffs thread A's "invalidated" acknowledgement and writes the flag value from the store buffer back to shared memory. Finally, thread A reads flag again from main memory.
Now that you’ve read this, you should understand why shared variables that are volatile can guarantee visibility.
Are volatile shared variables also atomic for read-modify-write operations?
Of course not! Let me give you an example, using the same code as before:
public static volatile int num = 0;
public static void main(String[] args) throws InterruptedException {
    for (int i = 0; i < 20; i++) {
        Thread thread = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 1000; i++) {
                    num++;
                }
            }
        });
        thread.start();
    }
    Thread.sleep(1000);
    System.out.println(num);
}
According to the cache coherence protocol, when num++ executes, the other CPUs receive the local-write notification from the CPU that triggered the cache lock, set num in their own caches to invalid, and re-read it from shared memory. So far that doesn't seem to be a problem.
However, there is a special case: a CPU (say, cpu_X) may have already loaded num into a register before it receives the local-write notification from the CPU that triggered the cache lock. The cache coherence protocol invalidates num in cpu_X's cache, but it does not invalidate the value already sitting in the register. Now the CPU that triggered the cache lock executes num++, and cpu_X also executes num++ on the same starting value, so both write back the same result to memory. One increment is lost. Every time this collision happens, the effective number of increments drops by one, which is why the final num is less than 20000.
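This lost update can be replayed deterministically in a single thread. The sketch below is a hypothetical trace of the interleaving described above, with the two CPUs' registers modeled as local variables:

```java
// Deterministic replay of the lost-update interleaving (illustration only).
public class LostUpdate {
    public static void main(String[] args) {
        int num = 0;
        int regX = num;          // cpu_X loads num into a register...
        int regOther = num;      // ...and so does the other CPU, before any write
        regX = regX + 1;         // both increment their (now stale) copies
        regOther = regOther + 1;
        num = regX;              // first write-back: num = 1
        num = regOther;          // second write-back overwrites it with the same 1
        System.out.println(num); // one of the two increments is lost
    }
}
```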
Writing all of this up was not easy. If it helped you, please leave a like; I would be extremely grateful!