Read the following with purpose, keeping the key knowledge points in mind!

A thousand words of preamble omitted…


At this point, Xiao Huang sat nervously in front of the interviewer, watching him flip back and forth through the resume, bracing himself for the coming storm.


Then the interviewer raised his head, eyes blazing, looked at Xiao Huang, and smiled.


Interviewer: Do you use the volatile keyword in your day-to-day projects?


Huang: Yes, to ensure the visibility of shared variables in a multi-threaded environment.


Interviewer: Good. What do you think visibility is?


Huang: In a multi-threaded program, reads and writes happen on different threads, and the reading thread may not see the latest value written by the writing thread in time. That is the visibility problem.
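To make the visibility problem concrete, here is a minimal sketch (the class and field names are invented for illustration): without volatile, the reader thread may spin forever because it never observes the main thread's write.

```java
public class VisibilityDemo {
    // Plain field: the reader thread is not guaranteed to see updates in time.
    // Declaring it as "volatile boolean stop" would make the write visible.
    private static boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) {
                // busy-wait: the JIT may keep reusing a cached value of 'stop'
            }
            System.out.println("reader finally saw stop == true");
        });
        reader.start();

        Thread.sleep(1000);
        stop = true;   // writer (the main thread) updates the flag
        System.out.println("main set stop = true");
    }
}
```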


Interviewer: Right, so how do you think the volatile keyword ensures thread visibility?


Huang: I think we first need to understand the nature of visibility at the hardware level. The core components of a computer are the CPU, memory, and I/O devices, and there is a big difference in processing speed among the three; ultimately, overall computing efficiency is bounded by the slowest device. To balance this speed gap and make the most of the CPU, a lot of optimization has been done in hardware, in the operating system, and in the compiler:


  1. The CPU added a high-speed cache

  2. The operating system added processes, threads, and time-slice switching to maximize CPU utilization

  3. The compiler optimizes instructions to make better use of the CPU cache



Interviewer: What is CPU caching?


Huang: Because of the huge gap between the speed of the computer's memory and the processor's computing speed, modern computer systems insert a cache, whose read and write speeds are as close as possible to the processor's speed, as a buffer between memory and the processor: the data needed for an operation is copied into the cache so that the operation can run quickly, and when the operation finishes, the result is synchronized from the cache back to memory.


Interviewer: Great, but doesn't using the CPU cache introduce a problem?


Huang: Yes, it solves the speed mismatch between the processor and memory very well, but it also introduces a new problem: cache consistency.


With caches, every CPU operation goes like this: first, the data the program needs is loaded into the CPU cache; during a calculation, the CPU reads from and writes to the cache directly; when the calculation is complete, the data is synchronized from the cache back to memory.


In a multi-CPU system, each thread may run on a different CPU, and each CPU has its own cache, so the same data may be cached in multiple CPUs. If threads running on different CPUs see different cached values for the same memory location, we get cache inconsistency.


Interviewer: Is there any solution?


Huang: 1. Bus lock. 2. Cache lock.


Interviewer: Can you explain what a bus lock is?


Huang: Let me explain (a diagram was shown here in the original). A bus lock locks the bus itself: while one CPU holds it, no other CPU can access memory through the bus, which guarantees consistency but at a very high cost. To reduce that overhead, the best approach is to shrink the lock granularity: we only need to guarantee that a cache line holding the same data in multiple CPUs stays consistent. That is why the cache lock was introduced, and its core mechanism is the cache consistency protocol.




Interviewer: What is the cache consistency protocol?


Huang: To achieve consistent data access, each processor must follow certain protocols when accessing memory and operate according to them when reading and writing. Common protocols include MSI, MESI, MOSI, etc. The most common one is the MESI protocol.


MESI represents the four states of a cache line:

M (Modified) indicates that the data is cached only in the current CPU's cache and has been modified, i.e. the cached data is inconsistent with the data in main memory.

E (Exclusive) indicates the exclusive state: the data is cached only in the current CPU's cache and has not been modified.

S (Shared) indicates that the data may be cached by multiple CPUs and the copy in each cache is the same as that in main memory.

I (Invalid) indicates that the cache line is invalid.

In the MESI protocol, each cache's controller not only tracks its own read and write operations but also snoops on the read and write operations of other caches.


For the MESI protocol, the following rules apply from the CPU's read/write perspective (a toy sketch follows below):

CPU read request: a cache line in the M, E, or S state can be read directly; a line in the I state must be read from main memory.

CPU write request: a cache line in the M or E state can be written directly; a line in the S state can be written only after the copies in the other CPUs' caches have been invalidated.
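To make these state transitions easier to follow, here is a toy sketch in Java of what happens to a single cache line under the rules above. It is not a real hardware model, the names are invented, and the bus messages that real hardware exchanges are omitted.

```java
// Toy model of one cache line's MESI state.
enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class CacheLine {
    MesiState state = MesiState.INVALID;

    // Local CPU read: an INVALID line must be reloaded; it becomes SHARED if
    // another cache also holds a copy, EXCLUSIVE otherwise. M/E/S lines can
    // be read directly with no state change.
    void onLocalRead(boolean otherCacheHasCopy) {
        if (state == MesiState.INVALID) {
            state = otherCacheHasCopy ? MesiState.SHARED : MesiState.EXCLUSIVE;
        }
    }

    // Local CPU write: a SHARED line must first invalidate the other copies;
    // in every case the line ends up MODIFIED.
    void onLocalWrite() {
        state = MesiState.MODIFIED;
    }

    // Snooped write from another CPU: our copy becomes stale.
    void onRemoteWrite() {
        state = MesiState.INVALID;
    }
}
```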


With bus locks and cache locks, CPU operations on memory can be made to preserve cache consistency.



Interviewer: Why do we need the volatile keyword if consistency can already be achieved by the cache consistency protocol or a bus lock?


Huang: The MESI optimization itself brings visibility problems. The MESI protocol can achieve cache consistency, but at a cost: the states of the individual CPUs' cache lines are coordinated by message passing. If CPU0 wants to write to a variable shared in the cache, it first has to send an invalidate message to the other CPUs that have cached the data and wait for their acknowledgements. CPU0 is blocked during this time.


To avoid the waste of blocking, Store Buffers were introduced in the CPU.


CPU0 simply writes the shared data into its Store Buffer, sends the invalidate message, and continues processing other instructions. Once it has received an Invalidate Acknowledge message from all the other CPUs, it moves the data from the Store Buffer into the cache line, and finally the cache line is synchronized to main memory.


But there are two problems with this optimization:

1. It is uncertain when the data will actually be committed, because the CPU waits for the responses of the other CPUs before synchronizing the data; this is effectively an asynchronous operation.

2. After Store Buffers are introduced, the processor first tries to read a value from its Store Buffer. If the Store Buffer contains the data, it reads directly from there; otherwise, it reads from the cache line.


Take a look at this example:

```java
int value = 0;
boolean isFinish = false;

void exeToCPU0() {
    value = 10;
    isFinish = true;
}

void exeToCPU1() {
    if (isFinish) {
        assert value == 10;
    }
}
```

exeToCPU0 and exeToCPU1 are executed on two independent CPUs.


If CPU0's cache line caches the shared variable isFinish and its state is (E), value may be in the (S) state.


In this case, CPU0 first writes value=10 into its Store Buffer and notifies the other CPUs that cache the value variable. While waiting for their acknowledgements, CPU0 continues executing isFinish=true. Because CPU0 caches isFinish in the Exclusive state, it can change isFinish directly. CPU1 may then read isFinish and see true while value is still not equal to 10.


This can be thought of as out-of-order CPU execution, or as a kind of reordering, and it causes visibility problems.


    Interviewer: How do you solve the visibility problem caused by reordering?


Huang: CPU memory barriers are used to solve this. A memory barrier forces the data in the Store Buffers to be written out to memory, so that other threads accessing the same shared memory can see it.


    X86 memory-barrier instructions include lfence, sfence, and mfence.


A Store Memory Barrier (write barrier) tells the processor to synchronize all data sitting in the Store Buffers before the barrier to main memory. In short, it makes the results of instructions before the write barrier visible to reads or writes after it.


A Load Memory Barrier (read barrier) ensures that read operations after the barrier are executed after the barrier itself, so they see up-to-date data. Combined with the write barrier, memory updates before the write barrier become visible to read operations after the read barrier.


A Full Memory Barrier combines both effects: it ensures that the results of reads and writes before the barrier are committed to memory before any read or write after the barrier is performed. With such a barrier in place, we can change the example above to avoid the visibility problem.


In general, memory barriers prevent the CPU from accessing memory out of order and thereby ensure that shared data stays visible during parallel execution. But how is such a barrier actually added? That brings us back to the volatile keyword from the beginning: it causes a lock-prefixed assembly instruction to be emitted, which acts as a memory barrier.
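As an illustration, here is a sketch of the earlier exeToCPU0/exeToCPU1 example rewritten in Java with volatile (the enclosing class is assumed; it is not part of the original article): the volatile write and read prevent the reordering that made value appear stale.

```java
class VisibilityFix {
    int value = 0;
    volatile boolean isFinish = false;  // volatile: the write acts like a store barrier

    void exeToCPU0() {
        value = 10;        // plain write
        isFinish = true;   // volatile write: value = 10 must be visible before this
    }

    void exeToCPU1() {
        if (isFinish) {            // volatile read: acts like a load barrier
            assert value == 10;    // guaranteed once isFinish is observed as true
        }
    }
}
```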


    Interviewer: Speaking of volatile, can you tell me what JMM is?


Huang: The JMM is the Java Memory Model. The root causes of visibility problems are caching and reordering, and the JMM disables caching and reordering where appropriate, so its core value is solving visibility and ordering.


    Interviewer: How does the JMM address visibility and order?


Huang: The JMM addresses concurrency issues by limiting compiler reordering and by relying on CPU-level memory barrier instructions.


The JMM provides several constructs for disabling caching and reordering, such as volatile, synchronized (a separate article on synchronized will follow), and final.
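A minimal sketch (the class and field names are invented) of the three constructs side by side, assuming typical usage:

```java
class JmmTools {
    volatile int state;   // volatile: guarantees visibility and restricts reordering
    final int config;     // final: safely published once the constructor completes

    JmmTools(int config) {
        this.config = config;
    }

    synchronized void update(int newState) {  // synchronized: mutual exclusion plus
        state = newState;                     // visibility on lock release/acquire
    }
}
```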


How does the JMM solve the sequential consistency problem?

The reordering problem

To improve program performance, both the compiler and the processor reorder instructions; processor reordering was analyzed above. Reordering simply means changing the order in which instructions are executed. Compiler reordering means that, when a program is compiled, the compiler may rearrange instructions to optimize performance. From the source code to the instructions that are finally executed, three kinds of reordering can occur: 1. compiler reordering; 2. instruction-level parallelism reordering by the processor; 3. memory-system reordering.


Types 2 and 3 are processor reorderings, and all of these reorderings can cause visibility problems. For compiler reordering, the JMM prohibits certain types of compiler reordering.


For processor reordering, the JMM requires the compiler to insert memory barriers at appropriate points when generating the instruction sequence to prevent certain processor reorderings; of course, not all programs suffer from reordering problems.


The compiler and processor never change the order of two operations that have a data dependency, as in the following code:


```java
a = 1; b = a;   // read-after-write dependency: b reads the value just written to a
a = 1; a = 2;   // write-after-write dependency: both statements write a
a = b; b = 1;   // write-after-read dependency: b is read before being written
```


In all three cases, changing the order of execution within a single thread would change the result, so reordering does not rearrange instructions like these.

This rule is known as as-if-serial: no matter how instructions are reordered, the result of execution must not change for a single thread. For example:

```java
int a = 2;      // 1
int b = 3;      // 2
int rs = a * b; // 3
```

Statements 1 and 3, and statements 2 and 3, have data dependencies, so in the final instruction stream statement 3 cannot be reordered before 1 and 2; otherwise the program would produce a wrong result. Since statements 1 and 2 have no data dependency, their order can be swapped.


Memory barriers at the JMM level


To ensure memory visibility, the Java compiler inserts memory barriers at appropriate positions in the generated instruction sequence to prevent particular kinds of processor reordering. In the JMM, memory barriers are classified into four categories: LoadLoad, StoreStore, LoadStore, and StoreLoad barriers.
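As a rough illustration of where these barriers conceptually sit around volatile accesses (following the JSR-133 cookbook conventions; the comments stand in for barriers and are not real Java statements, and the class is invented):

```java
class BarrierPlacement {
    int data;
    volatile boolean flag;

    void write() {
        data = 1;
        // StoreStore barrier: earlier ordinary stores complete before the volatile store
        flag = true;
        // StoreLoad barrier: the volatile store completes before any later load
    }

    void read() {
        boolean f = flag;
        // LoadLoad barrier: the volatile load completes before later loads
        // LoadStore barrier: the volatile load completes before later stores
        int d = data;
    }
}
```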

Happens-before means that the result of one operation is visible to a subsequent operation, so it is an expression of memory visibility between threads.


So we can say that in the JMM, if the result of one operation needs to be visible to another operation, there must be a happens-before relationship between the two operations.


These two operations can be in the same thread or in different threads. So what rules does the JMM provide to establish a happens-before relationship?
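The statement numbers 1 to 4 used in the rules below refer to a snippet like the following; the original article's snippet is not reproduced here, so this is an assumed reconstruction with invented names.

```java
class VolatileExample {
    int x = 0;
    volatile boolean v = false;

    void writer() {
        x = 42;        // 1: ordinary write
        v = true;      // 2: volatile write
    }

    void reader() {
        if (v) {       // 3: volatile read
            int r = x; // 4: ordinary read; sees 42 once v is observed as true
        }
    }
}
```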


1. Program order rule: every action in a thread happens-before any subsequent action in that thread; this can simply be understood as as-if-serial. By the program order rule, 1 happens-before 2 and 3 happens-before 4 in the example above.



2. Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that volatile variable. By the volatile rule, 2 happens-before 3 in the example above.



3. Transitivity rule: if 1 happens-before 2 and 2 happens-before 3, then 1 happens-before 3.


4. Start rule: if thread A performs ThreadB.start(), then thread A's ThreadB.start() happens-before every action in thread B.


5. Join rule: if thread A performs ThreadB.join() and it returns successfully, then every action in thread B happens-before thread A's successful return from ThreadB.join().

6. Monitor lock rule: the unlock of a lock happens-before every subsequent lock of that same lock.

For example, if the initial value of x is 10 and thread A changes it to 12 inside a synchronized block (the lock is released automatically when the block exits), then when thread B later enters a block guarded by the same lock, it is guaranteed to see thread A's write to x, i.e. thread B sees x == 12.
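A sketch of the kind of code block this refers to (the original snippet is not shown, so the class and method names are assumptions):

```java
class MonitorLockExample {
    int x = 10;

    synchronized void writer() {   // thread A: acquires the lock, releases it on exit
        x = 12;
    }

    synchronized void reader() {   // thread B: A's unlock happens-before this lock
        int r = x;                 // guaranteed to see x == 12 after A has run
    }
}
```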


Interviewer (inner monologue): This good? We can't afford him. "Go back and wait for our notice!"


In Huang's heart, ten thousand grass-mud horses galloped past…

Copyright: 2019-10-08, by TopJavaer. Please cite the source: juejin.cn/post/684490… Thank you.