1 Source

  • Source: Java High Concurrency Programming in Detail: Multithreading and Architecture Design, by Wang Wenjun
  • Chapters: Chapters 12 and 13

This article compiles notes from those two chapters.

2 The CPU Cache

2.1 Cache Model

All computation is performed by the CPU, and executing instructions involves reading and writing data. However, the CPU can only access data in memory, and memory speed is far slower than CPU speed, so the cache model was introduced: a buffer layer between the CPU and main memory. Modern CPU caches are generally divided into three levels, called the L1, L2 and L3 caches. The schematic diagram is as follows:

  • L1 cache: the fastest of the three levels, but the smallest in capacity. L1 is also divided into a data cache (L1d, d for data) and an instruction cache (L1i, i for instruction)
  • L2 cache: slower than L1 but larger; in modern multi-core CPUs, L2 is generally private to each core
  • L3 cache: the slowest of the three levels, but the largest in capacity; in modern CPUs L3 is usually shared by multiple cores, as in AMD’s Zen 3 architecture

Caching exists to overcome the inefficiency of the CPU accessing memory directly. To operate on data, the CPU copies it from main memory into the cache; since the cache is much faster than memory, the computation only needs to read from the cache and write its results back to it, and the results are flushed to main memory when the operation completes. This greatly improves computational efficiency. The overall interaction is briefly shown as follows:

2.2 Cache Consistency

Although the advent of caching greatly improved throughput, it also introduced a new problem: cache inconsistency. For example, even the simplest i++ operation requires copying the value from memory into the cache; the CPU reads the cached value, increments it, writes the new value back to the cache, and finally flushes it to memory when the operation completes. The specific process is as follows:

  • Read i from main memory into the cache
  • The CPU reads the value of i from the cache
  • Increment i by one
  • Write the result back to the cache
  • Flush the result back to main memory

This i++ operation is not a problem in a single thread, but it is in multithreading: because each thread has its own working memory (also called local memory, essentially the thread’s own cache), the variable i has a copy in the local memory of multiple threads. If two threads both perform i++:

  • Call the two threads A and B, and suppose the initial value of i is 0
  • Thread A reads i from main memory into its cache; thread B does the same, so both caches hold i = 0
  • The two threads increment at the same time, so in both thread A’s and thread B’s cache the value of i is 1
  • Both threads write i back to main memory, so 1 is assigned to i twice
  • The final value of i is 1, not the expected 2

This is a typical cache inconsistency problem. The main solutions are:

  • The bus lock
  • Cache consistency protocol

2.2.1 Bus Locking

This is a pessimistic approach: the processor issues a lock instruction that locks the bus, and the bus, upon receiving the instruction, blocks requests from other processors until the processor holding the lock completes its operation. The characteristic is that only the processor that grabs the bus lock can run, which is inefficient: once one processor holds the lock, the others can only block and wait, hurting the performance of multi-core processors.

2.2.2 Cache Consistency Protocol

The illustration is as follows:

The best known of the cache consistency protocols is the MESI protocol, which ensures that copies of shared variables used in each cache are consistent. The idea is that when the CPU operates on data in the cache and finds that the variable is a shared variable, it does the following:

  • Read: no extra action is needed; the data is simply read from the cache into a register
  • Write: a signal is sent telling the other CPUs to set that variable’s cache line to the invalid state (Invalid); when another CPU wants to read the variable, it must fetch it from main memory again

Specifically, MESI specifies that the cache line is marked with four states:

  • M: Modified, the cache line has been modified
  • E: Exclusive, the cache line is held exclusively
  • S: Shared, the cache line is shared
  • I: Invalid, the cache line is invalid

Detailed MESI implementations are beyond the scope of this article.

3 JMM

Having looked at the CPU cache, we turn to the JMM, the Java Memory Model, which specifies how the JVM works with the computer’s main memory and determines when writes to shared variables by one thread become visible to other threads. The JMM defines an abstract relationship between threads and main memory, as follows:

  • Shared variables are stored in main memory and can be accessed by every thread
  • Each thread has its own private working memory, also called local memory
  • Working memory stores only that thread’s copies of the shared variables
  • A thread cannot operate on main memory directly; it must first operate on its working memory and then write back to main memory
  • Working memory, like the JMM as a whole, is an abstraction that does not physically exist; it covers caches, registers, compile-time optimizations, and hardware

The schematic diagram is as follows:

Similar to MESI, if one thread changes a shared variable and flushes it to main memory, other threads discover that their working-memory copy is invalid and read the variable from main memory into working memory again.

The following diagram shows the relationship between the JVM and computer hardware allocation:

4 Three Features of Concurrent Programming

Halfway through the article and still no volatile? Bear with me: first, let’s look at three important features of concurrent programming that will help you understand volatile correctly.

4.1 Atomicity

Atomicity means that, for a group of one or more operations:

  • Either all of the operations execute, without interruption by any other factor
  • Or none of the operations execute

A typical example is a transfer between two people. Suppose A transfers 1000 yuan to B; this involves two basic operations:

  • 1000 yuan was deducted from A’s account
  • B’s account is increased by 1000 yuan

Either both succeed or both fail: it must never happen that 1000 yuan is deducted from A’s account while B’s balance stays the same, nor that A’s balance stays the same while 1000 yuan is added to B’s account.

Note that two atomic operations combined are not necessarily atomic, such as i++. Essentially, i++ involves three operations:

  • get i
  • i+1
  • set i

Each of the three operations is atomic, but taken together as i++ they are not.
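To make this concrete, here is a small sketch (class and method names are illustrative, not from the book) contrasting a genuinely atomic increment with a composition of two individually atomic operations:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCompositionDemo {

    // One indivisible read-modify-write per increment: no updates are lost.
    static int atomicIncrements(int threads, int perThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return counter.get();
    }

    // get() and set() are each atomic, but the pair is not: another thread can
    // update the counter between the read and the write, losing increments.
    static int racyIncrements(int threads, int perThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.set(counter.get() + 1); // two atomic ops, not one
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("atomic: " + atomicIncrements(10, 1000)); // always 10000
        System.out.println("racy:   " + racyIncrements(10, 1000));   // typically less
    }
}
```

The first version always prints 10000; the second typically loses updates, exactly as i++ does.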

4.2 Visibility

Another important feature is visibility, which means that if one thread makes a change to a shared variable, other threads can immediately see the updated value.

A simple example is as follows:

import java.util.concurrent.TimeUnit;

public class Main {
    private int x = 0;
    private static final int MAX = 100000;

    public static void main(String[] args) throws InterruptedException {
        Main m = new Main();
        Thread thread0 = new Thread(() -> {
            while (m.x < MAX) {
                ++m.x;
            }
        });
        Thread thread1 = new Thread(() -> {
            while (m.x < MAX) {
            }
            System.out.println("finish");
        });

        thread1.start();
        TimeUnit.MILLISECONDS.sleep(1);
        thread0.start();
    }
}

Thread1 will run forever: it reads x into its working memory and keeps testing that copy, while thread0’s changes to x, made in thread0’s own working memory, are not visible to thread1, so it never prints finish.

4.3 Orderliness

Orderliness refers to the order in which code is executed. Due to JVM optimizations, the order in which code is written is not necessarily the order in which the code is run, as in the following four statements:

int x = 10;
int y = 0;
x++;
y = 20;

It is possible for y = 20 to execute before x++; this is instruction reordering. In general, to improve efficiency, the compiler and processor may optimize the incoming instruction stream and not execute the code strictly in the written order, while still guaranteeing that the final result matches the expected result. Reordering must also obey certain rules: it must respect data dependencies between instructions and cannot reorder dependent instructions, for example:

int x = 10;
int y = 0;
x++;
y = x+1;

Here, y = x + 1 cannot be reordered to execute before x++.

Reordering in a single thread does not change the result, but in multithreading, if orderliness is not guaranteed, serious problems can arise:

private boolean initialized = false;
private Context context;

public Context load() {
    if (!initialized) {
        context = loadContext();
        initialized = true;
    }
    return context;
}

If initialized = true is reordered before context = loadContext(), and two threads A and B access load() at the same time while loadContext() takes some time, then:

  • Thread A passes the check and, due to reordering, sets the boolean to true before loadContext() has completed
  • Thread B sees the boolean already set to true and directly returns an incomplete context
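One way to close this hole, sketched below under the assumption that making initialized volatile is acceptable here (the Context class and loadContext() body are stand-ins for the originals): the volatile write cannot be reordered before the assignment to context, so a thread that observes initialized == true also sees a fully assigned context. Note that the unsynchronized check still allows two threads to both run loadContext(); a lock (e.g. double-checked locking) is needed if that matters.

```java
public class SafeLoader {
    // Stand-in for the Context type used in the snippet above (hypothetical).
    static class Context {
        final String state;
        Context(String state) { this.state = state; }
    }

    // volatile forbids the reordering: the write to initialized can no longer
    // be moved before the write to context.
    private volatile boolean initialized = false;
    private Context context;

    public Context load() {
        if (!initialized) {
            context = loadContext();
            initialized = true; // volatile write, ordered after the line above
        }
        return context;
    }

    private Context loadContext() {
        return new Context("loaded"); // hypothetical slow initialization
    }
}
```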

5 volatile

So, finally, volatile. The goal of everything above is to understand volatile once and for all. This section is divided into five parts:

  • The semantics of volatile
  • How volatile ensures visibility and order
  • Implementation principles
  • Usage scenarios
  • The differences from synchronized

Let’s start with the semantics of volatile.

5.1 Semantics

Instance variables or class variables that are volatile have two levels of semantics:

  • It ensures visibility between threads operating on the shared variable
  • It disallows instruction reordering

5.2 How to ensure visibility and order

First the conclusion:

  • volatile guarantees visibility
  • volatile guarantees order
  • volatile does not guarantee atomicity

Each is introduced below.

5.2.1 Visibility

In Java, you can ensure visibility by:

  • volatile: when a variable is declared volatile, reads of it are effectively performed against main memory (more precisely, the value goes into working memory, but it must be re-read from main memory whenever another thread has changed it), and writes modify working memory first but are flushed to main memory immediately afterwards
  • synchronized: synchronized also ensures visibility; it guarantees that only one thread at a time holds the lock and executes the synchronized code, and that changes to variables are flushed to main memory before the lock is released
  • Explicit locks (Lock): Lock’s lock method likewise ensures that only one thread at a time enters the critical section, and changes to variables are flushed to main memory before unlock
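As an illustration of the Lock bullet, here is a minimal sketch (the class name LockCounter is illustrative) in which a ReentrantLock provides both mutual exclusion and the flush-on-release visibility guarantee:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockCounter {
    private final Lock lock = new ReentrantLock();
    private int count = 0; // plain field: the lock alone provides visibility

    public void inc() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock(); // the update is flushed before the lock is released
        }
    }

    public int get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }

    static int demo(int threads, int perThread) throws InterruptedException {
        LockCounter c = new LockCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) c.inc();
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return c.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo(10, 1000)); // 10000
    }
}
```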

To be specific, take another look at the earlier example from the visibility section:

import java.util.concurrent.TimeUnit;

public class Main {
    private int x = 0;
    private static final int MAX = 100000;

    public static void main(String[] args) throws InterruptedException {
        Main m = new Main();
        Thread thread0 = new Thread(() -> {
            while (m.x < MAX) {
                ++m.x;
            }
        });
        Thread thread1 = new Thread(() -> {
            while (m.x < MAX) {
            }
            System.out.println("finish");
        });

        thread1.start();
        TimeUnit.MILLISECONDS.sleep(1);
        thread0.start();
    }
}

If x were declared volatile, thread0’s changes to x would become visible to thread1, and thread1 would eventually print finish.
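A minimal sketch of that fix (the class name VolatileVisibility is illustrative); the only substantive change is declaring x volatile, after which thread1’s loop observes the updates and exits:

```java
import java.util.concurrent.TimeUnit;

public class VolatileVisibility {
    // the only change from the example above: x is volatile
    private volatile int x = 0;
    private static final int MAX = 100000;

    static boolean demo() throws InterruptedException {
        VolatileVisibility m = new VolatileVisibility();
        Thread thread0 = new Thread(() -> {
            while (m.x < MAX) {
                ++m.x;
            }
        });
        Thread thread1 = new Thread(() -> {
            while (m.x < MAX) {
                // busy-wait: each iteration re-reads the volatile x
            }
            System.out.println("finish");
        });

        thread1.start();
        TimeUnit.MILLISECONDS.sleep(1);
        thread0.start();
        thread0.join();
        thread1.join(5000);        // with volatile, thread1 sees x reach MAX
        return !thread1.isAlive(); // true: the loop exited
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```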

5.2.2 Order

The JMM allows compile-time and processor reordering of instructions, which can be problematic in multi-threaded situations. Java also provides three mechanisms to ensure order:

  • volatile
  • synchronized
  • The explicit lockLock

Another thing worth mentioning about orderliness is the happens-before principle. It states that if the order of execution of two operations cannot be deduced from the happens-before rules, order is not guaranteed and the JVM or processor may reorder them at will. The purpose is to maximize the parallelism of the program. The rules are as follows:

  • Program order rule: within a single thread, code executes in program order; an operation written earlier happens before operations written later
  • Locking rule: an unlock operation on a lock happens before a subsequent lock operation on the same lock
  • volatile variable rule: a write to a volatile variable happens before a subsequent read of that variable
  • Transitivity rule: if operation A happens before operation B, and B happens before C, then A happens before C
  • Thread start rule: a Thread object’s start() method happens before every action of the started thread
  • Thread interrupt rule: a call to a thread’s interrupt() method happens before the interrupted thread detects the interrupt; in other words, if an interrupt is detected, interrupt() must have been called earlier
  • Thread termination rule: every operation in a thread happens before the detection that the thread has terminated
  • Object finalization rule: the completion of an object’s initialization happens before its finalize() method
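Taken together, the program order rule, the volatile variable rule, and transitivity are what make the classic flag hand-off pattern safe. A minimal sketch (field names are illustrative):

```java
public class HappensBeforeDemo {
    static int data = 0;                  // plain field
    static volatile boolean flag = false; // volatile flag
    static int observed = -1;             // what the reader saw

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;   // (1) plain write
            flag = true; // (2) volatile write
        });
        Thread reader = new Thread(() -> {
            while (!flag) {
                // spin: each iteration is a volatile read of flag
            }
            // (1) happens-before (2) by program order, and (2) happens-before
            // the read of flag by the volatile rule, so by transitivity the
            // reader is guaranteed to see data == 42 here
            observed = data;
        });
        reader.start();
        writer.start();
        writer.join();
        reader.join();
        System.out.println("observed = " + observed); // observed = 42
    }
}
```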

For volatile variables, instruction reordering across the volatile access is prohibited, but instructions that have no dependency on the volatile variable can still be reordered among themselves. For example:

int x = 0;
int y = 1;
// z is declared elsewhere as: private volatile int z;
z = 20;
x++;
y--;

The assignments to x and y may be reordered with each other, but both must have completed (x == 0, y == 1) by the time z = 20 executes; likewise x++ and y-- may be reordered with each other, but only after z = 20 has executed.

5.2.3 Atomicity

In Java, reads and assignments of variables of primitive types are atomic, as are reads and assignments of reference-type variables, but:

  • Assigning one variable to another is not atomic, because it involves reading one variable and writing another, and two atomic operations combined are not atomic
  • Multiple atomic operations together are not atomic, for example i++
  • The JMM guarantees atomicity only for basic reads and assignments; for anything more, use synchronized, Lock, or the atomic classes under the JUC package

That is, volatile does not guarantee atomicity, as shown in the following example:

import java.util.concurrent.CountDownLatch;
import java.util.stream.IntStream;

public class Main {
    private volatile int x = 0;
    private static final CountDownLatch latch = new CountDownLatch(10);

    public void inc() {
        ++x;
    }

    public static void main(String[] args) throws InterruptedException {
        Main m = new Main();
        IntStream.range(0, 10).forEach(i -> {
            new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    m.inc();
                }
                latch.countDown();
            }).start();
        });
        latch.await();
        System.out.println(m.x);
    }
}

The final output is usually less than 10000, and differs from run to run. To see why, consider two threads A and B:

  • 0–t1: thread A reads x into its working memory; x = 0
  • t1–t2: thread A’s time slice ends and the CPU schedules thread B, which reads x into its working memory; x = 0
  • t2–t3: thread B increments x and updates its working copy
  • t3–t4: thread B’s time slice ends and the CPU schedules thread A, which increments x in its working memory
  • t4–t5: thread A writes the value in its working memory back to main memory; x = 1
  • After t5: thread A’s time slice ends and the CPU schedules thread B, which also writes its working copy back to main memory, again assigning 1 to x

In other words, across the two threads there were two increments but only one effective update. The fix is easy: guard the loop with synchronized, and x reaches 10000:

new Thread(() -> {
    synchronized (m) {
        for (int j = 0; j < 1000; j++) {
            m.inc();
        }
    }
    latch.countDown();
}).start();
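Alternatively, the atomic classes under the JUC package mentioned earlier give the same result without a lock. A minimal sketch (the class name AtomicMain is illustrative) rewriting the counter with AtomicInteger:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class AtomicMain {
    private final AtomicInteger x = new AtomicInteger(0);

    public void inc() {
        x.incrementAndGet(); // one indivisible read-modify-write (a CAS loop)
    }

    static int run() throws InterruptedException {
        AtomicMain m = new AtomicMain();
        CountDownLatch latch = new CountDownLatch(10);
        IntStream.range(0, 10).forEach(i -> new Thread(() -> {
            for (int j = 0; j < 1000; j++) {
                m.inc();
            }
            latch.countDown();
        }).start());
        latch.await();
        return m.x.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 10000 on every run
    }
}
```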

5.3 Implementation Principles

We already know that volatile guarantees orderliness and visibility. How does that work?

The answer is the lock; prefix on the generated machine instruction, which effectively acts as a memory barrier. The memory barrier provides the following guarantees for instruction execution:

  • Instruction reordering cannot move code that follows the memory barrier to before it
  • Instruction reordering cannot move code that precedes the memory barrier to after it
  • When the instruction carrying the memory barrier executes, all preceding code has completed
  • Changed values in the thread’s working memory are forced to flush to main memory
  • For a write operation, the corresponding cached data in other threads’ working memory is invalidated

5.4 Application Scenarios

A typical use scenario is to use a switch to close a thread, as shown in the following example:

public class ThreadTest extends Thread {
    private volatile boolean started = true;

    @Override
    public void run() {
        while (started) {
            // do work
        }
    }

    public void shutdown() {
        this.started = false;
    }
}

If the boolean variable were not volatile, the new value might never be flushed to main memory and observed by the running thread, and the thread would never terminate.
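A sketch of how the switch is used in practice (the Worker class repeats the pattern above; names and timings are illustrative):

```java
import java.util.concurrent.TimeUnit;

public class ThreadTestDemo {
    // Same switch pattern as ThreadTest above.
    static class Worker extends Thread {
        private volatile boolean started = true;

        @Override
        public void run() {
            while (started) {
                // simulated work; each iteration re-reads the volatile flag
            }
        }

        void shutdown() {
            started = false;
        }
    }

    static boolean demo() throws InterruptedException {
        Worker worker = new Worker();
        worker.start();
        TimeUnit.MILLISECONDS.sleep(50); // let the worker spin briefly
        worker.shutdown();               // volatile write: visible to the worker
        worker.join(5000);
        return !worker.isAlive();        // true: the loop saw started == false
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("terminated: " + demo());
    }
}
```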

5.5 Differences from synchronized

  • Usage: volatile can only modify instance variables or class variables; it cannot modify methods, method parameters, or local variables, and the variable it modifies may be null. synchronized cannot modify variables, only methods or code blocks, and its monitor object cannot be null
  • Atomicity: volatile does not guarantee atomicity; synchronized does
  • Visibility: both guarantee visibility, but synchronized does so through the JVM instructions monitorenter/monitorexit, flushing shared variables to main memory at monitorexit, while volatile does so through the lock; machine instruction, which invalidates other threads’ working-memory copies and forces a reload from main memory
  • Order: volatile forbids the JVM and processor from reordering around it, while synchronized guarantees order by executing the guarded code serially; inside a synchronized block, instructions can still be reordered
  • Other: volatile does not block threads; synchronized can