Multithreading summary - Underlying principles

This paper summarizes relevant knowledge points from CPU instruction out of order execution, semantics of synchronized, volatile and operating system level to support multithreading.

Relevant concepts

The memory barrier

A set of processor instructions that implement restrictions on the order of memory operations. Operations before and after the barrier cannot be performed in disorder. There are several memory barriers in the JVM:

LoadLoad barrier

Load1, LoadLoad, Load2, ensure that the data to be read by Load1 is completed before the data to be read by Load2 and subsequent read operations are accessed

StoreStore barrier

For Store1, StoreStore, Store2 before Store2 is executed, make sure that writes to Store1 are visible to other processors

LoadStore barrier

For Load1, LoadStore, Store2 Before Store2 is executed, ensure that the data to be read by Load1 is completed

StoreLoad barrier

Store1, StoreLoad, Load2 Ensure that writes to Store1 are visible to all processors before Load2 and all subsequent reads are executed

Cache line

The minimum unit of storage that can be allocated in the CPU cache

A cache hit

Read data directly from the CPU cache

Cache row padding

Copy data in memory to the CPU cache

Writing a hit

When the processor writes the operand back to a cache, it checks to see if the cached memory address is in the cache line. If a valid cache line exists, the processor writes the operand back to the cache, not to memory

Atomic operation

An operation or series of operations that cannot be interrupted

Cache consistency protocol

To improve processing speed, the CPU does not directly communicate with the memory. Instead, the CPU reads data from the memory to the cache (L1,L2,L3) and then writes the data to the memory. However, with multiple processors, even if data is written to memory, the values in other processors’ caches remain old, causing dirty data problems. In this case, the cache consistency protocol is needed to ensure that the cache and system memory of each processor are consistent. Each processor checks to see if its cache is out of date by sniffing the data transmitted on the bus, and sets the processor’s cache to an invalid state when the memory address corresponding to its cache row has been changed. The MESI protocol is used on Intel 64-bit processors.

visibility

Changes made by one thread to a shared variable that another thread can see immediately are called visibility

In the single-core era, all threads are executed on a single CPU, and data consistency between CPU cache and memory is easy to solve. All threads operate on the same CPU’s cache, and one thread’s read and write to the cache must be visible to another thread. But in the multicore era, each CPU has its own cache, and when multiple threads are running on different cpus, those threads operate on different CPU caches.

atomic

An operation cannot be interrupted, even when multiple threads are working together. Once an operation is started, it cannot be disturbed by other threads.

Take count +=1 as an example. This statement is usually completed by three CPU instructions:

Instruction 1: Put variablescountA register loaded from memory to the CPU
Instruction 2: Perform the +1 operation in the register
Instruction 3: Finally write the result to memory (caching mechanism causes writing to CPU cache instead of memory)

Suppose initial state count=0, if two threads are executing the statement, thread A switches threads after instruction 1, thread B starts running, executes instructions 1, 2, 3, and writes result 1 to memory. Thread A then executes instruction 2, and the final result is 2.

Out-of-order execution

Compile level instruction rearrangement

At compile time, the compiler can rearrange the execution order of statements without changing the semantics of a single-threaded program. Such as

a = 1;
b = 2;
Copy the code

These two lines of code have no dependencies and may be rearranged by the compiler at compile time.

CPU instruction rearrangement

In order to improve the efficiency of the program, the processor may optimize the input code. It does not ensure that the execution sequence of each statement in the program is the same as that in the code, but it will ensure that the final execution result of the program is the same as that of the code execution sequence. Instruction reordering does not affect the execution of a single thread, but does affect the correctness of concurrent execution of the thread

Solution to out-of-order execution

CPU level: Intel uses primitives (Mfence, lfence, sfence) or locks the bus
JVM layer: Memory barrier is used. Memory barrier is added before and after the operation on a part of memory. The operation before and after the barrier cannot be carried out out of order.

volatile

Volatile semantics

When one thread modifies the value of a shared variable that is volatile, the new value is always immediately known to other threads

For volatile variables, the CPU writes the row immediately back to system memory, invalidating the data cached by other cpus. When a volatile variable is read, the JMM invalidates the thread’s local memory, and the thread reads from main memory.
Disallow command reordering

Volatile is implemented in the JVM: Volatile variables are added to StoreStoreBarrier before writing, StoreLoadBarrier after writing, LoadLoadBarrier before reading, and LoadStoreBarrier after reading.

Implement singletons with volatile

There are several implementations of the singleton pattern in Java, one of which is the DCL(Double Check Lock) pattern, which involves the use of volatile keywords, as outlined here

public class Singleton {
    private volatile static Singleton instance;
    
    private Singleton(a) {}

    public static Singleton getInstance(a) {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = newSingleton(); }}}returninstance; }}Copy the code

Volatile is used because instance = new Singleton(); In this step, there are actually three steps, and they are

Initialize objects. 3. Assign the address of the memory space to the referenceCopy the code

The CPU is running, an instruction reorder may occur, and step 3 May run before step 2. Suppose there are two threads, and if thread A executes into code instance = new Singleton(); If step 3 is run before step 2, the instance reference has a value even though the memory space is opened up. However, the corresponding object has not been initialized properly. If you call some properties of the object again, the method will report an error. Instance == null if (instance == null) is not null, it will not execute the following code, and it will get the current instance, which is not initialized.

The singleton pattern can be implemented in several ways, which will be summarized in a later article

Final domain memory semantics

Java uses final to denote immutable properties. The reordering rules for final fields are as follows:

The write operation

There is no reordering between a write to a final field within a constructor and a subsequent assignment of a reference to the constructed object to a reference variable. The compiler inserts a StoreStore barrier after the final field is written and before the constructor returns. This barrier prevents the processor from reordering the writes of final fields out of the constructor

A read operation

The JMM disallows reordering of the first read object reference and the first read of the final field that the object contains within a thread. The compiler inserts a LoadLoad barrier before reading final field operations. There is an indirect dependency between the first reading of an object reference and the first reading of the final field that the object contains. Since some compilers do not reorder these two operations, most processors will do the same. But there are a few processors that do this, and this rule is for a few processors.

This example is given in the book The Art of Concurrent Programming in Java, where one thread, A, executes writer() and another thread, B, executes Reader ().

public class FinalExample {
    int i ;
    final int j;
    static FinalExample obj;

    public FinalExample(a) {
        i = 1;
        j = 2;
    }

    public static void writer(a) {
        obj = new FinalExample();
    }

    public static void reader(a) {
        FinalExample object = obj;
        int a = object.i;
        intb = object.j; }}Copy the code

As explained earlier, for the statement obj = new FinalExample(); In fact, there should be three steps:

Allocating memory space
Initialize an object
Assigns the address of the memory space to the corresponding reference

CPU instruction reordering at execution time may cause step 2 to be executed after step 3. For the above code, obj is returned as an uninitialized object. The constructor is executed, but its properties have not been initialized, so if a thread B calls its Reader method, it will get the wrong properties. The final keyword disallows this behavior of the CPU, ensuring that its initialization is performed before the constructor returns.

synchronized

The synchronized keyword applies to normal methods, static methods, and code blocks, and locks corresponding slots at runtime.

With normal methods, the lock is the current instance, so the method is mutually exclusive only if called by the same instance
For static methods, the lock is for the current classClassObject, class level locks. Even if different instance objects are called from different threads, the effect is mutually exclusive
For synchronized code blocks, the lock is the object configured in sychronized parentheses

Normal method/static method

We define a class as follows:

package com.fred.javalib.thread;

public class EmptyClass {
    public synchronized void syncMethod(a) {}public synchronized static void syncStaticMethod(a) {}}Copy the code

View its bytecode file

You can see that a sychronized method will have one more attribute in the flags attribute of the method when compiledACC_SYNCHRONIZEDLogo. When a virtual machine accesses a method that has this identifier, it adds it at the appropriate locationmonitorenterandmonitorexitinstruction

Synchronized code block

Javap-verbose Singleton. Class javap-verbose Singleton. Javap-verbose Singleton.

You can see that in the bytecode above, there is amonitorenterAnd twomonitorexit. There aremonitorenterTo get the locks. Two of themmonitorexitThis is because one of them releases the lock after normal code execution and one of them releases the lock when code execution is abnormal.

Happens-before rules

A mechanism for ensuring lines and visibility so that the result of a previous operation is visible to subsequent operations.

1. The principle of sequence of procedures

In a thread, the previous action is happens-before any subsequent action, in procedural order.

2. Volatile variable rules

For writes to a volatile variable, happens-before is used for subsequent reads to that volatile variable.

3. The transitivity

If A happens-before B, and B happens-before C, then A happens-before C.

4. Lock rules

Happens-before the unlocking of a lock is happens-before the subsequent locking of the lock.

5. Thread start() rule

It means that after main thread A starts child thread B, child thread B can see what the main thread does before it starts child thread B.

6. Thread join() rule

The main thread A waits for the child thread B to complete (by calling the join() method of the child thread B). When the child thread B completes (the join method of the main thread A returns), the main thread can see the operation of the child thread. By “seeing,” we mean operating on shared variables.

Two methods of synchronization are provided at the operating system level

Semaphores and tubes are two methods of synchronization provided by the operating system. The concurrency of multiple threads leads to resource competition. Synchronization is to coordinate the access of multiple threads to the shared data. Only one thread can access the data at any time.

A semaphore

Semaphore is a method provided by the operating system to coordinate access to shared resources. Software synchronization is a synchronization negotiation mechanism between equal threads, and the semaphore is managed by the OS. The semaphore is used to represent a class of resources in the system, and the value of the semaphore is the number of a class of resources. Developed by Dijkstra in the 1960s, it was the main synchronization mechanism in early operating systems, but is rarely used today. A semaphore is an abstract data type:

It consists of an integer variable (SEM) and two atomic operations
P() operation (Dutch attempt to reduce)
Sem minus 1
If SEM < 0, enter wait, otherwise continue
V()
Sem + 1
If sem <=0, wake up a waiting process

Semaphores are protected integer variables. After initialization, they can only be modified by P and V operations, which are guaranteed by the operating system. PV operations are atomic operations

P may block, V does not block (V operation only frees resources, cannot block)
It is generally assumed that the semaphore is fair, that the process will not be blocked indefinitely in the P operation, and that the semaphore waits in a fifO queue.

Semaphores enable resource access to critical regions

mutex = new Semaphore(1); mutex->P(); Critical Section; Mutext ->V();Copy the code

The operation process is as follows:

Each resource class sets a semaphore with an initial value of 1
When the first thread comes in, the semaphore is 1, P is executed, and the semaphore decreases by 1;
Entry critical region
When the second thread comes in, the semaphore is 0, and then it performs P, the semaphore decreases by 1, and the semaphore is -1, and the second thread enters the waiting queue
The first thread enters the exit area with a semaphore of -1, so the operating system knows that one thread is still waiting. The V operation increments the semaphore by one while another waiting thread is invoked.

When using semaphores to achieve mutually exclusive access to critical sections, P and V operations must be used in pairs.

Conditional synchronization with semaphores

Based on the previous description of using semaphores to solve the problem of access to critical region resources, let’s look at how semaphores can be used to achieve conditional synchronization, using the producer-consumer model as an example. The main characteristics of the producer-consumer model are as follows:

Only one thread can operate the buffer at any time (mutually exclusive access)
When the buffer is empty, the consumer must wait for the production to start (condition synchronization)
When the buffer is full, the producer must wait for the consumer (condition synchronization)

Producers on the left, consumers on the right. Producers generate data and put it into the buffer, and consumers move it out of the buffer.

producers

Checks if there is space in the buffer
Atomic operations that produce data
FullBuffers performs V and + 1 has been used

consumers

Check whether fullBuffers have data
The atomic operation of removing data
EmptyBuffers perform V, free space + 1

Semaphore summary

Although semaphores can achieve mutually exclusive access and conditional synchronization in critical regions, their coding process is prone to error, and PV pairing must be kept in mind when used by developers. In the producer-consumer model, PV operations of semaphores are scattered between two different threads. In this case, PV operations are difficult to pair, and there is no way to avoid deadlock problems of semaphores.

Monitor (Monitor)

In order to solve the pain point of the semaphore, PV operations can be concentrated in one module in the tube, thus reducing the difficulty of implementation. So what is a pipe run? A pipe is a program structure used by multiple threads for mutually exclusive access to a shared resource.

The object – oriented method is adopted to simplify the synchronization control between threads
At most one thread is executing pipe code at any time
A thread in a pipe can temporarily abandon mutually exclusive access to the pipe and wait for it to resume when an event occurs. This is different from semaphores, which are intended to complete the critical section of code. (Think of wait method)

Composition of pipe side

The pipe side consists of the following two parts:

A lock that controls mutually exclusive access to pipe code
0 or more conditional variants that manage concurrent access to shared data.

The lock

In reference to the Java language, the lock is synchronized.

Condition variables,

The condition variable is the waiting mechanism in the pipeline
- A thread entering the pipe enters the wait state because the resource is occupied
- Each conditional variant represents a waiting cause and corresponds to a waiting pair
Wait()operation
- Blocks itself in a wait queue
- Wakes up a mutually exclusive access to a wait or release pipe
Signal()operation
- Wakes up a thread in the wait queue
- If the wait queue is empty, it is equivalent to an empty operation

In addition, the design ideas of multithreading, synchronized keyword, various locks, wait, notify and notifyAll methods in Object class are the same as those of pipe program, so the solution of concurrency problem in Java really depends on pipe program.

reference

www.cnblogs.com/xidongyu/p/…
Online operating system courses
The Art of Concurrent Programming in Java

Multithreading summary – Underlying principles