The memory model

Java memory model

Many people confuse the Java memory structure (heap, stack, method area) with the Java Memory Model (JMM).

In simple terms, the JMM defines a set of rules and guarantees for the visibility, ordering, and atomicity of shared data (member variables, array elements) read and written by multiple threads.

1. Atomicity

Here is the question: if two threads increment and decrement a static variable with an initial value of 0, 5000 times each, is the result 0?
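As a concrete starting point, here is a minimal, self-contained sketch of the experiment (the class and method names are mine). Run it a few times and the result varies:

```java
public class RaceDemo {
    static int i = 0;

    // runs the experiment once and returns the final value of i
    public static int race() {
        i = 0;
        Thread t1 = new Thread(() -> {
            for (int j = 0; j < 5000; j++) i++;   // not atomic
        });
        Thread t2 = new Thread(() -> {
            for (int j = 0; j < 5000; j++) i--;   // not atomic
        });
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(race()); // often nonzero: the updates interleave
    }
}
```

Because each lost update can only drop an increment or a decrement, the result always lands somewhere in [-5000, 5000], but rarely exactly at 0.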

1.1 Problem Analysis

The result can be positive, negative, or zero. Why? Because in Java, incrementing or decrementing a static variable is not an atomic operation.

For example, i++ (where i is a static variable) actually produces the following JVM bytecode instructions:

getstatic i   // get the value of static variable i
iconst_1      // prepare the constant 1
iadd          // add
putstatic i   // store the modified value into static variable i

And the corresponding i-- is similar:

getstatic i   // get the value of static variable i
iconst_1      // prepare the constant 1
isub          // subtract
putstatic i   // store the modified value into static variable i

Under the Java memory model, completing the increment or decrement of a static variable requires exchanging data between main memory and each thread's working memory:

With multiple threads, the result may not match expectations, because these instructions can interleave across threads.

If the eight instructions execute sequentially (without interleaving), there is no problem:

// assume i starts at 0
getstatic i   // thread 1 - get the value of static variable i
iconst_1      // thread 1 - prepare the constant 1
iadd          // thread 1 - add, i = 1
putstatic i   // thread 1 - store the modified value into static variable i, i = 1
getstatic i   // thread 1 - get the value of static variable i
iconst_1      // thread 1 - prepare the constant 1
isub          // thread 1 - subtract, i = 0
putstatic i   // thread 1 - store the modified value into static variable i, i = 0

But with multiple threads these eight instructions can interleave (why? think about it). An interleaving that produces a negative result:

// assume i starts at 0
getstatic i   // thread 1 - get the value of static variable i
getstatic i   // thread 2 - get the value of static variable i
iconst_1      // thread 1 - prepare the constant 1
iadd          // thread 1 - add, i = 1
putstatic i   // thread 1 - store the modified value into static variable i, i = 1
iconst_1      // thread 2 - prepare the constant 1
isub          // thread 2 - subtract, i = -1
putstatic i   // thread 2 - store the modified value into static variable i, i = -1

An interleaving that produces a positive result:

// assume i starts at 0
getstatic i   // thread 1 - get the value of static variable i
getstatic i   // thread 2 - get the value of static variable i
iconst_1      // thread 1 - prepare the constant 1
iadd          // thread 1 - add, i = 1
iconst_1      // thread 2 - prepare the constant 1
isub          // thread 2 - subtract, i = -1
putstatic i   // thread 2 - store the modified value into static variable i, i = -1
putstatic i   // thread 1 - store the modified value into static variable i, i = 1

1.2 Solution

synchronized

Syntax:

synchronized (object) {
    // code that must execute atomically
}

Using synchronized to solve the concurrency problem:

static int i = 0;
static Object obj = new Object();

public static void main(String[] args) throws InterruptedException {
    Thread t1 = new Thread(() -> {
        for (int j = 0; j < 5000; j++) {
            synchronized (obj) { i++; }
        }
    });
    Thread t2 = new Thread(() -> {
        for (int j = 0; j < 5000; j++) {
            synchronized (obj) { i--; }
        }
    });
    t1.start();
    t2.start();
    t1.join(); // let the main thread wait for t1 to finish before continuing
    t2.join(); // let the main thread wait for t2 to finish before continuing
    System.out.println(i);
}

A way to think about it: treat obj as a room, and threads t1 and t2 as two people.

When thread t1 reaches synchronized(obj), it is as if t1 entered the room and locked the door behind it, then executed the i++ code inside.

If t2 also reaches synchronized(obj), it finds the door locked and can only wait outside. Only when t1 finishes executing the synchronized block does it unlock the door and leave the obj room. Then t2 can enter the room, lock the door, and execute its i-- code.

Note: t1 and t2 must synchronize on the same obj object. If t1 locks an m1 object while t2 locks an m2 object, it is like two people entering two different rooms, and no mutual exclusion occurs.

2. Visibility

2.1 A loop that cannot be stopped

The main thread’s changes to the run variable are not visible to the T thread, so the T thread cannot stop:

static boolean run = true;

public static void main(String[] args) throws InterruptedException {
    Thread t = new Thread(() -> {
        while (run) {
            // ...
        }
    });
    t.start();
    Thread.sleep(1000);
    run = false; // thread t does not stop as expected
}

Why is that? Let's analyze:

1. Initially, thread t reads the value of run from main memory into its working memory.

2. Because thread t reads run frequently, the JIT compiler caches the value of run in the thread's working memory (a high-speed cache), reducing accesses to run in main memory and improving efficiency.

3. After one second, the main thread changes the value of run and writes it back to main memory, but thread t keeps reading the variable from the cache in its own working memory and always sees the old value.

2.2 Solution

volatile

It can be used to modify member variables and static variables. It prevents a thread from looking up the value of the variable in its own working cache; the thread must fetch the value from main memory.

volatile guarantees visibility, but not atomicity.
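A minimal, runnable sketch of the fix (the class and method names are mine). With volatile on run, the loop is guaranteed to observe the write and exit:

```java
public class StopDemo {
    volatile static boolean run = true; // without volatile, t may never stop

    // returns true if the worker thread stopped after run was set to false
    public static boolean stopsPromptly() {
        run = true;
        Thread t = new Thread(() -> {
            while (run) { /* busy loop */ }
        });
        t.start();
        try {
            Thread.sleep(100);
            run = false;  // the write reaches main memory and is seen by t
            t.join(1000); // wait up to 1 s for t to exit
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(stopsPromptly()); // prints true
    }
}
```

Remove the volatile modifier and, depending on the JIT, the loop may spin forever on its cached copy of run.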

2.3 Visibility

volatile ensures that changes one thread makes to a volatile variable are visible to other threads. It does not guarantee atomicity, so it fits the case of one writer thread and multiple reader threads.

getstatic run // thread t gets run, true
getstatic run // thread t gets run, true
getstatic run // thread t gets run, true
getstatic run // thread t gets run, true
putstatic run // thread main changes run to false, only once
getstatic run // thread t gets run, false

Compare this with the earlier non-thread-safe example: two threads, one doing i++ and one doing i--. volatile only guarantees that each thread sees the latest value; it does not prevent the instructions from interleaving:

// assume i starts at 0
getstatic i   // thread 1 - get the value of static variable i
getstatic i   // thread 2 - get the value of static variable i
iconst_1      // thread 1 - prepare the constant 1
iadd          // thread 1 - add, i = 1
putstatic i   // thread 1 - store the modified value into static variable i, i = 1
iconst_1      // thread 2 - prepare the constant 1
isub          // thread 2 - subtract, i = -1
putstatic i   // thread 2 - store the modified value into static variable i, i = -1

Note:

A synchronized block guarantees both the atomicity of the code inside it and the visibility of the variables used in it. Its drawback is that synchronized is a heavyweight operation with relatively lower performance.

If you add System.out.println() to the loop in the previous example, thread t correctly sees the change to the run variable even without the volatile modifier. Think about why:

public void println(int x) {
    synchronized (this) {
        print(x);
        newLine();
    }
}

println() uses synchronized internally. Besides synchronizing access to the output stream, the synchronized block also prevents the thread from using its cached copies, forcing the current thread to fetch values from main memory.

3. Ordering

3.1 A strange result

int num = 0;
boolean ready = false;

public void actor1(I_Result r) {
    if (ready) {
        r.r1 = num + num;
    } else {
        r.r1 = 1;
    }
}

public void actor2(I_Result r) {
    num = 2;
    ready = true;
}

I_Result is an object with a property r1 that holds the result. Suppose thread 1 executes actor1 and thread 2 executes actor2. How many results are possible?

Case 1: thread 1 executes first; ready = false, so the else branch yields 1.

Case 2: thread 2 executes num = 2 first but has no time to execute ready = true before thread 1 runs; thread 1 still enters the else branch, and the result is 1.

Case 3: thread 2 executes through ready = true, then thread 1 runs, enters the if branch, and the result is 4 (because num = 2 has already executed).

Case 4: the result is 0. Thread 2 executes ready = true first (reordered ahead of num = 2), then a switch to thread 1 occurs; thread 1 enters the if branch and computes 0 + 0 = 0; control then returns to thread 2, which executes num = 2.

This phenomenon is called instruction reordering.

3.2 Solution

Declare the variable volatile, which disables instruction reordering around writes to it.

@JCStressTest
@Outcome(id = {"1", "4"}, expect = Expect.ACCEPTABLE, desc = "ok")
@Outcome(id = "0", expect = Expect.ACCEPTABLE_INTERESTING, desc = "!!!!")
@State
public class ConcurrencyTest {

    int num = 0;
    volatile boolean ready = false;

    @Actor
    public void actor1(I_Result r) {
        if (ready) {
            r.r1 = num + num;
        } else {
            r.r1 = 1;
        }
    }

    @Actor
    public void actor2(I_Result r) {
        num = 2;
        ready = true;
    }
}

3.3 Understanding ordering

The JVM can adjust the order in which statements execute without compromising correctness. Consider the following code:

static int i;
static int j;

// perform the following assignments in some thread
i = ...; // a time-consuming operation
j = ...;

It makes no difference whether i or j is assigned first. So when the code above actually executes, it can be either

i = ...; // a time-consuming operation
j = ...;

or

j = ...;
i = ...; // a time-consuming operation

This feature is called instruction reordering. With multiple threads, instruction reordering can affect correctness, as in the famous double-checked locking implementation of a singleton:

public final class Singleton {
    private Singleton() {}
    private static Singleton INSTANCE = null;

    public static Singleton getInstance() {
        // enter the synchronized block only if the instance has not been created
        if (INSTANCE == null) {
            synchronized (Singleton.class) {
                // another thread may have created the instance, so check again
                if (INSTANCE == null) {
                    INSTANCE = new Singleton();
                }
            }
        }
        return INSTANCE;
    }
}

The characteristics of this implementation are:

Lazy instantiation.

synchronized is entered only on the first calls to getInstance(), while INSTANCE is still null.

But the key line INSTANCE = new Singleton() is not atomic; it compiles to the following bytecode:

0: new           #2 // class cn/itcast/jvm/t4/Singleton
3: dup
4: invokespecial #3 // Method "<init>":()V
7: putstatic     #4 // Field INSTANCE:Lcn/itcast/jvm/t4/Singleton;

The order of instructions 4 and 7 is not fixed. The JVM may optimize by assigning the reference address to the INSTANCE variable before executing the constructor. If threads t1 and t2 execute in the following time sequence:

Time 1: thread t1 begins executing INSTANCE = new Singleton()
Time 2: t1 allocates space and obtains a reference address for the Singleton object (instruction 0)
Time 3: t1 assigns the reference address to INSTANCE (instruction 7)
Time 4: thread t2 enters getInstance(), finds INSTANCE != null, and returns it
Time 5: t1 executes the Singleton constructor (instruction 4)

At this point t1 has not yet finished executing the constructor. If the constructor does a lot of initialization, t2 gets back a singleton that is not yet initialized.

Marking INSTANCE volatile disables this instruction reordering. Note that volatile only has this effect in JDK 5 and later.
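For reference, a sketch of the corrected double-checked locking with volatile (the class name is mine):

```java
public final class SafeSingleton {
    // volatile forbids reordering the constructor after the reference publish
    private static volatile SafeSingleton INSTANCE = null;

    private SafeSingleton() {}

    public static SafeSingleton getInstance() {
        if (INSTANCE == null) {                      // first check, no lock
            synchronized (SafeSingleton.class) {
                if (INSTANCE == null) {              // second check, under lock
                    INSTANCE = new SafeSingleton();
                }
            }
        }
        return INSTANCE;
    }
}
```

With volatile, a thread that sees a non-null INSTANCE is guaranteed to see a fully constructed object.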

3.4 happens-before

happens-before specifies which writes to shared variables are visible to reads by other threads. It is a summary of the JMM's rules for visibility and ordering.

Outside of the happens-before rules, the JMM does not guarantee that a write to a shared variable by one thread is visible to a read of that variable by another thread.

  • Writes to a variable by a thread before it unlocks m are visible to other threads that subsequently lock m and read the variable.

static int x;
static Object m = new Object();

new Thread(() -> {
    synchronized (m) {
        x = 10;
    }
}, "t1").start();

new Thread(() -> {
    synchronized (m) {
        System.out.println(x);
    }
}, "t2").start();
  • Writes by a thread to a volatile variable are visible to subsequent reads of that variable by other threads.

volatile static int x;

new Thread(() -> {
    x = 10;
}, "t1").start();

new Thread(() -> {
    System.out.println(x);
}, "t2").start();
  • A write to a variable before a thread starts is visible to reads of the variable inside the thread after it starts.

static int x;
x = 10;

new Thread(() -> {
    System.out.println(x);
}, "t2").start();
  • A write to a variable before a thread terminates is visible to reads by other threads after they learn it has ended (for example, by calling t1.isAlive() or t1.join()).

static int x;
Thread t1 = new Thread(() -> {
    x = 10;
}, "t1");
t1.start();
t1.join();
System.out.println(x);
  • A write to a variable before thread t1 interrupts t2 is visible to reads of that variable by other threads after they learn that t2 has been interrupted (via t2.isInterrupted() or Thread.interrupted()).

static int x;

public static void main(String[] args) {
    Thread t2 = new Thread(() -> {
        while (true) {
            if (Thread.currentThread().isInterrupted()) {
                System.out.println(x);
                break;
            }
        }
    }, "t2");
    t2.start();

    new Thread(() -> {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        x = 10;
        t2.interrupt();
    }, "t1").start();

    while (!t2.isInterrupted()) {
        Thread.yield();
    }
    System.out.println(x);
}
  • Writes of a variable's default value (0, false, null) are visible to other threads that read the variable.

  • It is transitive: if x hb→ y and y hb→ z, then x hb→ z.

Here, "variable" means a member variable or a static member variable.
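The transitivity rule combines with the volatile rule into a useful guarantee. A sketch (the class and method names are mine): the write of x happens-before the volatile write of v, which happens-before the volatile read of v in the reader, which happens-before the reader's read of x, so the reader must see 10:

```java
public class TransitivityDemo {
    static int x;                  // plain field
    static volatile boolean v;     // volatile flag

    public static int observe() {
        x = 0;
        v = false;
        Thread writer = new Thread(() -> {
            x = 10;    // 1. write x ...
            v = true;  // 2. ... happens-before this volatile write of v
        });
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!v) { /* spin on the volatile read */ }
            seen[0] = x; // 3. the volatile read of v happens-before this read of x
        });
        writer.start();
        reader.start();
        try {
            writer.join(); reader.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen[0]; // guaranteed 10 by transitivity
    }

    public static void main(String[] args) {
        System.out.println(observe()); // prints 10
    }
}
```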

4. CAS and atomic classes

4.1 CAS

CAS stands for Compare And Swap, and it embodies the idea of optimistic locking. For example, multiple threads performing +1 on a shared integer variable:

// keep retrying
while (true) {
    int oldValue = sharedVariable; // read the current value, e.g. 0
    int result = oldValue + 1;     // compute the new value from the old one

    /* If another thread has changed the shared variable in the meantime (say, to 5),
       this thread's result is stale; compareAndSwap returns false and we retry,
       until compareAndSwap returns true, which means no other thread interfered
       while this thread was making its change. */
    if (compareAndSwap(oldValue, result)) {
        // success: the result was written to the shared variable; exit the loop
        break;
    }
}

The shared variable is marked volatile so that every read sees the latest value. The combination of CAS and volatile enables lock-free concurrency, which suits scenarios with low contention and multi-core CPUs.

  • Because synchronized is not used, threads are never blocked; this is one source of the efficiency gain.

  • However, if contention is intense, retries happen frequently and efficiency suffers.

Under the hood, CAS relies on the Unsafe class to invoke the CPU's CAS instruction directly.
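Rather than reaching for Unsafe directly, the same retry loop can be sketched with the public AtomicInteger.compareAndSet API (the class and method names are mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    static final AtomicInteger i = new AtomicInteger(0);

    // increment implemented with an explicit CAS retry loop
    static void increment() {
        while (true) {
            int oldValue = i.get();     // read the current value
            int result = oldValue + 1;  // compute the new value from it
            if (i.compareAndSet(oldValue, result)) {
                return;                 // no other thread interfered: done
            }
            // another thread changed i in between: loop and retry
        }
    }

    // two threads, 5000 increments each
    public static int run() {
        i.set(0);
        Thread t1 = new Thread(() -> { for (int j = 0; j < 5000; j++) increment(); });
        Thread t2 = new Thread(() -> { for (int j = 0; j < 5000; j++) increment(); });
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return i.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 10000: no update is lost
    }
}
```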

4.2 Optimistic locking vs. pessimistic locking

  • CAS is based on the idea of optimistic locking: assume the best case; it does not matter if another thread changes the shared variable, because I will simply retry until I succeed.

  • synchronized is based on the idea of pessimistic locking: assume the worst case; other threads must be prevented from modifying the shared variable, and they get no chance to change it until I release the lock.

4.3 Atomic operation classes

JUC (java.util.concurrent) provides atomic operation classes that offer thread-safe operations, such as AtomicInteger and AtomicBoolean. They are implemented with CAS plus volatile under the hood.

The earlier example can be rewritten with AtomicInteger:

// create an atomic integer object
private static AtomicInteger i = new AtomicInteger(0);

public static void main(String[] args) throws InterruptedException {
    Thread t1 = new Thread(() -> {
        for (int j = 0; j < 5000; j++) {
            i.getAndIncrement(); // get then increment, like i++
            // i.incrementAndGet(); // increment then get, like ++i
        }
    });
    Thread t2 = new Thread(() -> {
        for (int j = 0; j < 5000; j++) {
            i.getAndDecrement(); // get then decrement, like i--
        }
    });
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println(i);
}

5. synchronized optimizations

In the HotSpot VM, every object has an object header (including a class pointer and a Mark Word). The Mark Word normally stores the object's hash code and GC generational age. When the object is locked, this information is replaced by lock marker bits plus, depending on the lock state, a pointer to a thread's lock record, a pointer to a heavyweight lock (monitor), or a thread ID.

5.1 Lightweight lock

If an object is accessed by multiple threads but their accesses are staggered in time (that is, there is no contention), a lightweight lock can be used as an optimization. By analogy:

A student (thread A) reserves a seat with a textbook, attends half the class, and leaves the room (the CPU time slice is up). When he returns and finds the textbook unmoved, there was no competition, and he continues his class.

If another student (thread B) arrives in the meantime and takes the seat, thread A is notified of the concurrent access; thread A then upgrades to a heavyweight lock and enters the heavyweight-lock procedure.

A heavyweight lock is no longer as simple as reserving a seat with a textbook; you can imagine thread A putting an iron fence around the seat before leaving.

Suppose two methods have synchronized blocks, locked with the same object:

static Object obj = new Object();

public static void method1() {
    synchronized (obj) {
        // synchronized block A
        method2();
    }
}

public static void method2() {
    synchronized (obj) {
        // synchronized block B
    }
}
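Note in passing that synchronized is reentrant: the nested method2() call above acquires obj again without deadlocking, which is exactly the "failure (it is its own lock) → reentry" step in the table below. A minimal self-contained check (the class name is mine):

```java
public class ReentrantDemo {
    static final Object obj = new Object();
    static int depth = 0;

    public static void method1() {
        synchronized (obj) {   // first acquisition
            depth++;
            method2();         // nested acquisition of the same lock: no deadlock
        }
    }

    public static void method2() {
        synchronized (obj) {   // reentry succeeds immediately
            depth++;
        }
    }

    // runs the nested acquisition once and reports how many blocks executed
    public static int run() {
        depth = 0;
        method1();
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2: both blocks ran in one thread
    }
}
```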

Each thread's stack frame can contain a lock record structure, which can hold the locked object's Mark Word (8 bytes).

| thread 1 | object Mark Word | thread 2 |
|---|---|---|
| access sync block A, copy Mark into thread 1's lock record | 01 (unlocked) | |
| CAS: change Mark to the address of thread 1's lock record | 01 (unlocked) | |
| success (locked) | 00 (lightweight lock) thread 1 lock record address | |
| execute sync block A | 00 (lightweight lock) thread 1 lock record address | |
| access sync block B, copy Mark into thread 1's lock record | 00 (lightweight lock) thread 1 lock record address | |
| CAS: change Mark to the address of thread 1's lock record | 00 (lightweight lock) thread 1 lock record address | |
| failure (it is its own lock) | 00 (lightweight lock) thread 1 lock record address | |
| lock reentry | 00 (lightweight lock) thread 1 lock record address | |
| execute sync block B | 00 (lightweight lock) thread 1 lock record address | |
| sync block B completes | 00 (lightweight lock) thread 1 lock record address | |
| sync block A completes | 00 (lightweight lock) thread 1 lock record address | |
| success (unlocked) | 01 (unlocked) | |
| | 01 (unlocked) | access sync block A, copy Mark into thread 2's lock record |
| | 01 (unlocked) | CAS: change Mark to the address of thread 2's lock record |
| | 00 (lightweight lock) thread 2 lock record address | success (locked) |
| ... | ... | ... |

5.2 Lock inflation

If the CAS operation fails while trying to add a lightweight lock, one possibility is that another thread has already put a lightweight lock on the object (contention). In that case lock inflation is needed, turning the lightweight lock into a heavyweight lock.

static Object obj = new Object();

public static void method1() {
    synchronized (obj) {
        // synchronized block
    }
}
| thread 1 | object Mark Word | thread 2 |
|---|---|---|
| access the sync block, copy Mark into thread 1's lock record | 01 (unlocked) | |
| CAS: change Mark to the address of thread 1's lock record | 01 (unlocked) | |
| success (locked) | 00 (lightweight lock) thread 1 lock record address | |
| executing the sync block | 00 (lightweight lock) thread 1 lock record address | |
| executing the sync block | 00 (lightweight lock) thread 1 lock record address | access the sync block, copy Mark into thread 2's lock record |
| executing the sync block | 00 (lightweight lock) thread 1 lock record address | CAS: change Mark to the address of thread 2's lock record |
| executing the sync block | 00 (lightweight lock) thread 1 lock record address | failure (another thread already holds the lock) |
| executing the sync block | 00 (lightweight lock) thread 1 lock record address | CAS: change Mark to a heavyweight lock (monitor) pointer |
| executing the sync block | 10 (heavyweight lock) monitor pointer | blocked |
| completed | 10 (heavyweight lock) monitor pointer | blocked |
| CAS to restore Mark: failure (must unlock via the monitor) | 10 (heavyweight lock) monitor pointer | blocked |
| release the heavyweight lock, wake the blocked thread | 01 (unlocked) | |
| | 10 (heavyweight lock) monitor pointer | compete for the heavyweight lock |
| | 10 (heavyweight lock) monitor pointer | success (locked) |
| ... | ... | ... |

5.3 Heavyweight lock

Heavyweight lock contention can also be optimized with spinning: if the current thread spins successfully (that is, the lock-holding thread exits the synchronized block and releases the lock in the meantime), the current thread avoids blocking.

Since Java 6, spin locking is adaptive. For example, if spinning on an object has just succeeded, the JVM assumes the probability of another success is high and allows more spins; otherwise it spins less or not at all. It is fairly smart.

  • Spinning consumes CPU time. On a single-core CPU spinning is pure waste; only on a multi-core CPU can it pay off.

  • An analogy: keeping the engine running at a red light is like spinning (worthwhile for a short wait); turning the engine off is like blocking (worthwhile for a long wait).

  • Since Java 7, you can no longer control whether spinning is enabled.

Spin retry succeeds:

| thread 1 (on CPU 1) | object Mark Word | thread 2 (on CPU 2) |
|---|---|---|
| | 10 (heavyweight lock) | |
| access the sync block, obtain the monitor | 10 (heavyweight lock) monitor pointer | |
| success (locked) | 10 (heavyweight lock) monitor pointer | |
| executing the sync block | 10 (heavyweight lock) monitor pointer | |
| executing the sync block | 10 (heavyweight lock) monitor pointer | access the sync block, obtain the monitor |
| executing the sync block | 10 (heavyweight lock) monitor pointer | spin retry |
| completed | 10 (heavyweight lock) monitor pointer | spin retry |
| success (unlocked) | 01 (unlocked) | spin retry |
| | 10 (heavyweight lock) monitor pointer | success (locked) |
| | 10 (heavyweight lock) monitor pointer | executing the sync block |
| ... | ... | ... |

Spin retry fails:

| thread 1 (on CPU 1) | object Mark Word | thread 2 (on CPU 2) |
|---|---|---|
| | 10 (heavyweight lock) | |
| access the sync block, obtain the monitor | 10 (heavyweight lock) monitor pointer | |
| success (locked) | 10 (heavyweight lock) monitor pointer | |
| executing the sync block | 10 (heavyweight lock) monitor pointer | |
| executing the sync block | 10 (heavyweight lock) monitor pointer | access the sync block, obtain the monitor |
| executing the sync block | 10 (heavyweight lock) monitor pointer | spin retry |
| executing the sync block | 10 (heavyweight lock) monitor pointer | spin retry |
| executing the sync block | 10 (heavyweight lock) monitor pointer | spin retry |
| executing the sync block | 10 (heavyweight lock) monitor pointer | blocked (spin budget exhausted) |
| completed, success (unlocked), wake blocked threads | 01 (unlocked) | |
| | 10 (heavyweight lock) monitor pointer | compete, success (locked), executing the sync block |
| ... | ... | ... |

5.4 Biased lock

A lightweight lock without contention (used by a single thread) still performs a CAS operation on every reentry. Java 6 introduced the biased lock as a further optimization: CAS is used only the first time, to set the thread ID in the object's Mark Word; afterwards, if the thread finds its own ID there, there is no contention and no further CAS is needed.

  • Revoking a bias requires upgrading the lock of the holding thread to a lightweight lock, during which all threads are paused (stop-the-world, STW).

  • Accessing the object's hashCode also revokes the biased lock, because a biased Mark Word has no room left to store the hash code; the thread ID must be removed so the hash code can be stored back into the object header.

  • If the object is accessed by multiple threads but without contention, an object biased toward thread t1 can still be rebiased toward thread t2; rebiasing resets the thread ID in the object header.

  • Bias revocation and rebiasing are done in batches, per class.

  • If revocations for a class reach a certain threshold, all objects of that class become non-biasable.

  • You can actively disable biased locking with -XX:-UseBiasedLocking.

Suppose two methods have synchronized blocks, locked with the same object:

static Object obj = new Object();

public static void method1() {
    synchronized (obj) {
        // synchronized block A
        method2();
    }
}

public static void method2() {
    synchronized (obj) {
        // synchronized block B
    }
}
| thread 1 | object Mark Word |
|---|---|
| access sync block A, check Mark for a thread ID | 101 (unlocked, biasable) |
| try to add a biased lock | 101 (unlocked, biasable) object hashCode |
| success (locked) | 101 (unlocked, biasable) thread ID |
| execute sync block A | 101 (unlocked, biasable) thread ID |
| access sync block B, check Mark for a thread ID | 101 (unlocked, biasable) thread ID |
| it is its own thread ID; the lock is its own, nothing more needed | 101 (unlocked, biasable) thread ID |
| execute sync block B | 101 (unlocked, biasable) thread ID |
| completed | 101 (unlocked, biasable) object hashCode |

5.5 Other optimizations

1. Reduce lock hold time

Keep synchronized code blocks as short as possible.

2. Reduce lock granularity

Splitting one lock into multiple locks improves concurrency, for example:

  • ConcurrentHashMap

  • LongAdder maintains a base and a cells array. With no contention (or while the cells array is still being initialized), values are accumulated into base with CAS. Under contention, the cells array is initialized and each thread accumulates into its own cell in parallel; the final value is the sum of all the cells plus base.

  • LinkedBlockingQueue uses different locks for enqueueing and dequeueing, which is more efficient than ArrayBlockingQueue, which has only one lock.
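To make the LongAdder idea concrete, a small sketch (class and method names are mine): four threads increment concurrently; under contention their updates land in different cells, and sum() folds base and the cells together:

```java
import java.util.concurrent.atomic.LongAdder;

public class AdderDemo {
    // four threads each add 2500; under contention they hit separate cells
    public static long sum() {
        LongAdder adder = new LongAdder();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < 4; t++) {
            threads[t] = new Thread(() -> {
                for (int j = 0; j < 2500; j++) {
                    adder.increment();
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return adder.sum(); // base + all cells
    }

    public static void main(String[] args) {
        System.out.println(sum()); // prints 10000
    }
}
```

Compared with a single AtomicLong, the per-thread cells drastically reduce CAS retries under heavy contention.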

3. Lock coarsening

The JVM may also optimize by coarsening several consecutive lock acquisitions into a single one. In the example below, the appends all lock the same object, so there is no need to acquire and release the lock repeatedly:

new StringBuffer().append("a").append("b").append("c");

4. Lock elimination

The JIT compiler performs escape analysis on the code. For example, when a lock object is a local variable inside a method and cannot be accessed by any other thread, all synchronization on it is eliminated by the just-in-time compiler.

5. Read-write separation

CopyOnWriteArrayList

CopyOnWriteArraySet
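A short sketch of CopyOnWriteArrayList's read-write separation (the class and method names are mine): iterators read a snapshot of the underlying array, while writers copy it, so readers and writers never block each other:

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowDemo {
    // returns the sum seen by an iterator taken before a later add
    public static int snapshotSum() {
        CopyOnWriteArrayList<Integer> list = new CopyOnWriteArrayList<>();
        list.add(1);
        list.add(2);
        list.add(3);

        Iterator<Integer> it = list.iterator(); // snapshot of [1, 2, 3]
        list.add(4); // the write copies the array; the snapshot is untouched

        int sum = 0;
        while (it.hasNext()) {
            sum += it.next(); // sees only 1, 2, 3
        }
        return sum; // 6: the 4 added after the iterator was taken is not seen
    }

    public static void main(String[] args) {
        System.out.println(snapshotSum()); // prints 6
    }
}
```

The trade-off is that every write copies the whole array, so these classes fit read-heavy, write-light workloads.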