What is CAS?

CAS stands for compare and swap: a value is updated only if it still equals the expected old value, which lets multiple threads update shared data safely without taking a lock. Let's take a look at CAS in the JDK:

/**
 * Atomically increments by one the current value.
 *
 * @return the updated value
 */
public final int incrementAndGet() {
    return unsafe.getAndAddInt(this, valueOffset, 1) + 1;
}


public final int getAndAddInt(Object var1, long var2, int var4) {
        int var5;
        do {
            var5 = this.getIntVolatile(var1, var2);
        } while(!this.compareAndSwapInt(var1, var2, var5, var5 + var4));

        return var5;
}

The increment operations of the Atomic classes are built on compareAndSwapInt, and the CAS itself is implemented as a native method. To understand what CAS actually does, walk through the sketch below.
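A minimal Java sketch of what a CAS does, assuming nothing beyond the getAndAddInt shown above. The class name CasSketch is made up, and the synchronized method only stands in for the hardware atomicity that a real CAS gets from a single lock cmpxchg instruction (shown later):

public class CasSketch {
    private volatile int value;

    // Pretend-CAS: the compare and the write happen as one indivisible step.
    // Real CAS does this in hardware; synchronized here only fakes that atomicity.
    public synchronized boolean compareAndSwap(int expected, int update) {
        if (value == expected) {      // nobody changed the value since we read it
            value = update;
            return true;
        }
        return false;                 // another thread won the race; the caller retries
    }

    // This mirrors getAndAddInt above: read, attempt the CAS, retry until it succeeds.
    public int getAndAdd(int delta) {
        int current;
        do {
            current = value;
        } while (!compareAndSwap(current, current + delta));
        return current;
    }
}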

CAS has the ABA problem: after thread A reads the value, another thread B modifies it and then changes it back, so the value B leaves behind equals the original. Thread A's CAS still succeeds even though the value was touched in between. Whether that intermediate modification needs to be detected depends on the business scenario; if it does, you can attach a version number to the value.

Let’s use a piece of code to demonstrate the ABA problem in CAS

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** * create by yanghongxing on 2020/5/8 11:03 PM */
public class ABA {
    private static AtomicInteger atomicInt = new AtomicInteger(100);

    public static void main(String[] args) throws InterruptedException {
        // CAS twice on the AtomicInteger so that the value ends up back where it started
        Thread intT1 = new Thread(() -> {
            atomicInt.compareAndSet(100, 101);
            atomicInt.compareAndSet(101, 100);
        });

        Thread intT2 = new Thread(() -> {
            try {
                TimeUnit.SECONDS.sleep(1);
            } catch (InterruptedException e) {
            }
            boolean c3 = atomicInt.compareAndSet(100, 101);
            // prints true: the CAS succeeds even though the value went 100 -> 101 -> 100 in between
            System.out.println(c3);
        });

        intT1.start();
        intT2.start();
    }
}

You can solve this problem with AtomicStampedReference in the JDK, which pairs the value with a version stamp; a sketch follows. After that, let's look at how CAS itself is implemented, by reading the source of the underlying native method in jdk8u: atomic_linux_x86.inline.hpp.
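A minimal sketch of detecting the ABA sequence with AtomicStampedReference. The class name ABAWithStamp and the timing are illustrative, not from the original article:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicStampedReference;

public class ABAWithStamp {
    private static AtomicStampedReference<Integer> ref = new AtomicStampedReference<>(100, 0);

    public static void main(String[] args) {
        final int stampSeenByT2 = ref.getStamp();   // t2 remembers the stamp before anything happens

        Thread t1 = new Thread(() -> {
            int stamp = ref.getStamp();
            ref.compareAndSet(100, 101, stamp, stamp + 1);   // A -> B, stamp 0 -> 1
            stamp = ref.getStamp();
            ref.compareAndSet(101, 100, stamp, stamp + 1);   // B -> A, stamp 1 -> 2
        });

        Thread t2 = new Thread(() -> {
            try {
                TimeUnit.SECONDS.sleep(1);                   // let t1 do its A -> B -> A dance first
            } catch (InterruptedException e) {
            }
            boolean success = ref.compareAndSet(100, 101, stampSeenByT2, stampSeenByT2 + 1);
            System.out.println(success);                     // false: the value is 100 again, but the stamp moved on
        });

        t1.start();
        t2.start();
    }
}

Because compareAndSet only succeeds when both the reference and the stamp match, the A-to-B-to-A change is detected and t2's update fails.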

inline jint     Atomic::cmpxchg    (jint     exchange_value, volatile jint*     dest, jint     compare_value) {
  int mp = os::is_MP();
  __asm__ volatile (LOCK_IF_MP(%4) "cmpxchgl %1,(%3)"
                    : "=a" (exchange_value)
                    : "r" (exchange_value), "a" (compare_value), "r" (dest), "r" (mp)
                    : "cc", "memory");
  return exchange_value;
}

Let's focus on the key assembly instruction:

__asm__ volatile (LOCK_IF_MP(%4) "cmpxchgl %1,(%3)"

__asm__ introduces inline assembly; LOCK_IF_MP(%4) emits the lock prefix only when the CPU is multi-core (MP = multi-processor); cmpxchgl is the compare-and-exchange instruction, i.e. CAS itself. So the final implementation boils down to:

lock cmpxchg

In other words, at the hardware level CAS is the cmpxchg instruction, prefixed with lock on multi-core CPUs so that the compare-and-exchange executes atomically.

Layout of Java objects in memory

Let's start by looking at the detailed layout of Java objects in memory, which is closely tied to how Java locks are implemented. The tool we'll use is JOL (Java Object Layout):

<dependencies>
    <!-- https://mvnrepository.com/artifact/org.openjdk.jol/jol-core -->
    <dependency>
        <groupId>org.openjdk.jol</groupId>
        <artifactId>jol-core</artifactId>
        <version>0.9</version>
    </dependency>
</dependencies>

Usage example:

import org.openjdk.jol.info.ClassLayout;

public class ShowJOL {
    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
}

The output

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

OFFSET: the starting offset within the object, in bytes. SIZE: the size in bytes.

TYPE DESCRIPTION: what the bytes at that offset represent; in the example above, the object header.

VALUE: the value currently stored there.

Loss due to the next object alignment: padding added so that the next object starts at an aligned address.

An object in memory consists of four parts:

  1. Mark word: runtime information such as the hash code and, most importantly here, the lock state.

  2. Class pointer: indicates which class the object belongs to.

  3. Instance data: the object's fields, for example 4 bytes for an int a or 8 bytes for a long b.

  4. Padding: if the space used so far is not divisible by 8, padding fills it up until it is; 8-byte-aligned objects are faster to read.

In the output above, the first two 4-byte rows are the mark word (8 bytes in total) and the third 4-byte row is the class pointer. Instance data stores member variables; a bare Object has none, so it takes 0 bytes. That adds up to 12 bytes, which is not divisible by 8, so 4 padding bytes are added for alignment. (Note: this assumes the default compressed-pointer flags, -XX:+UseCompressedClassPointers and -XX:+UseCompressedOops.)
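To see instance data and padding as well, you can run JOL on a class that actually has fields. The class below is a made-up example, not from the original article; with the default compressed pointers the header takes 12 bytes, the int 4 and the long 8, so the instance should come out at 24 bytes with no extra padding needed:

import org.openjdk.jol.info.ClassLayout;

public class FieldLayoutDemo {
    static class Data {
        int a;    // 4 bytes of instance data
        long b;   // 8 bytes of instance data
    }

    public static void main(String[] args) {
        // Expected: 12-byte header + 4 (int a) + 8 (long b) = 24 bytes, already 8-byte aligned
        System.out.println(ClassLayout.parseInstance(new Data()).toPrintable());
    }
}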

With that in mind, let’s execute the following code

import org.openjdk.jol.info.ClassLayout;

/** * create by yanghongxing on 2020/5/11 11:52 PM */
public class ShowJOL {
    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
}

Execution Result:

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

Comparing the mark word before and after locking (as the example further below will do), we can see that synchronized records its lock information in the mark word. One of the most common claims in Java development is that synchronized is a heavyweight lock. Is that really true? We can watch the locking process by analyzing the mark word. In early JDKs, synchronized requested a heavyweight lock every time and performance was poor; as the JDK evolved, synchronized got much faster. It does not start out as a heavyweight lock; instead it goes through a gradual escalation process. Let's start with the 64-bit mark word layout table (markword-64.png).
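The table image itself is not reproduced here; the commonly cited 64-bit HotSpot mark word layout it refers to looks roughly like this (reconstructed from general HotSpot documentation, so treat the exact bit widths as a guide):

lock state       | mark word contents (64 bits)                                  | lock bits
no lock          | unused:25 | identity hashcode:31 | unused:1 | age:4 | biased:0 | 01
biased lock      | thread pointer:54 | epoch:2 | unused:1 | age:4 | biased:1      | 01
lightweight lock | pointer to the lock record on the owning thread's stack:62    | 00
heavyweight lock | pointer to the monitor (the heavyweight lock):62              | 10
GC mark          | (used by the GC):62                                           | 11

The three low bits quoted later in this article (001 no lock, 000 lightweight lock, 010 heavyweight lock, 101 biased lock) are simply the lowest three bits of the first header byte.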

Biased lock: a multithreading optimization introduced in Java 6. As the name implies, it is biased toward the first thread that acquires the lock. If a synchronized block is only ever entered by one thread and there is no contention, that thread does not need real synchronization; a biased lock is simply recorded for it. If another thread later competes for the lock, the thread holding the biased lock is paused, the JVM revokes the bias, and the lock is upgraded to a standard lightweight lock.

Spin lock: the waiting thread keeps hold of the CPU and spins until the lock becomes free, then proceeds immediately. Spinning itself consumes CPU, so if many threads spin, or a thread spins too many times, the CPU can be exhausted and the lock needs to be upgraded.

Heavyweight lock: a lock based on the OS kernel, which is expensive. Waiting threads are parked in a wait queue so that they do not consume CPU.

Using this table, let’s write an example of what synchronized looks like by default when there is no lock contention.

import org.openjdk.jol.info.ClassLayout;

/** * create by yanghongxing on 2020/5/11 11:52 PM */
public class ShowJOL {
    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(Integer.toHexString(o.hashCode()));
        System.out.println(ClassLayout.parseInstance(o).toPrintable());

        synchronized (o) {
            System.out.println(Integer.toHexString(o.hashCode()));
            System.out.println(ClassLayout.parseInstance(o).toPrintable());
        }
    }
}

Then look at the output:

5f8ed237
java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 37 d2 8e (00000001 00110111 11010010 10001110) (-1898825983)
      4     4        (object header)                           5f 00 00 00 (01011111 00000000 00000000 00000000) (95)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           90 29 7d 06 (10010000 00101001 01111101 00000110) (108865936)
      4     4        (object header)                           00 70 00 00 (00000000 01110000 00000000 00000000) (28672)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

In the first line we printed the object's hashCode in hex. Comparing with the earlier no-lock output, you can see that the hashCode is now stored in the object's mark word. Look at the binary value of the unlocked mark word, 00000001 00110111 11010010 10001110: the lowest 3 bits of the first byte are 001, which the table above identifies as the no-lock state. In the second mark word the lowest 3 bits are 000, which corresponds to a lightweight (spin) lock. Now let's use another example, with lock contention this time, and see what happens.

import org.openjdk.jol.info.ClassLayout;

/** * create by yanghongxing on 2020/5/12 7:13 PM */
public class MarkwordMain {

    private static Object OBJ = new Object();

    private static void printf() {
        System.out.println(ClassLayout.parseInstance(OBJ).toPrintable());
    }

    private static Runnable RUNNABLE = () -> {
        synchronized (OBJ) {
            printf();
        }
    };

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            new Thread(RUNNABLE).start();
        }
        Thread.sleep(Integer.MAX_VALUE);
    }
}

Here three threads compete for the same lock while printing the object layout. Look at the output:

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           5a 59 82 ef (01011010 01011001 10000010 11101111) (-276670118)
      4     4        (object header)                           f9 7f 00 00 (11111001 01111111 00000000 00000000) (32761)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           5a 59 82 ef (01011010 01011001 10000010 11101111) (-276670118)
      4     4        (object header)                           f9 7f 00 00 (11111001 01111111 00000000 00000000) (32761)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           5a 59 82 ef (01011010 01011001 10000010 11101111) (-276670118)
      4     4        (object header)                           f9 7f 00 00 (11111001 01111111 00000000 00000000) (32761)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

This time the lowest 3 bits are 010 (first byte 5a = 01011010), which the table identifies as a heavyweight lock.

synchronized lock upgrading therefore follows the sequence: new object, then biased lock, then lightweight lock (spin lock, adaptive spin lock), then heavyweight lock. Biased lock: the mark word records a pointer to the current thread; the next time the same thread takes the lock there is no contention to resolve, it only has to check that the thread pointer matches. Hence the name: the lock is biased toward the first thread that locks it.
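A side note on reproducing this: HotSpot 8 delays biased locking for roughly 4 seconds after startup, and computing an object's identity hashCode also prevents it from being biased, which is why the earlier examples showed a lightweight lock instead. A hedged sketch for observing the biased pattern (101), assuming default JDK 8 behaviour; the class name is made up:

import org.openjdk.jol.info.ClassLayout;

public class BiasedLockDemo {
    public static void main(String[] args) throws InterruptedException {
        // Wait out the biased-locking startup delay (about 4s by default;
        // alternatively start the JVM with -XX:BiasedLockingStartupDelay=0)
        Thread.sleep(5000);
        Object o = new Object();    // do not call o.hashCode() here, or the bias cannot be installed
        synchronized (o) {
            // The low bits of the first header byte should now read 101 (biased lock)
            System.out.println(ClassLayout.parseInstance(o).toPrintable());
        }
    }
}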

Contention: the lock is upgraded to a lightweight lock. Each thread has its own LockRecord on its own stack and uses CAS to try to install a pointer to that LockRecord in the mark word; whichever thread's LockRecord the pointer ends up referencing owns the lock.

Too many spins: the lock is upgraded to a heavyweight lock. If threads spin too often (roughly more than 10 times, tunable with -XX:PreBlockSpin), the spinning itself wastes too much CPU, so the lock is upgraded and waiting threads enter a wait queue, where they consume no CPU.

Spin locking was introduced in JDK 1.4.2 and enabled with -XX:+UseSpinning. It became the default in JDK 6, which also introduced adaptive spin locking.

Adaptive spinning means that the spin time (the number of spins) is no longer fixed; it is determined by the previous spins on the same lock and the state of the lock's owner. If a spin wait recently succeeded in acquiring this lock, and the thread holding it is still running, the VM assumes the spin is likely to succeed again and allows it to last relatively long. If spinning rarely succeeds for a given lock, the VM may skip spinning altogether on future acquisitions and block the thread directly, avoiding wasted processor time.

Implementation principle of synchronized

Java source level

synchronized (object)

Bytecode level

Using the IDEA plugin jclasslib Bytecode Viewer to inspect the bytecode, let's take the earlier code as an example, once without and once with the synchronized block. The synchronized version wraps the block in a monitorenter instruction and matching monitorexit instructions (one on the normal path and one on the compiler-generated exception path); that pair is where JVM-level locking happens.

import org.openjdk.jol.info.ClassLayout;

public class ShowJOL {
    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
}

import org.openjdk.jol.info.ClassLayout;

public class ShowJOL {
    public static void main(String[] args) {
        Object o = new Object();
        synchronized (o) {
            System.out.println(ClassLayout.parseInstance(o).toPrintable());
        }
    }
}

Assembly level

Using the HSDIS tool we can dump the assembly that the JIT compiler generates for the following code:

/** * create by yanghongxing on 2020/5/12 11:45 PM */
public class SynchronizedTest {

    private static int c;

    public static synchronized void sync() {
    }

    public static void noSynchronized() {
        int a = 1;
        int b = 2;
        c = a + b;
    }

    public static void main(String[] args) {
        for (int j = 0; j < 1000_000; j++) {
            sync();
            noSynchronized();
        }
    }
}

The relevant part of the JIT-generated assembly for sync() looks like this:
  0x00000001195d2e4e: lock cmpxchg %r11,(%r10)
  0x00000001195d2e53: je     0x00000001195d2da0
  0x00000001195d2e59: mov    %r13,(%rsp)
  0x00000001195d2e5d: movabs $0x79578d830,%rsi  ;   {oop(a 'java/lang/Class' = 'com/example/demo/SynchronizedTest')}
  0x00000001195d2e67: lea    0x10(%rsp),%rdx
  0x00000001195d2e6c: data32 xchg %ax,%ax
  0x00000001195d2e6f: callq  0x0000000119525860  ; OopMap{off=404}
                                                 ;*synchronization entry
                                                 ; - com.example.demo.SynchronizedTest::sync@-1 (line 11)

Note the lock cmpxchg at the top: synchronized ultimately uses CAS, i.e. lock cmpxchg, to acquire the lock.

Lock elimination

public void add(String str1,String str2){
         StringBuffer sb = new StringBuffer();
         sb.append(str1).append(str2);
}

We all know that StringBuffer is thread-safe because its key methods are synchronized. But looking at the code above, the reference sb is only ever used inside the add method and can never be seen by another thread (it is a local variable, and the stack is thread-private). Since sb can never be a shared resource, the JVM, guided by escape analysis, automatically eliminates the locking inside the StringBuffer object.

Lock coarsening

public String test(String str) {
    int i = 0;
    StringBuffer sb = new StringBuffer();
    while (i < 100) {
        sb.append(str);
        i++;
    }
    return sb.toString();
}

When the JVM detects that a sequence of operations keeps locking the same object (in the while loop, append is called 100 times, which without coarsening would mean 100 lock/unlock cycles), it coarsens the scope of the lock to cover the whole sequence (for example, to just outside the while loop), so that the sequence only needs to lock once.
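Conceptually, after coarsening the JIT behaves as if the method had been written like this (a sketch of the effect, not actual compiler output):

public String test(String str) {
    int i = 0;
    StringBuffer sb = new StringBuffer();
    synchronized (sb) {          // one lock around the whole loop...
        while (i < 100) {
            sb.append(str);      // ...instead of locking and unlocking inside append() 100 times
            i++;
        }
    }
    return sb.toString();
}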

Uses and implementation principle of volatile

First, consider what volatile does:

  1. It forbids instruction reordering

  2. It guarantees memory visibility

    Let's look at an example of visibility first

public class VolatileExample {
    // Visibility flag; add volatile to make the update visible to the main thread
    /*volatile*/ static boolean flag = false;

    public static void main(String[] args) {
        new Thread(() -> {
            try {
                // Suspend execution for 0.5s
                Thread.sleep(500);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            flag = true;
            System.out.println("Flag changed to true");
        }).start();
        // Keep polling until flag becomes true
        while (true) {
            if (flag) {
                System.out.println("Flag changed to true detected");
                break;
            }
        }
    }
}

Without volatile, flag is set to true in the child thread but the change may never become visible to the main thread, so the loop never exits; with volatile, "Flag changed to true detected" is printed. Now let's look at an example of instruction reordering.

public class VolatileExample1 {
    // Variables for demonstrating instruction reordering
    private static int a = 0, b = 0;
    private static int x = 0, y = 0;

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            Thread t1 = new Thread(() -> {
                // There is a possibility of instruction reordering, first x=b then a=1
                a = 1;
                x = b;
            });
            Thread t2 = new Thread(() -> {
                // There is a possibility of instruction reordering, y=a then b=1
                b = 1;
                y = a;
            });
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            System.out.println("The first" + i + "Time, x =" + x + " | y=" + y);
            if (x == 0 && y == 0) {
                // An instruction reorder occurred
                break;
            }
            // Initialize variables
            a = 0;
            b = 0;
            x = 0;
            y = 0;
        }
    }
}

If the loop breaks with x == 0 and y == 0, it means x = b ran before a = 1 in one thread and y = a ran before b = 1 in the other, i.e. instruction reordering occurred. Now let's look at another application of forbidding instruction reordering. To guarantee a singleton in a multi-threaded environment, we usually use the double-check mechanism, implemented as follows:

public class LazyDoubleCheckSingleton {
    private volatile static LazyDoubleCheckSingleton lazyDoubleCheckSingleton = null;

    private LazyDoubleCheckSingleton() {
    }

    public static LazyDoubleCheckSingleton getInstance() {
        if (lazyDoubleCheckSingleton == null) {
            synchronized (LazyDoubleCheckSingleton.class) {
                if (lazyDoubleCheckSingleton == null) {
                    lazyDoubleCheckSingleton = new LazyDoubleCheckSingleton();
                }
            }
        }
        return lazyDoubleCheckSingleton;
    }
}

The easiest way to make the singleton thread-safe is to mark the whole getInstance method synchronized, but that locks on every call and performs poorly. Instead, getInstance first checks whether lazyDoubleCheckSingleton is null; only if it is do we take the lock, and inside the lock we check for null again before creating the object. These two checks are what "double check" refers to. Why does the field have to be volatile? What happens without it? To understand that, let's look at how an object is created.

Take lazyDoubleCheckSingleton = new LazyDoubleCheckSingleton() as an example. Creating the object consists of three steps:

  1. Allocate memory to this object

  2. Initialize an object

  3. Set lazyDoubleCheckSingleton to point to the newly allocated memory address

    If lazyDoubleCheckSingleton is not volatile, the JIT or CPU may reorder steps 2 and 3 into a 1-3-2 order. After steps 1 and 3, lazyDoubleCheckSingleton is already non-null, so another thread's first check, if (lazyDoubleCheckSingleton == null), lets it skip the lock and return an object whose fields have not been initialized yet. Declaring the field volatile forbids that reordering and guarantees the 1-2-3 order.

Across multiple threads, the problematic 1-3-2 execution plays out roughly as follows (in place of the original multithreaded.png diagram).
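A rough textual reconstruction of that diagram; the step numbers follow the list above, and the interleaving shown is only one possible schedule:

Thread A: 1. allocate memory for the LazyDoubleCheckSingleton object
Thread A: 3. lazyDoubleCheckSingleton = <address of that memory>     (reordered ahead of step 2)
Thread B:    first null check sees a non-null reference and returns it without taking the lock
Thread B:    uses an object whose fields have not been initialized yet
Thread A: 2. run the constructor and initialize the fields           (too late for thread B)

With volatile on the field, step 3 cannot move ahead of step 2, so thread B either sees null (and goes through the synchronized block) or sees a fully initialized object.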