CAS

Compare And Swap (also called Compare And Exchange); related terms: spin, spin lock, lock-free

CAS usually refers to a whole class of operations, because the single compare-and-swap step is typically wrapped in a retry loop that repeats until it succeeds.

cas(V, E, N): V is the variable to update, E is the expected (old) value, N is the new value. The swap happens only if V still equals E.

ABA problem: the value changed from A to B and back to A while you weren't looking, so a plain CAS cannot tell anything happened (the classic analogy: your girlfriend dated someone else while you were apart and then came back). "Spin" is you idling in a loop, waiting until she takes you back.

Solution: add a version number (AtomicStampedReference). For a simple primitive value, a version number is usually not needed.
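A minimal sketch of the ABA problem and the AtomicStampedReference fix (the class name AbaDemo is just for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // Plain CAS cannot detect A -> B -> A
        AtomicInteger plain = new AtomicInteger(100);
        plain.compareAndSet(100, 101);   // A -> B
        plain.compareAndSet(101, 100);   // B -> A
        // This CAS still succeeds, even though the value changed twice in between
        System.out.println(plain.compareAndSet(100, 102)); // true

        // AtomicStampedReference pairs the value with a version stamp
        AtomicStampedReference<Integer> stamped =
                new AtomicStampedReference<>(100, 0);
        int stamp = stamped.getStamp();                        // remember version 0
        stamped.compareAndSet(100, 101, stamp, stamp + 1);     // A -> B, version 1
        stamped.compareAndSet(101, 100, stamp + 1, stamp + 2); // B -> A, version 2
        // Fails: the value is 100 again, but the stamp moved from 0 to 2
        System.out.println(stamped.compareAndSet(100, 102, stamp, stamp + 1)); // false
    }
}
```

Note that AtomicStampedReference compares references, so this demo relies on small Integer values (within the boxing cache) to keep the example short.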

Unsafe

AtomicInteger:

public final int incrementAndGet() {
    for (;;) {
        int current = get();
        int next = current + 1;
        if (compareAndSet(current, next))
            return next;
    }
}

public final boolean compareAndSet(int expect, int update) {
    return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}
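The same retry pattern can be exercised from user code. A sketch (names are illustrative) incrementing a shared AtomicInteger from several threads, with no lock taken:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasLoopDemo {
    private static final AtomicInteger counter = new AtomicInteger(0);

    // Same retry pattern as incrementAndGet(): loop until the CAS succeeds
    static int casIncrement() {
        for (;;) {
            int current = counter.get();
            int next = current + 1;
            if (counter.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) casIncrement();
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter.get()); // 40000 -- no updates lost
    }
}
```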

Unsafe:

public final native boolean compareAndSwapInt(Object var1, long var2, int var4, int var5);

To use:

package com.mashibing.jol;

import sun.misc.Unsafe;

import java.lang.reflect.Field;

public class T02_TestUnsafe {

    int i = 0;
    private static T02_TestUnsafe t = new T02_TestUnsafe();

    public static void main(String[] args) throws Exception {
        //Unsafe unsafe = Unsafe.getUnsafe();

        Field unsafeField = Unsafe.class.getDeclaredFields()[0];
        unsafeField.setAccessible(true);
        Unsafe unsafe = (Unsafe) unsafeField.get(null);

        Field f = T02_TestUnsafe.class.getDeclaredField("i");
        long offset = unsafe.objectFieldOffset(f);
        System.out.println(offset);

        boolean success = unsafe.compareAndSwapInt(t, offset, 0, 1);
        System.out.println(success);
        System.out.println(t.i);
        //unsafe.compareAndSwapInt()
    }
}
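Reflecting into Unsafe works, but since JDK 9 the supported way to do the same field-level CAS is VarHandle. A sketch under that assumption (the class name VarHandleCas is illustrative):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VarHandleCas {
    int i = 0;

    private static final VarHandle I;
    static {
        try {
            // A VarHandle for field i -- no Unsafe, no raw field offsets
            I = MethodHandles.lookup().findVarHandle(VarHandleCas.class, "i", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        VarHandleCas t = new VarHandleCas();
        boolean success = I.compareAndSet(t, 0, 1); // CAS field i: expect 0, set 1
        System.out.println(success); // true
        System.out.println(t.i);     // 1
    }
}
```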

jdk8u: unsafe.cpp:

cmpxchg = compare and exchange

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))
  UnsafeWrapper("Unsafe_CompareAndSwapInt");
  oop p = JNIHandles::resolve(obj);
  jint* addr = (jint *) index_oop_from_field_offset_long(p, offset);
  return (jint)(Atomic::cmpxchg(x, addr, e)) == e;
UNSAFE_END

jdk8u: atomic_linux_x86.inline.hpp

is_MP = Multi Processor

inline jint Atomic::cmpxchg(jint exchange_value, volatile jint* dest, jint compare_value) {
  int mp = os::is_MP();
  __asm__ volatile (LOCK_IF_MP(%4) "cmpxchgl %1,(%3)"
                    : "=a" (exchange_value)
                    : "r" (exchange_value), "a" (compare_value), "r" (dest), "r" (mp)
                    : "cc", "memory");
  return exchange_value;
}

jdk8u: os.hpp is_MP()

  static inline bool is_MP() {
    // During bootstrap if _processor_count is not yet initialized
    // we claim to be MP as that is safest. If any platform has a
    // stub generator that might be triggered in this phase and for
    // which being declared MP when in fact not, is a problem - then
    // the bootstrap routine for the stub generator needs to check
    // the processor count directly and leave the bootstrap routine
    // in place until called after initialization has occurred.
    return (_processor_count != 1) || AssumeMP;
  }

jdk8u: atomic_linux_x86.inline.hpp

#define LOCK_IF_MP(mp) "cmp $0, " #mp "; je 1f; lock; 1:"

Final implementation:

cmpxchg = the CAS instruction that modifies the variable's value

lock cmpxchg — the lock prefix is what makes cmpxchg atomic on multiprocessor machines

Hardware:

While the lock-prefixed instruction executes, the CPU asserts a lock signal; on modern processors this typically locks only the cache line holding the operand (it does not lock the whole bus, as older CPUs did via the northbridge).

markword

Tool: JOL = Java Object Layout

<dependencies>
        <!-- https://mvnrepository.com/artifact/org.openjdk.jol/jol-core -->
        <dependency>
            <groupId>org.openjdk.jol</groupId>
            <artifactId>jol-core</artifactId>
            <version>0.9</version>
        </dependency>
    </dependencies>

jdk8u: markOop.hpp

// Bit-format of an object header (most significant first, big endian layout below):
//
// 32 bits:
// --------
// hash:25 ------------>| age:4 biased_lock:1 lock:2 (normal object)
// JavaThread*:23 epoch:2 age:4 biased_lock:1 lock:2 (biased object)
// size:32 ------------------------------------------>| (CMS free block)
// PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
//
// 64 bits:
// --------
// unused:25 hash:31 -->| unused:1 age:4 biased_lock:1 lock:2 (normal object)
// JavaThread*:54 epoch:2 unused:1 age:4 biased_lock:1 lock:2 (biased object)
// PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
// size:64 ----------------------------------------------------->| (CMS free block)
//
// unused:25 hash:31 -->| cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && normal object)
// JavaThread*:54 epoch:2 cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && biased object)
// narrowOop:32 unused:24 cms_free:1 unused:4 promo_bits:3 ----->| (COOPs && CMS promoted object)
// unused:21 size:35 -->| cms_free:1 unused:7 ------------------>| (COOPs && CMS free block)

A detailed look at synchronized

  1. Principle of synchronized
  2. The upgrade process
  3. Assembly implementation
  4. synchronized vs ReentrantLock

Java Source hierarchy

synchronized(o)

Bytecode hierarchy

monitorenter / monitorexit
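You can see these bytecodes yourself: compile a class like the sketch below (the name T03_Monitor is illustrative) and run `javap -c T03_Monitor`; the synchronized block compiles to a monitorenter and two monitorexit instructions (one on the exception path):

```java
public class T03_Monitor {
    private final Object o = new Object();

    public void m() {
        synchronized (o) {   // javap -c shows: monitorenter
            System.out.println("in critical section");
        }                    // ...and monitorexit (plus one on the exception path)
    }

    public static void main(String[] args) {
        new T03_Monitor().m();
    }
}
```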

JVM Hierarchy (Hotspot)

package com.mashibing.insidesync;

import org.openjdk.jol.info.ClassLayout;

public class T01_Sync1 {

    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
}
com.mashibing.insidesync.T01_Sync1$Lock object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4   (object header)  05 00 00 00 (00000101 00000000 00000000 00000000) (5)
      4     4   (object header)  00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4   (object header)  49 ce 00 20 (01001001 11001110 00000000 00100000) (536923721)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
com.mashibing.insidesync.T02_Sync2$Lock object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4   (object header)  05 90 2e 1e (00000101 10010000 00101110 00011110) (506368005)
      4     4   (object header)  1b 02 00 00 (00011011 00000010 00000000 00000000) (539)
      8     4   (object header)  49 ce 00 20 (01001001 11001110 00000000 00100000) (536923721)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
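Decoding the first header byte of the dumps above (0x05 = 00000101): the lowest two bits are the lock bits and the third bit is the biased_lock bit, so 101 means "biasable / anonymously biased". A tiny sketch:

```java
public class LockBitsDemo {
    public static void main(String[] args) {
        int headerByte = 0x05;                  // 00000101, from the JOL dump above
        int lockBits   = headerByte & 0b11;     // lowest 2 bits -> 01
        int biasedBit  = (headerByte >> 2) & 1; // biased_lock bit -> 1
        System.out.println("lock bits  = " + lockBits);  // 1 (binary 01)
        System.out.println("biased bit = " + biasedBit); // 1 -> biasable
    }
}
```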

InterpreterRuntime::monitorenter method

IRT_ENTRY_NO_ASYNC(void, InterpreterRuntime::monitorenter(JavaThread* thread, BasicObjectLock* elem))
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
  if (PrintBiasedLockingStatistics) {
    Atomic::inc(BiasedLocking::slow_path_entry_count_addr());
  }
  Handle h_obj(thread, elem->obj());
  assert(Universe::heap()->is_in_reserved_or_null(h_obj()),
         "must be NULL or an object");
  if (UseBiasedLocking) {
    // Retry fast entry if bias is revoked to avoid unnecessary inflation
    ObjectSynchronizer::fast_enter(h_obj, elem->lock(), true, CHECK);
  } else {
    ObjectSynchronizer::slow_enter(h_obj, elem->lock(), CHECK);
  }
  assert(Universe::heap()->is_in_reserved_or_null(elem->obj()),
         "must be NULL or an object");
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
IRT_END

synchronizer.cpp

revoke_and_rebias

void ObjectSynchronizer::fast_enter(Handle obj, BasicLock* lock, bool attempt_rebias, TRAPS) {
 if (UseBiasedLocking) {
    if (!SafepointSynchronize::is_at_safepoint()) {
      BiasedLocking::Condition cond = BiasedLocking::revoke_and_rebias(obj, attempt_rebias, THREAD);
      if (cond == BiasedLocking::BIAS_REVOKED_AND_REBIASED) {
        return;
      }
    } else {
      assert(!attempt_rebias, "can not rebias toward VM thread");
      BiasedLocking::revoke_at_safepoint(obj);
    }
    assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
 }

 slow_enter(obj, lock, THREAD);
}
void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {
  markOop mark = obj->mark();
  assert(!mark->has_bias_pattern(), "should not see bias pattern here");

  if (mark->is_neutral()) {
    // Anticipate successful CAS -- the ST of the displaced mark must
    // be visible <= the ST performed by the CAS.
    lock->set_displaced_header(mark);
    if (mark == (markOop) Atomic::cmpxchg_ptr(lock, obj()->mark_addr(), mark)) {
      TEVENT(slow_enter: release stacklock);
      return;
    }
    // Fall through to inflate() ...
  } else
  if (mark->has_locker() && THREAD->is_lock_owned((address)mark->locker())) {
    assert(lock != mark->locker(), "must not re-lock the same lock");
    assert(lock != (BasicLock*)obj->mark(), "don't relock with same BasicLock");
    lock->set_displaced_header(NULL);
    return;
  }

#if 0
  // The following optimization isn't particularly useful.
  if (mark->has_monitor() && mark->monitor()->is_entered(THREAD)) {
    lock->set_displaced_header(NULL);
    return;
  }
#endif

  // The object header will never be displaced to this lock,
  // so it does not matter what the value is, except that it
  // must be non-zero to avoid looking like a re-entrant lock,
  // and must not look locked either.
  lock->set_displaced_header(markOopDesc::unused_mark());
  ObjectSynchronizer::inflate(THREAD, obj())->enter(THREAD);
}

The inflate method: expands to a heavyweight lock

Lock upgrade process

JDK8 markWord implementation table

![JDK8 markWord layout](./markword.png)

No lock – Biased lock – Lightweight lock (spin lock, adaptive spin) – Heavyweight lock

Synchronized optimization is closely related to MarkWord

The lowest three bits of the markword encode the lock state: one biased-lock bit plus two lock-state bits.

  1. Object o = new Object() — header ends in 0 01 (biased_lock = 0, lock = 01): unlocked state

  2. o.hashCode()

    After hashCode() is called, the header becomes 001 + hashcode:

    00000001 10101101 00110100 00110110
    01011001 00000000 00000000 00000000

    The bytes are stored little-endian; read back in big-endian order the word is:

    00000000 00000000 00000000 01011001 00110110 00110100 10101101 00000000

  3. By default, synchronized(o) goes straight to a lightweight lock (00) rather than a biased lock. The reason is the biased-locking startup delay: the JVM starts several of its own threads at startup, and those contain synchronized code that is known to be contended. If biased locks were used there, they would be constantly revoked and upgraded, which is inefficient. To enable biased locking immediately:

    -XX:BiasedLockingStartupDelay=0
  4. new Object() -> 101 biased lock, thread ID = 0 -> anonymously biased. With biased locking in effect, a freshly allocated object carries the bias pattern 101 but no thread ID yet: it is "anonymously biased" by default.

  5. When a thread synchronizes on the object, the markword's thread ID field is set to that thread's ID; the lock is now biased toward that thread and cannot be re-biased while held. (Bulk rebias and bulk revoke handle the class-wide cases.)

  6. If another thread contends, the biased lock is revoked and upgraded to a lightweight lock: each contending thread creates a LockRecord (LR) in its own thread stack and uses CAS to make the markword point to its own LR; the thread whose CAS succeeds owns the lock.

  7. If contention keeps growing (historically: a thread spins more than 10 times, tunable via -XX:PreBlockSpin, or the number of spinning threads exceeds half the CPU cores; since 1.6 the JVM decides via adaptive self-spinning), the lock is upgraded to a heavyweight lock. The JVM asks the operating system for a mutex (e.g. a Linux mutex), a system call crossing from ring 3 to ring 0; the thread is suspended, placed in a wait queue, waits to be scheduled by the OS, and is then mapped back into user space.

(The experiments above were run on JDK 11 with biased locking enabled; on JDK 8 a fresh object header is unlocked by default.)
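The byte reversal in step 2 above is just little-endian storage: JOL prints header bytes in memory order, so on x86 the hash appears byte-reversed. A quick illustration of reading the same bytes in both orders (the byte values are arbitrary):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        // The same 4 bytes read as big-endian vs little-endian integers
        byte[] bytes = {0x01, 0x02, 0x03, 0x04};
        int big    = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt();
        int little = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
        System.out.printf("big-endian:    0x%08X%n", big);    // 0x01020304
        System.out.printf("little-endian: 0x%08X%n", little); // 0x04030201
    }
}
```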

Biased locking is enabled by default, but with a startup delay; set -XX:BiasedLockingStartupDelay=0 if you want to observe biased locks.

Think of a toilet stall: locking means locking the door — that is, the lock lives on the object itself, not on the code or the thread.

Lock the upgrade process

Earlier versions of the JDK used OS mutexes directly, switching from user mode to kernel mode for every lock, which was inefficient.

The modern version is optimized

No lock – Bias lock – Lightweight lock (spin lock) – Heavyweight lock

Biased lock – the markword records the current thread pointer; the next time the same thread locks, there is no contention to resolve, only a check that the thread pointer matches. Hence "biased": the lock is biased toward the first thread that acquires it. The hashCode is backed up in the thread stack; when the owning thread is destroyed, the lock is demoted back to the lock-free state.

Contention – the lock is upgraded to a lightweight lock: each thread has its own LockRecord (LR) in its own thread stack and uses CAS to try to point the markword at its LR; whichever thread's CAS succeeds owns the lock.

More than ~10 spins (tunable with -XX:PreBlockSpin) – upgrade to a heavyweight lock: if spinning burns too much CPU, the lock is inflated and waiting threads enter a wait queue, which consumes no CPU.

Spin locking was introduced in JDK 1.4.2, enabled with -XX:+UseSpinning. It became the default in JDK 6, which also introduced adaptive spinning.

Adaptive spin locking means that the spin time (number) is no longer fixed, but is determined by the previous spin time on the same lock and the state of the lock owner. If the spin wait has just successfully acquired the lock on the same lock object, and the thread holding the lock is running, the virtual machine will assume that the spin wait is likely to succeed again, and it will allow the spin wait to last a relatively long time. If spin is rarely successfully acquired for a lock, it is possible to omit the spin process and block the thread directly in future attempts to acquire the lock, avoiding wasting processor resources.

Because biased-lock revocation (REVOKE) consumes system resources, biased locking can actually hurt performance when lock contention is intense; in that case, disable it and use lightweight locks directly.

Synchronized low-level implementation


public class T {
    static volatile int i = 0;

    public static void n() { i++; }

    public static synchronized void m() {}

    public static void main(String[] args) {
        for (int j = 0; j < 1_000_000; j++) {
            m();
            n();
        }
    }
}

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly T

C1 Compile Level 1

C2 Compile Level 2

Find the assembly emitted for the m() and n() methods; you can see the lock cmpxchg... instruction.

synchronized vs Lock (CAS)

synchronized is more efficient when contention is heavy and critical sections are long; CAS is more efficient under low contention with short critical sections. Once synchronized inflates to a heavyweight lock, waiting threads sit in a queue and consume no CPU, whereas CAS consumes CPU while spinning.
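A rough sketch comparing the two under contention (not a JMH benchmark, so treat the timings as indicative only; class and field names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SyncVsCas {
    static int syncCounter = 0;
    static final Object lock = new Object();
    static final AtomicInteger casCounter = new AtomicInteger();

    // Run the body on 8 threads and return the elapsed time in nanoseconds
    static long time(Runnable body) throws InterruptedException {
        long start = System.nanoTime();
        Thread[] ts = new Thread[8];
        for (int i = 0; i < ts.length; i++) { ts[i] = new Thread(body); ts[i].start(); }
        for (Thread t : ts) t.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        long syncNs = time(() -> {
            for (int i = 0; i < 100_000; i++) { synchronized (lock) { syncCounter++; } }
        });
        long casNs = time(() -> {
            for (int i = 0; i < 100_000; i++) casCounter.incrementAndGet();
        });
        // Both counters end at 800_000 (8 threads x 100_000); only the timings differ
        System.out.println("synchronized: " + syncNs + " ns, total = " + syncCounter);
        System.out.println("CAS:          " + casNs  + " ns, total = " + casCounter.get());
    }
}
```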

Lock elimination

public void add(String str1,String str2){
         StringBuffer sb = new StringBuffer();
         sb.append(str1).append(str2);
}

We all know that StringBuffer is thread-safe because its key methods are modified by synchronized, but if we look at the code above, we can see that the reference to sb is only used in the add method and cannot be referenced by other threads (because it is local and the stack is private). So sb is an impossible resource to share, and the JVM automatically removes the lock inside the StringBuffer object.

Lock coarsening

public String test(String str) {
    int i = 0;
    StringBuffer sb = new StringBuffer();
    while (i < 100) {
        sb.append(str);
        i++;
    }
    return sb.toString();
}

When the JVM detects that a sequence of operations keeps locking the same object (appending 100 times in the while loop would otherwise mean 100 lock/unlock pairs), it coarsens the lock's scope to cover the whole sequence (e.g. hoisting it outside the while loop), so the sequence needs only one lock acquisition.
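Conceptually the coarsened version looks like the sketch below (illustration only: the JIT performs this on the StringBuffer's internal monitor, there is no source-level transformation, and the class name is hypothetical):

```java
public class CoarsenSketch {
    // One lock around the whole loop instead of 100 lock/unlock pairs
    public String test(String str) {
        StringBuffer sb = new StringBuffer();
        synchronized (sb) {
            for (int i = 0; i < 100; i++) {
                sb.append(str); // append() re-enters the same monitor cheaply
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(new CoarsenSketch().test("ab").length()); // 200
    }
}
```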

Lock degradation (not important)

www.zhihu.com/question/63…

In practice, lock degradation only happens when the monitor is accessed solely by VM threads, so it is essentially meaningless; it is safe to assume lock degradation doesn't happen.

hyper-threading

1 ALU + two sets of registers + PC (one physical core presents two hardware threads)

References

Openjdk.java.net/groups/hots…