Introduction

Prior to JDK 1.6, synchronized was considered a heavyweight lock. Since then the JVM has made many improvements to it: depending on the state of contention, a thread acquiring a lock with synchronized may use a biased, lightweight, or heavyweight lock.

Along the way, the JVM also applies techniques such as lock coarsening, lock elimination, spin locking, and adaptive spinning, all of which are explained later in this article.

Note that this article is about synchronized; the Lock interface is a separate mechanism.

What is a heavyweight lock

To understand why the JVM optimizes it, we first need to understand what a heavyweight lock is and why it needs optimizing. Let's look at a piece of code:

public synchronized void f() {
    System.out.println("hello world");
}

Disassembling it with javap gives:

public synchronized void f();
    descriptor: ()V
    flags: ACC_PUBLIC, ACC_SYNCHRONIZED
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2 // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #3 // String hello world
         5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 3: 0
        line 4: 8


When a thread invokes this method, it first checks whether the ACC_SYNCHRONIZED flag is set and then acquires the corresponding monitor lock.

The monitor lock is released when the method ends or an unhandled exception is thrown in the middle.
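Incidentally, a synchronized method relies on the ACC_SYNCHRONIZED flag alone, while a synchronized block compiles to explicit monitorenter/monitorexit instructions. A minimal sketch you can disassemble with javap -c (the class and field names here are our own):

```java
public class SyncBlock {
    private final Object lock = new Object();
    private int count;

    // javap -c shows a monitorenter before the body and a monitorexit
    // after it, plus a second monitorexit on the exception-handler path,
    // so the monitor is released even if the body throws.
    public int f() {
        synchronized (lock) {
            return ++count;
        }
    }

    public static void main(String[] args) {
        System.out.println(new SyncBlock().f());
    }
}
```

Either way, the same monitor acquire/release semantics apply.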

In HotSpot this is implemented with ObjectMonitor, which provides operations to acquire a lock, release it, wait for a lock to be released and then compete for it, be woken from a wait, and so on. Let's look at how it does this.

Every object is associated with a monitor, a synchronization mechanism through which we can implement mutually exclusive access between threads. First, let's list the key fields of ObjectMonitor that we need to discuss:

  • _owner: the thread that currently holds the ObjectMonitor
  • _EntryList: the blocked queue (threads blocked while competing for the lock)
  • _WaitSet: threads that have called wait() and must wait to be woken (by notify/signal, interrupt, timeout, etc.)
  • _cxq: a thread that fails to acquire the lock is first placed on the _cxq queue
  • _recursions: the reentrancy count (synchronized is a reentrant lock)
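As a mental model only (HotSpot's ObjectMonitor is native C++ code, and contended entry parks threads on _cxq/_EntryList rather than spinning), the _owner/_recursions bookkeeping can be sketched in Java like this:

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of ObjectMonitor's _owner and _recursions fields.
// HotSpot's real monitor also maintains the _cxq/_EntryList/_WaitSet
// queues and parks losing threads instead of spinning forever.
class ToyMonitor {
    private final AtomicReference<Thread> owner = new AtomicReference<>(); // _owner
    private int recursions;                                                // _recursions

    void enter() {
        Thread current = Thread.currentThread();
        if (owner.get() == current) {    // reentrant acquire by the owner
            recursions++;
            return;
        }
        while (!owner.compareAndSet(null, current)) {
            Thread.onSpinWait();         // real monitor would enqueue on _cxq and park
        }
    }

    void exit() {
        if (owner.get() != Thread.currentThread())
            throw new IllegalMonitorStateException();
        if (recursions > 0) {
            recursions--;                // unwind one level of reentrancy
            return;
        }
        owner.set(null);                 // real monitor would wake a thread from _EntryList
    }
}
```

Nested enter() calls by the owning thread just bump the recursion count, which is why synchronized is reentrant.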

The full flow, from a thread competing for the lock through to the end of the method waking a queued thread to compete again, is shown in the figures below.

When the lock is acquired

When the lock is released

Before JDK 1.6, synchronized acquired the lock by calling ObjectMonitor's enter method directly (figure 1) and released it with ObjectMonitor's exit method (figure 2). This is what is called a heavyweight lock; you can see the operational complexity involved.

So think about it:

If only one thread ever accesses the code, then even though it touches shared variables, they are never accessed by multiple threads at once, so there is no thread-safety problem and no need to go through the heavyweight locking process. Thread-safe operations are only needed once a race actually occurs.

This insight leads to biased locks and lightweight locks.

Spin locks

Spin locking has been enabled by default since JDK 1.6. The suspend and wake operations of a heavyweight lock require switching from user mode to kernel mode, and under heavy concurrency those calls put a lot of pressure on the system, so spin locks were introduced to avoid frequent suspend and resume operations.

A spin lock works like this: suppose thread A holds the lock and is still executing. When thread B tries to acquire the lock, instead of giving up its CPU time slice immediately, it busy-loops (a finite number of times), repeatedly trying for the lock. If thread A finishes quickly, thread B obtains the lock almost immediately and continues. This avoids the overhead of suspending and resuming threads and further improves response time.

The default number of spins is 10, which can be changed with -XX:PreBlockSpin.
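The spin-then-give-up idea can be sketched with a CAS loop. The class below is our own illustration (the JVM's actual spinning happens inside the runtime, not in Java code), using 10 attempts to mirror the default spin count:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative spin lock: try a bounded number of CAS attempts before
// yielding the CPU instead of blocking outright.
class SpinLock {
    private static final int SPIN_LIMIT = 10;
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    void lock() {
        Thread current = Thread.currentThread();
        while (true) {
            for (int i = 0; i < SPIN_LIMIT; i++) {
                if (owner.compareAndSet(null, current)) return; // acquired while spinning
                Thread.onSpinWait();
            }
            Thread.yield(); // spin budget exhausted; give up the time slice
        }
    }

    void unlock() {
        owner.compareAndSet(Thread.currentThread(), null);
    }
}
```

If the critical section is short, the CAS usually succeeds within the spin budget and no thread is ever descheduled.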

Adaptive spin

It is similar to a plain spin lock, except that the spin time and count are no longer fixed. For example, if the last spin on a given lock object succeeded in acquiring the lock, the JVM assumes the next spin is also likely to succeed and allows it to run longer. If spins on a lock object have rarely succeeded, the JVM may skip the spin phase entirely.

Plain and adaptive spinning share the same trade-off: although spin-waiting avoids the overhead of suspending the thread, it does not give up CPU time. If the lock is held for a long time, large amounts of spinning simply waste CPU resources, so spinning cannot replace blocking outright; it has its own applicable scenarios.

Biased locking

A biased lock favors the first thread that acquires it. If no other thread ever touches the lock afterwards, the locking work can be skipped entirely.

If another thread is later found acquiring the lock, the JVM decides, based on the state of the thread the lock is currently biased to, whether to re-bias the lock to the new thread or to revoke the bias and upgrade to a lightweight lock.

The Mark Word lock identifier is as follows

thread ID | epoch | bias flag | lock flag
(empty)   | epoch | 1         | 01 (unlocked)

When thread A (thread ID 100) tries to acquire the lock, it sees that the lock flag is 01 and the bias flag is 1 (biasable), so it uses CAS to record its thread ID in the object header's Mark Word:

thread ID | epoch | bias flag | lock flag
100       | epoch | 1         | 01 (unlocked)

When the method is executed again later, the thread only needs to check whether the thread ID in the object header's Mark Word is its own; if so, it proceeds directly.
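The CAS step can be illustrated with a toy Mark Word. The bit layout below is simplified for illustration only (HotSpot's real encoding packs the thread ID, epoch, age, bias bit, and lock bits differently):

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy Mark Word: [ thread ID | bias bit = 1 | lock bits = 01 ].
// Bit widths and layout here are illustrative, not HotSpot's.
class BiasedLockWord {
    private static final long BIASABLE_UNLOCKED = 0b1_01;  // no owner, biasable, unlocked
    private final AtomicLong markWord = new AtomicLong(BIASABLE_UNLOCKED);

    private static long biasedTo(long threadId) {
        return (threadId << 3) | 0b1_01;                   // owner id above the flag bits
    }

    boolean tryBias(long threadId) {
        if (markWord.get() == biasedTo(threadId)) {
            return true;                                   // fast path: already biased to us
        }
        // CAS our thread ID into a still-unbiased word; this fails if another
        // thread got there first, which is what triggers revocation in the JVM
        return markWord.compareAndSet(BIASABLE_UNLOCKED, biasedTo(threadId));
    }
}
```

The fast path is just an equality check, which is why an uncontended biased lock is so cheap.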

If another thread B (thread ID 101) now tries to acquire the lock, it also checks the lock flag and the biasable state, then tries to CAS the Mark Word's thread ID to itself and fails, because the thread ID already points to thread A. The biased lock is then revoked: the thread holding the bias is suspended at a global safepoint (a point where no bytecode is executing), and thread A's state is examined.

In the first case, thread A has terminated. The Mark Word's thread ID is set to null, then CAS re-biases it to thread B, returning to the biased state above with a new owner:

thread ID | epoch | bias flag | lock flag
101       | epoch | 1         | 01 (unlocked)

In the second case, thread A is still active. The biased lock is upgraded to a lightweight lock, thread A is resumed to complete its remaining work, and thread B spins to acquire the lightweight lock.

thread ID | bias flag | lock flag
(empty)   | 0         | 00 (lightweight lock)

Biased locking suits the case where, from start to finish, only a single thread runs the code. It omits the overhead of spinning, lightweight CAS, and heavyweight mutexes, making it the cheapest lock, with performance close to the unlocked state. But if threads do compete, the thread holding the bias must be repeatedly suspended and examined to decide whether to re-bias or upgrade to a lightweight lock, which costs performance. If you know in advance that contention is likely, you can disable biased locking with -XX:-UseBiasedLocking.

Some readers may ask: shouldn't contention immediately upgrade the lock to a heavyweight lock? Not necessarily; this will become clear in the discussion of lightweight locks below.

Lightweight lock

If contention between threads is absent or rare, and the code inside the lock executes very quickly, the lightweight lock is a good fit. Where biased locking does away with synchronization almost entirely (no CAS, no spinning, just a check that the thread ID in the Mark Word is one's own, plus a few checks at other points that can be neglected), a lightweight lock uses CAS and spinning to acquire the lock, avoiding the cost of the operating-system mutex that a heavyweight lock relies on.

Lightweight locks are implemented as follows

The JVM creates a space for the lock record in the current thread's stack frame, then copies the object header's Mark Word (officially called the Displaced Mark Word) into that lock record. The thread then tries to replace the object header's Mark Word with a pointer to the lock record using CAS.
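Under stated assumptions (the object header modeled as an atomic reference, with plain strings standing in for real Mark Word bits; names are ours, not HotSpot's), the lock-record dance can be sketched like this:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of lightweight locking: copy the header's Mark Word into a
// lock record (the Displaced Mark Word), then CAS the header to point
// at that record.
class LightweightLock {
    static final class LockRecord {          // lives in the locking thread's stack frame
        Object displacedMarkWord;            // saved copy of the original header
        final Thread owner = Thread.currentThread();
    }

    private final AtomicReference<Object> header =
            new AtomicReference<>("unlocked-mark-word");

    LockRecord tryLock() {
        Object mark = header.get();
        if (mark instanceof LockRecord) {
            return null;                     // already lightweight-locked by another record
        }
        LockRecord record = new LockRecord();
        record.displacedMarkWord = mark;     // step 1: copy the Mark Word
        // step 2: CAS the header to a pointer to our lock record
        return header.compareAndSet(mark, record) ? record : null;
    }

    boolean unlock(LockRecord record) {
        // CAS the Displaced Mark Word back; failure would mean the lock
        // was inflated to a heavyweight lock while we held it
        return header.compareAndSet(record, record.displacedMarkWord);
    }
}
```

A failed tryLock() is where the real JVM would start spinning, as the walkthrough below shows.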

Assuming thread B's CAS succeeds, it has acquired the lock and continues executing, with the Mark Word as follows:

pointer into thread stack                | lock flag
stack pointer 1 (thread B's lock record) | 00 (lightweight lock)

When thread C tries to acquire the lock, its CAS on the object header fails because the header is already owned by thread B, so it spins. Thread B finishes and releases the lock, and thread C's spin then succeeds in acquiring it:

pointer into thread stack                | lock flag
stack pointer 2 (thread C's lock record) | 00 (lightweight lock)

Thread D then spins for the lock and finds it occupied by thread C. After the default 10 spins it still cannot acquire the lock (thread C has not released it yet), so thread D changes the Mark Word to a heavyweight lock:

pointer into thread stack                | lock flag
stack pointer 2 (thread C's lock record) | 10 (heavyweight lock)

When thread C finishes and tries to CAS the Displaced Mark Word in its stack frame back into the object header, it fails and discovers that other threads have been competing for the lock (thread D changed the lock state). It then releases the lock and wakes the waiting threads, and all subsequent operations use the heavyweight lock:

pointer into thread stack | lock flag
(empty)                   | 10 (heavyweight lock)

It is important to note that locks are not degraded once they are upgraded

Lock elimination

Lock elimination is mainly a JIT compiler optimization. The JIT compiles hot code to machine code, so that on each execution the bytecode no longer needs to be interpreted instruction by instruction, which improves efficiency; and based on escape analysis it applies further optimizations such as lock elimination and stack allocation.

public void f() {
    Object obj = new Object();
    synchronized (obj) {
        System.out.println(obj);
    }
}

The JIT compiler determines that obj in f() can never be accessed by more than one thread and removes the synchronization:

public void f() {
    Object obj = new Object();
    System.out.println(obj);
}

Lock coarsening

If you repeatedly lock and unlock the same object in a piece of code, it is relatively expensive. In this case, you can appropriately broaden the scope of locking to reduce performance consumption.

When the JIT finds a series of consecutive operations that repeatedly lock and unlock the same object, even when the locking happens inside a loop body, it extends the synchronized region to cover the entire sequence of operations.

for (int i = 0; i < 10000; i++) {
    synchronized (this) {
        doWork();
    }
}

Coarsened code

synchronized (this) {
    for (int i = 0; i < 10000; i++) {
        doWork();
    }
}

References:

  • The Art of Java Concurrency Programming
  • Understanding the Java Virtual Machine in Depth
  • Java Concurrent Programming in Depth