The Synchronized keyword

Underlying implementation principles of heavyweight Synchronized

The implementation of synchronized at each level:

| Level | Implementation |
| --- | --- |
| Java language | the synchronized keyword |
| JVM bytecode | monitorenter / monitorexit instructions, the ACC_SYNCHRONIZED flag |
| Operating system | mutex |
| CPU hardware | bus lock (LOCK# signal) |

Synchronized blocks guarantee that only one thread at a time can enter the critical section. They also guarantee that variables read inside the block are read from main memory, and that updates to all variables are flushed back to main memory when a thread exits the block, regardless of whether those variables are declared volatile.
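For illustration, a counter like the following (my own example) is safely shared without volatile, because both the write and the read happen inside synchronized methods:

```java
public class SharedCounter {
    private int count = 0; // deliberately not volatile

    // Monitor exit flushes the updated count to main memory.
    public synchronized void increment() {
        count++;
    }

    // Monitor enter guarantees the latest count is read from main memory.
    public synchronized int get() {
        return count;
    }
}
```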

The processor solves this problem with a bus lock: a LOCK# signal provided by the processor. When one processor asserts this signal on the bus, requests from other processors are blocked, and that processor can use the shared memory exclusively. A bus lock locks down the communication between the CPUs and memory, so during the lock period other processors cannot operate on data at any memory address. The cost of a bus lock is therefore high.

In addition, the JVM implements these two instructions (monitorenter and monitorexit) by calling mutex, the operating system's mutual-exclusion primitive. A blocked thread is suspended and waits to be rescheduled, which causes switches between user mode and kernel mode and has a significant impact on performance.

Decompiling the bytecode (for example with javap) shows that the synchronized keyword is compiled into the following:

  • monitorenter and monitorexit (used for synchronized blocks)

```java
public class SynchronizedDemo {
    public void method() {
        synchronized (this) {
            System.out.println("Method 1 start");
        }
    }
}
```

The monitorexit instruction appears twice: the first releases the lock on a normal exit from the synchronized block; the second releases it on an abnormal (exceptional) exit.
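An abridged javap -c disassembly of method() illustrates this (exact offsets and constant-pool indices vary by JDK version):

```
public void method();
  Code:
     0: aload_0
     1: dup
     2: astore_1
     3: monitorenter           // acquire the monitor
     4: getstatic     #2       // Field java/lang/System.out
     7: ldc           #3       // String Method 1 start
     9: invokevirtual #4       // Method println
    12: aload_1
    13: monitorexit            // release on the normal path
    14: goto          22
    17: astore_2
    18: aload_1
    19: monitorexit            // release on the exception path
    20: aload_2
    21: athrow
    22: return
```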

  • ACC_SYNCHRONIZED (used for synchronized methods): the compiler sets the ACC_SYNCHRONIZED access flag on the method, and the JVM acquires and releases the monitor implicitly on method entry and exit.
```java
public class SynchronizedMethod {
    public synchronized void method() {
        System.out.println("Hello World!");
    }
}
```
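In the javap -v output for this method, the flag appears alongside the method's access flags (abridged; exact format varies by JDK version):

```
public synchronized void method();
    descriptor: ()V
    flags: ACC_PUBLIC, ACC_SYNCHRONIZED
```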

Synchronized lock optimization

  • Lock coarsening

When using a synchronized lock, keep the scope of the synchronized block as small as possible: synchronize only over the actual scope of the shared data. The goal is to minimize the number of operations performed while holding the lock, so that when there is contention, a waiting thread can acquire the lock as quickly as possible.

In most cases this is the right approach. However, a series of consecutive lock and unlock operations on the same object can cause unnecessary performance loss, which is why the concept of lock coarsening was introduced.

Lock coarsening is easy to understand: it merges multiple consecutive lock and unlock operations into a single lock with a wider scope.

As in the following example:

```java
Vector<String> vector = new Vector<String>();

public void vectorTest() {
    for (int i = 0; i < 10; i++) {
        vector.add(i + "");
    }
    System.out.println(vector);
}
```

Each vector.add() call acquires and releases the vector's lock. When the JVM detects that the vector is being locked and unlocked consecutively like this, it merges the operations into a single, larger lock and effectively moves the locking out of the for loop.
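Conceptually, the coarsened result behaves like the following sketch (an illustration of the effect, not actual JVM output):

```java
// One lock around the whole loop instead of one per add() call.
synchronized (vector) {
    for (int i = 0; i < 10; i++) {
        vector.add(i + "");
    }
    System.out.println(vector);
}
```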

  • Lock elimination

To ensure data integrity, some operations must be synchronized. In some cases, however, the JVM detects that there is no possibility of a race on shared data, and it removes those locks. An example is synchronization on variables that never leave a method.

If there is no contention, why lock at all? Lock elimination saves the time spent on pointless lock requests. Determining whether a variable escapes requires data-flow analysis in the virtual machine, but isn't it obvious to the programmer? Who would deliberately put synchronization around a block of code known to have no data race?

However, programs do not always use locks deliberately. When you use JDK built-in APIs such as StringBuffer, Vector, and HashTable, locking is implicit, for example in StringBuffer's append() and Vector's add():

```java
public void vectorTest() {
    Vector<String> vector = new Vector<String>();
    for (int i = 0; i < 10; i++) {
        vector.add(i + "");
    }
    System.out.println(vector);
}
```

When this code runs, the JVM can detect that vector never escapes vectorTest(), so it can safely eliminate the locking inside Vector.

  • Spin locks

Blocking and waking a thread requires the CPU to switch from user mode to kernel mode. Frequent blocking and waking is a heavy burden on the CPU and puts great pressure on a system's concurrent performance. At the same time, in many applications object locks are held only for very short periods, and it is not worth frequently blocking and waking threads for them.

This is where spin locks come in. What is a spin lock?

A spin lock means that when a thread tries to acquire a lock that is already held by another thread, it repeatedly checks whether the lock has been released instead of being suspended or put to sleep.

Spinning avoids the overhead of thread switching, but it consumes CPU time. If the thread holding the lock releases it quickly, spinning is efficient; otherwise the spinning thread burns processor resources for nothing. So the cost of CPU idling must be weighed against the cost of thread switching. Blocking or waking a Java thread requires the operating system to switch CPU state, which takes processor time; if the contents of a synchronized block are simple enough, the state transition may well take longer than executing the user code itself. Instead of blocking immediately, the requesting thread can therefore spin for a short while; if the lock holder releases the lock during that time, the thread acquires it directly without ever blocking, avoiding the cost of a thread switch. That is the spin lock.

Spin locks have their own drawbacks; they are no substitute for blocking. Spin waiting avoids the overhead of thread switching but occupies processor time. If locks are held for short periods, spin waiting works very well; conversely, if locks are held for long periods, spinning threads waste processor resources. Therefore the spin wait time must be bounded, and the thread should be suspended if it fails to acquire the lock within the limit (the default is 10 spins, adjustable with -XX:PreBlockSpin).
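The idea can be sketched at user level with a CAS loop (an illustration only; HotSpot implements spinning inside the VM, not with a class like this):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait: keep retrying the CAS instead of blocking the thread.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // JDK 9+ CPU hint; spinning still costs CPU time
        }
    }

    public void unlock() {
        locked.set(false);
    }
}
```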

  • Adaptive spin lock

Spin locking was introduced in JDK 1.4.2, enabled with -XX:+UseSpinning. It became the default in JDK 6, which also introduced adaptive spinning.

Adaptive means that the spin time (number of spins) is no longer fixed but is determined by the previous spins on the same lock and the state of the lock's owner. If a spin wait on a given lock object recently succeeded, and the thread holding the lock is running, the virtual machine assumes the spin is likely to succeed again and allows it to last relatively long.

If spinning rarely succeeds in acquiring a given lock, the virtual machine may omit the spin entirely in future attempts and block the thread directly, avoiding wasted processor resources.

  • Reentrant lock

Also known as a recursive lock: the same lock can be acquired multiple times by the same thread.

For example, if a thread executing a method with a lock calls another method that requires the same lock, the thread can simply re-enter the called method without regaining the lock.
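A minimal illustration (my own example): synchronized is reentrant, so a synchronized method can call another synchronized method on the same object without deadlocking itself:

```java
public class ReentrantDemo {
    public synchronized void outer() {
        inner(); // this thread already holds the monitor on `this`,
                 // so it re-enters without re-acquiring the lock
    }

    public synchronized void inner() {
        System.out.println("re-entered");
    }
}
```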

  • Biased locking

Applicable scenarios:

There is no actual contention, and only the thread that first requests the lock will ever use it.

Principle:

When a thread requests the lock object, the lock's flag bits are set to biased mode (bias bit 1, lock flag 01), and the thread's ID is recorded in the lock object's Mark Word with a CAS operation. From then on, the biased thread can enter the synchronized block directly without even a CAS. The thread ID only needs to be recorded the first time the lock is taken; the biased thread then holds the bias until contention occurs. On each subsequent synchronization, the JVM checks whether the thread ID in the Mark Word matches the current thread's ID: if it does, the thread enters the block directly, with no CAS update of the object header on lock and unlock. If it does not match, there is contention and the lock is no longer always biased to the same thread; the lock then inflates to a lightweight lock to ensure fair competition between threads.
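For reference, the lock-state encoding in the HotSpot Mark Word (simplified; the exact layout varies by JVM version and word size):

| State | Bias bit | Lock flag bits |
| --- | --- | --- |
| Unlocked, not biasable | 0 | 01 |
| Biasable / biased (thread ID recorded) | 1 | 01 |
| Lightweight locked | n/a | 00 |
| Heavyweight locked | n/a | 10 |
| GC mark | n/a | 11 |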

Resource consumption:

  • The thread records its thread ID in the lock object's Mark Word with a single CAS operation the first time it acquires the lock; no user-mode to kernel-mode switch is needed.
  • No bus lock is involved.

Reference:

juejin.cn/post/684490…

juejin.cn/post/684490…

objcoding.com/2018/11/29/…

  • Lightweight lock

Applicable scenarios:

Scenarios that prioritize response time, with no or very little thread contention.

Principle:

Lightweight locks are upgraded from biased locks. Each acquisition and release is done with a CAS atomic operation; a thread whose CAS fails does not block, but spins and retries the CAS. If it fails too many times, the lightweight lock inflates to a heavyweight lock. Since spinning consumes CPU, a thread cannot be left spinning forever. It follows that lightweight locking is best suited to scenarios that prioritize response time, ideally with a small number of threads alternately entering the synchronized block. If the spin wait fails because of lock contention, the lock inflates to a heavyweight lock.

Resource consumption:

  • Every lock acquisition and release updates the object's Mark Word via a CAS operation; compared with a biased lock, this adds CAS overhead on every acquire and release.
  • CPU cost of spinning.
  • No bus lock is involved.

Reference: www.cnblogs.com/paddix/p/54…

  • Heavyweight lock

Applicable scenarios:

Heavyweight locks exist to prevent excessive CPU spinning when a lightweight lock is heavily contended, so heavyweight locks are the right choice when many threads access the same synchronized block at the same time. Threads that fail to acquire the lock are blocked, letting the CPU run useful work instead, so overall throughput increases.

Resource consumption:

  • Blocking and waking threads is implemented with kernel calls; the CPU must switch between user mode and kernel mode.
  • Acquiring and releasing the lock is implemented with the operating system's mutex primitives, again costing user-mode/kernel-mode switches.
  • The underlying mutex implementation relies on atomic instructions that can assert the bus LOCK# signal; while the bus is locked, other CPUs' memory operations are blocked, which significantly degrades performance.

Synchronized usage

Whether the synchronized keyword is applied to a method or an object: if it acts on a non-static method or an instance object, it acquires the object (instance) lock; if it acts on a static method or a class, it acquires the class lock.

Reference: www.jianshu.com/p/29854dc7b…

Java synchronized keyword scope:

  1. For instance methods, the current instance object is locked.
  2. For static methods, the current class object is locked.
  3. Applied to a code block, it locks the object named in the synchronized(...) statement, as specified by the programmer (see the sketch below).
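A small sketch (my own example) illustrating the three scopes:

```java
public class SyncScopes {
    private final Object lock = new Object();

    // 1. Instance method: locks the current instance (`this`).
    public synchronized void instanceMethod() { }

    // 2. Static method: locks the class object (SyncScopes.class).
    public static synchronized void staticMethod() { }

    // 3. Code block: locks whatever object the programmer specifies.
    public void blockMethod() {
        synchronized (lock) {
            // critical section
        }
    }
}
```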

The Volatile keyword

Visibility and order implementation principles (MESI protocol + Memory barrier)

The semantics of the volatile keyword at various levels:

| Level | Implementation |
| --- | --- |
| Java language | the volatile keyword |
| JVM bytecode | no special instructions |
| Operating system / assembly | the lock instruction prefix |
| CPU hardware | cache coherence protocol (MESI), memory barriers |

Ordering: implemented by JVM and hardware memory barriers, which prevent out-of-order execution of instructions at both the JVM and the CPU level.

Visibility: consistency of data across CPU caches is implemented by the CPU cache coherence protocol; consistency between cache and main memory is implemented by hardware memory-barrier semantics.

Applicable scenario

It guarantees visibility and order, but it doesn’t guarantee atomicity.

Do not use volatile for compound get-and-operate (read-modify-write) operations; only pure set or get scenarios are suitable for volatile.
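A sketch of an appropriate use (my own example): a stop flag that one thread writes and another only reads:

```java
public class VolatileFlag {
    private volatile boolean running = true; // pure set/get: volatile is enough

    public void stop() {
        running = false; // this write becomes visible to the worker thread
    }

    public void work() {
        while (running) {
            // do work; without volatile this loop might never see the update
        }
    }
}
```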

Why is volatile not atomic?

Take increment as an example (this is the problem classes like AtomicInteger solve atomically). Incrementing a volatile integer (i++) involves three steps:

1) Read the volatile value into a local (register) copy;

2) Increment the value;

3) Write the local value back, making it visible to other threads. The JIT-compiled x86 instructions for these three steps are:

```
mov    0xc(%r10),%r8d   ; Load
inc    %r8d             ; Increment
mov    %r8d,0xc(%r10)   ; Store
lock addl $0x0,(%rsp)   ; StoreLoad barrier
```

Note that the last step is the memory barrier.

Going back to the instructions above: there are four steps, from load through store to the memory barrier. Only in the last step does the JVM make the latest value visible to all threads; that is, only after the last step do all CPU cores see the latest value. The intermediate steps (from load to store) are not safe: if another CPU changes the value in the meantime, that update will be lost.
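A small demonstration of the lost-update problem (an illustrative example; the exact result varies from run to run):

```java
public class VolatileNotAtomic {
    private static volatile int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter++; // read-modify-write: not atomic even though counter is volatile
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter); // usually less than 20000
    }
}
```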

Reference:

juejin.cn/post/684490…

zhuanlan.zhihu.com/p/24401553
