0 Foreword

I remember that when I began to learn Java, synchronized was what we used for multi-threading. To us at the time, synchronized seemed magical and powerful; we gave it the name "synchronization", and it became our go-to medicine for multi-threading problems. However, as we progressed we learned that, prior to JDK 1.5, synchronized was a heavyweight lock, so bulky compared to the J.U.C Lock that we gradually abandoned it as less efficient.

However, with all the improvements that Java SE 1.6 has made to synchronized, synchronized doesn't seem so heavy anymore. Let's explore the basic usage of synchronized, its implementation mechanism, how Java has optimized it, the lock optimization mechanism, the lock storage structure, and the lock upgrade path.

1 Basic Usage

Synchronized is one of the most common and simplest means of solving concurrency problems in Java. It provides three main guarantees:

  1. Atomicity: it ensures mutually exclusive access by threads to the synchronized code;
  2. Visibility: changes to a shared variable must be flushed to main memory before the unlock operation is performed. When a variable is locked, its copy in working memory is cleared, so before the execution engine can use the variable, its value must be loaded or assigned from main memory again.
  3. Ordering: it effectively solves the reordering problem, i.e., "an unlock operation happens-before a subsequent lock operation on the same lock";
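
To make these guarantees concrete, here is a minimal sketch (the class name and iteration counts are illustrative): two threads each increment a shared counter 10,000 times. With the synchronized method the result is always 20000; remove synchronized and updates can be lost.

public class CounterDemo {
    private int count = 0;

    // Mutually exclusive access: only one thread at a time runs this body.
    public synchronized void increment() {
        count++; // read-modify-write, not atomic on its own
    }

    public static void main(String[] args) throws InterruptedException {
        CounterDemo demo = new CounterDemo();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                demo.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(demo.count); // always 20000 with synchronized
    }
}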

Syntactically, Synchronized can treat any non-null Object as a “lock,” and in the HotSpot JVM implementation, the lock has a special name: Object Monitor.

Synchronized has three uses:

  1. When synchronized acts on an instance method, the monitor lock is the object instance (this);
  2. When synchronized acts on a static method, the monitor lock is the Class object of the class. Because Class data lives in the permanent generation, a static method lock is effectively a global lock for that class.
  3. When synchronized acts on an object instance, the monitor lock is the object instance enclosed in the parentheses;
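
A minimal sketch of the three forms (class and method names are illustrative):

public class ThreeUsages {
    private static final Object LOCK = new Object();

    // 1. Instance method: the monitor is this instance (this).
    public synchronized void instanceMethod() { /* ... */ }

    // 2. Static method: the monitor is ThreeUsages.class,
    //    effectively a global lock for this class.
    public static synchronized void staticMethod() { /* ... */ }

    // 3. Synchronized block: the monitor is the object in parentheses.
    public void blockMethod() {
        synchronized (LOCK) { /* ... */ }
    }
}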

Note that the synchronized built-in lock is an object lock (it locks objects, not reference variables), so its granularity is the object. It can be used to achieve mutually exclusive access to critical resources, and it is reentrant. The greatest benefit of reentrancy is avoiding deadlocks, for example:

A subclass's synchronized method calls a synchronized method of its parent class; without reentrancy, this would deadlock, as shown below.
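
A minimal sketch of that scenario (Parent and Child are illustrative names). Reentrancy lets the child's synchronized method call the parent's synchronized method on the same instance without blocking on the monitor it already owns:

class Parent {
    public synchronized void doWork() {
        System.out.println("parent doWork");
    }
}

class Child extends Parent {
    @Override
    public synchronized void doWork() {
        System.out.println("child doWork");
        super.doWork(); // re-enters the same monitor (this) without blocking
    }
}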

2 Synchronization Principle

Data synchronization depends on locks; what does lock synchronization depend on? Synchronized's answer is the JVM, at the software level, while J.U.C Lock's answer is special CPU instructions, at the hardware level.

When a thread accesses a block of synchronized code, it first needs to acquire a lock to execute the synchronized code and must release the lock when exiting or throwing an exception. How does this work? Let’s start with a simple piece of code:

package com.paddx.test.concurrent;

public class SynchronizedDemo {
    public void method() {
        synchronized (this) {
            System.out.println("Method 1 start");
        }
    }
}

View the decompiled result:
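Decompiled with javap -c, the method body looks roughly like this (offsets and constant-pool indexes vary by compiler version); note the single monitorenter paired with two monitorexit instructions:

public void method();
  Code:
     0: aload_0
     1: dup
     2: astore_1
     3: monitorenter              // acquire the monitor of `this`
     4: getstatic     #2          // Field java/lang/System.out
     7: ldc           #3          // String Method 1 start
     9: invokevirtual #4          // Method java/io/PrintStream.println
    12: aload_1
    13: monitorexit               // normal exit: release the monitor
    14: goto          22
    17: astore_2
    18: aload_1
    19: monitorexit               // exceptional exit: release the monitor
    20: aload_2
    21: athrow
    22: return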

  1. monitorenter: Each object is associated with a monitor lock. The monitor is locked when it is occupied, and a thread executing the monitorenter instruction attempts to acquire ownership of the monitor as follows:

    1. If the monitor's entry count is 0, the thread enters the monitor, sets the entry count to 1, and becomes the monitor's owner.
    2. If the thread already owns the monitor and is simply re-entering, the entry count is increased by 1.
    3. If the monitor is occupied by another thread, the thread blocks until the entry count drops to 0, then tries again to acquire ownership of the monitor.
  2. monitorexit: The thread executing monitorexit must be the owner of the monitor associated with objectref. When the instruction executes, the entry count is decreased by 1. If the count reaches 0, the thread exits the monitor and is no longer its owner. Other threads blocked on this monitor can then try to take ownership.

    The monitorexit instruction appears twice: the first releases the lock on a normal exit from the synchronized block; the second releases the lock on an exceptional exit.

The semantics of synchronized are implemented through a monitor object. In fact, the wait/notify methods also rely on the monitor object, which is why they can only be called inside a synchronized block or method; otherwise java.lang.IllegalMonitorStateException is thrown.
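
A minimal sketch of that rule (names are illustrative):

public class MonitorRule {
    private final Object lock = new Object();

    public void waitCorrectly() throws InterruptedException {
        // lock.wait();  // outside synchronized: IllegalMonitorStateException
        synchronized (lock) {
            lock.wait(); // legal: this thread owns lock's monitor
        }
    }
}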

Let’s look at the synchronization method:

package com.paddx.test.concurrent;

public class SynchronizedMethod {
    public synchronized void method() {
        System.out.println("Hello World!");
    }
}

View the decompiled result:
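With javap -v, the method carries the ACC_SYNCHRONIZED flag among its access flags, roughly as follows (exact output varies by compiler version):

public synchronized void method();
    descriptor: ()V
    flags: ACC_PUBLIC, ACC_SYNCHRONIZED
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2   // Field java/lang/System.out
         3: ldc           #3   // String Hello World!
         5: invokevirtual #4   // Method java/io/PrintStream.println
         8: return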

The compiled result shows that the method is not synchronized through monitorenter and monitorexit (though in theory it could be); instead, unlike regular methods, it carries the ACC_SYNCHRONIZED flag in its method access flags. The JVM implements method synchronization based on this flag:

When the method is invoked, the invoking instruction checks whether the method's ACC_SYNCHRONIZED access flag is set. If so, the executing thread acquires the monitor first, executes the method body only after the monitor is successfully acquired, and releases the monitor when the method completes. While the method is executing, no other thread can obtain the same monitor object.

The two forms of synchronization are essentially the same; method synchronization is simply done implicitly, without explicit monitorenter/monitorexit bytecode instructions. Both are ultimately realized by the JVM calling mutex, the operating system's mutual exclusion primitive. A blocked thread is suspended and waits to be rescheduled, which causes switching between user mode and kernel mode and has a great impact on performance.

3 Synchronization Concepts

3.1 Java Object Headers

In the JVM, an object is laid out in memory in three regions: the object header, instance data, and alignment padding. As shown below:

  1. Instance data: stores the class's attribute data, including attributes of the parent class;
  2. Alignment padding: the virtual machine requires that an object's starting address be a multiple of 8 bytes. Padding data does not have to exist; it is only for byte alignment;
  3. Object header: a Java object header typically occupies two machine words (in a 32-bit virtual machine one machine word is 4 bytes, or 32 bits; in a 64-bit virtual machine, 8 bytes, or 64 bits), but if the object is an array, three machine words are needed. The JVM can determine a Java object's size from its metadata, but it cannot determine an array's size from the array metadata alone, so an extra word is used to record the array length.

Synchronized locks are stored in the Java object header. The HotSpot virtual machine's object header contains two parts of data: the Mark Word (mark field) and the Class Pointer (type pointer). The virtual machine uses this pointer to determine which class the object is an instance of. The Mark Word stores the runtime data of the object itself and is the key to implementing lightweight and biased locking. The Java object header structure is described as follows:

The Mark Word stores the runtime data of the object itself, such as the HashCode, GC generational age, lock status flags, the lock held by a thread, the biased thread ID, the bias timestamp, and so on. Below is the storage structure of the Mark Word part of the Java object header in the unlocked state (32-bit virtual machine):
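
If you want to observe the Mark Word yourself, the OpenJDK JOL (Java Object Layout) tool can print an object's header. A minimal sketch, assuming the org.openjdk.jol:jol-core dependency is on the classpath:

import org.openjdk.jol.info.ClassLayout;

public class HeaderDemo {
    public static void main(String[] args) {
        Object obj = new Object();
        // Prints the object's layout: mark word, class pointer, padding.
        System.out.println(ClassLayout.parseInstance(obj).toPrintable());

        synchronized (obj) {
            // Inside the block the mark word's lock bits change.
            System.out.println(ClassLayout.parseInstance(obj).toPrintable());
        }
    }
}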

Object header information is an additional storage cost unrelated to the data defined by the object itself. For the sake of space efficiency, the Mark Word is designed as a non-fixed data structure so as to store as much data as possible in a very small amount of space: it reuses its storage space according to the object's state. That is, the Mark Word changes as the program runs, and may change to store the following four types of data:

On a 64-bit VM, Mark Word is 64-bit and its storage structure is as follows:

The last two bits of the object header store the lock flag. 01 is the initial, unlocked state, in which the object header stores the object's own hash code; depending on the lock level, different content is stored in the object header. A biased lock stores the ID of the thread currently holding the object; a lightweight lock stores a pointer to the lock record in the thread's stack. From this we can see that the "lock" may be a lock record plus a pointer in the object header (to decide whether a thread holds the lock, compare the thread's lock record address with the pointer in the object header), or it may be a thread ID (to decide whether a thread holds the lock, compare the thread's ID with the thread ID stored in the object header).

3.2 Mark Word in object header and Lock Record in thread

When a thread enters a block of synchronized code, if the synchronized object is not locked, i.e., its lock flag is 01, the virtual machine first creates a space called a Lock Record in the current thread's stack, which stores a copy of the lock object's Mark Word, officially called the Displaced Mark Word. The Mark Word and its copy are crucial to the whole scheme.

Lock Records are thread-private data structures; each thread has a list of available Lock Records, and there is also a global list of available records. Each locked object's Mark Word is associated with a Lock Record (the lock word in the object header's Mark Word points to the starting address of the Lock Record), and an Owner field in the Lock Record stores the unique identifier of the thread that owns the lock (or the object's Mark Word), indicating that the lock is held by that thread. The internal structure of a Lock Record is shown below:

The Lock Record fields are described as follows:

  Owner: initially NULL, meaning no thread currently owns the Monitor Record; the thread's unique identifier is saved here when a thread successfully acquires the lock, and it is set back to NULL when the lock is released.
  EntryQ: associates a system mutex (semaphore) and blocks any thread that fails in its attempt to lock the Monitor Record.
  RcThis: the number of all threads blocked on or waiting on the Monitor Record.
  Nest: the count used to implement lock reentrancy.
  HashCode: holds the HashCode value (and possibly the GC age) copied from the object header.
  Candidate: used to avoid unnecessary blocking or waking of threads. Because only one thread can hold the lock at a time, if every lock release woke all blocked or waiting threads, it would cause unnecessary context switches (from blocked to ready, then blocked again after losing the competition) and a serious drop in performance. Candidate has only two possible values: 0 means there is no thread to wake; 1 means a successor thread should be woken to compete for the lock.

3.3 Monitor

Any object has a Monitor associated with it, and when the Monitor is held, the object is locked. The JVM's synchronized implementation is based on entering and exiting Monitor objects to achieve method synchronization and code-block synchronization. Although the implementation details differ, both are realized through paired MonitorEnter and MonitorExit instructions.

  1. MonitorEnter instruction: inserted at the beginning of the synchronized code block. When execution reaches MonitorEnter, the thread attempts to acquire ownership of the object's Monitor, i.e., the lock on the object;
  2. MonitorExit instruction: inserted at the end of the method and at exception points; the JVM guarantees that every MonitorEnter is matched by a MonitorExit;

So what is a Monitor? It can be understood as a synchronization tool, or described as a synchronization mechanism, and it is usually embodied as an object.

Like all objects, every Java object is born a Monitor; every Java object has the potential to become a Monitor, because in Java's design, every object comes into the world with an invisible lock, called the intrinsic lock or Monitor lock.

When the Mark Word's lock flag is 10, the pointer points to the starting address of the Monitor object. In the HotSpot Java virtual machine, the Monitor is implemented by ObjectMonitor, whose main data structure is as follows (located in the objectMonitor.hpp file of the HotSpot source code, implemented in C++):

ObjectMonitor() {
    _header       = NULL;
    _count        = 0; // Number of records
    _waiters      = 0;
    _recursions   = 0;
    _object       = NULL;
    _owner        = NULL;
    _WaitSet      = NULL; // Threads in the wait state are added to _WaitSet
    _WaitSetLock  = 0;
    _Responsible  = NULL;
    _succ         = NULL;
    _cxq          = NULL;
    FreeNext      = NULL;
    _EntryList    = NULL; // Threads blocked waiting for the lock are added to _EntryList
    _SpinFreq     = 0;
    _SpinClock    = 0;
    OwnerIsThread = 0;
}

ObjectMonitor has two queues, _WaitSet and _EntryList, which hold lists of ObjectWaiter objects (every thread waiting for the lock is encapsulated in an ObjectWaiter object). _owner points to the thread holding the ObjectMonitor object. When multiple threads access a piece of synchronized code simultaneously:

  1. When a thread obtains the object's monitor, it enters the _owner area, sets the monitor's owner variable to the current thread, and increments the monitor's counter count by 1.
  2. If the thread calls wait(), it releases the currently held monitor, restores the owner variable to null, decrements count by 1, and enters _WaitSet to wait to be woken up.
  3. When the current thread finishes, it likewise releases the monitor (lock) and resets count so that other threads can enter and acquire the monitor (lock).

Also, a Monitor object exists in the Mark Word of every Java object's header, and synchronized locks are acquired this way, which is why any object in Java can be used as a lock. The notify/notifyAll/wait methods use the Monitor lock object, so they must be used inside synchronized code blocks.

A Monitor has two synchronization modes: mutual exclusion and collaboration. In a multi-threaded environment, if threads need to share data, the problem of mutually exclusive access must be solved; the monitor guarantees that monitor-protected data is accessed by only one thread at a time.

When is collaboration needed? For example:

One thread writes data to a buffer and another thread reads data from it. If the reader thread finds the buffer empty, it waits; when the writer thread writes data into the buffer, it wakes the reader. A thread puts itself into the waiting state through the Object class's wait method; upon calling wait, the thread releases the monitor it holds until another thread notifies it. A thread calls notify to notify a waiting thread, but the waiting thread does not run immediately: it must wait until the notifier releases the monitor and it reacquires the monitor itself. If the monitor the newly woken thread needs has been snatched by another thread, the thread continues to wait. The notifyAll method of the Object class solves a related problem by waking all waiting threads, so that some thread always makes progress.
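
A minimal sketch of this reader/writer collaboration, using a one-slot buffer (names are illustrative). Note the while loops, which re-check the condition after every wake-up:

public class Buffer {
    private String data;          // null means "empty"

    public synchronized void write(String value) throws InterruptedException {
        while (data != null) {    // buffer full: wait, releasing the monitor
            wait();
        }
        data = value;
        notifyAll();              // wake up waiting readers
    }

    public synchronized String read() throws InterruptedException {
        while (data == null) {    // buffer empty: wait, releasing the monitor
            wait();
        }
        String value = data;
        data = null;
        notifyAll();              // wake up waiting writers
        return value;
    }
}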

As shown in the figure above, a thread enters the Entry Set through gate 1. If no thread is waiting in the entry area, the thread acquires the monitor, becomes its Owner, and executes the code of the monitored region. If other threads are waiting in the entry area, the newly arrived thread waits with them. While holding the monitor, the thread has two choices: it can execute the monitored region normally, release the monitor, and exit through gate 5; or it may need to wait for some condition to appear, in which case it goes through gate 3 into the Wait Set to rest, until the corresponding condition is satisfied and it re-acquires the monitor through gate 4 to resume execution.

Note:

When a thread releases the monitor, the waiting threads in the entry area and the wait area both compete for it. If an entry-area thread wins, it enters through gate 2; if a wait-area thread wins, it enters through gate 4. A thread in the wait area can leave it only through gate 4; in other words, a thread can exit the waiting state only by re-acquiring the monitor.

4 Lock Optimization

JDK 5 introduced the CAS atomic operation provided by modern processors (the synchronized keyword itself was not optimized in JDK 5, which is why the concurrent package performed better in that version). Starting with JDK 6, besides the CAS spinning introduced in JDK 5, optimization strategies such as adaptive CAS spinning, lock elimination, lock coarsening, biased locking, and lightweight locking were added. The keyword's performance improved greatly; its semantics are clear, its use is simple, and it does not need to be released manually. It is therefore recommended to use the keyword where possible, and there is still room for further performance optimization.

A lock can be upgraded from a biased lock to a lightweight lock and then to a heavyweight lock. The upgrade is one-way: it can only go from low to high, and a lock is never downgraded.

Biased locking and lightweight locking are enabled by default in JDK 1.6. Biased locking can be disabled with -XX:-UseBiasedLocking.

4.1 Spin Lock

Blocking and waking a thread requires the CPU to switch from user mode to kernel mode; frequent blocking and waking is a heavy burden on the CPU and puts great pressure on the system's concurrent performance. At the same time, we observe that in many applications an object lock is held only for a short time, and it is not worth frequently blocking and waking threads for such a short period.

Hence the spin lock. What is a spin lock?

A spin lock means that when a thread tries to acquire a lock that is already held by another thread, it repeatedly checks whether the lock has been released, instead of entering a suspended or sleeping state.

Spin locks suit situations where the critical region protected by the lock is small and the lock is held for a short time. Spin waiting is not a substitute for blocking: while it avoids the overhead of thread switching, it consumes CPU time. If the thread holding the lock releases it quickly, spinning is very efficient; otherwise the spinning thread wastes processing resources doing no meaningful work, like a dog in the manger, which wastes performance. Therefore there must be a limit to the spin waiting time (the number of spins); if the spin exceeds the limit and the lock has still not been acquired, the thread should be suspended.
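
A minimal sketch of the spin-lock idea in Java, using CAS on an AtomicReference (this illustrates the concept, not the JVM's internal implementation):

import java.util.concurrent.atomic.AtomicReference;

public class SpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void lock() {
        Thread current = Thread.currentThread();
        // Busy-wait (spin) until the CAS from null to current succeeds.
        while (!owner.compareAndSet(null, current)) {
            // spin; a real implementation would bound this and then block
        }
    }

    public void unlock() {
        Thread current = Thread.currentThread();
        owner.compareAndSet(current, null); // only the owner can release
    }
}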

Spin locking was introduced in JDK 1.4.2 and was off by default; it can be enabled with -XX:+UseSpinning, and it is on by default in JDK 1.6. The default number of spins is 10, which can be adjusted with the -XX:PreBlockSpin parameter.

Adjusting the spin count with the -XX:PreBlockSpin parameter is awkward: if you set it to 10, many threads in the system may release their locks just after a waiter gives up (when one or two more spins would have acquired the lock). So JDK 1.6 introduced adaptive spin locks to make the virtual machine smarter.

4.2 Adaptive Spin Lock

JDK 1.6 introduced a cleverer spin lock: the adaptive spin lock. Adaptive means the number of spins is no longer fixed, but is determined by the previous spin time on the same lock and the state of the lock's owner. How does it adapt?

If a spin on a given lock just succeeded, the thread spins more next time, because the virtual machine reasons that since it succeeded last time, it is likely to succeed again, so it allows the spin wait to last longer. Conversely, if spins rarely succeed for a lock, future attempts to acquire that lock may reduce or even skip the spin, to avoid wasting processor resources.

With adaptive spinning, as program execution and performance-monitoring information accumulate, the virtual machine's prediction of a lock's state becomes more and more accurate, and the virtual machine becomes smarter.

4.3 Lock Elimination

Some operations must be synchronized to guarantee data integrity; but in some cases the JVM detects that there is no possibility of contention on shared data, so the JVM eliminates those locks.

Lock elimination is grounded in the data from escape analysis.

If there is no contention, why lock? Lock elimination saves the time spent on pointless lock requests. Determining whether a variable escapes requires data-flow analysis in the virtual machine, but isn't it obvious to the programmer? Why synchronize a block of code known to have no data races? Because many locks are not written explicitly: implicit locking happens when using built-in JDK APIs such as StringBuffer, Vector, and HashTable. For example, StringBuffer's append() and Vector's add():

public void vectorTest() {
    Vector<String> vector = new Vector<String>();
    for (int i = 0; i < 10; i++) {
        vector.add(i + "");
    }

    System.out.println(vector);
}

When running this code, the JVM can clearly detect that vector never escapes the vectorTest() method, so the JVM can boldly eliminate the locking inside vector.

4.4 Lock Coarsening

When using a synchronized lock, you want to keep the scope of the synchronized block as small as possible — only synchronize in the actual scope of the shared data. The goal is to keep the number of operations that need to be synchronized as small as possible, so that if there is a lock contention, the thread waiting for the lock can acquire the lock as quickly as possible.

In most cases this is true. However, a series of consecutive lock and unlock operations on the same object may cause unnecessary performance loss, so the concept of lock coarsening is introduced.

The concept of lock coarsening is easy to understand: connect multiple consecutive lock and unlock operations together, expanding them into a single lock with a wider scope.

As in the above example:

Every vector.add() requires locking. When the JVM detects that the same object is consecutively locked and unlocked in this way, it merges the operations into a single lock and unlock with a larger scope, moved outside the for loop.
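
Conceptually, the transformation looks like this (the JIT performs it internally; you never write the coarsened form yourself):

public class CoarseningDemo {
    // Before coarsening: each append() locks and unlocks sb's monitor,
    // because StringBuffer.append() is a synchronized method.
    public String concat(String s1, String s2, String s3) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1); // lock/unlock
        sb.append(s2); // lock/unlock
        sb.append(s3); // lock/unlock
        return sb.toString();
    }
    // After coarsening, the JIT effectively holds sb's lock once across
    // all three appends, as if the calls were wrapped in a single
    // synchronized (sb) { ... } block.
}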

4.5 Biased Locking

Biased locking is an important addition in JDK 6. Through research practice the HotSpot authors found that in most cases locks not only see no multi-thread contention, but are always acquired many times by the same thread; biased locking was introduced to make lock acquisition cheaper for that thread.

Biased locking is a mechanism used when a code block is executed by a single thread. In a multi-threaded environment (i.e., thread A has not finished executing the synchronized block when thread B initiates a lock request), it must be converted to a lightweight or heavyweight lock.

Biased locking is off by default in JDK 5 and on by default in JDK 6. If concurrency is high and synchronized blocks run for a long time, the probability of simultaneous access by multiple threads is high, and you can use the -XX:-UseBiasedLocking parameter to disable biased locking (but this is a JVM-wide parameter; it cannot be set for an individual object lock).

The main purpose of biased locking is to minimize the unnecessary lightweight-lock execution path when there is no multi-thread contention. The lock and unlock operations of a lightweight lock require multiple CAS atomic instructions, while a biased lock needs only one CAS atomic instruction, when installing the ThreadID (since a biased lock must be revoked once multi-thread contention appears, the performance loss of the revocation must be less than the performance saved by the eliminated CAS instructions).

Lightweight locks are designed to improve performance when synchronized blocks are executed alternately by threads, while biased locks are designed to further improve performance when synchronized blocks are executed by only one thread.

So how can biased locking reduce unnecessary CAS operations? First let’s look at the problem of uncontested locks:

Almost all locks today are reentrant: the thread that has acquired the lock can lock/unlock the monitor multiple times. In the earlier HotSpot design, every lock/unlock involved some CAS operations (such as CAS operations on the waiting queue), and CAS introduces local latency. So the idea of biased locking is: once a thread first acquires the monitor, the monitor is "biased" toward that thread, and subsequent calls avoid the CAS. In essence it sets a flag; if the flag is found to be true, there is no need to go through the full locking/unlocking process.

Why does CAS introduce local latency? It starts with the SMP (symmetric multiprocessing) architecture, roughly illustrated below:

All CPUs share a system bus (BUS) connected to main memory. Each core has its own L1 cache, and the cores are symmetrically positioned relative to the bus, hence the name "symmetric multiprocessing".

CAS stands for Compare-and-Swap, a CPU atomic instruction whose function is to let the CPU atomically update the value at a memory location after a comparison. Its implementation is based on assembly instructions of the hardware platform; that is, CAS is implemented in hardware, and the JVM merely wraps the assembly call. Classes such as AtomicInteger use these wrapped interfaces.
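
A minimal sketch of that wrapped interface, using AtomicInteger.compareAndSet:

import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(5);

        // Atomically set to 6 only if the current value is still 5.
        boolean ok = value.compareAndSet(5, 6);
        System.out.println(ok + " " + value.get()); // true 6

        // Expected 5, but the actual value is now 6: the CAS fails.
        ok = value.compareAndSet(5, 7);
        System.out.println(ok + " " + value.get()); // false 6
    }
}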

For example: Core1 and Core2 may both load a location of main memory into their L1 caches. When Core1 modifies that location in its L1 cache, it "invalidates" the corresponding value in Core2's L1 cache via the bus; once Core2 finds that the value in its L1 cache is invalid (a cache miss), it loads the latest value at that address from memory over the bus. This bus-based communication is called "cache coherence traffic". Because the bus is designed for a fixed "communication capacity", excessive cache coherence traffic makes the bus the bottleneck. When the values in Core1 and Core2 agree again, this is called "cache coherence"; in this sense, the ultimate goal of lock design is to reduce cache coherence traffic.

If many threads share the same object, a successful CAS on one core will inevitably cause a bus storm; this is called local latency. In essence, biased locking eliminates the CAS in order to reduce cache coherence traffic.

Cache coherence:

The cache coherence mentioned above has protocol support; the common protocol today is MESI (first supported by Intel). For details see: en.wikipedia.org/wiki/MESI_p… .

Exceptions to cache coherence traffic:

In fact, not every CAS causes a bus storm; it depends on the cache coherence protocol. For details see: blogs.oracle.com/dave/entry/…

Non-uniform Memory Access (NUMA) architecture:

In contrast to SMP there is the non-uniform memory access architecture, now applied mainly in some high-end processors. Its main feature is the absence of a shared bus and common main memory: each core has its own memory. This structure is not discussed here.

So when a thread accesses a synchronized block and acquires the lock, it stores the lock-biased thread ID in the object header and in the lock record in its stack frame. From then on, the thread needs no CAS operation to lock or unlock when entering and exiting this synchronized block; it merely checks whether the object header's Mark Word holds a biased lock pointing to the current thread's ID. The handling process is as follows:

  1. Check whether the Mark Word is in the biased state, i.e., the biased-lock flag is 1 and the lock flag bits are 01;
  2. If so, test whether the stored thread ID is the current thread's ID. If it is, go to step (5); otherwise go to step (3);
  3. If the stored thread ID is not the current thread's ID, compete for the lock with a CAS operation. If the competition succeeds, replace the thread ID in the Mark Word with the current thread's ID; otherwise go to step (4);
  4. A failed CAS competition proves that there is multi-thread contention. When the global safepoint is reached, the thread holding the biased lock is suspended, the biased lock is upgraded to a lightweight lock, and the thread blocked at the safepoint then continues into the synchronized block;
  5. Execute the synchronized code block;

Biased locks are released only under contention: a thread never actively releases its biased lock, but waits for another thread to compete for it. Revoking a biased lock requires waiting for a global safepoint (a point in time at which no bytecode is executing). The steps are as follows:

  1. Suspend the thread holding the biased lock;
  2. Check whether the lock object is still locked. If not, revert it to the unlocked state (01) so other threads can compete; if it is, suspend the thread currently holding the lock, put a pointer to that thread's lock record address into the object header's Mark Word, upgrade to the lightweight lock state (00), and then resume the thread holding the lock, entering the lightweight lock competition mode;

Note: suspending and resuming the current thread here does not transfer the lock; it remains in the hands of the current thread. Only the thread ID in the object header is changed into a pointer to the lock record address.

4.6 Lightweight Lock

The main purpose of lightweight locks is to reduce the performance cost of traditional heavyweight locks, which use an operating system mutex, when there is no multi-thread contention. When biased locking is disabled, or a biased lock is upgraded because multiple threads compete for it, the VM attempts to acquire a lightweight lock, with the following steps:

  1. When the thread enters the synchronized block, if the synchronization object is in the lock-free state (the lock flag is "01" and the biased flag is "0"), the virtual machine first creates a space named Lock Record in the current thread's stack frame, used to store a copy of the lock object's current Mark Word, officially called the Displaced Mark Word. The thread stack and object header state are then as follows:

  2. Copy the Mark Word from the object header into the Lock Record;

  3. After the copy succeeds, the virtual machine uses a CAS operation to try to update the lock word in the object's Mark Word to a pointer to the Lock Record in the current thread, and to point the owner pointer in the Lock Record at the object's Mark Word. If the update succeeds, go to step (4); otherwise go to step (5);

  4. If the update succeeds, the current thread owns the lock on the object, and the object's Mark Word lock flag is set to "00", indicating that the object is in the lightweight locked state. The thread stack and object header state are then as shown below:

  5. If the update fails, the virtual machine first checks whether the lock word in the object's Mark Word points into the current thread's stack frame. If it does, the current thread already holds the lock on the object and can proceed directly into the synchronized block. Otherwise, multiple threads are competing for the lock, and the current thread spins, retrying step (3). If the spin ends without acquiring the lock, the lightweight lock inflates to a heavyweight lock: the lock flag changes to "10", the Mark Word stores a pointer to the heavyweight lock (mutex), and the current thread, as well as threads waiting behind it, enter the blocked state.

The release of lightweight locks is also done through CAS operations. The main steps are as follows:

  1. Use a CAS operation to try to replace the object's current Mark Word with the Displaced Mark Word copied in the thread;
  2. If the replacement succeeds, the entire synchronization process is complete and the object returns to the unlocked state (01);
  3. If the replacement fails, another thread has tried to acquire the lock (the lock has inflated), and the suspended threads must be woken when the lock is released.

Lightweight locks improve performance on the premise that "for most locks, there is no contention during the entire synchronization cycle". If this premise is broken, there is the additional CAS operation on top of the cost of the mutex, so under multi-thread contention lightweight locks are actually slower than heavyweight locks.

  1. Why must the Mark Word in the object header be copied into the lock record in the thread stack when upgrading to a lightweight lock?

    Because this value is used as the comparison condition of the CAS when applying for the object lock. Also, when upgrading to a heavyweight lock, this comparison can determine whether other threads applied for the lock while the current thread held it; if they did, the suspended threads must be woken when the lock is released.

  2. Why might the CAS fail, and under what circumstances?

    CAS itself has no locking mechanism; its result comes from the comparison. Suppose the following scenario: threads A and B both see the lock state in the object header as unlocked and both enter. If thread A first succeeds in updating the object header to point to its own lock record, then when thread B performs its CAS on the object header, it will find that the header no longer contains the unlocked Mark Word (the object's HashCode) it saw before the operation, so its CAS fails. That is, CAS fails precisely when two (or more) threads apply for the lock concurrently.

    Thread B then spins on the CAS, waiting for the object header's lock flag to return to the unlocked state, i.e., for the header's content to again equal the value thread B saw before its CAS, which would mean thread A has finished executing (see the release of the lightweight lock below: only a thread that completes the lock release resets the object header). Thread B's CAS finally succeeds, and it acquires the lock and the right to execute the synchronized code. If thread A runs for a long time and thread B's CAS fails repeatedly, the lock inflates to a heavyweight lock: thread B is suspended, blocked, and waits to be rescheduled.

How to understand “lightweight” here? “Lightweight” is in contrast to traditional locks implemented using operating system mutex. However, it is important to note that lightweight locks are not intended to replace heavyweight locks. They are intended to reduce the performance cost of traditional heavyweight locks without multi-threaded competition.

Lightweight locks suit the scenario where threads execute synchronized blocks alternately. If the same lock is contended at the same moment, the lightweight lock inevitably inflates into a heavyweight lock.

4.7 Heavyweight Locks

Synchronized is implemented through a lock inside the object called a monitor. But the monitor lock in essence relies on the underlying operating system's Mutex Lock. The operating system's thread switching requires a transition from user mode to kernel mode; this cost is very high, and the transition takes a relatively long time, which is why Synchronized is inefficient. This kind of lock, which relies on the operating system's Mutex Lock, is therefore called a "heavyweight lock".

4.8 Switching Between Heavyweight, Lightweight, and Biased Locks

5 Pros and Cons of Each Lock

The locks are not replacements for one another, but different choices for different scenarios; it is by no means the case that a heavyweight lock is always inappropriate. Each lock can only be upgraded, never downgraded, i.e., biased lock -> lightweight lock -> heavyweight lock, and this progression is a progression of increasing overhead.

  1. If the lock is used by a single thread, the biased lock is undoubtedly the cheapest: it solves the problem without CAS, merely by comparing the object header in memory.
  2. If competing threads appear, the biased lock is upgraded to a lightweight lock;
  3. If other threads fail a certain number of CAS attempts, the lock inflates into a heavyweight lock;

In the third case, you go through biased-lock setup, biased-lock revocation, lightweight-lock setup, and the upgrade to a heavyweight lock, finally relying on the heavyweight lock to solve the problem, which costs far more than using a heavyweight lock directly. So which technique to use depends on your environment and scenario; in the vast majority of cases biased locking works, based on the empirical rule the HotSpot authors found: most locks are only ever applied for by the same thread, without contention.

6 Extended Data

  1. Synchronized implementation of JVM source analysis
  2. Spin lock, queue spin lock, MCS lock, CLH lock
  3. In-depth understanding of Java concurrency implementation principles of Synchronized