Efficient concurrency is an important theme of JDK 1.6, and the HotSpot virtual machine development team put a great deal of effort into lock optimization techniques in that release, such as Adaptive Spinning, Lock Elimination, Lock Coarsening, Lightweight Locking, and Biased Locking. These techniques all aim to share data between threads more efficiently and resolve contention, thereby improving program execution efficiency.

Spin Locking and Adaptive Spinning

When we discussed mutex synchronization earlier, we noted that the biggest performance impact of mutex synchronization comes from its implementation of blocking: suspending and resuming threads must be carried out in kernel mode, which puts a great deal of pressure on the system's concurrency performance. At the same time, the virtual machine development team observed that in many applications, shared data is locked for only a short period of time, too short to be worth suspending and resuming threads for. If the physical machine has more than one processor, so that two or more threads can execute in parallel at the same time, we can tell the next thread requesting the lock to "wait a moment" without giving up its processor time, to see whether the thread holding the lock releases it soon. To make the thread wait, we simply have it perform a busy loop (spin); this technique is known as spin locking.

Spin locking was introduced in JDK 1.4.2 but was turned off by default; it can be enabled with the -XX:+UseSpinning parameter, and it is on by default since JDK 1.6. Spin waiting is not a substitute for blocking: leaving aside the processor-count requirement, spin waiting avoids the overhead of thread switching, but it does occupy processor time. If the lock is held only briefly, spin waiting works very well; if the lock is held for a long time, the spinning thread only drains processor resources without doing any useful work, which wastes performance.
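The idea of spinning briefly before falling back to blocking can be sketched as follows. This is an illustrative toy, not HotSpot's implementation; the class name, the spin limit, and the park duration are all invented for the example:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Illustrative sketch only: a lock that spins a bounded number of times
// before giving up the processor, mirroring the idea that spinning pays
// off only when the lock is held briefly. Names and limits are invented.
class SpinThenBlockLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private static final int SPIN_LIMIT = 10; // cf. the default spin count in the text

    public void lock() {
        int spins = 0;
        // Busy loop: keep trying to CAS the flag from false to true.
        while (!locked.compareAndSet(false, true)) {
            if (++spins >= SPIN_LIMIT) {
                // Spin limit exceeded: stop burning cycles and yield the
                // processor for a moment instead of spinning indefinitely.
                LockSupport.parkNanos(1_000);
                spins = 0;
            }
        }
    }

    public void unlock() {
        locked.set(false);
    }
}
```

Because only one thread at a time can win the compareAndSet, the critical section is mutually exclusive; the park simply caps how long a loser spins before backing off.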
Therefore, there must be a limit on spin waiting time: if the spin exceeds the limit without successfully acquiring the lock, the thread should be suspended in the traditional way. The default spin count is 10, and it can be changed with the -XX:PreBlockSpin parameter.

Adaptive spin locking was introduced in JDK 1.6. Adaptive means that the spin time is no longer fixed but is determined by the previous spin time on the same lock and the state of the lock's owner. If, on a given lock object, a spin wait has just successfully acquired the lock, and the thread holding the lock is running, the virtual machine assumes that this spin is also likely to succeed and allows it to last relatively longer, for example 100 cycles. Conversely, if spinning rarely succeeds in acquiring a particular lock, the spin phase may be omitted entirely in future attempts to acquire that lock, to avoid wasting processor resources. With adaptive spinning, as the application runs and performance monitoring information accumulates, the virtual machine's predictions about the state of the application's locks become more and more accurate, and the virtual machine becomes "smarter".

Lock Elimination

Lock elimination refers to the just-in-time compiler removing locks from code that requests synchronization but where contention on the shared data is detected to be impossible. The main judgment for lock elimination comes from data supplied by escape analysis: if it is determined that, in a piece of code, no data on the heap can escape and be accessed by other threads, that data can be treated as if it were on the stack, i.e. thread-private, and synchronization locking becomes unnecessary.

The reader may wonder: whether variables escape is something the virtual machine must determine with data flow analysis, but shouldn't the programmer himself already know there is no data contention, and therefore simply not request synchronization?
The answer is that much synchronization is not written by programmers themselves; synchronized code appears in Java programs more commonly than most readers might think. Consider the example in the code listing below. This very simple piece of code merely outputs the concatenation of three strings, with no synchronization either literally in the source code or in the program's semantics.

Code listing: A piece of code that does not appear to be synchronized
public String concatString(String s1, String s2, String s3) {
return s1 + s2 + s3;
}
We also know that, because String is an immutable class, string concatenation always produces new String objects, so the Javac compiler automatically optimizes string concatenation: before JDK 1.5 it is translated into successive append() operations on a StringBuffer object, and in JDK 1.5 and later into successive append() operations on a StringBuilder object. The code in the previous listing might end up looking like the following.

Code listing: The string concatenation operation after Javac's conversion
public String concatString(String s1, String s2, String s3) {
StringBuffer sb = new StringBuffer();
sb.append(s1);
sb.append(s2);
sb.append(s3);
return sb.toString();
}
(Note: Strictly speaking, since we are discussing lock elimination and escape analysis, the virtual machine cannot be a pre-JDK 1.5 version, so the concatenation is actually converted to the non-thread-safe StringBuilder, which involves no locking at all. But that does not stop this example from demonstrating how pervasive synchronization is among Java objects.)

Now, do you still think this code involves no synchronization? Each StringBuffer.append() method contains a synchronized block, and the lock is the sb object. The virtual machine observes the variable sb and quickly discovers that its dynamic scope is confined to the concatString() method: no reference to sb ever "escapes" outside concatString(), and no other thread can possibly access it. So although there is a lock here, it can be safely eliminated; after just-in-time compilation, this code ignores all synchronization and executes directly.

Lock Coarsening

In principle, when writing code, we always recommend keeping the scope of synchronized blocks as small as possible, synchronizing only over the actual extent of the shared data. This keeps the number of operations that need to be synchronized as small as possible, so that if there is lock contention, waiting threads can acquire the lock as soon as possible. Most of the time this principle is correct. However, if a series of consecutive operations repeatedly locks and unlocks the same object, or the locking operations even occur inside a loop body, then frequent mutex synchronization causes unnecessary performance loss even when there is no thread contention at all. The consecutive append() calls in the listing above are exactly such a case.
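What coarsening amounts to can be illustrated by hand. The class and method names below are invented for this sketch; in reality HotSpot's JIT compiler performs the transformation, not the programmer:

```java
// A hand-written illustration of lock coarsening; names are invented.
// In the real VM this rewrite is done by the JIT compiler, not in source.
class CoarseningExample {
    private final StringBuffer buf = new StringBuffer();

    // Before coarsening: every call locks and unlocks `this` separately,
    // like the repeated StringBuffer.append() calls in the listing above.
    public synchronized void appendOnce(String s) {
        buf.append(s);
    }

    // The coarsened equivalent: one lock spans the whole sequence of
    // operations, so lock/unlock happens once instead of once per append.
    public String appendAllCoarsened(String... parts) {
        synchronized (this) {
            for (String p : parts) {
                buf.append(p);
            }
            return buf.toString();
        }
    }
}
```

Both versions are thread-safe; the coarsened form simply pays the lock/unlock cost once per sequence rather than once per operation.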
If the virtual machine detects a string of piecemeal operations that all lock the same object, it will coarsen (extend) the scope of lock synchronization to cover the entire sequence of operations. In the listing above, the lock is extended to before the first append() operation and after the last one, so it only needs to be acquired once.

Lightweight Locking

Lightweight locking is a new locking mechanism introduced in JDK 1.6. The word "lightweight" in its name is relative to the traditional locking mechanism implemented with operating system mutexes, which is accordingly called "heavyweight" locking. It is important to note that lightweight locks are not intended to replace heavyweight locks; their intent is to reduce the performance cost that traditional heavyweight locks incur through OS mutexes when there is no actual multithreaded contention.

To understand lightweight locking, and the biased locking that follows, we must start from the memory layout of objects in the HotSpot virtual machine, specifically the object header. The object header of the HotSpot virtual machine is divided into two parts of information. The first part stores the runtime data of the object itself, such as its hash code, GC generational age, and so on. This part is 32 bits long on 32-bit virtual machines and 64 bits long on 64-bit virtual machines; it is officially called the "Mark Word" and is the key to implementing lightweight and biased locking. The other part stores a pointer to the object's type data in the method area, and, if the object is an array, an additional part stores the array's length.

Object header information is an extra storage cost unrelated to the data the object itself defines. For the sake of the virtual machine's space efficiency, the Mark Word is designed as a non-fixed data structure, so that it can store as much information as possible in a very small space; it reuses its storage space according to the object's state.
For example, on the 32-bit HotSpot virtual machine, if the object is unlocked, then of the Mark Word's 32 bits, 25 bits store the object's hash code, 4 bits store the object's generational age, 2 bits store the lock flag, and 1 bit is fixed at 0. The table below (Table 13-1) describes what objects store in the other states (lightweight lock, heavyweight lock, GC mark, biasable).

HotSpot virtual machine object header Mark Word:
| Storage content | Flag bits | State |
|---|---|---|
| Object hash code, object generational age | 01 | Unlocked |
| Pointer to lock record | 00 | Lightweight lock |
| Pointer to heavyweight lock | 10 | Inflated (heavyweight lock) |
| Empty, no information needs to be recorded | 11 | GC mark |
| Biased thread ID, biased timestamp, object generational age | 01 | Biasable |
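The unlocked 32-bit layout can be modeled with a little bit arithmetic. This is a toy decoder whose field positions follow the text (low bits first); reading a real object header requires a tool such as OpenJDK's JOL, not plain Java code:

```java
// Toy decoder for the unlocked 32-bit Mark Word layout described above:
// from high to low bits, 25 bits of hash code, 4 bits of generational age,
// 1 bit fixed at 0 (the biasable bit), and 2 lock-flag bits.
// Purely illustrative; all names are invented for this sketch.
class MarkWordDecoder {
    static int lockFlag(int markWord) { return markWord & 0b11; }           // lowest 2 bits
    static int biasBit(int markWord)  { return (markWord >>> 2) & 0b1; }    // next 1 bit
    static int age(int markWord)      { return (markWord >>> 3) & 0b1111; } // next 4 bits
    static int hash(int markWord)     { return markWord >>> 7; }            // top 25 bits

    // Assemble an unlocked ("01") Mark Word from its fields.
    static int unlockedWord(int hash, int age) {
        return (hash << 7) | (age << 3) | 0b01;
    }
}
```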
Having briefly introduced the memory layout of objects, let us return to the execution of lightweight locks. When code enters a synchronized block, if the synchronization object is not locked (the lock flag is in the "01" state), the virtual machine first creates a space called a Lock Record in the current thread's stack frame, used to store a copy of the lock object's current Mark Word (officially this copy carries a Displaced prefix, i.e. the Displaced Mark Word).
The virtual machine then uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record. If the update succeeds, the thread owns the lock on the object, and the lock flag of the object's Mark Word (its last two bits) changes to "00", indicating that the object is in a lightweight locked state.
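This CAS handshake, together with the matching unlock path, can be modeled with an AtomicReference standing in for the Mark Word. This is a toy model with invented names, not the real object-header mechanics:

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of the lightweight-lock handshake. A real Mark Word lives in
// the object header; here an AtomicReference stands in for it, and
// LockRecord stands in for the lock record in a thread's stack frame.
// All names are invented for illustration.
class LightweightLockModel {
    static final Object UNLOCKED = new Object(); // stands for flag bits "01"

    static class LockRecord {
        Object displacedMarkWord; // the Displaced Mark Word copy
    }

    private final AtomicReference<Object> markWord = new AtomicReference<>(UNLOCKED);

    // Lock: copy the Mark Word into the record, then CAS the Mark Word
    // to point at the record. True means the lightweight lock is held.
    boolean tryLock(LockRecord record) {
        record.displacedMarkWord = UNLOCKED;
        return markWord.compareAndSet(UNLOCKED, record);
    }

    // Unlock: CAS the Displaced Mark Word back into place. A failure here
    // would mean contention appeared and, in the real VM, the lock has
    // been inflated to a heavyweight lock.
    boolean unlock(LockRecord record) {
        return markWord.compareAndSet(record, record.displacedMarkWord);
    }
}
```

A second "thread" whose CAS fails models the contention case: in the real VM that failure is what triggers inflation to a heavyweight lock.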
If the update fails, the virtual machine first checks whether the object's Mark Word points to the current thread's stack frame. If it does, the current thread already owns the lock on this object and can enter the synchronized block directly; otherwise, the lock object has already been preempted by another thread. If more than two threads contend for the same lock, the lightweight lock is no longer effective and must inflate into a heavyweight lock: the lock flag changes to "10", the Mark Word stores a pointer to the heavyweight lock (mutex), and the threads waiting for the lock enter the blocked state.

The description above covers the locking process of a lightweight lock; its unlocking process is likewise performed by a CAS operation. If the object's Mark Word still points to the thread's lock record, a CAS operation is used to replace the object's current Mark Word with the Displaced Mark Word copied into the thread. If the replacement succeeds, the entire synchronization process is complete; if it fails, another thread has attempted to acquire the lock, and the suspended threads must be awakened as the lock is released.

The reason lightweight locks improve synchronization performance is the empirical observation that "for the vast majority of locks, there is no contention during the entire synchronization period". If there is no contention, the lightweight lock uses CAS to avoid the overhead of the mutex; but if there is contention, the CAS operation occurs in addition to the mutex overhead, so under contention lightweight locks are actually slower than traditional heavyweight locks.

Biased Locking

Biased locking is also a lock optimization introduced in JDK 1.6. Its aim is to eliminate synchronization primitives entirely in the absence of contention, further improving program performance.
If a lightweight lock uses a CAS operation to eliminate the synchronization mutex when there is no contention, a biased lock eliminates the entire synchronization when there is no contention, not even performing the CAS operation. The "biased" in biased locking means partiality: the lock is biased toward the first thread that acquires it, and if the lock is never acquired by another thread during subsequent execution, the thread holding the biased lock never needs to synchronize again.

If the reader has understood the interplay between the object header's Mark Word and threads in the lightweight locking discussion above, the principle of biased locking should be easy to grasp. Assuming biased locking is enabled on the current virtual machine (with -XX:+UseBiasedLocking, the default in JDK 1.6), when the lock object is acquired by a thread for the first time, the virtual machine sets the flag bits in the object header to "01", i.e. biased mode, and at the same time uses a CAS operation to record the ID of the thread that acquired the lock in the object's Mark Word. If the CAS operation succeeds, then every time the thread holding the biased lock subsequently enters a synchronized block related to this lock, the virtual machine performs no synchronization operations at all (such as locking, unlocking, or updating the Mark Word).

The biased mode ends as soon as another thread attempts to acquire the lock. Depending on whether the lock object is currently locked, revoking the bias reverts the object to the unlocked (flag bits "01") or lightweight locked (flag bits "00") state, and subsequent synchronization operations proceed as described above for lightweight locks.
Biased locking can improve the performance of programs that have synchronization but no contention. It is, however, a trade-off optimization: it is not always beneficial to the running program. If most of the locks in a program are always accessed by multiple different threads, the bias pattern is redundant. After concrete analysis, disabling biased locking with -XX:-UseBiasedLocking can sometimes actually improve performance.
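For reference, the HotSpot flags mentioned throughout this article, collected in one place. Flag availability varies by JDK version (for instance, -XX:+UseSpinning was removed in later releases), and "MyApp" below is just a placeholder main class:

```shell
# Enable spin locking (off by default in JDK 1.4.2, on by default in JDK 1.6):
java -XX:+UseSpinning MyApp

# Change the default spin count of 10 (before adaptive spinning):
java -XX:+UseSpinning -XX:PreBlockSpin=20 MyApp

# Biased locking is on by default in JDK 1.6; disable it when most locks
# are contended by multiple different threads:
java -XX:-UseBiasedLocking MyApp
```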