Note: some of the content in this article is excerpted from other blogs; if there is any infringement, please contact me and I will remove it.
This blog focuses on the implementation principles of the synchronized keyword and the various optimizations applied to synchronized since JDK 1.6. The basic usage of synchronized is omitted.
The author is still wondering about a few things. After reading this post, please help answer these three questions:
- What is the exact process by which multiple threads interact with a Monitor? Is whether a Monitor is currently locked determined by the `_count` value in ObjectMonitor?
- If the JVM detects that synchronized code is operating on a StringBuffer in a single-threaded environment, does it perform lock elimination, or does it use biased locking?
- For the revocation and inflation of biased locks, the author only gives his own understanding, based on other blogs, with no authoritative source; reading the HotSpot source code is recommended. The author is himself skeptical of this part, so if you find errors in the description of biased-lock revocation and inflation while reading, please point them out, thanks. (Almost nothing online analyzes this from the source-code perspective, and the detailed process of biased-lock revocation and upgrading is still debated.)
1 Introduction
Let's start with some code:
public class SynchronizedTest {

    public synchronized void test1() {}

    public void test2() {
        synchronized (this) {}
    }
}
Decompile it with javap:
javap -c SynchronizedTest.class
Compiled from "SynchronizedTest.java"
public class org.xiyoulinux.SynchronizedTest {
public org.xiyoulinux.SynchronizedTest();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public synchronized void test1();
Code:
0: return
public void test2();
Code:
0: aload_0
1: dup
2: astore_1
3: monitorenter
4: aload_1
5: monitorexit
6: goto 14
9: astore_2
10: aload_1
11: monitorexit
12: aload_2
13: athrow
14: return
Exception table:
from to target type
4 6 9 any
9 12 9 any
}
Comparing the javap output, let's make a simple summary:
- Synchronized methods: a synchronized method is translated as an ordinary method call; there is no dedicated JVM bytecode instruction for it. Instead, in the Class file's method table, the ACC_SYNCHRONIZED flag in the method's access_flags field is set to 1, indicating that the method is synchronized and uses either the object the method is called on (object lock) or the Class to which the method belongs (class lock) as the lock object.
- Synchronized blocks: a monitorenter instruction is inserted at the beginning of the synchronized block and a monitorexit instruction at its end, and the JVM must ensure that every monitorenter has a matching monitorexit. Every object has a monitor associated with it, and when that monitor is held, the object is locked. When a thread executes the monitorenter instruction, it attempts to acquire ownership of the object's monitor, that is, the lock on the object. (Why one monitorenter corresponds to two monitorexit instructions in the bytecode will be explained later.)
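To make the two lock objects in the summary above concrete, here is a minimal sketch (the class and method names are invented for illustration): an instance synchronized method locks on `this`, while a static synchronized method locks on the `Class` object, so the two never block each other.

```java
public class LockScopeDemo {

    // Locks on the LockScopeDemo.class object (class lock).
    public static synchronized String staticLocked() {
        return "class-lock";
    }

    // Locks on the instance the method is called on (object lock).
    public synchronized String instanceLocked() {
        return "object-lock";
    }

    // Equivalent to instanceLocked(): an explicit block on 'this'.
    public String blockLocked() {
        synchronized (this) {
            return "object-lock";
        }
    }
}
```

A thread inside `staticLocked()` does not prevent another thread from entering `instanceLocked()` on some instance, because they hold different monitors.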
2 Synchronized underlying semantic principles
2.1 Java Object Headers
To understand how synchronized works, take a look at Java object headers.
The object in the heap consists of three parts:
- Instance data: holds the field data of the class, including fields inherited from the parent class. This part of memory is aligned on 4-byte boundaries.
- Padding data: the VM requires that the starting address of each object be a multiple of 8 bytes. Padding does not always exist; it serves only for byte alignment.
- Object header: the HotSpot VM's object header consists of two parts: the Mark Word and the Klass Pointer (type pointer). The virtual machine uses the type pointer to determine which class the object is an instance of. The Mark Word stores the object's own runtime data, such as its hashCode, GC generational age, lock status flags, the lock held by a thread, the biased thread ID, the biased timestamp, and so on; it is the key to implementing lightweight and biased locks.
A Java object header normally occupies two words (in a 32-bit virtual machine, one word is 4 bytes, i.e. 32 bits), but three words if the object is an array: the JVM can determine the size of an ordinary Java object from its metadata, but it cannot determine an array's size that way, so an extra word records the array length.
Object headers are stored as follows:
| Length | Content | Description |
| --- | --- | --- |
| 32/64 bit | Mark Word | Stores the object's hashCode, lock information, etc. |
| 32/64 bit | Class Metadata Address | Pointer to the object's type metadata |
| 32/64 bit | Array length | The length of the array, if the current object is an array |
The default Mark Word storage structure on a 32-bit JVM is as follows:

| Lock state | 25 bit | 4 bit | 1 bit (biased lock?) | 2 bit (lock flag) |
| --- | --- | --- | --- | --- |
| Unlocked | Object hashCode | GC generational age | 0 | 01 |
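To make the bit layout in the table above concrete, here is a purely illustrative decoder for a hypothetical 32-bit mark word. This is a sketch of the layout described in the table, not a real JVM API, and the exact bit positions vary between JVM versions.

```java
public class MarkWordDemo {

    // Illustrative only: decode the low bits of a hypothetical 32-bit
    // mark word laid out as in the table above (not a real JVM API).
    public static int lockFlag(int markWord)  { return markWord & 0b11;        } // 2-bit lock flag
    public static int biasedBit(int markWord) { return (markWord >>> 2) & 0b1; } // 1-bit biased flag
    public static int gcAge(int markWord)     { return (markWord >>> 3) & 0xF; } // 4-bit GC age

    public static void main(String[] args) {
        // Unlocked, unbiased, GC age 5: age=0101, biased=0, flag=01
        int mark = (5 << 3) | 0b01;
        System.out.println(lockFlag(mark));  // 1 (binary 01: unlocked)
        System.out.println(biasedBit(mark)); // 0
        System.out.println(gcAge(mark));     // 5
    }
}
```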
Since the information in the object header is an extra storage cost unrelated to the data the object itself defines, the Mark Word is designed as a non-fixed data structure for the sake of the JVM's space efficiency: it reuses its storage space according to the object's own state. On a 32-bit JVM, besides the default storage structure listed above, the Mark Word may also change into the following structures:
2.2 Monitor
How would you implement the mutual exclusion of synchronized?
Java implements it through the Monitor mechanism. Every Java object is born a potential Monitor: in Java's design, every object carries an invisible lock, called the intrinsic lock or Monitor lock.
Look at the picture of the Mark Word storage structure (above) :
Here we focus on the heavyweight lock, also known as the synchronized object lock, whose lock flag bits are 10; in this state the pointer in the Mark Word points to the starting address of a monitor object (also called the monitor lock). Every object has a monitor associated with it, and the relationship between an object and its monitor can be implemented in various ways: for example, the monitor can be created and destroyed together with the object, or generated automatically when a thread first tries to acquire the object lock. Either way, once a monitor is held by some thread, it is in the locked state. In the HotSpot virtual machine, the monitor is implemented by ObjectMonitor. Its main data structure is as follows (located in the HotSpot source file objectMonitor.hpp, implemented in C++):
ObjectMonitor() {
    _header       = NULL;
    _count        = 0;       // number of threads holding or competing for the monitor
    _waiters      = 0;
    _recursions   = 0;       // reentry count of the owner thread
    _object       = NULL;
    _owner        = NULL;    // the thread currently holding this monitor
    _WaitSet      = NULL;    // threads that called wait() are added to _WaitSet
    _WaitSetLock  = 0;
    _Responsible  = NULL;
    _succ         = NULL;
    _cxq          = NULL;
    FreeNext      = NULL;
    _EntryList    = NULL;    // threads blocked waiting for the lock are added to this list
    _SpinFreq     = 0;
    _SpinClock    = 0;
    OwnerIsThread = 0;
}
ObjectMonitor has two queues, _WaitSet and _EntryList, which hold lists of ObjectWaiter objects (every thread waiting for the lock is wrapped in an ObjectWaiter). _owner points to the thread holding the ObjectMonitor. When multiple threads access a piece of synchronized code at the same time, they first enter the _EntryList set. When a thread acquires the object's monitor, the _owner field is set to that thread and the monitor's counter _count is incremented by 1. If the thread then calls wait(), it releases the monitor it holds, _owner is reset to null, _count is decremented by 1, and the thread enters the _WaitSet to wait to be woken up. When the current thread finishes execution, it likewise releases the monitor and resets these fields so that other threads can acquire it.
From this point of view, the monitor is reachable from the object header of every Java object (which stores a pointer to it), and synchronized acquires the lock this way, which is why any object in Java can be used as a lock. It is also why the wait/notify/notifyAll methods live in the top-level Object class: the lock can be any object, so methods that must be callable on any object are defined on Object.
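The "any object can be a lock" point can be shown with a plain `Object`: one thread enters the object's _WaitSet via `wait()`, another acquires the same monitor and wakes it with `notify()`. The class and method names below are invented for illustration.

```java
public class AnyObjectLockDemo {

    // Any object can serve as a monitor: here, a plain Object.
    private static final Object LOCK = new Object();
    private static volatile boolean ready = false;

    public static String waitForSignal() {
        Thread signaller = new Thread(() -> {
            synchronized (LOCK) {          // monitorenter on LOCK
                ready = true;
                LOCK.notify();             // wake a thread in LOCK's _WaitSet
            }                              // monitorexit on LOCK
        });
        synchronized (LOCK) {
            signaller.start();             // signaller blocks until we wait()
            while (!ready) {
                try {
                    LOCK.wait();           // releases LOCK's monitor and waits
                } catch (InterruptedException e) {
                    return "interrupted";
                }
            }
        }
        try { signaller.join(); } catch (InterruptedException ignored) { }
        return "signalled";
    }
}
```

Note that `wait()` can only be called while holding the object's monitor; otherwise the JVM throws IllegalMonitorStateException.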
2.3 Basic principles of synchronized method
We’ve already made a brief summary of the synchronized method in the introduction, but here’s a bit to add to it:
In earlier versions of Java, synchronized was a heavyweight lock and inefficient, because the monitor lock relied on the underlying operating system's mutex lock, and switching between threads required the operating system to move from user mode to kernel mode. These state transitions take a relatively long time and are relatively expensive, which is why early synchronized was slow. The good news is that synchronized has been optimized at the JVM level since Java 6, so it is now quite efficient: Java 6 introduced lightweight and biased locks to reduce the cost of acquiring and releasing locks, which we will cover in the lock-optimization section later.
2.4 Underlying principle of synchronized code block
In the introduction, we also made a brief summary of the synchronized block. Also, to add a little bit:
When monitorenter is executed, the current thread attempts to acquire the object's monitor. If the monitor's counter is 0, the thread acquires the monitor and sets the counter to 1. If the current thread already owns the monitor, it may re-enter it, incrementing the counter by 1. If another thread already owns the monitor, the current thread blocks until the owning thread finishes, that is, executes the monitorexit instruction, releases the monitor, and resets the counter to 0, giving other threads the chance to acquire it. Note that the compiler guarantees that, however the method completes, normally or with an exception, every monitorenter executed in the method is paired with a monitorexit. To ensure correct pairing when the method completes abnormally, the compiler automatically generates an exception handler that catches all exceptions for the sole purpose of executing monitorexit. This is the extra monitorexit instruction you can see in the bytecode.
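The reentry behavior described above (the counter going from 1 to 2 instead of the thread blocking on itself) can be demonstrated directly; this example's class and method names are invented for illustration.

```java
public class ReentrantDemo {

    private int depth = 0;

    // outer() acquires this object's monitor; inner() re-enters it.
    // In ObjectMonitor terms this bumps the recursion count instead of
    // blocking, because the monitor's _owner is already the current thread.
    public synchronized int outer() {
        depth++;
        return inner();
    }

    public synchronized int inner() {
        depth++;
        return depth;
    }
}
```

If synchronized were not reentrant, `outer()` would deadlock on its own lock when calling `inner()`.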
3 Lock optimization
3.1 Spin-locking and adaptive spin
As mentioned earlier, synchronized was called a "heavyweight lock" prior to JDK 1.6 because its blocking implementation had the biggest impact on mutex performance: suspending and resuming threads must be done in kernel mode, and switching from user mode to kernel mode is costly.
Research shows that in most cases a lock is held only briefly, so directly suspending the thread at the operating-system level may do more harm than good; after all, switching between threads requires the operating system to transition from user mode to kernel mode, and that transition takes relatively long and is relatively expensive. A spin lock assumes that the current thread will be able to obtain the lock in the very near future, so the virtual machine lets the thread that wants the lock execute a few empty busy-wait loops without giving up the processor (this is the spin). If the lock is obtained within a few loops, the thread enters the critical section successfully.
But spinning is no substitute for blocking. First, spin locks require multiple processors, or a processor with multiple cores, so that two or more threads can execute in parallel (one being the thread that holds the lock, the other the thread that spins). Second, although spinning avoids the overhead of thread switching, it consumes processor time: if the lock is held only briefly, spinning works well; otherwise it merely burns CPU and wastes performance.
There is also a limit to spinning: if the lock has not been acquired after a certain number of spins, the thread must be suspended. By default, the number of spins is 10.
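A bounded spin attempt can be sketched in user code with a CAS loop (this is a toy model of the idea, not the JVM's internal implementation; the class and method names are invented):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BoundedSpinLock {

    private final AtomicBoolean locked = new AtomicBoolean(false);

    // Try to acquire by spinning at most maxSpins times; returns whether
    // the lock was obtained. A real JVM would suspend the thread on failure.
    public boolean tryLockSpinning(int maxSpins) {
        for (int i = 0; i < maxSpins; i++) {
            if (locked.compareAndSet(false, true)) {
                return true;     // acquired within the spin budget
            }
            Thread.onSpinWait(); // hint to the CPU that we are busy-waiting
        }
        return false;            // spin budget exhausted; caller should block
    }

    public void unlock() {
        locked.set(false);
    }
}
```

Adaptive spinning would additionally adjust `maxSpins` per lock based on whether recent spins succeeded, as described below.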
Spin locks were introduced in JDK 1.4.2, and adaptive spinning in JDK 1.6. Adaptive means the spin time is no longer fixed:
For the same lock object, if a spin wait has recently succeeded in acquiring the lock and the thread holding the lock is running, the virtual machine assumes the spin is likely to succeed again and allows it to last relatively long, say 100 cycles. Conversely, if spinning rarely succeeds for a given lock, the virtual machine may simply skip the spin when acquiring that lock in the future, to avoid wasting processor resources. With adaptive spinning, as execution continues and performance-monitoring information accumulates, the virtual machine predicts the lock's behavior more and more accurately; it becomes "smarter".
3.2 Lock elimination
Lock elimination is another virtual machine lock optimization, and a more thorough one. During JIT compilation (see this post on JIT compilation: JVM — runtime optimizations and the JIT compiler), the Java virtual machine scans the runtime context and removes locks that cannot possibly contend for shared resources, saving meaningless lock-request time.
Lock elimination is based mainly on escape analysis (for escape analysis, see Chapter 11 of Zhou Zhiming's book In-depth Understanding of the Java Virtual Machine, or search for it).
You may wonder: whether a variable escapes is something the programmer himself should be able to determine, so why would anyone synchronize when there is no data contention? Look at the following code:
public String concatString(String s1, String s2, String s3) {
return s1 + s2 + s3;
}
Since String is an immutable class, string concatenation always produces new String objects. Before JDK 1.5, the javac compiler automatically optimized string concatenation into a chain of append operations on a StringBuffer object; from JDK 1.5 on, into a chain of append operations on a StringBuilder object. That is, the javac-optimized code might look like this:
public String concatString(String s1, String s2, String s3) {
StringBuffer sb = new StringBuffer();
sb.append(s1);
sb.append(s2);
sb.append(s3);
return sb.toString();
}
StringBuffer is thread-safe: its append method is synchronized, and the lock object here is sb. But when the virtual machine examines the variable sb, it discovers that sb never escapes the method and is therefore confined to one thread, so no synchronization is needed. The lock exists, but it can be safely eliminated: after JIT compilation, the code ignores all the synchronization and executes directly. This is lock elimination.
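The escape distinction can be made explicit with two variants (a sketch; whether the JIT actually elides the locks depends on the VM and flags such as -XX:+DoEscapeAnalysis and -XX:+EliminateLocks, and the class and method names are invented):

```java
public class EscapeDemo {

    // sb never escapes this method: every lock taken inside append() and
    // toString() is a candidate for elimination, since no other thread
    // can ever observe sb.
    public static String noEscape(String s1, String s2) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        return sb.toString(); // only the immutable String escapes
    }

    // sb escapes via the return value: another thread could synchronize
    // on it, so its internal synchronization cannot be removed.
    public static StringBuffer escapes(String s1, String s2) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        return sb;
    }
}
```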
3.3 Lock coarsening
In principle, when writing synchronized blocks we always recommend keeping their scope as small as possible: synchronize only the operations that actually need it, so that under contention a waiting thread can obtain the lock as soon as possible.
In most cases this principle holds. But there are special cases where a series of operations repeatedly locks and unlocks the same object, perhaps even inside a loop body; then the frequent mutex operations cause unnecessary performance loss even when there is no thread contention.
Look at the append calls in the code above. If the virtual machine detects such a sequence, it extends (coarsens) the scope of the lock to cover the whole operation sequence: in this example, it extends to before the first append and after the last append, so the lock is taken only once.
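The before/after shape of coarsening can be written out by hand (the JIT performs this transparently; the hand-written version below is only a sketch of what it effectively produces, with invented names):

```java
public class CoarseningDemo {

    // Before coarsening: the monitor is acquired and released on every
    // iteration, because each append() call is synchronized on sb.
    public static String fineGrained(String[] parts) {
        StringBuffer sb = new StringBuffer();
        for (String p : parts) {
            sb.append(p);          // lock/unlock per call
        }
        return sb.toString();
    }

    // What coarsening effectively produces: one lock around the whole
    // sequence; the inner append() calls re-enter the held monitor cheaply.
    public static String coarsened(String[] parts) {
        StringBuffer sb = new StringBuffer();
        synchronized (sb) {
            for (String p : parts) {
                sb.append(p);      // monitor already held: reentrant
            }
        }
        return sb.toString();
    }
}
```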
3.4 Biased locking
A biased lock is biased toward the thread that first acquired it, and if the lock is not acquired by another thread during subsequent execution, the thread holding the biased lock will never need to synchronize.
Research by the HotSpot authors found that in most cases locks are not contended by multiple threads but are repeatedly acquired by the same thread (for example, using the StringBuffer class in a single thread); biased locks were introduced to make lock acquisition cheaper for that thread. When the lock object is first acquired, the virtual machine sets the flag bits in the object header to "01", i.e. biased mode, and at the same time uses a CAS operation to record the ID of the acquiring thread in the object's Mark Word. If the CAS succeeds, then from that point on, every time the thread holding the biased lock enters a synchronized block on this lock, the VM performs no synchronization operation at all.
The bias mode ends when another thread attempts to acquire the lock.
As shown above, biased mode ends when thread 2 contends for the lock object: thread 2 notifies the VM to revoke thread 1's biased lock, and thread 1 is paused at a global safepoint (a point where no bytecode is executing) while the lock is revoked.
A biased lock can be acquired by CAS only by the first thread to obtain it; once any thread competes for the lock object, every later CAS by another thread fails.
After the revocation succeeds, the JVM checks the state of the original owner thread: if it is still inside the synchronized block, the biased lock is inflated directly to a lightweight lock and execution of the block continues; otherwise the biased lock is revoked back to the unlocked state, and the next time the synchronized block executes, the JVM inflates it to a lightweight lock.
The advantage of biased locking is that, when there is no multi-thread contention, a single CAS operation suffices for all subsequent executions of the synchronized block. The premise, however, is that the resources spent revoking biased locks remain lower than the synchronization cost they save.
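The "CAS once, then free re-entry" idea can be modeled with an atomic thread-id field. This is a toy model, not the JVM's implementation (the real bias lives in the Mark Word, and revocation at a safepoint is omitted here); all names are invented.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BiasedLockSketch {

    private static final long UNBIASED = -1L;

    // Stands in for the biased-thread-id field of the Mark Word.
    private final AtomicLong biasedThreadId = new AtomicLong(UNBIASED);

    // Returns true if the calling thread may enter without synchronization.
    public boolean enter() {
        long me = Thread.currentThread().getId();
        if (biasedThreadId.get() == me) {
            return true;                              // already biased to us: plain read, no CAS
        }
        // One-time CAS: succeeds only while the lock is still unbiased.
        // Once any thread claims the bias, every later CAS here fails,
        // which in the real JVM triggers revocation/inflation.
        return biasedThreadId.compareAndSet(UNBIASED, me);
    }
}
```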
3.5 Lightweight Lock
Biased locks will swell into lightweight locks once they are contested by multiple threads.
A biased lock executes synchronized blocks without any synchronization at all; a lightweight lock uses CAS, without blocking threads, to avoid the mutex overhead when multiple threads execute synchronized blocks alternately.
Lightweight lock acquisition: if the object is unlocked when a thread is about to execute the synchronized block, the JVM first creates a space called a Lock Record in the current thread's stack frame and copies the object header's Mark Word into it; this copy is officially called the Displaced Mark Word. The thread then uses CAS to try to replace the Mark Word in the object header with a pointer to the lock record. If the CAS succeeds, the current thread owns the lock. If it fails, other threads are competing for the lock, and the current thread tries to acquire it by spinning; if spinning also fails, the lightweight lock inflates to a heavyweight lock.
Lightweight lock release: on unlock, a CAS operation tries to copy the Displaced Mark Word back into the object header. If it succeeds, no contention occurred. If it fails, the lock is contended and inflates to a heavyweight lock. The figure below shows two threads competing for the lock at the same time, leading to lock inflation:
As shown above, while thread 1 is still executing the synchronized block under the lightweight lock, thread 2 tries to acquire it and fails. Thread 2 does not inflate the lightweight lock to a heavyweight lock immediately; it spins first, and if it obtains the lock that way, no inflation happens. If spinning fails, thread 2 inflates the lock to a heavyweight lock and blocks itself. When thread 1 finishes the synchronized block and its CAS unlock fails, it discovers that the lock has inflated and threads are waiting for it, so it releases the heavyweight lock and wakes up the waiting threads.
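The two CAS steps above (acquire: swing the mark word to a lock-record pointer; release: swing it back) can be sketched with an AtomicReference. Again a toy model with invented names, not HotSpot's implementation; spinning and inflation are left out.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LightweightLockSketch {

    // Stands in for the mark word: null means unlocked; otherwise it
    // "points" to the owner's lock record (holding the Displaced Mark Word).
    private final AtomicReference<Object> markWord = new AtomicReference<>(null);

    // Acquire: CAS the mark word from unlocked to our lock-record pointer.
    public boolean tryLock(Object lockRecord) {
        return markWord.compareAndSet(null, lockRecord);
    }

    // Release: CAS the Displaced Mark Word back. In the real JVM a failure
    // here means contention occurred and the lock inflated to a heavyweight
    // lock while we held it.
    public boolean tryUnlock(Object lockRecord) {
        return markWord.compareAndSet(lockRecord, null);
    }
}
```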
4 Additional notes
4.1 Upgrading locks
A lock has four states: unlocked, biased, lightweight, and heavyweight. As contention intensifies, the lock is gradually upgraded. Locks can be upgraded but not downgraded; this strategy is meant to improve the efficiency of acquiring and releasing locks.
4.2 Comparison of advantages and disadvantages of each status lock
| Lock | Advantages | Disadvantages | Applicable scenario |
| --- | --- | --- | --- |
| Biased lock | Locking and unlocking need no extra cost; only nanoseconds slower than running an unsynchronized method. | If threads contend for the lock, revoking the bias brings extra cost. | Only one thread ever accesses the synchronized block. |
| Lightweight lock | Competing threads do not block, improving response time. | A thread that never gets the lock keeps spinning, consuming CPU. | Response time matters; the synchronized block executes very quickly. |
| Heavyweight lock | Contending threads do not spin and do not consume CPU. | Threads block, so response time is slow. | The synchronized block executes slowly. |
5 Summary
- The underlying implementation of synchronized relies mainly on the Monitor;
- From the monitor we were led to the Java object header;
- With the object header covered, we took a brief look at the underlying implementation of the Monitor (ObjectMonitor);
- We walked through how multiple threads interact with a Monitor;
- We summarized synchronized methods and synchronized blocks separately;
- We covered lock coarsening, lock elimination, and spinning with adaptive spinning;
- We covered the concepts of biased, lightweight, and heavyweight locks;
- We covered the locking and unlocking processes of biased and lightweight locks;
- We covered the inflation process from biased lock to lightweight lock to heavyweight lock.
6 Reference Reading
In-depth Understanding of the Java Virtual Machine, by Zhou Zhiming
In-depth understanding of Java concurrency implementation principles of Synchronized
Synchronized in Java SE 1.6
Java concurrency – In – depth analysis of synchronized implementation principles