I remember that when I first began learning Java, synchronized was what we used for multi-threading. To us back then it seemed magical and powerful; we called it "synchronization" and treated it as a cure-all for multi-threading problems. Later, once we learned that synchronized is a heavyweight lock, it looked so clunky next to Lock that we dismissed it as inefficient. In fact, after all the improvements Java SE 1.6 made to it, synchronized no longer seems so heavy. In this article we will explore the implementation mechanism of synchronized, how Java optimizes it, the lock optimization mechanisms, the lock storage structure, and the lock upgrade process.
Implementation principle
At runtime, synchronized ensures that at most one thread can be inside a given method or code block (the critical section) at any one time, and it also guarantees memory visibility of shared variables.
Every object in Java can be used as a lock, which is the basis for synchronized:
- For a normal synchronized method, the lock is the current instance object;
- For a static synchronized method, the lock is the Class object of the current class;
- For a synchronized block, the lock is the object in the parentheses.
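The three forms can be sketched as follows (a minimal illustration; the class and method names are ours, not from the article):

```java
// Illustrative sketch: the three forms of synchronized and the object each one locks.
public class LockTargets {
    private static int staticCount = 0;
    private int count = 0;
    private final Object blockLock = new Object();

    // Normal synchronized method: the lock is the current instance (this).
    public synchronized void instanceMethod() {
        count++;
    }

    // Static synchronized method: the lock is LockTargets.class.
    public static synchronized void staticMethod() {
        staticCount++;
    }

    // Synchronized block: the lock is the object in the parentheses.
    public void blockMethod() {
        synchronized (blockLock) {
            count++;
        }
    }

    public int getCount() { return count; }
    public static int getStaticCount() { return staticCount; }
}
```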
When a thread enters a synchronized code block, it must first acquire the lock before executing the synchronized code, and it must release the lock when it exits the block or throws an exception. How does this work at the bytecode level? Let's start with a simple piece of code:
```java
public class SynchronizedTest {
    public synchronized void test1() {
    }

    public void test2() {
        synchronized (this) {
        }
    }
}
```
Using the javap tool to view the generated class file, we can analyze how synchronized is implemented.
As you can see from the output above, a synchronized code block is implemented with the monitorenter and monitorexit instructions, while a synchronized method (this is not visible at the bytecode level) relies on the ACC_SYNCHRONIZED flag among the method's access flags.
Synchronized code block: the monitorenter instruction is inserted at the beginning of the synchronized block and the monitorexit instruction at its end. The JVM must ensure that every monitorenter has a matching monitorexit. Every object has a monitor associated with it; when the monitor is held, the object is locked. When a thread executes the monitorenter instruction, it attempts to acquire ownership of the object's monitor, that is, the object's lock.
Synchronized method: a synchronized method is translated into ordinary method invocation and return instructions such as invokevirtual and areturn; there is no special instruction at the VM bytecode level for synchronized methods. Instead, the ACC_SYNCHRONIZED bit in the access_flags field of the method's entry in the class file's method table is set to 1, indicating that the method is synchronized and that the JVM should use the object on which the method is invoked (or, for a static method, the Klass object that represents the method's class inside the JVM) as the lock object. (from: www.cnblogs.com/javaminer/p…)
Let's continue our analysis, but before we go any further we need to understand two important concepts: Java object headers and Monitor.
Java object header, Monitor
Java object headers and Monitor are the foundation of synchronized! The two concepts are introduced in detail below.
Java object header
Synchronized locks are stored in the Java object header. In the HotSpot virtual machine, the object header contains two parts of data: the Mark Word (mark field) and the Klass Pointer (type pointer). The virtual machine uses the Klass Pointer to determine which class the object is an instance of. The Mark Word stores the object's own runtime data and is the key to implementing lightweight and biased locking, so it is the focus of the description below.
Mark Word
The Mark Word stores the object's own runtime data, such as the hash code, GC generational age, lock status flags, locks held by threads, the biased thread ID, the bias timestamp, and so on. A Java object header normally occupies two machine words (on a 32-bit virtual machine, one machine word is four bytes, i.e. 32 bits), but three machine words if the object is an array: the JVM can determine the size of an ordinary Java object from its metadata, but cannot determine the size of an array that way, so an extra word is used to record the array's length. Below is the storage structure of the Java object header (32-bit virtual machine):
Object header information is an additional storage cost unrelated to the data defined by the object itself. With the virtual machine's space efficiency in mind, the Mark Word was therefore designed as a non-fixed data structure so as to store as much data as possible in a very small amount of space. It reuses its storage space according to the state of the object; that is, the Mark Word changes as the program runs. The possible states are as follows (for a 32-bit VM):
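As a rough illustration of how the low-order bits of the Mark Word encode the lock state, here is a toy sketch. The constants follow the commonly documented 32-bit layout (lock flag in the lowest two bits, biased bit just above it); this is our illustration, not a HotSpot definition.

```java
// Toy model (ours) of the Mark Word's low-order state bits.
final class MarkWordFlags {
    static final int UNLOCKED    = 0b01; // lock-free, biased bit 0
    static final int LIGHTWEIGHT = 0b00; // points to a stack Lock Record
    static final int HEAVYWEIGHT = 0b10; // points to a monitor
    static final int GC_MARK     = 0b11; // marked by the GC

    // Extract the 2-bit lock flag.
    static int lockBits(int markWord) {
        return markWord & 0b11;
    }

    // Biased state: biased bit 1 combined with lock flag 01.
    static boolean isBiased(int markWord) {
        return (markWord & 0b111) == 0b101;
    }
}
```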
With a brief introduction to Java object headers, let’s move on to Monitor.
Monitor
What is a Monitor? It can be understood as a synchronization tool, or described as a synchronization mechanism, and it is usually attached to an object.
Every Java object is born with the potential to become a Monitor, because in Java's design every object comes into the world carrying an invisible lock, called the intrinsic lock or Monitor lock.
A Monitor Record is a thread-private data structure; each thread has a list of available Monitor Records, and there is also a global list of available records. Each locked object is associated with a Monitor (the LockWord in the object header's Mark Word points to the start address of the monitor), and the Owner field in the Monitor stores the unique identifier of the thread that holds the lock, indicating that the lock is occupied by that thread. Its structure is as follows:
Owner: The initial value is NULL, indicating that no thread currently owns the Monitor Record. When the thread successfully owns the lock, the unique identifier of the thread is saved. When the lock is released, the value is set to NULL.
- EntryQ: associates a system mutex (semaphore) used to block any thread that fails to lock the Monitor Record.
RcThis: represents the number of all threads that are blocked or waiting on the Monitor Record.
- Nest: used to count lock reentries.
HashCode: Holds the HashCode value (and possibly GC age) copied from the object header.
- Candidate: used to avoid unnecessary blocking or waking of threads. Since only one thread can hold the lock at a time, if every lock release woke up all blocked or waiting threads, it would cause unnecessary context switches (blocked to ready, then blocked again after losing the race for the lock) and a serious drop in performance. Candidate has only two possible values: 0 means there is no thread to wake up; 1 means a successor thread should be woken to compete for the lock.
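The fields above can be collected into a conceptual sketch. This is an illustration of the data structure as described, not the HotSpot implementation; the field and class names are ours.

```java
import java.util.concurrent.Semaphore;

// Conceptual model (ours) of the Monitor Record fields listed above.
class MonitorRecord {
    volatile Thread owner = null;              // Owner: null until a thread holds the lock
    final Semaphore entryQ = new Semaphore(0); // EntryQ: blocks threads that failed to lock
    int rcThis = 0;                            // RcThis: threads blocked or waiting here
    int nest = 0;                              // Nest: reentrancy count
    int hashCode;                              // HashCode copied from the object header
    int candidate = 0;                         // Candidate: 0 = none to wake, 1 = wake one
}
```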
(From: Implementation Principle and Application of Synchronized in Java)
Synchronized has long been known as a heavyweight, inefficient lock, and that reputation has stuck; but JDK 1.6 made various optimizations to the implementation of synchronized that make it far less heavy. What optimizations does the JVM use?
Lock optimization
JDK 1.6 introduced a number of optimizations to the lock implementation, such as spin locks, adaptive spin locks, lock elimination, lock coarsening, biased locking and lightweight locking, to reduce the overhead of lock operations. A lock mainly exists in four states, in order: unlocked, biased, lightweight and heavyweight; the lock is gradually upgraded as contention intensifies. Note that locks can be upgraded but not downgraded; this strategy improves the efficiency of acquiring and releasing locks.
Spin locks
Blocking and waking a thread requires the CPU to switch from user mode to kernel mode; doing this frequently is a heavy burden on the CPU and puts great pressure on the system's concurrent performance. At the same time, in many applications an object lock is only held for a very short time, and it is not worth frequently blocking and waking threads for such short periods. Hence spin locks. What is a spin lock? The idea is to let the thread wait for a while instead of being suspended immediately, to see whether the thread holding the lock will release it soon. How does it wait? By executing a meaningless loop (spinning). Spin waiting is not a substitute for blocking: leaving aside the requirement for multiple processors (and single-core processors are rare nowadays), although it avoids the overhead of a thread switch, it occupies processor time. If the thread holding the lock releases it quickly, spinning is very efficient; otherwise the spinning thread wastes processor resources doing no meaningful work, which hurts performance. Therefore, there must be a limit on the spin wait time (the number of spins); if the spin exceeds this limit without acquiring the lock, the thread should be suspended. Spin locking was introduced in JDK 1.4.2 and was off by default, but could be enabled with -XX:+UseSpinning; in JDK 1.6 it is on by default. The default number of spins is 10, adjustable with -XX:PreBlockSpin. But tuning the spin count with -XX:PreBlockSpin is awkward: if you set it to 10, yet many threads in the system release their lock just after you give up (when one or two more spins would have acquired it), you lose out. So JDK 1.6 introduced adaptive spin locks to make the virtual machine smarter.
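The spin-then-suspend idea can be sketched in plain Java. This is a simplification under stated assumptions: real JVM spinning happens inside the VM, and this version substitutes Thread.yield() for true suspension and omits a wait queue; the class name and SPIN_LIMIT value are ours.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch (ours) of a bounded spin lock: spin up to SPIN_LIMIT times,
// then back off instead of burning CPU indefinitely.
class BoundedSpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private static final int SPIN_LIMIT = 10; // analogous in spirit to -XX:PreBlockSpin

    void lock() {
        Thread current = Thread.currentThread();
        int spins = 0;
        while (!owner.compareAndSet(null, current)) { // spin: retry the CAS
            if (++spins >= SPIN_LIMIT) {
                Thread.yield(); // stand-in for suspending the thread
                spins = 0;
            }
        }
    }

    void unlock() {
        owner.compareAndSet(Thread.currentThread(), null);
    }
}
```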
Adaptive spin locks
JDK 1.6 introduced a smarter spin lock, the adaptive spin lock. Adaptive means the number of spins is no longer fixed, but is determined by the previous spin time on the same lock and the state of the lock's owner. How does that work? If a thread spins successfully, it will spin more next time: the virtual machine reasons that since the spin succeeded last time, it is likely to succeed again, so it allows the spin wait to last longer. Conversely, if spinning rarely succeeds for a given lock, future attempts to acquire that lock will spin less or skip spinning altogether, to avoid wasting processor resources. With adaptive spin locking, as execution time and performance-monitoring information accumulate, the virtual machine's prediction of a lock's state becomes ever more accurate.
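A toy illustration of the "adaptive" part: grow the spin limit after a successful spin acquisition, shrink it after a failure. The real policy lives inside the JVM and is not exposed; the class name, bounds and scaling factors here are arbitrary choices of ours.

```java
// Toy adaptive-spin policy (ours): success makes us spin longer next time,
// failure makes us spin less.
class AdaptiveSpinPolicy {
    private int spinLimit = 10;

    void recordSpinSuccess() { spinLimit = Math.min(spinLimit * 2, 1000); }
    void recordSpinFailure() { spinLimit = Math.max(spinLimit / 2, 0); }
    int spinLimit() { return spinLimit; }
}
```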
Lock elimination
In order to guarantee data integrity we have to synchronize some operations, but in some cases the JVM can detect that a shared-data race is impossible, and then it removes those synchronization locks. Lock elimination is based on the data from escape analysis. If there is no contention, why lock? Lock elimination saves the time of pointless lock requests. Whether a variable escapes has to be determined by the virtual machine through data-flow analysis, but isn't that obvious to us programmers? Would we ever put synchronization around a block of code we know has no data race? Yet programs don't always behave the way we think: although we don't lock explicitly, many built-in JDK APIs such as StringBuffer, Vector and HashTable perform implicit locking, for example StringBuffer's append() and Vector's add():
```java
public void vectorTest() {
    Vector<String> vector = new Vector<String>();
    for (int i = 0; i < 10; i++) {
        vector.add(i + "");
    }
    System.out.println(vector);
}
```
When running this code, the JVM can clearly detect that vector never escapes the vectorTest() method, so it can safely eliminate the locking inside Vector.
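The same pattern with StringBuffer makes an elimination candidate easy to see. A minimal sketch, with names of ours: sb is thread-confined and never escapes concat(), so the JVM may remove the implicit locking inside StringBuffer.append() based on escape analysis.

```java
// Lock-elimination candidate (sketch): the StringBuffer never escapes.
public class LockElisionDemo {
    public static String concat(String a, String b) {
        StringBuffer sb = new StringBuffer(); // thread-confined, never escapes
        sb.append(a);                         // implicit lock, eligible for elimination
        sb.append(b);
        return sb.toString();                 // only the resulting String escapes
    }
}
```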
Lock coarsening
We know that when using a synchronized lock we should keep the scope of the synchronized block as small as possible: synchronize only over the actual scope of the shared data. The goal is to minimize the number of operations performed while holding the lock, so that if there is contention, a thread waiting for the lock can acquire it as quickly as possible. In most cases this view is correct, and I have always held it. However, a series of consecutive lock and unlock operations on the same object may cause unnecessary performance loss, so the concept of lock coarsening was introduced. Lock coarsening is easy to understand: it merges multiple consecutive lock/unlock operations into a single lock with a wider scope. In the example above, if the JVM detects that the vector is repeatedly locked and unlocked inside the loop, it will merge those operations into one of larger scope, hoisting the lock and unlock out of the for loop.
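The transformation can be shown side by side. A hedged sketch with names of ours: appendFine locks and unlocks sb on every append() call; the JIT may rewrite it into the shape of appendCoarse, one lock covering the whole loop.

```java
// Lock coarsening, conceptually (sketch): per-iteration locking vs. one
// coarsened lock around the loop. Both produce the same result.
public class CoarseningDemo {
    public static String appendFine(StringBuffer sb) {
        for (int i = 0; i < 5; i++) {
            sb.append(i); // each append() locks and unlocks sb
        }
        return sb.toString();
    }

    public static String appendCoarse(StringBuffer sb) {
        synchronized (sb) { // coarsened: a single lock spanning the loop
            for (int i = 0; i < 5; i++) {
                sb.append(i);
            }
            return sb.toString();
        }
    }
}
```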
Lightweight lock
The main purpose of introducing lightweight locks is to reduce the performance cost of traditional heavyweight locks (which use an operating system mutex) when there is no multi-thread contention. A lightweight lock is attempted when biased locking is disabled, or when a biased lock has been upgraded because multiple threads competed for it. The procedure for acquiring the lock is as follows:
- Check whether the current object is in the lock-free state (biased bit 0, lock flag 01). If so, the JVM first creates a space called a Lock Record in the current thread's stack frame to store a copy of the lock object's current Mark Word (officially, the copy carries a Displaced prefix, i.e. the Displaced Mark Word); otherwise, go to step (3).
- The JVM uses a CAS operation to try to update the object's Mark Word to point to the Lock Record. If this succeeds, the thread has won the lock; the lock flag is changed to 00 (meaning the object is in the lightweight-lock state) and the synchronized operation is performed. If it fails, go to step (3).
- Check whether the object's Mark Word points into the current thread's stack frame. If it does, the current thread already holds the lock on this object, and the synchronized block is executed directly. Otherwise, the lock has been taken by another thread; the lightweight lock must be inflated to a heavyweight lock, the lock flag is changed to 10, and waiting threads enter the blocked state.
Releasing the lock: a lightweight lock is also released through CAS operations, as follows:
- Retrieve the Displaced Mark Word stored in the lightweight lock's Lock Record;
- Use a CAS operation to write the retrieved data back into the current object's Mark Word. If it succeeds, the lock is released; otherwise, go to step (3).
- If the CAS replacement fails, another thread is trying to acquire the lock; in that case the suspended threads must be woken up when the lock is released.
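The acquire/release protocol above can be modeled in miniature. A heavily simplified sketch, with all names ours: an AtomicReference plays the role of the Mark Word (null means unlocked, otherwise it points to the owning thread's Lock Record), inflation is reduced to a boolean flag, and nested Lock Records for reentrancy are omitted.

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified model (ours) of the lightweight-lock CAS protocol.
class LightweightLockModel {
    static class LockRecord {
        final Thread owner = Thread.currentThread();
        Object displacedMarkWord; // copy of the object's unlocked Mark Word
    }

    private final AtomicReference<LockRecord> markWord = new AtomicReference<>();
    volatile boolean inflated = false; // would become a heavyweight monitor

    boolean tryLock() {
        LockRecord record = new LockRecord();        // step 1: Lock Record in the stack frame
        record.displacedMarkWord = "unlocked-mark";  // the Displaced Mark Word copy
        if (markWord.compareAndSet(null, record)) {  // step 2: CAS the Mark Word
            return true;
        }
        LockRecord current = markWord.get();         // step 3: do we already own it?
        if (current != null && current.owner == Thread.currentThread()) {
            return true;                             // reentrant: run the block directly
        }
        inflated = true;                             // contended: inflate to heavyweight
        return false;
    }

    void unlock() {
        LockRecord current = markWord.get();
        if (current != null && current.owner == Thread.currentThread()) {
            // CAS the Displaced Mark Word back; a failure here would mean contention.
            markWord.compareAndSet(current, null);
        }
    }
}
```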
Lightweight locks improve performance on the assumption that "for most locks, there is no contention during their entire lifetime". If that assumption is broken, then on top of the cost of mutual exclusion there are the additional CAS operations; so under multi-threaded contention, lightweight locks are actually slower than heavyweight locks.
The following figure shows the process of acquiring and releasing a lightweight lock:
Biased locking
The main purpose of introducing biased locking is to minimize the unnecessary lightweight-lock execution path when there is no multi-thread contention, because, as mentioned above, locking and unlocking a lightweight lock depends on several CAS atomic instructions. How does biased locking reduce unnecessary CAS operations? Look again at the structure of the Mark Word: we only need to check whether the lock is biased, whether the lock flag matches, and whether the ThreadID is the current thread's. The procedure for acquiring the lock is as follows:
- Check whether the Mark Word is in the biased state, i.e. the biased flag is 1 and the lock flag is 01;
- If so, check whether the thread ID is the current thread's ID. If it is, go to step (5); otherwise, go to step (3);
- If the thread ID is not the current thread's ID, compete for the lock with a CAS operation. If the competition succeeds, replace the thread ID in the Mark Word with the current thread's ID; otherwise, go to step (4);
- If the CAS competition fails, there is multi-thread contention. When the global safe point is reached, the thread holding the biased lock is suspended, the biased lock is upgraded to a lightweight lock, and the thread blocked at the safe point then continues to execute the synchronized block;
- Execute the synchronized code block.
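The fast path in these steps can be modeled in a few lines. A deliberately simplified sketch, all names ours: the AtomicLong stands in for the thread-ID field of the Mark Word, 0 means "not yet biased", and safepoint suspension/revocation is reduced to returning a label.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified model (ours) of the biased-lock check: if the lock is already
// biased toward us, entry costs only a comparison, with no CAS at all.
class BiasedLockModel {
    private final AtomicLong biasedThreadId = new AtomicLong(0); // 0 = not yet biased

    // Returns which path a lock attempt by the current thread takes.
    String enter() {
        long me = Thread.currentThread().getId();
        long owner = biasedThreadId.get();
        if (owner == me) {
            return "biased-fast-path";           // same thread: no CAS needed
        }
        if (biasedThreadId.compareAndSet(owner, me)) {
            return "biased";                     // CAS won the bias for this thread
        }
        return "upgrade-to-lightweight";         // CAS lost: contention, revoke the bias
    }
}
```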
Biased locks are released only under contention: a thread never actively releases its biased lock; it waits for another thread to compete for it. Revoking a biased lock requires waiting for the global safe point (a point in time at which no bytecode is executing). The steps are as follows:
- Suspend the thread holding the biased lock and check whether the lock object is still locked;
- Revoke the biased lock and restore the object to the lock-free state (01) or the lightweight-lock state.
The following figure shows the process of acquiring and releasing a biased lock:
Heavyweight lock
A heavyweight lock is implemented through the object's internal monitor, which in essence relies on the underlying operating system's Mutex Lock. Operating-system thread switching requires transitions between user mode and kernel mode, and the cost of those transitions is very high.
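The suspension is visible from Java: a thread contending for a held monitor is parked by the OS and reported as BLOCKED rather than spinning. A small demonstration, with names ours; the bounded polling loop is just to give the second thread time to reach the monitor.

```java
// Demonstration: contending for a held monitor suspends the thread (BLOCKED).
public class HeavyweightDemo {
    public static Thread.State contendedState() throws InterruptedException {
        final Object lock = new Object();
        synchronized (lock) {
            Thread t = new Thread(() -> {
                synchronized (lock) {
                    // never entered while the caller holds the monitor
                }
            });
            t.start();
            // Wait (bounded) until the thread has blocked on the monitor.
            for (int i = 0; i < 100 && t.getState() != Thread.State.BLOCKED; i++) {
                Thread.sleep(10);
            }
            return t.getState(); // expected: BLOCKED
        }
    }
}
```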
References
- Understanding the Java Virtual Machine
- Fang Tengfei: The Art of Java Concurrent Programming
- Implementation Principle and Application of Synchronized in Java