Whether an object needs to be thread-safe depends on whether it is accessed by multiple threads. To make an object thread-safe, synchronization mechanisms are needed to coordinate access to its mutable state. (Java provides synchronized, volatile variables, explicit locks, and atomic variables.)

1. Understanding synchronized

1.1 synchronized characteristics

At the lowest level, synchronized is implemented using the operating system's mutex lock.

  • Memory visibility: when a thread acquires a lock, the cached values of shared variables in its working memory are invalidated, so before the execution engine can use a variable it must load or assign its value again from main memory. Before the lock is released, modified variables must be synchronized back to main memory (store and write operations).
  • Operation atomicity: two synchronized blocks holding the same lock can only be entered serially (a minimal sketch of both properties follows this list).
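A minimal sketch of both properties (class and method names invented for illustration). Without synchronized, count++ is a non-atomic read-modify-write, and one thread's update may not be visible to another:

```java
public class SyncCounter {
    private int count = 0; // shared mutable state

    // synchronized gives atomicity (the read-modify-write of count++
    // cannot interleave with another thread's) and visibility (the
    // updated value is flushed to main memory on release and re-read
    // from main memory on acquire).
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }
}
```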

1.2 Memory semantics of locks

  • When a thread releases the lock, the JMM flushes the shared variables in the thread's local memory to main memory
  • When a thread acquires the lock, the JMM invalidates the thread's local memory, so the critical-section code protected by the monitor must read shared variables from main memory

1.3 Use of synchronized locks

```java
// synchronized instance method: the lock is the current instance (this)
public synchronized void method() {}

// synchronized static method: the lock is the Class object
public static synchronized void staticMethod() {}

// synchronized block on a Class object
synchronized (Lock.class) {}

// synchronized block on an explicit monitor object
static Object monitor = new Object();
synchronized (monitor) {}
```

2. Principle of synchronized lock

2.1 Synchronized low-level implementation

The underlying implementation of synchronized differs between methods and code blocks.

```java
public class DemoSynchronized01 {
    private static final Object lock = new Object();

    public static void main(String[] args) {
        synchronized (lock) {
            System.out.println("hello world");
        }
    }

    public synchronized void test() {
        System.out.println("test");
    }
}
```

Compile the code and inspect its bytecode with javap; the core output looks like this:

```
mac@wxw synchronize % javap -c DemoSynchronized.class
Compiled from "DemoSynchronized.java"
public class com.wxw.juc.synchronize.DemoSynchronized {
  public static int race;

  public com.wxw.juc.synchronize.DemoSynchronized();
    Code:
       0: aload_0
       1: invokespecial #1    // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]) throws java.lang.InterruptedException;
    Code:
       0: iconst_0
       1: istore_1
       2: iload_1
       3: iconst_2
       4: if_icmpge     28
       7: new           #2    // class java/lang/Thread
      10: dup
      11: invokedynamic #3, 0 // InvokeDynamic #0:run:()Ljava/lang/Runnable;
      16: invokespecial #4    // Method java/lang/Thread."<init>":(Ljava/lang/Runnable;)V
      19: invokevirtual #5    // Method java/lang/Thread.start:()V
      22: iinc          1, 1
      25: goto          2
      28: getstatic     #6    // Field countDownLatch:Ljava/util/concurrent/CountDownLatch;
      31: invokevirtual #7    // Method java/util/concurrent/CountDownLatch.await:()V
      34: getstatic     #8    // Field java/lang/System.out:Ljava/io/PrintStream;
      37: getstatic     #9    // Field race:I
      40: invokevirtual #10   // Method java/io/PrintStream.println:(I)V
      43: return

  static {};
    Code:
       0: iconst_0
       1: putstatic     #9    // Field race:I
       4: new           #13   // class java/util/concurrent/CountDownLatch
       7: dup
       8: iconst_2
       9: invokespecial #14   // Method java/util/concurrent/CountDownLatch."<init>":(I)V
      12: putstatic     #6    // Field countDownLatch:Ljava/util/concurrent/CountDownLatch;
      15: return
}
```

As the decompiled output shows, the JVM marks synchronized methods with the ACC_SYNCHRONIZED flag, while for synchronized code blocks it uses the monitorenter and monitorexit instructions.

synchronized-modified code blocks

A synchronized block is implemented with the monitorenter and monitorexit instructions: monitorenter can be understood as acquiring the lock, and monitorexit as releasing it. Each object maintains a counter recording how many times it has been locked. The counter is 0 for an unlocked object, rises to 1 when a thread acquires the lock (monitorenter), and is incremented again each time the same thread reacquires the lock. Each monitorexit by the owning thread decrements the counter; when it reaches zero, the lock is released and other threads can acquire it.
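For illustration, a sketch of what javap -c typically shows for a method whose body is synchronized (lock) { ... } (offsets and constant-pool indices are illustrative, not taken from the class above; note the second monitorexit on the exception path):

```
 0: getstatic     #2        // Field lock:Ljava/lang/Object;
 3: dup
 4: astore_1
 5: monitorenter            // acquire the monitor (counter 0 -> 1, or +1 if reentrant)
 6: ...                     // body of the synchronized block
 9: aload_1
10: monitorexit             // release on the normal path (counter -1)
11: goto          19
14: astore_2                // implicit exception handler
15: aload_1
16: monitorexit             // release on the exceptional path, then rethrow
17: aload_2
18: athrow
19: return
```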

synchronized-modified methods

Synchronization at the method level is implicit. A synchronized method carries the ACC_SYNCHRONIZED flag in its method access flags. When a thread invokes the method, it checks for ACC_SYNCHRONIZED; if the flag is set, it acquires the monitor lock, executes the method, and then releases the monitor lock. If another thread requests the method at that point, it blocks because it cannot acquire the monitor lock. Note that if an exception occurs during execution and is not handled inside the method, the monitor lock is automatically released before the exception propagates out of the method.
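The flag itself is visible with javap -v; for the test() method above, the output contains something like this (exact formatting varies by JDK version):

```
public synchronized void test();
    descriptor: ()V
    flags: (0x0021) ACC_PUBLIC, ACC_SYNCHRONIZED
```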

ACC_SYNCHRONIZED, monitorenter, and monitorexit are all implemented on top of monitors. In the HotSpot JVM, the monitor is implemented in C++ by the ObjectMonitor class.

ObjectMonitor provides several methods, such as enter, exit, wait, notify, and notifyAll. When synchronized acquires the lock it calls ObjectMonitor's enter method, and when it releases the lock it calls exit. In fact, only prior to JDK 1.6 did synchronized directly call ObjectMonitor's enter and exit; this is what is known as the heavyweight lock. Why is operating locks this way considered heavyweight?

Java threads are mapped to OS native threads, so blocking or waking a thread requires the operating system's help. That requires a transition from user mode to kernel mode, and these mode transitions consume a lot of processor time. For simple synchronized blocks (such as get or set methods modified by synchronized), the mode transitions can take longer than the user code itself, which is why synchronized is a heavyweight operation in the Java language.

Therefore, JDK 1.6 introduced many lock optimizations: lightweight locks, biased locks, lock elimination, adaptive spinning, and lock coarsening (spin locks existed since JDK 1.4.2 but were off by default; they are on by default from JDK 1.6). All of these aim to share data between threads more efficiently and to reduce the cost of contention.

2.2 Lock elimination

Lock elimination means that when the virtual machine's just-in-time compiler runs, it removes locks that the code requests but that cannot possibly contend for shared data. The primary basis for lock elimination is escape analysis: if it is determined that no data on the heap in a piece of code can escape and be accessed by other threads, the data can be treated as stack data, i.e. thread-private, and synchronization locking becomes unnecessary.

At JIT compilation time, the run context is scanned to remove locks that cannot possibly be contended.

The reader may wonder: variable escape is determined by the virtual machine using data-flow analysis, but shouldn't the programmer know better than to request synchronization when there is no data contention? The answer is that much synchronization is not written by programmers themselves; synchronized code is more common in Java programs than most readers might think. Take a look at the example below, which is very simple: it merely returns the concatenation of three strings, with no synchronization in either its source text or its semantics.
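A minimal version of that example (the classic concatString() method the following paragraph refers to; reconstructed here for illustration):

```java
public String concatString(String s1, String s2, String s3) {
    return s1 + s2 + s3;
}
```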

We also know that String is an immutable class, so string concatenation always produces new String objects; the javac compiler therefore optimizes string concatenation automatically. Prior to JDK 1.5 it is translated into successive append() calls on a StringBuffer object, and in JDK 1.5 and later into successive append() calls on a StringBuilder object; the pre-1.5 form might look like the sketch that follows this paragraph. Now, do you still think this code involves no synchronization? Each StringBuffer.append() method contains a synchronized block, and the lock is the sb object. The virtual machine observes the variable sb and quickly discovers that its dynamic scope is confined to the concatString() method: no reference to sb ever "escapes" outside concatString(), and no other thread can access it. The lock can therefore be safely eliminated, and after just-in-time compilation the code executes ignoring all the synchronization.
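A sketch of what the pre-JDK 1.5 transformation might produce:

```java
public String concatString(String s1, String s2, String s3) {
    StringBuffer sb = new StringBuffer();
    sb.append(s1); // each append() is synchronized, and the lock is sb
    sb.append(s2);
    sb.append(s3);
    return sb.toString();
}
```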

2.3 Lock coarsening

In some cases, the JVM expands the scope of a lock to avoid repeatedly locking and unlocking.

In principle, when writing code, it is always recommended to keep the scope of the synchronized block as small as possible and to synchronize only in the actual scope of the shared data. This is to keep the number of operations that need to be synchronized as small as possible, so that if there is a lock contention, the thread waiting for the lock can acquire the lock as quickly as possible.

Most of the time the above principle holds. However, if a series of consecutive operations repeatedly locks and unlocks the same object, even when the locking occurs inside a loop body, the frequent mutex operations cause unnecessary performance loss even without any thread contention.

The chain of append() calls in the code above is just such a case. If the virtual machine detects a string of fragmented operations all locking the same object, it extends (coarsens) the scope of lock synchronization to cover the entire operation sequence. For the append() example, the lock is extended to surround all the append() calls, so it needs to be acquired only once; the sketch below shows the same idea applied to a loop.
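A minimal sketch of coarsening applied to a loop (class and field names invented for illustration):

```java
class CoarseningDemo {
    private static final Object lock = new Object();
    private static int counter = 0;

    static void beforeCoarsening() {
        // The monitor is acquired and released on every iteration.
        for (int i = 0; i < 100; i++) {
            synchronized (lock) {
                counter++;
            }
        }
    }

    static void afterCoarsening() {
        // What the JIT may effectively execute after coarsening:
        // a single acquisition around the whole loop.
        synchronized (lock) {
            for (int i = 0; i < 100; i++) {
                counter++;
            }
        }
    }
}
```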

2.4 ObjectMonitor

ObjectMonitor, the JVM's underlying model for synchronized, uses three linked lists to store blocked or waiting threads: _cxq (contention queue), _EntryList (entry list), and _WaitSet (wait set).

When a thread fails to acquire the lock and blocks, it is first added to the _cxq list; at certain points, nodes of the _cxq list are migrated to the _EntryList.

The thread at the head of the _EntryList, called the presumed successor, is woken when the thread holding the lock releases it, and then tries to grab the lock.

When we call wait(), the thread is placed in the _WaitSet. It is not moved back into _cxq or _EntryList until notify()/notifyAll() is called; by default it is placed at the head of _cxq.
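A minimal wait/notify sketch mapped onto the lists above (names invented for illustration):

```java
class WaitNotifyDemo {
    private final Object monitor = new Object();
    private boolean ready = false;

    void awaitReady() throws InterruptedException {
        synchronized (monitor) {      // contend for the lock via _cxq / _EntryList
            while (!ready) {
                monitor.wait();       // park in the monitor's _WaitSet
            }
        }
    }

    void signalReady() {
        synchronized (monitor) {
            ready = true;
            monitor.notify();         // move one waiter back (to the _cxq head by default)
        }
    }
}
```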

The overall flow through ObjectMonitor: contending threads enqueue on _cxq and migrate to _EntryList; the presumed successor is woken when the lock is released; and wait() parks threads on _WaitSet until they are notified.

3. The lock upgrade process

3.1 Biased locking

Biased locking reduces the cost of repeated lock acquisition by the same thread.

In most cases, locks are not contended by multiple threads and are always acquired multiple times by the same thread.

Once a thread obtains the lock, the lock enters biased mode and the Mark Word changes to the biased-lock layout. When that thread requests the lock again, it needs no synchronization operation at all: acquiring the lock only requires checking that the Mark Word's lock flag bits indicate a biased lock and that the thread ID in the Mark Word equals the current thread's ID. This saves a great number of lock-acquisition operations.

Not suitable for multi-threaded situations where lock competition is fierce

Biased locking was added in Java 6 as an optimization of locking operations. Research showed that in most cases a lock not only sees no multi-threaded contention but is also always acquired by the same thread multiple times, so biased locking was introduced to reduce the cost of that thread reacquiring the lock (which otherwise involves CAS operations, which are time-consuming). The core idea: once a thread acquires the lock, the lock enters biased mode and the Mark Word changes to the biased-lock layout; when the thread requests the lock again, it can acquire it without any synchronization operation, saving a large number of lock-acquisition operations and thereby improving performance. In the absence of lock contention, biased locking therefore optimizes well, since the same thread is quite likely to request the same lock many times in a row. Under heavy contention, however, biased locking fails, because the thread requesting the lock is likely to differ on each acquisition; in that situation biased locking should not be used, or it will do more harm than good. Note that when a biased lock fails, it does not immediately become a heavyweight lock, but is first upgraded to a lightweight lock.

The purpose of biased locking is to eliminate synchronization primitives entirely in the uncontended case and further improve performance. If a lightweight lock uses CAS operations to eliminate the synchronization mutex without contention, a biased lock eliminates the entire synchronization without contention, including even the CAS operations.

Assuming biased locking is enabled on the current VM (-XX:+UseBiasedLocking, the default since JDK 1.6), when a lock object is acquired by a thread for the first time, the VM sets the flag bits in the object header to "01" (bias mode) and uses a CAS operation to record the acquiring thread's ID in the object's Mark Word. If the CAS succeeds, then every time the thread holding the biased lock enters a synchronized block on that lock, the VM performs no synchronization operations at all (no Locking, Unlocking, or Mark Word updates).

The bias mode ends when another thread attempts to acquire the lock. Depending on whether the lock object is currently locked, the bias is revoked and the object reverts to the unlocked (flag bits "01") or lightweight-locked (flag bits "00") state; subsequent synchronization proceeds as described for the lightweight lock. The state transitions between biased and lightweight locks, and the corresponding Mark Word layouts, are shown in the figure.

Biased locking improves the performance of programs that are synchronized but uncontended. It is a trade-off optimization, meaning it is not always beneficial: if most locks in a program are always accessed by multiple different threads, the bias mode is redundant. After analyzing the specific case, disabling biased locking with -XX:-UseBiasedLocking can sometimes improve performance.

Locking process:

  • Find an idle Lock Record in the current thread's stack frame and point its obj property at the current lock object.
  • When acquiring a biased lock, various checks are made first (as shown in the locking flowchart). In the end, only two scenarios may attempt to acquire the lock: anonymous bias and batch rebias.
  • Use CAS to try to fill the current thread's ID into the lock object's Mark Word; if the modification succeeds, the lock is acquired.
  • If neither of the two scenarios above applies, or the CAS modification fails, the biased lock is revoked and upgraded to a lightweight lock.
  • If the thread succeeds in acquiring the biased lock, then each time it enters the synchronized block it simply checks whether the thread ID in the lock object's Mark Word is its own; if so, it enters directly, with almost no extra overhead.

Unlock process:

  • The Lock Record's obj property is set to null. The important point is that the thread ID in the lock object's Mark Word is not reset to 0.
  • During biased locking, the state changes of the Mark Word are shown in the figure below.

3.2 Lightweight lock process

Lightweight locks are upgraded from biased locks: biased locking works while a single thread enters the synchronized block, and when a second thread joins the lock contention, the biased lock is upgraded to a lightweight lock.

  • Application scenario: Threads alternately execute synchronized blocks

If multiple threads contend for the same lock at the same time, the lightweight lock inflates into a heavyweight lock.

If biased locking fails, the virtual machine does not immediately upgrade to a heavyweight lock; it first tries an optimization called the lightweight lock (added in JDK 1.6), and the Mark Word structure changes to the lightweight-lock layout. Lightweight locks improve performance on the empirical basis that "for the vast majority of locks, there is no contention during the entire synchronization cycle." It is important to understand that lightweight locks suit scenarios where threads execute synchronized blocks alternately; if the same lock is contended at the same time, the lightweight lock inflates into a heavyweight lock.

When execution enters the synchronized block, if the synchronization object is not locked (lock flag bits in the "01" state), the virtual machine first creates a space called a Lock Record in the current thread's stack frame, used to store a copy of the lock object's current Mark Word (officially called the Displaced Mark Word).

The virtual machine then uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record. If the update succeeds, the thread owns the lock on the object, and the Mark Word's lock flag bits (its last 2 bits) change to "00", indicating that the object is in the lightweight-locked state. At this point the thread stack and object header are as shown in the figure.

Unlock process:

  • Set the Lock Record's obj property to null.
  • Use CAS to restore the Mark Word copy temporarily stored in the displaced_header property back into the lock object's Mark Word.

3.3 Heavyweight lock process

If the CAS update fails (lightweight locking), the virtual machine first checks whether the object's Mark Word points to the current thread's stack frame. If it does, the current thread already owns the lock on the object; if not, the lock object has been preempted by another thread. Once two or more threads contend for the same lock, the lightweight lock is no longer valid and must inflate into a heavyweight lock: the lock flag bits change to "10", the Mark Word stores a pointer to the heavyweight lock (mutex), and the threads waiting for the lock block.

Unlocking also uses CAS: if the object's Mark Word still points to the thread's Lock Record, a CAS operation swaps the object's current Mark Word back with the Displaced Mark Word copied into the thread. If the swap succeeds, the whole synchronization process is complete; if it fails, another thread has attempted to acquire the lock, and the suspended threads must be woken as the lock is released.

The reason lightweight locks improve synchronization performance is the empirical observation that "for the vast majority of locks, there is no contention during the entire synchronization cycle." Without contention, lightweight locks use CAS to avoid the mutex overhead; with contention, the CAS operations occur in addition to the mutex overhead, so lightweight locks are actually slower than traditional heavyweight locks under contention.

Locking process (when a lightweight lock is contended, it inflates into a heavyweight lock):

  • Allocate an ObjectMonitor and populate its properties.
  • Change the lock object's Mark Word to the ObjectMonitor's address plus the heavyweight-lock flag bits ("10").
  • Attempt to acquire the lock; if that fails, try spinning.
  • If repeated attempts fail, wrap the thread in an ObjectWaiter, insert it into the _cxq list, and block the current thread.
  • When the lock is released, nodes in the list are woken; a woken node retries the lock acquisition and, on success, removes itself from the _cxq (or _EntryList). The relationship between the thread stack, the lock object, and the ObjectMonitor at this point is shown in the figure below.

Unlock process:

  • Decrement the reentrancy counter, the _recursions property in ObjectMonitor.
  • Release the lock first by setting its owner property to null; at this point other threads (for example, spinning threads) can acquire the lock.
  • Wake the next thread node from the _EntryList or _cxq list.

3.4 Summary

  • Only one thread enters the critical section: biased lock
  • Multiple threads alternately enter the critical section: lightweight lock
  • Multiple threads enter the critical section at the same time: heavyweight lock

The upgrade process

Biased locking: the JVM assumes that only one thread will execute the synchronized code (no contention), so the Mark Word records the thread ID directly. When a thread executes the code, it checks whether the thread ID in the Mark Word equals its own; if so, the current thread holds the lock and executes the synchronized code. If the IDs differ, CAS is used to try to change the thread ID in the Mark Word; if the CAS succeeds, the lock is still acquired and the synchronized code executes.

If the CAS fails, there is contention, so the biased lock is revoked and upgraded to a lightweight lock. In the lightweight-lock state, the current thread creates a Lock Record in its stack frame; the Lock Record copies the Mark Word, and its obj pointer points to the lock object. When the thread executes the synchronized code, it uses CAS to try to point the Mark Word at the Lock Record. If the CAS modification succeeds, the lightweight lock is acquired.

If that CAS fails, adaptive spinning (retrying) follows. If the CAS still fails after a certain number of spins, the lock is upgraded to a heavyweight lock.
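The Mark Word changes during this upgrade can be observed with the OpenJDK JOL tool. A sketch, assuming jol-core is on the classpath and a JDK where biased locking is available (e.g. JDK 8, or later versions with -XX:+UseBiasedLocking):

```java
import org.openjdk.jol.info.ClassLayout;

public class LockUpgradeDemo {
    static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread.sleep(5000); // wait out the JVM's biased-locking startup delay

        synchronized (lock) { // a single thread: expect a biased lock
            System.out.println(ClassLayout.parseInstance(lock).toPrintable());
        }

        Thread t = new Thread(() -> {
            synchronized (lock) { // a second thread: bias revoked, lightweight lock
                System.out.println(ClassLayout.parseInstance(lock).toPrintable());
            }
        });
        t.start();
        t.join();
    }
}
```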

4. Summary

4.1 Differences between synchronized and ReentrantLock

  • Underlying implementation: synchronized is a Java keyword and a JVM-level lock; ReentrantLock is a lock implemented at the JDK level.

  • Manual release: synchronized needs no explicit acquire or release; when an exception occurs, the lock is released automatically, preventing deadlock. With ReentrantLock, if an exception occurs and unlock() is never called, deadlock may follow, so ReentrantLock must be released in a finally block.

  • Fairness: synchronized is an unfair lock; ReentrantLock is unfair by default, but a fair lock can be chosen via a constructor parameter.

  • Interruptible: synchronized is not interruptible; ReentrantLock can be interrupted.

  • Flexibility: with synchronized, a waiting thread must wait until it acquires the lock. ReentrantLock is more flexible: it can return immediately on failure (tryLock), respond to interruption, time out, and so on.

  • Performance: with the continuous optimization of synchronized in recent years, there is no longer a significant performance difference between it and ReentrantLock, so performance should not be the main reason to choose between them. The official recommendation is to use synchronized whenever it meets the requirements, and to use Lock only when it does not.

  • Synchronized is used in combination with wait() and notify()/notifyAll() to implement the wait/notification mechanism.

  • ReentrantLock achieves the same via the Condition interface and the newCondition() method.

Condition was introduced in JDK 1.5 and is very flexible: it supports selective notification. Multiple Condition instances (object monitors) can be created within one Lock object, and thread objects can register with a specific Condition, so threads can be notified selectively and scheduling is more flexible. With notify()/notifyAll(), the JVM chooses which thread to notify. The synchronized keyword is equivalent to having a single Condition instance in the lock object, with all threads registered on it: notifyAll() notifies all waiting threads, which is inefficient, whereas a Condition instance's signalAll() wakes only the threads waiting on that Condition.

New features added by ReentrantLock: (1) waiting can be interrupted; (2) fair locking; (3) selective notification (a lock can bind multiple conditions). The sketch below illustrates all three.
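A minimal sketch using the classic bounded-buffer pattern (class and field names invented for illustration):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class BoundedBuffer {
    private final ReentrantLock lock = new ReentrantLock(true); // (2) fair lock
    private final Condition notFull  = lock.newCondition();     // (3) one lock,
    private final Condition notEmpty = lock.newCondition();     //     two conditions
    private final Object[] items = new Object[16];
    private int count, putIdx, takeIdx;

    void put(Object x) throws InterruptedException {
        lock.lockInterruptibly();            // (1) waiting can be interrupted
        try {
            while (count == items.length)
                notFull.await();             // wait only on "not full"
            items[putIdx] = x;
            putIdx = (putIdx + 1) % items.length;
            count++;
            notEmpty.signal();               // wake only a waiting taker
        } finally {
            lock.unlock();                   // always release in finally
        }
    }

    Object take() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (count == 0)
                notEmpty.await();
            Object x = items[takeIdx];
            takeIdx = (takeIdx + 1) % items.length;
            count--;
            notFull.signal();
            return x;
        } finally {
            lock.unlock();
        }
    }
}
```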

4.2 Spin locks and adaptive spin locks

(1) Spin lock

When we discussed mutex synchronization earlier, we noted that its biggest performance cost is the implementation of blocking: suspending and resuming threads must be carried out in kernel mode, which puts great pressure on the system's concurrency performance. At the same time, the virtual machine team observed that in many applications shared data is locked for only a short period, too short to be worth suspending and resuming threads for. If the physical machine has more than one processor, allowing two or more threads to execute in parallel, we can tell the next thread requesting the lock to "hold on" without giving up its processor time, and see whether the thread holding the lock releases it soon. To make the thread wait, we simply have it execute a busy loop (spin); this technique is known as the spin lock.

The idea of a spin lock is to have a thread loop while the synchronized code it wants to enter is occupied by another thread, avoiding the context-switch overhead of blocking and waking.
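A hand-rolled sketch of the idea (for illustration only; the JVM's own spinning happens inside the VM, not in user code like this):

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Busy-loop (spin) until the CAS succeeds: the thread never blocks,
        // so there is no user-mode/kernel-mode switch.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // CPU hint, available since JDK 9
        }
    }

    void unlock() {
        locked.set(false);
    }
}
```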

  • advantages
    • In many cases, the locked state of shared data is short-lived and it is not worth switching threads
    • Avoid context-switching overhead by having the thread execute a busy loop to wait for the lock to be released without giving up the CPU
  • disadvantages
    • If the lock is held by other threads for extended periods of time, there is a significant performance cost

Spin locking was introduced in JDK 1.4.2 but was off by default; it could be enabled with -XX:+UseSpinning and is on by default since JDK 1.6. Spin waiting is no substitute for blocking: regardless of the number of processors, spinning avoids the overhead of thread switching but still occupies processor time. If the lock is held only briefly, spin waiting works very well; if it is held for a long time, the spinning thread merely drains processor resources without doing any useful work, wasting performance. Spin waiting must therefore be bounded: if the spin exceeds the limit without acquiring the lock, the thread should be suspended the traditional way. The default spin count is 10 and can be changed with the -XX:PreBlockSpin parameter.

(2) Adaptive spin lock

  • Adaptive means that the time (number) of spins is no longer fixed
  • It is determined by the last spin time on the same lock and the state of the lock owner.

If spin waiting has just successfully acquired the lock on the same lock object, and the thread holding the lock is running, the virtual machine assumes that spinning is likely to succeed again and allows the spin wait to last relatively longer. Conversely, if spinning rarely succeeds for a given lock, future attempts to acquire that lock may skip the spin entirely and block the thread directly, avoiding wasted processor resources.

4.3 Can synchronized locks be downgraded?

Yes. The specific trigger: at a safepoint, when the VM performs its cleanup tasks, it attempts to downgrade locks. Downgrading performs the following operations:

  • Restore the lock object's Mark Word (object header).
  • Reset the ObjectMonitor and put it on the global free list for future reuse.
