preface

The principle behind synchronized is a difficult interview topic, and much of the material on the Internet is scattered and hard to follow. It took me a long time to distill this note from many sources and blog posts. I hope it helps you.



1. Memory layout

To understand how synchronized works, you must first understand the memory layout of Java objects.

So I will introduce the Java object memory layout first.

When you create an instance of a class with the keyword new, the object is stored on the heap and allocated a memory address. Have you ever considered the following questions:

  • How does this instance object exist in memory?
  • How much memory does an object occupy?
  • How are properties in objects allocated in memory?

Ps: There are many ways to create an object. You can think of some!

The layout of a Java object in memory is divided into three areas: the object header, the instance data, and alignment padding. The diagram below:

Instance data

The instance data area stores the field data of the class, including fields inherited from parent classes.

  • If the object has fields, this area holds their data; if the object has no fields, this area is absent.
  • Each field takes a different number of bytes depending on its type: for example, boolean takes 1 byte, int takes 4 bytes, and so on. This portion of memory is aligned on 4-byte boundaries. The storage order of fields is affected by the VM allocation policy parameter (FieldsAllocationStyle) and by the order in which the fields are defined in the Java source code. The default allocation order of the HotSpot VM is longs/doubles, ints, shorts/chars, bytes/booleans, and oops (Ordinary Object Pointers). As you can see from this strategy, fields of the same width are always allocated together. Under that constraint, fields defined in the parent class appear before those of the subclass. If the CompactFields parameter is true (the default), narrower subclass fields may also be inserted into the gaps between parent-class fields.

Padding data

Padding data does not always exist; it is there purely for byte alignment. HotSpot VM's automatic memory management requires that an object's starting address be an integer multiple of 8 bytes; in other words, an object's size must be an integer multiple of 8 bytes. The object header is already a multiple of 8 bytes (1 or 2 words), so when the instance data part is not aligned, padding is used to fill the gap.
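To make the 8-byte rule concrete, here is a minimal sketch (the class and helper names are mine, not HotSpot code) of the rounding the VM performs when sizing an object:

```java
public class ObjectAlignment {
    // Round a raw instance size up to the next multiple of 8 bytes,
    // mirroring HotSpot's default object alignment (-XX:ObjectAlignmentInBytes=8).
    static long align(long rawSize) {
        return (rawSize + 7) & ~7L;
    }

    public static void main(String[] args) {
        // e.g. 8-byte mark word + 4-byte compressed klass pointer + one int field = 16 raw bytes
        System.out.println(align(16)); // 16: already aligned, no padding needed
        // add one boolean field (1 byte): 17 raw bytes -> 7 bytes of padding
        System.out.println(align(17)); // 24
    }
}
```

The bit trick `(n + 7) & ~7` is just "round up to the nearest multiple of 8", which is why a single extra boolean can cost a full 8 bytes of footprint.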

Why align data?

One reason for field memory alignment is to ensure that a field lives entirely within a single CPU cache line. If fields are not aligned, a field might span two cache lines: reading it could require loading two cache lines, and writing it could dirty both at once. Both cases hurt performance. Ultimately, padding exists for efficient addressing by the hardware.

Object header

The object header is the basis of the synchronized lock, so we will focus our analysis on it.

We can find its description in the official HotSpot documentation (below):

object header Common structure at the beginning of every GC-managed heap object. (Every oop points to an object header.) Includes fundamental information about the heap object’s layout, type, GC state, synchronization state, and identity hash code. Consists of two words. In arrays it is immediately followed by a length field. Note that both Java objects and VM-internal objects have a common object header format.

As you can see, it is a common format for both Java objects and VM-internal objects, and it consists of two words (in the machine sense). In addition, if the object is a Java array, the object header must also contain a field recording the array length, because the virtual machine can determine the size of an ordinary Java object from its metadata, but it cannot determine the size of an array that way.

It says the object header consists of two words. What are these two words? Looking further in the HotSpot documentation above, we find two more definitions: mark word and klass pointer:

klass pointer The second word of every object header. Points to another object (a metaobject) which describes the layout and behavior of the original object. For Java objects, the “klass” contains a C++ style “vtable”. mark word The first word of every object header. Usually a set of bitfields including synchronization state and identity hash code. May also be a pointer (with characteristic low bit encoding) to synchronization related information. During GC, may contain GC state bits.

So the two words of the object header are: the first word is the mark word, and the second word is the klass pointer.

Mark Word

The mark word stores the object's own runtime data, such as its hash code, GC generational age, lock state flags, locks held by threads, biased thread ID, bias timestamp (epoch), and so on. The mark word is 32 bits on 32-bit JVMs and 64 bits on 64-bit JVMs. In the OpenJDK source tree, under hotspot/src/share/vm/oops, the mark word corresponds to the C++ file markOop.hpp, whose comments describe its composition. All code in this article is based on JDK 1.8.


Because the object header is storage overhead unrelated to the data the object itself defines, the mark word is designed as a non-fixed data structure: it reuses its bits according to the object's current state so that as much information as possible fits in a small space.

The mark word stores different contents in different lock states. The diagrams below show its layout on 32-bit JVMs and on 64-bit JVMs:

Although the lengths differ between 32-bit and 64-bit JVMs, the basic composition is the same.

  • Lock flag bits (lock): identify the lock state. The value 11 means the object is marked to be collected by GC; in that state only these two flag bits are meaningful.
  • Biased_lock: indicates whether biased locking is in use. It is needed because the unlocked and biased states share the lock flag 01 and could not otherwise be distinguished.
  • Age: the number of GCs the object has survived. When this reaches a threshold, the object is promoted to the old generation.
  • Object hash code (hash): computed lazily by System.identityHashCode() at runtime and cached here. Once the object is locked (biased, lightweight, or heavyweight), these bits are repurposed, so the hash code has to live elsewhere; under heavyweight locking it is moved into the monitor.
  • Biased lock thread ID (JavaThread): in biased mode, when a thread first acquires the object, the object records that thread's ID; subsequent acquisitions by the same thread need no further attempt to acquire the lock.
  • Epoch: a bias timestamp, used during CAS operations on biased locks to record which generation of biasing the object belongs to.
  • Ptr_to_lock_record: in the lightweight-lock state, a pointer to the lock record in the thread's stack. When lock acquisition is uncontended, the JVM uses atomic operations instead of OS mutexes; this technique is called lightweight locking. In that case the JVM uses CAS to install a pointer to the lock record in the object's header.
  • Ptr_to_heavyweight_monitor: in the heavyweight-lock state, a pointer to the object's monitor. If two threads compete for the same object at the same time, the lightweight lock must be inflated to a monitor that manages the waiting threads. In that case the JVM stores a pointer to the monitor in the object's ptr_to_heavyweight_monitor field.
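The lazily computed hash mentioned above is easy to observe. A small sketch (the class name is mine):

```java
public class IdentityHashDemo {
    public static void main(String[] args) {
        Object o = new Object();
        // The identity hash is computed on the first call and then cached
        // (in the mark word, in the unlocked case), so repeated calls
        // must return the same value for the same object.
        int h1 = System.identityHashCode(o);
        int h2 = System.identityHashCode(o);
        System.out.println(h1 == h2); // true
    }
}
```

Unlike o.hashCode(), System.identityHashCode() ignores any hashCode() override, which is why it is the value associated with the object header rather than with the class's logic.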

Klass Pointer

The klass pointer is a pointer from the object to its class metadata; the virtual machine uses this pointer to determine which class the object is an instance of.

Array length (only array objects have it)

If the object is an array, the object header must also contain a field recording the array length, because while the virtual machine can determine the size of an ordinary Java object from its metadata, it cannot determine an array's size from the array metadata alone.

At this point, we know the overall structure layout of objects in heap memory, as shown below:


2. Synchronized low-level implementation

Here we'll focus on the synchronized object lock. The lock flag bits are 10 on both 32-bit and 64-bit machines, and the pointer in the mark word points to the starting address of a monitor object (also known as a monitor lock). Every object has a monitor associated with it, and the association between the object and its monitor can be implemented in various ways: for example, the monitor may be created and destroyed together with the object, or generated on demand when a thread first tries to acquire the object lock. When a monitor is held by a thread, the object is locked. In the Java virtual machine (HotSpot), the monitor is implemented by ObjectMonitor. Its main data structure is as follows (located in objectMonitor.hpp in the HotSpot source code, implemented in C++):

ObjectMonitor() {
    _header       = NULL;
    _count        = 0;
    _waiters      = 0;
    _recursions   = 0;
    _object       = NULL;
    _owner        = NULL;
    _WaitSet      = NULL;   // threads in the wait state are added to _WaitSet
    _WaitSetLock  = 0;
    _Responsible  = NULL;
    _succ         = NULL;
    _cxq          = NULL;
    FreeNext      = NULL;
    _EntryList    = NULL;   // threads blocked waiting for the lock are added to this list
    _SpinClock    = 0;
    OwnerIsThread = 0;
}

Let’s take a look at some key attributes in the source code above:

  • _WaitSet and _EntryList: Used to hold a list of ObjectWaiter objects (ObjectWaiter objects: each thread waiting for a lock is encapsulated as an ObjectWaiter object).

  • _owner: points to the thread holding the ObjectMonitor object.

When multiple threads access the same synchronized code, they first enter the _EntryList collection. When a thread acquires the object's monitor, it enters the _owner area: the monitor's owner variable is set to the current thread, and the monitor's count is incremented. If the thread then calls wait(), it releases the monitor, owner is reset to null, count is decremented, and the thread enters _WaitSet to wait to be woken up. When the current thread finishes, it likewise releases the monitor and resets these variables so that another thread can come in and acquire the monitor (lock). As shown below (image source: Thread Synchronization):

From this point of view, a monitor accompanies every Java object (the object header stores a pointer to it). This is how synchronized acquires its lock, why any object in Java can be used as a lock, and also why the notify/notifyAll/wait methods live on the top-level Object class.
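The _WaitSet/_EntryList dance described above is exactly what wait() and notifyAll() drive. A minimal sketch (names are mine) in which one thread parks in the monitor's wait set until another publishes a value:

```java
public class WaitNotifyDemo {
    private final Object lock = new Object();
    private boolean ready = false;
    private int result;

    public int await() throws InterruptedException {
        synchronized (lock) {       // acquire lock's monitor (become _owner)
            while (!ready) {        // loop guards against spurious wakeups
                lock.wait();        // release the monitor, enter _WaitSet
            }
            return result;
        }
    }

    public void publish(int value) {
        synchronized (lock) {
            result = value;
            ready = true;
            lock.notifyAll();       // move waiters back to contend for the monitor
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WaitNotifyDemo demo = new WaitNotifyDemo();
        new Thread(() -> demo.publish(42)).start();
        System.out.println(demo.await()); // 42
    }
}
```

Note that wait() and notifyAll() must be called while holding the monitor of that same object; otherwise the JVM throws IllegalMonitorStateException, which is another visible consequence of the owner bookkeeping above.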

Below we will further analyze the specific semantic implementation of synchronized at the bytecode level.



3. Synchronized modifies the underlying principle of code blocks

Now let's define a synchronized block and operate on the shared variable i inside it (i++), as follows:

public class TestSafeAddI {
    public int i;

    public void addI() {
        synchronized (this) {
            i++;
        }
    }
}

Using the decompile tool, view the compiled bytecode (complete) :

There are several tools for viewing bytecode files. Here are two. Method 1: the Luyten tool. Run it, select Bytecode in Settings, then import the local .class file. Method 2: if you use the IDEA editor, select the compiled .class file in IDEA and choose View -> Show Bytecode. Ps: I am using the latest version of IDEA 2020.

class com.top.test.mutiTheread.TestSafeAddI
        Minor version: 0
        Major version: 52
        Flags: PUBLIC, SUPER

public int i;
        Flags: PUBLIC

public void <init>();
        Flags: PUBLIC
        Code:
        linenumber      3
        0: aload_0         /* this */
        1: invokespecial   java/lang/Object.<init>:()V
        4: return

public void addI();
        Flags: PUBLIC
        Code:
        linenumber      7
        0: aload_0         /* this */
        1: dup
        2: astore_1
        3: monitorenter
        linenumber      8
        4: aload_0         /* this */
        5: dup
        6: getfield        com/top/test/mutiTheread/TestSafeAddI.i:I
        9: iconst_1
        10: iadd
        11: putfield        com/top/test/mutiTheread/TestSafeAddI.i:I
        linenumber      9
        14: aload_1
        15: monitorexit
        16: goto            24
        19: astore_2
        20: aload_1
        21: monitorexit
        22: aload_2
        23: athrow
        linenumber      10
        24: return
        StackMapTable: 00 02 FF 00 13 00 02 07 00 10 07 00 11 00 01 07 00 12 FA 00 04
        Exceptions:
        Try           Handler
        Start  End    Start  End    Type
        -----  -----  -----  -----  ----
        4      16     19     24     Any
        19     22     19     24     Any

We will focus on the following code in bytecode:

3: monitorenter    // enter the synchronized block
// ... others omitted
15: monitorexit    // exit the synchronized block
16: goto 24
// ... others omitted
21: monitorexit    // exit the synchronized block (exception path)

From the bytecode, we can see that the synchronized block is implemented with the monitorenter and monitorexit instructions: monitorenter marks the start of the synchronized block and monitorexit marks its end.

When executing the monitorenter instruction:

  • The current thread attempts to acquire the monitor of objectref. If the entry count of objectref's monitor is 0, the thread acquires the monitor and sets the count to 1.

  • If the current thread already owns objectref's monitor, it simply re-enters it and increments the count by one. This is the reentrancy of synchronized. (See this article about reentrant locks: Synchronized?)

  • If another thread already owns objectref's monitor, the lock object's count is non-zero, and the current thread blocks until the owning thread releases the monitor.
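The reentrancy in the second bullet is easy to demonstrate. A minimal sketch (class and method names are mine): outer() holds the monitor and calls inner(), which locks the same object again without deadlocking, because the same thread just bumps the entry count.

```java
public class ReentrantDemo {
    public int depth;

    public synchronized void outer() {  // monitorenter: count 0 -> 1
        depth++;
        inner();                        // same thread, same monitor: no blocking
    }

    public synchronized void inner() {  // re-entry: count 1 -> 2, back to 1 on return
        depth++;
    }

    public static void main(String[] args) {
        ReentrantDemo d = new ReentrantDemo();
        d.outer(); // would deadlock on inner() if synchronized were not reentrant
        System.out.println(d.depth); // 2
    }
}
```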

When monitorexit is executed:

  • The Java virtual machine decrements the lock object's count by 1. When the count drops to 0, the lock is released, and other threads then have the opportunity to acquire the monitor.
  • If the counter is not 0, the current thread still holds the object lock.

Note that one monitorenter instruction can correspond to multiple monitorexit instructions. This is because the Java virtual machine must ensure that an acquired lock is released on both normal and exceptional execution paths: the compiler guarantees that however the method completes, every monitorenter executed in the method is matched by a monitorexit. To pair them correctly when the method completes abruptly, the compiler automatically generates an exception handler that claims to handle all exceptions; its sole purpose is to execute monitorexit. That is why the bytecode above contains an extra monitorexit instruction: it releases the monitor when an exception ends the block.



4. Basic principle of synchronized modification method

Synchronized on a method works differently from synchronized on a code block.

Let’s change the above method to synchronized:

public class TestSafeAddI {
    public int i;

    public synchronized void addI() {
        i++;
    }
}

The decompiled bytecode looks like this:

class com.top.test.mutiTheread.TestSafeAddI
        Minor version: 0
        Major version: 52
        Flags: PUBLIC, SUPER

public int i;
        Flags: PUBLIC

public void <init>();
        Flags: PUBLIC
        Code:
        linenumber      3
        0: aload_0         /* this */
        1: invokespecial   java/lang/Object.<init>:()V
        4: return

public synchronized void addI();
        Flags: PUBLIC, SYNCHRONIZED
        Code:
        linenumber      7
        0: aload_0         /* this */
        1: dup
        2: getfield        com/top/test/mutiTheread/TestSafeAddI.i:I
        5: iconst_1
        6: iadd
        7: putfield        com/top/test/mutiTheread/TestSafeAddI.i:I
        linenumber      8
        10: return

monitorenter and monitorexit are absent when synchronized modifies the method. Instead, the bytecode shows that the method's access flags include ACC_SYNCHRONIZED. This flag marks the method as a synchronized method; the JVM checks it to determine whether a method is declared synchronized and, if so, performs the corresponding monitor operations: on entering the method, the Java virtual machine performs a monitorenter operation, and on exiting it, whether by a normal return or by throwing an exception to the caller, it performs a monitorexit operation.

Here the lock objects of the implicit monitorenter and monitorexit operations are determined by the method: for instance methods, the lock object is this; for static methods, the lock object is the Class instance of the declaring class.
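The two implicit lock objects can be verified with Thread.holdsLock(), which reports whether the current thread owns a given object's monitor. A small sketch (class and method names are mine):

```java
public class ImplicitLocks {
    // Instance method: the implicit lock object is `this`.
    public synchronized boolean holdsThis() {
        return Thread.holdsLock(this);
    }

    // Static method: the implicit lock object is the Class instance.
    public static synchronized boolean holdsClass() {
        return Thread.holdsLock(ImplicitLocks.class);
    }

    public static void main(String[] args) {
        ImplicitLocks o = new ImplicitLocks();
        System.out.println(o.holdsThis());              // true
        System.out.println(ImplicitLocks.holdsClass()); // true
        System.out.println(Thread.holdsLock(o));        // false: released on return
    }
}
```

This also explains why a synchronized instance method and a synchronized static method of the same class do not block each other: they lock two different objects.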

It's also important to note that in early Java versions, synchronized was an inefficient heavyweight lock. The monitor relies on the underlying operating system's mutex lock, and switching between threads requires the OS to transition from user mode to kernel mode. That transition is relatively slow and expensive, which is why early synchronized performed poorly.



5. Lock upgrading

Lock upgrading can be understood as the Java virtual machine's optimization of synchronized.

To minimize costly thread blocking and wakeup operations, before a thread blocks (and when a woken thread cannot immediately win the lock), the Java virtual machine lets it spin: the thread keeps running on the processor and polls to see whether the lock has been released. If the lock is released during the spin, the thread acquires it without ever blocking. We call this a spin lock. After Java 6, lightweight locks and biased locks were introduced to further reduce the performance cost of acquiring and releasing locks.

Lock upgrading: there are four lock states (visible in the mark word diagram above): unlocked, biased, lightweight, and heavyweight. As contention increases, a lock can be upgraded from a biased lock to a lightweight lock and then to a heavyweight lock.

Ps: Some people believe Java never downgrades locks. In fact, lock downgrading does occur: when the JVM enters a safepoint, it checks for idle monitors and tries to deflate them.
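These states can be experimented with from the command line. The -XX options below are real HotSpot 8 flags (MyApp is a placeholder for your own main class); the two thresholds reappear in the biased-locking discussion later:

```shell
# Disable biased locking entirely:
java -XX:-UseBiasedLocking MyApp

# Biased locking is on by default but only after a startup delay;
# remove the delay to observe biased locks immediately:
java -XX:BiasedLockingStartupDelay=0 MyApp

# Bulk rebias / bulk revoke thresholds (defaults shown):
java -XX:BiasedLockingBulkRebiasThreshold=20 \
     -XX:BiasedLockingBulkRevokeThreshold=40 MyApp
```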

Heavyweight locks were analyzed in detail above. Next we cover biased locks, lightweight locks, spin locks, and other JVM optimizations.



6. Biased locks

Biased locking, added in Java 6, optimizes locking operations.

Biased locking targets the most optimistic case: in most applications, locks are not only uncontended but are repeatedly acquired by the same thread. Biased locking was therefore introduced to reduce the cost of re-acquisition by that same thread.

The core idea of biased locking: once a thread acquires the lock, the lock enters biased mode and the mark word changes to the biased-lock layout. When the same thread requests the lock again, it can acquire it without any synchronization operation, saving most of the work of the lock request and thus improving application performance.

If the lock object supports biased locking, the Java virtual machine records the address of the current thread in the lock object's mark word via CAS and sets the last three bits of the mark word to 101. (For convenience, refer again to the mark word structure diagram above.)

CAS (compare-and-swap) is an atomic operation that compares the value at a target address with an expected value and, if they are equal, replaces it with a new value.
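The JDK exposes the same primitive through the atomic classes, which makes the semantics easy to see. A small sketch using the real java.util.concurrent API:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(0);
        // Expected value matches the current value: the swap succeeds.
        boolean first = value.compareAndSet(0, 1);
        // Expected value is now stale: the swap fails and value is untouched.
        boolean second = value.compareAndSet(0, 2);
        System.out.println(first);       // true
        System.out.println(second);      // false
        System.out.println(value.get()); // 1
    }
}
```

The failing second call is exactly the situation the JVM hits when another thread got there first, which is what triggers the fallback paths (revocation, inflation) described in the following sections.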

For the rest of the run, whenever a thread requests this lock, the Java virtual machine only needs to check that the last three bits of the mark word are 101, that it contains the current thread's address, and that its epoch value matches the epoch of the lock object's class. If so, the current thread already holds the biased lock and can return immediately.

Understand the epoch value:

Let's start with biased-lock revocation. When the requesting thread does not match the thread address in the lock object's mark word (and the epoch values are equal; if they are not, the current thread can simply rebias the lock to itself), the Java virtual machine must revoke the biased lock. Revocation is cumbersome: it requires waiting for the thread holding the biased lock to reach a safepoint and then replacing the biased lock with a lightweight lock.

If the total number of revocations for one class of lock objects exceeds a threshold (the JVM parameter -XX:BiasedLockingBulkRebiasThreshold, default 20), the Java virtual machine declares the biased locks of that entire class invalid.

The mechanism maintains an epoch value in each class, which you can think of as the generation number of its biased locks. When a biased lock is set, the Java virtual machine copies the class's epoch into the lock object's mark word.

When declaring a class's biased locks invalid, the Java virtual machine increments the class's epoch to indicate that biased locks of the previous generation are invalid. New biased locks must copy the new epoch value.

To ensure that threads currently holding biased locks do not lose them, the Java virtual machine must walk the Java stacks of all threads, find locked instances of that class, and increment the epoch in their mark words by one. This operation requires all threads to be at a safepoint.

If the total number of revocations exceeds another threshold (the JVM parameter -XX:BiasedLockingBulkRevokeThreshold, default 40), the Java virtual machine concludes that the class is no longer suited to biased locking. It then revokes the biased locks of the class's instances, and subsequent locking of those instances goes straight to lightweight locks.



7. Lightweight locks

If a biased lock fails, the lock does not immediately inflate to a heavyweight lock; it is first upgraded to a lightweight lock.

Lightweight locking was introduced in Java 6.

Lightweight locking targets another optimistic case: multiple threads request the same lock at different times, so there is no actual lock contention.

The last two bits of the mark word indicate the object's lock state: 00 represents a lightweight lock, 01 represents unlocked (or biased), and 10 represents a heavyweight lock.

When locking, the Java virtual machine first determines whether the lock is already a heavyweight lock. If not, it reserves space in the current thread's stack frame as the lock record for that lock and copies the lock object's mark word into the lock record. If the same thread acquires the same lock again, the new lock record is instead cleared to zero (discussed below).

The Java virtual machine then attempts to replace the mark word of the lock object using a CAS (compare-and-swap) operation.

Suppose the lock object's mark word is X…XYZ. The Java virtual machine checks whether it is X…X01 (lock flag 01: unlocked or biased). If so, it replaces it with the address of the lock record just allocated. Because of memory alignment, the last two bits of that address are 00 (lock flag 00: lightweight lock). At this point, the thread has successfully acquired the lock and can continue execution.

If it is not X…X01, there are two possibilities. First, the thread is re-acquiring a lock it already holds as a lightweight lock; in that case the Java virtual machine clears the new lock record to zero to mark the repeated acquisition (this is how lightweight locks support reentrancy). Second, another thread holds the lightweight lock; in that case the Java virtual machine inflates the lock to a heavyweight lock and blocks the current thread.

When unlocking, if the current lock record holds zero, this was a reentrant acquisition of the same lock, and the thread simply returns. (You can picture a thread's lock records as a stack: each lock pushes a record and each unlock pops one; the current lock record is the one on top.)

Otherwise, the Java virtual machine attempts a CAS: it compares the lock object's mark word with the address of the current lock record and, if they match, replaces it with the value saved in the lock record, i.e. the object's original mark word. At this point, the thread has successfully released the lock.

If the CAS fails, the lock has been inflated to a heavyweight lock. The Java virtual machine then enters the heavyweight lock release process, waking the threads that were blocked competing for the lock.



8. Spin locks

When lightweight locking fails, the virtual machine applies another optimization, spinning, to avoid actually suspending the thread at the operating-system level.

This is based on the observation that in most cases a lock is held only briefly. Suspending the thread at the OS level immediately may do more harm than good: switching between threads requires the operating system to transition from user mode to kernel mode, and that transition takes relatively long and is relatively expensive.

A spin lock therefore assumes that the current thread will be able to acquire the lock in the very near future, so the virtual machine lets the thread execute a few empty loops trying to fetch the lock (this is the spin), generally not too many, perhaps 50 or 100 iterations. If the lock is acquired during the spin, the thread successfully enters the critical section; if not, the thread is then suspended at the operating-system level.

This is how spinning optimizes locking, and it genuinely improves efficiency. Only when all else fails does the lock finally upgrade to a heavyweight lock.

For example, consider waiting at a traffic light. Blocking a Java thread is like stopping with the engine off, while spinning is like idling. If the red light lasts a long time, switching the engine off saves fuel; if it lasts only a moment, say we only perform an integer addition inside the synchronized block, the lock is bound to be released very soon, and idling is the better choice. The Java virtual machine, however, cannot see how long the red light has left, so it cannot choose between spinning and blocking based on the wait time. Instead it offers adaptive spinning, which dynamically adjusts the spin duration (the number of loop iterations) based on whether the previous spin on the same lock managed to acquire it. In our analogy: if we made it through the last light while idling, we idle a bit longer this time; if we had to switch the engine off last time, we switch it off sooner.

Another side effect of spinning is an unfair locking mechanism: a blocked thread has no way to immediately compete for a just-released lock, whereas a spinning thread is likely to grab it first. (See this article about fair and unfair locks: Fair and Unfair Locks - How ReentrantLock implements Fair and Unfair Locks)



9. Lock elimination

Lock elimination is another, more thorough, virtual machine optimization. During JIT compilation (which you can loosely understand as compiling a piece of code the first time it is about to be executed, also known as just-in-time compilation), the Java virtual machine scans the running context and removes locks that cannot possibly guard contended shared resources. Eliminating such unnecessary locks saves the time of meaningless lock requests.

StringBuffer's append below is a synchronized method, but in the add method the StringBuffer is a local variable that is never used by other threads, so there can be no contention for it and the JVM automatically eliminates its internal lock.

public class StringBufferRemoveSync {

    public void add(String str1, String str2) {
        // StringBuffer is thread-safe, but sb is used only inside this method
        // and can never be referenced by another thread, so it is not a shared
        // resource and the JVM automatically removes its internal lock.
        StringBuffer sb = new StringBuffer();
        sb.append(str1).append(str2);
    }

    public static void main(String[] args) {
        StringBufferRemoveSync rmsync = new StringBufferRemoveSync();
        for (int i = 0; i < 10000000; i++) {
            rmsync.add("abc", "123");
        }
    }
}



conclusion

This note is not exhaustive; for example, it does not cover compressed pointers or field reordering in the memory layout.

If anything is unclear or you have questions, feel free to leave a comment and discuss.