
Preface

  • Concurrent programming begins with a holistic understanding of the underlying workings of an operating system
  • In-depth understanding of the Java Memory model (JMM) and the volatile keyword
  • In-depth understanding of the CPU cache coherence protocol (MESI)

The main causes of thread-safety problems in concurrent programming are: (1) shared data exists, and (2) multiple threads operate on that shared data at the same time. The synchronized keyword guarantees that only one thread can execute a given method or code block at any moment, and it also guarantees that one thread's changes are visible to others (visibility), so it can substitute for volatile.

The significance of designing a synchronizer

In multithreaded programming, it is possible for multiple threads to access the same shared, mutable resource at the same time. This resource is called a critical resource. Such resources could be: objects, variables, files, and so on.

  • Sharing: Resources can be accessed by multiple threads at the same time;
  • Mutable: A resource can be modified during its lifetime.

The problem: because the execution and interleaving of threads is not under our control, a synchronization mechanism is needed to coordinate access to the object's mutable state.

How to solve the problem of thread concurrency safety?

In fact, all concurrent patterns solve thread-safety problems by serializing access to critical resources. That is, only one thread can access a critical resource at a time, also known as synchronous mutex access.

In Java, there are two ways to achieve synchronized mutually exclusive access: synchronized and Lock.

The essence of a synchronizer is locking. The purpose of locking is to serialize access to critical resources, that is, to allow only one thread to access a critical resource at a time (synchronized mutually exclusive access).

Note, however, that when multiple threads execute the same method, the local variables inside the method are not critical resources: each thread keeps them in its own private stack, so nothing is shared and no thread-safety issue arises.
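
As a minimal sketch of the two approaches (a hypothetical counter class; the names are illustrative, not from the original article):

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class CounterDemo {

    private int count;
    private final Lock lock = new ReentrantLock();

    // Approach 1: the built-in lock, via the synchronized keyword
    public synchronized void incrWithSynchronized() {
        count++;
    }

    // Approach 2: an explicit lock, via java.util.concurrent.locks.Lock
    public void incrWithLock() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock(); // always release in finally
        }
    }
}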

Analysis of the synchronized principle

The synchronized built-in lock is an object lock (what is locked is the object itself, not the reference); its granularity is the object. It can be used to achieve mutually exclusive access to critical resources, and it is reentrant.

There are three ways of locking:

  1. Synchronized instance methods, where the lock is the current instance object

A synchronized non-static method locks the instance of the class. For the same instance, only one thread at a time can invoke any of its synchronized non-static methods.

import org.openjdk.jol.info.ClassLayout;

public class Juc_LockOnThisObject {

    private Integer stock = 10;

    public synchronized void decrStock() {
        --stock;
        System.out.println(ClassLayout.parseInstance(this).toPrintable());
    }
}
  2. Synchronized static methods, where the lock is the Class object of the current class

A synchronized static method locks the class itself (its Class object), not an instance. For all the synchronized static methods of the same class, only one thread at a time can call any of them.

public class Juc_LockOnClass {

    private static int stock;

    public static synchronized void decrStock() {
        System.out.println(--stock);
    }
}
  3. Synchronized code blocks, where the lock is the object inside the parentheses

A synchronized block locks the specified object directly. If multiple blocks synchronize on the same object, only one thread at a time can execute any one of those blocks.

public class Juc_LockOnObject {

    public static Object object = new Object();

    private Integer stock = 10;

    public void decrStock() {
        // T1, T2
        synchronized (object) {
            --stock;
            if (stock <= 0) {
                System.out.println("Stock sold out");
                return;
            }
        }
    }
}

Underlying principle of synchronized

Synchronized is based on the JVM's built-in lock and is implemented through an internal object, the Monitor (monitor lock): entering and exiting the Monitor object implements both method and code-block synchronization. The Monitor in turn relies on the underlying operating system's Mutex Lock, so it is a heavyweight lock with relatively low performance. Of course, the JVM's built-in locking has been greatly optimized since JDK 1.6, with techniques such as Lock Coarsening, Lock Elimination, Lightweight Locking, Biased Locking, and Adaptive Spinning reducing the overhead of lock operations; the concurrency performance of the built-in lock is now nearly on par with Lock.

When compiled, the synchronized keyword is translated into the bytecode instructions monitorenter and monitorexit, placed at the start and the end of the synchronized block's logic, respectively.

Example:

public class Juc_LockOnObject {

    public static Object object = new Object();

    private Integer stock = 10;

    public void decrStock() {
        // T1, T2
        synchronized (object) {
            --stock;
            if (stock <= 0) {
                System.out.println("Stock sold out");
                return;
            }
        }
    }
}

The decompilation results are as follows:
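
(The original article showed the decompiled bytecode as an image. The following is a representative, abridged javap -c excerpt for a method of this shape; exact offsets and constant-pool indices will vary.)

public void decrStock();
  Code:
     0: getstatic     #2   // Field object:Ljava/lang/Object;
     3: dup
     4: astore_1
     5: monitorenter       // enter the monitor of the lock object
        ...                // --stock and the stock <= 0 check
        aload_1
        monitorexit        // normal exit: release the lock
        goto  <after the block>
        astore_2           // exception handler entry
        aload_1
        monitorexit        // exceptional exit: release the lock
        aload_2
        athrow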

Monitor (monitor lock)

Each synchronized object has its own Monitor (monitor lock); locking works as follows:

Any object has a Monitor associated with it, and when a Monitor is held, it is locked.

In the JVM, synchronized is implemented by entering and exiting a Monitor object, both for method synchronization and for code-block synchronization. Although the two differ in implementation detail, both can conceptually be realized with paired monitorenter and monitorexit instructions.

  • monitorenter: each object is a monitor lock, and the monitor is locked while it is occupied. A thread executing the monitorenter instruction attempts to acquire ownership of the monitor as follows:
    • If the monitor's entry count is 0, the thread enters the monitor, sets the entry count to 1, and becomes the monitor's owner;
    • If the thread already owns the monitor and is simply re-entering it, the entry count is incremented by 1;
    • If another thread already owns the monitor, the current thread blocks until the entry count drops to 0, then tries again to acquire ownership.
  • monitorexit: the thread executing monitorexit must be the owner of the monitor associated with the objectref. When the instruction executes, the entry count is decremented by 1; if it drops to 0, the thread exits the monitor and is no longer its owner. Other threads blocked on this monitor may then try to acquire ownership.

Note that monitorexit appears twice in the bytecode: the first releases the lock on a normal exit from the synchronized block; the second releases it when the block exits via an exception.

The semantics of synchronized are implemented through a Monitor object. In fact, the wait/notify methods also depend on the Monitor object, which is why they can only be called inside a synchronized block or method; otherwise a java.lang.IllegalMonitorStateException is thrown.
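
A minimal sketch of this rule (the class and field names are illustrative):

public class WaitNotifyDemo {

    private final Object lock = new Object();
    private boolean ready;

    public void await() throws InterruptedException {
        synchronized (lock) {      // the monitor must be held before calling wait()
            while (!ready) {
                lock.wait();       // releases the monitor and parks the thread in the _WaitSet
            }
        }
    }

    public void signal() {
        synchronized (lock) {      // the monitor must be held before calling notify()
            ready = true;
            lock.notify();
        }
    }

    // Calling lock.wait() or lock.notify() outside a synchronized (lock) block
    // throws java.lang.IllegalMonitorStateException.
}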

Example: look at a synchronized method

package com.niuh;

public class SynchronizedMethod {

    public synchronized void method() {
        System.out.println("Hello World!");
    }
}

Decompilation result: as the output shows, method synchronization is not implemented with the monitorenter and monitorexit instructions (although in theory it could be); instead, compared with a normal method, the method's access flags carry an extra ACC_SYNCHRONIZED flag.
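
(Again the original screenshot is not reproduced. A representative, abridged javap -v excerpt for such a method might look like the following; note the ACC_SYNCHRONIZED flag and the absence of monitorenter/monitorexit.)

public synchronized void method();
  descriptor: ()V
  flags: ACC_PUBLIC, ACC_SYNCHRONIZED
  Code:
     0: getstatic     #2   // Field java/lang/System.out:Ljava/io/PrintStream;
     3: ldc           #3   // String Hello World!
     5: invokevirtual #4   // Method java/io/PrintStream.println:(Ljava/lang/String;)V
     8: return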

The JVM implements method synchronization based on this flag: when the method is invoked, the invocation instruction checks whether the method's ACC_SYNCHRONIZED access flag is set. If it is, the executing thread first acquires the monitor, executes the method body only after the acquisition succeeds, and releases the monitor when the method completes. While the method is executing, no other thread can obtain the same monitor object.

The two forms of synchronization are essentially the same; method synchronization is simply done implicitly, without explicit bytecode instructions. Both are ultimately realized by the JVM calling the operating system's mutual-exclusion primitive, mutex. A blocked thread is suspended and waits to be rescheduled, which causes switches between user mode and kernel mode and has a significant impact on performance.

What is Monitor?

Monitor can be understood as a synchronization tool, or described as a synchronization mechanism, and it is usually described as an object. All Java objects are natural Monitors: every Java object has the potential to become one, because in Java's design every object comes into the world carrying an invisible lock, called the internal lock or Monitor lock. When the Mark Word's lock flag bits are 10, its pointer points to the starting address of the Monitor object. In the HotSpot JVM, Monitor is implemented by ObjectMonitor, whose main data structure is as follows (located in the objectMonitor.hpp file of the HotSpot source code, implemented in C++):

ObjectMonitor() {
    _header       = NULL;
    _count        = 0;      // lock counter
    _waiters      = 0,
    _recursions   = 0;      // re-entry (recursion) count
    _object       = NULL;
    _owner        = NULL;   // the thread that currently holds the monitor
    _WaitSet      = NULL;   // threads in the wait state are added to _WaitSet
    _WaitSetLock  = 0;
    _Responsible  = NULL;
    _succ         = NULL;
    _cxq          = NULL;
    FreeNext      = NULL;
    _EntryList    = NULL;   // threads blocked waiting for the lock are added to _EntryList
    _SpinFreq     = 0;
    _SpinClock    = 0;
    OwnerIsThread = 0;
}

ObjectMonitor has two queues, _WaitSet and _EntryList, which hold lists of ObjectWaiter objects (every thread waiting for the lock is wrapped into an ObjectWaiter object); _owner points to the thread that currently holds the monitor. When multiple threads access a piece of synchronized code at the same time:

  1. Threads first enter the _EntryList. When a thread acquires the object's monitor, it enters the _owner region, sets the monitor's _owner field to the current thread, and increments the monitor's counter count by 1;
  2. If the thread calls wait(), it releases the monitor it currently holds, the _owner variable is restored to null, count is decremented by 1, and the thread enters the _WaitSet to wait to be woken up;
  3. When the current thread finishes executing, it likewise releases the monitor and resets count, so that other threads can enter and acquire the monitor.

Also, the Monitor object is referenced from the Mark Word in every Java object's header; this is how a synchronized lock is acquired, and it is why any object in Java can be used as a lock. The notify/notifyAll/wait methods operate on the Monitor lock object, which is why they must be used inside synchronized code blocks. The Monitor supports two synchronization modes: mutual exclusion and cooperation. In a multi-threaded environment, when threads need to share data, the problem of mutually exclusive access must be solved; the monitor guarantees that the data on the monitor is accessed by only one thread at a time.

Well, we know that synchronized locks an object, but how does the object record the lock state? The lock state is recorded in each object's Mark Word. Let's look at the memory layout of an object.

Object memory layout

On the HotSpot VM, an object stored in memory is laid out in three areas: the object header (Header), instance data (Instance Data), and alignment padding (Padding).

  • Object header: holds, for example, the hash code, the object's GC age, the lock state flags, the biased-lock (thread) ID, the biased timestamp, the array length (for array objects), and so on. A Java object header usually occupies two machine words (on a 32-bit VM one word is 4 bytes, i.e. 32 bits; on a 64-bit VM one word is 8 bytes, i.e. 64 bits), but an array object needs three words: the JVM can determine the size of an ordinary Java object from its metadata but cannot determine an array's size from the array metadata, so an extra word records the array length.
  • Instance data: stores the class's field data, including fields inherited from the parent class;
  • Alignment padding: the VM requires that an object's starting address be a multiple of 8 bytes. Padding data does not necessarily exist; it serves only for byte alignment;

Object header

The HotSpot VM object header contains two parts of information.

The first part is the "Mark Word", which stores the object's own runtime data, such as the HashCode, GC generational age, lock state flags, locks held by threads, the biased thread ID, the biased timestamp, and so on. It is the key to implementing lightweight and biased locks. The second part is the klass pointer, the pointer to the object's class metadata, which the VM uses to determine which class this object is an instance of.

This data is 32 bits long on a 32-bit VM and 64 bits long on a 64-bit VM (ignoring the case where compressed pointers are enabled), and is officially called the Mark Word. An object needs to store far more runtime data than a 32- or 64-bit structure can record, but header information is an extra storage cost unrelated to the data defined by the object itself. For the VM's space efficiency, the Mark Word is therefore designed as a non-fixed data structure that stores as much information as possible in a very small space, reusing its storage bits according to the object's state.

For example, in a 32-bit HotSpot VM, if the object is in the unlocked state, the 32 bits of the Mark Word are divided as follows:

  • 25 bits store the object's hash code
  • 4 bits store the object's generational (GC) age
  • 2 bits store the lock flag
  • 1 bit (the biased-lock flag) is fixed at 0
  • The object's stored contents in the other states (lightweight lock, heavyweight lock, GC mark, biased) are shown in the table reconstructed below
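
(The original table image is missing from this text version; the following is the commonly cited 32-bit Mark Word layout, reconstructed here for reference and worth double-checking against the HotSpot source.)

Lock state       | Mark Word contents (30 bits unless noted)  | Biased bit | Lock flag
Unlocked         | object hash code (25) + GC age (4)         | 0          | 01
Biased           | thread ID (23) + epoch (2) + GC age (4)    | 1          | 01
Lightweight lock | pointer to the lock record in the stack    | -          | 00
Heavyweight lock | pointer to the monitor                     | -          | 10
GC mark          | empty                                      | -          | 11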

Since this header information is an extra storage cost unrelated to the data the object itself defines, the Mark Word reuses its storage space according to the object's state; in other words, the Mark Word changes as the program runs. The possible states are as follows:

32-bit VM (the original state-table figure is omitted; see the Mark Word table above)

64-bit VM: most virtual machines today are 64-bit, and a 64-bit object header wastes a certain amount of space. The JVM enables pointer compression by default, so the compressed parts of the header are effectively recorded in 32-bit form. (The original 64-bit state-table figure is likewise omitted.)

Enable manually: -XX:+UseCompressedOops

What information will be compressed?

  1. The object's global static variables (that is, class attributes)
  2. Object header information: on a 64-bit platform the raw object header is 16 bytes, 12 bytes after compression
  3. Object reference types: on a 64-bit platform a reference itself is 8 bytes, 4 bytes after compression
  4. Object array types: on a 64-bit platform the array object itself is 24 bytes, 16 bytes after compression

In Chapter 8, Section 8.22 of Scott Oaks' Java Performance: The Definitive Guide, it is mentioned that when the heap is larger than 32 GB compressed pointers can no longer be used, and object references then take up roughly 20% more heap space; in other words, about 38 GB of uncompressed heap is equivalent to 32 GB of heap with pointer compression enabled.

Why is that? See the highlighted passage in the quotation below (from the OpenJDK wiki: wiki.openjdk.java.net/display/Hot…). A 32-bit value can address at most 4 GB. With compressed pointers enabled, however, one address unit is not 1 byte but 8 bytes, because Java objects are 8-byte aligned on both 32-bit and 64-bit machines and heap memory is filled with one object after another; the addressable range therefore becomes 2^32 × 8 bytes = 32 GB.

Compressed oops represent managed pointers (in many but not all places in the JVM) as 32-bit values which must be scaled by a factor of 8 and added to a 64-bit base address to find the object they refer to. This allows applications to address up to four billion objects (not bytes), or a heap size of up to about 32Gb. At the same time, data structure compactness is competitive with ILP32 mode.

Object header analysis tool

JOL (Java Object Layout), an OpenJDK open-source toolkit, analyzes the object header and lock state at runtime. Introduce the following Maven dependency:

<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.10</version>
</dependency>

Print the Mark Word

System.out.println(ClassLayout.parseInstance(object).toPrintable());
//object is our lock object
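
A representative (abridged) excerpt of what toPrintable() produces for a plain unlocked object on a 64-bit JVM with compressed oops; the exact bytes vary by JVM version and platform:

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                    VALUE
      0     4        (object header)                01 00 00 00 (00000001 ...)   <- Mark Word, low flag bits 01 = unlocked
      4     4        (object header)                00 00 00 00 (00000000 ...)
      8     4        (object header)                e5 01 00 f8 (...)            <- compressed class pointer
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes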

Lock inflation and upgrade process

There are four lock states: unlocked, biased, lightweight, and heavyweight. As contention increases, a lock can be upgraded from biased to lightweight and then to heavyweight. The upgrade is one-way: a lock can only go from low to high and is never downgraded. Biased and lightweight locking are enabled by default from JDK 1.6, and biased locking can be disabled with -XX:-UseBiasedLocking. The overall upgrade flow (the original diagram is omitted) is: unlocked → biased → lightweight → heavyweight.
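
A small sketch, assuming the JOL dependency from the previous section, that makes the transitions observable; which states actually appear depends on the JDK version and on whether biased locking is active:

import org.openjdk.jol.info.ClassLayout;

public class LockUpgradeDemo {

    public static void main(String[] args) throws InterruptedException {
        final Object lock = new Object();

        // Freshly created object: unlocked (or biasable, if biased locking is active)
        System.out.println(ClassLayout.parseInstance(lock).toPrintable());

        synchronized (lock) {
            // A single, uncontended thread: biased or lightweight (thin) lock
            System.out.println(ClassLayout.parseInstance(lock).toPrintable());
        }

        Thread t = new Thread(() -> {
            synchronized (lock) {
                // Real contention from a second thread can inflate the lock
                // into a heavyweight (fat) lock backed by a monitor
                System.out.println(ClassLayout.parseInstance(lock).toPrintable());
            }
        });
        synchronized (lock) {
            t.start();
            Thread.sleep(100); // hold the lock while t tries to enter, forcing contention
        }
        t.join();
    }
}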

Biased locking

Biased locking was added in Java 6 as an optimization of the locking operation. Research showed that in most cases a lock not only sees no multi-threaded contention, but is also acquired repeatedly by the same thread; biased locking was introduced to reduce the cost of that same thread re-acquiring the lock (which would otherwise involve some time-consuming CAS operations). The core idea: once a thread acquires the lock, the lock enters biased mode and the Mark Word takes the biased-lock structure; when the same thread requests the lock again, no synchronization is needed at all, which skips a large amount of lock-acquisition work and thereby improves performance. In uncontended scenarios biased locking therefore optimizes very well, since the same thread is quite likely to request the same lock many times in a row. In contended scenarios, however, biased locking fails, because each lock request is likely to come from a different thread; in such cases it should not be used, or it will do more harm than good. Note that when biased locking fails, the lock does not immediately inflate into a heavyweight lock but is first upgraded to a lightweight lock.

Biased locking is enabled by default (after a startup delay).
Enable: -XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0
Disable: -XX:-UseBiasedLocking

Lightweight lock

If biased locking fails, the virtual machine does not immediately upgrade to a heavyweight lock; it first tries an optimization called the lightweight lock (also introduced in JDK 1.6), and the Mark Word changes to the lightweight-lock structure. Lightweight locks improve application performance based on the empirical observation that "for the vast majority of locks, there is no contention during the entire synchronization cycle." It is important to understand that lightweight locks suit scenarios where threads execute synchronized blocks alternately; if the same lock is contended at the same time, the lightweight lock inflates into a heavyweight lock.

Spin locks

When lightweight locking fails, the virtual machine also applies an optimization called spinning, to prevent the thread from actually being suspended at the operating-system level. This is based on the observation that in most cases a lock is not held for long; suspending the thread directly may do more harm than good, because switching a thread requires the operating system to move between user mode and kernel mode, and those transitions take a relatively long time and carry a high cost. Spinning bets that the current thread can acquire the lock in the very near future, so the VM lets the thread run a few empty loops (the spin), usually not many, perhaps 50 or 100 iterations. If the lock is acquired during the spin, the thread successfully enters the critical section; if not, the thread is then suspended at the operating-system level. Spinning is thus an optimization that can genuinely improve efficiency. If everything fails, the lock is finally upgraded to a heavyweight lock.
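
To illustrate the idea of spinning, here is a hand-written spin lock for demonstration only (not the JVM's internal implementation):

import java.util.concurrent.atomic.AtomicReference;

public class SpinLockDemo {

    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void lock() {
        Thread current = Thread.currentThread();
        // Busy-wait (spin) instead of suspending the thread at the OS level
        while (!owner.compareAndSet(null, current)) {
            // empty loop: keep retrying the CAS
        }
    }

    public void unlock() {
        // Only the owning thread may release the lock
        owner.compareAndSet(Thread.currentThread(), null);
    }
}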

Lock coarsening

Normally, to keep multi-threaded concurrency effective, each thread should hold a lock for as short a time as possible. In some situations, however, a program requests, synchronizes on, and releases the same lock continuously and at high frequency, which consumes system resources: lock requests, synchronization, and releases all carry their own performance cost, so high-frequency lock requests hurt system performance even though each individual synchronized operation may be short. Lock coarsening teaches that everything has its measure: in some cases we want to merge many lock requests into a single request, to reduce the performance loss caused by a large number of requests, synchronizations, and releases in a short time.

At one extreme:


public void doSomethingMethod() {
    synchronized (lock) {
        // do some thing
    }
    // Here is some other code that does not need synchronization
    // but executes quickly
    synchronized (lock) {
        // do other thing
    }
}

The code above has two blocks that need synchronization, with some other work in between that does not need synchronization and takes only a little time. In that case we can pull the intermediate work inside the lock and merge the two synchronized blocks into one, reducing the system-performance cost of repeated lock requests, synchronizations, and releases. The merged code looks like this:

public void doSomethingMethod() {
    // Lock coarsening: merged into one lock request, synchronization, and release
    synchronized (lock) {
        // do some thing
        // do the other work that doesn't need synchronization but executes quickly
        // do other thing
    }
}

Note: the premise is that the intermediate code that does not need synchronization completes quickly. If it took a long time, the synchronized block would then execute for a long time, and coarsening would no longer be reasonable.

Another extreme case that requires lock coarsening is:

for (int i = 0; i < size; i++) {
    synchronized (lock) {
        // ...
    }
}

The code above requests, synchronizes on, and releases the lock on every loop iteration, which is clearly a problem; the JDK does apply some optimizations to lock requests of this form, but it is still better to write the lock outside the loop body so that a single lock request suffices, unless there is a special need: the loop runs for a long time, and other threads cannot afford to wait and should be given a chance to execute.

The coarsened code looks like this:

synchronized (lock) {
    for (int i = 0; i < size; i++) {
        // ...
    }
}

Lock elimination

Lock elimination is another lock optimization of the virtual machine, and a more thorough one. The Java JIT compiler (which, simply put, compiles a piece of code just before it is executed for the first time, also called just-in-time compilation) scans the running context and removes locks for which shared-resource contention is impossible, thereby eliminating unnecessary locking. For example, StringBuffer's append below is a synchronized method, but in the method shown later the StringBuffer is a local variable that no other thread can use; since no shared-resource contention is possible, the JVM can automatically eliminate its lock. Lock elimination is supported by the data produced by escape analysis.

Lock elimination requires Java to run in server mode (which is more heavily optimized than client mode), and escape analysis must be enabled:

-XX:+DoEscapeAnalysis enables escape analysis; -XX:+EliminateLocks enables lock elimination

Lock elimination is a lock optimization that happens at the compiler level. Sometimes the code we write does not need locking at all, yet locks anyway, such as the append operation of the StringBuffer class:

@Override
public synchronized StringBuffer append(String str) {
    toStringCache = null;
    super.append(str);
    return this;
}

As the source shows, the append method is marked with the synchronized keyword and is thread-safe. But we might only use StringBuffer as a local variable inside a thread:

package com.niuh;

public class Juc_LockAppend {

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int size = 10000;
        for (int i = 0; i < size; i++) {
            createStringBuffer("Dime technology", "Born to share technology.");
        }
        long timeCost = System.currentTimeMillis() - start;
        System.out.println("createStringBuffer:" + timeCost + " ms");
    }

    public static String createStringBuffer(String str1, String str2) {
        StringBuffer sBuf = new StringBuffer();
        sBuf.append(str1); // the append method is a synchronized operation
        sBuf.append(str2);
        return sBuf.toString();
    }
}

sBuf, the local object of createStringBuffer(), is valid only within the scope of that method. When different threads call createStringBuffer() at the same time, each creates its own sBuf object, so making the append operation synchronized is a waste of system resources.

The compiler can optimize this by removing the lock, provided Java runs in server mode (more heavily optimized than client mode) and escape analysis is enabled:

-server -XX:+DoEscapeAnalysis -XX:+EliminateLocks

Escape analysis: for the code above, it checks whether sBuf can escape its scope. If sBuf were returned from the method as its return value, it could be used as a global object outside the method and thread-safety issues could arise; in that case sBuf is said to escape, and the lock on the append operation should not be eliminated. In our code above sBuf does not escape, so lock elimination can be applied and provides a performance boost.
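
For contrast, a hypothetical variant in which the buffer does escape, so the lock on append could not be eliminated:

public static StringBuffer createEscapingStringBuffer(String str1, String str2) {
    StringBuffer sBuf = new StringBuffer();
    sBuf.append(str1);
    sBuf.append(str2);
    return sBuf; // the StringBuffer itself escapes the method, so its internal lock must stay
}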

Escape analysis

Using escape analysis, the compiler can optimize code as follows:

  1. Synchronization elision. If an object is found to be accessible from only one thread, operations on that object can be performed without synchronization.
  2. Converting heap allocation to stack allocation. If an object is allocated in a subroutine and pointers to it never escape, the object may be a candidate for stack allocation rather than heap allocation.
  3. Separated objects, or scalar replacement. Some objects may not need to exist as a contiguous memory structure to be accessible; part (or all) of such an object can then be stored not in memory but in CPU registers.

Are all objects and arrays allocated in heap memory? The answer: not necessarily.

When running Java code, escape analysis can be turned on via the following JVM parameters:

-XX:+DoEscapeAnalysis enables escape analysis; -XX:-DoEscapeAnalysis disables escape analysis

Since JDK 1.7, escape analysis is enabled by default. To disable it, specify -XX:-DoEscapeAnalysis.

Escape analysis example: create a NiuhStudent object 500,000 times in a loop.

package com.niuh;

public class T0_ObjectStackAlloc {

    /**
     * Run two tests.
     *
     * 1. Escape analysis off, with the heap sized to avoid GC (GC details are
     *    printed if any GC occurs). VM options:
     *    -Xmx4G -Xms4G -XX:-DoEscapeAnalysis -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError
     *
     * 2. Escape analysis on. VM options:
     *    -Xmx4G -Xms4G -XX:+DoEscapeAnalysis -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError
     *
     * After main starts, find the process with jps, then inspect it with:
     *    jmap -histo <pid>
     */
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 500000; i++) {
            alloc();
        }
        long end = System.currentTimeMillis();
        // Print the elapsed time
        System.out.println("cost-time " + (end - start) + " ms");
        try {
            // Keep the process alive so it can be inspected with jmap
            Thread.sleep(100000);
        } catch (InterruptedException e1) {
            e1.printStackTrace();
        }
    }

    private static NiuhStudent alloc() {
        // The JIT performs escape analysis on the code at compile time:
        // not every object is stored on the heap; some live in the thread's stack space
        NiuhStudent student = new NiuhStudent();
        return student;
    }

    static class NiuhStudent {
        private String name;
        private int age;
    }
}

First case: turn off escape analysis and size the heap to avoid GC (if any GC occurs, its details will be printed).

Set the VM options: -Xmx4G -Xms4G -XX:-DoEscapeAnalysis -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError

With escape analysis off, jmap -histo shows roughly 500,000 NiuhStudent instances created on the heap (the original screenshot is omitted).

Second case: enable escape analysis.

Set the VM options: -Xmx4G -Xms4G -XX:+DoEscapeAnalysis -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError

With escape analysis on, nowhere near 500,000 instances appear on the heap (the original screenshot is omitted).

PS: the code above is available on GitHub: github.com/Niuh-Study/…
