1. Introduction

This is a fairly long article; if you spot anything wrong or poorly explained, corrections are welcome.

2. Stubborn Bronze

2.1 Is multithreading necessarily fast?

The code below runs the same two accumulation loops twice: once concurrently (one loop on a worker thread, one on the main thread) and once serially, and compares the elapsed time.

```java
public class ConcurrencyTest {

    private static final long count = 10000;

    public static void main(String[] args) throws InterruptedException {
        concurrency();
        serial();
    }

    /** Runs the two loops concurrently on two threads. */
    private static void concurrency() throws InterruptedException {
        long start = System.currentTimeMillis();
        Thread thread = new Thread(new Runnable() {
            @Override
            public void run() {
                int a = 0;
                for (long i = 0; i < count; i++) {
                    a += 5;
                }
            }
        });
        thread.start();
        int b = 0;
        for (long i = 0; i < count; i++) {
            b--;
        }
        // Wait for the worker thread to finish
        thread.join();
        long time = System.currentTimeMillis() - start;
        System.out.println("concurrency total time " + time);
    }

    /** Runs the two loops one after another on the main thread. */
    private static void serial() {
        long start = System.currentTimeMillis();
        int a = 0;
        for (long i = 0; i < count; i++) {
            a += 5;
        }
        int b = 0;
        for (long i = 0; i < count; i++) {
            b--;
        }
        long time = System.currentTimeMillis() - start;
        System.out.println("serial total time " + time);
    }
}
```

The answer here is "not necessarily". I ran several rounds of tests; a few representative results follow:

Multithreading versus single-threaded timing (ms)

| Iterations | Single-threaded | Multithreaded | Multithreading is |
|------------|-----------------|---------------|-------------------|
| 10,000     | 0               | 1             | slower            |
| 10,000     | 0               | 0             | equal             |
| 100,000    | 2               | 2             | equal             |
| 100,000    | 1               | 1             | equal             |

The results above bear out the answer. Why is multithreading sometimes slower than a single thread? Because multithreading carries thread-creation and context-switch overhead.

2.2 Context Switch

So what is context switching?

Even a single-core processor can run multithreaded code: the CPU achieves this by allocating time slices to each thread. A time slice is the amount of CPU time given to a thread, typically a few tens of milliseconds, so the CPU has to switch between threads constantly. Say thread A gets a 10 ms slice; when it expires, the CPU switches to thread B, and when B's slice is used up, the CPU switches back to thread A to continue where it left off.

So the CPU loops through tasks quickly enough to create the effect of simultaneous execution. When one task's time slice expires, the CPU switches to the next task, but before switching it saves the current task's state so the task can be restored the next time it runs. This cycle, from a task's state being saved to being loaded again, is a context switch.

2.3 Counting context switches

Here we use the command vmstat 1, available on Linux, which monitors processes, virtual memory, and CPU activity. In its output, the cs column (context switches) shows how many context switches occur per second. On my machine the system's cs value normally stays between 600 and 800; while the ConcurrencyTest program keeps running, cs very clearly spikes above 1000.

2.4 Java Memory model

Before we study how Sync works, we need a clear understanding of the Java Memory Model. It is very, very important.

The Java Memory Model, or JMM for short, governs communication between Java threads: it determines when a write to a shared variable by one thread becomes visible to another thread. Shared variables live in main memory, while each thread has its own local memory (also known as working memory) holding copies of the shared variables it uses. To improve efficiency, a thread does not operate on main memory directly; it reads and writes data through its local memory.

If thread A and thread B communicate, they need to go through the following two steps:

1) Thread A flushes the updated shared variable from its local memory to main memory.

2) Thread B re-reads the updated shared variable from main memory.

2.5 Data interaction between main memory and working memory

So what are the steps in the interaction between main memory and working memory?

lock: acts on a main-memory variable; marks the variable as exclusively owned by one thread.

unlock: acts on a main-memory variable; releases a variable in the locked state so it can be locked by another thread.

read: acts on a main-memory variable; reads the variable's value from main memory for a subsequent load.

load: acts on a working-memory variable; puts the value obtained by read into the working-memory copy of the variable.

use: acts on a working-memory variable; passes the variable's value in working memory to the execution engine.

assign: acts on a working-memory variable; assigns a value received from the execution engine to the working-memory variable.

store: acts on a working-memory variable; transfers the variable's value from working memory to main memory for a subsequent write.

write: acts on a main-memory variable; puts the value obtained by store into the main-memory variable.

Key point: the rules behind these operations, explained in more detail.

The JMM is a specification that defines several rules; the ones relevant to this article are:

1. To copy a variable from main memory to working memory, read and load must be performed in that order; to synchronize a variable from working memory back to main memory, store and write must be performed in that order. The JMM only requires that these operations be ordered, not that they be consecutive.

2. A variable may be locked by only one thread at a time, but the same thread may perform lock on it repeatedly; after multiple lock operations, the variable is released only after the same number of unlock operations. lock and unlock must come in pairs.

3. Performing lock on a variable clears its copy from working memory; before the execution engine can use the variable again, a load or assign must be performed to re-initialize its value.

4. These are the only JMM synchronization rules we need for now; readers who are interested can dig into the rest on their own.

3. Order Silver

3.1 Visibility problems caused by multi-threading

What is the visibility problem?

Visibility: Changes made to main memory by one thread can be observed by other threads in a timely manner.

In the example below, a shared field is modified by thread two, but thread one never sees the latest value and loops forever. As the Java Memory Model section explained, threads interact through their local memory.

1. Thread one reads the flag field into its private local memory, where its value is true.

2. Thread two sets flag to false and flushes it to main memory, but thread one is unaware that flag has been modified.

```java
public class SyncExample5 {

    static boolean flag = true;
    static Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Thread one: spins while flag is true
        new Thread(new Runnable() {
            @Override
            public void run() {
                while (flag) {
                    // Uncommenting either line below ends the loop:
                    // synchronized (lock) {}        // lock/unlock refreshes local memory
                    // System.out.println(flag);     // println synchronizes internally
                }
            }
        }).start();

        Thread.sleep(2000L);

        // Thread two: updates the shared field
        new Thread(new Runnable() {
            @Override
            public void run() {
                flag = false;
                System.out.println("flag value changed");
            }
        }).start();
    }
}
```

How does Sync solve the visibility problem?

This goes back to how main memory and working memory interact. Remember the eight operations described earlier?

When synchronization is added to a program, the lock and unlock operations come into play. The lock operation invalidates the variable's copy in local memory, forcing the thread to read the data again from main memory.
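As a quick illustration (my own addition, not part of the original example), the same stuck-loop symptom disappears if the flag is made volatile, which likewise forces reads and writes to go through main memory; a minimal sketch, with the class and method names made up:

```java
public class VisibilityFix {
    // volatile forces reads and writes to go through main memory,
    // so the spinning thread sees the update promptly
    static volatile boolean flag = true;

    static int spinUntilCleared() {
        Thread reader = new Thread(() -> {
            while (flag) {
                // busy-wait until the write becomes visible
            }
        });
        reader.start();
        try {
            Thread.sleep(100L);
            flag = false;        // the write is flushed to main memory
            reader.join(2000L);  // returns quickly because the reader sees false
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return reader.isAlive() ? 1 : 0; // 0 means the loop ended
    }

    public static void main(String[] args) {
        System.out.println(spinUntilCleared());
    }
}
```

Sync achieves the same refresh through its lock/unlock operations; volatile is just the lighter-weight way to see the effect in isolation.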

3.2 Atomicity problems caused by multithreading

What is the atomicity problem?

Atomicity: an operation provides mutually exclusive access, so only one thread can execute it at a time.

In this example, a task increments a counter 1000 times and five threads run the task, so the final result should be 5000. Without synchronization, though, the threads interleave and the result often comes up different.

```java
public class SyncExample6 {

    static int index = 0;
    static Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Each task increments index 1000 times
        Runnable task = () -> {
            synchronized (lock) {
                for (int i = 0; i < 1000; i++) {
                    index++;
                }
            }
        };
        for (int i = 0; i < 5; i++) {
            Thread thread = new Thread(task);
            thread.start();
        }
        Thread.sleep(2000L); // crude wait for all threads to finish
        System.out.println("index = " + index);
    }
}
```

We compile and disassemble the code with the JDK's tools:

javac SyncExample6.java

javap -p -v SyncExample6.class

This lets us see what Sync is actually doing underneath.

After disassembling, find lambda$main$0: our synchronization code is written in the main method using a lambda expression, so it compiles into this synthetic method.

```
private static void lambda$main$0();
    descriptor: ()V
    flags: ACC_PRIVATE, ACC_STATIC, ACC_SYNTHETIC
    Code:
      stack=2, locals=3, args_size=0
         0: iconst_0
         1: istore_0
         2: iload_0
         3: sipush        1000
         6: if_icmpge     39
         9: getstatic     #18                 // Field lock:Ljava/lang/Object;
        12: dup
        13: astore_1
        14: monitorenter
        15: getstatic     #14                 // Field index:I
        18: iconst_1
        19: iadd
        20: putstatic     #14                 // Field index:I
        23: aload_1
        24: monitorexit
        25: goto          33
        28: astore_2
```

What is the cause of the atomicity problem?

This goes back to context switching, covered at the start of the article. index++ compiles to four instructions:

```
15: getstatic     #14   // Step 1: load the current value of index
18: iconst_1            // Step 2: push the constant 1
19: iadd                // Step 3: add
20: putstatic     #14   // Step 4: store the result back into index
```

These four instructions make up index++. Suppose thread one gets as far as step 3, and the CPU then switches to thread two. Thread two performs step 1 and still reads index as 0, because thread one never reached step 4. When thread two finishes and thread one resumes, thread one writes back its stale result, so an increment is lost and the final count is wrong.

How does Sync solve the atomicity problem?

```
14: monitorenter
15: getstatic     #14                 // Field index:I
18: iconst_1
19: iadd
20: putstatic     #14                 // Field index:I
23: aload_1
24: monitorexit
```

When we add Sync, monitorenter and monitorexit instructions are inserted.

Let's assume thread one reaches step 3 and the CPU switches to thread two. When thread two executes monitorenter, it finds the monitor is held by another thread, so it waits. Back on thread one, execution continues through monitorexit, releasing the lock; only then can thread two acquire it. This ensures that only one thread at a time can run the block, which guarantees atomicity.

After compilation, the monitorenter instruction is inserted at the start of the synchronized block, while monitorexit is inserted at the end of the block and at the exception-handler exit; the JVM guarantees that every monitorenter has a matching monitorexit. Every object has a monitor associated with it, and while the monitor is held, the object is in the locked state. When a thread executes monitorenter, it attempts to acquire ownership of the object's monitor, i.e., to take the lock.
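For contrast (this is my own addition, previewing the CAS discussion later in the article), the same counter can be made correct without a monitor at all by using the JDK's AtomicInteger, whose increment is a single atomic operation:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    static final AtomicInteger index = new AtomicInteger(0);

    static int runIncrements() {
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                index.incrementAndGet(); // atomic read-modify-write, no lost updates
            }
        };
        Thread[] threads = new Thread[5];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(task);
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // wait deterministically instead of sleeping
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return index.get();
    }

    public static void main(String[] args) {
        System.out.println(runIncrements()); // always 5000
    }
}
```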

3.3 Ordering problems caused by multithreading

What is the ordering problem?

Ordering refers to the order in which program code is executed. Java optimizes code both at compile time and at run time, so the final execution order is not necessarily the order in which we wrote the code.

Here is a snippet of code using the jcstress concurrency-testing framework to demonstrate the problems that reordering can cause.

```java
@JCStressTest
// Declare the predicted outcomes: 1 and 4 are acceptable
@Outcome(id = {"1", "4"}, expect = Expect.ACCEPTABLE, desc = "ok")
// 0 is the interesting (dangerous) outcome
@Outcome(id = {"0"}, expect = Expect.ACCEPTABLE_INTERESTING, desc = "danger")
@State
public class TestJMM {

    int num = 0;
    boolean ready = false;

    @Actor
    public void actor1(I_Result r) {
        if (ready) {
            r.r1 = num + num;
        } else {
            r.r1 = 1;
        }
    }

    @Actor
    public void actor2(I_Result r) {
        num = 2;
        ready = true;
    }
}
```

Suppose one thread runs actor1 and another runs actor2; think about which values r.r1 can end up with.

In fact there are three possible answers: 1, 4, and 0.

How 1 can occur:

1) If actor1 runs first, ready is still false, so r.r1 = 1;

2) actor2 executes only num = 2 before the thread switches to actor1; ready is still false, so r.r1 = 1.

How 4 can occur:

actor2 runs to completion first (num = 2, ready = true); then actor1 runs, so r.r1 = num + num = 4.

How 0 can occur:

1) This is the key case. Suppose actor2 runs first and, due to instruction reordering, its two statements are swapped.

2) Right after ready = true executes, the thread switches to actor1. ready is already true, but num is still 0, so r.r1 = 0 + 0 = 0.

```java
@Actor
public void actor2(I_Result r) {
    // Due to reordering, the two writes may execute in this order:
    ready = true;
    num = 2;
}
```

Running mvn clean install on the jcstress project generates a jar, which you can then run directly (java -jar) to execute the test.

How does Sync solve the ordering problem?

actor1 and actor2 now share the same lock object. Even if reordering happens and actor2 is switched out right after ready = true, actor1 cannot proceed: it has to wait for the lock. Only after actor2 finishes (including num = 2) and releases the lock can actor1 run, so the 0 outcome disappears.

```java
@JCStressTest
// Declare the predicted outcomes
@Outcome(id = {"1"}, expect = Expect.ACCEPTABLE, desc = "ok")
@Outcome(id = {"4"}, expect = Expect.ACCEPTABLE_INTERESTING, desc = "danger")
// Since Sync removes the ordering problem, 0 will no longer appear
@Outcome(id = {"0"}, expect = Expect.ACCEPTABLE_INTERESTING, desc = "danger")
@State
public class TestJMM {

    int num = 0;
    boolean ready = false;
    Object lock = new Object();

    @Actor
    public void actor1(I_Result r) {
        synchronized (lock) {
            if (ready) {
                r.r1 = num + num;
            } else {
                r.r1 = 1;
            }
        }
    }

    @Actor
    public void actor2(I_Result r) {
        synchronized (lock) {
            num = 2;
            ready = true;
        }
    }
}
```

Running the test shows that the 0 outcome no longer appears.

4. Glory Gold

4.1 Sync Reentrant feature

What is reentrancy?

That is, a thread can acquire the same synchronized lock repeatedly. The lock object underlying Sync contains a counter (the _recursions field) that records how many times the owning thread has acquired the lock: acquiring it again from the same thread increments the counter, leaving a synchronized block decrements it, and the lock object is released only when the counter drops to zero.

```java
public class SyncExample8 {
    public static void main(String[] args) {
        new MyThread().start();
    }
}

class MyThread extends Thread {
    @Override
    public void run() {
        synchronized (MyThread.class) {
            System.out.println(getName() + " entered sync block 1");
            synchronized (MyThread.class) {
                System.out.println(getName() + " entered sync block 2");
            }
        }
    }
}
```

Running it, we can see clearly that after printing "entered sync block 1", the thread enters the second synchronized block without waiting for any lock release. This property prevents a thread from deadlocking against itself and lets synchronized code be composed more freely (synchronized blocks can be nested or spread across methods).

The output is as follows:

Thread-0 entered sync block 1
Thread-0 entered sync block 2
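Reentrancy also covers recursive synchronized methods, not just nested blocks; a small sketch of my own (the names are made up) where each recursive call re-acquires the same monitor:

```java
public class ReentrantDemo {
    static int depth = 0;

    // A static synchronized method locks on ReentrantDemo.class;
    // each recursive call just bumps the monitor's recursion count
    static synchronized int countDown(int n) {
        depth++;
        if (n > 0) {
            countDown(n - 1); // reentry: no deadlock on our own lock
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(countDown(3)); // 4 nested acquisitions of the same lock
    }
}
```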

4.2 Sync Uninterruptible feature

Uninterruptible means that thread two, while waiting for thread one to release the lock, cannot be interrupted out of its wait.

Once one thread acquires the lock, any other thread stays blocked or waiting; as long as the first thread does not release the lock, the second remains blocked or waiting even if interrupted. In this sense, Sync is an uninterruptible lock.

```java
public class SyncExample9 {

    private static Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Runnable run = () -> {
            synchronized (lock) {
                String name = Thread.currentThread().getName();
                System.out.println(name + " entered the sync block");
                try {
                    // Hold the lock and sleep (sleep does not release the lock)
                    Thread.sleep(888888L);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        };

        Thread t1 = new Thread(run);
        t1.start();

        Thread.sleep(1000L);

        Thread t2 = new Thread(run);
        t2.start();

        System.out.println("interrupting thread two");
        // Forcibly interrupt thread two
        t2.interrupt();

        System.out.println("thread one state: " + t1.getState());
        System.out.println("thread two state: " + t2.getState());
    }
}
```

As soon as thread one enters the synchronized block, it holds the lock and goes to sleep (confirming that sleep does not release the lock object).

Thread two then tries to acquire the lock and, failing, becomes blocked. Even after we forcibly interrupt thread two, it remains blocked.

interrupting thread two
thread one state: TIMED_WAITING
thread two state: BLOCKED
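For contrast (my own addition, not part of the original example), the JDK's ReentrantLock does offer an interruptible acquisition via lockInterruptibly, which is exactly what Sync lacks; a minimal sketch:

```java
import java.util.concurrent.locks.ReentrantLock;

public class InterruptibleLock {

    static String tryInterruptibleWait() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // the main thread holds the lock and never releases it here
        Thread t2 = new Thread(() -> {
            try {
                lock.lockInterruptibly(); // waits for the lock, but can be interrupted
                lock.unlock();
            } catch (InterruptedException e) {
                // interrupted out of the wait: the thing synchronized cannot do
            }
        });
        t2.start();
        try {
            Thread.sleep(500L); // let t2 start waiting
            t2.interrupt();     // aborts the wait instead of leaving t2 BLOCKED
            t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return t2.getState().toString();
    }

    public static void main(String[] args) {
        System.out.println(tryInterruptibleWait()); // TERMINATED
    }
}
```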

4.3 Learning how Sync works via disassembly

We disassemble the Java code with javap to introduce the concept of the monitor.

```java
public class SyncExample10 {

    private static Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        synchronized (lock) {
            System.out.println("1");
        }
    }

    public synchronized void test() {
        System.out.println("1");
    }
}
```

We compile and disassemble SyncExample10 using the javac and javap commands:

javac SyncExample10.java

javap -v -p SyncExample10.class

The disassembled instructions follow. We'll focus on the contents of the main method, in particular monitorenter and monitorexit:

```
public static void main(java.lang.String[]) throws java.lang.InterruptedException;
  descriptor: ([Ljava/lang/String;)V
  flags: ACC_PUBLIC, ACC_STATIC
  Code:
    stack=2, locals=3, args_size=1
       0: getstatic     #2
       3: dup
       4: astore_1
       5: monitorenter        // here
       6: getstatic     #3
       9: ldc           #4
      11: invokevirtual #5
      14: aload_1
      15: monitorexit         // here
      16: goto          24
      19: astore_2
      20: aload_1
      21: monitorexit         // here
      22: aload_2
      23: athrow
      24: return
```

The monitorenter instruction

When we enter a synchronized block, the monitorenter instruction executes first. Every object is associated with a monitor; while the monitor is held, the object is locked and cannot be taken by other threads. When another thread executes monitorenter, it attempts to acquire ownership of the monitor of the object in question.

Monitor has two important member variables:

_owner: when a thread acquires the object's lock, that thread is recorded in _owner.

_recursions: the number of times the owning thread has acquired the lock; each acquisition adds 1.

monitorenter performs the following operations:

1) If the monitor's entry count is 0, the thread enters the monitor and the entry count (_recursions) becomes 1; the current thread becomes the monitor's owner.

2) If the thread already owns the monitor, it is allowed to re-enter, and the entry count is incremented (reentrancy);

3) If the monitor is owned by another thread, the thread trying to acquire it blocks until the entry count drops back to 0, at which point it can try to take the monitor again.

The monitorexit instruction

Since entering the synchronized block incremented the counter, exiting it decrements the counter;

Note that only a thread that owns the current object's monitor may execute monitorexit. When a monitorexit brings the counter down to 0, the current thread no longer owns the monitor, and other blocked threads can then try again to acquire ownership of it.

If you look closely at the instructions disassembled above, monitorexit appears twice. Why?

Because if the synchronized block exits by throwing an exception, the lock object still has to be released. So the next time an interviewer asks whether synchronized releases its lock when an exception is thrown, the answer is yes.
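We can verify this behavior directly: in the sketch below (my own, with hypothetical names), a thread throws from inside a synchronized block, and the main thread can still acquire the lock afterwards, showing that the exception-path monitorexit ran:

```java
public class ExceptionRelease {
    static final Object lock = new Object();

    static String demo() {
        Thread t1 = new Thread(() -> {
            synchronized (lock) {
                throw new RuntimeException("boom"); // leaves via the exception monitorexit
            }
        });
        t1.setUncaughtExceptionHandler((t, e) -> { /* silence the stack trace */ });
        t1.start();
        try {
            t1.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        // If the exception had not released the monitor, this would block forever
        synchronized (lock) {
            return "lock acquired after exception";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```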

The ACC_SYNCHRONIZED flag

What we just saw were the compiled instructions for the synchronized block in the main method. Here are the compiled instructions for the synchronized method:

As you can see, after disassembly the synchronized method carries the ACC_SYNCHRONIZED flag instead of explicit instructions: the monitorenter work is performed implicitly before the method body runs, and the monitorexit work when the method finishes.

```
public synchronized void test();
  descriptor: ()V
  flags: ACC_PUBLIC, ACC_SYNCHRONIZED
  Code:
    stack=2, locals=1, args_size=1
       0: getstatic     #3   // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #4   // String 1
       5: invokevirtual #5   // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return
    LineNumberTable:
      line 19: 0
      line 20: 8
```

5. Noble Platinum

5.1 The monitor lock

As mentioned above, each object is associated with a Monitor. The actual locking is done by the Monitor.

So what exactly is a monitor? The monitor is written in C++.

hg.openjdk.java.net/jdk8/jdk8/h… — the source can be found at this address; click zip or gz on the left to download. If the site is slow for you, look for a mirror of the HotSpot source instead. The downloaded files are as follows:

After downloading, you can view the code in an IDE such as CLion, or open it directly in a text editor.

How do Java objects relate to Monitor?

One more piece of background: in memory, each object is divided into three areas: the object header, instance data, and padding. The object header can hold a reference to a specific monitor object.

What does monitor contain?

First find the source file for the monitor: src/share/vm/runtime/objectMonitor.hpp. Scrolling down, you can see the ObjectMonitor constructor and a series of member fields.

```cpp
ObjectMonitor() {
    _header       = NULL;
    _count        = 0;
    _waiters      = 0,
    _recursions   = 0;      // reentry count
    _object       = NULL;
    _owner        = NULL;   // identifies the thread that owns the monitor
    _WaitSet      = NULL;   // threads in the wait state
    _WaitSetLock  = 0;
    _Responsible  = NULL;
    _succ         = NULL;
    _cxq          = NULL;   // threads that lost the race for the lock
    FreeNext      = NULL;
    _EntryList    = NULL;   // blocked threads waiting to enter
    _SpinFreq     = 0;
    _SpinClock    = 0;
    OwnerIsThread = 0;
    _previous_owner_tid = 0;
}
```

Pick a few more important ones:

_recursions: the lock's reentry count: +1 each time the owning thread acquires the lock again, -1 each time it leaves a synchronized block.

_owner: when a thread acquires ownership of the monitor, that thread is saved in _owner.

_WaitSet: threads that are in the wait state are stored in _WaitSet.

_cxq: when threads race for the lock and lose, they are added to the _cxq linked list.

_EntryList: threads that tried to acquire the lock object and failed are stored in _EntryList.

5.2 Monitor contention

Under what circumstances does contention arise?

Lock contention occurs when multiple threads execute synchronized blocks of code.

When a thread executes a synchronized block, it executes monitorenter first, which ends up calling a function in interpreterRuntime.cpp.

The source file is src/share/vm/interpreter/interpreterRuntime.cpp; search it for monitorenter:

```cpp
IRT_ENTRY_NO_ASYNC(void, InterpreterRuntime::monitorenter(JavaThread* thread, BasicObjectLock* elem))
  // ... code omitted ...
  if (UseBiasedLocking) {
    // Retry fast entry if bias is revoked to avoid unnecessary inflation
    ObjectSynchronizer::fast_enter(h_obj, elem->lock(), true, CHECK);
  } else {
    // Slow path: heavyweight lock
    ObjectSynchronizer::slow_enter(h_obj, elem->lock(), CHECK);
  }
  // ... code omitted ...
IRT_END
```

How do threads compete for locks?

For a heavyweight lock, monitorenter calls ObjectSynchronizer::slow_enter, which in the end calls ObjectMonitor::enter, located in src/share/vm/runtime/objectMonitor.cpp:

```cpp
void ATTR ObjectMonitor::enter(TRAPS) {
  // The following code is ordered to check the most common cases first
  // and to reduce RTS->RTO cache line upgrades on SPARC and IA32 processors.
  Thread * const Self = THREAD;
  void * cur;

  // CAS: try to install the current thread as _owner
  cur = Atomic::cmpxchg_ptr (Self, &_owner, NULL);
  if (cur == NULL) {
     assert (_recursions == 0   , "invariant");
     assert (_owner      == Self, "invariant");
     return;
  }

  // The current thread already owns the monitor: reentry
  if (cur == Self) {
     _recursions++;
     return;
  }

  if (Self->is_lock_owned ((address)cur)) {
    assert (_recursions == 0, "internal state error");
    // First real entry: count = 1
    _recursions = 1;
    // Record the current thread as the owner
    _owner = Self;
    OwnerIsThread = 1;
    return;
  }

  // ... code omitted ...
  // TODO-FIXME: change the following for(;;) loop to straight-line code.
  for (;;) {
    jt->set_suspend_equivalent();
    // Lock not acquired: enqueue and wait
    EnterI (THREAD);
    if (!ExitSuspendEquivalent(jt)) break;
    _recursions = 0;
    _succ = NULL;
    exit (false, Self);
    jt->java_suspend_self();
  }
}
```

(The spin-lock optimization is omitted from the snippet; it is discussed later in this article.)

The specific operation process of the above code is as follows:

1) A CAS attempts to set the monitor's _owner field to the current thread;

2) If _owner is already the current thread, the thread is re-entering the monitor, and _recursions++ records the reentry count;

3) If this is the thread's first entry into the monitor, _recursions is set to 1 and _owner to the current thread; the thread has acquired the lock and returns;

4) If the lock cannot be acquired, the thread waits for it to be released.

5.3 Monitor waiting

As mentioned above, when lock contention fails, the EnterI(THREAD) function is called; search the objectMonitor.cpp source for ::EnterI.

Parts of the code below are omitted:

```cpp
void ATTR ObjectMonitor::EnterI (TRAPS) {
    Thread * Self = THREAD;
    assert (Self->is_Java_thread(), "invariant");
    assert (((JavaThread *) Self)->thread_state() == _thread_blocked, "invariant");

    // Try the lock again
    if (TryLock (Self) > 0) {
        assert (_succ        != Self, "invariant");
        assert (_owner       == Self, "invariant");
        assert (_Responsible != Self, "invariant");
        return;
    }

    // Spin, trying to acquire the lock
    if (TrySpin (Self) > 0) {
        assert (_owner       == Self, "invariant");
        assert (_succ        != Self, "invariant");
        assert (_Responsible != Self, "invariant");
        return;
    }

    // Wrap the current thread as an ObjectWaiter node, state TS_CXQ
    ObjectWaiter node(Self);
    Self->_ParkEvent->reset();
    node._prev  = (ObjectWaiter *) 0xBAD;
    node.TState = ObjectWaiter::TS_CXQ;

    // Push node onto _cxq with CAS
    ObjectWaiter * nxt;
    for (;;) {
        node._next = nxt = _cxq;
        if (Atomic::cmpxchg_ptr (&node, &_cxq, nxt) == nxt) break;

        // Interference - the CAS failed because _cxq changed.  Just retry.
        // As an optional optimization we retry the lock.
        if (TryLock (Self) > 0) {
            assert (_succ        != Self, "invariant");
            assert (_owner       == Self, "invariant");
            assert (_Responsible != Self, "invariant");
            return;
        }
    }

    // Park (suspend) the thread
    for (;;) {
        if (TryLock (Self) > 0) break;
        assert (_owner != Self, "invariant");

        if ((SyncFlags & 2) && _Responsible == NULL) {
           Atomic::cmpxchg_ptr (Self, &_Responsible, NULL);
        }

        // park self
        if (_Responsible == Self || (SyncFlags & 1)) {
            TEVENT (Inflated enter - park TIMED);
            Self->_ParkEvent->park ((jlong) RecheckInterval);
            // Increase the RecheckInterval, but clamp the value.
            RecheckInterval *= 8;
            if (RecheckInterval > 1000) RecheckInterval = 1000;
        } else {
            TEVENT (Inflated enter - park UNTIMED);
            Self->_ParkEvent->park();
        }

        // Woken up: try the lock again
        if (TryLock(Self) > 0) break;
    }
    return;
}
```

The specific process of the above code is summarized as follows:

1) On entering EnterI, the thread first tries once more to acquire the lock object;

2) The current thread is wrapped as an ObjectWaiter node with its state set to ObjectWaiter::TS_CXQ;

3) In the for loop, CAS pushes the node onto the _cxq list; multiple threads may be pushing themselves onto _cxq at the same time;

4) After the node is on _cxq, the thread spins trying to take the lock; if it still fails, park suspends the current thread until it is woken;

5) When the thread is woken, it resumes from where it was suspended and tries the lock again via TryLock.

5.4 Monitor release

When will Monitor be released?

When a thread finishes executing a synchronized block, the monitorexit instruction runs and the monitor is released.

Again, search objectMonitor.cpp for ::exit.

What is the procedure for releasing the monitor?

The exit function follows; most of it has been trimmed, leaving the main parts of the code.

```cpp
void ATTR ObjectMonitor::exit(bool not_suspended, TRAPS) {
   // Reentrant release: just decrement the count
   if (_recursions != 0) {
     _recursions--;   // this is simple recursive enter
     TEVENT (Inflated exit - recursive);
     return;
   }

   ObjectWaiter * w = NULL;
   int QMode = Knob_QMode;

   // Policy 2: take a waiter from the head of _cxq
   if (QMode == 2 && _cxq != NULL) {
       w = _cxq;
       assert (w != NULL, "invariant");
       assert (w->TState == ObjectWaiter::TS_CXQ, "Invariant");
       // Wake the thread up
       ExitEpilog (Self, w);
       return;
   }

   // Otherwise take a waiter from _EntryList
   w = _EntryList;
   if (w != NULL) {
       guarantee (w->TState == ObjectWaiter::TS_ENTER, "invariant");
       // Wake the thread up
       ExitEpilog (Self, w);
       return;
   }
}
```

As the code above shows, waking a thread is done by calling the ExitEpilog function; search the objectMonitor.cpp source for ::ExitEpilog.

```cpp
void ObjectMonitor::ExitEpilog (Thread * Self, ObjectWaiter * Wakee) {
   assert (_owner == Self, "invariant");

   _succ = Knob_SuccEnabled ? Wakee->_thread : NULL;
   ParkEvent * Trigger = Wakee->_event;

   Wakee = NULL;

   // Drop the lock
   OrderAccess::release_store_ptr (&_owner, NULL);
   OrderAccess::fence();   // ST _owner vs LD in unpark()

   if (SafepointSynchronize::do_call_back()) {
      TEVENT (unpark before SAFEPOINT);
   }

   DTRACE_MONITOR_PROBE(contended__exit, this, object(), Self);
   // The key step: unpark wakes the waiting thread
   Trigger->unpark();

   // Maintain stats and report events to JVMTI
   if (ObjectMonitor::_sync_Parks != NULL) {
      ObjectMonitor::_sync_Parks->inc();
   }
}
```

The specific process of the above code is summarized as follows:

1) Leaving the synchronized block decrements _recursions; when _recursions reaches 0, the thread releases the lock;

2) Depending on the policy (selected by QMode), the thread to be woken (w in the code) is chosen;

3) Finally the ExitEpilog function is called, and unpark performs the wake-up.

6. Eternal Diamonds

6.1 Introducing CAS

CAS is short for Compare-And-Swap: compare and replace. A CAS takes three operands: a memory address V, the expected old value A, and the new value B to write.

When a CAS executes, if the value at address V equals the expected value A, then B is stored to memory; otherwise nothing is done. The entire compare-and-replace is a single atomic operation.

CAS is an optimistic locking technology. When multiple threads try to update the same variable using CAS, only one thread can update the value of the variable, while all other threads fail. The failed thread is not suspended, but is informed that the contention failed and can try again.
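The semantics are easy to see with AtomicInteger.compareAndSet, which exposes CAS directly; a small sketch of my own:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {

    static String demo() {
        AtomicInteger v = new AtomicInteger(10);
        // expected value matches the current value -> the swap succeeds
        boolean first = v.compareAndSet(10, 11);
        // expected value 10 is now stale -> the swap fails and v is untouched
        boolean second = v.compareAndSet(10, 12);
        return first + " " + second + " " + v.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true false 11
    }
}
```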

Advantages: avoids hazards such as priority inversion and deadlock; contention is cheap; coordination happens at a finer granularity; allows a higher degree of parallelism; and so on.

Disadvantages:

1. If a CAS fails, it is retried in a loop; if it keeps failing for a long time, the spinning can impose heavy CPU overhead.

2. It can only guarantee an atomic operation on a single shared variable. A CAS loop makes the update of one variable atomic, but when several shared variables must be updated together, a CAS loop cannot make the combined update atomic.
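A common workaround, sketched here with hypothetical names of our own, is to bundle the related variables into one immutable object, so that a single CAS on an AtomicReference swaps them together:

```java
import java.util.concurrent.atomic.AtomicReference;

public class MultiVarCasDemo {
    // Bundle the two related values into one immutable object
    static final class Range {
        final int low, high;
        Range(int low, int high) { this.low = low; this.high = high; }
    }

    static final AtomicReference<Range> ref = new AtomicReference<>(new Range(0, 10));

    // Move both bounds in one CAS: either both change together or neither does
    static boolean shift(int delta) {
        Range cur = ref.get();
        return ref.compareAndSet(cur, new Range(cur.low + delta, cur.high + delta));
    }

    public static void main(String[] args) {
        shift(5);
        Range r = ref.get();
        System.out.println(r.low + "," + r.high); // 5,15
    }
}
```

Because Range is immutable, a reader never sees a half-updated pair; the JDK's own AtomicReference is all that is needed.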

3. The ABA problem. If memory address V is first read with value A and still holds A when the CAS executes, can we conclude that no other thread has changed it?

No. If its value was changed to B in the meantime and then changed back to A, CAS would consider it unchanged — a flaw known as the ABA problem of CAS operations. To solve this, the JDK provides a stamped atomic reference class, AtomicStampedReference, which guarantees the correctness of CAS by attaching a version (stamp) to the variable's value.

Therefore, before using CAS, consider whether the ABA problem affects the correctness of your concurrent program. If the ABA problem must be solved, switching to traditional mutual-exclusion synchronization may be more efficient than the atomic classes.
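A minimal illustration of AtomicStampedReference catching an A→B→A change (the class and method names here are ours, for demonstration only):

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    // Returns whether a CAS using a stale stamp succeeds after an A -> B -> A sequence
    static boolean casWithStaleStamp() {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);
        int startStamp = ref.getStamp();                                  // version seen by the "slow" thread

        // Another thread changes A -> B -> A, bumping the stamp each time
        ref.compareAndSet("A", "B", ref.getStamp(), ref.getStamp() + 1);
        ref.compareAndSet("B", "A", ref.getStamp(), ref.getStamp() + 1);

        // The value is "A" again, but the stale stamp exposes the intermediate change
        return ref.compareAndSet("A", "C", startStamp, startStamp + 1);
    }

    public static void main(String[] args) {
        System.out.println(casWithStaleStamp()); // false: the ABA change is detected
    }
}
```

A plain AtomicReference would have let the final CAS succeed; the stamp is what turns "same value" into "same version".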

Now that CAS has been covered, let's look at how it is used in an implementation, taking AtomicInteger as an example — a JDK class that provides atomic operations on an int.

    /**
     * Atomically increments by one the current value.
     *
     * @return the updated value
     */
    public final int incrementAndGet() {
        return unsafe.getAndAddInt(this, valueOffset, 1) + 1;
    }

Let's look inside. Take the incrementAndGet method: it adds 1 to the current value, and its implementation delegates to a method of the Unsafe class. Let's go one level deeper.

    public final int getAndAddInt(Object var1, long var2, int var4) {
        int var5;
        do {
            var5 = this.getIntVolatile(var1, var2);
        } while (!this.compareAndSwapInt(var1, var2, var5, var5 + var4));
        return var5;
    }

The Unsafe class gives Java the ability to manipulate memory directly, like pointers in C, but it also brings pointer-style problems: overusing Unsafe makes errors more likely. For that reason Unsafe is not officially recommended for use in Java code; it cannot be obtained directly and can only be acquired via reflection.

Let's look at the parameters of getAndAddInt:

Var1: Passes in this, the AtomicInteger instance object;

Var2: offset, the most recent value in memory can be obtained by combining var1;

Var4: the value to be accumulated, i.e., 1;

First, the latest value is read from memory via var1 + var2. Then compareAndSwapInt is called: it compares the latest value in memory (located by var1 + var2) with var5; if they are equal, it writes var5 + var4 into memory, and if not, the loop retries. This is exactly the CAS we just introduced — compare and replace.

6.2 Sync Lock Upgrade process

Before JDK 1.5, synchronized was a heavyweight lock. Since 1.6, many optimizations have been applied to synchronized — biased locking, lightweight locking, adaptive spinning, lock elimination, lock coarsening, and so on. These techniques all aim to share data between threads more efficiently and resolve contention, thereby improving program execution efficiency.

Of course, a lock upgrades along a fixed path: no lock → biased lock → lightweight lock → heavyweight lock.

Each lock has a different usage scenario, and before we can understand the characteristics of each lock, we need to understand the layout of objects in memory!

6.3 Object Layout

Each object is divided into three areas in memory: Header, Instance Data, and Padding.

Object header:

When a thread attempts to enter a synchronized code block, it must first acquire the lock, and the lock information is stored in the object header.

In the HotSpot virtual machine, the object header contains a Mark Word, a Klass Pointer and, if the object is an array type, the length of the array.

What is the HotSpot virtual machine? The JVM can be understood as a specification, while HotSpot is a concrete virtual machine product. For example, if you want to find a girlfriend/boyfriend, there are certain requirements or specifications for the match: the JVM is the specification, and HotSpot is the actual boyfriend/girlfriend.

Don't believe it? Run System.out.println(System.getProperties()); and see what your java.vm.name equals.

java.vm.name=Java HotSpot(TM) 64-Bit Server VM

**Mark Word: ** stores the object's HashCode, generational age, and lock flag by default. It is also a key part of how synchronized locks are implemented. At runtime, the data stored in the Mark Word changes with the lock state. On a 64-bit VM, the Mark Word is 64 bits wide and its storage structure is shown below:

Mark Word 64-bit VM storage architecture

The table above is not made up — we can verify it against the source: src/share/vm/oops/markOop.hpp

The comments there are written very clearly; compare the comments below with the table above — they correspond, and the comments are even more intuitive.

//  32 bits:
//  --------
//             hash:25 ------------>| age:4    biased_lock:1 lock:2 (normal object)
//             JavaThread*:23 epoch:2 age:4    biased_lock:1 lock:2 (biased object)
//             size:32 ------------------------------------------>| (CMS free block)
//             PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
//
//  64 bits:
//  --------
//  unused:25 hash:31 -->| unused:1   age:4    biased_lock:1 lock:2 (normal object)
//  JavaThread*:54 epoch:2 unused:1   age:4    biased_lock:1 lock:2 (biased object)
//  PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
//  size:64 ----------------------------------------------------->| (CMS free block)

**Klass Pointer**: a pointer to the object's type — it points to the class metadata, which the JVM uses to determine which class the object is an instance of.

Object header = Mark Word + Klass Pointer. Size with pointer compression disabled:

In a 64-bit system, Mark Word = 8 bytes, pointer type = 8 bytes, object header = 16 bytes = 128bits.

Instance data:

The member variables defined in the class.

Alignment padding:

Alignment padding doesn't necessarily exist and has no special meaning; it is just a placeholder. Since the HotSpot VM's automatic memory management requires the starting address of an object to be an integer multiple of 8 bytes, when the instance data part of an object is not aligned, it is padded out.

Now, let’s try to layout an object in memory and see what it looks like:

Let’s start with the JAR package, which provides what we want to see, as follows:

<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.10</version>
</dependency>

public class SyncExample4 {
    static Apple apple = new Apple();

    public static void main(String[] args) {
        // Use ClassLayout to print the object's memory layout
        System.out.println(ClassLayout.parseInstance(apple).toPrintable());
    }
}

class Apple {
    private int count;
    private boolean isMax;
}

In the output, "object header" marks the object header and "loss due to the next object alignment" marks the alignment padding. Since Apple has a boolean property that takes one byte, the VM fills in 7 padding bytes so the next object is aligned, which improves execution and GC efficiency.

com.example.concurrency.sync.Apple object internals:
 OFFSET  SIZE      TYPE DESCRIPTION                               VALUE
      0     4           (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4           (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4           (object header)                           43 c0 00 f8 (01000011 11000000 00000000 11111000) (-134168509)
     12     4       int Apple.count                               0
     16     1   boolean Apple.isMax                               false
     17     7           (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 7 bytes external = 7 bytes total

We said the object header contains a 31-bit hash code — why don't we see it above? (The hash code is only written into the Mark Word after hashCode() is actually called.)

So let's run a bit of code to see apple's hash code. It prints 7ea987ac, and we can see that the VALUE in the header has changed as well. One caveat: because of little-endian storage, we have to read the bytes backwards.

public class SyncExample4 {
    static Apple apple = new Apple();

    public static void main(String[] args) {
        // Check the HashCode
        System.out.println(Integer.toHexString(apple.hashCode()));
        System.out.println(ClassLayout.parseInstance(apple).toPrintable());
    }
}

class Apple {
    private int count;
    private boolean isMax;
}

7ea987ac
# WARNING: Unable to attach Serviceability Agent. You can try again with escalated privileges.
# Two options: a) use -Djol.tryWithSudo=true to try with sudo;
#              b) echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
com.example.concurrency.sync.Apple object internals:
 OFFSET  SIZE      TYPE DESCRIPTION                               VALUE
      0     4           (object header)                           01 ac 87 a9 (00000001 10101100 10000111 10101001) (-1450726399)
      4     4           (object header)                           7e 00 00 00 (01111110 00000000 00000000 00000000) (126)
      8     4           (object header)                           43 c0 00 f8 (01000011 11000000 00000000 11111000) (-134168509)
     12     4       int Apple.count                               0
     16     1   boolean Apple.isMax                               false
     17     7           (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 7 bytes external = 7 bytes total

A careful reader will notice something odd: we calculated that the object header should occupy 16 bytes, yet the three (object header) rows above add up to only 12 bytes. That is because the Klass Pointer has been compressed from 8 bytes down to 4.

Pointer compression is turned on by default in the JVM; it can be turned off with the -XX:-UseCompressedOops parameter:

With compression off, the printed result shows an object header of 16 bytes.

 OFFSET  SIZE  TYPE DESCRIPTION         VALUE
      0     4       (object header)     01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4       (object header)     00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4       (object header)     80 68 f5 1f (10000000 01101000 11110101 00011111) (536176768)
     12     4       (object header)     02 00 00 00 (00000010 00000000 00000000 00000000) (2)

A Java object consists of three parts: an object header, instance data, and an alignment fill. The header contains a Mark Word, a Klass Pointer (and, if the object is an array, the length of the array).

Seven, the supreme star shine

Mark Word 64-bit VM storage architecture

7.1 biased locking

Principle of bias locking

In most cases, locks are not only not contested by multiple threads, but are always acquired multiple times by the same thread. In order to make the cost of the lock obtained by the thread lower, the lock bias is introduced.

Comparing against the Mark Word storage structure: when a thread first accesses a synchronized code block, it changes the biased-lock flag in the Mark Word from 0 to 1 and stores the ID of the current thread. When the thread later enters and exits the synchronized block, it no longer needs CAS operations to lock and unlock — it simply tests whether the object header still stores a bias toward the current thread. If the test succeeds, the thread has acquired the lock. If it fails, it checks whether the biased-lock flag in the Mark Word is still set to 1; if not, it uses CAS to compete for the lock.

We can use code to observe:

Biased locking is enabled by default in Java 6 and Java 7, but it is only activated a few seconds after the application starts. We need to turn off the delayed activation first (via -XX:BiasedLockingStartupDelay=0).

public class SyncExample4 {
 
    public static void main(String[] args) {
        Apple apple = new Apple();
        apple.start();
    }
}
 
class Apple extends Thread {
 
    private Object lock = new Object();
 
    @Override
    public void run() {
        synchronized (lock) {
            System.out.println(ClassLayout.parseInstance(lock).toPrintable());
        }
    }
}



 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           05 d8 86 22 (00000101 11011000 10000110 00100010) (579262469)
      4     4        (object header)                           9c 7f 00 00 (10011100 01111111 00000000 00000000) (32668)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

Because of little-endian storage, the biased-lock and lock-flag bits — originally at the end — now appear in the first byte: 00000101

Reading its last three bits: the 1 means a biased lock, and the lock flag is 01, which matches our table.

Bias lock revocation

Biased locks use a mechanism that releases the lock only upon contention, so the thread holding a biased lock releases it only when another thread competes for it. However, revoking the bias must wait for a global safepoint — a point at which no bytecode is executing. The JVM first suspends all threads (including the one holding the biased lock), then checks whether the lock object is still biased; if the biased-lock flag equals 1, it is reset back to 0.

Benefits of biased locking

The advantage of biased locking is obvious: it is very efficient when only a single thread ever enters the synchronized block, since all that is needed is a check that the thread ID stored in the Mark Word matches the current thread. If most locks in your program are accessed by several different threads, however, biased locking is unnecessary.

We can turn off biased locking with a JVM parameter: -XX:-UseBiasedLocking

7.2 Lightweight Lock

What is a lightweight lock

The lightweight lock is a locking mechanism added in JDK 6. It was introduced so that, when multiple threads execute synchronized blocks alternately (not simultaneously), the performance cost of a heavyweight lock is avoided. If multiple threads do enter the critical section at the same time, the lightweight lock inflates into a heavyweight lock — so lightweight locks do not replace heavyweight locks.

The stack frame

In the JVM we have a heap and a stack; each method invocation on the stack corresponds to a "stack frame". A stack frame can also store a lock record (Lock Record), which includes the Displaced Mark Word. What is it for? Read on.

Principle of lightweight lock

Before a thread executes a synchronized block, the JVM creates space in the current thread's stack frame to store the lock record, and copies the Mark Word from the object header into that lock record (this copy is the Displaced Mark Word). The JVM then uses a CAS operation to try to update the object's Mark Word to a pointer to the lock record. If it succeeds, the thread acquires the lock and the lock flag changes to 00. If it fails, the JVM checks whether the object's Mark Word already points into the current thread's stack: if it does, the thread already holds this object's lock and simply executes the synchronized block; if it does not, the lock object is occupied by another thread, so the lightweight lock inflates to a heavyweight lock, the lock flag changes to 10, and the threads waiting behind it block.

Release of lightweight locks

When unlocking, a CAS operation is used to copy the Displaced Mark Word back into the object header. If it succeeds, no contention occurred. If it fails, the lock is under contention and expands into a heavyweight lock.

7.3 the spin lock

Spin locking was introduced in JDK 1.4 and turned off by default; since JDK 1.6 it has been on by default.

Why spin locks? In plain English, a spin lock loops trying to acquire the lock. In the lock-upgrade process, if a thread fails to compete for the lock it is immediately suspended and must later be woken up, which is expensive. The lock may well be released while the thread is still being suspended — hence the spin operation.

When a thread fails to compete for a lock, it first spins, attempting to acquire it. If locks are usually held for a short time, spin-waiting works very well. Conversely, if locks are held for a long time, spinning threads burn processor resources while doing nothing. The default spin count is 10, which can be changed with the -XX:PreBlockSpin parameter.
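To make the idea of spinning concrete, here is a toy user-level spin lock built on an AtomicBoolean — an illustration of the concept only, with names of our own, not how the JVM implements monitor spinning (that lives in native code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLockDemo {
    static final class SpinLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);

        void lock() {
            // Busy-wait (spin) until our CAS flips the flag false -> true;
            // the thread burns cycles instead of being parked
            while (!locked.compareAndSet(false, true)) {
                // keep spinning
            }
        }

        void unlock() {
            locked.set(false);
        }
    }

    static int counter = 0; // deliberately plain; protected by the spin lock

    static void runThreads() {
        SpinLock lock = new SpinLock();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    lock.lock();
                    try { counter++; } finally { lock.unlock(); }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    public static void main(String[] args) {
        runThreads();
        System.out.println(counter); // 4000
    }
}
```

This is exactly the trade-off described above: with short critical sections the spin wins because no thread is ever suspended; with long ones, the spinning threads would just waste CPU.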

Adaptive spin lock

Adaptive spinning exists because spinning itself has a cost and we don't know in advance how many spins are appropriate. Adaptive spinning means the spin time is no longer fixed but is determined by the previous spin time on the same lock and the state of the lock's owner. If a lock was recently acquired after about 10 spins on the same synchronized block, the VM assumes it will succeed again this time and allows a slightly longer spin. If spinning on a synchronized block has never succeeded, the VM may omit the spinning entirely to avoid wasting cycles.

Talk is cheap — let's look at the actual code. Source path: src/share/vm/runtime/objectMonitor.cpp; search for TrySpin_VaryDuration

int ObjectMonitor::TrySpin_VaryDuration (Thread * Self) {
    int ctr = Knob_FixedSpin;
    if (ctr != 0) {
        // Fixed-count spin
        while (--ctr >= 0) {
            if (TryLock (Self) > 0) return 1 ;
            SpinPause () ;
        }
        return 0 ;
    }

    // Adaptive spin
    for (ctr = Knob_PreSpin + 1; --ctr >= 0 ; ) {
        if (TryLock(Self) > 0) {
            // Success: lengthen the spin duration for next time
            int x = _SpinDuration ;
            if (x < Knob_SpinLimit) {
                if (x < Knob_Poverty) x = Knob_Poverty ;
                _SpinDuration = x + Knob_BonusB ;
            }
            return 1 ;
        }
        SpinPause () ;
    }
}

7.4 Lock elimination

Let’s start with the following code:

 public String getContent() {
        return new StringBuffer().append("a").append("b").append("c").toString();
    }



 @Override
    public synchronized StringBuffer append(String str) {
        toStringCache = null;
        super.append(str);
        return this;
    }

The append method of StringBuffer is synchronized, but our getContent method creates a new StringBuffer object on every call. Different threads therefore lock different objects, so no thread-safety issue is possible. When the just-in-time (JIT) compiler runs, it detects that although the code asks for synchronization, there is no possibility of contention over shared data, and it removes the lock. This is lock elimination.

7.5 lock coarsening

What is lock coarsening? The JVM detects that a series of small operations are locking the same object, amplifies the scope of the synchronized code block, and places it outside the string of operations so that the lock only needs to be added once.

 public static void main(String[] args) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 100; i++) {
            sb.append("a");
        }
    }



 @Override
    public synchronized StringBuffer append(String str) {
        toStringCache = null;
        super.append(str);
        return this;
    }

Look at the code above: StringBuffer's append method carries the synchronized keyword, but we call it in a loop 100 times, so we would enter the lock 100 times and exit it 100 times. The JVM therefore coarsens the lock: it hoists the synchronization out of append and wraps it around the whole loop, so the lock only needs to be entered and exited once.

public static void main(String[] args) {
    StringBuffer sb = new StringBuffer();
    synchronized (sb) {
        for (int i = 0; i < 100; i++) {
            sb.append("a");
        }
    }
}

Eight, the strongest king

Final chapter: how to write code that helps synchronized optimization

We've finally hit King rank — but don't think reaching King is the end; there are some everyday habits we still need to mind.

Reduce the scope of sync blocks:

Keep synchronized blocks lean: the less code inside, the faster they execute, and a lightweight or spin lock may be enough instead of upgrading to a heavyweight lock.

public static void main(String[] args) {
    StringBuffer sb = new StringBuffer();
    synchronized (sb) {
        System.out.println("a");
    }
}

Reduce sync lock granularity:

Test01 and test02 contain no business code, but the point stands: when unrelated code paths share the same lock object, they needlessly block one another and concurrency suffers. Give each its own lock.

public class SyncExample4 {
 
    public void test01(){
        synchronized (SyncExample4.class){}
    }
    public void test02(){
        synchronized (SyncExample4.class){}
    }
}

Read/write separation:

As far as possible, read without locking, write and delete with locking, so that multiple threads can read data at the same time.

Here’s an example:

HashTable container under concurrent environment of intense competition, low efficiency because of multiple threads to compete the same lock, if the container have much lock, every lock part used to lock the container data, so multithreaded access container inside different data section of the data, the lock contention between threads will not exist, so as to effectively improve the concurrent access rates. This is the lock fragmentation technique of ConcurrentHashMap. Data is stored in segments and each segment is assigned a lock. When one thread accesses one segment, the other segments can be accessed by other threads.