One. Foreword

In “Java Beginner Series (2): The Synchronized Keyword”, we introduced the usage, role, and some of the lower-level principles of Synchronized:

  • When is bytecode injection used?
  • When are flag bits used?
  • When is an object lock used?
  • When is a class lock used?

If you have forgotten, check out Java Beginner Series (2): Synchronized.

If Synchronized were fully mutually exclusive every time we synchronized data across threads, the lock would be heavy and perform poorly in our eyes. For this reason, locks have been optimized in the JDK since 1.6 so that they are not completely exclusive when that is unnecessary. In this article we dive into the underlying JVM to talk about Java locks.

Two. Lock Optimization

2.1 Locks Provided by the Operating System

If you have worked with UNIX/Linux, you should be familiar with the primitives threads use to synchronize:

  • Mutex;
  • Condition variable + sleep/wake mechanism (if the condition is not met, the thread automatically sleeps until the condition is met);
  • Read/write lock (multiple threads may read simultaneously, so reads can be considered shared; a write is exclusive and blocks all other reads and writes);
  • Semaphore (sem).

Differences between a mutex and a semaphore:

  • A mutex allows only one thread into the critical region, while a semaphore allows multiple threads into the critical region;
  • A semaphore also emphasizes ordering when multiple threads enter the critical region;
  • Therefore a semaphore fits better when there are multiple resources; when the resource count drops to 1, it degenerates into a mutex.

Java’s Synchronized is built on the mutex model: only one thread can hold the lock at a time, while the others block.
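As a quick refresher, here is a minimal sketch of that mutual exclusion (our own illustration; `MutexDemo` is a hypothetical class, not from the series):

```java
public class MutexDemo {
    private int count = 0;

    // synchronized uses the instance's monitor: one thread inside, others block
    private synchronized void increment() {
        count++;
    }

    public static void main(String[] args) throws InterruptedException {
        MutexDemo demo = new MutexDemo();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                demo.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(demo.count); // always 20000 thanks to mutual exclusion
    }
}
```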

2.2 Where Synchronized Locks Are Stored

When multiple threads synchronize, a lock controls which threads enter and which block, so the lock itself must be stored somewhere. Let’s think about what information a lock might contain:

  • Lock state: unlocked or locked;
  • The current holder (thread);
  • The currently waiting threads.

At the same time, consider another problem: threads accessing concurrently must first try to acquire the lock; that is, multiple threads apply for the lock concurrently and then set it, so setting the lock itself would seem to need synchronization, which looks like infinite regress. Java’s clever designers spotted this problem and came up with a clever solution: store the lock information directly in the object itself, in the object header, and update that header word with a single atomic compare-and-swap (CAS) instruction rather than with another lock.
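To see how an atomic CAS breaks the regress, here is a minimal sketch of a toy lock built on `java.util.concurrent.atomic` (our own illustration; HotSpot does the equivalent on the Mark Word in C++):

```java
import java.util.concurrent.atomic.AtomicReference;

// A toy lock: the "lock word" is a single reference updated atomically by CAS.
public class CasLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void lock() {
        Thread current = Thread.currentThread();
        // compareAndSet is one atomic hardware instruction: no extra lock needed
        while (!owner.compareAndSet(null, current)) {
            Thread.yield(); // busy-wait; a real lock would queue/park the thread
        }
    }

    public void unlock() {
        owner.compareAndSet(Thread.currentThread(), null);
    }
}
```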

2.3 The Object Header

When an object is created in the JVM (after its class has been loaded), the object’s memory layout contains an object header in addition to the instance data itself.

  • For an ordinary object, the header holds two pieces of information: the Mark Word and the Class Metadata Address (a pointer to the class metadata);
  • If the object is an array, the header additionally holds the Array Length.

Note:

  • The Mark Word and the Class Metadata Address are each 32 bits on a 32-bit JVM and 64 bits on a 64-bit JVM;
  • The Array Length is always 32 bits.
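If you want to see the header for yourself, the OpenJDK JOL (Java Object Layout) tool can print it. A minimal sketch, assuming the `org.openjdk.jol:jol-core` dependency is on the classpath:

```java
import org.openjdk.jol.info.ClassLayout;

public class HeaderDemo {
    public static void main(String[] args) {
        Object o = new Object();
        // Prints the object's layout, including the mark word and class pointer
        System.out.println(ClassLayout.parseInstance(o).toPrintable());

        synchronized (o) {
            // Inside the block the mark word's lock bits change
            System.out.println(ClassLayout.parseInstance(o).toPrintable());
        }
    }
}
```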

2.4 Mark Word

Let’s look at the format of the Mark Word on a 32-bit system (the 64-bit layout is analogous, with wider fields):
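The original figure is not reproduced here; as a stand-in, this is the commonly cited 32-bit HotSpot layout (reconstructed by us from OpenJDK sources, so treat the exact bit widths as an assumption, not the article’s own figure):

| Lock state       | Upper bits                                              | Biased bit | Lock flag |
|------------------|---------------------------------------------------------|------------|-----------|
| No lock          | identity hashcode (25 bits) + GC age (4 bits)           | 0          | 01        |
| Biased lock      | thread ID (23 bits) + epoch (2 bits) + GC age (4 bits)  | 1          | 01        |
| Lightweight lock | pointer to lock record on the owner's stack (30 bits)   | n/a        | 00        |
| Heavyweight lock | pointer to the heavyweight monitor (30 bits)            | n/a        | 10        |
| GC mark          | forwarding information (30 bits)                        | n/a        | 11        |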

We can see:

  • The Mark Word stores the lock information alongside the identity hashcode and the GC generational age;
  • The lock states, in order of escalation: no lock -> biased lock -> lightweight lock -> heavyweight lock.

Three. Lock States

Before introducing the four lock states, let’s start with a diagram from official OpenJDK material:

The figure above shows how JDK 1.6 optimized Synchronized locking. Before that, there were only two states: no lock and heavyweight lock, which is why we used to say Synchronized was heavy and performed poorly. With the JDK 1.6 optimizations, Synchronized rarely “inflates” all the way to the heavyweight state outside of extreme circumstances. Note that word: inflate! Going from no lock to a heavyweight lock is a process of continuous inflation, and it is irreversible!

3.1 Biased Locks


Purpose: reduce the cost of acquiring the lock again for the same thread!

In most cases a lock is not contended by multiple threads and is always acquired repeatedly by the same thread, so the lock “biases” toward that thread.

When a thread acquires the lock, the lock enters biased mode and the Mark Word records that thread’s ID. When the same thread requests the lock again, no synchronization operation is needed: it only checks that the Mark Word’s lock flag still reads “biased” and that the recorded thread ID is its own. The full lock-acquisition procedure is therefore skipped.
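A sketch of the pattern biased locking targets: one thread re-entering the same lock over and over (our own illustration; note that biased locking was disabled by default and deprecated in JDK 15 by JEP 374, and on older JDKs it can be toggled with -XX:+/-UseBiasedLocking):

```java
public class BiasedDemo {
    private static final Object lock = new Object();
    private static int counter = 0;

    public static void main(String[] args) {
        // One thread, many acquisitions: after the first acquisition the mark
        // word stores this thread's ID, so re-entry is just an ID comparison.
        for (int i = 0; i < 1_000_000; i++) {
            synchronized (lock) {
                counter++;
            }
        }
        System.out.println(counter);
    }
}
```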

3.2 Lightweight Locks

A lightweight lock comes from inflating (escalating) a biased lock: as soon as a second thread requests the same object lock, the lock is upgraded to lightweight. Note that this only requires a second thread to apply for the lock, not two threads competing for it at the same moment; for example, two threads that alternately execute a synchronized block, one after the other, as in the sketch below.
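A sketch of that alternating, non-simultaneous use (our own illustration): each thread takes the lock only after the other has released it, so a lightweight lock suffices and no heavyweight monitor is needed.

```java
public class AlternatingDemo {
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        // t1 finishes and releases the lock before t2 starts: two threads use
        // the lock, but never at the same time.
        Thread t1 = new Thread(() -> {
            synchronized (lock) {
                System.out.println("t1 in critical section");
            }
        });
        t1.start();
        t1.join(); // ensure t1 is completely done before t2 begins

        Thread t2 = new Thread(() -> {
            synchronized (lock) {
                System.out.println("t2 in critical section");
            }
        });
        t2.start();
        t2.join();
    }
}
```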

3.3 Heavyweight Locks

Similarly, once multiple threads compete for the lock at the same time, the lock immediately inflates to a heavyweight lock, and the cost of acquiring and releasing it grows.
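A sketch of true contention (our own illustration): several threads hammer the same lock simultaneously, so it inflates to a heavyweight monitor and the losing threads are parked by the operating system.

```java
public class ContendedDemo {
    private static final Object lock = new Object();
    private static long counter = 0;

    public static void main(String[] args) throws InterruptedException {
        // Four threads competing for the same lock at the same time.
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    synchronized (lock) {
                        counter++;
                    }
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(counter); // always 400000
    }
}
```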

3.4 Thread States

Locks take different forms, and threads competing for them also move through different states.

Let’s start by looking at the states of a thread in the operating system (simplified version):

Now let’s look at the states of a Java thread:

Why show the operating system’s thread states? Comparing the two diagrams, you can see there is essentially no difference between them. If we abstract further (a simpler generalization of thread state), there are only three kinds, listed below with a sketch after the list:

  • Ready;
  • Running;
  • Blocked/sleeping.
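In Java these map onto the `java.lang.Thread.State` enum (NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, TERMINATED). A quick sketch (our own illustration) that observes a thread becoming BLOCKED on a monitor:

```java
public class StateDemo {
    public static void main(String[] args) throws InterruptedException {
        final Object lock = new Object();
        Thread blocked = new Thread(() -> {
            synchronized (lock) { } // blocks until main releases the lock
        });
        synchronized (lock) {
            blocked.start();
            Thread.sleep(100); // give the thread time to hit the monitor
            System.out.println(blocked.getState()); // BLOCKED
        }
        blocked.join();
        System.out.println(blocked.getState()); // TERMINATED
    }
}
```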

Four. Monitor

The JVM uses a Monitor to track threads entering and exiting critical sections; how the lock is stored and inflated was covered in the “Lock States” section above. In this section we dig a little deeper into HotSpot, the reference JVM implementation, to see roughly how the monitor works.

4.1 The Monitor’s Internal Structures

  • ContentionList: every thread requesting the lock first enters this queue;
  • EntryList: threads from the ContentionList that qualify as candidates are moved to this queue;
  • WaitSet: threads blocked by calling the wait method are placed here;
  • OnDeck: at most one thread at a time is the designated candidate competing for the lock; that thread is called OnDeck;
  • Owner: the thread currently holding the lock.

4.2 ContentionList: a Virtual Queue

The ContentionList is not a real queue but a virtual one: it consists only of Nodes linked by their next pointers, so it is a queue in logic only. Moreover, it is not first-in, first-out (FIFO) but last-in, first-out (LIFO): every new Node is added at the head of the list, by atomically swinging the head pointer to the new node and setting the new node’s next to the former head, as the sketch below illustrates.
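A minimal sketch of that lock-free push-at-head, in Java for readability (our own illustration; HotSpot does this in C++ on its own node type):

```java
import java.util.concurrent.atomic.AtomicReference;

// A virtual LIFO "queue": nodes linked by next pointers, pushed at the head.
public class VirtualQueue {
    static final class Node {
        final Thread thread;
        Node next;
        Node(Thread thread) { this.thread = thread; }
    }

    private final AtomicReference<Node> head = new AtomicReference<>();

    // Lock-free insertion: point the new node at the current head, then try
    // to swing head to the new node; retry if another thread got in first.
    public void push(Thread t) {
        Node node = new Node(t);
        Node oldHead;
        do {
            oldHead = head.get();
            node.next = oldHead;
        } while (!head.compareAndSet(oldHead, node));
    }
}
```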

4.3 EntryList

The ContentionList is accessed concurrently by many threads. To reduce contention on it, the EntryList exists: when unlocking, the Owner thread migrates threads from the ContentionList into the EntryList and designates one of them (usually the head) as the Ready thread, called OnDeck. The Owner does not hand the lock to the OnDeck thread directly; it only hands over the right to compete, and OnDeck still has to win the lock itself. This sacrifices some fairness but greatly improves throughput; in HotSpot this behavior is called a “competitive handoff”.

4.4 OnDeck

If OnDeck wins the lock, it becomes the Owner thread; if not, it stays in the EntryList, and for fairness its position there does not change (it remains at the head). If the Owner thread blocks on wait(), it moves to the WaitSet queue; when it is later woken by notify()/notifyAll(), it re-enters the EntryList.
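A minimal wait/notify sketch showing that round trip (our own illustration): the waiting thread leaves the monitor for the WaitSet, and notify() moves it back to compete for the lock again.

```java
public class WaitNotifyDemo {
    private static final Object lock = new Object();
    private static boolean ready = false;

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                while (!ready) {
                    try {
                        lock.wait(); // releases the lock, joins the WaitSet
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                System.out.println("woken up, lock re-acquired");
            }
        });
        waiter.start();

        Thread.sleep(100);
        synchronized (lock) {
            ready = true;
            lock.notify(); // moves the waiter back toward the EntryList
        }
        waiter.join();
    }
}
```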

4.5 ObjectMonitor: the Monitor’s Structure

The Monitor is itself an object with its own structure. Its blocking behavior depends on the operating system’s implementation and requires switching between user mode and kernel mode, which carries a real performance overhead.

```cpp
ObjectMonitor::ObjectMonitor() {
  _header       = NULL;
  _count        = 0;
  _waiters      = 0;
  _recursions   = 0;
  _object       = NULL;
  _owner        = NULL;   // the thread holding the Monitor: the Owner
  _WaitSet      = NULL;   // threads that called wait()
  _WaitSetLock  = 0;
  _Responsible  = NULL;
  _succ         = NULL;
  _cxq          = NULL;   // the ContentionList
  FreeNext      = NULL;
  _EntryList    = NULL;   // candidate threads
  _SpinFreq     = 0;      // spinning
  _SpinClock    = 0;
  OwnerIsThread = 0;
}
```

If a thread blocks, it enters the kernel: the switch from user mode to kernel mode adds to the lock’s overhead. For a long block this is acceptable; but if the block is very short, the thread may switch into kernel mode only to switch straight back out again. The remedy for such very short blocks is spinning: busy-wait in user mode for a short while instead of blocking.

While spinning avoids the mode-switch overhead of blocking, it burns CPU in the meantime (for example, spinning for 100 iterations can consume a noticeable share of a CPU time slice).
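A minimal spin-lock sketch in Java (our own illustration, using `AtomicBoolean`; HotSpot’s adaptive spinning is implemented in C++ and is far more sophisticated):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait in user mode instead of blocking into the kernel.
        // Cheap if the lock frees up quickly; wasteful if it does not.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // JDK 9+ hint that we are spinning
        }
    }

    public void unlock() {
        locked.set(false);
    }
}
```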

Five. Other Lock Optimizations

5.1 Lock Elimination

Lock elimination happens at JIT compile time: the compiler scans the running context (escape analysis) and removes locks that cannot actually be contended.

```java
public class Test {
    public void method() {
        // object is local and never escapes, so no other thread can lock it
        Object object = new Object();
        synchronized (object) {
            System.out.println("no contention is possible at runtime");
        }
    }
}
```

After JIT optimization it becomes:

```java
public class Test {
    public void method() {
        Object object = new Object();
        System.out.println("no contention is possible at runtime");
    }
}
```

5.2 Lock Coarsening

This optimization goes the other way: it reduces the number of lock acquisitions and releases by widening the lock’s scope.

```java
public class Test {
    public void method() {
        // the lock is acquired and released 10000 times
        for (int i = 0; i < 10000; i++) {
            synchronized (this) {
                System.out.println("inside the lock");
            }
        }
    }
}
```

It is optimized to:

```java
public class Test {
    public void method() {
        // one acquisition and one release around the whole loop
        synchronized (this) {
            for (int i = 0; i < 10000; i++) {
                System.out.println("inside the lock");
            }
        }
    }
}
```

Note that both of these optimizations can often be anticipated by the programmer: with good design before coding, these situations can usually be avoided in the first place.

Six. Conclusion

Synchronized is an important part of concurrent programming, and the JDK’s continuous optimization of it shows just how important it is. Only by truly understanding how it works can we use it for good runtime performance.