One thing about synchronized that 99% of the articles on the Internet get wrong

I was originally writing my interview series, and while writing I arrived at this question:

Do you know how synchronized works?

As for synchronized, I went through the HotSpot JVM 1.8 source code last year to research it. At the time I found one claim that almost every article on the Internet makes, and that “The Art of Java Concurrent Programming” makes too:

When the lightweight lock's CAS fails, the current thread tries to acquire the lock by spinning.

That’s what I thought at first, because that’s what they all say, and it makes sense.

Because heavyweight locks block the thread, if the lock is held only very briefly, a little spinning lets the other thread get the lock without ever being blocked and woken up.

But that is not what I found when I read the source code, which is in synchronizer.cpp.

So if the CAS fails, there is no spin operation. If the CAS succeeds, it returns directly. If the CAS fails, the lock inflation method below is executed.

I also read through the lock inflation code, ObjectSynchronizer::inflate, and did not see any spin operation there either.

So according to the source code, a failed lightweight-lock CAS does not spin; the lock is inflated directly into a heavyweight lock.

To optimize performance, however, spin does exist in synchronized.

It happens after the lock has been upgraded to a heavyweight lock: if a thread fails to grab the lock, it spins for a while, waiting for the lock to be released.

Let the source code speak again; the comments alone already make it very clear:

After all, the overhead of queuing and waking up a blocked thread is a bit high.

Now look at the TrySpin operation, which is an adaptive spin. The actual function name, TrySpin_VaryDuration, shows that the spin duration varies.

So the spin in synchronized happens when a thread fails to win a heavyweight lock; lightweight locks do not spin (at least in the 1.8 source code). If someone disagrees with you, send them this article.

Having come this far, I’d like to go on and cover synchronized properly, since it comes up so often.

How deep does this article go into synchronized?

Later, if an interviewer asks you what source code you have read,

after reading this article you can answer: I’ve read the JVM source code.

Of course, there is quite a lot of source code. I went through all the operations related to synchronized, and it is still a little difficult.

However, readers who have read my source code analyses before know that I always draw a flow chart to tie things together, so even if the code is hard to follow, the flow is still clear!

All right, let’s go!

Let’s start with heavyweight locks

Before JDK 1.6, synchronized was only a heavyweight lock.

It blocks and wakes up threads, and those operations are implemented through operating system calls; on Linux, the usual implementation uses the pthread mutex.

I took a screenshot of the source code that blocks the thread, and you can see that a mutex is indeed used.

System calls involve a context switch between user mode and kernel mode, which we know is quite expensive.

That’s why it’s called a heavyweight lock, and that’s why there is the adaptive spin operation mentioned above: nobody wants to go this far!

Let’s look at how heavyweight locks work

The synchronized keyword can modify code blocks, instance methods, and static methods; in every case it essentially locks an object.

For a code block it is the object in parentheses, for an instance method it is the current instance, this, and for a static method it is the Class object of the current class.
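To make the three lock targets concrete, here is a minimal sketch. The class name LockTargets and its methods are made up for illustration; Thread.holdsLock is the standard JDK call that reports whether the current thread holds a given object's monitor.

```java
// Illustrative sketch: which object each form of synchronized locks on.
// LockTargets and its method names are hypothetical.
class LockTargets {
    private final Object lockObject = new Object();

    // Code block: locks the object in parentheses.
    boolean blockHoldsLockObject() {
        synchronized (lockObject) {
            return Thread.holdsLock(lockObject);
        }
    }

    // Instance method: locks the current instance, this.
    synchronized boolean instanceMethodHoldsThis() {
        return Thread.holdsLock(this);
    }

    // Static method: locks the Class object of the current class.
    static synchronized boolean staticMethodHoldsClass() {
        return Thread.holdsLock(LockTargets.class);
    }

    public static void main(String[] args) {
        LockTargets t = new LockTargets();
        System.out.println(t.blockHoldsLockObject());
        System.out.println(t.instanceMethodHoldsThis());
        System.out.println(staticMethodHoldsClass());
        // Outside any synchronized region the monitor is not held.
        System.out.println(Thread.holdsLock(t));
    }
}
```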

There’s a concept called the critical section.

A race exists because there is a shared resource that multiple threads want, so a region is drawn around the code that operates on that shared resource.

To enter this region you must hold the lock; otherwise you cannot enter. This region is called the critical section.

When synchronized modifies a code block

At this point, the compiled bytecode contains the monitorenter and monitorexit instructions. I like to think of them as marking the critical section: enter means entering the critical section and exit means leaving it.

These instructions are tied to the object the code block locks on, the lockObject in the code above.

Each object has a monitor associated with it, and the thread executing the monitorenter instruction attempts to acquire ownership of that monitor. If it succeeds, it has acquired the lock.

This monitor will be examined in detail later, but let’s first look at what the generated bytecode looks like.

The top of the picture shows the bytecode compiled from the lockObject method, and the bottom shows the lockObject method itself, which makes the two easier to compare.

As you can see from the screenshot, monitorenter executes before the System.out call; this is where the thread competes for the lock to enter the critical section.

The monitorexit instruction follows the call, releasing the lock and leaving the critical section.

I also marked a second monitorexit in the figure: the lock must also be released when an exception is thrown, otherwise there would be a deadlock.

From the generated bytecode, we can also see why synchronized does not need to be unlocked manually.

Someone is carrying the load for us! The compiler generates the unlocking bytecode for us, and exceptions are taken into account.
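Since the screenshots are not reproduced here, the following is a stand-in sketch rather than the original code: a hypothetical BlockDemo class with a lockObject method shaped like the one described above. Running javap -c on the compiled class should show one monitorenter and two monitorexit instructions for lockObject, the second one inside the compiler-generated exception handler. The extra method checks the observable effect of that handler: the lock is released even when the block throws.

```java
// Stand-in for the method in the lost screenshot; names are assumptions.
class BlockDemo {
    private final Object lockObject = new Object();
    private int entered;

    int lockObject() {
        synchronized (lockObject) {   // compiles to monitorenter
            entered++;
            System.out.println("in the critical section");
        }                             // compiles to monitorexit on the normal path
        return entered;
    }

    // The compiler also emits a monitorexit on the exceptional path,
    // so the monitor is no longer held once the exception has left the block.
    boolean releasedAfterException() {
        try {
            synchronized (lockObject) {
                throw new IllegalStateException("boom");
            }
        } catch (IllegalStateException e) {
            return !Thread.holdsLock(lockObject);
        }
    }
}
```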

When synchronized modifies a method

The bytecode generated for a synchronized method is not quite the same as that for a synchronized block, but it is essentially the same thing.

There are no monitorenter and monitorexit instructions in the bytecode this time; instead, the access flags of the method are different.

I’m using an IDEA plugin here to look at the bytecode, so the literal output looks different, but the flag is the same: 0x0021, a combination of ACC_PUBLIC and ACC_SYNCHRONIZED.

The principle is that a synchronized method is distinguished in the runtime constant pool by the ACC_SYNCHRONIZED flag, so the JVM knows that the method is synchronized. When the method is invoked, the thread first competes for the lock, and it can continue only after acquiring it.

Then, whether the method exits normally or with an exception, the lock is released, so the essence is the same.
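The 0x0021 value can also be checked without a bytecode viewer. This is a small sketch using the standard reflection API; the class SyncFlagDemo and its methods are hypothetical. Modifier.PUBLIC is 0x0001 and Modifier.SYNCHRONIZED is 0x0020, which correspond to ACC_PUBLIC and ACC_SYNCHRONIZED.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo class: reads method access flags through reflection.
class SyncFlagDemo {
    public synchronized void syncMethod() { }

    public void plainMethod() { }

    // Returns the modifier bits of the named public no-arg method.
    static int flagsOf(String methodName) {
        try {
            return SyncFlagDemo.class.getMethod(methodName).getModifiers();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        int flags = flagsOf("syncMethod");
        System.out.printf("0x%04x%n", flags);
        System.out.println(Modifier.isSynchronized(flags));
        System.out.println(Modifier.isSynchronized(flagsOf("plainMethod")));
    }
}
```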

There is also the implicit lock object mentioned above: for an instance method it is this, and for a static method it is the Class object of the current class (there are pitfalls here, which I analyzed in a separate article).

I remember an interview question, apparently from ByteDance, where the interviewer asked about the bytecode-level difference between synchronized methods and synchronized code blocks.

How about that? ByteDance just got a little closer without you noticing.

Let’s dive further into synchronized

From the above we know that synchronized acts on objects.

In Java, an object is laid out as an object header, instance data, and alignment padding.

The object header is divided into the MarkWord, the Klass pointer, and the array length (only arrays have it). Our focus is on locks, so we only care about the MarkWord.

Here is my drawing of the MarkWord memory layout in its different states on a 64-bit JVM. (The word monitor is mistyped in the figure, but I’m not going to fix it.)

The MarkWord structure is complicated because, to save memory, the same memory area is used for different purposes at different stages.

Remember this diagram; all locking operations are closely tied to the MarkWord.

As you can see in the figure, for a heavyweight lock the lock flag bits of the object header are 10 and the rest holds a pointer to the monitor object; this is how the lock object and the monitor are linked.
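As a rough illustration of how the low bits of the MarkWord select the state, here is a small decoder. It is only a sketch under assumptions: it works on a plain long value rather than a real object header, the class and method names are made up, and the bit patterns are the ones used by HotSpot 8 (01 unlocked, 101 biased, 00 lightweight, 10 heavyweight, 11 marked for GC).

```java
// Hypothetical helper: decodes the lock state from a raw mark word value.
class MarkWordBits {
    static String lockState(long markWord) {
        long lockBits = markWord & 0b11;            // lowest two bits: lock flag
        if (lockBits == 0b00) return "lightweight"; // rest points to a LockRecord
        if (lockBits == 0b10) return "heavyweight"; // rest points to the monitor
        if (lockBits == 0b11) return "gc-marked";
        // lockBits == 01: the third bit tells biased (101) from unlocked (001)
        return (markWord & 0b100) != 0 ? "biased" : "unlocked";
    }

    // For a heavyweight lock, clearing the two flag bits leaves the monitor pointer.
    static long monitorAddress(long markWord) {
        return markWord & ~0b11L;
    }

    public static void main(String[] args) {
        System.out.println(lockState(0x1L));
        System.out.println(lockState(0x5L));
        System.out.println(lockState(0x7f001000L));
        System.out.println(lockState(0x7f001002L));
    }
}
```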

The monitor is implemented in C++ in HotSpot as ObjectMonitor, which is an implementation of the monitor construct from concurrency theory.

It looks like this. I have annotated the meanings of the key fields, and I also captured the comments from the header file:

Keep these fields in mind for now; the source code that follows relies on them heavily.

The underlying principle of synchronized

Here is a flow chart first. If you don’t understand all of it yet, just get a general impression of the flow:

Okay, let’s move on.

Earlier we mentioned the monitorenter instruction. It ends up executing the following code:

We’re currently analyzing heavyweight locks, so ignore the biased locking code. The slow_enter method, shown in the screenshot at the beginning of the article, eventually leads to ObjectMonitor::enter.

The key is setting _owner in ObjectMonitor to the current thread via CAS. If that succeeds, the lock has been acquired.

Reentrancy is then represented by incrementing _recursions.

If the CAS fails, the following loop is executed:

The EnterI code already appears in a screenshot above. Here it is again, with the important enqueue operations included and some unimportant code removed:

First try to get the lock; if that fails, spin adaptively; if that fails too, wrap the thread in an ObjectWaiter object and add it to the singly linked _cxq list. After one more attempt, if the lock still cannot be acquired, the thread has to block, so next comes the blocking method.
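The order of steps just described (CAS, spin, enqueue, retry, park) can be imitated in plain Java. This is a toy under stated assumptions, not the HotSpot code: ToyMonitor and ToyMonitorDemo are hypothetical names, a ConcurrentLinkedQueue stands in for _cxq, the spin count is fixed instead of adaptive, and reentrancy is left out.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Toy, non-reentrant lock that follows the enter sequence described above.
class ToyMonitor {
    private static final int SPINS = 100; // fixed here; HotSpot adapts the duration
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private final ConcurrentLinkedQueue<Thread> waiters = new ConcurrentLinkedQueue<>();

    private boolean tryLock(Thread self) {
        return owner.compareAndSet(null, self); // CAS the owner to the current thread
    }

    void enter() {
        Thread self = Thread.currentThread();
        if (tryLock(self)) return;              // 1. fast path
        for (int i = 0; i < SPINS; i++) {       // 2. spin for a while
            if (tryLock(self)) return;
        }
        waiters.add(self);                      // 3. enqueue
        while (!tryLock(self)) {                // 4. retry after enqueuing, then block
            LockSupport.park(this);
        }
        waiters.remove(self);
    }

    void exit() {
        owner.set(null);                        // release first
        Thread next = waiters.peek();           // then wake one waiter, if any
        if (next != null) LockSupport.unpark(next);
    }
}

class ToyMonitorDemo {
    private static int counter;

    // Runs several threads that increment a plain int under the toy lock.
    static int run(int threads, int perThread) {
        counter = 0;
        ToyMonitor monitor = new ToyMonitor();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    monitor.enter();
                    try { counter++; } finally { monitor.exit(); }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }
}
```

Retrying once more after enqueuing matters: a thread that parked without rechecking could miss a release that happened just before it joined the queue.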

You can see that either branch executes Self->_ParkEvent->park(), which is where the pthread_mutex_lock call mentioned above happens.

So far, the lock acquisition process is clear; let me draw a picture to sum it up.

Now let’s look at how to unlock it

ObjectMonitor::exit is the method called when unlocking.

As you can see, a reentrant exit does not release the lock: if _recursions is not 0, it is just decremented, and the lock is really released only when _recursions is 0.
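The owner and recursion bookkeeping on enter and exit can be modelled in a few lines. This is a sketch with hypothetical names (ReentrantOwner, tryEnter), covering only the uncontended and reentrant paths; where it returns false, the real code would spin, enqueue, and park.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of the _owner / _recursions bookkeeping.
class ReentrantOwner {
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private int recursions; // only touched by the owning thread

    boolean tryEnter() {
        Thread self = Thread.currentThread();
        if (owner.compareAndSet(null, self)) return true; // first acquisition
        if (owner.get() == self) {                        // reentry by the owner
            recursions++;
            return true;
        }
        return false;                                     // contended
    }

    void exit() {
        if (owner.get() != Thread.currentThread()) {
            throw new IllegalMonitorStateException();
        }
        if (recursions != 0) {   // reentrant exit: just count down
            recursions--;
            return;
        }
        owner.set(null);         // outermost exit: really release
    }

    boolean isLocked() { return owner.get() != null; }

    int recursions() { return recursions; }
}
```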

After unlocking, the thread wakes up a waiting thread. There are several modes; let’s look at them.

If QMode == 2 && _cxq != NULL:

If QMode == 3 && _cxq != NULL, here is an excerpt of the code:

If QMode == 4 && _cxq != NULL:

If QMode is not 2, it goes on to execute:

At this point, the process of unlocking is complete! Let me draw another flow chart:

Let’s look at what happens when wait is called

Nothing fancy: the current thread is added to the _WaitSet doubly linked list, and then ObjectMonitor::exit is executed to release the lock.

Now look at what happens when notify is called

Nothing fancy here either: take the node at the head of _WaitSet and, depending on the policy, move it to the head or tail of _cxq or _EntryList so that it can be woken up.

As for notifyAll, I won’t analyze it; it is just notify in a loop.
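Seen from the Java side, wait and notify look like this. It is a minimal sketch with made-up class names (Mailbox, MailboxDemo): the taking thread releases the monitor inside wait and has to reacquire it after being notified, which is why the condition is rechecked in a loop.

```java
// Minimal wait/notify example; names are hypothetical.
class Mailbox {
    private String message;

    synchronized void put(String text) {
        message = text;
        notify();               // moves one waiter out of the wait set
    }

    synchronized String take() throws InterruptedException {
        while (message == null) {
            wait();             // joins the wait set and releases the monitor
        }
        String text = message;
        message = null;
        return text;
    }
}

class MailboxDemo {
    // One thread puts a message, the calling thread takes it.
    static String roundTrip(String text) {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> box.put(text));
        producer.start();
        try {
            String received = box.take();
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```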

At this point we have gone through the main operations of synchronized; now you can go out and say you have studied synchronized in depth.

Now if you look at this picture, you should have a pretty good idea.

Why are there two lists, _cxq and _EntryList, for waiting threads?

Because multiple threads compete for the lock at the same time, the singly linked _cxq list absorbs the concurrent enqueues using CAS, and the doubly linked _EntryList receives some thread nodes at each wake-up, which reduces contention on the tail of _cxq.
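The division of labour between the two lists can be sketched as follows. This is an illustration under assumptions, not the HotSpot code: CxqSketch is a hypothetical class, nodes carry a name instead of a thread, and the drain step simply appends in _cxq order, whereas the real exit path chooses the order according to QMode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Toy model: contending threads push onto cxq with CAS,
// the lock owner later moves the nodes into entryList.
class CxqSketch {
    static final class Node {
        final String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    private final AtomicReference<Node> cxq = new AtomicReference<>();
    private final List<String> entryList = new ArrayList<>(); // owner-only, no CAS needed

    // Lock-free push onto the head of the singly linked list.
    void push(String name) {
        Node node = new Node(name);
        do {
            node.next = cxq.get();
        } while (!cxq.compareAndSet(node.next, node));
    }

    // Detaches the whole cxq in one atomic step and appends it to entryList.
    int drainToEntryList() {
        int moved = 0;
        for (Node n = cxq.getAndSet(null); n != null; n = n.next) {
            entryList.add(n.name);
            moved++;
        }
        return moved;
    }

    List<String> entryList() { return entryList; }
}
```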

The introduction of spin

The principle of synchronized should be generally clear by now. We also know that system calls are used underneath, with considerable overhead, so how can this be optimized?

As the heading says, the solution is spin. I covered it at the beginning of the article, but let me mention it again.

Spin is essentially the CPU idling, executing meaningless instructions, so that the thread stays on the CPU while it waits for the lock to be released.

Normally, a thread that fails to get the lock enqueues and blocks. But sometimes, just after it blocks, another thread releases the lock and wakes it up again, so that block and wake-up were unnecessary.

So when contention is not fierce, a little spinning may let the thread acquire the lock without blocking, which avoids unnecessary overhead and improves lock performance.

However, choosing the number of spins is hard. Under heavy contention, spinning wastes CPU, because after spinning for a while the thread will still end up blocking.

So Java introduced adaptive spin, which dynamically adjusts the spin count based on how earlier spins turned out; call it learning from history.
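The idea of letting history steer the spin count can be shown with a deliberately simplified model. Everything here is an assumption for illustration: the class name, the constants, and the bonus and penalty rule are invented and are not the values or the exact algorithm used by TrySpin_VaryDuration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy adaptive spin: successful spins earn a larger budget, failed spins shrink it.
class AdaptiveSpin {
    static final int MIN_SPINS = 10;    // hypothetical bounds and step sizes
    static final int MAX_SPINS = 5000;
    static final int BONUS = 100;
    static final int PENALTY = 200;

    private int spinLimit = 1000;       // racy on purpose; it is only a heuristic

    // Tries to take the lock by spinning; returns false if the caller should block.
    boolean trySpin(AtomicBoolean locked) {
        for (int i = 0; i < spinLimit; i++) {
            if (locked.compareAndSet(false, true)) {
                spinLimit = Math.min(MAX_SPINS, spinLimit + BONUS);
                return true;
            }
        }
        spinLimit = Math.max(MIN_SPINS, spinLimit - PENALTY);
        return false;
    }

    int spinLimit() { return spinLimit; }
}
```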

Note that all of this is in the heavyweight lock stage; don’t forget the beginning of the article.

At this point, the principle of the synchronized heavyweight lock should be clear, right? A quick summary:

synchronized is implemented with the monitor object, CAS, and the mutex. Inside there are the wait queues (_cxq and _EntryList) and the condition wait queue (_WaitSet), which hold the corresponding blocked threads.

A thread that fails to win the lock is stored in the wait queue; a thread that holds the lock and calls wait is stored in the condition wait queue. Unlocking and notify wake up waiting threads in the corresponding queue so they can compete for the lock.

However, because blocking and waking depend on the underlying operating system, and system calls switch between user mode and kernel mode, the overhead is high; hence the name heavyweight lock.

Therefore, the adaptive spin mechanism was introduced to improve lock performance.

Now it’s time to introduce lightweight locks

Consider a scenario where multiple threads request the same lock at different times: there is no need to block threads at all, and the monitor object is not even needed. So the lightweight lock was introduced to avoid system calls and reduce overhead.

When lock contention is not intense, this scenario is common, perhaps even the norm, so introducing lightweight locks was worthwhile.

Before explaining how lightweight locks work, look at the MarkWord diagram again.

The lightweight lock operates on the MarkWord of the object header.

If the lock object is found to be unlocked, an area called the LockRecord is set aside in the current stack frame of the current thread, and the lock object’s MarkWord is copied into the LockRecord as the DHW (this is the set_displaced_header call).

The lock object’s header is then made to point to the LockRecord via CAS.

Lightweight lock locking process:

If the lock is already held by the current thread, null is stored in the DHW instead; this is the reentrancy logic.

Now let’s look at the unlocking logic for lightweight locks:

The logic is simple: swap the MarkWord (DHW) stored in the LockRecord of the current stack frame back into the object header via CAS.

If the DHW is null, this was a reentrant acquisition, so return directly. Otherwise, use CAS to swap it back. If the CAS fails, there is contention at this point, so inflate!

Let me say a few more words about this lightweight locking.

Each lock acquisition happens inside a method call, and a method call means a stack frame. If it is a reentrant lightweight lock acquisition, the DHW in that frame’s LockRecord is null; otherwise it is the lock object’s MarkWord.

In this way, the DHW value tells whether the acquisition was reentrant.
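The lock and unlock steps above can be traced with a toy model. It is only a sketch under assumptions: ThinLockSketch is a hypothetical class, an AtomicReference stands in for the MarkWord, LockRecord objects live on the heap rather than in a stack frame, and inflation is reduced to setting a flag.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of lightweight locking with a displaced header word (DHW).
class ThinLockSketch {
    static final Object UNLOCKED = "unlocked mark word"; // stand-in for the unlocked MarkWord

    static final class LockRecord {
        final Thread owner = Thread.currentThread();
        Object displacedHeader; // the DHW; null means a reentrant acquisition
    }

    private final AtomicReference<Object> markWord = new AtomicReference<>(UNLOCKED);
    private boolean inflated;   // stands in for calling inflate

    LockRecord lock() {
        LockRecord record = new LockRecord();
        Object mark = markWord.get();
        if (mark == UNLOCKED) {
            record.displacedHeader = mark;                // copy the MarkWord into the record
            if (markWord.compareAndSet(mark, record)) {   // point the header at the record
                return record;
            }
        } else if (mark instanceof LockRecord && ((LockRecord) mark).owner == record.owner) {
            record.displacedHeader = null;                // already ours: reentrant
            return record;
        }
        inflated = true;                                  // contention: no spin, inflate
        return null;
    }

    void unlock(LockRecord record) {
        if (record.displacedHeader == null) return;       // reentrant exit, nothing to restore
        if (!markWord.compareAndSet(record, record.displacedHeader)) {
            inflated = true;                              // CAS failed: contention, inflate
        }
    }

    boolean isLocked() { return markWord.get() != UNLOCKED; }

    boolean isInflated() { return inflated; }
}
```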

Now we’re going to introduce biased locking

Consider a scenario where, from beginning to end, only one thread acquires the lock and no other thread competes for it; then even the repeated CAS operations are unnecessary, and CAS has a cost too.

So the JVM engineers created the biased lock: the lock is biased toward one thread, so that this thread can acquire it directly.

Let’s look at the diagram again; the biased lock is in the second row.

If the lock object supports biased locking, a CAS records the address of the current thread (which also serves as a unique ID) in the MarkWord and sets the last three bits of the mark field to 101.

When a thread later requests the lock, it only needs to check whether the last three bits of the MarkWord are 101 and whether it points to the address of the current thread.

Another point that many articles miss: it also has to check whether the epoch value is the same as the epoch value in the class of the lock object.

If so, the current thread holds the biased lock and can return directly.

What is the epoch for?

It can be understood as the generation of the biased lock.

When a biased lock is contended, it is revoked and upgraded to a lightweight lock.

If the objects of a class, say a class named Yes, have their biased locks revoked too often and the count reaches a threshold (-XX:BiasedLockingBulkRebiasThreshold, default 20), the biased locks of the current generation are abandoned and the epoch of the class is incremented by one.

Therefore, when the epoch values of the class and the lock object differ, the current thread can rebias the lock to itself, because the previous generation’s bias is obsolete.

However, to make sure that a thread currently holding the lock does not lose it, this operation requires all threads to be at a safepoint; the JVM then walks the Java stacks of all threads to find the instances of the class that are currently locked and increments the epoch value in their mark field by one.

When the revocation count exceeds another threshold (-XX:BiasedLockingBulkRevokeThreshold, default 40), biased locking is abandoned for the class altogether; that is, instances of this class can no longer be biased.
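To see how the epoch and the two thresholds interact, here is a toy model. It is a sketch under assumptions only: BiasSketch and its nested classes are hypothetical, the MarkWord is reduced to a thread ID plus an epoch, safepoints and stack walking are left out, and the counting rule is simplified compared with the real heuristics.

```java
// Toy model of the biased-lock ownership check and the bulk rebias / bulk revoke thresholds.
class BiasSketch {
    static final int BULK_REBIAS_THRESHOLD = 20; // default of -XX:BiasedLockingBulkRebiasThreshold
    static final int BULK_REVOKE_THRESHOLD = 40; // default of -XX:BiasedLockingBulkRevokeThreshold

    // Per-class state: current epoch, revocation count, whether biasing is still allowed.
    static final class ClassState {
        int epoch;
        int revocations;
        boolean biasable = true;
    }

    // Simplified MarkWord of a biased object: owning thread ID plus the epoch it was biased in.
    static final class Mark {
        final long threadId;
        final int epoch;
        Mark(long threadId, int epoch) {
            this.threadId = threadId;
            this.epoch = epoch;
        }
    }

    // The fast path: same thread and same epoch as the class.
    static boolean ownsBias(Mark mark, ClassState klass, long threadId) {
        return klass.biasable && mark.threadId == threadId && mark.epoch == klass.epoch;
    }

    // A stale epoch means the old bias is obsolete and the lock may be rebiased.
    static boolean canRebias(Mark mark, ClassState klass) {
        return klass.biasable && mark.epoch != klass.epoch;
    }

    static void recordRevocation(ClassState klass) {
        klass.revocations++;
        if (klass.revocations >= BULK_REVOKE_THRESHOLD) {
            klass.biasable = false;   // bulk revoke: the class is no longer biasable
        } else if (klass.revocations == BULK_REBIAS_THRESHOLD) {
            klass.epoch++;            // bulk rebias: start a new generation
        }
    }
}
```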

The whole Synchronized process should be clear by now.

I explained the lock upgrade process in reverse, because that is the real order of events: heavyweight locks came first, and biased locks and lightweight locks were optimizations added after analyzing actual usage.

Some details along the way should be clearer now too; I think this is about as far as understanding synchronized needs to go.

Here’s one more diagram from the OpenJDK wiki; see whether it all makes sense now:

Finally

The reason I went through the source code is that I had read plenty of material, but many details remained unclear, which bothered me a lot, so there was nothing for it but to bite the bullet and read the source.

For someone like me who basically can’t write C++, this was a bit difficult... It took about a week, on and off.

I didn’t plan to write this much; I only wanted to write about the spin part... but once I started I couldn’t stop.

If there are any mistakes, let me know.

This article has a bit too much code; I don’t know how many people can bear to read it all...

I think everyone who has read this far is a master! Can you give me a like?

Standing on the shoulders of giants

“In-depth Dismantling of Java Virtual Machine” Zheng Yudi

Wiki.openjdk.java.net/display/Hot…

Docs.oracle.com/javase/spec…

Search WeChat for [yes training level raiders] to find more of my articles. My article index: github.com/yessimida/y… Stars are welcome!

I am yes, going from a little bit to a billion little bits. You’re welcome to read, share, and leave a comment. See you next time.
