This article is based on Android 11(R)
Locking a critical section in Java is usually done with a synchronized block, so the "lock" in the title is really a dissection of the synchronized keyword. A synchronized block must be given an object, which can be this, a class object (e.g. Foo.class), or any other object. We can therefore say that the state of the lock is associated with the object; or, put another way, every object is inherently a lock.
The bytecode generated by synchronized corresponds to two instructions, monitor-enter and monitor-exit. Let's trace the final implementation of monitor-enter along two paths: interpreted execution and machine-code execution.
Interpreted execution
[art/runtime/interpreter/interpreter_switch_impl-inl.h]
HANDLER_ATTRIBUTES bool MONITOR_ENTER() {
  ...
  ObjPtr<mirror::Object> obj = GetVRegReference(A());
  if (UNLIKELY(obj == nullptr)) {
    ...
  } else {
    DoMonitorEnter<do_assignability_check>(self, &shadow_frame, obj);  // <=== call
    ...
  }
}
[art/runtime/interpreter/interpreter_common.h]
static inline void DoMonitorEnter(Thread* self, ShadowFrame* frame, ObjPtr<mirror::Object> ref)
    NO_THREAD_SAFETY_ANALYSIS
    REQUIRES(!Roles::uninterruptible_) {
  ...
  StackHandleScope<1> hs(self);
  Handle<mirror::Object> h_ref(hs.NewHandle(ref));
  h_ref->MonitorEnter(self);  // <=== call
  ...
}
[art/runtime/mirror/object-inl.h]
inline ObjPtr<mirror::Object> Object::MonitorEnter(Thread* self) {
  return Monitor::MonitorEnter(self, this, /*trylock=*/false);  // <=== call
}
As you can see from the code above, the monitor-enter instruction ultimately calls the Monitor::MonitorEnter static function.
Machine code execution
[art/runtime/arch/arm64/quick_entrypoints_arm64.S]
ENTRY art_quick_lock_object_no_inline
    // This is also the slow path for art_quick_lock_object.
    SETUP_SAVE_REFS_ONLY_FRAME            // save callee saves in case we block
    mov    x1, xSELF                      // pass Thread::Current
    bl     artLockObjectFromCode          // (Object* obj, Thread*)  <===
    ...
END art_quick_lock_object_no_inline
[art/runtime/entrypoints/quick/quick_lock_entrypoints.cc]
extern "C" int artLockObjectFromCode(mirror::Object* obj, Thread* self){...if (UNLIKELY(obj == nullptr)) {... }else {
ObjPtr<mirror::Object> object = obj->MonitorEnter(self); // May block <=== call. }}Copy the code
[art/runtime/mirror/object-inl.h]
inline ObjPtr<mirror::Object> Object::MonitorEnter(Thread* self) {
  return Monitor::MonitorEnter(self, this, /*trylock=*/false);  // <=== call
}
Likewise, the machine-code path also ends up calling Monitor::MonitorEnter.
Two forms of lock
The virtual machine implements locks in two forms: Thin Lock and Fat Lock.
A Thin Lock is used when contention is light. When a race occurs, the waiter spins and yields instead of making system calls and paying for context switches. If the thread holding the lock finishes its work quickly, a short spin is cheaper than a context switch.
However, if spinning fails to acquire the lock for a while, the Thin Lock inflates into a Fat Lock, which adds a data structure to hold detailed lock-related information and suspends waiting threads through system calls.
To sum up, a Fat Lock is fully featured but expensive, while a Thin Lock is cheap but unsuitable for long waits. The practice is therefore to start with a Thin Lock and inflate it into a Fat Lock once the Thin Lock can no longer do the job.
As mentioned at the beginning of the article, every object is inherently a lock. So where does this lock information live in the object?
The answer: in the header of art::mirror::Object (see the earlier article "ART virtual machine | the memory structure of Java objects and classes"). The object header contains a 4-byte field, monitor_, which stores the lock-related information.
Four bytes is 32 bits in total. The two highest bits mark the state, and the meaning of the remaining bits depends on that state. Two bits give four possible states: ThinOrUnlock (Thin and Unlock share one encoding), Fat, Hash, and ForwardingAddress. ThinOrUnlock and Fat describe lock states, Hash stores the object's identity hash (the value used, for example, when the object is put into a HashMap), and ForwardingAddress is a state used by the GC.
In the figure above, m represents the mark bit state and r represents the read barrier state; both are used by the GC and can be ignored when discussing locks.
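To make the layout concrete, here is a small decoding sketch. It is not ART's LockWord class; the field widths (16-bit owner, 12-bit count, 28-bit monitor id/hash) follow the layout described above and AOSP's lock_word.h as I read it, so treat them as illustrative.

// Illustrative decoding of the 32-bit lock word described above (a sketch, not ART's LockWord).
// Bit positions follow the description: top 2 bits = state, then the mark (m) and
// read-barrier (r) bits, with the remaining bits interpreted according to the state.
#include <cstdint>
#include <cstdio>

enum LockState : uint32_t {
  kThinOrUnlocked    = 0,  // thin lock, or unlocked when the owner field is 0
  kFat               = 1,  // remaining bits hold a monitor id
  kHash              = 2,  // remaining bits hold the identity hash
  kForwardingAddress = 3,  // used by the GC
};

void DescribeLockWord(uint32_t word) {
  uint32_t state    = word >> 30;        // highest 2 bits
  uint32_t mark_bit = (word >> 29) & 1;  // m: GC mark bit
  uint32_t rb_bit   = (word >> 28) & 1;  // r: read barrier state
  switch (state) {
    case kThinOrUnlocked: {
      uint32_t owner = word & 0xFFFF;         // low 16 bits: owner thread id (0 = unlocked)
      uint32_t count = (word >> 16) & 0xFFF;  // next 12 bits: recursion count
      std::printf("thin/unlocked: owner=%u count=%u m=%u r=%u\n", owner, count, mark_bit, rb_bit);
      break;
    }
    case kFat:
      std::printf("fat: monitor id=%u\n", word & 0x0FFFFFFF);
      break;
    case kHash:
      std::printf("hash: %u\n", word & 0x0FFFFFFF);
      break;
    case kForwardingAddress:
      std::printf("forwarding address (GC)\n");
      break;
  }
}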
When we execute monitor-enter on an unlocked object, the lock state changes from Unlocked to Thin. Here is the code.
switch (lock_word.GetState()) {
  case LockWord::kUnlocked: {
    // No ordering required for preceding lockword read, since we retest.
    LockWord thin_locked(LockWord::FromThinLockId(thread_id, 0, lock_word.GCState()));
    if (h_obj->CasLockWord(lock_word, thin_locked, CASMode::kWeak, std::memory_order_acquire)) {
      ...
      return h_obj.Get();  // Success!
    }
    continue;  // Go again.
  }
A LockWord object is just 4 bytes, so it can be regarded as equivalent to the monitor_ field of art::mirror::Object, except that it comes with methods for manipulating those 4 bytes. When the lock state switches to Thin, the thread ID of the current thread (not the tid; this ID starts from 1 within each process) is stored in the monitor_ field, while the GC-related m and r flags remain unchanged.
Once an object has been thin-locked by a thread, executing monitor-enter on it again from the same thread is a Thin Lock re-entry, while executing monitor-enter from a different thread causes Thin Lock contention. The code and flow chart are shown below.
  case LockWord::kThinLocked: {
    uint32_t owner_thread_id = lock_word.ThinLockOwner();
    if (owner_thread_id == thread_id) {
      uint32_t new_count = lock_word.ThinLockCount() + 1;
      if (LIKELY(new_count <= LockWord::kThinLockMaxCount)) {
        LockWord thin_locked(LockWord::FromThinLockId(thread_id, new_count, lock_word.GCState()));
        if (h_obj->CasLockWord(lock_word,
                               thin_locked,
                               CASMode::kWeak,
                               std::memory_order_relaxed)) {
          AtraceMonitorLock(self, h_obj.Get(), /* is_wait= */ false);
          return h_obj.Get();  // Success!
        }
        continue;  // Go again.
      } else {
        // We'd overflow the recursion count, so inflate the monitor.
        InflateThinLocked(self, h_obj, lock_word, 0);
      }
    } else {
      // Contention.
      contention_count++;
      Runtime* runtime = Runtime::Current();
      if (contention_count <= runtime->GetMaxSpinsBeforeThinLockInflation()) {
        sched_yield();
      } else {
        contention_count = 0;
        // No ordering required for initial lockword read. Install rereads it anyway.
        InflateThinLocked(self, h_obj, lock_word, 0);
      }
    }
    continue;  // Start from the beginning.
  }
Up to fifty sched_yield calls are made before the Thin Lock inflates into a Fat Lock. sched_yield moves the current thread to the back of the CPU's run queue so that it does not keep occupying the CPU. The Android master branch has since optimized this further, performing 100 spins before the 50 sched_yield calls. Unlike sched_yield, spinning does not give up the CPU; since even a single sched_yield takes on the order of microseconds, spinning saves time when the lock is held for an extremely short period.
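To make this policy concrete, here is a minimal sketch of the "CAS, spin, yield, then inflate" strategy. It is not ART's code; TryThinLock, LockWithBackoff and the constants are illustrative (they mirror the 100-spin/50-yield figures mentioned above).

// Minimal sketch of the thin-lock contention strategy (not ART's implementation):
// try a CAS, spin briefly, then sched_yield a bounded number of times, and finally
// give up and ask for inflation to a fat lock.
#include <atomic>
#include <cstdint>
#include <sched.h>

std::atomic<uint32_t> lock_word{0};  // 0 = unlocked, otherwise the owner thread id

bool TryThinLock(uint32_t thread_id) {
  uint32_t expected = 0;
  return lock_word.compare_exchange_weak(expected, thread_id, std::memory_order_acquire);
}

enum class Result { kLocked, kNeedsInflation };

Result LockWithBackoff(uint32_t thread_id) {
  constexpr int kMaxSpins = 100;   // spin without giving up the CPU (the master-branch addition)
  constexpr int kMaxYields = 50;   // sched_yield attempts before inflating
  for (int i = 0; i < kMaxSpins; ++i) {
    if (TryThinLock(thread_id)) return Result::kLocked;
  }
  for (int i = 0; i < kMaxYields; ++i) {
    if (TryThinLock(thread_id)) return Result::kLocked;
    sched_yield();  // move to the back of the run queue instead of burning CPU
  }
  return Result::kNeedsInflation;  // a real implementation would now inflate to a fat lock
}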
Next, let's look at the lock inflation process.
void Monitor::InflateThinLocked(Thread* self, Handle<mirror::Object> obj, LockWord lock_word,
                                uint32_t hash_code) {
  DCHECK_EQ(lock_word.GetState(), LockWord::kThinLocked);
  uint32_t owner_thread_id = lock_word.ThinLockOwner();
  if (owner_thread_id == self->GetThreadId()) {
    // We own the monitor, we can easily inflate it.
    Inflate(self, self, obj.Get(), hash_code);
  } else {
    ThreadList* thread_list = Runtime::Current()->GetThreadList();
    // Suspend the owner, inflate. First change to blocked and give up mutator_lock_.
    self->SetMonitorEnterObject(obj.Get());
    bool timed_out;
    Thread* owner;
    {
      ScopedThreadSuspension sts(self, kWaitingForLockInflation);
      owner = thread_list->SuspendThreadByThreadId(owner_thread_id,
                                                   SuspendReason::kInternal,
                                                   &timed_out);
    }
    if (owner != nullptr) {
      // We succeeded in suspending the thread, check the lock's status didn't change.
      lock_word = obj->GetLockWord(true);
      if (lock_word.GetState() == LockWord::kThinLocked &&
          lock_word.ThinLockOwner() == owner_thread_id) {
        // Go ahead and inflate the lock.
        Inflate(self, owner, obj.Get(), hash_code);
      }
      bool resumed = thread_list->Resume(owner, SuspendReason::kInternal);
      DCHECK(resumed);
    }
    self->SetMonitorEnterObject(nullptr);
  }
}
void Monitor::Inflate(Thread* self, Thread* owner, ObjPtr<mirror::Object> obj, int32_t hash_code) {
  DCHECK(self != nullptr);
  DCHECK(obj != nullptr);
  // Allocate and acquire a new monitor.
  Monitor* m = MonitorPool::CreateMonitor(self, owner, obj, hash_code);
  DCHECK(m != nullptr);
  if (m->Install(self)) {
    if (owner != nullptr) {
      VLOG(monitor) << "monitor: thread" << owner->GetThreadId()
                    << " created monitor " << m << " for object " << obj;
    } else {
      VLOG(monitor) << "monitor: Inflate with hashcode " << hash_code
                    << " created monitor " << m << " for object " << obj;
    }
    Runtime::Current()->GetMonitorList()->Add(m);
    CHECK_EQ(obj->GetLockWord(true).GetState(), LockWord::kFatLocked);
  } else {
    MonitorPool::ReleaseMonitor(self, m);
  }
}
The Inflate operation itself is simple: create a Monitor object, store the extra information in it, and write the monitor id into the original monitor_ field.
The key is the precondition for inflation. If the Thin Lock is already owned by the current thread, it can be inflated without anyone's consent. But if the Thin Lock is held by another thread, that thread must be suspended before inflating (this does not mean it is scheduled off the CPU, but that it is not allowed to enter the Java world and change the lock state), to prevent competing updates to the lock information during inflation. After inflation, the owning thread resumes and from then on sees a Fat Lock.
Once the lock has inflated into a Fat Lock, the contending thread tries again, since it has not yet actually acquired the lock. This time it takes the Fat Lock branch and executes the following code.
  case LockWord::kFatLocked: {
    // We should have done an acquire read of the lockword initially, to ensure
    // visibility of the monitor data structure. Use an explicit fence instead.
    std::atomic_thread_fence(std::memory_order_acquire);
    Monitor* mon = lock_word.FatLockMonitor();
    if (trylock) {
      return mon->TryLock(self) ? h_obj.Get() : nullptr;
    } else {
      mon->Lock(self);
      DCHECK(mon->monitor_lock_.IsExclusiveHeld(self));
      return h_obj.Get();  // Success!
    }
  }
Inside mon->Lock(self), i.e. Monitor::Lock, the thread eventually blocks in the following snippet.
[art/runtime/monitor.cc]
{
  ScopedThreadSuspension tsc(self, kBlocked);  // Change to blocked and give up mutator_lock_.
  // Acquire monitor_lock_ without mutator_lock_, expecting to block this time.
  // We already tried spinning above. The shutdown procedure currently assumes we stop
  // touching monitors shortly after we suspend, so don't spin again here.
  monitor_lock_.ExclusiveLock(self);
}
The ScopedThreadSuspension object in this code performs the thread state switch. It is called "scoped" because it switches the state in its constructor and restores it in its destructor: local variables are destroyed automatically when their scope ends, so as soon as the closing curly brace is reached the thread state switches from Blocked back to Runnable.
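As a small illustration of the RAII idea behind ScopedThreadSuspension (a simplified sketch, not ART's actual class; Thread, ThreadState and ScopedStateSwitch here are stand-ins):

// Minimal sketch of the RAII pattern used by ScopedThreadSuspension (not ART's real class).
enum class ThreadState { kRunnable, kBlocked };

struct Thread {
  ThreadState state = ThreadState::kRunnable;
};

class ScopedStateSwitch {
 public:
  ScopedStateSwitch(Thread* self, ThreadState new_state)
      : self_(self), old_state_(self->state) {
    self_->state = new_state;   // switch on construction
  }
  ~ScopedStateSwitch() {
    self_->state = old_state_;  // restore on destruction (scope exit)
  }
 private:
  Thread* self_;
  ThreadState old_state_;
};

void BlockForLock(Thread* self) {
  {
    ScopedStateSwitch sts(self, ThreadState::kBlocked);
    // ... block here, e.g. monitor_lock_.ExclusiveLock(self) in ART ...
  }  // destructor runs: state goes back to kRunnable
}

The constructor switches the state and the destructor restores it, so the state flips back the moment the enclosing braces close, no matter how the scope is exited.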
In the end, the ExclusiveLock method of monitor_lock_ (a Mutex object) is called.
void Mutex::ExclusiveLock(Thread* self) {
  if (!recursive_ || !IsExclusiveHeld(self)) {
#if ART_USE_FUTEXES
    bool done = false;
    do {
      int32_t cur_state = state_and_contenders_.load(std::memory_order_relaxed);
      if (LIKELY((cur_state & kHeldMask) == 0) /* lock not held */) {
        done = state_and_contenders_.CompareAndSetWeakAcquire(cur_state, cur_state | kHeldMask);
      } else {
        ...
        if (!WaitBrieflyFor(&state_and_contenders_, self,
                            [](int32_t v) { return (v & kHeldMask) == 0; })) {
          // Increment contender count. We can't create enough threads for this to overflow.
          increment_contenders();
          // Make cur_state again reflect the expected value of state_and_contenders.
          cur_state += kContenderIncrement;
          if (UNLIKELY(should_respond_to_empty_checkpoint_request_)) {
            self->CheckEmptyCheckpointFromMutex();
          }
          do {
            if (futex(state_and_contenders_.Address(), FUTEX_WAIT_PRIVATE, cur_state,
                      nullptr, nullptr, 0) != 0) {
              ...
            }
            cur_state = state_and_contenders_.load(std::memory_order_relaxed);
          } while ((cur_state & kHeldMask) != 0);
          decrement_contenders();
        }
      }
    } while (!done);
    ...
    exclusive_owner_.store(SafeGetTid(self), std::memory_order_relaxed);
    RegisterAsLocked(self);
  }
  recursion_count_++;
  ...
}
Mutex::ExclusiveLock finally enters the kernel through the futex system call, where the current thread is taken off the CPU and suspended. Note that the Fat Lock path still performs spin and yield operations (WaitBrieflyFor). This is because a Thin Lock that has inflated into a Fat Lock is hard to deflate again, yet later uses of that Fat Lock may still hold it only briefly, so the earlier spin/yield optimization is still worthwhile here.
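For reference, here is a minimal standalone sketch of what that futex call does, assuming Linux and glibc's syscall(); it is not ART's futex() wrapper, and WaitWhileLocked/WakeWaiters are illustrative names with error handling omitted.

// Minimal sketch of blocking on a 32-bit word with futex (not ART's futex() wrapper).
// FUTEX_WAIT_PRIVATE puts the caller to sleep in the kernel as long as *addr == expected;
// FUTEX_WAKE_PRIVATE wakes sleepers. Linux-only.
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <atomic>
#include <cstdint>

static long futex_call(std::atomic<int32_t>* addr, int op, int32_t val) {
  return syscall(SYS_futex, reinterpret_cast<int32_t*>(addr), op, val, nullptr, nullptr, 0);
}

void WaitWhileLocked(std::atomic<int32_t>* word, int32_t locked_value) {
  // Sleep only while the word still holds locked_value; otherwise the kernel returns immediately.
  while (word->load(std::memory_order_relaxed) == locked_value) {
    futex_call(word, FUTEX_WAIT_PRIVATE, locked_value);
  }
}

void WakeWaiters(std::atomic<int32_t>* word) {
  futex_call(word, FUTEX_WAKE_PRIVATE, 1);  // wake one waiter
}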
This code is the core implementation of locking and is called very frequently, so even minor optimizations matter. I previously wrote an article on debugging experience with C++ memory order, which analyzes in detail a thread-stuck stability problem caused by memory-order usage; it also introduces C++ memory order, which is likewise the underlying (ART) implementation of the Java volatile keyword.
I also reported a bug in ExclusiveLock to Google; in some cases it can drain the battery or even freeze the system. Google has responded, and those interested can check out the fix.