Template interpreter

We all know that Java can be compiled once and run anywhere because of bytecode, which acts as a middle layer and masks the low-level details. But to execute on a machine, bytecode ultimately has to be translated into machine instructions.

The JVM is written in C/C++. When a Java program is compiled, it produces many bytecode instructions, and inside the JVM each bytecode instruction is executed by a chunk of C/C++ code, which the C compiler in turn expands into many machine instructions. That is how our Java code eventually reaches the machine-instruction level. The blow-up in the number of machine instructions is enormous, which makes Java execution inefficient.

Early JVMs were notorious for slow interpretation. Is there a way to optimize this? The slowness comes from the C/C++ layer sitting between Java bytecode and machine instructions: compilers like GCC are not smart enough to always produce efficient machine code. So the idea is to skip the C/C++ layer and map Java bytecode directly to native machine code.

So HotSpot's engineers scrapped the earlier bytecode interpreter in favor of a template interpreter. A "template" here is a hand-written, fixed pattern of machine instructions for a Java bytecode. This part needs no help from GCC and greatly reduces the number of machine instructions that ultimately execute, improving efficiency.

In the OpenJDK 12 source code, the interpreter lives in the src/hotspot/share/interpreter directory, and templateInterpreter.cpp is where the template interpreter's code resides. In templateTable.cpp we can find the definitions of monitorenter and monitorexit, the bytecodes behind synchronized, alongside other familiar bytecodes such as invokedynamic, newarray, and so on:

def(Bytecodes::_monitorenter, ____|disp|clvm|____, atos, vtos, monitorenter, _);
def(Bytecodes::_monitorexit, ____|____|clvm|____, atos, vtos, monitorexit, _ );

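Incidentally, it's easy to see where these bytecodes come from. Compile a class like the one below and run javap -c on it:

public class SyncDemo {
    private final Object lock = new Object();

    void doWork() {
        synchronized (lock) {        // javap -c shows a monitorenter here
            System.out.println("in critical section");
        }                            // ...and monitorexit on both the normal and exception exit paths
    }
}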

monitorenter execution logic

The penultimate argument of the _monitorenter and _monitorexit definitions above points at the machine-code template for the bytecode. Here we look at the monitorenter implementation. Since the machine code is CPU-specific, we'll read the x86 implementation (templateTable_x86.cpp); implementations for other CPUs such as PPC, ARM and S390 live under src/hotspot/cpu.

void TemplateTable::monitorenter() {
  ...
  // Place the pointer to the object being locked into the obj field of the BasicObjectLock
  __ movptr(Address(rmon, BasicObjectLock::obj_offset_in_bytes()), rax);
  // Jump to lock_object
  __ lock_object(rmon);
  ...
}

void InterpreterMacroAssembler::lock_object(Register lock_reg) {

  // If heavyweight monitors are forced, go straight to InterpreterRuntime::monitorenter()
  if (UseHeavyMonitors) {
    call_VM(noreg,
            CAST_FROM_FN_PTR(address, InterpreterRuntime::monitorenter),
            lock_reg);
  } else {
    ...
    // Load the object pointer into obj_reg
    movptr(obj_reg, Address(lock_reg, obj_offset));

    // Handle biased locking
    if (UseBiasedLocking) {
      // lock_reg:  holds the pointer to the BasicObjectLock
      // obj_reg:   holds the pointer to the lock object
      // slow_case: a label, similar to a goto target; here it leads to InterpreterRuntime::monitorenter()
      // done:      the label meaning the lock was acquired successfully
      // slow_case and done are passed in so that biased_locking_enter() can jump to them as appropriate
      biased_locking_enter(lock_reg, obj_reg, swap_reg, tmp_reg, false, done, &slow_case);
    }
    ...
    // slow_case logic: enter InterpreterRuntime::monitorenter() to acquire the lock
    bind(slow_case);
    // Call the runtime routine for the slow case
    call_VM(noreg,
            CAST_FROM_FN_PTR(address, InterpreterRuntime::monitorenter),
            lock_reg);

    // done is the same label passed to biased_locking_enter() above; jumping here means
    // the lock was acquired successfully and we return to bytecode execution
    bind(done);
  }
}

As the code shows, if heavyweight monitors are forced on, monitorenter goes straight to the heavyweight-lock logic; otherwise it runs the biased-locking logic first and falls back to InterpreterRuntime::monitorenter when that path cannot complete.

Biased locking has been enabled by default since JDK 1.6 (-XX:+UseBiasedLocking); heavyweight monitors can be forced with -XX:+UseHeavyMonitors.

Biased locks, lightweight locks and heavyweight locks

We mentioned heavyweight locks and biased locks. What do they mean?

We all know that Java threads are mapped onto native operating-system threads, and that blocking or waking a thread requires the operating system's help, which means a transition from user mode to kernel mode. This is why many people say synchronized is slow. Previous articles have shown that synchronized is ultimately implemented with the operating system's mutex, and that form is known as the heavyweight lock.

As opposed to heavyweight locks, there are also lightweight locks. Their locking is achieved not through the operating system but through CAS on the Mark Word; I will show the implementation in the source code later.

Biased locks are even lighter than lightweight locks. A biased lock favors a single thread: if only one thread ever acquires the lock, the lock object becomes biased toward that thread, and as long as no other thread acquires it during subsequent execution, the holder never needs to do any synchronization work again.

Let's walk through the source code to see how the JVM optimizes locking as it escalates from biased lock -> lightweight lock -> heavyweight lock.

Memory layout

Before analyzing the lock implementations, you may want to read the previous article on how objects are laid out in memory; I've posted a picture here as a refresher.

Lock state transitions and the object's Mark Word

The actual lock-optimization logic is already outlined in a diagram on the JDK wiki, which I'll post here. The code analysis that follows tracks this diagram.
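In case the diagram doesn't come through, here is a rough text sketch of the 64-bit Mark Word states, based on the layout comments in HotSpot's markOop.hpp (bit widths may differ across builds):

unlocked:    | unused:25 | identity hash:31 | unused:1 | age:4 | biased:0 | lock:01 |
biased:      | thread:54 | epoch:2 | unused:1 | age:4 | biased:1 | lock:01 |
lightweight: | pointer to the Lock Record on the owner's stack | lock:00 |
heavyweight: | pointer to the ObjectMonitor | lock:10 |
GC mark:     | forwarding pointer | lock:11 |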

Biased locking

Biased locking startup

Biased locking only takes effect 4 seconds after the virtual machine starts. We can see this setup in src/hotspot/share/runtime/biasedLocking.cpp:


void BiasedLocking::init() {

  if (UseBiasedLocking) {
    if (BiasedLockingStartupDelay > 0) {
      EnableBiasedLockingTask* task = new EnableBiasedLockingTask(BiasedLockingStartupDelay);
      task->enroll();
    } else {
      VM_EnableBiasedLocking op(false);
      VMThread::execute(&op);
    }
  }
}

// The task above eventually calls this method, setting the last three bits of the
// mark word of the lock object's class to 101
static void enable_biased_locking(InstanceKlass* k) {
  k->set_prototype_header(markOopDesc::biased_locking_prototype());
}


BiasedLockingStartupDelay defaults to 4000 milliseconds, so a timer task starts 4 seconds after startup and flips biased locking on.

We can use -XX:BiasedLockingStartupDelay=0 to enable biased locking immediately. This also fills in a hole left from the previous article.

java -XX:+PrintFlagsFinal | grep BiasedLockingStartupDelay

The scheduled task calls the enable_biased_locking method and sets the last three bits of the Mark Word of the lock object's class to 101. The class's Mark Word is called the prototype_header; remember it, as it comes up again in the biased-locking analysis.


MyObject obj = new MyObject();

synchronized(obj) {
  
  doSomething();
}


In the Java code above the lock object is obj, of type MyObject (obj is an instance of MyObject), and the prototype_header is in effect the Mark Word of the MyObject class itself.

Acquiring a biased lock

The biased_locking_enter() method is long, so let's analyze it section by section. The code fragments below are all from MacroAssembler::biased_locking_enter in src/hotspot/cpu/x86/macroAssembler_x86.cpp.

  1. First, check whether the last three bits of the Mark Word (biased flag + lock flag bits) equal 5, i.e. whether the object is in the biased state. If so, continue with the logic below; if not, go to step 2

Address mark_addr(obj_reg, oopDesc::mark_offset_in_bytes());
... omit some code ...
Label cas_label;
int null_check_offset = -1;

// If the mark word is not already in swap_reg, load it from mark_addr first
if (!swap_reg_contains_mark) {
  null_check_offset = offset();
  movptr(swap_reg, mark_addr);
}
// Copy the object's mark word (the markOop) into tmp_reg
movptr(tmp_reg, swap_reg);

// AND tmp_reg with biased_lock_mask_in_place (111)
andptr(tmp_reg, markOopDesc::biased_lock_mask_in_place);

// Check whether the last three bits of the Mark Word equal biased_lock_pattern (5, i.e. 101).
// If not equal, the object is not in the biased state, so jump to cas_label;
// otherwise it is biased and we fall through.
cmpptr(tmp_reg, markOopDesc::biased_lock_pattern);
jcc(Assembler::notEqual, cas_label);
  2. Check whether the lock object's Mark Word contains the current thread's address, whether the last three flag bits match, and whether its epoch equals the class's epoch. If everything matches, the current thread already holds the biased lock and can return directly; otherwise, go to step 3
// Load the class's prototype_header (its Mark Word) into tmp_reg
load_prototype_header(tmp_reg, obj_reg);

// OR the current thread's address into the class's prototype_header, giving
// (current thread ID + epoch + generational age + biased flag + lock flag bits)
orptr(tmp_reg, r15_thread);

// XOR with the object's mark word in swap_reg; identical bits cancel out, leaving only the differences
xorptr(tmp_reg, swap_reg);
Register header_reg = tmp_reg;

// Mask the generational-age bits out of header_reg so they are ignored
andptr(header_reg, ~((int) markOopDesc::age_mask_in_place));

// If the object's markOop equals (current thread address + the other bits) apart from the
// generational age, the result of the operations above is 0: the object is already biased
// toward the current thread. Otherwise the current thread is not yet the holder of the
// biased lock and we continue downward.
jcc(Assembler::equal, done);


Note that header_reg now holds this XOR result, which the later steps keep using.

  3. Determine whether the class still supports biased locking. If not, go to step 6 (try_revoke_bias) to revoke the bias; if it does, go to step 4
testptr(header_reg, markOopDesc::biased_lock_mask_in_place);
jccb(Assembler::notZero, try_revoke_bias);

header_reg holds the XOR of (current thread ID + epoch + generational age + biased flag + lock flag bits from the prototype_header) and the lock object's Mark Word. We test whether the last three bits of that result (biased_lock_mask_in_place is 111) are 0. If they are not 0, the last three bits of the lock object's Mark Word differ from those of its class, which means the class no longer supports biased locking, so we jump to try_revoke_bias to revoke the object's bias.

testptr effectively ANDs its first argument with the mask given as the second argument and sets the flags, so it tests exactly the bits selected by the mask.
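To make the bit trick concrete, here is a toy Java model of steps 2 and 3 (the mask follows the 64-bit sketch earlier; this is an illustration, not the JVM's actual code):

public class BiasCheckDemo {
    static final long AGE_MASK = 0b1111L << 3; // the 4 generational-age bits

    // mark: the object's Mark Word; prototype: the class's prototype_header;
    // thread: the current thread's address (aligned, so its low bits are zero)
    static boolean biasedTowardCurrentThread(long mark, long prototype, long thread) {
        long expected = prototype | thread; // what the mark looks like if biased to us
        long diff = mark ^ expected;        // identical bits cancel out
        return (diff & ~AGE_MASK) == 0;     // ignore the GC age; everything else must match
    }

    public static void main(String[] args) {
        long prototype = 0b101;                          // biasable pattern 101, epoch 0
        long thread = 0x7f12345000L;                     // fake aligned thread address
        long mark = prototype | thread | (0b0010L << 3); // same owner, different age
        System.out.println(biasedTowardCurrentThread(mark, prototype, thread));   // true
        System.out.println(biasedTowardCurrentThread(0b001L, prototype, thread)); // false: unbiased
    }
}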

  4. Reaching here means the lock object and its class both support biased locking, but the lock is not biased toward the current thread. So check whether the epoch bits in the XOR result are 0. If they are, go to step 5; if not, the bias has expired, so go to step 7 (try_rebias) to run the rebias logic
// Test whether the epoch in the object's Mark Word matches the epoch of its class
testptr(header_reg, markOopDesc::epoch_mask_in_place);
jccb(Assembler::notZero, try_rebias);

  5. Reaching here means the lock object is not yet biased toward any thread, so we can try to acquire the lock and bias the object toward the current thread
// Keep the bits of the object's Mark Word other than the thread address
andptr(swap_reg,
       markOopDesc::biased_lock_mask_in_place | markOopDesc::age_mask_in_place | markOopDesc::epoch_mask_in_place);

// Move those bits into tmp_reg
movptr(tmp_reg, swap_reg);

// Combine them with the current thread to build a complete new Mark Word in tmp_reg.
// Because it stores the current thread's address, the new Mark Word is biased toward the current thread.
orptr(tmp_reg, r15_thread);

// Try to store the newly built Mark Word into the object's mark_addr (its Mark Word) with a CAS.
// If the store succeeds, the biased lock has been acquired.
// cmpxchgptr implicitly uses the rax register (swap_reg) as the expected old value, compares it
// with the second argument (mark_addr here), and if they are equal stores the first argument
// (the new value in tmp_reg) into mark_addr.
cmpxchgptr(tmp_reg, mark_addr); // compare tmp_reg and swap_reg

// If the CAS fails, the markOop in the object header changed underneath us, i.e. another thread
// acquired the biased lock first. Biased locks do not allow multiple threads to contend for the
// same lock object, so jump to slow_case (InterpreterRuntime::monitorenter) to revoke the
// object's bias and upgrade the lock.
if (slow_case != NULL) {
  jcc(Assembler::notZero, *slow_case);
}
// If the CAS succeeds, jump to done and return to the bytecode.
jmp(done);
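The role of cmpxchgptr here is just a compare-and-swap on the header word. A rough Java analogy, with an AtomicLong standing in for mark_addr (again purely illustrative):

import java.util.concurrent.atomic.AtomicLong;

public class BiasCasDemo {
    static final AtomicLong markWord = new AtomicLong(0b101); // unlocked but biasable: 101

    // swapReg plays rax (the expected old value); the new value plays tmp_reg
    static boolean tryBias(long swapReg, long threadAddress) {
        long tmpReg = swapReg | threadAddress;          // other bits + current thread
        return markWord.compareAndSet(swapReg, tmpReg); // fails if another thread raced us
    }

    public static void main(String[] args) {
        System.out.println(tryBias(0b101, 0x7f12345000L)); // true: we won the bias
        System.out.println(tryBias(0b101, 0x7f98765000L)); // false: header changed, go to slow_case
    }
}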
  6. try_revoke_bias resets the mark word using CAS. After the bias is revoked, everything that follows takes the lightweight lock's locking path

The code for try_revoke_bias and try_rebias is also defined in biased_locking_enter

bind(try_revoke_bias);

// The class's prototype_header no longer carries the biased-locking bits, i.e. objects of this
// class no longer support biased locking, but the current object is still in the biased state,
// so we need to reset its markOop to the unlocked state.

// Load the prototype_header of the lock object's class into tmp_reg
load_prototype_header(tmp_reg, obj_reg);

// Try a CAS to reset the object's markOop to the unlocked state. It doesn't matter if this fails:
// that just means another thread already removed the object's biased flag.
cmpxchgptr(tmp_reg, mark_addr);

  7. try_rebias tries to rebias the lock object toward the current thread; if that fails, it goes to slow_case (InterpreterRuntime::monitorenter) to revoke the biased lock

bind(try_rebias);

// Load the prototype_header of the lock object's class into tmp_reg
load_prototype_header(tmp_reg, obj_reg);

// OR the current thread's address into the class's prototype_header (Mark Word), giving
// (current thread ID + epoch + generational age + biased flag + lock flag bits)
orptr(tmp_reg, r15_thread);

// Try the CAS; on success the object is rebiased toward the current thread
cmpxchgptr(tmp_reg, mark_addr);

// If the CAS fails, the Mark Word has been changed by another thread, so jump to slow_case
// to revoke the biased lock; otherwise jump to done and return to bytecode execution.
if (slow_case != NULL) {
  jcc(Assembler::notZero, *slow_case);
}
jmp(done);


Biased lock revocation

The slow_case logic (revoking the biased lock) lives in InterpreterRuntime::monitorenter

IRT_ENTRY_NO_ASYNC(void, InterpreterRuntime::monitorenter(JavaThread* thread, BasicObjectLock* elem))
  Handle h_obj(thread, elem->obj());
  if (UseBiasedLocking) {
    // Retry fast entry if bias is revoked to avoid unnecessary inflation
    ObjectSynchronizer::fast_enter(h_obj, elem->lock(), true, CHECK);
  } else {
    ObjectSynchronizer::slow_enter(h_obj, elem->lock(), CHECK);
  }
IRT_END


void ObjectSynchronizer::fast_enter(Handle obj, BasicLock* lock,
                                    bool attempt_rebias, TRAPS) {
  if (UseBiasedLocking) {
    if (!SafepointSynchronize::is_at_safepoint()) {
      BiasedLocking::Condition cond = BiasedLocking::revoke_and_rebias(obj, attempt_rebias, THREAD);
      if (cond == BiasedLocking::BIAS_REVOKED_AND_REBIASED) {
        return;
      }
    } else {
      assert(!attempt_rebias, "can not rebias toward VM thread");
      // Revoke the biased lock
      BiasedLocking::revoke_at_safepoint(obj);
    }
    assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
  }

  slow_enter(obj, lock, THREAD);
}

BiasedLocking::revoke_and_rebias checks again whether the biased lock can still be used; its basic logic is the same as the analysis above. If you read its code, you will also find that calling System.identityHashCode() on an object revokes the object's biased lock.
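You can observe this yourself. The sketch below assumes the OpenJDK JOL tool (org.openjdk.jol:jol-core) is on the classpath and a JDK where biased locking still exists (8 through 14):

import org.openjdk.jol.info.ClassLayout;

public class IdentityHashDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread.sleep(5000);            // wait out the 4s BiasedLockingStartupDelay
        Object lock = new Object();
        System.identityHashCode(lock); // forces the identity hash into the Mark Word
        synchronized (lock) {
            // The hash and a biased thread ID cannot share the Mark Word, so the
            // printout shows a lightweight (thin) lock rather than a biased one
            System.out.println(ClassLayout.parseInstance(lock).toPrintable());
        }
    }
}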

Since biased-lock revocation has to happen at a global safepoint, we can tune system performance by turning biased locking off when a large number of threads compete for the same lock resource.

Now let's see what revoke_at_safepoint does

  1. update_heuristics() increments the revocation count on the class object by 1

// Increment the revocation count
if (revocation_count <= BiasedLockingBulkRevokeThreshold) {
  revocation_count = k->atomic_incr_biased_lock_revocation_count();
}

// If the revocation count has reached BiasedLockingBulkRevokeThreshold (default 40)
if (revocation_count == BiasedLockingBulkRevokeThreshold) {
  return HR_BULK_REVOKE;
}

// If the revocation count has reached BiasedLockingBulkRebiasThreshold (default 20)
if (revocation_count == BiasedLockingBulkRebiasThreshold) {
  return HR_BULK_REBIAS;
}

return HR_SINGLE_REVOKE;
  2. If the revocation count equals BiasedLockingBulkRebiasThreshold (default 20), the class is considered rebias-able, so the following bulk rebias is performed
if (klass->prototype_header()->has_bias_pattern()) {
  int prev_epoch = klass->prototype_header()->bias_epoch();
  // Increment the epoch value in the class object's prototype_header by 1
  klass->set_prototype_header(klass->prototype_header()->incr_bias_epoch());
  int cur_epoch = klass->prototype_header()->bias_epoch();

  // Walk the stacks of all threads, find every locked instance of the class,
  // and increment the epoch value in its Mark Word by 1
  for (; JavaThread *thr = jtiwh.next(); ) {
    GrowableArray<MonitorInfo*>* cached_monitor_info = get_or_compute_monitor_info(thr);
    for (int i = 0; i < cached_monitor_info->length(); i++) {
      MonitorInfo* mon_info = cached_monitor_info->at(i);
      oop owner = mon_info->owner();
      markOop mark = owner->mark();
      if ((owner->klass() == k_o) && mark->has_bias_pattern()) {
        assert(mark->bias_epoch() == prev_epoch || mark->bias_epoch() == cur_epoch, "error in bias epoch adjustment");
        owner->set_mark(mark->set_bias_epoch(cur_epoch));
      }
    }
  }
}

// At this point we're done. All we have to do is potentially
// adjust the header of the given object to revoke its bias
revoke_bias(o, attempt_rebias_of_object && klass->prototype_header()->has_bias_pattern(), true, requesting_thread, NULL);

In a bulk rebias, the class object's epoch value is incremented by 1, the stacks of all threads are walked to find every locked instance of the class and increment the epoch in its Mark Word by 1, and finally the bias information of the given lock object is revoked.
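The epoch mechanism can be summed up with a toy model (field and method names made up for illustration):

class EpochModel {
    static int classEpoch = 0;            // lives in the class's prototype_header

    static class Instance { int epoch; }  // lives in each instance's Mark Word

    // A bias is only trusted while the instance's epoch matches the class's epoch
    static boolean biasStillValid(Instance obj) {
        return obj.epoch == classEpoch;
    }

    static void bulkRebias() {
        // Bumping the class epoch invalidates every outstanding bias at once, except
        // for currently-locked instances, whose epoch the JVM bumps along with the
        // class (that is what the thread-stack walk in the code above does)
        classEpoch++;
    }
}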

If you want to verify the bulk rebias/revoke process and its results, this Stack Overflow answer works through it (stackoverflow.com/questions/4…)

  3. If the class object's revocation count equals BiasedLockingBulkRevokeThreshold, biased locking is considered unsuitable for the class, so a bulk revoke is performed

The code is similar to the above, so I won't post it; it mainly does the following two things

  • Set the class object's prototype_header to the unbiased state
  • Walk the stacks of all threads, find every locked instance of the class, set the status bits of its Mark Word to 00 with a pointer to the corresponding Lock Record, upgrading the biased lock to a lightweight lock

Lightweight lock

The code implementation of the lightweight lock is in the slow_enter method


void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {

  // omit other code...

  // Check whether the object is currently unlocked (neutral)
  if (mark->is_neutral()) {

    // Save the mark directly into the _displaced_header field of the BasicLock object
    lock->set_displaced_header(mark);

    // Use CAS to update the mark word to a pointer to the BasicLock object
    if (mark == obj()->cas_set_mark((markOop) lock, mark)) {
      return;
    }
  }
  ...
}

class BasicLock {
 private:
  volatile markOop _displaced_header;
};

First it checks whether the Mark Word is neutral, that is, whether its last three bits are 001. Neutral means the object is neither locked nor biased at this point.

The lock object's Mark Word is saved into the _displaced_header field of the BasicLock (part of the Lock Record). Then CAS is used to update the object's Mark Word to a pointer to the Lock Record. If the update succeeds, the thread owns the lock on the object, and the lock flag bits (the last 2 bits of the Mark Word) become 00, indicating the object is in the lightweight-locked state.
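Conceptually the lightweight lock is just this (a toy sketch with an AtomicLong as the header; the real code works on raw object memory):

import java.util.concurrent.atomic.AtomicLong;

public class StackLockDemo {
    static final AtomicLong markWord = new AtomicLong(0b001); // neutral: last three bits 001

    static class LockRecord { long displacedHeader; }

    // recAddress stands for the Lock Record's stack address; it is word-aligned,
    // so its two low bits are 00, which doubles as the "lightweight locked" flag
    static boolean tryStackLock(LockRecord rec, long recAddress) {
        long mark = markWord.get();
        if ((mark & 0b111) != 0b001) return false; // not neutral, cannot stack-lock
        rec.displacedHeader = mark;                // save the original header
        return markWord.compareAndSet(mark, recAddress);
    }
}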

Heavyweight lock

If the CAS update fails, the lock inflates into a heavyweight lock and the lock flag bits change to 10. A pointer to the heavyweight lock (the ObjectMonitor) is stored in the Mark Word, and threads waiting for the lock go into the blocked state.

ObjectSynchronizer::inflate(THREAD,
                            obj(),
                            inflate_cause_monitor_enter)->enter(THREAD);


inflate() is mostly state dispatch and is easier to follow through its comments, so let's focus instead on the execution logic of the enter path


void ObjectMonitor::EnterI(TRAPS) {
  ...
  // If the current thread already holds the lock, record the reentry and return directly
  if (Self->is_lock_owned((address)cur)) {
    assert(_recursions == 0, "internal state error");
    _recursions = 1;
    _owner = Self;
    return;
  }

  // To avoid the cost of blocking and waking a thread, spin adaptively before entering the blocked state
  if (TrySpin(Self) > 0) {
    assert(_owner == Self, "invariant");
    assert(_recursions == 0, "invariant");
    assert(((oop)(object()))->mark() == markOopDesc::encode(this), "invariant");
    Self->_Stalled = 0;
    return;
  }
  ...
}

So when taking a heavyweight lock, the thread does not block for the lock immediately; it first spins adaptively a few times to see whether it can grab the lock, and only blocks if that fails.

An adaptive spin lock decides how long to spin based on the previous spin times on the same lock and the state of the lock's owner. If a spin recently succeeded in acquiring this lock, the JVM assumes it has a good chance of succeeding again and allows a few more spins; if spins on this lock rarely succeed, it may skip spinning altogether and block straight away.
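The shape of this spin-then-block strategy in plain Java looks roughly like this (a fixed spin bound instead of HotSpot's adaptive heuristics):

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

public class SpinThenPark {
    private final AtomicBoolean locked = new AtomicBoolean();

    public void lock() {
        // Spin first: cheap if the owner releases the lock soon
        for (int i = 0; i < 100; i++) {
            if (locked.compareAndSet(false, true)) return;
            Thread.onSpinWait();
        }
        // Give up the CPU and block, like ParkEvent::park in the JVM
        while (!locked.compareAndSet(false, true)) {
            LockSupport.parkNanos(1_000_000L);
        }
    }

    public void unlock() {
        locked.set(false);
    }
}

Back in HotSpot, the actual enter loop looks like this: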

for (;;) {
    if (TryLock(Self) > 0) break;
    ...
    if ((SyncFlags & 2) && _Responsible == NULL) {
      Atomic::replace_if_null(Self, &_Responsible);
    }
    // park self
    if (_Responsible == Self || (SyncFlags & 1)) {
      TEVENT(Inflated enter - park TIMED);
      Self->_ParkEvent->park((jlong) recheckInterval);

      recheckInterval *= 8;
      if (recheckInterval > MAX_RECHECK_INTERVAL) {
        recheckInterval = MAX_RECHECK_INTERVAL;
      }
    } else {
      TEVENT(Inflated enter - park UNTIMED);
      Self->_ParkEvent->park();
    }

    if (TryLock(Self) > 0) break;
    ...
}

If TryLock fails, the current thread is parked. park() here is implemented with the pthread primitives mentioned in the previous article.

void os::PlatformEvent::park() {
  ...
  int status = pthread_mutex_lock(_mutex);
  ...
  status = pthread_cond_wait(_cond, _mutex);
  ...
  status = pthread_mutex_unlock(_mutex);
}


Another side effect of spinning is an unfair locking mechanism: a blocked thread has no way to compete immediately for a released lock, while a spinning thread is quite likely to grab it first.

Also, once a thread has acquired a heavyweight lock and run its critical section, the lock does not deflate back to its original unlocked state even after it is released; it stays a heavyweight lock.

Good news and bad news

We can see that biased locking carries a heavy burden: the overall code complexity rises significantly just to support it, and many of the applications that benefit from it are ones built on the early synchronized Java collection APIs, such as Hashtable and Vector.

So the good news is that biased locking was disabled by default in JDK 15 and will be removed in a later release.

The bad news is that most applications today run on JDK 8 and will keep running on it for many years to come.

Shoulders of giants

  1. Understanding the Java Virtual Machine

  2. In-Depth Disassembly of the Java Virtual Machine

  3. stackoverflow.com/questions/4…

  4. citeseerx.ist.psu.edu/viewdoc/dow…

  5. createchance.github.io/post/java-…

  6. zhuanlan.zhihu.com/p/34662715

  7. www.zhihu.com/question/55…