The layout of Java objects in memory is not as mysterious as you might think

Preface

The HotSpot JVM is written in C++, so every Java object is ultimately represented by a C++ object that can describe any Java object. The optimizations applied to synchronized locks are built on top of this object.

Layout of objects in memory

When a Java object is created, after the memory allocation is complete, the virtual machine needs to set the necessary information on the object, such as which class the object is an instance of, how to find the metadata information of the class, the hash code of the object, and the GC generation age of the object.

This information is stored in the object header. The object header can be set up differently depending on the VM's running state, for example, whether biased locking is enabled.

In the virtual machine, the layout of an object stored in memory can be divided into three areas: the object header, the instance data, and the alignment padding.

Object header

The HotSpot virtual machine's object header contains two parts. The first part stores the object's own runtime data, such as the hash code, GC generational age, lock status flag, the lock held by a thread, the biased thread ID, and the bias timestamp.

This data is 32 bits long on a 32-bit VM and 64 bits long on a 64-bit VM, and it is officially called the Mark Word. For the sake of space efficiency, the Mark Word is designed as a flexible data structure that stores as much information as possible in a very small space.

The other part of the object header is the type pointer, which points to the object's class metadata. The virtual machine uses it to determine which class the object is an instance of.

In the JVM source (hotspot/share/oops/markOop.hpp) we can see how the contents stored in the object header are defined:

class markOopDesc: public oopDesc {
 public:
  enum { age_bits         = 4,
         lock_bits        = 2,
         biased_lock_bits = 1,
         max_hash_bits    = BitsPerWord - age_bits - lock_bits - biased_lock_bits,
         hash_bits        = max_hash_bits > 31 ? 31 : max_hash_bits,
         cms_bits         = LP64_ONLY(1) NOT_LP64(0),
         epoch_bits       = 2
  };
};

hash: the hash code of the object
age: the GC generational age of the object
biased_lock: the biased-lock flag bit
lock: the lock status flag bits
JavaThread*: the thread that holds the biased lock
epoch: the bias timestamp
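
As a quick check of the enum arithmetic above, the following Java sketch recomputes the hash width for a 32-bit and a 64-bit word. The class and method names are made up for illustration; only the constants come from the HotSpot enum.

```java
public class MarkWordBits {
    // Constants copied from the markOopDesc enum shown above.
    static final int AGE_BITS = 4;
    static final int LOCK_BITS = 2;
    static final int BIASED_LOCK_BITS = 1;

    // Mirrors max_hash_bits and hash_bits; bitsPerWord is 32 or 64.
    static int hashBits(int bitsPerWord) {
        int maxHashBits = bitsPerWord - AGE_BITS - LOCK_BITS - BIASED_LOCK_BITS;
        return Math.min(maxHashBits, 31); // same as: max_hash_bits > 31 ? 31 : max_hash_bits
    }

    public static void main(String[] args) {
        System.out.println("32-bit word, hash bits: " + hashBits(32)); // 25
        System.out.println("64-bit word, hash bits: " + hashBits(64)); // 31
    }
}
```

This is why the hash gets 25 bits on a 32-bit VM but is capped at 31 bits on a 64-bit VM.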

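For example, in the 32-bit HotSpot virtual machine, if the object is not locked, the 32 bits of the Mark Word hold 25 bits of object hash code, 4 bits of GC generational age, 1 bit for the biased-lock flag (fixed to 0), and 2 bits for the lock flag. The contents in the other states (lightweight lock, heavyweight lock, GC mark, and biasable) are shown in the following table.

State                 Lock flag   Stored content
Unlocked              01          object hash code, GC generational age
Lightweight locked    00          pointer to the lock record
Heavyweight locked    10          pointer to the heavyweight lock
GC mark               11          empty, nothing needs to be recorded
Biasable              01          biased thread ID, bias timestamp, GC generational age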
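
To make the 32-bit unlocked layout concrete, here is a minimal sketch that packs and unpacks the four fields with shifts and masks. It assumes the field order hash:25, age:4, biased_lock:1, lock:2 from the highest bit to the lowest; the class name, the method names, and the sample values are hypothetical.

```java
public class MarkWord32 {
    // Field extractors for a 32-bit Mark Word in the unlocked state.
    static int lock(int mark)       { return mark & 0b11; }          // bits 0-1
    static int biasedLock(int mark) { return (mark >>> 2) & 0b1; }   // bit 2
    static int age(int mark)        { return (mark >>> 3) & 0b1111; } // bits 3-6
    static int hash(int mark)       { return mark >>> 7; }           // bits 7-31

    // Builds the Mark Word of an unlocked object: biased_lock = 0, lock = 01.
    static int pack(int hash, int age) {
        return (hash << 7) | (age << 3) | 0b001;
    }

    public static void main(String[] args) {
        int mark = pack(0x123456, 3); // sample hash and age, chosen arbitrarily
        System.out.println("hash        = 0x" + Integer.toHexString(hash(mark)));
        System.out.println("age         = " + age(mark));
        System.out.println("biased_lock = " + biasedLock(mark));
        System.out.println("lock        = " + Integer.toBinaryString(lock(mark)));
    }
}
```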

Instance data

The instance data is the useful information the object actually stores, that is, the contents of the fields of every type defined in the program code.

Alignment padding

The third part does not necessarily exist and has no special meaning. It is only a placeholder: HotSpot's memory management requires that an object's starting address be a multiple of 8 bytes, in other words, that the object's size be a multiple of 8 bytes. The object header is a multiple of 8 bytes when pointer compression is off (with a compressed class pointer it is 12 bytes on a 64-bit VM), so whenever the header plus the instance data does not add up to a multiple of 8 bytes, padding is added to complete the alignment.
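
The rounding rule can be sketched in a few lines of Java. This is an illustration of 8-byte alignment in general, not HotSpot's own code, and the names are hypothetical.

```java
public class Alignment {
    static final int OBJECT_ALIGNMENT = 8; // bytes, as described above

    // Rounds size up to the next multiple of 8.
    static long alignUp(long size) {
        return (size + OBJECT_ALIGNMENT - 1) & ~(long) (OBJECT_ALIGNMENT - 1);
    }

    // Number of padding bytes needed to reach that multiple.
    static long padding(long size) {
        return alignUp(size) - size;
    }

    public static void main(String[] args) {
        System.out.println(alignUp(21) + " bytes, padding " + padding(21));
        System.out.println(alignUp(24) + " bytes, padding " + padding(24));
    }
}
```

An object whose header and fields add up to 21 bytes is therefore rounded up to 24 bytes, while a 24-byte object needs no padding.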

Viewing the object header

We can use OpenJDK's JOL tool to view the contents of the object header. First add the dependency, then run the code below.

<dependency>
  <groupId>org.openjdk.jol</groupId>
  <artifactId>jol-core</artifactId>
  <version>0.9</version>
</dependency>

import java.nio.ByteOrder;

import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

public class GetRange {
  // Run with -XX:-UseCompressedOops to see the layout without pointer compression.
  public static void main(String[] args) {

    // Print the VM details (sizes are in bytes; 1 byte = 8 bits).
    System.out.println(VM.current().details());

    MyClass myClass = new MyClass();
    ClassLayout classLayout = ClassLayout.parseInstance(myClass);

    System.out.println("****New Object****");
    System.out.println(classLayout.toPrintable());

    // Calling hashCode() makes the VM store the identity hash in the Mark Word.
    int hashCode = myClass.hashCode();
    System.out.println("MyClass hashCode : " + hashCode);
    System.out.println("MyClass hashCode binary: " + Integer.toBinaryString(hashCode));
    System.out.println("MyClass hashCode binary length: " + Integer.toBinaryString(hashCode).length());

    System.out.println();
    System.out.println("****After invoke hashCode()****");
    System.out.println(classLayout.toPrintable(myClass));

    // Get the system byte order.
    System.out.println("The current byte order of the system is: " + ByteOrder.nativeOrder());
  }
}

class MyClass {
  String name = "think123";
  int[] other;
  boolean status;
}

The output is as follows

You can see that the object header takes up 12 bytes and that the object ends with 3 bytes of padding.

The JVM uses oopDesc to describe an object

class oopDesc {
 private:
  volatile markOop _mark;
  union _metadata {
    Klass*      _klass;
    narrowKlass _compressed_klass;
  } _metadata;
};

We can see that the object header has two parts. The first is _mark, officially called the Mark Word, which stores the hash code, the GC generational age, the biased-lock flag, and other information. The Mark Word is one machine word long, which is 8 bytes on 64-bit systems.

The second part is the type pointer to the klass, indicating which class the object is an instance of. A union is used here so that _klass and _compressed_klass share the same memory. _klass is used when pointer compression is disabled, and _compressed_klass is used when pointer compression is enabled. narrowKlass is a 32-bit unsigned int and therefore takes up 4 bytes, so the object header is 12 bytes when pointer compression is enabled.

narrowKlass is defined in hotspot/src/share/vm/oops/oopsHierarchy.hpp as a juint, which is in fact a 32-bit unsigned int.

MyClass defines three fields: a String and an int[] (references, 4 bytes each) and a boolean (1 byte).

Padding: the header and the fields above add up to 21 bytes. Because 8-byte alignment is required, 3 bytes of padding are added, which brings the object to exactly 24 bytes.

When we turn off pointer compression (-XX:-UseCompressedOops), the object header takes 16 bytes: the Mark Word takes 8 bytes and the klass pointer also takes 8 bytes. The name and other fields occupy 8 bytes each. With pointer compression enabled, these two fields take up 4 bytes each. So enabling this option saves memory, and it is enabled by default in JDK 8.
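
The two cases can be added up as follows. This is only a back-of-the-envelope sketch: it assumes the fields are laid out back to back with no internal gaps, and the class and method names are hypothetical. The real layout should always be confirmed with JOL.

```java
public class MyClassSize {
    // Estimated size of a MyClass instance (String name, int[] other, boolean status).
    static long instanceSize(boolean pointerCompression) {
        int markWord = 8;                              // 64-bit Mark Word
        int klassPointer = pointerCompression ? 4 : 8; // narrowKlass vs Klass*
        int reference = pointerCompression ? 4 : 8;    // name and other
        int booleanField = 1;                          // status
        long unpadded = markWord + klassPointer + 2L * reference + booleanField;
        return (unpadded + 7) / 8 * 8;                 // round up to a multiple of 8
    }

    public static void main(String[] args) {
        System.out.println("with pointer compression:    " + instanceSize(true) + " bytes");
        System.out.println("without pointer compression: " + instanceSize(false) + " bytes");
    }
}
```

With compression the sum is 8 + 4 + 4 + 4 + 1 = 21 bytes, padded to 24. Without it the sum is 8 + 8 + 8 + 8 + 1 = 33 bytes, padded to 40.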

Above I printed out the current system byte order, and you can see that it is little-endian.

For example, the value 0x2211 is stored in two bytes: the high byte is 0x22 and the low byte is 0x11. Big-endian: the high byte comes first and the low byte comes second, which is how humans read and write numbers. Little-endian: the low byte comes first and the high byte comes last, so the value is stored as 0x1122.
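
The difference is easy to reproduce with java.nio.ByteBuffer, which lets you choose the byte order explicitly. The class and helper names below are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Encodes a 16-bit value into two bytes using the given byte order.
    static byte[] encode(short value, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putShort(value).array();
    }

    // Formats bytes as space-separated two-digit hex, in memory order.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        short value = 0x2211;
        System.out.println("big-endian:    " + hex(encode(value, ByteOrder.BIG_ENDIAN)));    // 22 11
        System.out.println("little-endian: " + hex(encode(value, ByteOrder.LITTLE_ENDIAN))); // 11 22
    }
}
```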

So when we look for the hashCode of myClass in the object header, we have to read the bytes backwards. The hash code occupies bits 9 through 39 of the Mark Word (on 64-bit systems the hash is stored in 31 bits).


myClass hashCode (binary length 30, padded with two leading zeros):
myClass hashCode:      00100001 01011000 10001000 00001001

hashCode in Mark Word: 00001001 10001000 01011000 0 0100001

You can see that the bytes of myClass's hashCode and the bytes stored in the Mark Word are in reverse order.

There are only 31 bits to store the hash, so why do I show 32 bits here? I use 32 bits only to make the comparison easier to see. In fact, the leading 0 of the last byte shown for the Mark Word (0 0100001) is borrowed from the 25 unused bits, which is a consequence of the little-endian byte order.
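
The same reversal can be done in code. The sketch below rebuilds the 64-bit Mark Word from its bytes in memory order and extracts the 31 hash bits. Bytes 1 to 4 are the ones shown in the comparison above; the first byte (0x01, unlocked with age 0) and the three trailing zero bytes are assumptions made for the example, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class HashFromMarkWord {
    // The lowest 8 bits hold unused:1, age:4, biased_lock:1, lock:2; the next 31 bits hold the hash.
    static int hashOf(byte[] headerBytes) {
        long mark = ByteBuffer.wrap(headerBytes, 0, 8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getLong();
        return (int) ((mark >>> 8) & 0x7FFFFFFFL);
    }

    public static void main(String[] args) {
        // Mark Word bytes in memory order on a little-endian machine.
        byte[] header = {0x01, 0x09, (byte) 0x88, 0x58, 0x21, 0, 0, 0};
        int hash = hashOf(header);
        System.out.println("hash (hex):    " + Integer.toHexString(hash));
        System.out.println("hash (binary): " + Integer.toBinaryString(hash));
    }
}
```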

Next, let’s use the JOL tool to see how the flag bits in Mark Word look under different lock states.

Biased locking

First, let's look at biased locking. By default the JVM does not enable biased locking until 4 seconds after startup, so for this test we need to enable it immediately (-XX:BiasedLockingStartupDelay=0).

// -XX:BiasedLockingStartupDelay=0
public static void main(String[] args) throws InterruptedException {

    Layouter layouter = new HotSpotLayouter(new X86_32_DataModel());

    MyClass myClass = new MyClass();
    ClassLayout layout = ClassLayout.parseInstance(myClass);

    System.out.println("Before entering the synchronized code block :");
    System.out.println(layout.toPrintable());

    synchronized (myClass) {
        System.out.println("Synchronize code block :");
        System.out.println(layout.toPrintable());
    }

    System.out.println("After exiting sync code block :");
    System.out.println(layout.toPrintable());
}

You can see that before entering the synchronized code block the lower 8 bits are 00000101, which indicates the biased state (the last three bits are 101). The object is not yet biased toward any thread, so no thread data is stored.
Inside the synchronized code block and after leaving it, the Mark Word is the same: the lower 8 bits are 00000101, the object is in the biased state, and the biased thread ID and timestamp are stored. Note that after exiting the synchronized block, the bias information is still retained.

Lightweight locks and heavyweight locks

We use the following code to demonstrate how the state bits change for lightweight and heavyweight locks.

// Do not set the immediate biased-locking startup option for this test
public static void main(String[] args) throws InterruptedException {

    Layouter layouter = new HotSpotLayouter(new X86_32_DataModel());

    MyClass myClass = new MyClass();
    ClassLayout layout = ClassLayout.parseInstance(myClass);

    System.out.println("Before creating t1 thread :");
    System.out.println(layout.toPrintable());

    // t1 takes the lock and holds it for 5 seconds.
    Thread t1 = new Thread(() -> {
        synchronized (myClass) {
            try {
                TimeUnit.SECONDS.sleep(5);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    });

    t1.start();

    System.out.println("Before holding lock :");
    System.out.println(layout.toPrintable());

    // The main thread now competes with t1 for the same lock.
    synchronized (myClass) {
        System.out.println("Holding lock :");
        System.out.println(layout.toPrintable());
    }

    System.out.println("Release lock :");
    System.out.println(layout.toPrintable());

    System.out.println("After System.gc() :");
    System.gc();
    System.out.println(layout.toPrintable());
}

The running results are as follows:

The status change process of Mark Word is as follows:

Before the t1 thread is created, the lock flag is 01 and the biased-lock flag is 0, so the object is in the unlocked state.
Before the main thread holds the lock, the flag is 00, which is a lightweight lock. At this point the t1 thread holds the lock and the main thread has not yet requested it, so there is no competition.
While the main thread holds the lock, the flag is 10, which means the lock has inflated to a heavyweight lock. At this point the t1 thread has released the lock and the main thread has acquired it, that is, there was competition.
After the main thread releases the lock, the flag is still 10. It remains a heavyweight lock and is not downgraded automatically.
After System.gc(), the state returns to unlocked and the GC age changes to 1.
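
The flag values listed above can be turned into a small decoder. This is a minimal sketch that looks only at the lowest three bits of a Mark Word value; the class name, method name, and state labels are hypothetical.

```java
public class LockState {
    // Maps the biased_lock bit (bit 2) and the lock bits (bits 0-1) to a state name.
    static String describe(long markWord) {
        int lock = (int) (markWord & 0b11);
        boolean biased = ((markWord >>> 2) & 1) == 1;
        switch (lock) {
            case 0b01: return biased ? "biased" : "unlocked";
            case 0b00: return "lightweight locked";
            case 0b10: return "heavyweight locked";
            default:   return "GC mark"; // lock bits 11
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(0b00000001L)); // before creating t1
        System.out.println(describe(0b00000101L)); // biased example from the previous section
        System.out.println(describe(0b00L));       // lock flag 00
        System.out.println(describe(0b10L));       // lock flag 10
    }
}
```

Feeding it the low byte that JOL prints for each stage gives the same sequence of states as the list above.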

At this point, we know how the state bits change under different locks.

Final words

Writing takes effort, so if this helped, please give it a thumbs-up.