What 99.9% of Java programmers can't explain: object memory layout in the JVM

Welcome to follow our wechat official account: Shishan100


Author: Li Ruijie

Currently, I am working for Alibaba as a senior JVM researcher

In Java programs, there are several ways to create objects. Besides the most common one, the new statement, objects can be created through reflection, the Object.clone method, deserialization, and the Unsafe.allocateInstance method.

Object.clone and deserialization initialize the instance fields of the newly created object by copying existing data directly, either from the original object or from the serialized stream.

Unsafe.allocateInstance does not run any initialization for the instance fields (they keep their default zero values), whereas the new statement and reflection initialize them by calling a constructor.
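The following Java sketch contrasts these creation paths on one hypothetical class, Point, whose constructor sets value to 42. It assumes sun.misc.Unsafe is reachable through the usual theUnsafe reflection trick; Unsafe is an unsupported internal API, so expect compiler warnings and possible breakage on future JDKs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class ObjectCreation {
    // Hypothetical class: the constructor sets value to 42.
    static class Point implements Cloneable, Serializable {
        int value;

        Point() { value = 42; }

        @Override
        public Point clone() throws CloneNotSupportedException {
            return (Point) super.clone(); // copies fields; no constructor runs
        }
    }

    static Point viaNew() {
        return new Point(); // new + invokespecial <init>
    }

    static Point viaReflection() throws Exception {
        return Point.class.getDeclaredConstructor().newInstance(); // runs the constructor
    }

    static Point viaClone() throws Exception {
        Point original = new Point();
        original.value = 7;
        return original.clone(); // value copied from the original: 7
    }

    static Point viaDeserialization() throws Exception {
        Point original = new Point();
        original.value = 7;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Point) in.readObject(); // value restored from the stream: 7
        }
    }

    static Point viaUnsafe() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);
        return (Point) unsafe.allocateInstance(Point.class); // no constructor: value stays 0
    }

    public static void main(String[] args) throws Exception {
        System.out.println("new:              " + viaNew().value);
        System.out.println("reflection:       " + viaReflection().value);
        System.out.println("clone:            " + viaClone().value);
        System.out.println("deserialization:  " + viaDeserialization().value);
        System.out.println("allocateInstance: " + viaUnsafe().value);
    }
}
```

The expected values are 42, 42, 7, 7 and 0: only new and reflection run the constructor, clone and deserialization copy existing data, and allocateInstance leaves the field at its default.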

Let's start with the new statement, using the class shown in the figure below.

Let's compile it and look at its bytecode:

As you can see, the bytecode compiled from the new statement contains a new instruction, which allocates memory, and an invokespecial instruction, which invokes the constructor.
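As a hypothetical stand-in for the class in the figure, a TestNew class with no explicit constructor could look like this. The comments sketch the instruction sequence javac typically emits for the new expression; running javap -c TestNew on the compiled class shows the real listing, with its own constant-pool indices.

```java
// Hypothetical stand-in for the TestNew class from the figure.
class TestNew {
    public static void main(String[] args) {
        // javac typically compiles the next line to:
        //   new           <class TestNew>    // allocate memory, push the reference
        //   dup                              // keep a copy for the constructor call
        //   invokespecial <TestNew.<init>>   // run the constructor on that object
        //   astore_1                         // store the reference in local variable t
        TestNew t = new TestNew();
        System.out.println(t.getClass().getSimpleName());
    }
}
```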

This article does not focus on the invoke family of instructions; I will cover them in a later article.

In bytecode, the invokespecial instruction is typically used to call constructors and private instance methods, to call a superclass's instance methods or constructors through the super keyword, and to call the default method of an implemented interface.
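The sketch below puts each of these call sites into one small program; the class and method names are hypothetical. The comments mark where javac typically emits invokespecial. One caveat: for private methods, JDK 11 and later javac may emit invokevirtual instead (nest-based access control), so check with javap -c on your own JDK.

```java
// Each commented call is a typical invokespecial site (verify with javap -c).
class InvokeSpecialDemo extends Base implements Greeter {

    InvokeSpecialDemo() {
        super(); // superclass constructor: invokespecial Base.<init>
    }

    private String secret() {
        return "private method";
    }

    @Override
    String describe() {
        return "InvokeSpecialDemo.describe";
    }

    @Override
    public String greet() {
        return "InvokeSpecialDemo.greet";
    }

    String callAll() {
        return secret()                       // private method: invokespecial on older javac
            + " | " + super.describe()        // superclass method via super: invokespecial
            + " | " + Greeter.super.greet();  // interface default via Greeter.super: invokespecial
    }

    public static void main(String[] args) {
        // Prints: private method | Base.describe | Greeter.greet (default)
        System.out.println(new InvokeSpecialDemo().callAll());
    }
}

class Base {
    String describe() {
        return "Base.describe";
    }
}

interface Greeter {
    default String greet() {
        return "Greeter.greet (default)";
    }
}
```

Note that super.describe() and Greeter.super.greet() bypass the overrides in InvokeSpecialDemo; that non-virtual dispatch is exactly why these calls use invokespecial rather than invokevirtual.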

Speaking of constructors, we have to mention the many constraints Java places on them. First, if a class does not define any constructor, the Java compiler automatically adds a no-argument one.

When the TestNew class we just saw is compiled, its bytecode contains the following fragment.

We did not define a constructor in the Java source, yet the generated bytecode contains an automatically added no-argument constructor. The invokespecial instruction inside it ultimately calls the constructor of its parent class, Object.
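A quick way to observe the compiler-added constructor is reflection. This sketch uses a hypothetical static nested class Plain with no constructor in its source; getDeclaredConstructors() should report exactly one constructor, taking no parameters (being static, Plain has no hidden outer-instance parameter).

```java
import java.lang.reflect.Constructor;

class DefaultConstructorDemo {
    // Hypothetical class: no constructor is written in the source.
    static class Plain {
    }

    public static void main(String[] args) {
        Constructor<?>[] ctors = Plain.class.getDeclaredConstructors();
        System.out.println(ctors.length);                 // 1: added by the compiler
        System.out.println(ctors[0].getParameterCount()); // 0: it takes no arguments
        System.out.println(Plain.class.getSuperclass());  // class java.lang.Object
    }
}
```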

Next comes the rule for constructor invocation: a subclass's constructor must call a constructor of its parent class. If the parent class has a no-argument constructor, this call can be implicit; that is, the Java compiler automatically inserts a call to the superclass constructor.

However, if the parent class has no no-argument constructor, the subclass's constructor must explicitly call one of the parent class's constructors that take arguments.

There are two kinds of explicit calls: a direct call to a parent constructor through the super keyword, or an indirect call through the this keyword to another constructor of the same class.

Whether direct or indirect, the explicit call must be the first statement of the constructor, so that the inherited superclass fields are initialized first.
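Here is a minimal sketch of both kinds of explicit call, using hypothetical Parent and Child classes where Parent has no no-argument constructor. Each constructor appends to a log, so the printed order shows the superclass constructor finishing before any subclass constructor body.

```java
import java.util.ArrayList;
import java.util.List;

class ConstructorChainDemo {
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical parent with no no-argument constructor.
    static class Parent {
        final int id;

        Parent(int id) {
            this.id = id;
            LOG.add("Parent(" + id + ")");
        }
    }

    static class Child extends Parent {
        final String name;

        // Direct explicit call: super(...) must be the first statement.
        Child(int id, String name) {
            super(id);
            this.name = name;
            LOG.add("Child(" + id + ", " + name + ")");
        }

        // Indirect explicit call: this(...) delegates to the constructor above.
        Child(String name) {
            this(-1, name);
            LOG.add("Child(" + name + ")");
        }
    }

    public static void main(String[] args) {
        new Child("x");
        System.out.println(LOG); // [Parent(-1), Child(-1, x), Child(x)]
    }
}
```

Deleting the super(id) line does not compile, because the compiler would insert an implicit super() and Parent has no matching constructor.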

Is there a way around this, so that inherited superclass fields are not initialized first? Yes, if you use a bytecode injection tool.

When we call a constructor, it first calls the parent class's constructor, and so on all the way up to the Object class. All of these constructors are invoked on the same object, namely the one created by the new instruction.

In other words, the memory of an object created by the new instruction covers all of the instance fields declared in its parent classes.

That is, even though a subclass cannot access its parent class's private instance fields, and even when its own instance fields hide parent fields of the same name, an instance of the subclass still allocates memory for those parent fields.
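The sketch below makes this visible with hypothetical classes A and B (B extends A). B declares its own x, hiding A's x, and A has a private field; a single B object still holds both x fields and the private one.

```java
class InheritedFieldsDemo {
    public static void main(String[] args) {
        B b = new B();
        System.out.println(b.x);           // 20: B's own field
        System.out.println(((A) b).x);     // 10: A's hidden field, still stored in the B object
        System.out.println(b.getSecret()); // 1: A's private field, also stored in the B object
    }
}

// Hypothetical parent class.
class A {
    private int secret = 1; // private: B cannot access it directly
    int x = 10;             // hidden by B.x

    int getSecret() {
        return secret;
    }
}

// Hypothetical subclass: declares its own x, which hides A.x.
class B extends A {
    int x = 20;
}
```

Field access is resolved by the static type, which is why the cast to A reaches the hidden field.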

Next I'll introduce compressed pointers. In the Java virtual machine, every Java object has an object header consisting of a mark word and a type pointer.

The mark word stores runtime data the Java virtual machine keeps about the object, such as its hash code, GC information, and lock information, while the type pointer points to the object's class.

In a 64-bit JVM, the mark word takes up 64 bits and the type pointer takes up 64 bits. In other words, every Java object carries an extra 16 bytes of memory overhead.

To reduce object memory usage, the 64-bit JVM introduced compressed pointers, which compress what would otherwise be 64-bit pointers to Java objects in the heap into 32-bit pointers.

In this way, the type pointer in the object header is also compressed to 32 bits, reducing the size of the object header from 16 bytes to 12 bytes.
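To see the header size from Java, one option is sun.misc.Unsafe.objectFieldOffset, which reports where a field starts relative to the object. This sketch assumes a 64-bit HotSpot JVM with default settings; Unsafe is an unsupported internal API (objectFieldOffset is deprecated in recent JDKs), and the single-field classes OneInt and OneLong are hypothetical.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class HeaderOffsetDemo {
    // Hypothetical single-field classes.
    static class OneInt { int a; }
    static class OneLong { long v; }

    static Unsafe unsafe() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    // Offset of the named field from the start of the object, in bytes.
    static long offsetOf(Class<?> type, String field) throws Exception {
        return unsafe().objectFieldOffset(type.getDeclaredField(field));
    }

    public static void main(String[] args) throws Exception {
        // Typically 12 with compressed class pointers (16 without): right after the header.
        System.out.println("OneInt.a  offset: " + offsetOf(OneInt.class, "a"));
        // Typically 16: a long must start at a multiple of 8, leaving a 4-byte gap.
        System.out.println("OneLong.v offset: " + offsetOf(OneLong.class, "v"));
    }
}
```

Running it again with -XX:-UseCompressedClassPointers should typically move the int field to offset 16, matching the 16-byte header described above.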

Of course, compressed pointers apply not only to the type pointer in the object header, but also to reference-type fields and to arrays of references.

How does it work? The answer is memory alignment.

By default, the starting address of every object in the JVM heap must be aligned to a multiple of 8 bytes. If an object's size is not a multiple of 8, the space up to the next multiple of 8 is left empty and wasted. This wasted space is called padding between objects.
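A simplified model of this padding: an object's size is its header plus its field bytes, rounded up to the next multiple of 8. The class and method names are hypothetical, and the model ignores gaps between fields, which real layouts may also contain.

```java
class ObjectSizeModel {
    static final int ALIGNMENT = 8; // default object alignment in bytes

    // Round size up to the next multiple of alignment (alignment must be a power of two).
    static long alignUp(long size, int alignment) {
        return (size + alignment - 1) & -alignment;
    }

    // Simplified model: header plus raw field bytes, rounded up to the alignment.
    static long shallowSize(int headerBytes, int fieldBytes) {
        return alignUp(headerBytes + fieldBytes, ALIGNMENT);
    }

    public static void main(String[] args) {
        // One int field, 12-byte header: 12 + 4 = 16, no padding.
        System.out.println(shallowSize(12, 4)); // 16
        // One int field, 16-byte header: 16 + 4 = 20, padded to 24.
        System.out.println(shallowSize(16, 4)); // 24
        // No fields, 12-byte header: padded to 16.
        System.out.println(shallowSize(12, 0)); // 16
    }
}
```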

A pointer holds an address. Because objects in the heap start at addresses that are multiples of 8, a pointer to an object (or to a class) does not need to store the lowest three bits of the address.

Since the memory addresses of all objects and classes are 8-byte aligned, their lowest three bits are always 0. A 32-bit pointer shifted left by those three bits can therefore address 2 to the 35th power bytes, that is, 32 GB of address space (with a heap larger than 32 GB, compressed pointers are turned off).

We can extend the addressing range further by configuring the virtual machine's object alignment option (-XX:ObjectAlignmentInBytes). However, this also increases the padding between objects, so compressed pointers may end up saving less space than expected.
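Here is a minimal sketch of zero-based compressed-pointer encoding, assuming the heap starts at address 0 (HotSpot can also use a nonzero heap base, which this model ignores); the class and method names are hypothetical. It shows why dropping three always-zero bits lets 32 bits cover 32 GB, and how a 16-byte alignment (as set with -XX:ObjectAlignmentInBytes=16) raises the limit to 64 GB.

```java
class CompressedOopsModel {
    // Shift equals log2 of the object alignment: 8 bytes -> 3, 16 bytes -> 4.
    static int shiftFor(int alignmentBytes) {
        return Integer.numberOfTrailingZeros(alignmentBytes);
    }

    // Encoding: drop the always-zero low bits and keep 32 bits.
    static int encode(long address, int shift) {
        return (int) (address >>> shift);
    }

    // Decoding: treat the 32-bit value as unsigned and shift it back.
    static long decode(int narrow, int shift) {
        return Integer.toUnsignedLong(narrow) << shift;
    }

    // Size of the address range a 32-bit compressed pointer can cover.
    static long addressableBytes(int alignmentBytes) {
        return 1L << (32 + shiftFor(alignmentBytes));
    }

    public static void main(String[] args) {
        long address = 0x7_FFFF_FFF8L; // 8-byte aligned, just below 32 GB
        int narrow = encode(address, 3);
        System.out.println(decode(narrow, 3) == address); // true
        System.out.println(addressableBytes(8) >> 30);    // 32 (GB)
        System.out.println(addressableBytes(16) >> 30);   // 64 (GB)
    }
}
```

The price of the larger range is that every object is then rounded up to a multiple of 16 bytes instead of 8, which is the extra padding mentioned above.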

Even when compressed pointers are turned off, the Java virtual machine still aligns memory. Moreover, alignment applies not only between objects, but also to the fields within an object.

For example, the Java virtual machine requires the offsets of long fields, double fields, and (when compressed pointers are off) reference fields to be multiples of 8.

Why is that?

You've probably heard of CPU cache lines. If fields were not aligned, a single field could span two cache lines.

Reading such a field might then require loading two cache lines, and writing it would dirty both cache lines at once.
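A small check makes the argument concrete. This sketch assumes a 64-byte cache line (common on x86-64, but an assumption here), and the names are hypothetical. Because objects start at 8-byte-aligned addresses, an 8-byte field at an 8-aligned offset also sits at an 8-aligned address, and such a field can never cross a 64-byte boundary.

```java
class CacheLineModel {
    static final int CACHE_LINE = 64; // assumed cache line size in bytes

    // Does a field at this address with this size cross a cache-line boundary?
    static boolean spansTwoLines(long address, int size) {
        return address / CACHE_LINE != (address + size - 1) / CACHE_LINE;
    }

    public static void main(String[] args) {
        System.out.println(spansTwoLines(56, 8)); // false: bytes 56..63 stay in one line
        System.out.println(spansTwoLines(60, 8)); // true: bytes 60..67 cross the boundary at 64
        // No 8-aligned 8-byte field crosses a boundary.
        boolean anyCross = false;
        for (long a = 0; a < 4096; a += 8) {
            anyCross |= spansTwoLines(a, 8);
        }
        System.out.println(anyCross); // false
    }
}
```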

We will come back to CPU cache lines in a later article on the volatile keyword.

The last thing I want to mention is field reordering, which is what produces the alignment between an object's fields mentioned earlier: the JVM rearranges the order of fields to achieve memory alignment.

It has the following two rules:

First, if a field occupies C bytes, its offset must be a multiple of C (that is, NC for some integer N). The offset here is the difference between the field's address and the object's starting address.

Take the java.lang.Long class: it has only one instance field, of type long. In a 64-bit virtual machine that uses compressed pointers, the object header is 12 bytes, but the long field can only start at offset 16, so the 4 bytes in between are wasted.

Second, a field inherited by a subclass must have the same offset as the corresponding field in the parent class.

Put plainly: suppose B extends A. Every field of A is also present in a B object, with A's fields laid out first and B's own fields after them. Moreover, when a B object is accessed through a variable of type A, each inherited field must sit at the same offset as it does in an A object.
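The two rules can be sketched as a simplified layout model. It assumes a 12-byte header and places fields largest-first, each aligned to its own size; real HotSpot layouts may also fill gaps with smaller fields, so treat the class, method and example fields below as hypothetical illustrations rather than the JVM's actual algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldLayoutModel {
    // A hypothetical field description: name and size in bytes (1, 2, 4 or 8).
    static final class FieldSpec {
        final String name;
        final int size;

        FieldSpec(String name, int size) {
            this.name = name;
            this.size = size;
        }
    }

    static long alignUp(long value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    // Rule 1: each C-byte field starts at a multiple of C.
    // Rule 2: fields start at 'start' (the end of the parent's fields),
    // so laying out a subclass never moves the fields it inherits.
    static long layout(long start, List<FieldSpec> fields, Map<String, Long> offsets) {
        List<FieldSpec> sorted = new ArrayList<>(fields);
        sorted.sort(Comparator.comparingInt((FieldSpec f) -> f.size).reversed());
        long cursor = start;
        for (FieldSpec f : sorted) {
            cursor = alignUp(cursor, f.size);
            offsets.put(f.name, cursor);
            cursor += f.size;
        }
        return cursor; // end of the last field
    }

    public static void main(String[] args) {
        // java.lang.Long: one long field after a 12-byte header.
        Map<String, Long> longLayout = new LinkedHashMap<>();
        long end = layout(12, Arrays.asList(new FieldSpec("value", 8)), longLayout);
        System.out.println(longLayout + " size=" + alignUp(end, 8)); // {value=16} size=24

        // Hypothetical A { int a; byte flag; } and B extends A { long b; }.
        Map<String, Long> parent = new LinkedHashMap<>();
        long parentEnd = layout(12, Arrays.asList(new FieldSpec("a", 4), new FieldSpec("flag", 1)), parent);
        Map<String, Long> child = new LinkedHashMap<>(parent); // inherited offsets stay unchanged
        layout(parentEnd, Arrays.asList(new FieldSpec("b", 8)), child);
        System.out.println(parent); // {a=12, flag=16}
        System.out.println(child);  // {a=12, flag=16, b=24}
    }
}
```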

Next, let me talk about a related topic. **What is false sharing?**

Suppose two threads each access a different volatile field of the same object. Logically, they share no data and therefore do not need to synchronize.

But if the two fields happen to sit in the same cache line, a write to either field causes the whole cache line to be written back, and the threads end up contending for it as if they shared data. This is false sharing.
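The following is a rough demonstration sketch, not a rigorous benchmark: two threads increment two volatile longs that sit in one object, and then two volatile longs that each sit in a separately padded object. The padding relies on rule 2 above (superclass fields are laid out first); the class names and the 7-long padding size are assumptions chosen to cover a typical 64-byte line. Timings vary with CPU and JIT, so the padded run is usually, but not always, faster.

```java
class FalseSharingDemo {
    // Two volatile longs declared side by side: very likely in the same cache line.
    static final class Shared {
        volatile long a;
        volatile long b;
    }

    // Manual padding through inheritance: superclass fields come first, so each
    // 'value' typically has at least 56 bytes of unused longs on each side.
    static class LeftPad { long p1, p2, p3, p4, p5, p6, p7; }
    static class PaddedValue extends LeftPad { volatile long value; }
    static final class Padded extends PaddedValue { long q1, q2, q3, q4, q5, q6, q7; }

    // Starts both writers, waits for them, and returns the elapsed time in milliseconds.
    static long time(Runnable w1, Runnable w2) throws InterruptedException {
        Thread t1 = new Thread(w1);
        Thread t2 = new Thread(w2);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        final int n = 50_000_000;
        Shared s = new Shared();
        long sharedMs = time(() -> { for (int i = 0; i < n; i++) s.a++; },
                             () -> { for (int i = 0; i < n; i++) s.b++; });
        Padded x = new Padded();
        Padded y = new Padded();
        long paddedMs = time(() -> { for (int i = 0; i < n; i++) x.value++; },
                             () -> { for (int i = 0; i < n; i++) y.value++; });
        System.out.println("fields in one object: " + sharedMs + " ms");
        System.out.println("padded objects:       " + paddedMs + " ms");
    }
}
```

Each field has a single writer thread, so the non-atomic ++ on a volatile is safe here; join() guarantees the final values are visible afterwards.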

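Java 8 also introduced the @Contended annotation (sun.misc.Contended, moved to jdk.internal.vm.annotation.Contended in Java 9) to address false sharing between object fields.

The Java virtual machine places fields annotated with @Contended in separate cache lines, so you will see a lot of space deliberately wasted as padding to avoid unnecessary cache line synchronization.

The exact padding algorithm is an implementation detail. If you are interested, you can use the virtual machine option:

-XX:-RestrictContended

This option lets @Contended take effect in your own (non-JDK) classes, so you can then inspect the resulting memory layout of the @Contended fields, for example with OpenJDK's JOL (Java Object Layout) tool.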

END
