1. Object creation process

At the language level, an object can be created with the new keyword, through reflection, by cloning, by deserialization, and so on. Next, let’s explore how object creation works inside the virtual machine.
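
As a rough illustration of the language-level options listed above, the sketch below creates the same kind of object in four ways. The class names CreationWays and Point and the field x are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CreationWays {
    // Hypothetical example class; its name and field are illustrative only.
    static class Point implements Cloneable, Serializable {
        private static final long serialVersionUID = 1L;
        int x;

        public Point() { }

        @Override
        public Point clone() throws CloneNotSupportedException {
            return (Point) super.clone(); // field-by-field copy, no constructor call
        }
    }

    // 1. The new keyword, which compiles to the new bytecode instruction.
    static Point viaNew(int x) {
        Point p = new Point();
        p.x = x;
        return p;
    }

    // 2. Reflection: look up the no-argument constructor and invoke it.
    static Point viaReflection() throws ReflectiveOperationException {
        return Point.class.getDeclaredConstructor().newInstance();
    }

    // 3. Cloning an existing object.
    static Point viaClone(Point original) throws CloneNotSupportedException {
        return original.clone();
    }

    // 4. Deserialization: rebuild an object from its serialized bytes.
    static Point viaDeserialization(Point original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Point) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point a = viaNew(7);
        Point b = viaReflection();
        Point c = viaClone(a);
        Point d = viaDeserialization(a);
        System.out.println(a.x + " " + b.x + " " + c.x + " " + d.x); // prints "7 0 7 7"
    }
}
```

Only the first path goes through the new instruction directly; the rest of this section follows that path.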

When the virtual machine encounters a new bytecode instruction:

  • It first checks whether the operand of the instruction can locate a symbolic reference to a class in the constant pool.

  • It then checks whether the class represented by this symbolic reference has been loaded, resolved, and initialized. If not, the corresponding class loading process is performed first.

  • After the class loading check passes, the virtual machine allocates memory for the new object.

    There are two ways to allocate memory: bump the pointer and free list.

    • Bump the pointer: suppose the memory in the Java heap is perfectly tidy, with all used memory on one side, all free memory on the other, and a pointer in the middle marking the boundary. Allocating memory then simply means moving that pointer toward the free space by a distance equal to the size of the object. This allocation method is called “bump the pointer”.
    • Free list: if the memory in the Java heap is not tidy and used and free blocks are interleaved, bumping a pointer is impossible. The virtual machine must maintain a list recording which memory blocks are available, find a block in the list that is large enough for the object instance at allocation time, and update the records. This allocation method is called a “free list”.
    • The choice between the two approaches depends on whether the Java heap is tidy.
    • Whether the Java heap is tidy depends in turn on whether the garbage collector in use is able to compact memory.
  • After the memory allocation is complete, the virtual machine initializes the allocated memory space (but not the object header) to zero.

  • The object header is then set. The object header contains information such as which class the object is an instance of, how to find the class’s metadata, the object’s hash code, and the object’s GC generational age.
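
The two allocation strategies in the steps above can be sketched over simulated addresses. This is a minimal model, not HotSpot’s implementation: the class names AllocationSketch, BumpAllocator and FreeListAllocator are hypothetical, addresses are plain long values, and the free list uses a simple first-fit search as an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class AllocationSketch {
    /** Bump the pointer: used memory on one side, free memory on the other. */
    static final class BumpAllocator {
        private final long end; // first address past the region
        private long top;       // boundary between used and free memory

        BumpAllocator(long start, long end) {
            this.top = start;
            this.end = end;
        }

        /** Returns the start address of the new block, or -1 if space is exhausted. */
        long allocate(long size) {
            if (end - top < size) {
                return -1;
            }
            long address = top;
            top += size; // move the pointer by exactly the object size
            return address;
        }
    }

    /** Free list: free blocks are scattered, so a record of them is kept. */
    static final class FreeListAllocator {
        private static final class Block {
            long start;
            long size;

            Block(long start, long size) {
                this.start = start;
                this.size = size;
            }
        }

        private final List<Block> free = new ArrayList<>();

        void addFreeBlock(long start, long size) {
            free.add(new Block(start, size));
        }

        /** First-fit search (an assumption of this sketch); returns -1 if nothing fits. */
        long allocate(long size) {
            for (Iterator<Block> it = free.iterator(); it.hasNext();) {
                Block block = it.next();
                if (block.size >= size) {
                    long address = block.start;
                    block.start += size; // carve the object out of the block
                    block.size -= size;
                    if (block.size == 0) {
                        it.remove();     // update the records
                    }
                    return address;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(0, 64);
        System.out.println(bump.allocate(16)); // 0
        System.out.println(bump.allocate(24)); // 16

        FreeListAllocator list = new FreeListAllocator();
        list.addFreeBlock(100, 8);
        list.addFreeBlock(200, 32);
        System.out.println(list.allocate(16)); // 200, the first block is too small
    }
}
```

The bump allocator does constant work per allocation, which is why a compacting collector that keeps the heap tidy makes allocation cheap.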

The process is roughly illustrated as follows:

Thread safety of memory allocation: object creation is a very frequent operation in the virtual machine, and even just changing where a pointer points is not thread-safe under concurrency. It is possible that while memory is being allocated for object A and the pointer has not yet been updated, object B uses the original pointer to allocate memory at the same time.

There are two possible solutions to the thread safety problem:

  • One is to synchronize the memory allocation itself. In practice the virtual machine uses CAS with retry on failure to guarantee the atomicity of the update operation.
  • The other is to give each thread its own allocation space: each thread pre-allocates a small block of memory in the Java heap, called a Thread Local Allocation Buffer (TLAB). A thread allocates from its own buffer without locking, and synchronization is needed only when the buffer is used up and a new one has to be allocated.
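
Both solutions can be modeled in a few lines. The sketch below is an illustration under stated assumptions, not HotSpot code: SharedHeap and Tlab are hypothetical names, addresses are simulated long values, and the leftover space of a retired buffer is simply abandoned.

```java
import java.util.concurrent.atomic.AtomicLong;

class ConcurrentAllocationSketch {
    /** Shared region whose top pointer is advanced with CAS and retry. */
    static final class SharedHeap {
        private final AtomicLong top = new AtomicLong(0);
        private final long end;

        SharedHeap(long end) {
            this.end = end;
        }

        /** Returns the start address of the block, or -1 if the region is full. */
        long allocate(long size) {
            while (true) {
                long oldTop = top.get();
                long newTop = oldTop + size;
                if (newTop > end) {
                    return -1;
                }
                if (top.compareAndSet(oldTop, newTop)) {
                    return oldTop; // this thread won the race
                }
                // Another thread moved the pointer first: retry with the new value.
            }
        }
    }

    /** Per-thread buffer; each instance must be used by a single thread only. */
    static final class Tlab {
        private final SharedHeap heap;
        private final long bufferSize;
        private long top; // next free address inside the current buffer
        private long end; // end of the current buffer (0 means no buffer yet)

        Tlab(SharedHeap heap, long bufferSize) {
            this.heap = heap;
            this.bufferSize = bufferSize;
        }

        long allocate(long size) {
            if (size > bufferSize) {
                return heap.allocate(size); // too large for a buffer: use the shared path
            }
            if (end - top < size) {
                // Buffer used up: the only step that touches shared state.
                long start = heap.allocate(bufferSize);
                if (start < 0) {
                    return -1;
                }
                top = start;
                end = start + bufferSize;
            }
            long address = top;
            top += size; // plain bump, no synchronization needed
            return address;
        }
    }

    public static void main(String[] args) {
        SharedHeap heap = new SharedHeap(64);
        Tlab tlab = new Tlab(heap, 32);
        System.out.println(tlab.allocate(16)); // 0
        System.out.println(tlab.allocate(16)); // 16
        System.out.println(tlab.allocate(16)); // 32, served from a second buffer
    }
}
```

The common case in Tlab.allocate touches only thread-private fields, so the CAS cost is paid once per buffer rather than once per object.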

From the virtual machine’s point of view, a new object exists once the object header is set. From the Java program’s point of view, however, the new instruction is followed by a call to the <init>() method, which initializes the object as the program intends, and only then is a usable object fully constructed.
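
The gap between zeroed memory and the completed <init>() call can be observed from plain Java. In the sketch below (class names InitOrderSketch, Base and Derived are hypothetical), the superclass constructor calls an overridden method before the subclass field initializer has run, so it sees the zero value written at allocation time.

```java
class InitOrderSketch {
    static class Base {
        final int seenByBase;

        Base() {
            // Runs before the field initializers of any subclass.
            seenByBase = readValue();
        }

        int readValue() {
            return -1;
        }
    }

    static class Derived extends Base {
        int value = 42; // assigned inside Derived's <init>, after Base's <init> returns

        @Override
        int readValue() {
            return value;
        }
    }

    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println(d.seenByBase); // prints 0: the zeroed memory, not 42
        System.out.println(d.value);      // prints 42: set once <init> has finished
    }
}
```

This is also why calling overridable methods from a constructor is usually discouraged.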

2. Memory layout of objects

In the HotSpot virtual machine, the storage layout of an object in heap memory can be divided into three parts: the object header (Header), instance data (Instance Data), and alignment padding (Padding).

The object header of a HotSpot virtual machine object contains two kinds of information. The first kind is the object’s own runtime data, such as the hash code, GC generational age, lock state flag, the lock held by a thread, the biased thread ID, and the bias timestamp. This part is 32 bits long on a 32-bit virtual machine and 64 bits long on a 64-bit virtual machine (with compressed pointers disabled), and is officially called the “Mark Word”.

For the sake of space efficiency, the Mark Word is designed as a dynamically defined data structure so that it can store as much data as possible in a very small space, reusing its own storage according to the state of the object.

For example, in a 64-bit HotSpot virtual machine, when the object is not locked, 31 of the Mark Word’s 64 bits store the object’s hash code, 4 bits store the object’s generational age, and 2 bits store the lock flag. The contents stored for objects in other states (lightweight-locked, heavyweight-locked, biased) change as shown in the figure.
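
To make the bit counts concrete, the sketch below packs and unpacks a simulated Mark Word for an unlocked object. The field widths come from the text. The bit positions are an assumption based on the classic 64-bit HotSpot layout (lock flag in bits 0-1, biased flag in bit 2, age in bits 3-6, hash in bits 8-38) and may differ between JDK versions. MarkWordSketch and its methods are hypothetical names, and the code works on a plain long, not on a real object header.

```java
class MarkWordSketch {
    // Assumed positions, counted from the lowest bit (classic 64-bit layout):
    // bits 0-1 lock flag, bit 2 biased flag, bits 3-6 GC age, bit 7 unused,
    // bits 8-38 identity hash code, remaining high bits unused.
    static final int AGE_SHIFT = 3;
    static final int HASH_SHIFT = 8;
    static final long LOCK_MASK = 0b11L;        // 2 bits
    static final long AGE_MASK = 0xFL;          // 4 bits
    static final long HASH_MASK = 0x7FFF_FFFFL; // 31 bits

    static final int UNLOCKED = 0b01; // lock flag value of an unlocked object

    /** Builds the mark word of an unlocked, unbiased object. */
    static long unlockedMark(int hash, int age) {
        return ((hash & HASH_MASK) << HASH_SHIFT)
             | ((age & AGE_MASK) << AGE_SHIFT)
             | UNLOCKED;
    }

    static int lockFlag(long mark) {
        return (int) (mark & LOCK_MASK);
    }

    static int age(long mark) {
        return (int) ((mark >>> AGE_SHIFT) & AGE_MASK);
    }

    static int hash(long mark) {
        return (int) ((mark >>> HASH_SHIFT) & HASH_MASK);
    }

    public static void main(String[] args) {
        long mark = unlockedMark(0x12345678, 5);
        System.out.println(Integer.toHexString(hash(mark))); // 12345678
        System.out.println(age(mark));                       // 5
        System.out.println(lockFlag(mark));                  // 1
    }
}
```

With only 4 bits available, the generational age can never exceed 15.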

The other part of the object header is the type pointer, a pointer to the object’s type metadata, which the Java virtual machine uses to determine which class the object is an instance of. Not every virtual machine implementation has to keep a type pointer in the object data; in other words, finding an object’s metadata does not necessarily go through the object itself.

If the object is a Java array, the object header must also contain a field recording the array length. The virtual machine can determine the size of an ordinary Java object from its metadata, but if the array length is not known, the size of an array cannot be inferred from the metadata alone.
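
The role of the array length field becomes clearer with a back-of-the-envelope size calculation. The sketch below is an estimate under explicit assumptions, not a measurement: a 64-bit HotSpot with compressed class pointers (8-byte Mark Word, 4-byte type pointer), a 4-byte array length field, 8-byte object alignment, and no gaps between fields. ObjectSizeSketch and its methods are hypothetical names.

```java
class ObjectSizeSketch {
    // All four constants are assumptions of this sketch (see the lead-in).
    static final int MARK_WORD_BYTES = 8;
    static final int TYPE_POINTER_BYTES = 4;
    static final int ARRAY_LENGTH_BYTES = 4;
    static final int ALIGNMENT = 8;

    /** Rounds a size up to the next multiple of the alignment (the padding part). */
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Ordinary object: the field bytes are known from the class metadata alone. */
    static long instanceSize(long fieldBytes) {
        return alignUp(MARK_WORD_BYTES + TYPE_POINTER_BYTES + fieldBytes);
    }

    /** Array: the size depends on a length that only the header records. */
    static long arraySize(int length, int elementBytes) {
        return alignUp(MARK_WORD_BYTES + TYPE_POINTER_BYTES + ARRAY_LENGTH_BYTES
                + (long) length * elementBytes);
    }

    public static void main(String[] args) {
        System.out.println(instanceSize(4)); // 16: header 12 + one int field
        System.out.println(arraySize(3, 4)); // 32: header 16 + 12 bytes of ints, padded
        System.out.println(arraySize(0, 4)); // 16: an empty array is header only
    }
}
```

Actual sizes depend on the JVM version and flags; a tool such as JOL can report the real layout.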

3. Object access positioning

Java programs manipulate specific objects on the heap through reference data on the stack. The Java Virtual Machine Specification only defines the reference type as a reference to an object; it does not define how a reference should locate and access the object’s position in the heap. The access method is therefore left to the virtual machine implementation. The two mainstream approaches are handles and direct pointers:

  • With handle access, a block of memory in the Java heap is set aside as a handle pool. The reference stores the address of the object’s handle, and the handle contains the addresses of the object’s instance data and of its type data, as shown in the figure below:

  • With direct pointer access, the memory layout of the object in the Java heap must account for where to place the information needed to reach the type data. The reference stores the address of the object directly, so accessing the object itself incurs no extra level of indirection, as shown in the figure:

Each of the two access methods has its own advantages. The biggest advantage of handles is that the reference stores a stable handle address: when the object is moved (which is very common during garbage collection), only the instance data pointer in the handle changes, and the reference itself does not need to be modified.

The biggest advantage of direct pointers is speed. They save the cost of one pointer dereference per access, and because object access is so frequent in Java, this saving adds up to a significant reduction in execution cost.
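
The trade-off between the two access methods can be modeled with simulated addresses. This is a toy sketch: AccessSketch, Heap and HandlePool are hypothetical names, objects are plain strings stored in a map keyed by address, and moving an object stands in for what a compacting garbage collector would do.

```java
import java.util.HashMap;
import java.util.Map;

class AccessSketch {
    /** Simulated heap: address -> object payload. */
    static final class Heap {
        private final Map<Integer, String> objects = new HashMap<>();

        void put(int address, String payload) {
            objects.put(address, payload);
        }

        /** Direct access: a single lookup by address; null if nothing lives there. */
        String read(int address) {
            return objects.get(address);
        }

        /** Simulates the collector moving an object to a new address. */
        void move(int from, int to) {
            objects.put(to, objects.remove(from));
        }
    }

    /** Handle pool: handle -> current object address. */
    static final class HandlePool {
        private final Map<Integer, Integer> handleToAddress = new HashMap<>();
        private int nextHandle = 1;

        int createHandle(int address) {
            int handle = nextHandle++;
            handleToAddress.put(handle, address);
            return handle;
        }

        /** After a move, only the handle entry changes; references stay valid. */
        void relocate(int handle, int newAddress) {
            handleToAddress.put(handle, newAddress);
        }

        /** Handle access: two lookups, handle -> address -> object. */
        String read(Heap heap, int handle) {
            return heap.read(handleToAddress.get(handle));
        }
    }

    public static void main(String[] args) {
        Heap heap = new Heap();
        HandlePool pool = new HandlePool();
        heap.put(100, "obj");
        int handleRef = pool.createHandle(100); // reference holding a handle
        int directRef = 100;                    // reference holding the address

        heap.move(100, 200);
        pool.relocate(handleRef, 200);

        System.out.println(pool.read(heap, handleRef)); // obj: reference untouched
        System.out.println(heap.read(directRef));       // null: stale until updated
        directRef = 200; // with direct pointers, every reference must be rewritten
        System.out.println(heap.read(directRef));       // obj
    }
}
```

Handle access pays an extra lookup on every read, while direct pointers pay by having every reference rewritten when an object moves.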

The HotSpot virtual machine mainly uses direct pointers for object access.




