The Java Virtual Machine and the Class file are the cornerstones of Java's platform independence.

The Class file is also the cornerstone of the JVM's language independence.

A Class file contains the Java virtual machine instruction set, the symbol table, and various other auxiliary information.

Each Class file corresponds to a ClassFile structure as follows:

ClassFile {
 u4 magic;
 u2 minor_version;
 u2 major_version;
 u2 constant_pool_count;
 cp_info constant_pool[constant_pool_count-1];
 u2 access_flags;
 u2 this_class;
 u2 super_class;
 u2 interfaces_count;
 u2 interfaces[interfaces_count];
 u2 fields_count;
 field_info fields[fields_count];
 u2 methods_count;
 method_info methods[methods_count];
 u2 attributes_count;
 attribute_info attributes[attributes_count];
}


Because the Class file uses no delimiters, the order and number of its data items are strictly fixed: the meaning, length, and order of every byte are prescribed and may not change. The basic types u1, u2, and u4 denote unsigned 1-, 2-, and 4-byte values, stored in big-endian order.
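To make the u1/u2/u4 notation concrete, here is a minimal Java sketch of a reader for these items. The class name ClassBytes is hypothetical; it relies only on java.nio.ByteBuffer, whose default byte order is big-endian and therefore matches the class file. Because Java has no unsigned integer types, each value is widened and masked.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for the fixed-width unsigned items (u1, u2, u4) of a class file.
final class ClassBytes {
    private final ByteBuffer buf;

    ClassBytes(byte[] data) {
        this.buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
    }

    // u1: one unsigned byte
    int u1() {
        return buf.get() & 0xFF;
    }

    // u2: two unsigned bytes, high byte first
    int u2() {
        return buf.getShort() & 0xFFFF;
    }

    // u4: four unsigned bytes; Java's int is signed, so widen to long and mask
    long u4() {
        return buf.getInt() & 0xFFFFFFFFL;
    }

    // Current offset into the file
    int position() {
        return buf.position();
    }
}
```

Calling u4(), u2(), u2() in that order on a class file's bytes yields the magic number, minor_version, and major_version, mirroring the first three items of ClassFile.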

Let's look at what each item means.

1. Magic number

This is basically every Java developer’s first Java program:

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}

Compiling it in IntelliJ IDEA generates the corresponding class file in the target directory. To view the file in hexadecimal, we can install the HexView plugin.

Once it is installed, right-click the class file and choose HexView to open it in hexadecimal:

The first four bytes, cafebabe (0xCAFEBABE), are the magic number by which the JVM identifies a class file. During verification the JVM checks that the file starts with this magic number and throws a ClassFormatError if it does not.

This magic number is fun: "cafe babe". Java turns out to be not only about coffee, but also about babies 😂
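The magic-number check itself is simple. The sketch below, using a hypothetical MagicCheck class, compares the first four bytes against 0xCAFEBABE and, to mimic the JVM, throws java.lang.ClassFormatError when they do not match. The error message is illustrative, not the JVM's actual wording.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the magic-number check performed on a class file.
final class MagicCheck {
    // 0xCAFEBABE does not fit in a positive int, so this constant is negative; that is fine
    // because getInt() also returns a signed int with the same bit pattern.
    static final int MAGIC = 0xCAFEBABE;

    static boolean hasClassMagic(byte[] data) {
        if (data.length < 4) {
            return false; // too short to hold the u4 magic item
        }
        return ByteBuffer.wrap(data).getInt(0) == MAGIC; // big-endian read of bytes 0..3
    }

    // Rejects the file the way the JVM does, by throwing ClassFormatError.
    static void requireClassMagic(byte[] data) {
        if (!hasClassMagic(data)) {
            throw new ClassFormatError("Incompatible magic value"); // illustrative message
        }
    }
}
```

A ZIP or JAR file, for example, starts with 0x504B0304 and would be rejected by this check.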

2. Version number

The four bytes immediately following the magic number, 0000 0031, store the version number of the class file: bytes 5 and 6 are the minor version, and bytes 7 and 8 are the major version.

Class file version numbers start at 45, and each major JDK release after JDK 1.1 increases the major version by 1 (JDK 1.0 and 1.1 use versions 45.0 through 45.3). A newer JDK can run class files produced by older versions, but it cannot run class files with a later version number, even if the file format has not changed.

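The major version 0x0031 is 49 in decimal. Note that 49 is the version number of JDK 5 (1.5), not JDK 8: a class file compiled for JDK 8 has major version 52 (0x0034). So the file shown here was generated for a Java 5 target.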
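The version arithmetic above can be expressed as a small helper. This is a sketch with hypothetical names (ClassVersion, readVersion, jdkFor, canRun); the mapping assumes the standard numbering (45 for JDK 1.0/1.1, then one higher per release), and canRun is a simplification that ignores details such as preview-feature minor versions.

```java
// Hypothetical helper mapping class-file version numbers to JDK releases.
final class ClassVersion {

    // Offsets 4..7 of the file: u2 minor_version, then u2 major_version (big-endian).
    static int[] readVersion(byte[] data) {
        int minor = ((data[4] & 0xFF) << 8) | (data[5] & 0xFF);
        int major = ((data[6] & 0xFF) << 8) | (data[7] & 0xFF);
        return new int[] {minor, major};
    }

    // 45 covers JDK 1.0 and 1.1; each later release adds 1.
    static String jdkFor(int major) {
        if (major < 45) {
            throw new IllegalArgumentException("not a class-file major version: " + major);
        }
        if (major == 45) {
            return "1.0/1.1";
        }
        if (major <= 48) {
            return "1." + (major - 44); // 46 -> 1.2, 47 -> 1.3, 48 -> 1.4
        }
        return String.valueOf(major - 44); // 49 -> 5, 52 -> 8
    }

    // Simplified rule: a JVM runs class files whose major version does not exceed its own.
    static boolean canRun(int jvmMajor, int classMajor) {
        return classMajor >= 45 && classMajor <= jvmMajor;
    }
}
```

With the bytes from the HelloWorld file, readVersion returns minor 0 and major 49, and jdkFor(49) gives "5".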

3. Constant pool

Immediately after the major and minor version numbers are the constant pool entries.

Because the number of constants in the constant pool is not fixed, the pool begins with a u2 value, the constant pool count (constant_pool_count). Contrary to the usual Java convention, this count starts at 1 rather than 0: index 0 is deliberately left empty so that other data items can use the index value 0 to mean "no constant pool entry".

As shown in the figure, the constant pool count is 0x0022, or 34 in decimal, which means the constant pool holds 33 constants with indexes 1 to 33. In the Class file structure, only the constant pool count starts at 1; the counts of all other collections, including the interface index collection, the field table collection, and the method table collection, start at 0 as usual.

The constant pool holds two main kinds of constants: literals and symbolic references. Literals are close to the Java language's notion of constants, such as text strings and constant values declared final. Symbolic references come from compiler theory and mainly include the following kinds of constants:

  • Packages exported or opened by modules

  • Fully qualified names of classes and interfaces

  • Field names and descriptors

  • Method names and descriptors

  • Method handles and method types (Method Handle, Method Type, Invoke Dynamic)

  • Dynamically-computed call sites and dynamically-computed constants (Dynamically-Computed Call Site, Dynamically-Computed Constant)

All 17 constant structures have only one thing in common: each begins with a u1 tag that indicates which constant type the entry belongs to.

The specific meanings of the 17 constant types are shown in the table:

Constant type | Tag | Description
CONSTANT_Utf8_info | 1 | UTF-8 encoded string
CONSTANT_Integer_info | 3 | Integer literal
CONSTANT_Float_info | 4 | Floating-point literal
CONSTANT_Long_info | 5 | Long literal
CONSTANT_Double_info | 6 | Double-precision floating-point literal
CONSTANT_Class_info | 7 | Symbolic reference to a class or interface
CONSTANT_String_info | 8 | String literal
CONSTANT_Fieldref_info | 9 | Symbolic reference to a field
CONSTANT_Methodref_info | 10 | Symbolic reference to a method of a class
CONSTANT_InterfaceMethodref_info | 11 | Symbolic reference to a method of an interface
CONSTANT_NameAndType_info | 12 | Partial symbolic reference to a field or method
CONSTANT_MethodHandle_info | 15 | Method handle
CONSTANT_MethodType_info | 16 | Method type
CONSTANT_Dynamic_info | 17 | Dynamically-computed constant
CONSTANT_InvokeDynamic_info | 18 | Dynamic method call site
CONSTANT_Module_info | 19 | Module
CONSTANT_Package_info | 20 | Package exported or opened by a module

The constant pool is tedious: each of the 17 constant types has a completely independent data structure, with little in common or connecting them.
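Because each constant type has its own layout, a reader must look at the tag first to know how many bytes follow. The sketch below (a hypothetical ConstantPoolWalker class) walks the pool using the entry sizes from the JVM specification. It also handles one detail not covered above: CONSTANT_Long_info and CONSTANT_Double_info each occupy two constant pool indexes, so the index right after one of them is unusable.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: walks a constant pool and records each entry's tag.
final class ConstantPoolWalker {

    // Expects buf positioned at constant_pool_count. Returns tags indexed by constant
    // pool index; slot 0 and the slot after a Long/Double stay 0 (no usable entry).
    static int[] readTags(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;          // constant_pool_count
        int[] tags = new int[count];
        for (int i = 1; i < count; i++) {             // valid indexes are 1 .. count-1
            int tag = buf.get() & 0xFF;               // every entry starts with a u1 tag
            tags[i] = tag;
            switch (tag) {
                case 1: {                             // Utf8: u2 length, then that many bytes
                    int length = buf.getShort() & 0xFFFF;
                    skip(buf, length);
                    break;
                }
                case 3: case 4:                       // Integer, Float: u4
                    skip(buf, 4);
                    break;
                case 5: case 6:                       // Long, Double: 8 bytes, two slots
                    skip(buf, 8);
                    i++;
                    break;
                case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package: u2
                    skip(buf, 2);
                    break;
                case 15:                              // MethodHandle: u1 kind + u2 index
                    skip(buf, 3);
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18: // two u2 indexes
                    skip(buf, 4);
                    break;
                default:
                    throw new IllegalStateException("unknown tag " + tag + " at index " + i);
            }
        }
        return tags;
    }

    private static void skip(ByteBuffer buf, int n) {
        buf.position(buf.position() + n);
    }
}
```

After readTags returns, the buffer sits exactly at access_flags, which is how a parser moves on to the next item.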

Here is the overall structure table for all 17 constant types in the constant pool:

4. Access flags

After the constant pool come two bytes of access_flags, which record access information at the class or interface level: whether the type is public, whether it is abstract, whether a class is declared final, and so on.

The specific flag bits and meanings are shown in the table:

Flag name | Value | Meaning
ACC_PUBLIC | 0x0001 | Whether the type is public
ACC_FINAL | 0x0010 | Whether the class is declared final; only classes can set this flag
ACC_SUPER | 0x0020 | Whether the new semantics of the invokespecial instruction are allowed; classes compiled after JDK 1.0.2 must set it
ACC_INTERFACE | 0x0200 | This is an interface
ACC_ABSTRACT | 0x0400 | Whether the type is abstract: true for interfaces and abstract classes, false for other classes
ACC_SYNTHETIC | 0x1000 | The class was not generated from user code
ACC_ANNOTATION | 0x2000 | This is an annotation type
ACC_ENUM | 0x4000 | This is an enum
ACC_MODULE | 0x8000 | This is a module

There are 16 flag bits available for access_flags. Currently, only 9 of them are defined. The unused flag bits must always be zero.
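As a sketch, the class-level flags can be decoded with simple bit masks. The hypothetical ClassAccessFlags helper below uses the flag values listed above, including ACC_MODULE (0x8000), the flag that marks a module-info class file. A public class compiled by javac typically carries 0x0021, that is ACC_PUBLIC | ACC_SUPER.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical decoder for class-level access_flags.
final class ClassAccessFlags {
    // Insertion order keeps the output in ascending bit order.
    private static final Map<Integer, String> FLAGS = new LinkedHashMap<>();

    static {
        FLAGS.put(0x0001, "ACC_PUBLIC");
        FLAGS.put(0x0010, "ACC_FINAL");
        FLAGS.put(0x0020, "ACC_SUPER");
        FLAGS.put(0x0200, "ACC_INTERFACE");
        FLAGS.put(0x0400, "ACC_ABSTRACT");
        FLAGS.put(0x1000, "ACC_SYNTHETIC");
        FLAGS.put(0x2000, "ACC_ANNOTATION");
        FLAGS.put(0x4000, "ACC_ENUM");
        FLAGS.put(0x8000, "ACC_MODULE");
    }

    // Names of all flags whose bit is set in accessFlags.
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Integer, String> e : FLAGS.entrySet()) {
            if ((accessFlags & e.getKey()) != 0) {
                names.add(e.getValue());
            }
        }
        return names;
    }

    // Unused flag bits must be zero; this reports whether that holds.
    static boolean hasOnlyDefinedBits(int accessFlags) {
        int defined = 0;
        for (int bit : FLAGS.keySet()) {
            defined |= bit;
        }
        return (accessFlags & ~defined & 0xFFFF) == 0;
    }
}
```

For example, decode(0x0601) yields ACC_PUBLIC, ACC_INTERFACE and ACC_ABSTRACT, the usual combination for a public interface.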

5. Class index, superclass index, and interface index collection

Why discuss these three together? Because together they determine the inheritance relationships of a class.

The class index determines the fully qualified name of the class, and the superclass index determines the fully qualified name of its parent class. Both are u2 indexes that point to CONSTANT_Class_info entries in the constant pool. Because the Java language does not allow multiple inheritance, there is only one superclass index. Every Java class except java.lang.Object has a parent class, so every class except java.lang.Object has a nonzero superclass index; for java.lang.Object it is 0.

The interface index collection describes which interfaces the class implements. The interfaces appear in the collection in the order they are listed after the implements keyword (or after the extends keyword, if the Class file describes an interface), from left to right.
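Putting the constant pool and these three items together, the sketch below resolves this_class, super_class and the interfaces to names. It is a minimal illustration with a hypothetical ClassHeader class: each index points to a CONSTANT_Class_info, whose name_index points to a CONSTANT_Utf8_info holding the internal name (with slashes, such as java/lang/Object). For simplicity it decodes Utf8 entries as standard UTF-8, whereas class files actually use a modified UTF-8; the two agree for plain ASCII names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reads this_class, super_class and interfaces[] as class names.
final class ClassHeader {
    String thisClass;
    String superClass;                       // null when super_class is 0 (java.lang.Object)
    final List<String> interfaces = new ArrayList<>();

    static ClassHeader parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        buf.position(8);                     // skip magic, minor_version, major_version
        int count = buf.getShort() & 0xFFFF;
        String[] utf8 = new String[count];   // Utf8 text by constant pool index
        int[] classNameIndex = new int[count]; // Class_info name_index by constant pool index
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;
            switch (tag) {
                case 1: {
                    byte[] b = new byte[buf.getShort() & 0xFFFF];
                    buf.get(b);
                    utf8[i] = new String(b, StandardCharsets.UTF_8);
                    break;
                }
                case 7: classNameIndex[i] = buf.getShort() & 0xFFFF; break;
                case 8: case 16: case 19: case 20: skip(buf, 2); break;
                case 15: skip(buf, 3); break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18: skip(buf, 4); break;
                case 5: case 6: skip(buf, 8); i++; break; // Long/Double take two slots
                default: throw new IllegalStateException("unknown tag " + tag);
            }
        }
        buf.getShort();                      // access_flags, not needed here
        ClassHeader h = new ClassHeader();
        h.thisClass = name(buf.getShort() & 0xFFFF, utf8, classNameIndex);
        int superIndex = buf.getShort() & 0xFFFF;
        h.superClass = superIndex == 0 ? null : name(superIndex, utf8, classNameIndex);
        int interfacesCount = buf.getShort() & 0xFFFF;
        for (int i = 0; i < interfacesCount; i++) {
            h.interfaces.add(name(buf.getShort() & 0xFFFF, utf8, classNameIndex));
        }
        return h;
    }

    // Class_info index -> name_index -> Utf8 text
    private static String name(int classIndex, String[] utf8, int[] classNameIndex) {
        return utf8[classNameIndex[classIndex]];
    }

    private static void skip(ByteBuffer buf, int n) {
        buf.position(buf.position() + n);
    }
}
```

The two-step lookup (Class_info, then Utf8) is why the class, superclass and interface indexes never hold names directly.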

6. Field table collection

The interface index collection is followed by the field table collection (field_info), which describes the variables declared in an interface or class. Fields here include class-level and instance-level variables, but not local variables declared inside a method.

The main information includes:

① Scope: public, protected, or private

② Whether it is a class-level (static) or instance-level variable

③ Mutability: whether it is final

④ Concurrent visibility: whether it is volatile

⑤ Serialization: whether it is transient

⑥ Field data type (the 8 primitive types, or reference types such as objects and arrays)

⑦ Field name

The structure of the field table is as follows:

Type | Name | Count
u2 | access_flags | 1
u2 | name_index | 1
u2 | descriptor_index | 1
u2 | attributes_count | 1
attribute_info | attributes | attributes_count

access_flags is the field's access flag. Like the class-level access flags, it describes the field's access level (private, protected, public), concurrent visibility (volatile), mutability (final), and so on. name_index and descriptor_index are constant pool indexes referring to the field's simple name and its descriptor.

The details of the access flag are as follows:

Because of Java's syntax rules, at most one of ACC_PUBLIC, ACC_PRIVATE, and ACC_PROTECTED may be set, and ACC_FINAL and ACC_VOLATILE cannot both be set. Fields in an interface must have the ACC_PUBLIC, ACC_STATIC, and ACC_FINAL flags.
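These rules translate directly into bit checks. The following sketch uses the field access flag values from the JVM specification (ACC_PUBLIC 0x0001 through ACC_ENUM 0x4000); the FieldFlags class and its isLegal method are hypothetical names, and only the constraints stated in the paragraph above are checked.

```java
// Hypothetical sketch of the field access_flags constraints.
final class FieldFlags {
    // Field access flag values from the JVM specification.
    static final int ACC_PUBLIC = 0x0001, ACC_PRIVATE = 0x0002, ACC_PROTECTED = 0x0004,
            ACC_STATIC = 0x0008, ACC_FINAL = 0x0010, ACC_VOLATILE = 0x0040,
            ACC_TRANSIENT = 0x0080, ACC_SYNTHETIC = 0x1000, ACC_ENUM = 0x4000;

    static boolean isLegal(int flags, boolean inInterface) {
        // At most one of public / private / protected
        int visibility = Integer.bitCount(flags & (ACC_PUBLIC | ACC_PRIVATE | ACC_PROTECTED));
        if (visibility > 1) {
            return false;
        }
        // final and volatile are mutually exclusive
        if ((flags & ACC_FINAL) != 0 && (flags & ACC_VOLATILE) != 0) {
            return false;
        }
        // Interface fields must be public static final
        if (inInterface) {
            int required = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;
            return (flags & required) == required;
        }
        return true;
    }
}
```

A constant declared in an interface, such as `int MAX = 10;`, is compiled with 0x0019 (public static final) and passes the interface check.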

7. Method table collection

The method table has the same structure as the field table, consisting of access_flags, name_index, descriptor_index, attributes_count, and attributes, as shown in the table:

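The only difference lies in access_flags. Because the volatile and transient keywords cannot modify methods, ACC_VOLATILE and ACC_TRANSIENT are absent; because synchronized, native, strictfp, and abstract can modify methods, ACC_SYNCHRONIZED, ACC_NATIVE, ACC_STRICT, and ACC_ABSTRACT are added.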

Method table flag bits and their values are as follows:
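As a sketch, here are the method flag values from the JVM specification with a hypothetical MethodFlags decoder. Note that 0x0040 and 0x0080 are reused: in a field table they mean ACC_VOLATILE and ACC_TRANSIENT, while in a method table they mean ACC_BRIDGE and ACC_VARARGS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for method access_flags (values from the JVM specification).
final class MethodFlags {
    private static final int[] BITS = {
        0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0020,
        0x0040, 0x0080, 0x0100, 0x0400, 0x0800, 0x1000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_PRIVATE", "ACC_PROTECTED", "ACC_STATIC", "ACC_FINAL", "ACC_SYNCHRONIZED",
        "ACC_BRIDGE", "ACC_VARARGS", "ACC_NATIVE", "ACC_ABSTRACT", "ACC_STRICT", "ACC_SYNTHETIC"
    };

    // Names of all method flags whose bit is set, in ascending bit order.
    static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < BITS.length; i++) {
            if ((flags & BITS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }
}
```

HelloWorld's main method is declared public static, so its flags decode to ACC_PUBLIC and ACC_STATIC (0x0009).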

8. Attribute table collection

Finally, we come to the last item: the attribute table collection.

The Class file, field tables, and method tables described above can each carry their own set of attribute tables, which are described here.

The attributes in the attribute table collection are as follows:

Unlike the other data items in a Class file, which have a strictly required order, length, and content, the attribute table collection is less restrictive. Attributes no longer need to appear in a strict order, and as long as a new attribute's name does not clash with an existing one, any compiler implementation may write its own attribute information into the attribute table. At run time, the Java virtual machine ignores attributes it does not recognize.
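Every attribute_info begins with a u2 attribute_name_index and a u4 attribute_length, followed by attribute_length bytes of info. That is what makes it safe to ignore unknown attributes: a reader can skip exactly attribute_length bytes. The sketch below illustrates this with a hypothetical AttributeTable helper; its RECOGNIZED set is an arbitrary example, not the JVM's actual list of predefined attributes.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: reads an attribute table and skips attributes it does not know.
final class AttributeTable {
    // Illustrative set of attribute names this reader "understands".
    static final Set<String> RECOGNIZED = new HashSet<>(Arrays.asList("Code", "SourceFile", "ConstantValue"));

    // Expects buf at attributes_count; utf8 maps constant pool indexes to Utf8 text.
    static List<String> readRecognized(ByteBuffer buf, String[] utf8) {
        int count = buf.getShort() & 0xFFFF;              // attributes_count
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String name = utf8[buf.getShort() & 0xFFFF];  // attribute_name_index
            long length = buf.getInt() & 0xFFFFFFFFL;     // attribute_length (u4)
            if (RECOGNIZED.contains(name)) {
                seen.add(name);                           // a real reader would decode info here
            }
            // Skip info[attribute_length] whether or not the attribute is recognized
            buf.position(buf.position() + Math.toIntExact(length));
        }
        return seen;
    }
}
```

An attribute named, say, MyVendorAttr (a made-up name) would simply be skipped, while the reader stays correctly positioned for whatever follows.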




