The author | TongHui
Java development engineer at Almond, focused on low-level technology.
Introduction: We all know that to run a Java file containing a main method, we compile it into a class file, load it into the JVM, and run it. But some questions remain: what exactly is in the compiled class file, and how does the JVM execute it? Let’s walk through a very simple example of how the JVM works.
1. Preparation
The following are the Java file and the class file used as the example:
Java file:
Class file:
2. Structure of the class file
As you can see from the above, a class file really is just a sequence of bytes, hence its other name, bytecode file. Because the unit of a class file is the byte, any value larger than one byte is stored as an unsigned quantity in big-endian order, that is, most significant byte first (little-endian order is the reverse). So what do these bytes mean? How does the JVM parse them? Oracle defines the structure of a class file:
As you can see from the above, each byte in a class file has a specific meaning. For example, the first four bytes are the magic number (CAFEBABE), which is the same for all class files. The next two bytes are the minor version number, followed by two bytes for the major version number. Another example is cp_info, a very important field: it is the constant pool, which I’ll focus on later.
If you need to see what each field represents, refer to The Java Virtual Machine Specification: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.1
The above structure may seem abstract, so take a look at the following diagram:
As you can imagine, all the bytes in a class file represent fixed information, so the JVM knows exactly what the class file contains, such as method information, field information, etc., based on the format of the class file.
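To make the fixed layout concrete, here is a minimal sketch that reads the first fields of a class file. The class name ClassHeaderReader and the sample bytes (a Java 8 header with an assumed constant count of 29) are illustrative and not taken from the example above.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: reads the fixed-size fields at the start of a class file.
class ClassHeaderReader {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    ClassHeaderReader(byte[] classBytes) {
        // ByteBuffer defaults to big-endian byte order, which matches the class file format.
        ByteBuffer in = ByteBuffer.wrap(classBytes);
        magic = in.getInt();                          // u4 magic
        minorVersion = in.getShort() & 0xFFFF;        // u2 minor_version
        majorVersion = in.getShort() & 0xFFFF;        // u2 major_version
        constantPoolCount = in.getShort() & 0xFFFF;   // u2 constant_pool_count
    }

    public static void main(String[] args) {
        // Assumed sample: magic, minor 0, major 52 (Java 8), constant_pool_count 29.
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x00, 0x00, 0x34, 0x00, 0x1D
        };
        ClassHeaderReader h = new ClassHeaderReader(header);
        System.out.println(Integer.toHexString(h.magic) + " major=" + h.majorVersion
                + " constants=" + h.constantPoolCount);
    }
}
```

The mask with 0xFFFF is needed because Java has no unsigned short type, while every u2 in the class file is unsigned.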
3. Important components of the class file
Now that we know the structure of a class file, let’s look at some of the important components of a class file.
3.1 Constant pool
The constant pool is the cp_info field in ClassFile you saw earlier. Let’s take an intuitive look at what a constant pool looks like:
This is the constant pool of HelloWorld.class. The first two bytes of the constant pool give the constant count; because the count occupies only two bytes, the number of entries is limited (you can work out the maximum yourself). The count is followed by the entries themselves. Each entry starts with a 1-byte tag that indicates what kind of constant it is. For example, a tag of 0A means the entry is a MethodRef constant.
In technical terms, the constant pool holds literals and symbolic references. Literals are things like text strings or constant values declared final. Symbolic references cover three kinds of constants: fully qualified names of classes and interfaces, field names and descriptors, and method names and descriptors.
Note that constant pool indices start at 1, so that other structures can use an index of 0 to indicate that no constant entry is referenced. Accordingly, the count stored in the first two bytes is one greater than the largest valid index.
As you can summarize from the previous description, all constant pool entries have the following common format:
cp_info {
    u1 tag;
    u1 info[];
}
In the constant pool, every cp_info entry (that is, every constant entry) has the same basic format: it starts with a single-byte tag that identifies the type of the entry. The contents of the info[] array that follows are determined by the tag.
The types of tags are as follows:
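As a rough sketch, the tag values listed in section 4.4 of the Java SE 8 JVM specification can be written as a Java enum. The enum name ConstantTag and the method fromByte are made up for illustration.

```java
// Constant pool tag values from the Java SE 8 JVM specification, section 4.4.
// The enum itself is an illustrative sketch.
enum ConstantTag {
    UTF8(1), INTEGER(3), FLOAT(4), LONG(5), DOUBLE(6), CLASS(7), STRING(8),
    FIELD_REF(9), METHOD_REF(10), INTERFACE_METHOD_REF(11), NAME_AND_TYPE(12),
    METHOD_HANDLE(15), METHOD_TYPE(16), INVOKE_DYNAMIC(18);

    final int value;

    ConstantTag(int value) {
        this.value = value;
    }

    // Maps the tag byte read from the class file to its constant kind.
    static ConstantTag fromByte(int tag) {
        for (ConstantTag t : values()) {
            if (t.value == tag) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown constant pool tag: " + tag);
    }

    public static void main(String[] args) {
        System.out.println(fromByte(0x0A)); // the MethodRef tag mentioned earlier
    }
}
```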
Some common constants:
Class Info:
CONSTANT_Class_info {
    u1 tag;
    u2 name_index;
}
- The value of tag is 7
- name_index is the index of a constant pool entry; that entry must be a UTF8 Info holding the class name
UTF8 Info:
CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}
- The value of tag is 1
- length is the number of bytes in the encoded string
- bytes[length] holds the bytes of the string itself, in modified UTF-8 encoding
Note: Method names and field names in a class file refer to UTF8 Info entries, and the length field of a UTF8 Info is 2 bytes, so a method or field name can be at most 65535 bytes long.
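A UTF8 Info entry can be decoded with the standard library, since DataInputStream.readUTF expects exactly this layout: a 2-byte length followed by that many bytes of modified UTF-8. The following is a minimal sketch; the class name Utf8EntryReader and the sample bytes are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: decodes a single UTF8 Info entry from raw bytes.
class Utf8EntryReader {
    static String read(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();          // u1 tag, must be 1
            if (tag != 1) {
                throw new IllegalArgumentException("Not a UTF8 Info entry, tag=" + tag);
            }
            // readUTF consumes the u2 length and then the modified UTF-8 bytes.
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // tag = 1, length = 4, bytes = "main"
        byte[] entry = {0x01, 0x00, 0x04, 'm', 'a', 'i', 'n'};
        System.out.println(read(entry));
    }
}
```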
String Info:
CONSTANT_String_info {
    u1 tag;
    u2 string_index;
}
- The value of tag is 8
- string_index is the index of a constant pool entry; that entry must be a UTF8 Info holding the contents of the string
Field_Ref Info:
CONSTANT_Fieldref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
- The value of tag is 9
- class_index is the index of a constant pool entry, and that entry must be a Class Info
- name_and_type_index is the index of a constant pool entry, and that entry must be a NameAndType Info
Method_Ref Info:
CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
- The value of tag is 10
- class_index is the index of a constant pool entry, and that entry must be a Class Info
- name_and_type_index is the index of a constant pool entry, and that entry must be a NameAndType Info
NameAndType Info:
CONSTANT_NameAndType_info {
    u1 tag;
    u2 name_index;
    u2 descriptor_index;
}
- The value of tag is 12
- name_index is the index of a constant pool entry; that entry must be a UTF8 Info holding the field or method name
- descriptor_index is the index of a constant pool entry; that entry must be a UTF8 Info holding the field or method descriptor
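The entries above refer to each other by index, so resolving a MethodRef means following a chain: MethodRef to Class Info to UTF8 Info for the class name, and MethodRef to NameAndType Info to UTF8 Info for the name and descriptor. Below is a minimal sketch of that lookup. It only handles the six entry kinds described here, and the class ConstantPoolSketch and its sample pool are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: parses a small constant pool and resolves a MethodRef.
class ConstantPoolSketch {
    // entries[i] is a String (UTF8 Info) or an int[] of the indices stored in the entry.
    // Slot 0 stays unused because constant pool indices start at 1.
    final Object[] entries;

    ConstantPoolSketch(ByteBuffer in) {
        int count = in.getShort() & 0xFFFF;            // number of entries plus one
        entries = new Object[count];
        for (int i = 1; i < count; i++) {
            int tag = in.get() & 0xFF;
            switch (tag) {
                case 1: {                              // UTF8 Info
                    byte[] bytes = new byte[in.getShort() & 0xFFFF];
                    in.get(bytes);
                    // Simplification: plain UTF-8 decoding, fine for ASCII names.
                    entries[i] = new String(bytes, StandardCharsets.UTF_8);
                    break;
                }
                case 7: case 8:                        // Class Info, String Info: one index
                    entries[i] = new int[] {in.getShort() & 0xFFFF};
                    break;
                case 9: case 10: case 12:              // Field_Ref, Method_Ref, NameAndType
                    entries[i] = new int[] {in.getShort() & 0xFFFF, in.getShort() & 0xFFFF};
                    break;
                default:
                    throw new IllegalArgumentException("Tag not handled in this sketch: " + tag);
            }
        }
    }

    String utf8(int index) {
        return (String) entries[index];
    }

    // Follows MethodRef -> Class -> UTF8 and MethodRef -> NameAndType -> UTF8.
    String resolveMethodRef(int index) {
        int[] ref = (int[]) entries[index];
        int[] cls = (int[]) entries[ref[0]];
        int[] nameAndType = (int[]) entries[ref[1]];
        return utf8(cls[0]) + "." + utf8(nameAndType[0]) + ":" + utf8(nameAndType[1]);
    }

    // Builds an assumed six-entry pool describing java/lang/Object.<init>:()V.
    static ByteBuffer sample() {
        ByteBuffer b = ByteBuffer.allocate(64);
        b.putShort((short) 7);                                     // 6 entries + 1
        b.put((byte) 10).putShort((short) 2).putShort((short) 3);  // #1 MethodRef
        b.put((byte) 7).putShort((short) 4);                       // #2 Class
        b.put((byte) 12).putShort((short) 5).putShort((short) 6);  // #3 NameAndType
        for (String s : new String[] {"java/lang/Object", "<init>", "()V"}) { // #4 to #6
            byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
            b.put((byte) 1).putShort((short) bytes.length).put(bytes);
        }
        b.flip();
        return b;
    }

    public static void main(String[] args) {
        System.out.println(new ConstantPoolSketch(sample()).resolveMethodRef(1));
    }
}
```

A real parser would also need the remaining tags, and would have to account for long and double constants occupying two index slots.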
3.2 Fields
As with the constant pool, the number of fields in a class is not fixed, so the first two bytes of the fields section give the number of fields in the class file, followed by the fields themselves.
Let’s take a look at the structure of a field:
field_info {
    u2 access_flag;
    u2 name_index;
    u2 descriptor_index;
    u2 attribute_count;
    attribute_info attributes[attribute_count];
}
- access_flag holds the access modifiers of the field. It is encoded like the access flags of a class, but the set of allowed values is different.
The access flags of a field:
- name_index is the index of the constant pool entry that holds the field name
- descriptor_index is the index of the constant pool entry that holds the field descriptor
- attribute_count is the number of attributes the field has
- attributes[attribute_count] holds the attributes themselves
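Since access_flag is a bit mask, decoding it is a matter of testing each bit. The sketch below uses the field flag values from the Java SE 8 JVM specification; the class name FieldAccessFlags is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decodes the access_flag bit mask of a field.
class FieldAccessFlags {
    // Flag values for fields as defined in the Java SE 8 JVM specification.
    private static final int[] MASKS = {
        0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0040, 0x0080, 0x1000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_PRIVATE", "ACC_PROTECTED", "ACC_STATIC", "ACC_FINAL",
        "ACC_VOLATILE", "ACC_TRANSIENT", "ACC_SYNTHETIC", "ACC_ENUM"
    };

    static List<String> decode(int accessFlag) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlag & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 0x001A = ACC_PRIVATE | ACC_STATIC | ACC_FINAL
        System.out.println(decode(0x001A));
    }
}
```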
Note: The descriptor encodes the type of the field, but not as the full type name you write in source code, such as int or String. It uses a compact notation instead, like this:
So, for example, if the field is a String, its descriptor is Ljava/lang/String; and if the field is an int[][], its descriptor is [[I
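The descriptor notation is easy to decode mechanically: each leading [ adds one array dimension, a single letter stands for a primitive type, and L...; wraps a class name. Here is a minimal sketch, with FieldDescriptors as a hypothetical class name.

```java
// Hypothetical helper: turns a field descriptor back into a source-style type name.
class FieldDescriptors {
    static String toTypeName(String descriptor) {
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {
            dims++;                                   // each '[' is one array dimension
        }
        String base;
        switch (descriptor.charAt(dims)) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'L':                                 // L<class name with slashes>;
                base = descriptor.substring(dims + 1, descriptor.length() - 1).replace('/', '.');
                break;
            default:
                throw new IllegalArgumentException("Bad field descriptor: " + descriptor);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTypeName("Ljava/lang/String;"));
        System.out.println(toTypeName("[[I"));
    }
}
```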
The attributes of fields work the same way as those of methods, which are described below.
3.3 Methods
Like the fields section, the methods section starts with a count of the methods, followed by the methods themselves.
Again, look at the structure of a method:
method_info {
    u2 access_flag;
    u2 name_index;
    u2 descriptor_index;
    u2 attribute_count;
    attribute_info attributes[attribute_count];
}
- access_flag has the same meaning as for a field, but the allowed values are different. The possible values of access_flag for a method are as follows:
- name_index has the same meaning as for a field: it points to the name of the method
- descriptor_index also has the same meaning as for a field, except that a method descriptor has a different form, so let’s see what it looks like:
A method descriptor is made up of two parts: the parameter descriptors and the return descriptor. So method descriptors take the following form:
( ParameterDescriptor* ) ReturnDescriptor
A parameter descriptor is simply a field descriptor. A return descriptor is also a field descriptor, with one additional possibility, void, whose descriptor is:
VoidDescriptor: V
So, for example, if the signature of a method is
Object m(int i, double d, Thread t) {.. }
then its descriptor is
(IDLjava/lang/Thread;)Ljava/lang/Object;
- attribute_count is the number of attributes, as for a field
- attributes[attribute_count] holds the attributes themselves, as for a field; their number is given by attribute_count
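To see how a method descriptor breaks down, here is a minimal sketch that splits one into its parameter descriptors and its return descriptor. The class name MethodDescriptors is made up, and the sketch assumes a well-formed descriptor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a method descriptor into its parts.
class MethodDescriptors {
    // Returns the parameter descriptors in order, with the return descriptor as the last element.
    static List<String> split(String descriptor) {
        if (descriptor.charAt(0) != '(') {
            throw new IllegalArgumentException("Bad method descriptor: " + descriptor);
        }
        List<String> parts = new ArrayList<>();
        int i = 1;                                        // skip '('
        while (descriptor.charAt(i) != ')') {
            int start = i;
            while (descriptor.charAt(i) == '[') {
                i++;                                      // array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i);           // an object type ends at ';'
            }
            i++;                                          // consume the base type char or ';'
            parts.add(descriptor.substring(start, i));
        }
        parts.add(descriptor.substring(i + 1));           // everything after ')' is the return type
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("(IDLjava/lang/Thread;)Ljava/lang/Object;"));
    }
}
```

For the descriptor of the method m above, the result contains I, D and Ljava/lang/Thread; as parameters, followed by Ljava/lang/Object; as the return descriptor.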
3.4 Attributes
3.4.1 Attribute structure
The attribute data structure can appear in the class file itself, in the field table, and in the method table. Some attributes are specific to one of these places, and some are shared by all three.
Attributes are described as follows:
Instead of explaining every attribute in detail, let’s look at the most important attribute in the method table, the Code attribute. It is important because the code of our methods lives in it (it stores the actual instructions). For the other attributes, refer to the description in Oracle’s JVM specification: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7-300
3.4.2 Code Attribute
First, look at the structure of the Code attribute:
Code_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 max_stack;
    u2 max_locals;
    u4 code_length;
    u1 code[code_length];
    u2 exception_table_length;
    {   u2 start_pc;
        u2 end_pc;
        u2 handler_pc;
        u2 catch_type;
    } exception_table[exception_table_length];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
As you can see, the Code attribute is quite complex. Let’s briefly explain the meaning of each member:
- attribute_name_index is the index of a constant pool entry, which must be a UTF8 Info with the value “Code”
- attribute_length is the length of the attribute in bytes, not counting the first six bytes (attribute_name_index and attribute_length themselves)
- max_stack is the maximum depth of the operand stack in the stack frame created at runtime for the method that owns this Code attribute
- max_locals is the size of the local variable table
- code_length is the length in bytes of the compiled code of the method that owns this Code attribute
- code[code_length] holds the code itself. The length of a Java method is therefore limited: although code_length is a 4-byte field, the JVM specification requires its value to be less than 65536. So the code of a method should not be too long, or it will not compile
- exception_table_length is the number of entries in the exception table, that is, the number of exception handlers in the method (not the exceptions it declares with throws)
- exception_table[exception_table_length] holds the entries themselves: each one covers the code range from start_pc (inclusive) to end_pc (exclusive), gives the handler address handler_pc, and identifies the caught exception class through catch_type
- attributes_count is the number of sub-attributes of the Code attribute; the structure is complicated because attributes can be nested
- attributes[attributes_count] holds the sub-attributes themselves
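Reading these members in order can be sketched as follows. The class CodeAttributeSketch is hypothetical, it expects a buffer positioned just after attribute_name_index and attribute_length, and the sample bytes are an assumed method body of the kind a compiler generates for a default constructor.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: reads the body of a Code attribute, stopping before the sub-attributes.
class CodeAttributeSketch {
    final int maxStack;
    final int maxLocals;
    final byte[] code;
    final int[][] exceptionTable; // rows of {start_pc, end_pc, handler_pc, catch_type}

    CodeAttributeSketch(ByteBuffer in) {
        maxStack = in.getShort() & 0xFFFF;
        maxLocals = in.getShort() & 0xFFFF;
        code = new byte[in.getInt()];                 // code_length, less than 65536 in valid files
        in.get(code);
        exceptionTable = new int[in.getShort() & 0xFFFF][4];
        for (int[] row : exceptionTable) {
            for (int j = 0; j < 4; j++) {
                row[j] = in.getShort() & 0xFFFF;
            }
        }
        // attributes_count and the sub-attributes would follow; this sketch does not read them.
    }

    public static void main(String[] args) {
        byte[] body = {
            0x00, 0x01,                                   // max_stack = 1
            0x00, 0x01,                                   // max_locals = 1
            0x00, 0x00, 0x00, 0x05,                       // code_length = 5
            0x2A, (byte) 0xB7, 0x00, 0x01, (byte) 0xB1,   // aload_0; invokespecial #1; return
            0x00, 0x00,                                   // exception_table_length = 0
            0x00, 0x00                                    // attributes_count = 0
        };
        CodeAttributeSketch attr = new CodeAttributeSketch(ByteBuffer.wrap(body));
        System.out.println("max_stack=" + attr.maxStack + " code bytes=" + attr.code.length);
    }
}
```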
The Code attribute of HelloWorld.class:
3.4.3 Two sub-attributes of the Code attribute
Here are two sub-attributes of the Code attribute that deserve a closer look. Have you ever wondered why, when a program fails in an IDE, the IDE can point to exactly the line of code that went wrong? Or why, when we use a method in the IDE, we can see its parameter names, and read variable values by name while debugging? The key is these two sub-attributes of the Code attribute.
LineNumberTable
The structure of LineNumberTable
LineNumberTable_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 line_number_table_length;
    {   u2 start_pc;
        u2 line_number;
    } line_number_table[line_number_table_length];
}
start_pc is an index into the code[] array of the Code attribute, and line_number is the corresponding line number in the source file.
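Mapping an instruction back to a source line then comes down to finding the entry with the largest start_pc that does not exceed the instruction's index. A minimal sketch follows; the class name LineNumberLookup and the table values are assumptions.

```java
// Hypothetical helper: finds the source line for a given index into code[].
class LineNumberLookup {
    // Each row of table is {start_pc, line_number}, as in line_number_table.
    static int lineFor(int[][] table, int pc) {
        int line = -1;                                // -1 means no entry covers this pc
        int bestStart = -1;
        for (int[] row : table) {
            // Keep the entry with the largest start_pc that is still <= pc.
            if (row[0] <= pc && row[0] > bestStart) {
                bestStart = row[0];
                line = row[1];
            }
        }
        return line;
    }

    public static void main(String[] args) {
        int[][] table = {{0, 3}, {8, 4}};             // assumed sample entries
        System.out.println(lineFor(table, 5));
        System.out.println(lineFor(table, 8));
    }
}
```

With the assumed table, index 5 falls in the range that starts at 0 and maps to line 3, while index 8 maps to line 4.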
LocalVariableTable
The structure of LocalVariableTable
LocalVariableTable_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 local_variable_table_length;
    {   u2 start_pc;
        u2 length;
        u2 name_index;
        u2 descriptor_index;
        u2 index;
    } local_variable_table[local_variable_table_length];
}
The members of each entry in local_variable_table[local_variable_table_length] are:
- start_pc and length give the range of code[] indices in which the local variable is in scope ([start_pc, start_pc + length))
- name_index is the index of the variable name in the constant pool
- descriptor_index is the index of the variable descriptor in the constant pool
- index is the index of this local variable in the local variable table
The LocalVariableTable attribute describes the relationship between the slots of the local variable table in the stack frame and the variables defined in the Java source code. This is how code that references the method can show its parameter names, and how a debugger can look up values by parameter or variable name.
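A debugger-style lookup over this table can be sketched as below: given a slot of the local variable table and the current index into code[], find the name of the variable that occupies the slot. The class LocalVariableLookup is hypothetical, and the names are assumed to be already resolved from the constant pool.

```java
// Hypothetical helper: finds the source name of a local variable slot at a given pc.
class LocalVariableLookup {
    static final class Entry {
        final int startPc;
        final int length;
        final String name;        // resolved from name_index
        final String descriptor;  // resolved from descriptor_index
        final int index;          // slot in the local variable table

        Entry(int startPc, int length, String name, String descriptor, int index) {
            this.startPc = startPc;
            this.length = length;
            this.name = name;
            this.descriptor = descriptor;
            this.index = index;
        }
    }

    // Returns the variable name, or null if no entry covers this slot at this pc.
    static String nameAt(Entry[] table, int slot, int pc) {
        for (Entry e : table) {
            // The variable is in scope for pc in [start_pc, start_pc + length).
            if (e.index == slot && pc >= e.startPc && pc < e.startPc + e.length) {
                return e.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Entry[] table = {new Entry(0, 9, "args", "[Ljava/lang/String;", 0)}; // assumed entry
        System.out.println(nameAt(table, 0, 4));
    }
}
```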
4. Execution engine
After the JVM has parsed the class file, it converts it into a runtime structure, stores it in the method area (implemented in HotSpot as the permanent generation before Java 8), and creates a java.lang.Class object that provides an interface for accessing the class data.
When executing a program, the JVM always starts with the main method: it finds main among the methods of the class, takes the bytecode of the method body from main’s Code attribute, and hands it to the execution engine. So to understand how the JVM executes code, you need to know something about bytecode.
4.1 Runtime stack frame structure
Let’s take a look at the runtime structure of the JVM:
Because the JVM is a stack-based virtual machine, almost every operation goes through the stack. Execution starts with main (a stack frame for main is created first) and proceeds through main’s instructions (in its Code attribute). Calling a method creates a new stack frame and pushes it; when the method finishes, the frame on top of the stack is popped.
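The push and pop of stack frames can be sketched with a few lines of Java. This is only an illustration of the idea, with made-up names (FrameStackSketch, Frame); each frame owns a local variable table and an operand stack sized by max_locals and max_stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a thread's stack of frames.
class FrameStackSketch {
    static final class Frame {
        final String methodName;
        final Object[] locals;        // local variable table, sized by max_locals
        final Object[] operandStack;  // operand stack, sized by max_stack

        Frame(String methodName, int maxLocals, int maxStack) {
            this.methodName = methodName;
            this.locals = new Object[maxLocals];
            this.operandStack = new Object[maxStack];
        }
    }

    private final Deque<Frame> frames = new ArrayDeque<>();

    // Calling a method pushes a new frame.
    void invoke(String methodName, int maxLocals, int maxStack) {
        frames.push(new Frame(methodName, maxLocals, maxStack));
    }

    // Returning from a method pops the frame on top of the stack.
    void returnFromMethod() {
        frames.pop();
    }

    Frame current() {
        return frames.peek();
    }

    int depth() {
        return frames.size();
    }

    public static void main(String[] args) {
        FrameStackSketch thread = new FrameStackSketch();
        thread.invoke("main", 1, 2);
        thread.invoke("println", 2, 2);
        System.out.println(thread.current().methodName);
        thread.returnFromMethod();
        System.out.println(thread.current().methodName);
    }
}
```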
4.2 JVM instructions
No matter what method you write in a Java source file, and however sophisticated the algorithm, the compiler turns it into a sequence of bytes in the class file: the bytes in the code[] array of the Code attribute are the bytecode instructions translated from the method.
The instructions supported by the JVM fall roughly into three categories: those with no operands, those with one operand, and those with two operands. Because an opcode is a single byte, there can be at most 256 instructions.
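The three categories show up clearly in a tiny disassembler: after reading the opcode byte, the decoder has to know how many operand bytes follow. The sketch below covers only six opcodes (values taken from the JVM instruction set), and the class name TinyDisassembler and the sample bytes are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: decodes a handful of JVM instructions from a code[] array.
class TinyDisassembler {
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                // No operands: the instruction is the opcode byte alone.
                case 0x2A: out.add(pc + ": aload_0"); pc += 1; break;
                case 0xB1: out.add(pc + ": return"); pc += 1; break;
                // One operand byte: a constant pool index.
                case 0x12: out.add(pc + ": ldc #" + (code[pc + 1] & 0xFF)); pc += 2; break;
                // Two operand bytes: combined into one constant pool index.
                case 0xB2: out.add(pc + ": getstatic #" + u2(code, pc + 1)); pc += 3; break;
                case 0xB6: out.add(pc + ": invokevirtual #" + u2(code, pc + 1)); pc += 3; break;
                case 0xB7: out.add(pc + ": invokespecial #" + u2(code, pc + 1)); pc += 3; break;
                default:
                    throw new IllegalArgumentException("Opcode not handled in this sketch: " + opcode);
            }
        }
        return out;
    }

    // Reads an unsigned big-endian 16-bit value.
    static int u2(byte[] code, int at) {
        return ((code[at] & 0xFF) << 8) | (code[at + 1] & 0xFF);
    }

    public static void main(String[] args) {
        // Assumed body of a typical main that prints a string.
        byte[] code = {(byte) 0xB2, 0x00, 0x02, 0x12, 0x03, (byte) 0xB6, 0x00, 0x04, (byte) 0xB1};
        for (String line : disassemble(code)) {
            System.out.println(line);
        }
    }
}
```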
The general form of JVM instructions is as follows:
4.3 A closer look at several common instructions
The JVM has too many instructions to cover them all here, so a few were chosen for a closer look.
4.3.1 invokespecial
invokespecial is used to call instance methods. It specifically handles calls to superclass methods, private methods, and instance initialization methods.
indexbyte1 and indexbyte2 are combined into a constant pool index, (indexbyte1 << 8) | indexbyte2, and the entry at that index must be a MethodRef Info. The instruction also creates a new stack frame, pops the arguments of the called method off the current operand stack, and places them in the local variable table of the new frame.
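Both halves of that description can be sketched in Java. The first method computes the constant pool index; note that Java bytes are signed, so each one is masked with 0xFF first. The second moves the arguments from the caller's operand stack into the callee's local variable table. The class InvokeSpecialSketch is hypothetical, and the sketch ignores that long and double values occupy two slots.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of two steps performed by invokespecial.
class InvokeSpecialSketch {
    // Constant pool index encoded by the two operand bytes.
    static int poolIndex(byte indexByte1, byte indexByte2) {
        return ((indexByte1 & 0xFF) << 8) | (indexByte2 & 0xFF);
    }

    // Pops argCount values (the receiver plus the declared parameters) from the caller's
    // operand stack and places them in the local variable table of the new frame.
    // The end of callerStack is the top of the stack.
    static Object[] passArguments(List<Object> callerStack, int argCount, int maxLocals) {
        Object[] calleeLocals = new Object[maxLocals];
        for (int slot = argCount - 1; slot >= 0; slot--) {
            // The last argument is on top of the stack, so slots are filled from the end.
            calleeLocals[slot] = callerStack.remove(callerStack.size() - 1);
        }
        return calleeLocals;
    }

    public static void main(String[] args) {
        List<Object> stack = new ArrayList<>();
        stack.add("receiver");
        stack.add(42);
        Object[] locals = passArguments(stack, 2, 3);
        System.out.println(locals[0] + ", " + locals[1] + ", remaining=" + stack.size());
    }
}
```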
4.3.2 aload_n
aload_n loads a reference from the local variable table onto the operand stack. The value of n determines which slot of the local variable table of the current stack frame is loaded.
4.3.3 astore_n
astore_n pops a reference from the operand stack and stores it in the local variable table; the slot is determined by the value of n.
That’s it for the individual instructions. For the rest, you can consult the JVM instruction set in Oracle’s specification, which describes every instruction in detail.
So the job of the execution engine is to carry out the behavior defined for each instruction.
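At its core, such an execution engine is a loop that reads an opcode and acts on the current frame. Here is a minimal sketch that supports only aload_0 to aload_3, astore_0 to astore_3, and return. The class MiniInterpreter is made up for illustration and is not the implementation of any real JVM.

```java
// Hypothetical sketch: an interpreter loop for a tiny subset of instructions.
class MiniInterpreter {
    // Runs code against one frame. Assumes the code ends with a return instruction
    // and that stack is large enough (max_stack).
    static void run(byte[] code, Object[] locals, Object[] stack) {
        int sp = 0;                                   // next free operand stack slot
        int pc = 0;                                   // index of the next instruction in code[]
        while (true) {
            int opcode = code[pc++] & 0xFF;
            if (opcode >= 0x2A && opcode <= 0x2D) {          // aload_0 .. aload_3
                stack[sp++] = locals[opcode - 0x2A];
            } else if (opcode >= 0x4B && opcode <= 0x4E) {   // astore_0 .. astore_3
                locals[opcode - 0x4B] = stack[--sp];
            } else if (opcode == 0xB1) {                     // return
                return;
            } else {
                throw new UnsupportedOperationException(
                        "Opcode not handled in this sketch: 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        Object[] locals = {"hello", null};
        byte[] code = {0x2A, 0x4C, (byte) 0xB1};      // aload_0; astore_1; return
        run(code, locals, new Object[1]);
        System.out.println(locals[1]);
    }
}
```

The sample program copies the reference in slot 0 to slot 1 by way of the operand stack, which is exactly how a stack-based machine moves values around.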
5. Summary
Because the JVM is such a large subject, only the main path of execution is analyzed here; steps such as class loading, linking (verification, preparation, resolution), and initialization are not covered. This is not to say that they are unimportant; they are things we can pay more attention to when we write code. I have also written a small JVM that can run the example above; it can be found at https://github.com/thlcly/Mini-JVM.
6. References
The Java® Virtual Machine Specification
Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices (2nd edition)
In-depth Java Virtual Machine (2nd edition)