Learning Java Virtual Machines
- Learn Java Virtual Machine part 1: Memory area and garbage collection
Preface
Understanding the JVM is a basic requirement for Java programmers, but how many of us, like me, are so busy fixing bugs and piling up layouts that we neglect the fundamentals and end up with only a fragmentary understanding of the JVM? A systematic study of the JVM may take us further down the road.
Independence
Platform independence
- “Write Once, Run Anywhere”
- This rests on two things: Java virtual machines available for a variety of platforms, and a program storage format that all of those platforms support, namely bytecode (Byte Code)
Language independence
- The Java Virtual Machine is not bound to any programming language, including the Java language. It is only associated with a specific binary file format called a Class file, which contains the Java Virtual Machine instruction set, symbol tables, and other auxiliary information.
- The Java Specification is split into The Java Language Specification and The Java Virtual Machine Specification
- As a general-purpose, machine-independent execution platform, implementors of any other language can use the Java Virtual Machine as their language’s running base and Class files as their product delivery medium.
The structure of the Class file
A Class file is a binary stream organized in units of 8-bit bytes. Each data item is arranged in strict order and packed compactly in the file, without any delimiters in between. As a result, almost everything stored in the Class file is data the program needs in order to run, and there are no gaps. When a data item needs more than 8 bits of space, it is split into several bytes and stored in big-endian order, with the high-order byte first.
“Unsigned numbers” and “tables”
According to the Java Virtual Machine Specification, the Class file format uses a pseudo-structure similar to the C-language structure to store data, with only two data types: “unsigned number” and “table.”
- Unsigned numbers are the basic data types. u1, u2, u4, and u8 represent unsigned numbers of 1 byte, 2 bytes, 4 bytes, and 8 bytes respectively. Unsigned numbers can be used to describe numbers, index references, quantity values, or string values encoded in UTF-8.
- A table is a composite data type consisting of multiple unsigned numbers or other tables as data items. To make them easy to tell apart, all table names customarily end with “_info”. Tables are used to describe hierarchical composite structures of data, and the entire Class file can be viewed as essentially one table.
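To make the unsigned number types concrete, here is a minimal sketch of reading u1, u2, and u4 items from a byte array. The class name `UnsignedItems` and its helper methods are made up for illustration; only `java.nio.ByteBuffer` comes from the JDK. The sample bytes are a magic number followed by two arbitrary values.

```java
import java.nio.ByteBuffer;

// Sketch only: UnsignedItems and its method names are illustrative, not a JDK API.
final class UnsignedItems {
    // ByteBuffer reads multi-byte values in big-endian order by default,
    // which matches the byte order used in Class files.
    static int u1(ByteBuffer in) { return in.get() & 0xFF; }
    static int u2(ByteBuffer in) { return in.getShort() & 0xFFFF; }
    static long u4(ByteBuffer in) { return in.getInt() & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x25, (byte) 0xFF};
        ByteBuffer in = ByteBuffer.wrap(bytes);
        System.out.println(Long.toHexString(u4(in))); // cafebabe
        System.out.println(u2(in));                   // 37
        System.out.println(u1(in));                   // 255
    }
}
```

Java has no unsigned integer types, so each value is widened and masked: a u4 needs a `long` to stay non-negative.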
Use 010 Editor to view the class file structure
View HelloWorld.class in 010 Editor
    public class HelloWorld {
        private static final String HELLO_WORLD = "Hello World!";

        public static void main(String[] args) {
            System.out.println(HELLO_WORLD);
        }
    }
Download address: www.sweetscape.com/download/01… After installation, opening the class file automatically prompts you to install the classadv.bt template.
The magic number
- magic
The first four bytes of each Class file are called the Magic Number, which is used to determine whether the file is a Class file acceptable to the virtual machine. The Magic Number for the Class file is 0xCAFEBABE.
The version number
- minor_version and major_version
The four bytes following the magic number store the version number of the Class file: bytes 5 and 6 are the minor version, and bytes 7 and 8 are the major version. Each JDK release has its own version number. Newer JDK versions are backward compatible with older Class files, but older JDK versions cannot run Class files produced for newer versions: the virtual machine must refuse to execute a Class file whose version exceeds its own, even if the file format has not changed.
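The header check described above can be sketched as follows. `ClassFileHeader` and `acceptedBy` are hypothetical names, and the acceptance rule is deliberately simplified to a comparison of major versions; a real virtual machine applies additional rules to the minor version.

```java
import java.nio.ByteBuffer;

// Sketch only: ClassFileHeader is an illustrative helper, not part of the JDK.
final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    final int minorVersion;
    final int majorVersion;

    private ClassFileHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Reads bytes 1-4 (magic), 5-6 (minor_version) and 7-8 (major_version).
    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes); // big-endian by default
        if (in.remaining() < 8 || in.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a Class file");
        }
        int minor = in.getShort() & 0xFFFF;
        int major = in.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major);
    }

    // Simplified assumption: a VM accepts the file only if the file's
    // major version is not newer than the highest one the VM supports.
    boolean acceptedBy(int vmMaxMajor) {
        return majorVersion <= vmMaxMajor;
    }

    public static void main(String[] args) {
        // First eight bytes of a Class file with minor 0 and major 52 (0x34).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        ClassFileHeader h = parse(header);
        System.out.println(h.majorVersion + "." + h.minorVersion); // 52.0
        System.out.println(h.acceptedBy(52)); // true
        System.out.println(h.acceptedBy(51)); // false: the file is newer than the VM
    }
}
```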
Constant pool
- u2 constant_pool_count
Since the number of constants in the constant pool is not fixed, a u2 item is placed at the entrance of the constant pool to hold the constant pool capacity count (constant_pool_count).
Note that constant pool indices start from 1, not 0, so constant_pool_count is one larger than the number of constants. The HelloWorld Class file has 36 constants, numbered 1 to 36, which means its constant_pool_count is 37. So why does the index start at 1? The purpose is to leave index 0 free: in particular cases, an index of 0 expresses "no constant pool entry is referenced".
- cp_info constant_pool[constant_pool_count-1]
There are two main types of constants in the constant pool: literals and Symbolic References. Each constant in the constant pool is a table.
There are more than a dozen data types in the constant pool, each with its own data structure, but they all share a common first item, the tag. The tag is a u1 flag that identifies which data structure the entry uses.
Common constant pool types:
Type | Tag | Description |
---|---|---|
CONSTANT_Utf8_info | 1 | UTF-8 encoded string |
CONSTANT_Integer_info | 3 | Integer literal |
CONSTANT_Float_info | 4 | Floating-point literal |
CONSTANT_Long_info | 5 | Long integer literal |
CONSTANT_Double_info | 6 | Double-precision floating-point literal |
CONSTANT_Class_info | 7 | Symbolic reference to a class or interface |
CONSTANT_String_info | 8 | String literal |
CONSTANT_Fieldref_info | 9 | Symbolic reference to a field |
CONSTANT_Methodref_info | 10 | Symbolic reference to a method in a class |
CONSTANT_InterfaceMethodref_info | 11 | Symbolic reference to a method in an interface |
CONSTANT_NameAndType_info | 12 | Partial symbolic reference to a field or method |
CONSTANT_MethodHandle_info | 15 | Represents a method handle |
CONSTANT_MethodType_info | 16 | Represents a method type |
CONSTANT_InvokeDynamic_info | 18 | Represents a dynamic method call site |
CONSTANT_Module_info | 19 | Represents a module |
CONSTANT_Package_info | 20 | Represents a package opened or exported by a module |
Let’s parse the first constant in the constant pool
- tag: 10 represents the type CONSTANT_Methodref_info, a symbolic reference to a method in a class
- class_index: a constant pool index; it leads to data item 5 in the 010 Editor view (the class_index of a CONSTANT_Methodref_info always points to a CONSTANT_Class_info)
- name_and_type_index: 23 is also a constant pool index
The entry that class_index points to:
- tag: 7 stands for CONSTANT_Class_info
- name_index: 30 is another constant pool index
- Data item 29 is a CONSTANT_Utf8_info; bytes[16] holds the content of the string, which the 010 Editor parse shows as java/lang/Object, the fully qualified name of the class. (The item numbers in the 010 Editor view appear to be one lower than the constant pool indices, presumably because it lists the entries as an array counted from 0.)
Let's go back to name_and_type_index for the first constant
- tag: 12 stands for CONSTANT_NameAndType_info
- name_index represents the unqualified name of a field or method, where the value is <init>
- descriptor_index represents the field or method descriptor, where the value is ()V
This completes the analysis of the first data item in the constant pool, and each subsequent data item can be analyzed in the same way.
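The manual walk from a CONSTANT_Methodref_info to its class name, method name, and descriptor can also be sketched in code. Everything named here (`ConstantPoolSketch`, `readPool`, `describeMethodref`, `samplePool`) is hypothetical, and the six-entry pool is hand-built for the demo rather than taken from HelloWorld.class. Two simplifications are assumed: strings are decoded as plain UTF-8 (the Class file format actually uses a modified UTF-8, which differs for a few characters), and literal entries are skipped rather than stored.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: a tiny constant pool reader, not a full Class file parser.
final class ConstantPoolSketch {
    // pool[i] is a String for CONSTANT_Utf8_info, an int[] {tag, index1, index2}
    // for reference-style entries, and null for skipped literals. Slot 0 stays unused.
    static Object[] readPool(ByteBuffer in) {
        int count = in.getShort() & 0xFFFF;            // constant_pool_count
        Object[] pool = new Object[count];
        for (int i = 1; i < count; i++) {              // indices start at 1
            int tag = in.get() & 0xFF;
            switch (tag) {
                case 1: {                              // Utf8: u2 length + bytes
                    byte[] utf = new byte[in.getShort() & 0xFFFF];
                    in.get(utf);
                    pool[i] = new String(utf, StandardCharsets.UTF_8);
                    break;
                }
                case 7: case 8: case 16: case 19: case 20:   // one u2 index
                    pool[i] = new int[] {tag, in.getShort() & 0xFFFF, 0};
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18: // two u2 items
                    pool[i] = new int[] {tag, in.getShort() & 0xFFFF, in.getShort() & 0xFFFF};
                    break;
                case 3: case 4:                        // int/float literal: 4 bytes
                    in.getInt();
                    break;
                case 5: case 6:                        // long/double: 8 bytes, two slots
                    in.getLong();
                    i++;
                    break;
                case 15:                               // MethodHandle: u1 + u2
                    in.get();
                    in.getShort();
                    break;
                default:
                    throw new IllegalArgumentException("unknown tag " + tag);
            }
        }
        return pool;
    }

    // Follows class_index and name_and_type_index down to their Utf8 entries.
    static String describeMethodref(Object[] pool, int index) {
        int[] ref = (int[]) pool[index];               // {10, class_index, name_and_type_index}
        int[] cls = (int[]) pool[ref[1]];              // {7, name_index, 0}
        int[] nat = (int[]) pool[ref[2]];              // {12, name_index, descriptor_index}
        return pool[cls[1]] + "." + pool[nat[1]] + ":" + pool[nat[2]];
    }

    // Hand-built pool: #1 Methodref #2.#3, #2 Class #4, #3 NameAndType #5:#6, #4-#6 Utf8.
    static byte[] samplePool() {
        ByteBuffer out = ByteBuffer.allocate(64);
        out.putShort((short) 7);                       // 6 entries, so the count is 7
        out.put((byte) 10).putShort((short) 2).putShort((short) 3);
        out.put((byte) 7).putShort((short) 4);
        out.put((byte) 12).putShort((short) 5).putShort((short) 6);
        for (String s : new String[] {"java/lang/Object", "<init>", "()V"}) {
            byte[] utf = s.getBytes(StandardCharsets.UTF_8);
            out.put((byte) 1).putShort((short) utf.length).put(utf);
        }
        return Arrays.copyOf(out.array(), out.position());
    }

    public static void main(String[] args) {
        Object[] pool = readPool(ByteBuffer.wrap(samplePool()));
        System.out.println(describeMethodref(pool, 1)); // java/lang/Object.<init>:()V
    }
}
```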
The javap tool
Analyzing the rest by hand would be tedious manual work, so let’s get lazy. In the JDK bin directory, Oracle provides a tool for analyzing the bytecode of Class files: javap
    HP-ProDesk-680-G6-PCI-Microtower-PC:~/DEBUG$ javap -verbose HelloWorld.class
    Classfile /home/mi/DEBUG/HelloWorld.class
      Last modified May 12, 2021; size 641 bytes
      MD5 checksum 1910a4531e5743c190636067d43d4bc4
      Compiled from "HelloWorld.java"
    public class com.wang.javavmdemo.HelloWorld
      minor version: 0
      major version: 52
      flags: (0x0021) ACC_PUBLIC, ACC_SUPER
      this_class: #3                          // com/wang/javavmdemo/HelloWorld
      super_class: #6                         // java/lang/Object
      interfaces: 0, fields: 1, methods: 2, attributes: 1
    Constant pool:
       #1 = Methodref          #6.#23         // java/lang/Object."<init>":()V
       #2 = Fieldref           #24.#25        // java/lang/System.out:Ljava/io/PrintStream;
       #3 = Class              #26            // com/wang/javavmdemo/HelloWorld
       #4 = String             #27            // Hello World!
       #5 = Methodref          #28.#29        // java/io/PrintStream.println:(Ljava/lang/String;)V
       #6 = Class              #30            // java/lang/Object
       #7 = Utf8               HELLO_WORLD
       #8 = Utf8               Ljava/lang/String;
       #9 = Utf8               ConstantValue
      #10 = Utf8               <init>
      #11 = Utf8               ()V
      #12 = Utf8               Code
      #13 = Utf8               LineNumberTable
      #14 = Utf8               LocalVariableTable
      #15 = Utf8               this
      #16 = Utf8               Lcom/wang/javavmdemo/HelloWorld;
      #17 = Utf8               main
      #18 = Utf8               ([Ljava/lang/String;)V
      #19 = Utf8               args
      #20 = Utf8               [Ljava/lang/String;
      #21 = Utf8               SourceFile
      #22 = Utf8               HelloWorld.java
      #23 = NameAndType        #10:#11        // "<init>":()V
      #24 = Class              #31            // java/lang/System
      #25 = NameAndType        #32:#33        // out:Ljava/io/PrintStream;
      #26 = Utf8               com/wang/javavmdemo/HelloWorld
      #27 = Utf8               Hello World!
      #28 = Class              #34            // java/io/PrintStream
      #29 = NameAndType        #35:#36        // println:(Ljava/lang/String;)V
      #30 = Utf8               java/lang/Object
      #31 = Utf8               java/lang/System
      #32 = Utf8               out
      #33 = Utf8               Ljava/io/PrintStream;
      #34 = Utf8               java/io/PrintStream
      #35 = Utf8               println
      #36 = Utf8               (Ljava/lang/String;)V
    {
      public com.wang.javavmdemo.HelloWorld();
        descriptor: ()V
        flags: (0x0001) ACC_PUBLIC
        Code:
          stack=1, locals=1, args_size=1
             0: aload_0
             1: invokespecial #1                  // Method java/lang/Object."<init>":()V
             4: return
          LineNumberTable:
            line 3: 0
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0       5     0  this   Lcom/wang/javavmdemo/HelloWorld;

      public static void main(java.lang.String[]);
        descriptor: ([Ljava/lang/String;)V
        flags: (0x0009) ACC_PUBLIC, ACC_STATIC
        Code:
          stack=2, locals=1, args_size=1
             0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
             3: ldc           #4                  // String Hello World!
             5: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
             8: return
          LineNumberTable:
            line 7: 0
            line 8: 8
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0       9     0  args   [Ljava/lang/String;
    }
    SourceFile: "HelloWorld.java"
Preview the rest of the class file:
Access flags
- access_flags
After the constant pool ends, the next two bytes are the access_flags, which identify class-level or interface-level access information, including: whether this Class file describes a class or an interface; whether it is defined as public; whether it is defined as abstract; if it is a class, whether it is declared final; and so on.
Access flags and their meanings:
Flag name | Flag value | Meaning |
---|---|---|
ACC_PUBLIC | 0x0001 | Whether the type is public |
ACC_FINAL | 0x0010 | Whether it is declared final |
ACC_SUPER | 0x0020 | This flag must be set for all classes compiled after JDK 1.0.2 |
ACC_INTERFACE | 0x0200 | Whether it is an interface |
ACC_ABSTRACT | 0x0400 | Whether the type is abstract |
ACC_SYNTHETIC | 0x1000 | Marks that this class is not generated from user code |
ACC_ANNOTATION | 0x2000 | Whether it is an annotation |
ACC_ENUM | 0x4000 | Whether it is an enumeration type |
ACC_MODULE | 0x8000 | Whether it is a module |
HelloWorld is an ordinary class: it is not an interface, annotation, enumeration type, or module, and it is declared public. So its access_flags should be ACC_PUBLIC | ACC_SUPER = 0x0021, which is 33 in decimal.
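Decoding an access_flags value is a matter of testing each bit. The sketch below uses a hypothetical `ClassAccessFlags` helper with the flag values from the table above, and decodes the value 0x0021 (decimal 33).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: ClassAccessFlags is an illustrative helper, not a JDK class.
final class ClassAccessFlags {
    // Class-level flag names and bit values, in ascending bit order.
    private static final Map<String, Integer> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put("ACC_PUBLIC", 0x0001);
        FLAGS.put("ACC_FINAL", 0x0010);
        FLAGS.put("ACC_SUPER", 0x0020);
        FLAGS.put("ACC_INTERFACE", 0x0200);
        FLAGS.put("ACC_ABSTRACT", 0x0400);
        FLAGS.put("ACC_SYNTHETIC", 0x1000);
        FLAGS.put("ACC_ANNOTATION", 0x2000);
        FLAGS.put("ACC_ENUM", 0x4000);
        FLAGS.put("ACC_MODULE", 0x8000);
    }

    // Returns the names of all flags whose bit is set in accessFlags.
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> flag : FLAGS.entrySet()) {
            if ((accessFlags & flag.getValue()) != 0) {
                names.add(flag.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021));  // [ACC_PUBLIC, ACC_SUPER]
        System.out.println(0x0001 | 0x0020); // 33
    }
}
```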
Class index, superclass index, and interface index collection
These three pieces of data are used in the Class file to determine the inheritance relationships of the type.
- this_class: the class index, used to determine the fully qualified name of this class. Its value is an index into the constant pool.
- super_class: the superclass index, used to determine the fully qualified name of this class's superclass. Because the Java language does not allow multiple inheritance, there is only one superclass index. Every Java class except java.lang.Object has a superclass, so java.lang.Object is the only class whose superclass index is 0.
- interfaces_count: the number of interfaces implemented by the class. HelloWorld does not implement any interface, so the value is 0.
- If interfaces are implemented, their information is stored in the interfaces[] array that follows. The interface index collection describes the interfaces the class implements, arranged from left to right in the order they appear after the implements keyword (or after the extends keyword if the Class file represents an interface).
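The same inheritance information can be observed at runtime through reflection, which gives a quick way to check the rules above without opening a Class file. This is only an indirect view: the nested types `First`, `Second`, and `Impl` are invented for the demo, and reflection does not mirror the file exactly (for example, `getSuperclass()` also returns null for interfaces).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: observes superclass and interface order via reflection.
final class InheritanceInfo {
    interface First {}
    interface Second {}
    static class Impl implements Second, First {}

    // java.lang.Object has no superclass, matching a super_class index of 0.
    static String superName(Class<?> type) {
        Class<?> parent = type.getSuperclass();
        return parent == null ? "(none)" : parent.getName();
    }

    // Interfaces are reported in the order they follow the implements keyword.
    static List<String> interfaceNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> itf : type.getInterfaces()) {
            names.add(itf.getSimpleName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(superName(Object.class));    // (none)
        System.out.println(superName(Impl.class));      // java.lang.Object
        System.out.println(interfaceNames(Impl.class)); // [Second, First]
    }
}
```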
Field count and field table collection
Fields in the Java language include class-level variables as well as instance-level variables, but not local variables declared inside methods.
- fields_count specifies the number of fields declared in the class
- field_info describes each field declared in the class
    private static final String HELLO_WORLD = "Hello World!";
As you can see, a field description in Java starts with its access scope (public, private, or protected), which determines whether the field is visible to other classes. Next come the properties set by modifier keywords such as static, final, volatile, and transient: whether it is an instance variable or a class variable, whether it is mutable, its concurrent visibility, whether it can be serialized, and so on. The data type (primitive type, array, or object) and the name of the field follow. Each modifier is either present or absent, so modifiers are well suited to Boolean flag bits. The data type and name, on the other hand, cannot be fixed in advance, so they are described by referring to constants in the constant pool.
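The "modifiers as Boolean flag bits" idea can be seen through reflection, whose modifier constants use the same bit values as the field access flags (private is 0x0002, static is 0x0008, final is 0x0010). The class `FieldFlagsDemo` and its `flagsOf` helper are made up for this sketch, and the value comes from reflection at runtime rather than from parsing the Class file.

```java
import java.lang.reflect.Modifier;

// Sketch only: shows a field's modifiers as a bit set.
final class FieldFlagsDemo {
    private static final String HELLO_WORLD = "Hello World!";

    // Looks up a field declared in this class and returns its modifier bits.
    static int flagsOf(String fieldName) {
        try {
            return FieldFlagsDemo.class.getDeclaredField(fieldName).getModifiers();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        int flags = flagsOf("HELLO_WORLD");
        System.out.println("0x" + Integer.toHexString(flags)); // 0x1a
        System.out.println(Modifier.toString(flags));          // private static final
        System.out.println(Modifier.isVolatile(flags));        // false
    }
}
```

The type and name of the field are not part of these bits; in the javap listing for HelloWorld they appear as the constant pool entries HELLO_WORLD and Ljava/lang/String;.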
Method count and method table collection
The Class file format describes methods in almost exactly the same way as it describes fields. The method table is structured like the field table: it consists of access_flags, name_index, descriptor_index, and attributes.
A method's definition can be expressed through its access flags, name index, and descriptor index, but where is the code inside the method? The Java code in a method body is compiled by the compiler into bytecode instructions and stored in an attribute named “Code” in the method's attribute table collection.
The attribute_name_index of that attribute is an index to a CONSTANT_Utf8_info constant whose value is fixed to "Code".
Attribute count and attribute table collection
Class files, field tables, and method tables can all carry their own attribute table collections to describe information specific to certain scenarios.
Unlike the other data items in the Class file, which have strict requirements on order, length, and content, the attribute table collection is somewhat looser. Attributes do not have to appear in any strict order, and as long as a name does not clash with an existing attribute name, anyone implementing a compiler may write attributes of their own into the attribute table. The JVM ignores attributes it does not recognize at run time.
The following table lists the attributes defined in Java 7:
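Ignoring unknown attributes is practical because every attribute starts with the same header: a u2 attribute_name_index, then a u4 attribute_length, then that many bytes of content. A reader can therefore step over anything it does not recognize. The sketch below is an assumption-laden illustration: `AttributeSkipper`, the simplified `names` array standing in for the constant pool, and the custom attribute name `MyToolInfo` are all made up.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Sketch only: walks an attribute table and skips attributes it does not know.
final class AttributeSkipper {
    // names[i] stands in for the Utf8 constant at constant pool index i.
    static List<String> readKnown(ByteBuffer in, String[] names, Set<String> known) {
        List<String> kept = new ArrayList<>();
        int count = in.getShort() & 0xFFFF;               // attributes_count
        for (int i = 0; i < count; i++) {
            String name = names[in.getShort() & 0xFFFF];  // attribute_name_index
            int length = in.getInt();                     // attribute_length (u4, assumed < 2^31 here)
            if (known.contains(name)) {
                kept.add(name);                           // a real reader would parse the body here
            }
            in.position(in.position() + length);          // step over the body either way
        }
        return kept;
    }

    // Two attributes: SourceFile (2-byte body) and a hypothetical custom one (3-byte body).
    static byte[] sample() {
        ByteBuffer out = ByteBuffer.allocate(32);
        out.putShort((short) 2);
        out.putShort((short) 1).putInt(2).putShort((short) 3);
        out.putShort((short) 2).putInt(3).put(new byte[] {1, 2, 3});
        return Arrays.copyOf(out.array(), out.position());
    }

    public static void main(String[] args) {
        String[] names = {null, "SourceFile", "MyToolInfo"};
        Set<String> known = Collections.singleton("SourceFile");
        System.out.println(readKnown(ByteBuffer.wrap(sample()), names, known)); // [SourceFile]
    }
}
```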
Attribute name | Location | Meaning |
---|---|---|
Code | Method table | Bytecode instructions compiled from Java code |
ConstantValue | Field table | Constant value defined by the final keyword |
Deprecated | Class, method table, field table | Classes, methods, and fields declared deprecated |
Exceptions | Method table | Exceptions thrown by the method |
EnclosingMethod | Class file | Present only if the class is a local or anonymous class; identifies the method enclosing the class |
InnerClasses | Class file | Inner class list |
LineNumberTable | Code attribute | Mapping between Java source line numbers and bytecode instructions |
LocalVariableTable | Code attribute | Description of the method's local variables |
StackMapTable | Code attribute | New attribute in JDK 1.6, used by the new type checker to verify that the local variables and operand stack types required by the target method match |
Signature | Class, method table, field table | Supports method signatures where generics are involved |
SourceFile | Class file | Records the source file name |
SourceDebugExtension | Class file | Stores additional debugging information |
Synthetic | Class, method table, field table | Marks a method or field as generated automatically by the compiler |
LocalVariableTypeTable | Code attribute | Uses signatures instead of descriptors; added to describe generic parameterized types after generics were introduced |
RuntimeVisibleAnnotations | Class, method table, field table | Supports dynamic annotations, that is, annotations visible at runtime |
RuntimeInvisibleAnnotations | Class, method table, field table | Indicates which annotations are not visible at runtime |
RuntimeVisibleParameterAnnotations | Method table | Similar to RuntimeVisibleAnnotations, but applies to method parameters |
RuntimeInvisibleParameterAnnotations | Method table | Similar to RuntimeInvisibleAnnotations, but applies to method parameters |
AnnotationDefault | Method table | Records the default value of an annotation element |
BootstrapMethods | Class file | Holds the bootstrap method qualifiers referenced by invokedynamic instructions |
For the full details of the field table, attribute table, and method table collections and the access_flags values, consult section 6.3 of Understanding the Java Virtual Machine as you would a dictionary. Here we only need to understand the principles.