Learning Java Virtual Machines

  • Learn Java Virtual Machine part 1: Memory area and garbage collection

Preface

Understanding the JVM is a basic requirement for Java programmers, but how many of us, like me, get so absorbed in fixing bugs and piling up layouts that we neglect the fundamentals and end up with only a fragmentary understanding of the JVM? Studying the JVM systematically may take us further down the road.

Independence

Platform independence

  • “Write Once, Run Anywhere”
  • Java virtual machines exist for a variety of platforms, and all of them support the same program storage format: bytecode

Language independence

  • The Java Virtual Machine is not bound to any programming language, including the Java language. It is only associated with a specific binary file format called the Class file, which contains the Java Virtual Machine instruction set, symbol tables, and other auxiliary information.
  • The Java specification is split into The Java Language Specification and The Java Virtual Machine Specification.
  • Because the Java Virtual Machine is a general-purpose, machine-independent execution platform, implementors of any other language can use it as the runtime foundation for their language and use Class files as their delivery format.

The structure of the Class file

A Class file is a binary stream organized in 8-bit bytes. The data items are packed tightly in strict order with no delimiters between them, so almost everything stored in the Class file is data the program needs to run, with no gaps. When a data item needs more than 8 bits of space, it is split into several 8-bit bytes and stored in big-endian order, with the high-order byte first.

“Unsigned numbers” and “tables”

According to the Java Virtual Machine Specification, the Class file format uses a pseudo-structure similar to the C-language structure to store data, with only two data types: “unsigned number” and “table.”

  • Unsigned numbers are the basic data types. u1, u2, u4, and u8 represent unsigned numbers of 1, 2, 4, and 8 bytes respectively. Unsigned numbers can describe numbers, index references, quantity values, or string values encoded in UTF-8.
  • A table is a composite data type consisting of multiple unsigned numbers or other tables as data items. To facilitate differentiation, all table names are customarily terminated with “_info”. Tables are used to describe hierarchical composite structures of data, and the entire Class file can be viewed as essentially a table.
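To make the big-endian layout of u2 and u4 items concrete, here is a minimal sketch. The class name UnsignedReader and its methods are hypothetical helpers, not part of any JDK or 010Editor API; the sample bytes are a major version of 52 followed by the magic number.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; the names are not from the article.
final class UnsignedReader {
    // Reads a u2 by hand: the high-order byte comes first in the stream.
    static int readU2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    // Reads a u4 into a long so values above Integer.MAX_VALUE stay positive.
    static long readU4(byte[] data, int offset) {
        return ((long) readU2(data, offset) << 16) | readU2(data, offset + 2);
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x34, (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println(readU2(sample, 0));                   // 52
        System.out.println(Long.toHexString(readU4(sample, 2))); // cafebabe

        // ByteBuffer is big-endian by default, so it yields the same u2.
        System.out.println(ByteBuffer.wrap(sample).getShort() & 0xFFFF); // 52
    }
}
```

Masking each byte with 0xFF matters because Java bytes are signed; without it, 0xCA would be sign-extended to a negative number before the shift.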

Use 010Editor to view the class file structure

View HelloWorld.class in 010Editor

public class HelloWorld {

    private static final String HELLO_WORLD = "Hello World!";
    public static void main(String args[]) {
        System.out.println(HELLO_WORLD);
    }
}

Download address: www.sweetscape.com/download/01… After installation, opening the class file automatically prompts you to install the classadv.bt template.

The magic number

  • magic

The first four bytes of every Class file are called the magic number. Its only purpose is to determine whether the file is a Class file that the virtual machine can accept. The magic number of a Class file is 0xCAFEBABE.

The version number

  • minor_version and major_version

The four bytes following the magic number store the version number of the Class file: bytes 5 and 6 are the minor version, and bytes 7 and 8 are the major version. Each JDK release has its own version number. Newer JDK versions are backward compatible with older Class files, but older JDK versions cannot run newer Class files: a virtual machine must refuse to execute a Class file whose version exceeds its own, even if the file format has not changed.
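The first eight bytes can be checked in a few lines. The sketch below is a simplified illustration: ClassFileHeader and isAcceptedBy are hypothetical names, and a real virtual machine also enforces a lower bound and looks at the minor version. The sample header uses major version 52, the value shown in the javap output in this article (the version emitted by JDK 8).

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not a JDK class.
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;

    private ClassFileHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Parses magic, minor_version and major_version from the start of a Class file.
    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a Class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major);
    }

    // Simplified version rule: a VM rejects files newer than the highest major version it supports.
    boolean isAcceptedBy(int highestSupportedMajor) {
        return majorVersion <= highestSupportedMajor;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        ClassFileHeader h = parse(header);
        System.out.println(h.majorVersion + "." + h.minorVersion); // 52.0
        System.out.println(h.isAcceptedBy(52)); // true
        System.out.println(h.isAcceptedBy(51)); // false
    }
}
```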

Constant pool

  • u2 constant_pool_count

Since the number of constants in the constant pool is not fixed, a u2 item is placed at the entry of the constant pool to hold the constant pool capacity count (constant_pool_count).



Note that constant pool indexes start at 1, not 0. This Class file has 36 constants, so its valid indexes run from 1 to 36 (constant_pool_count is therefore 37). Why does the index start at 1? Index 0 is reserved so that a data item that needs to express "references no constant pool entry" can do so by using index 0.

  • cp_info constant_pool[constant_pool_count-1]

There are two main kinds of constants in the constant pool: literals and symbolic references. Each constant in the constant pool is a table.

There are more than a dozen constant types in the constant pool, each with its own data structure, but they all begin with a common item, the tag. The tag is a one-byte flag that identifies which data structure follows.

Common constant pool types:

Type                              Tag  Description
CONSTANT_Utf8_info                1    UTF-8 encoded string
CONSTANT_Integer_info             3    Integer literal
CONSTANT_Float_info               4    Floating-point literal
CONSTANT_Long_info                5    Long integer literal
CONSTANT_Double_info              6    Double-precision floating-point literal
CONSTANT_Class_info               7    Symbolic reference to a class or interface
CONSTANT_String_info              8    String literal
CONSTANT_Fieldref_info            9    Symbolic reference to a field
CONSTANT_Methodref_info           10   Symbolic reference to a method in a class
CONSTANT_InterfaceMethodref_info  11   Symbolic reference to a method in an interface
CONSTANT_NameAndType_info         12   Partial symbolic reference to a field or method
CONSTANT_MethodHandle_info        15   Method handle
CONSTANT_MethodType_info          16   Method type
CONSTANT_InvokeDynamic_info       18   Dynamic method call site
CONSTANT_Module_info              19   Module
CONSTANT_Package_info             20   Package opened or exported by a module
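Because every entry starts with a tag, a reader can walk the constant pool without understanding each entry in depth: the tag alone tells it how many bytes to skip. The sketch below illustrates that idea. ConstantPoolWalker is a hypothetical name, the entry sizes follow the Java Virtual Machine Specification rather than this article, and the sample bytes are hand-made (a three-entry pool), not taken from HelloWorld.class.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration; not a JDK class.
final class ConstantPoolWalker {
    private static void skip(ByteBuffer buf, int n) {
        buf.position(buf.position() + n);
    }

    // Reads constant_pool_count and returns the tag stored at each index.
    // Slot 0 is unused because constant pool indexes start at 1.
    static int[] readTags(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;
        int[] tags = new int[count];
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;
            tags[i] = tag;
            switch (tag) {
                case 1: // Utf8: u2 length followed by that many bytes
                    skip(buf, buf.getShort() & 0xFFFF);
                    break;
                case 3: case 4: // Integer, Float
                    skip(buf, 4);
                    break;
                case 5: case 6: // Long, Double occupy two index slots
                    skip(buf, 8);
                    i++;
                    break;
                case 7: case 8: case 16: case 19: case 20: // one u2 index
                    skip(buf, 2);
                    break;
                case 15: // MethodHandle: u1 kind + u2 index
                    skip(buf, 3);
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18: // two u2 items
                    skip(buf, 4);
                    break;
                default:
                    throw new IllegalArgumentException("unknown constant pool tag " + tag);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        byte[] pool = {
            0x00, 0x04,                   // constant_pool_count = 4, so entries #1..#3
            0x01, 0x00, 0x02, 'H', 'i',   // #1 Utf8 "Hi"
            0x07, 0x00, 0x01,             // #2 Class, name_index = 1
            0x08, 0x00, 0x01              // #3 String, string_index = 1
        };
        System.out.println(Arrays.toString(readTags(ByteBuffer.wrap(pool)))); // [0, 1, 7, 8]
    }
}
```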

Let’s once again parse the first constant in the constant pool

  • tag: 10 indicates the type CONSTANT_Methodref_info, a symbolic reference to a method in a class
  • class_index: a constant pool index that 010Editor shows as data item 5 (the class_index of a CONSTANT_Methodref_info always points to a CONSTANT_Class_info)
  • name_and_type_index: 23 is also a constant pool index

class_index:

  • tag: 7 stands for CONSTANT_Class_info
  • name_index: 30 is another constant pool index

  • Data item 29 is a CONSTANT_Utf8_info. Its bytes[16] holds the string content, which the 010Editor parse shows to be java/lang/Object, the fully qualified name of the class. (010Editor appears to number the entries from 0, so its items 5 and 29 correspond to #6 and #30 in the javap output later in this article.)

Let's go back to the name_and_type_index of the first constant

  • tag: 12 stands for CONSTANT_NameAndType_info
  • name_index points to the unqualified name of a field or method; here the value is <init>
  • descriptor_index points to the field or method descriptor; here the value is ()V

This completes the analysis of the first data item in the constant pool, and each subsequent data item can be analyzed in this way
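The chain of lookups just walked through by hand can be sketched in code. This is only an illustration: MethodrefResolver is a hypothetical class, and instead of parsing the file it is filled by hand with the six entries that javap reports for constant #1 of this HelloWorld.class (#1, #6, #10, #11, #23, #30).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not a JDK class.
final class MethodrefResolver {
    // Entries that only hold indexes of other entries.
    private final Map<Integer, int[]> refs = new HashMap<>();
    // Text of CONSTANT_Utf8_info entries.
    private final Map<Integer, String> utf8 = new HashMap<>();

    void putRef(int index, int... targets) { refs.put(index, targets); }
    void putUtf8(int index, String text) { utf8.put(index, text); }

    // Follows Methodref -> Class -> Utf8 and Methodref -> NameAndType -> Utf8.
    String resolveMethodref(int index) {
        int[] methodref = refs.get(index);          // {class_index, name_and_type_index}
        int[] classInfo = refs.get(methodref[0]);   // {name_index}
        int[] nameAndType = refs.get(methodref[1]); // {name_index, descriptor_index}
        return utf8.get(classInfo[0]) + "." + utf8.get(nameAndType[0]) + ":" + utf8.get(nameAndType[1]);
    }

    // The subset of the HelloWorld constant pool needed to resolve #1.
    static MethodrefResolver helloWorldSample() {
        MethodrefResolver pool = new MethodrefResolver();
        pool.putRef(1, 6, 23);    // #1  = Methodref   #6.#23
        pool.putRef(6, 30);       // #6  = Class       #30
        pool.putRef(23, 10, 11);  // #23 = NameAndType #10:#11
        pool.putUtf8(10, "<init>");
        pool.putUtf8(11, "()V");
        pool.putUtf8(30, "java/lang/Object");
        return pool;
    }

    public static void main(String[] args) {
        System.out.println(helloWorldSample().resolveMethodref(1)); // java/lang/Object.<init>:()V
    }
}
```

The result matches the comment javap prints next to #1, apart from the quotation marks javap puts around <init>.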

The javap tool

Analyzing everything else by hand is pure manual labor, so let's be lazy. In the JDK bin directory, Oracle provides a tool for analyzing the bytecode of Class files: javap

HP-ProDesk-680-G6-PCI-Microtower-PC:~/DEBUG$ javap -verbose HelloWorld.class
Classfile /home/mi/DEBUG/HelloWorld.class
  Last modified May 12, 2021; size 641 bytes
  MD5 checksum 1910a4531e5743c190636067d43d4bc4
  Compiled from "HelloWorld.java"
public class com.wang.javavmdemo.HelloWorld
  minor version: 0
  major version: 52
  flags: (0x0021) ACC_PUBLIC, ACC_SUPER
  this_class: #3                          // com/wang/javavmdemo/HelloWorld
  super_class: #6                         // java/lang/Object
  interfaces: 0, fields: 1, methods: 2, attributes: 1
Constant pool:
   #1 = Methodref          #6.#23         // java/lang/Object."<init>":()V
   #2 = Fieldref           #24.#25        // java/lang/System.out:Ljava/io/PrintStream;
   #3 = Class              #26            // com/wang/javavmdemo/HelloWorld
   #4 = String             #27            // Hello World!
   #5 = Methodref          #28.#29        // java/io/PrintStream.println:(Ljava/lang/String;)V
   #6 = Class              #30            // java/lang/Object
   #7 = Utf8               HELLO_WORLD
   #8 = Utf8               Ljava/lang/String;
   #9 = Utf8               ConstantValue
  #10 = Utf8               <init>
  #11 = Utf8               ()V
  #12 = Utf8               Code
  #13 = Utf8               LineNumberTable
  #14 = Utf8               LocalVariableTable
  #15 = Utf8               this
  #16 = Utf8               Lcom/wang/javavmdemo/HelloWorld;
  #17 = Utf8               main
  #18 = Utf8               ([Ljava/lang/String;)V
  #19 = Utf8               args
  #20 = Utf8               [Ljava/lang/String;
  #21 = Utf8               SourceFile
  #22 = Utf8               HelloWorld.java
  #23 = NameAndType        #10:#11        // "<init>":()V
  #24 = Class              #31            // java/lang/System
  #25 = NameAndType        #32:#33        // out:Ljava/io/PrintStream;
  #26 = Utf8               com/wang/javavmdemo/HelloWorld
  #27 = Utf8               Hello World!
  #28 = Class              #34            // java/io/PrintStream
  #29 = NameAndType        #35:#36        // println:(Ljava/lang/String;)V
  #30 = Utf8               java/lang/Object
  #31 = Utf8               java/lang/System
  #32 = Utf8               out
  #33 = Utf8               Ljava/io/PrintStream;
  #34 = Utf8               java/io/PrintStream
  #35 = Utf8               println
  #36 = Utf8               (Ljava/lang/String;)V
{
  public com.wang.javavmdemo.HelloWorld();
    descriptor: ()V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 3: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Lcom/wang/javavmdemo/HelloWorld;

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #4                  // String Hello World!
         5: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 7: 0
        line 8: 8
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       9     0  args   [Ljava/lang/String;
}
SourceFile: "HelloWorld.java"

Preview the rest of the class file:

Access flags

  • access_flags

After the constant pool ends, the next two bytes hold the access_flags, which identify class-level or interface-level access information, including: whether this Class file describes a class or an interface; whether it is declared public; whether it is declared abstract; if it is a class, whether it is declared final; and so on.

Access flags and their meanings:

Flag name       Value   Meaning
ACC_PUBLIC      0x0001  Whether the type is public
ACC_FINAL       0x0010  Whether the type is declared final
ACC_SUPER       0x0020  Must be set for all classes compiled after JDK 1.0.2
ACC_INTERFACE   0x0200  Whether this is an interface
ACC_ABSTRACT    0x0400  Whether the type is abstract
ACC_SYNTHETIC   0x1000  Marks a class that is not generated from user code
ACC_ANNOTATION  0x2000  Whether this is an annotation
ACC_ENUM        0x4000  Whether this is an enumeration type
ACC_MODULE      0x8000  Whether this is a module

HelloWorld is an ordinary class, not an interface, annotation, enumeration, or module, and it is declared public. Its access_flags value should therefore be ACC_PUBLIC | ACC_SUPER = 0x0021, which is 33 in decimal.
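Since each flag occupies its own bit, decoding an access_flags value is a matter of testing each mask in turn. A minimal sketch, using the flag values from the table above (AccessFlags and decode are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a JDK class.
final class AccessFlags {
    private static final int[] VALUES = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000, 0x8000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE", "ACC_ABSTRACT",
        "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM", "ACC_MODULE"
    };

    // Returns the names of all class-level flags set in accessFlags.
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < VALUES.length; i++) {
            if ((accessFlags & VALUES[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(0x0001 | 0x0020); // 33
        System.out.println(decode(0x0021));  // [ACC_PUBLIC, ACC_SUPER]
    }
}
```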

Class index, superclass index, and interface index collection

The Class file uses these three items to determine the inheritance relationships of the type.

  • this_class: the class index, used to determine the fully qualified name of this class. Its value is an index into the constant pool.
  • super_class: the superclass index, used to determine the fully qualified name of this class's superclass. Because the Java language does not allow multiple inheritance of classes, there is only one superclass index. Every Java class except java.lang.Object has a superclass, so java.lang.Object is the only class whose superclass index is 0.
  • interfaces_count: the number of interfaces the class implements. HelloWorld does not implement any interface, so it is 0.
  • If interfaces are implemented, their information is stored in the interfaces[] that follows. The interface index collection describes which interfaces the class implements, in the order they appear from left to right after the implements keyword (or after the extends keyword if the Class file represents an interface).

Field count and field table collection

Fields in the Java language include class-level variables and instance-level variables, but not local variables declared inside methods.

  • fields_count: the number of fields declared in this class
  • field_info: the information for each field declared in this class

private static final String HELLO_WORLD = "Hello World!";

As you can see, Java describes a field first by its access scope (public, private, or protected), which determines which classes the field is visible to. Next come the properties expressed by keywords such as static, final, volatile, and transient: whether it is an instance variable or a class variable, whether it is mutable, its concurrent visibility, whether it can be serialized, and so on. Then come the field's data type (primitive type, array, or object) and its name. Each modifier is a Boolean value, either present or absent, so modifiers are well suited to flag bits. The data type and name cannot be fixed in advance, so they are described by referencing constants in the constant pool.
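The modifier bits of the HELLO_WORLD field can be combined the same way as the class-level flags. In the sketch below, the three field flag values come from the Java Virtual Machine Specification rather than from this article, and FieldFlags is a hypothetical class name. java.lang.reflect.Modifier uses the same bit values for these three modifiers, so it can print the result in source form.

```java
import java.lang.reflect.Modifier;

// Hypothetical helper for illustration; not a JDK class.
final class FieldFlags {
    // Field access flag values as defined by the JVM specification.
    static final int ACC_PRIVATE = 0x0002;
    static final int ACC_STATIC  = 0x0008;
    static final int ACC_FINAL   = 0x0010;

    // access_flags expected for: private static final String HELLO_WORLD
    static int helloWorldFieldFlags() {
        return ACC_PRIVATE | ACC_STATIC | ACC_FINAL;
    }

    public static void main(String[] args) {
        int flags = helloWorldFieldFlags();
        System.out.println(Integer.toHexString(flags)); // 1a
        System.out.println(Modifier.toString(flags));   // private static final
    }
}
```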

Method count and method table collection

The Class file format describes methods in almost exactly the same way as fields. The method table is structured like the field table: it contains access_flags, name_index, descriptor_index, and attributes.

A method's definition can be expressed through its access flags, name index, and descriptor index, but where is the code inside the method? The Java code in a method is compiled into bytecode instructions and stored in an attribute named "Code" in the method's attribute table collection.

The attribute_name_index of this attribute is a constant pool index pointing to a CONSTANT_Utf8_info entry whose value is fixed to "Code".
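To show what the bytes inside a Code attribute look like, here is a deliberately tiny disassembler sketch. It is not how javap is implemented: MiniDisassembler is a hypothetical class that knows only the six opcodes HelloWorld uses, and the opcode values are taken from the Java Virtual Machine Specification. The sample array is the bytecode that corresponds to the main method in the javap output shown earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it handles only six opcodes.
final class MiniDisassembler {
    // Reads the u2 constant pool index that follows a 3-byte instruction's opcode.
    private static int u2(byte[] code, int offset) {
        return ((code[offset] & 0xFF) << 8) | (code[offset + 1] & 0xFF);
    }

    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x2A: lines.add(pc + ": aload_0"); pc += 1; break;
                case 0xB1: lines.add(pc + ": return"); pc += 1; break;
                case 0x12: lines.add(pc + ": ldc #" + (code[pc + 1] & 0xFF)); pc += 2; break;
                case 0xB2: lines.add(pc + ": getstatic #" + u2(code, pc + 1)); pc += 3; break;
                case 0xB6: lines.add(pc + ": invokevirtual #" + u2(code, pc + 1)); pc += 3; break;
                case 0xB7: lines.add(pc + ": invokespecial #" + u2(code, pc + 1)); pc += 3; break;
                default:
                    throw new IllegalArgumentException("opcode not handled in this sketch: " + opcode);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] mainCode = {(byte) 0xB2, 0x00, 0x02, 0x12, 0x04, (byte) 0xB6, 0x00, 0x05, (byte) 0xB1};
        for (String line : disassemble(mainCode)) {
            System.out.println(line);
        }
    }
}
```

The operands printed after # are constant pool indexes, which is why the bytecode cannot be understood without the constant pool.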

Attribute count and attribute table collection

Class files, field tables, and method tables can each carry their own attribute table collection to describe information specific to certain scenarios.

Unlike the other data items in a Class file, which have strict requirements on order, length, and content, the attribute table collection is relatively loose. The attributes are not required to appear in any strict order, and as long as a name does not clash with an existing attribute name, anyone implementing a compiler can write their own attributes into the attribute table. At run time the JVM ignores attributes it does not recognize.
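Ignoring unknown attributes is possible because every attribute has the same outer shape in the Java Virtual Machine Specification: a u2 attribute_name_index, a u4 attribute_length, and then attribute_length bytes of content. A reader that does not recognize the name can simply skip the content. The sketch below illustrates this; AttributeSkipper is a hypothetical class, the sample bytes are hand-made, and "MyCustom" is an invented attribute name.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not a JDK class.
final class AttributeSkipper {
    // Reads attributes_count and then each attribute's name, skipping its content
    // by attribute_length without interpreting it.
    static List<String> readAttributeNames(ByteBuffer buf, Map<Integer, String> utf8) {
        int count = buf.getShort() & 0xFFFF;           // attributes_count
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int nameIndex = buf.getShort() & 0xFFFF;   // attribute_name_index
            long length = buf.getInt() & 0xFFFFFFFFL;  // attribute_length
            names.add(utf8.get(nameIndex));
            buf.position(buf.position() + (int) length); // skip info[attribute_length]
        }
        return names;
    }

    public static void main(String[] args) {
        Map<Integer, String> utf8 = new HashMap<>();
        utf8.put(1, "SourceFile");
        utf8.put(3, "MyCustom"); // invented attribute name that a VM would not recognize

        byte[] attributes = {
            0x00, 0x02,                                       // attributes_count = 2
            0x00, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x02,   // SourceFile, length 2
            0x00, 0x03, 0x00, 0x00, 0x00, 0x03,               // MyCustom, length 3
            (byte) 0xAA, (byte) 0xBB, (byte) 0xCC
        };
        System.out.println(readAttributeNames(ByteBuffer.wrap(attributes), utf8)); // [SourceFile, MyCustom]
    }
}
```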

The following table lists the attributes predefined as of Java 7:

Attribute name                         Location                               Meaning
Code                                   Method table                           Bytecode instructions compiled from Java code
ConstantValue                          Field table                            Constant value defined with the final keyword
Deprecated                             Class, method table, field table       Classes, methods, and fields declared deprecated
Exceptions                             Method table                           Exceptions the method is declared to throw
EnclosingMethod                        Class file                             Present only for local or anonymous classes; identifies the enclosing method of the class
InnerClasses                           Class file                             Inner class list
LineNumberTable                        Code attribute                         Mapping of Java source line numbers to bytecode instructions
LocalVariableTable                     Code attribute                         Description of the method's local variables
StackMapTable                          Code attribute                         Added in JDK 1.6; lets the new type checker verify that the types of local variables and operands match what the target method requires
Signature                              Class, method table, field table       Supports signatures when generics are involved
SourceFile                             Class file                             Records the source file name
SourceDebugExtension                   Class file                             Stores additional debugging information
Synthetic                              Class, method table, field table       Marks a method or field as generated automatically by the compiler
LocalVariableTypeTable                 Code attribute                         Uses signatures instead of descriptors; added after generics were introduced to describe generic parameterized types
RuntimeVisibleAnnotations              Class, method table, field table       Supports dynamic annotations
RuntimeInvisibleAnnotations            Class, method table, field table       Indicates which annotations are not visible at run time
RuntimeVisibleParameterAnnotations     Method table                           Like RuntimeVisibleAnnotations, but applies to method parameters
RuntimeInvisibleParameterAnnotations   Method table                           Like RuntimeInvisibleAnnotations, but applies to method parameters
AnnotationDefault                      Method table                           Records the default value of an annotation class element
BootstrapMethods                       Class file                             Holds the bootstrap method specifiers referenced by invokedynamic instructions

For the full details of the field table, attribute table, method table, and access_flags collections, consult section 6.3 of Understanding the Java Virtual Machine the way you would a dictionary. We only need to understand the principles.