Description: The virtual machine loads the data that describes a class from a class file into memory, then verifies, transforms, and initializes it, finally producing a Java type that the virtual machine can use directly. This is the class loading mechanism of the virtual machine.

Let’s look at the class loading process in detail:

Class life cycle

From being loaded into memory to being unloaded from it, a class goes through five stages: loading, linking, initialization, using, and unloading. Linking consists of three steps: verification, preparation, and resolution. These stages generally begin in the order shown in the figure, but because the Java language supports runtime binding, resolution can also take place after initialization. This order only describes when each stage begins. In practice the stages overlap, and verification may already start while loading is still in progress.

The timing of class loading

The first question is when a class needs to be loaded. The Java Virtual Machine specification does not constrain this, but it does specify five situations in which a class must be initialized. Obviously, loading, verification, and preparation must take place before initialization.

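The five situations are:

  1. When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is executed and the class has not yet been initialized
  2. When a reflective call is made on the class through the java.lang.reflect package and the class has not yet been initialized
  3. When a class is initialized and its superclass has not yet been initialized, the superclass is initialized first
  4. When the virtual machine starts, the main class specified by the user (the class containing the main() method) is initialized
  5. When the dynamic language support of JDK 7 is used and a java.lang.invoke.MethodHandle instance resolves to a method handle of kind REF_getStatic, REF_putStatic, or REF_invokeStatic whose class has not yet been initialized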

The four bytecode instructions in situation 1 (new, getstatic, putstatic, and invokestatic) most commonly arise from the following Java code: 1. creating an object with new; 2. reading or writing a static field of a class (except a static final field whose value was placed in the constant pool at compile time); 3. calling a static method of a class.
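
To make these triggers observable, here is a minimal sketch. The class names (InitTriggerDemo, Holder, Parent, Child, Service) are hypothetical, and the expected output assumes a standard javac and HotSpot toolchain. Each nested class records in a list when its static initializer runs. Reading a compile-time constant does not initialize Holder, because javac copies the value into the caller. Reading a static field through a subclass initializes only the class that declares the field.

```java
import java.util.ArrayList;
import java.util.List;

public class InitTriggerDemo {
    // Records, in order, which nested classes have run their static initializer.
    static final List<String> LOG = new ArrayList<>();

    static class Holder {
        // Compile-time constant: javac copies 42 into the caller's constant pool,
        // so reading it never touches Holder at run time.
        static final int CONSTANT = 42;
        // Not a constant: reading it compiles to getstatic Holder.counter.
        static int counter = 7;
        static { LOG.add("Holder"); }
    }

    static class Parent {
        static int parentValue = 1;
        static { LOG.add("Parent"); }
    }

    static class Child extends Parent {
        static { LOG.add("Child"); }
    }

    static class Service {
        static { LOG.add("Service"); }
        static String ping() { return "pong"; }
    }

    // Repeated calls do not re-run any static initializer, so LOG stays the same.
    static List<String> demo() {
        int constant = Holder.CONSTANT;     // no initialization: the value was inlined
        int counter = Holder.counter;       // getstatic: initializes Holder
        int inherited = Child.parentValue;  // declared in Parent: only Parent is initialized
        String reply = Service.ping();      // invokestatic: initializes Service
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [Holder, Parent, Service]
    }
}
```

Child is loaded so that the field reference can be resolved, but its static block never runs.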

The process of class loading

Let’s go through each phase of class loading step by step.

1. Loading

Loading is the first phase of the class loading process. To create a class or interface, the Java virtual machine must build an internal representation of it in the method area, in the format its implementation defines. Typically, the creation of a class is triggered either by another class or interface that references it through its own run-time constant pool, or by calling a method of the Java core class library, such as a reflection method.

Generally speaking, loading is divided into the following steps:

  1. Obtain the binary byte stream that defines a class from its fully qualified name
  2. Convert the static storage structure represented by this byte stream into the run-time data structures of the method area
  3. Generate a java.lang.Class object in memory that represents the class and serves as the access point for the class's data in the method area
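
One way to see loading and initialization as separate steps is Class.forName(String, boolean, ClassLoader). With initialize set to false, it returns the Class object produced by step 3 without running the class's static initializer. A sketch, with hypothetical class names:

```java
public class LoadVsInitDemo {
    static boolean initialized = false;

    static class Lazy {
        static { initialized = true; }   // runs only when Lazy is initialized
    }

    private static boolean[] firstObservation;

    // Returns {initialized after load-only forName, initialized after initializing forName}.
    // The first observation is cached because a class is initialized at most once.
    static synchronized boolean[] demo() {
        if (firstObservation == null) {
            // Built from a string so this method never names Lazy directly.
            String name = LoadVsInitDemo.class.getName() + "$Lazy";
            try {
                // initialize = false: a Class object now exists, but <clinit> has not run.
                Class.forName(name, false, LoadVsInitDemo.class.getClassLoader());
                boolean afterLoad = initialized;
                // The one-argument overload initializes the class.
                Class.forName(name);
                boolean afterInit = initialized;
                firstObservation = new boolean[] { afterLoad, afterInit };
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
        return firstObservation.clone();
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("after loading:        " + r[0]); // false
        System.out.println("after initialization: " + r[1]); // true
    }
}
```

Whether the class is also linked by the first call is left to the implementation. Only initialization is guaranteed not to happen.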

To create a class named C: if C is not an array type, it is created by loading a binary representation of C (that is, a class file) through a class loader. If C is an array type, it is created directly by the Java virtual machine rather than by a class loader, and the component type of the array is loaded recursively using the process described above.
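
The sketch below (hypothetical names) shows that creating an array does not initialize its element class. It also shows that Class.getClassLoader() on an array class reports the loader of the element type, or null for a primitive element type, as the java.lang.ClassLoader documentation describes.

```java
public class ArrayClassDemo {
    static boolean elementInitialized = false;

    static class Element {
        static { elementInitialized = true; }   // would run only if Element were initialized
    }

    // anewarray: the VM creates the array class Element[]; Element itself is not initialized.
    static Element[] makeArray() {
        return new Element[4];
    }

    public static void main(String[] args) {
        Class<?> arrayClass = makeArray().getClass();
        System.out.println(elementInitialized);                              // false
        System.out.println(arrayClass.getComponentType() == Element.class);  // true
        // An array class reports the class loader of its element type...
        System.out.println(arrayClass.getClassLoader() == Element.class.getClassLoader()); // true
        // ...and an array of a primitive type has no class loader at all.
        System.out.println(int[].class.getClassLoader());                    // null
    }
}
```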

Java virtual machines support two types of loaders:

  • Bootstrap ClassLoader
  • User-defined Class Loader
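
From Java code, the bootstrap class loader appears as null, and every other loader is an ordinary ClassLoader instance, which makes it a user-defined loader in the specification's sense. This sketch (hypothetical names) walks the parent chain. The intermediate loaders, such as the platform loader on JDK 9 and later, vary by JDK version, so only the ends of the chain should be relied on.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChainDemo {
    // Walks the parent chain starting from the loader that defined cls.
    // The bootstrap loader is represented by null, so the walk stops there.
    static List<String> chain(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (ClassLoader l = cls.getClassLoader(); l != null; l = l.getParent()) {
            names.add(l.getClass().getName());
        }
        names.add("bootstrap (null)");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(chain(String.class));          // [bootstrap (null)]
        System.out.println(chain(LoaderChainDemo.class)); // application loader ... bootstrap (null)
    }
}
```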

Every user-defined class loader is an instance of a subclass of the abstract class ClassLoader. Applications use user-defined class loaders to extend the ways in which the Java virtual machine dynamically loads and creates classes. For example, in the first step of loading, where the binary byte stream is obtained, a custom class loader can download the stream from the network, generate it dynamically, or extract it from an encrypted file.
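
As a sketch of the encrypted-file case, the hypothetical XorClassLoader below receives class bytes that were XOR-obfuscated with a one-byte key. It decodes them in findClass and passes them to defineClass, which performs steps 2 and 3 of loading. XOR is only a stand-in for real encryption. To stay self-contained, the example obfuscates the bytes of its own nested Greeter class instead of reading a file, and it assumes that class file can be read from the class path. It overrides loadClass for that one name only, because normal parent-first delegation would otherwise find Greeter on the class path first.

```java
import java.io.IOException;
import java.io.InputStream;

public class XorClassLoaderDemo {
    // Payload class: its bytes are read from the class path and then XOR-"encrypted".
    public static class Greeter {
        public String greet() { return "hello"; }
    }

    // Hypothetical loader that decodes XOR-obfuscated bytes before handing them to the VM.
    static class XorClassLoader extends ClassLoader {
        private final String targetName;
        private final byte[] encrypted;
        private final byte key;

        XorClassLoader(String targetName, byte[] encrypted, byte key, ClassLoader parent) {
            super(parent);
            this.targetName = targetName;
            this.encrypted = encrypted;
            this.key = key;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(targetName)) {
                return super.loadClass(name, resolve);   // normal parent delegation
            }
            // The parent can see Greeter too, so skip delegation for the target;
            // otherwise the parent would define it and findClass would never run.
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) c = findClass(name);
                if (resolve) resolveClass(c);
                return c;
            }
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.equals(targetName)) throw new ClassNotFoundException(name);
            byte[] plain = xor(encrypted, key);                // step 1: obtain the byte stream
            return defineClass(name, plain, 0, plain.length);  // steps 2 and 3: done by the VM
        }
    }

    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) out[i] = (byte) (data[i] ^ key);
        return out;
    }

    // Loads a second, independent copy of Greeter through XorClassLoader.
    static Class<?> loadGreeter() {
        String resource = Greeter.class.getName().replace('.', '/') + ".class";
        try (InputStream in = XorClassLoaderDemo.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) throw new IOException("class file not found: " + resource);
            byte key = 0x5A;
            byte[] encrypted = xor(in.readAllBytes(), key);   // stands in for an encrypted file
            ClassLoader loader = new XorClassLoader(Greeter.class.getName(), encrypted, key,
                    XorClassLoaderDemo.class.getClassLoader());
            return loader.loadClass(Greeter.class.getName());
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Calls greet() reflectively, because the copy is not the Greeter type this code was compiled against.
    static Object greetVia(Class<?> cls) {
        try {
            return cls.getMethod("greet").invoke(cls.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> copy = loadGreeter();
        System.out.println(greetVia(copy));         // hello
        System.out.println(copy == Greeter.class);  // false: different defining class loaders
    }
}
```

Because the copy has a different defining loader, the VM treats it as a different class from Greeter, even though both come from the same bytes.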

Class loaders will be described in a new article.

2. Verification

Verification, the first step of linking, ensures that the binary representation of a class or interface is structurally correct, so that the information in the byte stream cannot endanger the virtual machine. The verification rules in the Java Virtual Machine specification have grown over time, but in general the phase performs the following four kinds of checks.


1. File format verification: checks whether the byte stream conforms to the Class file format specification and can be processed by the current version of the virtual machine. Main checks:

  • Whether the stream starts with the magic number 0xCAFEBABE
  • Whether the major and minor versions are within the range the current VM can process
  • Whether the constant pool contains constants of unsupported types (checked via each constant's tag)
  • Whether any index points to a constant that does not exist or is of the wrong type
  • Whether any CONSTANT_Utf8_info constant contains data that is not valid UTF-8
  • Whether any part of the Class file, or the file itself, has been truncated or has extra data appended

There are many more checks than these; see my article on the JVM Class file format for details. Verification at this stage operates on the binary byte stream.
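
The first two checks are easy to reproduce by hand. The sketch below (hypothetical names) reads a class file's header, which begins with the u4 magic number followed by the u2 minor and u2 major version. It then shows that defineClass rejects the same bytes with a ClassFormatError once the magic number is corrupted. It assumes the class file can be read from the class path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ClassHeaderDemo {
    // Reads the raw bytes of a class file from the class path.
    static byte[] classBytes(Class<?> cls) {
        String resource = cls.getName().replace('.', '/') + ".class";
        try (InputStream in = cls.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) throw new IllegalStateException("not found: " + resource);
            return in.readAllBytes();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Header layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    static String describeHeader(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);   // ByteBuffer is big-endian by default
        int magic = buf.getInt();
        int minor = Short.toUnsignedInt(buf.getShort());
        int major = Short.toUnsignedInt(buf.getShort());
        return String.format("magic=%08X major=%d minor=%d", magic, major, minor);
    }

    // A minimal loader that only exposes defineClass for arbitrary bytes.
    static class RawLoader extends ClassLoader {
        Class<?> define(byte[] b) { return defineClass(null, b, 0, b.length); }
    }

    // Corrupts the magic number and asks the VM to define the class anyway.
    static String defineCorrupted(byte[] original) {
        byte[] bad = original.clone();
        bad[0] = 0;   // 0xCAFEBABE becomes 0x00FEBABE
        try {
            new RawLoader().define(bad);
            return "accepted";
        } catch (ClassFormatError e) {
            return "rejected: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] bytes = classBytes(ClassHeaderDemo.class);
        System.out.println(describeHeader(bytes));   // starts with magic=CAFEBABE
        System.out.println(defineCorrupted(bytes));  // rejected: ClassFormatError
    }
}
```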

2. Metadata verification: performs semantic analysis of the information described by the bytecode to ensure that it conforms to the requirements of the Java Language Specification. Main checks:

  • Whether the class has a superclass (every class except java.lang.Object must have one)
  • Whether the class extends a class that may not be inherited from (a class declared final)
  • If the class is not abstract, whether it implements all the methods required by its superclass or interfaces
  • Whether the fields or methods of the class conflict with those of the superclass (for example, overriding a final field of the superclass, or an illegal overload such as two methods with identical parameters but different return types), and so on

3. Bytecode verification: uses data-flow and control-flow analysis to determine that the program semantics are legal and logical. After the second stage has checked the data types in the metadata, this stage analyzes the method bodies of the class to ensure that its methods cannot do anything at run time that endangers the virtual machine. Checks include:

  • Ensuring that the data types on the operand stack and the instruction sequence always work together; for example, an int placed on the operand stack is never loaded into the local variable table as a long
  • Ensuring that jump instructions never jump to bytecode outside the method body
  • Ensuring that type conversions in the method body are legal. For example, assigning a subclass object to a variable of its superclass type is legal, but assigning a superclass object to a subclass variable, or to any type it has no inheritance relationship with, is not.

4. Symbolic reference verification: the final stage of verification happens when the virtual machine converts symbolic references into direct references, which takes place during resolution, the third step of linking. It checks whether information outside the class itself (the various symbolic references in the constant pool) matches. It usually checks:

  • Whether a class can be found for the fully qualified name given as a string in a symbolic reference
  • Whether the specified class contains methods and fields matching the given simple name and descriptor
  • Whether the classes, fields, and methods in the symbolic reference are accessible to the current class (private, protected, public, default)

The purpose of symbolic reference verification is to make sure that resolution can be carried out normally. If a symbolic reference fails verification, a subclass of java.lang.IncompatibleClassChangeError is thrown, such as java.lang.IllegalAccessError, java.lang.NoSuchFieldError, or java.lang.NoSuchMethodError.

The verification phase is important but not strictly necessary. If all the code being run has been used and verified repeatedly, you can pass the virtual machine option -Xverify:none to turn off most verification and shorten class loading time. Note that -Xverify:none has been deprecated since JDK 13.

3. Preparation

The preparation phase allocates memory for the static fields of a class or interface and initializes them to their default (zero) values. No virtual machine bytecode is executed in this phase: explicit initializers for these fields run during initialization, not during preparation. Suppose a class declares:

public static int value = 123;

During preparation, value is set to 0, not 123. The putstatic instruction that assigns 123 is compiled into the class constructor <clinit>() method, so value becomes 123 only during initialization. The zero values of the Java data types are:

Data type    Zero value
int          0
long         0L
short        (short)0
char         '\u0000'
byte         (byte)0
boolean      false
float        0.0f
double       0.0d
reference    null

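A special case: if the field's attribute table contains a ConstantValue attribute, the field is set to the value specified by that attribute instead. For example, suppose value is declared as follows:

public static final int value = 123;

At compile time, javac generates a ConstantValue attribute for value, so the virtual machine sets value to 123 from that attribute rather than by executing <clinit>(). This is why value is commonly said to be 123 already during preparation. Strictly, the JVM specification describes this assignment as part of initialization, performed just before <clinit>() is invoked.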
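
The following sketch (hypothetical names) makes the zero value from preparation visible. INSTANCE is created first in <clinit>(), so its constructor reads value before the value = 123 assignment has run. The sketch also reads the ConstantValue field through reflection, because a plain read of CONSTANT would be replaced by the literal 123 at compile time and prove nothing. The outputs assume HotSpot, where ConstantValue fields are set before <clinit>() begins.

```java
import java.lang.reflect.Field;

public class PreparationDemo {
    static class Counter {
        // Runs first in <clinit>, while value still holds the zero value from preparation.
        static final Counter INSTANCE = new Counter();
        static int value = 123;            // assigned by <clinit>, after INSTANCE was created
        static final int CONSTANT = 123;   // constant expression: javac emits a ConstantValue attribute

        final int seenValue;
        final int seenConstant;

        Counter() {
            seenValue = value;
            seenConstant = readConstantReflectively();
        }

        // Reflection bypasses javac's inlining of CONSTANT, so this reads the field itself.
        private static int readConstantReflectively() {
            try {
                Field f = Counter.class.getDeclaredField("CONSTANT");
                return f.getInt(null);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(Counter.INSTANCE.seenValue);    // 0
        System.out.println(Counter.INSTANCE.seenConstant); // 123
        System.out.println(Counter.value);                 // 123
    }
}
```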

4. Resolution

The resolution phase replaces symbolic references in the constant pool with direct references. Symbolic references appear in the Class file as constants of types such as CONSTANT_Class_info, CONSTANT_Fieldref_info, and CONSTANT_Methodref_info. Here is how the two kinds of reference are defined.

Symbolic references: a symbolic reference describes its target with a set of symbols, which can be literals of any form as long as they locate the target unambiguously. Symbolic references are independent of the virtual machine's memory layout, and the target they refer to need not have been loaded into memory yet. Memory layouts can differ among virtual machine implementations, but the symbolic references they accept must be the same, because the literal form of symbolic references is clearly defined in the Class file format.

Direct references: a pointer directly to the target, a relative offset, or a handle that locates the target indirectly. Direct references depend on the memory layout of the virtual machine implementation, so the same symbolic reference is generally translated into different direct references on different virtual machines. If a direct reference exists, its target must already be in memory.

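The following Java virtual machine instructions use symbolic references into the run-time constant pool, and the symbolic references they use must be resolved before any of them is executed:

anewarray, checkcast, getfield, getstatic, instanceof, invokedynamic, invokeinterface, invokespecial, invokestatic, invokevirtual, ldc, ldc_w, multianewarray, new, putfield, putstatic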

Multiple resolution requests for the same symbolic reference are common. Except for invokedynamic, the virtual machine can cache the result of the first resolution (for example, by recording the direct reference in the run-time constant pool and marking the constant as resolved) and use it directly when the reference is encountered again, avoiding repeated resolution.

This rule does not hold for invokedynamic. Even if a symbolic reference has already been resolved by one invokedynamic instruction, the result is not necessarily valid for other invokedynamic instructions. This follows from the semantics of invokedynamic, which exists to support dynamic languages: resolution cannot take place until the program actually executes the instruction. The other instructions that trigger resolution are “static” and can be resolved as soon as the loading phase is complete, before the code begins to execute.

Class and interface resolution: suppose the code of class D resolves a symbolic reference N, which has not been resolved before, into a direct reference to a class or interface C. The following steps are performed:

  1. If C is not an array type, the virtual machine passes the fully qualified name represented by N to D's defining class loader to load C. Any exception that occurs during loading counts as a failure of class or interface resolution.
  2. If C is an array type whose element type is a reference type, the symbolic reference to the element's class or interface is resolved recursively using these same steps, and the virtual machine then creates the array class.
  3. Finally, access is checked: if D does not have access to C, java.lang.IllegalAccessError is thrown.

Field resolution: to resolve a field symbolic reference that has not been resolved yet, the virtual machine first resolves the CONSTANT_Class_info symbolic reference indexed by the class_index item of the field reference. If any exception occurs while resolving this class or interface reference, field resolution fails. If it succeeds, let C be the class or interface the field belongs to. The VM specification then requires the following steps to search C for the field:

  1. If C itself contains a field whose simple name and field descriptor both match the target, return a direct reference to that field; the search ends.
  2. Otherwise, if C implements interfaces, search each interface and its superinterfaces recursively, from the bottom of the inheritance hierarchy upward. If an interface contains a field whose simple name and field descriptor match the target, return a direct reference to that field; the search ends.
  3. Otherwise, if C is not java.lang.Object, search its superclasses recursively, from the bottom of the inheritance hierarchy upward. If a class contains a field whose simple name and field descriptor match the target, return a direct reference to that field; the search ends.
  4. Otherwise, the lookup fails and java.lang.NoSuchFieldError is thrown.

If a direct reference is returned, access is then checked; if the current class does not have access to the field, java.lang.IllegalAccessError is thrown.
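
The VM's own field resolution is not directly observable from Java, but Class.getField is documented to search in the same order: public fields declared by the class, then its superinterfaces, then its superclass. A sketch using it (class names hypothetical; this is reflection, not the VM's resolution itself):

```java
public class FieldLookupDemo {
    interface Iface { int A = 1; }                     // implicitly public static final
    static class Parent { public static int A = 2; }
    // Inherits A from both Iface and Parent; this compiles because no source code reads Sub.A.
    static class Sub extends Parent implements Iface { }
    static class Own extends Parent implements Iface { public static int A = 3; }

    // Returns the class or interface that declares the public field found, or null.
    static Class<?> declaringClassOf(Class<?> c, String name) {
        try {
            return c.getField(name).getDeclaringClass();
        } catch (NoSuchFieldException e) {
            return null;   // reflective counterpart of NoSuchFieldError
        }
    }

    public static void main(String[] args) {
        System.out.println(declaringClassOf(Own.class, "A").getSimpleName()); // Own: own field first
        System.out.println(declaringClassOf(Sub.class, "A").getSimpleName()); // Iface: interfaces before superclass
        System.out.println(declaringClassOf(Sub.class, "missing"));           // null: lookup failed
    }
}
```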

In practice, implementations may be stricter than the specification: if a field with the same name appears in both a superclass and an interface of C, the compiler may refuse to compile code that references it.

Class method resolution: like field resolution, class method resolution starts by resolving the symbolic reference to the class or interface indexed by the class_index item of the method reference. Again let C be the resolved class; the virtual machine then searches C for the method as follows:

  1. If C is an interface, resolution of the class method reference fails immediately with java.lang.IncompatibleClassChangeError.
  2. Otherwise, look for the method in C and its superclasses. If C declares exactly one method with the name given by the method reference, and that method is signature polymorphic (such as MethodHandle.invokeExact), the lookup succeeds. All classes mentioned in the method descriptor are resolved, and C does not need to declare a method with the descriptor specified by the method reference.
  3. Otherwise, if C declares a method with the same name and descriptor as the method reference, the lookup succeeds.
  4. Otherwise, if C has a superclass, step 2 is applied recursively to C's direct superclass.
  5. Otherwise, recursively search the interfaces implemented by C and their superinterfaces for a method whose simple name and descriptor match the target. Older editions of the specification treat a match here as meaning that C is an abstract class and end the search with java.lang.AbstractMethodError; since Java SE 8, which added default methods, the lookup instead succeeds, and AbstractMethodError is thrown only if an abstract method is actually invoked.
  6. Otherwise, the lookup fails and java.lang.NoSuchMethodError is thrown.

Finally, if the lookup returns a direct reference, the method's access permission is checked; if the current class does not have access to the method, java.lang.IllegalAccessError is thrown.
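
Ordinary Java code shows the same priority of superclass methods over superinterface methods. In this sketch (hypothetical names), Sub inherits hello() both from Base and from the default method in Greeting. The superclass version wins, both for the call and for Class.getMethod. This illustrates the search order only; it is not a view into the VM's internal resolution.

```java
public class MethodLookupDemo {
    interface Greeting {
        default String hello() { return "interface default"; }
    }

    static class Base {
        public String hello() { return "superclass"; }
    }

    // hello() is found in the superclass before any interface is considered.
    static class Sub extends Base implements Greeting { }

    // Nothing in the class hierarchy declares hello(), so the lookup reaches the interface.
    static class OnlyInterface implements Greeting { }

    // Returns the class or interface declaring the public hello() that reflection finds, or null.
    static Class<?> declaringClassOf(Class<?> c) {
        try {
            return c.getMethod("hello").getDeclaringClass();
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Sub().hello());                                     // superclass
        System.out.println(new OnlyInterface().hello());                           // interface default
        System.out.println(declaringClassOf(Sub.class).getSimpleName());           // Base
        System.out.println(declaringClassOf(OnlyInterface.class).getSimpleName()); // Greeting
    }
}
```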

Interface method resolution: interface method resolution also starts by resolving the symbolic reference to the class or interface indexed by the class_index item of the interface method reference. If that succeeds, let C again denote the interface. The virtual machine then searches for the method as follows:

  1. Unlike class method resolution, if C turns out to be a class rather than an interface, java.lang.IncompatibleClassChangeError is thrown immediately.
  2. Otherwise, search interface C for a method whose simple name and descriptor match the target; if one exists, return a direct reference to it.
  3. Otherwise, recursively search the superinterfaces of C, up to and including java.lang.Object, for a method whose simple name and descriptor match the target; if one exists, return a direct reference to it.
  4. Otherwise, the lookup fails and java.lang.NoSuchMethodError is thrown.

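Because interface methods are implicitly public, there is no access problem, so interface method resolution does not throw java.lang.IllegalAccessError. (This holds up to Java 8; Java 9 added private interface methods, and from then on the specification also checks access during interface method resolution.)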

5. Initialization

Initialization is the last step of class loading. In the earlier phases, apart from loading, where the application can take part through a user-defined class loader, the virtual machine is entirely in control. Only in the initialization phase does the Java code written by the user actually start to execute.

In the preparation phase, variables have already been assigned their zero values; in the initialization phase, class variables and other resources are initialized according to the code the programmer wrote. Put another way, the initialization phase is the execution of the class constructor <clinit>() method.

The <clinit>() method is generated by the compiler, which automatically collects the assignments to all class variables in the class and the statements in static blocks, in the order in which they appear in the source file. A static block can only read variables defined before it; a variable defined after it can be assigned in the static block but not read:

public class Test {
  static {
    i = 0;                 // assigning a variable declared later compiles
    System.out.print(i);   // reading it does not: "illegal forward reference"
  }
  static int i = 1;
}

The <clinit>() method differs from the class's instance constructor (the <init>() method) in that it does not need to call the superclass's <clinit>() explicitly: the virtual machine guarantees that the superclass's <clinit>() has finished executing before the subclass's <clinit>() runs. Therefore, the first <clinit>() executed in the virtual machine must belong to java.lang.Object.

Also because of this execution order, the static blocks of a superclass run before the static variable assignments of its subclass, so in the following code snippet B has the value 2:

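public class ClinitOrder {
  static class Parent {
    public static int A = 1;
    static {
      A = 2;   // runs after A = 1, as part of Parent.<clinit>()
    }
  }

  static class Sub extends Parent {
    public static int B = A;   // Parent.<clinit>() has already finished, so A is 2
  }

  public static void main(String[] args) {
    System.out.println(Sub.B);   // prints 2
  }
}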

A class does not have to have a <clinit>() method: the compiler does not generate one for a class that has neither a static block nor any assignment to a static variable.

An interface cannot contain static blocks, but it can still have variable initializers, so the compiler generates <clinit>() for interfaces just as it does for classes. Unlike a class, however, running an interface's <clinit>() does not require the <clinit>() of its parent interfaces to run first: a parent interface is initialized only when a variable defined in that parent interface is actually used. Likewise, initializing a class that implements an interface does not run the interface's <clinit>(). Since Java 8, superinterfaces that declare default methods are the exception: they are initialized when an implementing class is initialized.
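
A sketch to observe these interface rules (hypothetical names). The interface fields use a method call as their initializer so that they are not compile-time constants and therefore end up in the interface's <clinit>(). The WithDefault case reflects a rule added in Java SE 8: a superinterface that declares a default method is initialized when a class implementing it is initialized.

```java
import java.util.ArrayList;
import java.util.List;

public class InterfaceInitDemo {
    // Records, in order, which initializers have run.
    static final List<String> LOG = new ArrayList<>();

    // Used as a field initializer so that the field is not a compile-time constant.
    static int mark(String who) {
        LOG.add(who);
        return who.length();
    }

    interface Plain {
        int X = mark("Plain");          // runs in Plain.<clinit>()
    }

    interface WithDefault {
        int Y = mark("WithDefault");    // runs in WithDefault.<clinit>()
        default String hi() { return "hi"; }
    }

    static class ImplPlain implements Plain {
        static { mark("ImplPlain"); }
    }

    static class ImplWithDefault implements WithDefault {
        static { mark("ImplWithDefault"); }
    }

    static List<String> run() {
        new ImplPlain();         // initializes ImplPlain, but not Plain
        new ImplWithDefault();   // initializes WithDefault first, because it declares a default method
        int x = Plain.X;         // first use of Plain's own field initializes Plain
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [ImplPlain, WithDefault, ImplWithDefault, Plain]
    }
}
```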

The virtual machine ensures that a class's <clinit>() method is correctly locked and synchronized in a multithreaded environment. If several threads try to initialize a class at the same time, only one of them executes <clinit>(), while the others block until it has finished.
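
A small sketch of this guarantee (hypothetical names): eight threads read a static field of Slow at about the same time, while Slow's static initializer sleeps for 100 ms. The initializer body runs exactly once, and every thread sees the fully initialized value, because the other threads block until <clinit>() completes. The sleep only widens the race window; the result does not depend on it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ClinitOnceDemo {
    // Counts how many times Slow's static initializer body has run.
    static final AtomicInteger RUNS = new AtomicInteger();

    static class Slow {
        static int value;
        static {
            RUNS.incrementAndGet();
            try {
                Thread.sleep(100);   // keep <clinit> busy so other threads arrive while it runs
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            value = 42;
        }
    }

    // Starts n threads that all read Slow.value and returns the value each thread saw.
    static List<Integer> race(int n) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> seen.add(Slow.value));   // getstatic triggers initialization
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> seen = race(8);
        System.out.println(RUNS.get());   // 1
        System.out.println(seen);         // [42, 42, 42, 42, 42, 42, 42, 42]
    }
}
```

The same blocking is why a long-running static initializer can stall every thread that touches the class.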

6. Java virtual machine exit

In general, the Java virtual machine exits when some thread calls the exit method of the Runtime or System class, or the halt method of the Runtime class, and the Java security manager permits the exit or halt operation. In addition, the JNI (Java Native Interface) specification describes how the Java virtual machine exits when the JNI Invocation API is used to load and unload a Java virtual machine.