The Java virtual machine loads the data describing a class from the Class file into memory, verifies, transforms, and initializes that data, and finally forms Java types that the virtual machine can use directly. This process is called the virtual machine's class loading mechanism.

First, the timing of class loading

From the moment a type is loaded into virtual machine memory until it is unloaded, its life cycle goes through seven stages: Loading, Verification, Preparation, Resolution, Initialization, Using, and Unloading. Verification, preparation, and resolution are collectively referred to as Linking. Figure 7-1 shows the sequence of the seven stages.

Figure 7-1 Life cycle of a class

The order of the five stages loading, verification, preparation, initialization, and unloading is fixed, and a type's loading process must begin them in that order. The resolution stage is not fixed: in some cases it can begin after the initialization stage, in order to support runtime binding in the Java language (also called dynamic binding or late binding). Note that the stages "start" in this order; they do not necessarily proceed or finish in order, because they are usually interleaved, with one stage often invoking or activating another while it runs.

The Java Virtual Machine Specification specifies that there are exactly six cases in which a class must be initialized immediately (loading, verification, and preparation naturally have to begin before that):

  1. When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered and the type has not been initialized, its initialization phase must be triggered first. Typical Java code that produces these four instructions:
     · Instantiating an object with the new keyword.
     · Reading or setting a static field of a type (except static fields that are final and whose values were placed into the constant pool at compile time).
     · Calling a static method of a type.
  2. When a type is called reflectively through the methods of the java.lang.reflect package, its initialization must be triggered first if it has not already been initialized.
  3. When a class is initialized and its parent class has not been initialized yet, the parent class's initialization must be triggered first.
  4. When the virtual machine starts, the user must specify a main class to execute (the class containing the main() method), and the virtual machine initializes this main class first.
  5. When the dynamic language support added in JDK 7 is used, if the final resolution result of a java.lang.invoke.MethodHandle instance is a method handle of kind REF_getStatic, REF_putStatic, REF_invokeStatic, or REF_newInvokeSpecial, and the class corresponding to that method handle has not been initialized, its initialization must be triggered first.
  6. When an interface defines a default method (an interface method modified with the default keyword, new in JDK 8), and one of its implementation classes is initialized, the interface must be initialized before that class.
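To make cases 1 to 3 concrete, here is a minimal, self-contained sketch. The class names (ActiveReferenceDemo, Parent, Child, Lazy) are hypothetical. Each static initializer appends to a shared list, so the list shows exactly when each <clinit>() ran. The sketch relies on the standard Class.forName(String, boolean, ClassLoader) overload, whose second argument controls whether the class is initialized.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo classes; the log records when each static initializer runs.
public class ActiveReferenceDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent init"); }
    }

    static class Child extends Parent {
        static { LOG.add("Child init"); }
    }

    static class Lazy {
        static { LOG.add("Lazy init"); }
        static void touch() { LOG.add("Lazy.touch"); }
    }

    static List<String> run() {
        try {
            // Reflection with initialize = false loads Lazy but does not initialize it.
            // (Lazy.class is a class literal, which does not initialize Lazy either.)
            Class.forName(Lazy.class.getName(), false, ActiveReferenceDemo.class.getClassLoader());
            LOG.add("Lazy loaded");
            // Case 1: invokestatic triggers initialization of Lazy.
            Lazy.touch();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        // Cases 1 and 3: new Child() initializes Parent first, then Child.
        new Child();
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
        // prints [Lazy loaded, Lazy init, Lazy.touch, Parent init, Child init]
    }
}
```

Passing true as the second argument (or calling the one-argument Class.forName) would initialize Lazy at that point instead, which is case 2.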

The Java Virtual Machine Specification applies a very strong qualifier, "exactly and only", to these six scenarios that trigger the initialization of a type; the behavior in these six scenarios is called an active reference to a type. All other ways of referencing a type do not trigger initialization and are called passive references. Here are three examples of passive references:

1. Referencing a static field of a parent class through a subclass does not cause the subclass to be initialized

// SuperClass.java
package org.fenixsoft.classloading;

/**
 * Passive use of class fields, demo 1:
 * Referencing a static field of a parent class through a subclass does not initialize the subclass.
 */
public class SuperClass {

    static {
        System.out.println("SuperClass init!");
    }

    public static int value = 123;
}

// SubClass.java (same package)
public class SubClass extends SuperClass {

    static {
        System.out.println("SubClass init!");
    }
}

// NotInitialization.java (same package)
/**
 * Demonstrates non-active use of class fields.
 */
public class NotInitialization {

    public static void main(String[] args) {
        System.out.println(SubClass.value);
    }
}

When the above code runs, it prints only "SuperClass init!" and not "SubClass init!". For static fields, only the class that directly defines the field is initialized, so referencing a static field defined in a parent class through its subclass triggers initialization of the parent class but not of the subclass.

2. Referencing a class through an array definition does not trigger initialization of the class

package org.fenixsoft.classloading;

/**
 * Passive use of class fields, demo 2:
 * Referencing a class through an array definition does not trigger initialization of the class.
 */
public class NotInitialization {

    public static void main(String[] args) {
        SuperClass[] sca = new SuperClass[10];
    }
}

Running this code prints nothing, so org.fenixsoft.classloading.SuperClass is not initialized. The code does, however, trigger the initialization phase of another class named "[Lorg.fenixsoft.classloading.SuperClass". This is not a valid type name for user code; it is a class generated automatically by the virtual machine that directly extends java.lang.Object, and its creation is triggered by the bytecode instruction anewarray.

This class represents a one-dimensional array whose element type is org.fenixsoft.classloading.SuperClass. The array's attributes and methods (the only ones users can use directly are the public length attribute and the clone() method) are implemented in this class.
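The automatically generated array class can be inspected from Java code. The sketch below (ArrayClassDemo and Element are hypothetical names) creates an array of a class whose static initializer logs to a list, then checks that the log stays empty and that the array class extends java.lang.Object directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: creating an array does not initialize its element class.
public class ArrayClassDemo {
    // Records static initializers that actually ran.
    static final List<String> LOG = new ArrayList<>();

    static class Element {
        static { LOG.add("Element init"); }
    }

    // new Element[10] compiles to anewarray; it creates the array class, not an Element.
    static Element[] make() {
        return new Element[10];
    }

    public static void main(String[] args) {
        Element[] arr = make();
        Class<?> arrayClass = arr.getClass();
        System.out.println(LOG);                                   // []
        System.out.println(arrayClass.getName());                  // [LArrayClassDemo$Element;
        System.out.println(arrayClass.getSuperclass().getName());  // java.lang.Object
        System.out.println(arr.clone().length);                    // 10 (clone() is public on arrays)
    }
}
```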

3. Constants are stored in the constant pool of the calling class at compile time. In essence, the caller holds no direct reference to the class that defines the constant, so using the constant does not trigger initialization of that class

// ConstClass.java
package org.fenixsoft.classloading;

/**
 * Passive use of class fields, demo 3:
 * Constants are stored in the constant pool of the calling class at compile time. There is no direct
 * reference to the class that defines the constant, so it does not trigger that class's initialization.
 */
public class ConstClass {

    static {
        System.out.println("ConstClass init!");
    }

    public static final String HELLOWORLD = "hello world";
}

// NotInitialization.java (same package)
/**
 * Demonstrates non-active use of class fields.
 */
public class NotInitialization {

    public static void main(String[] args) {
        System.out.println(ConstClass.HELLOWORLD);
    }
}

Although the Java source code does reference the constant HELLOWORLD of ConstClass, constant propagation during compilation stores the value "hello world" directly in the constant pool of NotInitialization. Subsequent references from NotInitialization to ConstClass.HELLOWORLD are therefore converted into references from NotInitialization to its own constant pool. In other words, the compiled NotInitialization Class file contains no symbolic reference to ConstClass at all.
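The sketch below contrasts a compile-time constant with a static final field whose value is computed at runtime; only the latter makes the caller emit a real getstatic against the defining class. The names are hypothetical, and String.valueOf is used only to keep the second initializer from being a constant expression.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: constant inlining versus a real static field read.
public class ConstantInliningDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Holder {
        static { LOG.add("Holder init"); }
        // Compile-time constant: javac copies the value into the caller's constant pool.
        public static final String CONSTANT = "hello world";
        // Not a compile-time constant: reading it is a getstatic on Holder.
        public static final String COMPUTED = String.valueOf("hello world".length());
    }

    static String readConstant() { return Holder.CONSTANT; }
    static String readComputed() { return Holder.COMPUTED; }

    public static void main(String[] args) {
        System.out.println(readConstant() + " " + LOG);   // hello world []
        System.out.println(readComputed() + " " + LOG);   // 11 [Holder init]
    }
}
```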

Interface initialization differs from class initialization in one respect, the third of the six "exactly and only" initialization scenarios described earlier: when a class is initialized, all of its parent classes must already have been initialized, but when an interface is initialized, its parent interfaces need not be. A parent interface is initialized only when it is actually used (for example, when a field defined in it, other than a compile-time constant, is referenced).
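A minimal sketch of these rules, including case 6 from the list above, follows. All names are hypothetical; the helper record(...) exists only to give interface fields initializers with an observable side effect (an initializer that is a compile-time constant would never run any code). The expected order reflects the initialization rules of The Java Virtual Machine Specification as implemented by current JDKs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of interface initialization rules.
public class InterfaceInitDemo {
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical helper: logs an event and returns a value for a field initializer.
    static int record(String event, int value) {
        LOG.add(event);
        return value;
    }

    interface Parent {
        int P = record("Parent init", 1);
    }

    interface Child extends Parent {
        int C = record("Child init", 2);
    }

    interface WithDefault {
        int W = record("WithDefault init", 3);
        default void hello() { }
    }

    static class PlainImpl implements Parent {
        static { LOG.add("PlainImpl init"); }
    }

    static class DefaultImpl implements WithDefault {
        static { LOG.add("DefaultImpl init"); }
    }

    static List<String> run() {
        int c = Child.C;     // initializes Child only; its parent interface Parent stays uninitialized
        new PlainImpl();     // Parent declares no default method, so it is not initialized here
        new DefaultImpl();   // WithDefault declares a default method, so it is initialized first
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
        // prints [Child init, PlainImpl init, WithDefault init, DefaultImpl init]
    }
}
```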

Second, the process of class loading

1. Loading

During the loading phase, the Java virtual machine needs to do three things:

  1. Obtain the binary byte stream that defines a class from its fully qualified name.
  2. Convert the static storage structure represented by this byte stream into the runtime data structures of the method area.
  3. Generate a java.lang.Class object in memory that represents the class and serves as the access point for the class's various data in the method area.

The situation is different for array classes: an array class itself is not created by a class loader but is constructed directly in memory by the Java virtual machine. Array classes are still closely related to class loaders, though, because the element type of an array (the type left after removing all dimensions) is ultimately loaded by a class loader. An array class (called C below) is created according to these rules:

  • If the component type of the array is a reference type, the loading process defined in this section is applied recursively to load that component type, and array C is identified in the class namespace of the class loader that loaded the component type (this is important: a type's uniqueness is established together with its class loader).
  • If the component type of the array is not a reference type (for example, the component type of int[] is int), the Java virtual machine marks array C as associated with the bootstrap class loader.
  • The accessibility of an array class is the same as that of its component type. If the component type is not a reference type, the array class's accessibility defaults to public, so it can be accessed by all classes and interfaces.
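These rules can be observed through Class.getClassLoader() and Class.getModifiers(), which are documented to report the element type's class loader and the component type's access modifiers for array classes. The class names below are hypothetical.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo: which class loader an array class is associated with.
public class ArrayLoaderDemo {
    static class Element { }   // package-private, so not public

    public static void main(String[] args) {
        // Primitive component type: associated with the bootstrap loader, reported as null.
        System.out.println(int[].class.getClassLoader());                                        // null
        // Component type loaded by the bootstrap loader: also null.
        System.out.println(String[][].class.getClassLoader());                                   // null
        // User-defined component type: same loader as Element itself.
        System.out.println(Element[].class.getClassLoader() == Element.class.getClassLoader());  // true
        // Accessibility follows the component type; arrays of primitives are public.
        System.out.println(Modifier.isPublic(int[].class.getModifiers()));                       // true
        System.out.println(Modifier.isPublic(Element[].class.getModifiers()));                   // false
    }
}
```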

2. Verification

Verification is the first step of the linking phase. Its purpose is to ensure that the information contained in the byte stream of a Class file complies with all the constraints of The Java Virtual Machine Specification, and that running this information as code will not compromise the security of the virtual machine itself.

The verification phase roughly consists of four stages of checks: file format verification, metadata verification, bytecode verification, and symbolic reference verification.

2.1 File format verification

The first step is to verify that the byte stream complies with the Class file format specification and can be processed by the current version of the virtual machine. This phase may include the following verification points:

  • Does the stream start with the magic number 0xCAFEBABE?
  • Are the major and minor version numbers within the range accepted by the current Java virtual machine?
  • Are there constant types in the constant pool that are not supported (checked via each constant's tag flag)?
  • Does any index value point to a constant that does not exist or is of the wrong type?
  • Does any CONSTANT_Utf8_info constant contain data that is not valid UTF-8?
  • Has any information been deleted from or appended to the parts of the Class file or the file itself?
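The first two checks in the list can be sketched in a few lines. checkHeader is a hypothetical helper, not the virtual machine's actual verifier. The upper bound on the major version is derived from the running JDK under the assumption that major version = feature release + 44 (52 for Java 8, 61 for Java 17); Runtime.version().feature() requires JDK 10 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch of the magic-number and version checks; not the real verifier.
public class ClassFileHeaderCheck {
    private static final int MAGIC = 0xCAFEBABE;

    /** Returns null if the header looks acceptable, otherwise a reason string. */
    static String checkHeader(byte[] bytes, int maxMajor) {
        if (bytes.length < 8) {
            return "truncated header";
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);   // big-endian by default, like the Class file format
        if (buf.getInt() != MAGIC) {
            return "bad magic number";
        }
        int minor = buf.getShort() & 0xFFFF;       // u2 values are unsigned
        int major = buf.getShort() & 0xFFFF;
        if (major > maxMajor) {
            return "unsupported version " + major + "." + minor;
        }
        return null;
    }

    // Reads this class's own Class file from the class path.
    static byte[] ownBytes() throws IOException {
        try (InputStream in = ClassFileHeaderCheck.class.getResourceAsStream("ClassFileHeaderCheck.class")) {
            if (in == null) {
                throw new IOException("class file not found on the class path");
            }
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: major version = feature release + 44.
        int maxMajor = Runtime.version().feature() + 44;
        byte[] bytes = ownBytes();
        System.out.println(checkHeader(bytes, maxMajor));   // null
        bytes[0] = 0;                                       // corrupt the magic number
        System.out.println(checkHeader(bytes, maxMajor));   // bad magic number
    }
}
```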

The main purpose of this stage is to ensure that the input byte stream can be parsed correctly and stored in the method area in a format that describes a Java type. This stage operates on the binary byte stream, and only after the byte stream passes it is the stream allowed into the method area of the Java virtual machine's memory. The three later verification stages therefore all operate on the storage structures in the method area and no longer read or manipulate the byte stream directly.

2.2 Metadata verification

The second stage performs semantic analysis of the information described by the bytecode to ensure that it conforms to the requirements of the Java Language Specification. This stage may include the following verification points:

  • Does the class have a parent class (every class except java.lang.Object should have one)?
  • Does the class inherit from a class that may not be inherited (a class modified by final)?
  • If the class is not abstract, does it implement all the methods required by its parent class or interfaces?
  • Do the fields or methods of the class conflict with those of the parent class (for example, overriding a final field of the parent class, or overloading that breaks the rules, such as methods with identical parameters but different return types)?

2.3 Bytecode verification

After the second stage has verified the data types in the metadata, this stage verifies and analyzes the method bodies of the class (the Code attributes in the Class file) to ensure that the methods of the class being verified will not endanger the security of the virtual machine at runtime, for example:

  • Ensure that the data types on the operand stack and the instruction sequence work together correctly at all times. For example, there must be no case of "placing an int on the operand stack and then loading it into the local variable table as a long".
  • Ensure that no jump instruction jumps to a bytecode instruction outside the method body.
  • Ensure that type conversions in the method body are always valid. For example, assigning a subclass object to a variable of its parent class type is safe, but assigning a parent class object to a variable of a subclass type, or assigning an object to a completely unrelated type with no inheritance relationship, is dangerous and illegal.

Because data-flow and control-flow analysis are highly complex, after JDK 6 the Java virtual machine design team introduced a joint optimization between the javac compiler and the Java virtual machine that moves as much of the verification work as possible into javac, so that the bytecode verification phase does not consume too much execution time. To do this, a new attribute named "StackMapTable" was added to the attribute table of the Code attribute of the method body. This attribute records the state of the local variable table and the operand stack at the start of every basic block of the method body. During bytecode verification, the Java virtual machine therefore does not need to infer the legality of these states by program analysis; it only needs to check whether the records in the StackMapTable attribute are valid. This turns the type inference of bytecode verification into type checking and saves a lot of verification time. In theory, the StackMapTable attribute can also be wrong or tampered with, so whether someone could maliciously tamper with the Code attribute and generate a matching StackMapTable attribute to deceive the virtual machine's type verification is a question that virtual machine designers must consider carefully.

2.4 Symbolic reference verification

The verification in the last stage occurs when the virtual machine converts symbolic references into direct references, a conversion that happens during resolution, the third stage of linking. Symbolic reference verification can be regarded as a matching check of all kinds of information outside the class itself (the various symbolic references in the constant pool). In plain terms, it checks whether the class is missing, or is denied access to, any external classes, methods, fields, and other resources it depends on. This stage usually verifies the following:

  • Can a class be found for the fully qualified name described by a string in the symbolic reference?
  • Does the specified class contain the methods and fields described by the given field descriptors and simple names?
  • Are the classes, fields, and methods in the symbolic reference accessible (private, protected, public, <package>) to the current class?

The main purpose of symbolic reference verification is to ensure that resolution can be carried out normally.

The verification phase is very important for the virtual machine's class loading mechanism, but it is not mandatory, because it only ever passes or fails: once the code has passed, verification has no further effect on the running program. If all the code a program runs (including code you wrote yourself, third-party packages, code loaded from external sources, and dynamically generated code) has already been used and verified repeatedly, you can consider using the -Xverify:none parameter in production to turn off most class verification measures and shorten class loading time (note that this option has been deprecated since JDK 13).

3. Preparation

The preparation phase formally allocates memory for the variables defined in a class (that is, static variables, those modified by static) and sets their initial values. Conceptually, the memory these variables use should be allocated in the method area. HotSpot implemented the method area with the permanent generation in JDK 7 and earlier; in JDK 8 and later, class variables are stored in the Java heap together with the Class object.

The initial value here is "normally" the zero value of the data type. Suppose a class variable is defined as:

public static int value = 123;

After the preparation phase, the initial value of value is 0, not 123, because no Java method has started executing yet. The putstatic instruction that assigns 123 to value is placed in the class constructor <clinit>() method when the program is compiled, so the assignment of 123 is not performed until the class's initialization phase.

If the field attribute table of a class field contains a ConstantValue attribute, the variable is initialized in the preparation phase to the value specified by that ConstantValue attribute. Suppose the definition of the class variable value above is changed to:

public static final int value = 123;

At compile time, javac generates a ConstantValue attribute for value, and in the preparation phase the virtual machine sets value to 123 according to that ConstantValue attribute.
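Both effects can be observed while a class is still being initialized. In the sketch below (hypothetical names), the first static initializer of Sample reads three fields reflectively before their own initializers have run. Reflection is used so that javac cannot inline CONSTANT at the read site; with a direct reference the constant would be copied into the caller regardless of what the field holds.

```java
import java.lang.reflect.Field;

// Hypothetical demo: field values left by the preparation phase, observed during <clinit>.
public class PreparationDemo {

    static class Sample {
        // Runs first in <clinit>, before any of the assignments below have executed.
        static final int[] SEEN = snapshot();

        static int plain = 123;                               // zero until <clinit> assigns it
        static final int CONSTANT = 123;                      // ConstantValue: set during preparation
        static final int COMPUTED = Integer.parseInt("123");  // not a constant: zero until <clinit>

        private static int[] snapshot() {
            return new int[] { read("plain"), read("CONSTANT"), read("COMPUTED") };
        }

        // Reads a field reflectively so that javac cannot inline CONSTANT.
        private static int read(String name) {
            try {
                Field f = Sample.class.getDeclaredField(name);
                return f.getInt(null);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        int[] seen = Sample.SEEN;
        System.out.println(seen[0] + " " + seen[1] + " " + seen[2]);  // 0 123 0
        System.out.println(Sample.plain + " " + Sample.COMPUTED);      // 123 123
    }
}
```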

4. Resolution

The resolution phase is the process by which the Java virtual machine replaces symbolic references in the constant pool with direct references.

How are symbolic references and direct references related in the resolution phase?

Symbolic References

A symbolic reference describes the referenced target with a set of symbols, which can be literals of any form as long as they locate the target unambiguously. Symbolic references are independent of the memory layout of the virtual machine implementation, and the target of a reference has not necessarily been loaded into the virtual machine's memory yet. The memory layouts of different virtual machine implementations can vary, but the symbolic references they accept must be the same, because the literal form of symbolic references is explicitly defined in the Class file format of The Java Virtual Machine Specification.

Direct References

A direct reference is a pointer that points directly to the target, a relative offset, or a handle that can locate the target indirectly. A direct reference is tied to the memory layout of the virtual machine implementation, and the direct references translated from the same symbolic reference on different virtual machine instances will generally not be the same. If a direct reference exists, its target must already exist in the virtual machine's memory.

Resolution is performed mainly on seven kinds of symbolic references: classes or interfaces, fields, class methods, interface methods, method types, method handles, and call site specifiers. They correspond to eight constant types in the constant pool: CONSTANT_Class_info, CONSTANT_Fieldref_info, CONSTANT_Methodref_info, CONSTANT_InterfaceMethodref_info, CONSTANT_MethodType_info, CONSTANT_MethodHandle_info, CONSTANT_Dynamic_info, and CONSTANT_InvokeDynamic_info.

4.1 Class or interface resolution

Suppose the code currently executing belongs to class D. To resolve an unresolved symbolic reference N into a direct reference to a class or interface C, the virtual machine completes the following three steps:

  1. If C is not an array type, the virtual machine passes the fully qualified name represented by N to D's class loader to load class C. During loading, the requirements of metadata verification and bytecode verification may trigger the loading of other related classes, such as the parent class of this class or the interfaces it implements. If any exception occurs during loading, resolution fails.
  2. If C is an array type and the element type of the array is an object, the descriptor of N has a form such as "[Ljava/lang/Integer", and the array element type is loaded according to the rules in step 1. With the descriptor assumed here, the element type to load is "java.lang.Integer", and the virtual machine then generates an array object representing the dimensions and elements of this array.
  3. If nothing goes wrong in the previous two steps, C has in fact become a valid class or interface in the virtual machine, but before resolution completes, symbolic reference verification must still confirm that D has access to C. If it does not, a java.lang.IllegalAccessError is thrown.

4.2 Field parsing

To resolve an unresolved symbolic reference to a field, the CONSTANT_Class_info symbolic reference indexed by the class_index item in the field table is resolved first; that is, the symbolic reference to the class or interface the field belongs to.

4.3 Method resolution

The first step of method resolution is the same as for field resolution: the symbolic reference to the class or interface the method belongs to, indexed by the class_index item of the method table, must be resolved first. If that succeeds, C again denotes this class, and the virtual machine searches for the method as follows:

  1. Because the Class file format defines separate constant types for symbolic references to class methods and to interface methods, if class_index in the class method table turns out to index an interface C, a java.lang.IncompatibleClassChangeError is thrown immediately.
  2. If the first step passes, look in class C for a method whose simple name and descriptor both match the target. If one exists, return a direct reference to it, and the search ends.
  3. Otherwise, search recursively in the parent classes of C for a method whose simple name and descriptor both match the target. If one exists, return a direct reference to it, and the search ends.
  4. Otherwise, search recursively in the list of interfaces implemented by C and in their parent interfaces for a method whose simple name and descriptor both match the target. If a match exists, C is an abstract class; the search ends and a java.lang.AbstractMethodError is thrown.
  5. Otherwise, the method lookup fails and a java.lang.NoSuchMethodError is thrown.

Finally, if the lookup successfully returns a direct reference, the method's access permissions are verified; if the current class turns out not to have access to the method, a java.lang.IllegalAccessError is thrown.

4.4 Interface method resolution

Interface method resolution likewise first resolves the symbolic reference to the class or interface the method belongs to, indexed by the class_index item of the interface method table. If resolution succeeds, C again denotes the interface, and the virtual machine searches for the interface method as follows:

  1. In contrast to class method resolution, if class_index in the interface method table turns out to index a class C rather than an interface, a java.lang.IncompatibleClassChangeError is thrown immediately.
  2. Otherwise, look in interface C for a method whose simple name and descriptor both match the target. If one exists, return a direct reference to it, and the search ends.
  3. Otherwise, search recursively in the parent interfaces of C, up to and including the java.lang.Object class (the search scope for interface methods also covers the methods of Object), for a method whose simple name and descriptor both match the target. If one exists, return a direct reference to it, and the search ends.
  4. Regarding rule 3: because Java interfaces allow multiple inheritance, several different parent interfaces of C may contain methods whose simple name and descriptor match the target. In that case one of these methods is returned and the search ends; The Java Virtual Machine Specification does not further specify which interface method should be returned. As with field lookup, however, javac compilers from different vendors may apply stricter constraints and refuse to compile such code to avoid the ambiguity.
  5. Otherwise, the method lookup fails and a java.lang.NoSuchMethodError is thrown.

5. Initialization

Only in the initialization phase does the Java virtual machine actually begin executing the Java program code written in the class, handing control over to the application. Put more directly, the initialization phase is the process of executing the class constructor <clinit>() method. <clinit>() is not a method that programmers write directly in Java code; it is generated automatically by the javac compiler.

  • The <clinit>() method is produced by the compiler automatically collecting the assignment actions of all class variables and the statements in the static blocks (static {} blocks) of a class and merging them. The order in which the compiler collects them is determined by the order of the statements in the source file. A static block can access only the variables defined before it; a variable defined after it can be assigned in the preceding static block but not accessed, as shown in Listing 7-5.

Listing 7-5 Illegal forward reference to a variable

public class Test {
    static {
        i = 0;                // Assigning a variable defined later compiles normally
        System.out.print(i);  // The compiler reports "illegal forward reference"
    }
    static int i = 1;
}
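The collection order also decides which assignment wins when a static block and a field initializer both write the same variable. In this hypothetical example, the two nested classes differ only in the textual order of their static block and field declaration.

```java
// Hypothetical demo: <clinit>() runs assignments and static blocks in source order.
public class ClinitOrderDemo {

    static class AssignFirst {
        static {
            x = 2;          // assigning a later-declared variable is legal
        }
        static int x = 1;   // runs after the block above, so x ends up 1
    }

    static class DeclareFirst {
        static int x = 1;
        static {
            x = 2;          // runs last, so x ends up 2
        }
    }

    public static void main(String[] args) {
        System.out.println(AssignFirst.x + " " + DeclareFirst.x);   // 1 2
    }
}
```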
  • Unlike the class's constructors (that is, the instance constructor <init>() method from the virtual machine's point of view), the <clinit>() method does not need to call the parent class's <clinit>() explicitly; the Java virtual machine guarantees that the parent class's <clinit>() method has finished before the subclass's <clinit>() method runs. Therefore the first <clinit>() method executed in the Java virtual machine is always that of java.lang.Object.

  • The <clinit>() method is not mandatory for a class or interface. If a class has no static block and no assignments to class variables, the compiler may generate no <clinit>() method for it.

  • Static blocks cannot be used in interfaces, but interfaces can still contain assignments that initialize variables, so interfaces generate <clinit>() methods just as classes do. Unlike a class, however, executing an interface's <clinit>() method does not require executing its parent interface's <clinit>() method first, because a parent interface is initialized only when a variable it defines is used. Likewise, the interface's <clinit>() method is not executed when an implementation class of the interface is initialized (unless the interface declares default methods, as case 6 above describes).

  • The Java virtual machine must ensure that a class's <clinit>() method is correctly locked and synchronized in a multithreaded environment. If several threads initialize a class at the same time, only one of them executes the class's <clinit>() method, and all the other threads block and wait until that thread has finished executing <clinit>(). A long-running operation in a class's <clinit>() method can therefore block many threads, and such blocking is often very hard to spot in practice. Listing 7-7 illustrates this scenario.

Listing 7-7 Threads blocked by a long-running <clinit>()

public class DeadLoopTest {

    static class DeadLoopClass {
        static {
            // Without this if statement, the compiler reports
            // "Initializer does not complete normally" and refuses to compile
            if (true) {
                System.out.println(Thread.currentThread() + " init DeadLoopClass");
                while (true) {
                }
            }
        }
    }

    public static void main(String[] args) {
        Runnable script = () -> {
            System.out.println(Thread.currentThread() + " start");
            DeadLoopClass dlc = new DeadLoopClass();
            System.out.println(Thread.currentThread() + " run over");
        };
        Thread thread1 = new Thread(script);
        Thread thread2 = new Thread(script);
        thread1.start();
        thread2.start();
    }
}
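The infinite loop above never lets the program finish, so here is a bounded variant, a sketch with hypothetical names that replaces the loop with a 200 ms sleep (an arbitrary choice). It counts how often the static initializer runs and records when each thread got past its first access to Slow; both threads should finish only after <clinit>() completed, and the initializer should run exactly once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: concurrent first use of a class blocks until <clinit>() finishes.
public class ClinitLockDemo {
    // Counts how many times Slow's static initializer runs.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    static class Slow {
        static final long READY_AT;   // time at which <clinit> finished
        static {
            INIT_COUNT.incrementAndGet();
            try {
                Thread.sleep(200);    // simulate a long-running static initializer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            READY_AT = System.nanoTime();
        }
    }

    /** Starts two threads that touch Slow at the same moment; returns when each got past it. */
    static long[] race() {
        long[] doneAt = new long[2];
        CountDownLatch go = new CountDownLatch(1);
        Thread[] threads = new Thread[2];
        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                try {
                    go.await();
                } catch (InterruptedException e) {
                    return;
                }
                long ready = Slow.READY_AT;   // the losing thread blocks here until <clinit> ends
                doneAt[id] = System.nanoTime();
            });
            threads[i].start();
        }
        go.countDown();
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return doneAt;
    }

    public static void main(String[] args) {
        long[] doneAt = race();
        System.out.println("initializer ran " + INIT_COUNT.get() + " time(s)");           // 1 time(s)
        System.out.println(doneAt[0] >= Slow.READY_AT && doneAt[1] >= Slow.READY_AT);     // true
    }
}
```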

Third, class loaders

The Java Virtual Machine design team deliberately implemented the class loading action "obtain the binary byte stream that describes a class from its fully qualified name" outside the Java virtual machine, so that applications can decide for themselves how to obtain the classes they need. The code that performs this action is called a class loader (Class Loader).

1. Classes and class loaders

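For any class, its uniqueness within the Java virtual machine must be established jointly by the class loader that loads it and the class itself; each class loader has its own independent class namespace. In other words, comparing two classes for equality is meaningful only if both were loaded by the same class loader: even if two classes come from the same Class file and are loaded by the same virtual machine, they are different classes as long as their class loaders differ.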
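The following sketch demonstrates that rule. SelfFirstLoader is a hypothetical class loader that defines its own copy of Sample from the same class file bytes instead of delegating to its parent. It assumes the class files are reachable as resources through the application class loader, as they are when the program is compiled with javac and run from the class path.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo: the same class file loaded by two loaders yields two distinct classes.
public class LoaderIdentityDemo {

    public static class Sample { }

    /** Hypothetical loader that defines Sample itself instead of delegating to its parent. */
    static class SelfFirstLoader extends ClassLoader {
        SelfFirstLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(Sample.class.getName())) {
                return super.loadClass(name, resolve);   // everything else: normal delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    String resource = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(resource)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    static Class<?> loadCopy() {
        ClassLoader loader = new SelfFirstLoader(LoaderIdentityDemo.class.getClassLoader());
        try {
            return loader.loadClass(Sample.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> copy = loadCopy();
        Object obj = copy.getDeclaredConstructor().newInstance();
        System.out.println(copy.getName().equals(Sample.class.getName()));  // true: same name, same bytes
        System.out.println(copy == Sample.class);                           // false: different loaders
        System.out.println(obj instanceof Sample);                           // false
    }
}
```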

2. Parental delegation model

From the perspective of the Java virtual machine, there are only two kinds of class loaders: one is the Bootstrap ClassLoader, which is implemented in C++ and is part of the virtual machine itself; the other is all other class loaders, which are implemented in Java, exist independently of the virtual machine, and all inherit from the abstract class java.lang.ClassLoader.

For Java applications of this period (JDK 1.2 through JDK 8), the vast majority of programs load classes with the following three system-provided class loaders.

Bootstrap Class Loader

As described earlier, this class loader is responsible for loading into virtual machine memory the class libraries stored in the <JAVA_HOME>\lib directory, or in the path specified by the -Xbootclasspath parameter, that the Java virtual machine recognizes (by file name, such as rt.jar and tools.jar; libraries whose names are not recognized are not loaded even if placed in the lib directory). The bootstrap class loader cannot be referenced directly by Java programs. When writing a custom class loader, if you need to delegate a load request to the bootstrap class loader, simply use null in its place. Listing 7-9 is a code snippet of the java.lang.Class.getClassLoader() method; both its comments and its implementation make clear that a null value represents the bootstrap class loader.

Extension Class Loader

This class loader is implemented in Java code in the class sun.misc.Launcher$ExtClassLoader. It is responsible for loading all class libraries in the <JAVA_HOME>\lib\ext directory, or in the paths specified by the java.ext.dirs system variable. As the name "extension class loader" suggests, this is a mechanism for extending the Java system class libraries: the JDK development team allows users to place general-purpose class libraries in the ext directory to extend Java SE functionality.

Application Class Loader

This class loader is implemented by sun.misc.Launcher$AppClassLoader. Because the application class loader is the return value of the getSystemClassLoader() method of the ClassLoader class, it is also called the "system class loader" in some contexts. It is responsible for loading all class libraries on the user class path (ClassPath), and developers can also use it directly in their code. If an application has not defined its own class loader, this is generally the program's default class loader.
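The hierarchy can be printed by walking ClassLoader.getParent() upward from a class's defining loader, with null standing for the bootstrap class loader. The loader class names printed depend on the JDK: on JDK 8 they are typically sun.misc.Launcher$AppClassLoader and sun.misc.Launcher$ExtClassLoader, while JDK 9 and later report jdk.internal.loader.ClassLoaders$AppClassLoader and jdk.internal.loader.ClassLoaders$PlatformClassLoader, because the extension class loader was replaced by the platform class loader. LoaderChainDemo is a hypothetical name.

```java
// Hypothetical demo: print a class's defining loader and all of its parents.
public class LoaderChainDemo {

    /** Describes the loader chain of a type; null (the bootstrap loader) ends every chain. */
    static String chain(Class<?> type) {
        StringBuilder sb = new StringBuilder(type.getName());
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            sb.append(" -> ").append(loader.getClass().getName());
            loader = loader.getParent();
        }
        return sb.append(" -> bootstrap (null)").toString();
    }

    public static void main(String[] args) {
        System.out.println(chain(String.class));           // java.lang.String -> bootstrap (null)
        // The exact loader class names depend on the JDK version (see the note above).
        System.out.println(chain(LoaderChainDemo.class));
    }
}
```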

Figure 7-2 Class loader parent delegate model

One obvious benefit of organizing class loaders with the parental delegation model is that Java classes acquire, along with their class loaders, a hierarchy of priority. For example, the class java.lang.Object, stored in rt.jar, is ultimately delegated to the bootstrap class loader at the top of the model no matter which class loader is asked to load it, so Object is guaranteed to be the same class in every class loader environment of the program.

Parental delegation neatly solves the problem of keeping base types consistent when class loaders cooperate (the more basic a class, the higher the loader that loads it).
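Under this model, a class loader that receives a load request does not try to load the class itself first; it delegates the request to its parent, so every request eventually reaches the bootstrap class loader, and a child attempts the load itself only when its parent reports that it cannot find the class. The sketch below expresses that order in a hypothetical ParentFirstLoader, modeled on the structure of java.lang.ClassLoader.loadClass but simplified (it assumes a non-null parent).

```java
// Hypothetical sketch of parent-first delegation, simplified from ClassLoader.loadClass.
public class DelegationSketch {

    static class ParentFirstLoader extends ClassLoader {
        ParentFirstLoader(ClassLoader parent) {
            super(parent);   // assumption for this sketch: parent is non-null
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                // 1) Has this loader already loaded the class?
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    try {
                        // 2) Delegate upward first.
                        c = getParent().loadClass(name);
                    } catch (ClassNotFoundException e) {
                        // The parent could not find it; fall through to this loader.
                    }
                }
                if (c == null) {
                    // 3) Only now try this loader itself (the default findClass always fails).
                    c = findClass(name);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    static Class<?> load(String name) {
        try {
            return new ParentFirstLoader(DelegationSketch.class.getClassLoader()).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load("java.lang.Object") == Object.class);                           // true
        System.out.println(load(DelegationSketch.class.getName()) == DelegationSketch.class);   // true
    }
}
```

Because every request reaches the parent first, both lookups return the very same Class objects the application already uses, which is the consistency property described above.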

3. Breaking the parental delegation model

In OSGi, each program module (called a Bundle) has its own class loader. When a Bundle needs to be replaced, the Bundle is replaced together with its class loader, which achieves hot replacement of code.

In an OSGi environment, class loaders no longer follow the tree structure recommended by the parental delegation model but evolve into a more complex network structure. When it receives a class loading request, OSGi searches for the class in the following order:

  1. Delegate classes whose names start with java.* to the parent class loader.
  2. Otherwise, delegate classes in the delegation list to the parent class loader.
  3. Otherwise, delegate classes in the Import list to the class loader of the Bundle that exports them.
  4. Otherwise, search the current Bundle's ClassPath and load the class with the Bundle's own class loader.
  5. Otherwise, check whether the class is in the Bundle's own Fragment Bundle; if so, delegate loading to the class loader of that Fragment Bundle.
  6. Otherwise, find the Bundle in the Dynamic Import list and delegate loading to that Bundle's class loader.
  7. Otherwise, the class lookup fails.

Only the first two steps of this lookup order still follow the principles of the parental delegation model; the rest of the lookup takes place among peer class loaders.

If you truly understand how OSGi is implemented, you have grasped the essence of class loaders.