Compiling code to bytecode rather than to native machine code is a small step in the evolution of storage formats, but a giant leap in the evolution of programming languages.

Overview

The virtual machine loads the data describing a class from a Class file into memory, verifies it, converts and parses it, and initializes it, finally forming Java types that the virtual machine can use directly. This is the class loading mechanism of the virtual machine.

The timing of class loading

The full life cycle of a class, from being loaded into virtual machine memory to being unloaded, includes seven stages: loading, verification, preparation, parsing (resolution), initialization, use, and unloading. Verification, preparation, and parsing are collectively referred to as connection (linking). The virtual machine specification does not dictate when loading must happen, but for the initialization phase it strictly states that a class must be initialized immediately in exactly five cases (loading, verification, and preparation naturally have to start before that):

  1. When one of the bytecode instructions new, getstatic, putstatic, or invokestatic is encountered, initialization is triggered if the class has not already been initialized. The most common Java code scenarios that produce these four instructions are instantiating an object with the new keyword, reading or setting a static field of a class (except static fields modified by final, whose values the compiler has already placed in the constant pool), and calling a static method of a class.
  2. When a reflective call is made on a class using the methods of the java.lang.reflect package, initialization is triggered if the class has not already been initialized.
  3. When initializing a subclass, if the parent class is not initialized, the initialization of the parent class is triggered first.
  4. When a VM starts and the user needs to specify a main class (the class with the main method) to execute, the VM initializes this class first.
  5. When using the dynamic language support of JDK 1.7, if the final resolution result of a java.lang.invoke.MethodHandle instance is a REF_getStatic, REF_putStatic, or REF_invokeStatic method handle, and the class to which the method handle corresponds has not been initialized, it needs to be initialized first.
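The difference between references that trigger initialization and references that do not can be observed with a small experiment. The following is a minimal sketch rather than code from the original text: the class names InitLog, SuperClass, SubClass, ConstHolder, and InitTriggerDemo are invented for illustration, and the constant case assumes the usual javac behaviour of inlining compile-time constants.

```java
import java.util.ArrayList;
import java.util.List;

// All class names below are hypothetical and exist only for this illustration.
class InitLog {
    static final List<String> EVENTS = new ArrayList<>();
}

class SuperClass {
    static int value = 123;                        // ordinary (non-constant) static field
    static { InitLog.EVENTS.add("SuperClass"); }   // runs when SuperClass is initialized
}

class SubClass extends SuperClass {
    static { InitLog.EVENTS.add("SubClass"); }     // runs when SubClass is initialized
}

class ConstHolder {
    static final String HELLO = "hello";           // compile-time constant
    static { InitLog.EVENTS.add("ConstHolder"); }  // runs when ConstHolder is initialized
}

class InitTriggerDemo {
    private static List<String> cached;

    // A class is initialized at most once, so the observation is made once and cached.
    static synchronized List<String> run() {
        if (cached == null) {
            int v = SubClass.value;                // static field declared in SuperClass
            SubClass[] arr = new SubClass[2];      // array creation (anewarray), not new SubClass()
            String s = ConstHolder.HELLO;          // constant is inlined at compile time
            InitLog.EVENTS.add("value=" + v + ", length=" + arr.length + ", constant=" + s);
            new SubClass();                        // new: SubClass must be initialized now
            cached = new ArrayList<>(InitLog.EVENTS);
        }
        return cached;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Calling InitTriggerDemo.run() should record SuperClass first, then the summary line, and SubClass only at the end. Reading the inherited static field initializes only the class that declares it, and neither the array creation nor the constant reference initializes the classes involved.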

The process of class loading

Loading

During the loading phase, the VM needs to complete the following three things:

  1. Obtain the binary byte stream that defines a class by its fully qualified name
  2. Convert the static storage structure represented by the byte stream into the runtime data structure of the method area
  3. Generate a java.lang.Class object representing the class, which serves as the access point for the various data of the class in the method area

The loading of a non-array class can be done either with the bootstrap class loader provided by the virtual machine or with a custom class loader (that is, a class loader that overrides the loadClass method). The situation is different for array classes: an array class is not created by a class loader, but directly by the virtual machine itself.
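Although an array class is created by the virtual machine, the Class API still reports a class loader for it, namely the loader of its element type. The sketch below illustrates this with the documented behaviour of Class.getClassLoader(); the names ArrayLoaderDemo and Element are made up for the example, and the bootstrap check assumes a virtual machine that represents the bootstrap class loader as null, which is common but not guaranteed.

```java
class ArrayLoaderDemo {
    // Hypothetical application class used only as an array element type.
    static class Element { }

    // An array of a primitive type has no class loader at all.
    static boolean primitiveArrayHasNoLoader() {
        return int[].class.getClassLoader() == null;
    }

    // An array of a reference type reports the loader of its element type.
    static boolean arraySharesElementLoader() {
        return Element[].class.getClassLoader() == Element.class.getClassLoader();
    }

    // Assumption: this VM represents the bootstrap class loader as null.
    static boolean coreClassReportsNullLoader() {
        return String.class.getClassLoader() == null;
    }

    public static void main(String[] args) {
        System.out.println("int[] has no loader: " + primitiveArrayHasNoLoader());
        System.out.println("Element[] shares Element's loader: " + arraySharesElementLoader());
        System.out.println("String reports a null loader: " + coreClassReportsNullLoader());
    }
}
```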

Verification

Verification is the first step of the connection phase. Its purpose is to ensure that the information contained in the byte stream of the Class file meets the requirements of the current virtual machine and does not compromise the security of the virtual machine itself. Verification consists of four stages: file format verification, metadata verification, bytecode verification, and symbolic reference verification.

  • File format verification: the first stage verifies that the byte stream complies with the Class file format specification and can be processed by the current version of the virtual machine. Only after this stage passes is the byte stream stored in the method area. Main verification points in this stage:
  1. Whether the file starts with the magic number 0xCAFEBABE
  2. Whether the major and minor version numbers are within the range that the current virtual machine can process
  3. Whether any of the indexes that point to constants point to a constant that does not exist
  4. ...
  • Metadata verification: the second stage performs semantic analysis on the metadata of the class to ensure that it contains nothing that violates the Java language specification. Verification points include:
  1. Whether this class has a parent class (every class except java.lang.Object should have one)
  2. Whether this class inherits from a class that may not be inherited (a class modified by final)
  3. If the class is not abstract, whether it implements all the methods required by its parent class or its interfaces
  4. ...
  • Bytecode verification: the third stage is the most complex one in the verification process. Its main purpose is to determine, through data flow and control flow analysis, that the semantics of the program are legal and logical. This stage verifies the method bodies of the class to ensure that the verified methods will not harm the virtual machine at run time:
  1. Ensure that jump instructions do not jump to bytecode instructions outside the method body
  2. Ensure that the type conversions in the method body are valid
  3. ...
  • Symbolic reference verification: the final stage takes place when the virtual machine converts symbolic references into direct references, and this conversion happens in the third phase of connection, parsing. It can be regarded as a check of the various symbolic references in the constant pool. The verification points are as follows:
  1. Whether a class or interface can be found for the fully qualified name described by a string in a symbolic reference
  2. Whether the classes, fields, and methods in symbolic references are accessible to the current class
  3. ...
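The first two file format checks listed above (magic number and version range) can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular virtual machine implements verification: the class ClassHeaderCheck and its methods are hypothetical, and the supported version range is passed in by the caller. It relies on the Class file layout of a 4-byte magic number followed by a 2-byte minor version and a 2-byte major version.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper; a real virtual machine performs many more checks.
class ClassHeaderCheck {
    static final int MAGIC = 0xCAFEBABE;
    static final int OLDEST_MAJOR = 45;   // major version used by the earliest JDK releases

    // Returns the major version, or -1 if the magic number is wrong or the data is too short.
    static int majorVersionOf(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != MAGIC) {
                return -1;
            }
            in.readUnsignedShort();           // minor_version, not used in this sketch
            return in.readUnsignedShort();    // major_version
        } catch (IOException e) {             // includes EOFException for truncated input
            return -1;
        }
    }

    // True if the header is well formed and the version is within the supported range.
    static boolean isLoadable(byte[] classBytes, int newestSupportedMajor) {
        int major = majorVersionOf(classBytes);
        return major >= OLDEST_MAJOR && major <= newestSupportedMajor;
    }

    // Builds a minimal 8-byte header for experiments (not a complete Class file).
    static byte[] sampleHeader(int major) {
        return new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
                           0, 0, (byte) (major >> 8), (byte) major};
    }

    public static void main(String[] args) {
        byte[] header = sampleHeader(52);     // 52 is the major version of Java 8 class files
        System.out.println("major version: " + majorVersionOf(header));
        System.out.println("loadable up to 52: " + isLoadable(header, 52));
        System.out.println("loadable up to 51: " + isLoadable(header, 51));
    }
}
```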

Preparation

The preparation phase formally allocates memory for class variables (variables declared static, not instance variables) and sets their initial values. This memory is allocated in the method area. The initial value here is usually the zero value of the data type.

public static int value = 123;

After the preparation phase, the initial value of the variable value is 0, not 123, because no Java method has been executed yet. The putstatic instruction that assigns 123 to value is stored in the class constructor method <clinit>() after compilation, so the assignment of 123 is performed only during initialization. As mentioned above, the initial value is usually the zero value of the data type, but there is a special case: if the field is final and carries a constant value (a ConstantValue attribute in its field attribute table), the variable is set to the specified value during the preparation phase.

public static final int value = 123;

In this case, value is set to 123 during the preparation phase.
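The zero value assigned during preparation can actually be observed from Java code by reading a static field before its own initializer has run. The following sketch is an illustration only; the class PreparationDemo and its members are invented names. Note that the constant case proves less than it seems, because javac inlines compile-time constants at the point of use.

```java
// Hypothetical demo class; static initializers run top to bottom during initialization.
class PreparationDemo {
    // Runs first: value's initializer has not executed yet, so the zero value is visible.
    static int valueSeenEarly = readValue();

    // Runs second: the constant is already 123 (javac also inlines it into readConstant()).
    static int constantSeenEarly = readConstant();

    static int value = 123;               // assigned during initialization
    static final int CONSTANT = 123;      // compile-time constant

    static int readValue() {
        return value;
    }

    static int readConstant() {
        return CONSTANT;
    }

    public static void main(String[] args) {
        System.out.println("value seen early:    " + valueSeenEarly);
        System.out.println("constant seen early: " + constantSeenEarly);
        System.out.println("value now:           " + value);
    }
}
```

Here valueSeenEarly ends up as 0 while value itself is 123 once initialization has finished.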

Parsing

The parsing (resolution) phase is the process by which the virtual machine replaces symbolic references in the constant pool with direct references. The virtual machine specification does not specify when the parsing phase takes place. It only requires that the symbolic references used by the following 16 bytecode instructions, which operate on symbolic references, be parsed before those instructions are executed: anewarray, checkcast, getfield, getstatic, instanceof, invokedynamic, invokeinterface, invokespecial, invokestatic, invokevirtual, ldc, ldc_w, multianewarray, new, putfield, and putstatic. So the virtual machine can decide whether to parse the symbolic references in the constant pool as soon as the class is loaded by the class loader, or to wait until just before a symbolic reference is used. Parsing is mainly performed on seven kinds of symbolic references: class or interface, field, class method, interface method, method type, method handle, and call site qualifier.

  • Class or interface parsing: suppose the current code belongs to class D, and a symbolic reference N that has never been parsed is to be resolved into a direct reference to a class or interface C. The virtual machine completes the parsing process in the following three steps:
  1. If C is not an array type, the virtual machine passes the fully qualified name represented by N to D's class loader to load C. During loading, verification may trigger the loading of other classes. If any error occurs during loading, the parsing process fails.
  2. If C is an array type and the array elements are of an object type, the descriptor for N has a form like [Ljava/lang/Integer;. The element type is loaded as described in step 1, and the virtual machine then generates an array object representing the dimensions and elements of the array.
  3. If the previous steps complete without errors, access permissions are checked before parsing is complete to confirm whether D has access to C. If D does not have access to C, a java.lang.IllegalAccessError is thrown.
  • Field parsing: to parse a symbolic reference to a field that has not been parsed yet, the CONSTANT_Class_info symbolic reference indexed by the class_index item of the field entry is parsed first, that is, the symbolic reference to the class or interface to which the field belongs. If an exception occurs while parsing this class or interface, field parsing fails. If it is parsed successfully, the class or interface to which the field belongs is denoted by C, and the field is then searched for in C:
  1. If C itself contains a field whose simple name and field descriptor match the target field, a direct reference to that field is returned and the lookup ends
  2. Otherwise, if C implements interfaces, each interface and its parent interfaces are searched recursively from the bottom up along the inheritance hierarchy, applying step 1 to each of them
  3. Otherwise, if C is not java.lang.Object, its parent classes are searched recursively from the bottom up along the inheritance hierarchy, applying step 1 to each of them
  4. Otherwise, the lookup fails and a java.lang.NoSuchFieldError is thrown
  • Class method parsing: the first step is the same as for field parsing, namely parsing the symbolic reference to the class or interface indexed by the class_index item of the class method entry. If parsing succeeds, the class is denoted by C, and the lookup continues as follows:
  1. Search class C for a method whose simple name and descriptor match the target. If one is found, a direct reference to the method is returned and the lookup ends
  2. Otherwise, search the parent classes of class C recursively
  3. Otherwise, search the interfaces implemented by class C and their parent interfaces
  4. Otherwise, the lookup fails and a java.lang.NoSuchMethodError is thrown
  • Interface method parsing: this is similar to class method parsing, so it is not repeated here.
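The field lookup order described above (the class itself, then its interfaces, then its parent classes) cannot easily be shown with ordinary source code, because a compiler will typically reject an ambiguous field reference. Reflection offers an approximation: Class.getField is documented to search public fields in the same order. The sketch below relies on that documented reflection behaviour, not on the virtual machine's parsing itself, and all names in it (FieldLookupDemo, Marker, Parent, Child) are hypothetical.

```java
// Hypothetical types; only the search order of Class.getField is being illustrated.
class FieldLookupDemo {
    interface Marker {
        int ID = 1;                        // interface fields are implicitly public static final
    }

    static class Parent {
        public static int ID = 2;
    }

    // Child declares no ID of its own, so the lookup has to continue past step 1.
    static class Child extends Parent implements Marker { }

    // Returns the simple name of the type that declares the field found by reflection.
    static String declaringTypeOf(Class<?> type, String fieldName) {
        try {
            return type.getField(fieldName).getDeclaringClass().getSimpleName();
        } catch (NoSuchFieldException e) {
            return "NoSuchField";          // reflection's counterpart of a failed lookup
        }
    }

    public static void main(String[] args) {
        System.out.println("Child.ID found in:  " + declaringTypeOf(Child.class, "ID"));
        System.out.println("Parent.ID found in: " + declaringTypeOf(Parent.class, "ID"));
        System.out.println("Child.missing:      " + declaringTypeOf(Child.class, "missing"));
    }
}
```

For Child, the field is found in the interface Marker before the parent class is ever consulted.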

Initialization

The initialization phase is the last step of the class loading process. In the earlier phases, apart from the loading phase, in which user applications can take part through custom class loaders, all actions are driven and controlled entirely by the virtual machine. Only in the initialization phase does the virtual machine actually start executing the Java code defined in the class. In the preparation phase, variables were already assigned the zero values required by the system. In the initialization phase, class variables and other resources are initialized according to the plan the programmer expressed in the program. Put another way, the initialization phase is the process of executing the class constructor method, <clinit>().

  1. The class constructor method <clinit>() is generated by the compiler, which automatically collects the assignment operations of all class variables in the class and the statements in static blocks (static {}) and merges them in the order in which they appear in the source file. A static block can only read variables defined before it. A variable defined after it can be assigned a value in the static block, but cannot be read there.
  2. Unlike instance constructors, <clinit>() does not need to explicitly call the parent class's <clinit>(). The virtual machine guarantees that the parent class's <clinit>() has finished executing before the subclass's <clinit>() runs.
  3. <clinit>() is not mandatory. If a class has no static blocks and no assignments to class variables, the compiler may not generate <clinit>() for that class.
  4. The virtual machine ensures that a class's <clinit>() is properly locked and synchronized in a multithreaded environment. If multiple threads try to initialize a class at the same time, only one thread executes <clinit>() while the others block and wait.
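Two of the guarantees in the list above, parent-before-child ordering and once-only execution under concurrency, can be checked with a short program. This is a minimal sketch with invented names (ClinitDemo, Parent, Child, Counted), intended as an illustration and not as a stress test of the virtual machine's locking.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo; each nested class gets its own class constructor method.
class ClinitDemo {
    static final AtomicInteger INIT_RUNS = new AtomicInteger();

    static class Parent {
        static int a = 1;
        static { a = 2; }                  // merged into Parent's initializer after "a = 1"
    }

    static class Child extends Parent {
        static int b = a;                  // Parent is already initialized here, so b becomes 2
    }

    static class Counted {
        static { INIT_RUNS.incrementAndGet(); }   // counts how often initialization runs
        static void touch() { }
    }

    // Starts several threads that all use Counted at once, then reports the counter.
    static int initRunsAfterConcurrentUse(int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> Counted.touch());
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return INIT_RUNS.get();
    }

    public static void main(String[] args) {
        System.out.println("Child.b = " + Child.b);
        System.out.println("initializer runs = " + initRunsAfterConcurrentUse(8));
    }
}
```

Child.b is 2 because the parent's static assignments and static block run first, and the counter stays at 1 no matter how many threads touch Counted.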

Classes and class loaders

The virtual machine design team deliberately placed the class-loading action of obtaining the binary byte stream of a class by its fully qualified name outside the Java virtual machine, so that applications can decide for themselves how to obtain the classes they need. The code module that implements this action is called a class loader. For any class, the class loader that loads it and the class itself together establish its uniqueness within the Java virtual machine. In layman's terms, comparing two classes for equality only makes sense if they were loaded by the same class loader. Otherwise, even if they come from the same Class file and are loaded by the same virtual machine, they are not equal as long as their class loaders differ.

Parent delegation model

From the virtual machine's perspective, there are only two kinds of class loaders: the bootstrap class loader, which is part of the virtual machine itself, and all other class loaders, which are independent of the virtual machine and all inherit from the abstract class java.lang.ClassLoader. From a developer's perspective, most Java programs use the following three system-provided class loaders:

  • The bootstrap class loader is responsible for loading into virtual machine memory the libraries in the \lib directory that are recognized by the virtual machine (libraries with unrecognized names are not loaded even if they are placed in the lib directory). The bootstrap class loader cannot be referenced directly by a Java program.

  • The extension class loader is responsible for loading all libraries in the \lib\ext directory. Developers can use the extension class loader directly.

  • The application class loader is responsible for loading the libraries specified on the user's class path (ClassPath) and can be used directly by the developer. If the program does not define its own class loader, this is generally the program's default class loader.


The workflow of the parent delegation model is as follows. If a class loader receives a class loading request, it does not first try to load the class itself. Instead, it delegates the request to its parent class loader. This is true at every level of class loaders, so all class loading requests eventually reach the top-level bootstrap class loader. Only when the parent class loader reports that it cannot complete the request (it did not find the required class in its search scope) does the child loader attempt to load the class itself. The advantage of parent delegation is that Java classes acquire a hierarchy with priorities along with their class loaders. For example, java.lang.Object is always loaded by the bootstrap class loader, so it is the same class in every class loader environment.
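The delegation order can be observed with a small custom loader that records which class names it is asked to find on its own. The sketch below relies on the default ClassLoader.loadClass behaviour of asking the parent before calling findClass; the names DelegationDemo and RecordingLoader are invented, and the loader deliberately never defines a class itself.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {
    // Hypothetical loader: it only records the names that reach its own findClass.
    static class RecordingLoader extends ClassLoader {
        final List<String> ownAttempts = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        // Called by loadClass only after the parent chain has failed to find the class.
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            ownAttempts.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    // Loads className through a fresh RecordingLoader and returns what reached findClass.
    static List<String> attemptsFor(String className) {
        RecordingLoader loader = new RecordingLoader(DelegationDemo.class.getClassLoader());
        try {
            loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            // Expected for names that no loader in the chain can find.
        }
        return loader.ownAttempts;
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String -> " + attemptsFor("java.lang.String"));
        System.out.println("demo.DoesNotExist -> " + attemptsFor("demo.DoesNotExist"));
    }
}
```

For java.lang.String the list stays empty, because a loader higher up in the chain satisfies the request. For the made-up name demo.DoesNotExist the request comes back down, and the child loader gets its turn.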