Preface

The virtual machine loads the data that describes a class from the Class file into memory, verifies it, converts and resolves it, initializes it, and finally forms Java types that the virtual machine can use directly. This is the class loading mechanism of the virtual machine.

  • The flow of class loading

    The whole life cycle of a class, from being loaded into virtual machine memory to being unloaded from it, consists of seven phases: loading, verification, preparation, resolution, initialization, use, and unloading. Verification, preparation, and resolution are collectively called linking. The sequence of the seven phases is shown in Figure 1-1.

Of these, the order of the five phases loading, verification, preparation, initialization, and unloading is fixed, and class loading must start them step by step in that order. Resolution is the exception: in some cases it can start after initialization, which is how the Java language supports runtime binding. Note that the phases start in this order but do not necessarily run or finish one after another. They are usually interleaved, with one phase invoking or activating another while it is still executing (for example, initializing another class from within a class).

  • The timing of class loading

    When must the first phase of class loading, loading itself, begin? The Java Virtual Machine specification imposes no constraint here and leaves the decision to the virtual machine implementation. For the initialization phase, however, the specification strictly states that there are exactly five cases in which a class must be initialized immediately (loading, verification, and preparation naturally have to begin before that).

    • On encountering one of the four bytecode instructions new, getstatic, putstatic, or invokestatic, initialization is triggered if the class has not yet been initialized. The most common Java code that produces these instructions is: instantiating an object with the new keyword, reading or setting a static field of a class (except static fields that are final and whose values were placed into the constant pool at compile time), and calling a static method of a class.
    • When a method of the java.lang.reflect package makes a reflective call on a class that has not been initialized, its initialization must be triggered first.
    • When a class is initialized and its parent class has not been initialized, initialization of the parent class must be triggered first.
    • When the virtual machine starts, the user specifies a main class to execute (the one containing the main() method), and the virtual machine initializes this main class first.
    • When using the dynamic language support of JDK 1.7, if the final resolution result of a java.lang.invoke.MethodHandle instance is a method handle of kind REF_getStatic, REF_putStatic, or REF_invokeStatic, and the class corresponding to that method handle has not been initialized, its initialization must be triggered first.
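Some of the triggers listed above can be observed directly. The following is a minimal sketch, not taken from the original article: the class ActiveReferenceDemo and its nested classes are hypothetical names, and each nested class records its own initialization in a shared list. It covers only the new, getstatic, invokestatic, and reflection triggers.

```java
import java.util.ArrayList;
import java.util.List;

public class ActiveReferenceDemo {
    // Records the order in which the nested classes are initialized.
    public static final List<String> LOG = new ArrayList<>();

    static class ViaNew {
        static { LOG.add("ViaNew"); }
    }

    static class ViaStaticField {
        static int value = 1; // not final, so reading it emits getstatic
        static { LOG.add("ViaStaticField"); }
    }

    static class ViaStaticMethod {
        static { LOG.add("ViaStaticMethod"); }
        static void touch() { }
    }

    static class ViaReflection {
        static { LOG.add("ViaReflection"); }
        static void touch() { }
    }

    public static List<String> run() {
        new ViaNew();                        // new
        int ignored = ViaStaticField.value;  // getstatic
        ViaStaticMethod.touch();             // invokestatic
        try {
            // The class literal alone does not initialize ViaReflection;
            // the reflective call to its static method does.
            ViaReflection.class.getDeclaredMethod("touch").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return LOG;
    }

    public static void main(String[] args) {
        // prints [ViaNew, ViaStaticField, ViaStaticMethod, ViaReflection]
        System.out.println(run());
    }
}
```

Calling run() a second time adds nothing to the list, because each class is initialized at most once.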

    For these five scenarios that trigger class initialization, the virtual machine specification uses a very strong qualifier: "if and only if". The behavior in these five scenarios is called an active reference to a class. Every other way of referencing a class does not trigger initialization and is called a passive reference, as in the following examples:

public class Parent {
    public static int a = 1;
    static {
        System.out.println("Parent init");
    }
}
public class Son extends Parent{
    static {
        System.out.println("Son init");
    }
}
public class Main {
    public static void main(String[] args) {
        // Son.a refers to the static field declared in Parent
        System.out.println("args = [" + Son.a + "]");
    }
}

Output:

Parent init
args = [1]

For a static field, only the class that directly defines the field is initialized. Referring to a static field defined in the parent class through a subclass therefore triggers initialization of the parent class but not of the subclass. Whether the subclass is loaded and verified is not specified by the virtual machine specification and depends on the implementation. On the Sun HotSpot virtual machine, the -XX:+TraceClassLoading parameter shows that this operation does cause the subclass to be loaded.

In addition, referencing a class through an array definition does not trigger initialization of the class.

    public static void main(String[] args) {
        Parent[] parentArray = new Parent[10];
    }

Running the code above produces no output, which shows that initialization of the Parent class is not triggered. The code does, however, trigger initialization of another class named [Lxxx.xxx.Parent (as covered in the article on bytecode, the [L prefix denotes an array of objects). This class is generated automatically by the virtual machine, inherits directly from java.lang.Object, and its creation is triggered by the bytecode instruction anewarray. It represents a one-dimensional array whose element type is Parent, and the properties and methods that an array should have (user code can directly use only the public length attribute and the clone() method) are implemented in this class. In Java, an out-of-bounds array access throws ArrayIndexOutOfBoundsException, but the bounds check is not encapsulated in the array class; it is built into the array access bytecode instructions xaload and xastore.
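The generated array class can be inspected through reflection. This is a small sketch under assumed names (ArrayClassDemo and Element are hypothetical); the exact class name printed depends on the package the code is compiled in, so only its "[L" prefix is relied on.

```java
public class ArrayClassDemo {
    // Set to true only if Element's static initializer ever runs.
    public static boolean elementInitialized = false;

    public static class Element {
        static { elementInitialized = true; }
    }

    public static Class<?> arrayClass() {
        Element[] array = new Element[10]; // creating the array does not initialize Element
        return array.getClass();
    }

    public static void main(String[] args) {
        Class<?> c = arrayClass();
        System.out.println(c.getName());                            // begins with "[L"
        System.out.println(c.getSuperclass());                      // class java.lang.Object
        System.out.println(c.getComponentType() == Element.class);  // true
        System.out.println(elementInitialized);                     // false
    }
}
```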

Referencing a static final constant of a class does not trigger initialization of that class.

public class Parent {
    public static final int a = 1;
    static {
        System.out.println("Parent init");
    }
}
public class Main {
    public static void main(String[] args) {
        // Son.a is a compile-time constant here, so no class is initialized
        System.out.println("args = [" + Son.a + "]");
    }
}

Output:

args = [1]

Because the field is final and its value is a constant, the compiler applies constant propagation at compile time and stores the value 1 directly in the constant pool of the class that contains main. Every later reference to the constant in that class is really a reference to its own constant pool. In other words, the Class file of the main class contains no symbolic reference to Parent, and the two classes have no connection once they have been compiled into Class files.
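The distinction depends on the value being a compile-time constant, not on the final modifier alone. The sketch below uses hypothetical names (ConstantDemo, Holder, INLINED, COMPUTED) and contrasts a constant that javac folds into the caller with a final field whose value is computed at run time.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantDemo {
    public static final List<String> LOG = new ArrayList<>();

    static class Holder {
        static final int INLINED = 1;                       // compile-time constant
        static final int COMPUTED = Integer.parseInt("2");  // final, but not a constant expression
        static { LOG.add("Holder init"); }
    }

    public static List<String> run() {
        if (LOG.isEmpty()) { // keep the log stable if run() is called again
            LOG.add("read INLINED = " + Holder.INLINED);    // folded by javac, no getstatic
            LOG.add("read COMPUTED = " + Holder.COMPUTED);  // getstatic, initializes Holder first
        }
        return LOG;
    }

    public static void main(String[] args) {
        // prints [read INLINED = 1, Holder init, read COMPUTED = 2]
        System.out.println(run());
    }
}
```

"Holder init" appears only before the second read, which shows that the first read never touched the Holder class.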

The loading process of an interface differs slightly from that of a class, so interfaces need a few special notes. An interface also has an initialization process, the same as a class. An interface cannot contain a static {} block, but the compiler still generates a <clinit>() class constructor for it to initialize the member variables defined in the interface. Where interfaces really differ from classes is the third of the five initialization scenarios above: when a class is initialized, all of its parent classes must already have been initialized, but when an interface is initialized, its parent interfaces need not all have been initialized. A parent interface is initialized only when it is actually used (for example, when a constant defined in it is referenced).
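The interface rule can be observed with fields whose initializers have a visible side effect. This is an illustrative sketch with hypothetical names (InterfaceInitDemo, ParentInterface, ChildInterface, Impl); the interface fields are deliberately not compile-time constants, so reading them really does go through <clinit>().

```java
import java.util.ArrayList;
import java.util.List;

public class InterfaceInitDemo {
    public static final List<String> LOG = new ArrayList<>();

    // Helper whose side effect reveals when an interface field is initialized.
    static int record(String name) {
        LOG.add(name);
        return LOG.size();
    }

    interface ParentInterface {
        int P = record("ParentInterface");
    }

    interface ChildInterface extends ParentInterface {
        int C = record("ChildInterface");
    }

    static class Impl implements ChildInterface {
        static { LOG.add("Impl"); }
    }

    public static List<String> run() {
        if (LOG.isEmpty()) {
            new Impl();                     // initializes Impl, but neither interface
            int ignored = ChildInterface.C; // initializes ChildInterface, not its parent
        }
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [Impl, ChildInterface]
    }
}
```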

  • Class loading steps

Next, we look at the whole class loading process in detail, that is, the specific actions performed in the five phases of loading, verification, preparation, resolution, and initialization.

  • Loading

Loading is one phase of the class loading process. During loading, the virtual machine does three things:

  • 1. Get the binary byte stream that defines a class by its fully qualified name.
  • 2. Convert the static storage structure represented by this byte stream to the runtime data structure of the method area.
  • 3. Generate a java.lang.Class object in memory that represents this class and serves as the access point for the class's data in the method area.

The specification does not say where the binary byte stream must come from or how it is obtained; it only requires obtaining the binary byte stream that defines a class through the class's fully qualified name. The virtual machine is therefore free in this respect, and common sources include the following:

  • Reading from a ZIP package, which is the basis of the JAR, EAR, and WAR formats.
  • Obtaining it from the network; the most typical application of this scenario is the Applet.
  • Generating it at run time, most commonly in dynamic proxy technology: java.lang.reflect.Proxy uses ProxyGenerator.generateProxyClass to generate the binary byte stream of a proxy class of the form *$Proxy for a particular interface.
  • Generating it from other files, such as JSP files, from which the corresponding Class file is generated.
  • Reading it from a database; for example, some middleware servers install programs into a database to distribute program code across a cluster.
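Of the sources listed above, run-time generation is the easiest to try out. The following sketch uses the public java.lang.reflect.Proxy API; the Greeter interface and the ProxyDemo class are hypothetical names introduced for illustration, and the name of the generated proxy class is implementation-specific.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    public interface Greeter {
        String greet(String name);
    }

    public static Greeter newGreeter() {
        // The handler receives every call made on the generated proxy class.
        InvocationHandler handler = (proxy, method, args) -> "Hello, " + args[0];
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        Greeter greeter = newGreeter();
        System.out.println(greeter.greet("JVM"));                    // Hello, JVM
        System.out.println(Proxy.isProxyClass(greeter.getClass()));  // true
        System.out.println(greeter.getClass().getName());            // implementation-specific name
    }
}
```

No Class file for the proxy exists on disk; its byte stream is produced in memory and then goes through the same loading steps as any other class.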

Compared with the other phases of class loading, the loading phase of a non-array class (more precisely, the action of obtaining the class's binary byte stream during loading) is the one developers can control most. Loading can be done by the bootstrap class loader provided by the system or by a user-defined class loader (for example, to handle encrypted bytecode). By defining their own class loaders, developers control how the byte stream is obtained.
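A user-defined class loader usually overrides findClass and hands the bytes it obtained to defineClass. The sketch below is one possible shape, not the article's own code: ByteStreamLoaderDemo, Payload, and ResourceClassLoader are hypothetical names, and the sketch assumes the compiled Payload class file is reachable as a classpath resource. A loader that decrypts bytecode would transform the bytes just before defineClass.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteStreamLoaderDemo {
    /** The class whose byte stream is obtained by hand. */
    public static class Payload { }

    static class ResourceClassLoader extends ClassLoader {
        ResourceClassLoader() {
            super(null); // bootstrap loader as parent, so findClass is reached for Payload
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            ClassLoader source = ByteStreamLoaderDemo.class.getClassLoader();
            try (InputStream in = source.getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                byte[] bytes = out.toByteArray(); // a decrypting loader would transform bytes here
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    public static Class<?> loadPayload() {
        try {
            return new ResourceClassLoader().loadClass(Payload.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> loaded = loadPayload();
        System.out.println(loaded.getName().equals(Payload.class.getName())); // true
        System.out.println(loaded == Payload.class);                          // false
    }
}
```

The two Class objects share a name but are distinct types, because each was defined by a different class loader.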

An array class, however, is not created by a class loader; the Java virtual machine creates it directly. Array classes are still closely tied to class loaders, because the element type of an array class is ultimately loaded by a class loader. The creation of an array class follows these rules:

  • 1. If the component type of the array is a reference type, the component type is loaded recursively using the loading process described here, and the array class is identified in the class namespace of the class loader that loaded the component type, as described in a later article on class loaders.
  • 2. If the component type of the array is not a reference type (for example, int[]), the Java virtual machine marks the array class as associated with the bootstrap class loader.
  • 3. The visibility of an array class is the same as that of its component type. If the component type is not a reference type, the visibility of the array class defaults to public.
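These rules are visible through Class.getClassLoader() and Class.getModifiers(). In the sketch below, ArrayLoaderDemo and Hidden are hypothetical names; a null class loader is how the reflection API reports the bootstrap class loader.

```java
import java.lang.reflect.Modifier;

public class ArrayLoaderDemo {
    static class Hidden { } // package-private element type

    // Rule 2: an array of primitives is associated with the bootstrap loader (reported as null).
    public static boolean primitiveArrayHasNoLoader() {
        return int[].class.getClassLoader() == null;
    }

    // Rule 1: an array of references shares the class loader of its element type.
    public static boolean sharesElementLoader() {
        return Hidden[].class.getClassLoader() == Hidden.class.getClassLoader();
    }

    // Rule 3: visibility follows the component type; primitive arrays are public.
    public static boolean visibilityFollowsElement() {
        return Modifier.isPublic(int[].class.getModifiers())
                && !Modifier.isPublic(Hidden[].class.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(primitiveArrayHasNoLoader()); // true
        System.out.println(sharesElementLoader());       // true
        System.out.println(visibilityFollowsElement());  // true
    }
}
```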

After loading completes, the binary byte stream from outside the virtual machine is stored in the method area in the format the virtual machine requires; the data storage format in the method area is defined by the virtual machine implementation. The virtual machine then instantiates an object of class java.lang.Class in memory (the specification does not say this must be in the Java heap; in HotSpot, the Class object is special: although it is an object, it is stored in the method area). This object serves as the external interface through which the program accesses the type data in the method area.

Parts of the loading phase and the linking phase are interleaved. Linking may begin before loading has completed, but the actions that run in the middle of loading still belong to the linking phase.

  • Verification

    Verification is the first step of the linking phase. Its purpose is to ensure that the information in the byte stream of the Class file meets the requirements of the current virtual machine and does not compromise the virtual machine's own security. A virtual machine that trusts its input byte stream completely, without checking it, can be crashed by loading a harmful byte stream, so verification is an important part of how the virtual machine protects itself. According to the Java Virtual Machine Specification (Java SE 7 Edition) released in 2011, the verification phase roughly performs the following four stages of checks: file format verification, metadata verification, bytecode verification, and symbolic reference verification.

  • 1. File format verification. This stage verifies that the byte stream complies with the Class file format specification and can be processed by the current version of the virtual machine. It may include the following checks:

    • Whether the file starts with the magic number 0xCAFEBABE.
    • Whether the major and minor version numbers are within the range the current virtual machine can process.
    • Whether the constant pool contains unsupported constant types (checked through each constant's tag).
    • Whether any index value that points to a constant points to a nonexistent constant or to a constant of the wrong type.
    • Whether any CONSTANT_Utf8_info constant contains data that is not valid UTF-8.
    • Whether any part of the Class file, or the file itself, has had information deleted or appended.

These are only some of the checks performed in this stage. Its main purpose is to ensure that the input byte stream can be parsed correctly and stored in the method area, and that its format meets the requirements for describing a Java type. Only after this stage passes is the byte stream stored in the method area of memory; the following three verification stages all operate on the storage structures in the method area and no longer work on the byte stream directly.
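The first two checks in the list can be sketched in a few lines. This is a simplified illustration, not how any real virtual machine is implemented: HeaderCheckDemo and headerAccepted are hypothetical names, the sample bytes in main are hand-written test data, and only the magic number and an upper bound on the major version are checked.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class HeaderCheckDemo {
    static final int MAGIC = 0xCAFEBABE;

    // Returns true if the bytes start with the magic number and a supported major version.
    public static boolean headerAccepted(byte[] classBytes, int maxSupportedMajor) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != MAGIC) {
                return false;                       // not a Class file at all
            }
            int minor = in.readUnsignedShort();     // minor_version, read but unused in this sketch
            int major = in.readUnsignedShort();     // major_version
            return major <= maxSupportedMajor;
        } catch (IOException e) {
            return false;                           // truncated input
        }
    }

    public static void main(String[] args) {
        // Hand-written header: magic number, minor version 0, major version 52 (Java 8).
        byte[] java8Header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        byte[] zipHeader = {0x50, 0x4B, 3, 4, 0, 0, 0, 0};

        System.out.println(headerAccepted(java8Header, 52)); // true
        System.out.println(headerAccepted(java8Header, 51)); // false: version too new
        System.out.println(headerAccepted(zipHeader, 52));   // false: wrong magic number
    }
}
```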

  • 2. Metadata verification. The second stage performs semantic analysis of the information described by the bytecode to ensure that it conforms to the Java language specification. The checks in this stage include:

    • Whether the class has a parent class (every class except java.lang.Object must have one).
    • Whether the parent class is a class that must not be inherited (a class declared final).
    • If the class is not abstract, whether it implements all the methods required by its parent class and interfaces.
    • Whether a field or method in the class conflicts with the parent class (for example, by overriding a final field of the parent class).
  • 3. Bytecode verification. This is the most complex stage of verification. Its main purpose is to determine, through data flow and control flow analysis, that the program semantics are legal and logical. After the data types in the metadata have been verified in the second stage, this stage verifies and analyzes the method bodies of the class to ensure that the methods will not do anything at run time that harms the security of the virtual machine, for example:

    • The data types on the operand stack and the instruction sequence are compatible at every point. For example, an int is never placed on the operand stack and then loaded into the local variable table as a long.
    • Jump instructions never jump to a bytecode instruction outside the method body.
    • Type conversions in the method body are valid.

    If the method body of a class fails bytecode verification, it definitely has a problem. Passing verification, however, does not guarantee that it is safe: checking program logic with a program can never be absolutely accurate.

    To avoid spending too much time in bytecode verification, the virtual machine design team implemented an optimization in the javac compiler and the Java virtual machine after JDK 1.6: a new attribute named StackMapTable was added to the attribute table of a method's Code attribute. This attribute describes the state that the local variable table and the operand stack should have at the start of every basic block of the method body. During bytecode verification, the verifier no longer has to derive these states from the program; it only has to check that the records in the StackMapTable attribute are legal. This turns the type inference of bytecode verification into type checking and saves some time.

  • 4. Symbolic reference verification

    The final verification stage takes place when the virtual machine converts symbolic references into direct references, a conversion that happens in resolution, the third step of linking. Symbolic reference verification can be seen as a matching check on information outside the class itself (the symbolic references in the constant pool), and it usually checks the following:

    • Whether a class can be found for the fully qualified name described by a string in the symbolic reference.
    • Whether the specified class contains a field or method that matches the simple name and the field or method descriptor.
    • Whether the access modifier of the class, field, or method in the symbolic reference allows access from the current class.

    The purpose of symbolic reference verification is to ensure that resolution can proceed normally. If this verification fails, a subclass of IncompatibleClassChangeError is thrown, for example NoSuchFieldError or NoSuchMethodError.
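The relationship between these error types can be checked against the standard library. This is a small sketch; ResolutionErrorsDemo and its method name are hypothetical.

```java
public class ResolutionErrorsDemo {
    // True if the given error type is IncompatibleClassChangeError or one of its subclasses.
    public static boolean isIncompatibleClassChange(Class<? extends Throwable> type) {
        return IncompatibleClassChangeError.class.isAssignableFrom(type);
    }

    public static void main(String[] args) {
        System.out.println(isIncompatibleClassChange(NoSuchFieldError.class));     // true
        System.out.println(isIncompatibleClassChange(NoSuchMethodError.class));    // true
        System.out.println(isIncompatibleClassChange(IllegalAccessError.class));   // true
        System.out.println(isIncompatibleClassChange(NoClassDefFoundError.class)); // false
    }
}
```

Code that wants to handle all of these resolution failures together can therefore catch IncompatibleClassChangeError, although catching linkage errors is rarely appropriate outside diagnostic tooling.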

    Verification is a very important phase of the virtual machine's class loading mechanism, but it is not a mandatory one. If all the code being run has already been used and verified repeatedly, you can consider using the -Xverify:none parameter at deployment time to turn off most class verification and shorten class loading time.

  • Preparation

    The preparation phase formally allocates memory for class variables and sets their initial values. The memory used by these variables is allocated in the method area. Two easily confused points need emphasis here. First, only class variables (static variables) are allocated at this point, not instance variables, which are allocated in the Java heap together with the object when it is instantiated. Second, the initial value here is normally the zero value of the data type.

  public static int number = 1;
  public static final int numberFinal = 123;

In the example above, the initial value of number after the preparation phase is 0 rather than 1, because no Java method has been executed yet. The putstatic instruction that assigns 1 to number is generated at compile time and placed in the class constructor <clinit>() method, so the assignment of 1 to number only takes place during initialization.

In a special case, however, if the field attribute table of a class field contains a ConstantValue attribute, the variable is initialized to the value specified by that attribute during the preparation phase. In the example above, numberFinal is declared final, so javac generates a ConstantValue attribute for it at compile time, and during preparation the virtual machine sets numberFinal to 123 according to that attribute.
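The zero value assigned during preparation can be observed by reading a class variable before its own initializer has run. In this sketch, PreparationDemo, seenEarly, and peek are hypothetical names; reading the field through a method is only a way to get past the compiler's forward-reference check.

```java
public class PreparationDemo {
    // Runs first in <clinit>(): 'number' still holds the zero value from preparation.
    public static int seenEarly = peek();
    // Also runs before NUMBER_FINAL is declared, yet it already sees 123.
    public static int seenEarlyFinal = peekFinal();

    public static int number = 1;
    // Compile-time constant: javac inlines it and records it in a ConstantValue attribute.
    public static final int NUMBER_FINAL = 123;

    private static int peek() {
        return number;
    }

    private static int peekFinal() {
        return NUMBER_FINAL;
    }

    public static void main(String[] args) {
        System.out.println(seenEarly);      // 0
        System.out.println(number);         // 1
        System.out.println(seenEarlyFinal); // 123
    }
}
```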

  • Resolution

    The resolution phase is the process in which the virtual machine replaces symbolic references in the constant pool with direct references. Symbolic references have been mentioned many times in these JVM notes; they appear in the Class file as constants of the types CONSTANT_Class_info, CONSTANT_Fieldref_info, CONSTANT_Methodref_info, and so on. So how do the direct references and symbolic references of the resolution phase relate to each other?

    Symbolic references: a symbolic reference describes the referenced target with a set of symbols, which can be any form of literal as long as it locates the target unambiguously. The target is not necessarily loaded into memory yet; in many cases the symbolic reference acts as a placeholder that says what will be needed in the future and is replaced with a direct reference at a later stage. The memory layouts of different virtual machine implementations may differ, but the symbolic references they accept must be the same, because the literal form of symbolic references is explicitly defined in the Class file format of the Java Virtual Machine specification.

    Direct references: a direct reference can be a pointer straight to the target, a relative offset, or a handle that locates the target indirectly. A direct reference depends on the memory layout of the virtual machine implementation, so the direct reference that the same symbolic reference is translated into will generally differ between virtual machine instances. If a direct reference exists, its target must already be present in memory.

    The virtual machine specification does not prescribe when resolution happens. It only requires that the symbolic references used by the following 16 bytecode instructions be resolved before the instructions execute: anewarray, checkcast, getfield, getstatic, instanceof, invokedynamic, invokeinterface, invokespecial, invokestatic, invokevirtual, ldc, ldc_w, multianewarray, new, putfield, and putstatic. An implementation can therefore decide for itself whether to resolve the symbolic references in the constant pool as soon as the class is loaded, or to wait until a symbolic reference is about to be used.

    Except for the invokedynamic instruction, a virtual machine implementation can cache the result of the first resolution (recording the direct reference in the runtime constant pool and marking the constant as resolved) to avoid repeating the resolution. If a symbolic reference has been resolved successfully, later resolution requests for it must also succeed; likewise, if the first resolution failed, later requests for the same reference must receive the same exception.

    For the invokedynamic instruction, the fact that a symbolic reference has already been resolved by one invokedynamic instruction does not mean that the result is valid for other invokedynamic instructions. The instruction exists to support dynamic languages, and the reference it uses is called a dynamic call site specifier; "dynamic" here means that resolution cannot take place until the program actually runs to that instruction. By contrast, the other instructions that trigger resolution are "static": they can be resolved right after the loading phase completes, before any code has begun to execute.

    Resolution is performed mainly on seven kinds of symbolic references: class or interface, field, class method, interface method, method type, method handle, and call site specifier. The first four correspond to the constant pool types CONSTANT_Class_info, CONSTANT_Fieldref_info, CONSTANT_Methodref_info, and CONSTANT_InterfaceMethodref_info.

    1. Class or interface resolution

    Suppose the current code is in class W and an unresolved symbolic reference N is to be resolved into a direct reference to a class or interface O. The virtual machine completes the whole process in three steps.

    • If O is not an array type, the virtual machine passes the fully qualified name represented by N to W's class loader to load class O. During loading, the needs of metadata verification and bytecode verification may trigger the loading of other related classes. If any exception occurs during loading, resolution fails.

    • If O is an array type whose element type is an object (that is, N's descriptor has a form like [Lxxx/xxx), the element type is loaded according to the rule in the first step; for a descriptor of this form, the element type to load is xxx.xxx. The virtual machine then generates an array object that represents the array's dimensions and elements.

    • If the preceding steps complete without any exception, O is effectively a valid class or interface in the virtual machine. Before resolution is finished, however, symbolic reference verification still confirms that W has access to O. If it does not, an IllegalAccessError is thrown.

    2. Field resolution

    To resolve a field symbolic reference, the CONSTANT_Class_info symbolic reference indexed by the class_index item of the field entry is resolved first, that is, the symbolic reference to the class or interface the field belongs to. In other words, to resolve a field, its class must be resolved first.

    • Once the class has been resolved, if the class itself contains a field whose simple name and field descriptor both match the target, a direct reference to that field is returned.

    • Otherwise, if the class implements interfaces, each interface and its parent interfaces are searched recursively from the bottom up following the inheritance relationship; if an interface contains a field whose simple name and field descriptor both match the target, a direct reference to that field is returned.

    • Otherwise, if the class is not java.lang.Object, its parent classes are searched recursively from the bottom up following the inheritance relationship; if a parent class contains a field whose simple name and field descriptor both match the target, a direct reference to that field is returned.

    • If all the previous steps fail, NoSuchFieldError is thrown.

    • If a field reference is returned, access permission is then checked; if the field is not accessible, IllegalAccessError is thrown.

    • In practice, if a field with the same name appears in both an interface of the class and its parent class, or in several interfaces of the class or its parent class, the compiler may refuse to compile the code.
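The search order above explains why fields are looked up by the type named in the reference, without dynamic dispatch. The sketch below uses hypothetical names (FieldResolutionDemo, Marker, Base, Derived); the interface field is deliberately not a compile-time constant so that reading it is a real field access.

```java
public class FieldResolutionDemo {
    public interface Marker {
        int ID = Integer.parseInt("1"); // not a compile-time constant
    }

    public static class Base {
        public static String name = "Base";
        public int value = 10;
    }

    public static class Derived extends Base implements Marker {
        public int value = 20; // hides Base.value
    }

    public static String run() {
        Derived derived = new Derived();
        Base asBase = derived;
        return derived.value   // found in Derived itself
                + " " + asBase.value  // the reference names Base, so Base.value is used
                + " " + Derived.ID    // found in the implemented interface
                + " " + Derived.name; // found in the parent class
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 20 10 1 Base
    }
}
```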

    3. Class method resolution

    The first step of class method resolution is the same as for field resolution: resolve the symbolic reference to the class or interface that the method belongs to. The search for the method then proceeds as follows.

    • 1) The symbolic references for class methods and interface methods are defined as separate constant types (Methodref and InterfaceMethodref). If the class indexed by class_index in the class method entry turns out to be an interface, IncompatibleClassChangeError is thrown.

    • 2) After the first step, if the class contains a method whose simple name and descriptor both match the target, a direct reference to that method is returned.

    • 3) Otherwise, the parent classes of the class are searched recursively for a method whose simple name and descriptor both match the target; if one is found, a direct reference to that method is returned.

    • 4) Otherwise, the interfaces implemented by the class and their parent interfaces are searched recursively for a method whose simple name and descriptor both match the target. If a match is found, the class must be abstract (otherwise the method would have been found in the class), and AbstractMethodError is thrown.

    • 5) If none of the preceding steps finds a match, NoSuchMethodError is thrown.

    • 6) If a method reference is returned, access permission is then checked; if the method is not accessible, IllegalAccessError is thrown.
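The steps above can be imitated with reflection to make the search order concrete. This is only a sketch of the lookup as described in this article, not the virtual machine's implementation: MethodLookupSketch and its nested classes are hypothetical, the access check of step 6 is omitted, and getDeclaredMethod matches on name and parameter types rather than on the full descriptor.

```java
import java.lang.reflect.Method;

public class MethodLookupSketch {
    public static class Base { public void hello() { } }
    public static class Child extends Base { }
    public abstract static class Partial implements Runnable { }

    public static Method resolve(Class<?> owner, String name, Class<?>... params) {
        // 1) A class method reference must name a class, not an interface.
        if (owner.isInterface()) {
            throw new IncompatibleClassChangeError(owner.getName());
        }
        // 2) and 3) Search the class itself, then its parent classes.
        for (Class<?> c = owner; c != null; c = c.getSuperclass()) {
            Method m = declared(c, name, params);
            if (m != null) {
                return m;
            }
        }
        // 4) A match found only in an implemented interface means the class is abstract.
        for (Class<?> c = owner; c != null; c = c.getSuperclass()) {
            for (Class<?> iface : c.getInterfaces()) {
                if (declaredInInterface(iface, name, params)) {
                    throw new AbstractMethodError(name);
                }
            }
        }
        // 5) Nothing matched.
        throw new NoSuchMethodError(name);
    }

    private static Method declared(Class<?> c, String name, Class<?>[] params) {
        try {
            return c.getDeclaredMethod(name, params);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    private static boolean declaredInInterface(Class<?> iface, String name, Class<?>[] params) {
        if (declared(iface, name, params) != null) {
            return true;
        }
        for (Class<?> parent : iface.getInterfaces()) {
            if (declaredInInterface(parent, name, params)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // hello() is not declared in Child, so the search continues into Base.
        System.out.println(resolve(Child.class, "hello").getDeclaringClass().getSimpleName()); // Base
    }
}
```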

    4. Interface method resolution

    Interface method resolution also begins by resolving the symbolic reference, indexed by class_index in the interface method entry, to the class or interface that the method belongs to. The search for the interface method then proceeds as follows.

    • 1) In contrast to class method resolution, if the type indexed by class_index in the interface method entry turns out to be a class rather than an interface, IncompatibleClassChangeError is thrown.

    • 2) After the first step, if the interface contains a method whose simple name and descriptor both match the target, a direct reference to that method is returned.

    • 3) Otherwise, the parent interfaces of the interface are searched recursively, up to and including java.lang.Object, for a method whose simple name and descriptor both match the target; if one is found, a direct reference to that method is returned.

    • 4) If none of the preceding steps finds a match, NoSuchMethodError is thrown.

    • 5) Because all methods in an interface are public by default, there is no access restriction, so resolving an interface method does not throw IllegalAccessError.

  • Initialization

    Class initialization is the last step of the class loading process. In the earlier phases, apart from the loading phase, where user applications can take part through custom class loaders, all actions are directed and controlled by the virtual machine. Only in the initialization phase does the virtual machine actually start executing the Java program code (or bytecode) defined in the class.

    In the preparation phase, variables have already been given the initial values the system requires. In the initialization phase, class variables and other resources are initialized according to the plan the programmer expressed in code. Put another way, the initialization phase is the process of executing the class constructor <clinit>() method.

    The <clinit>() method is generated by the compiler, which automatically collects the assignments of all class variables in the class and the statements in static blocks (static {} blocks) and merges them. The order in which the compiler collects them is determined by the order in which the statements appear in the source file. A static block can only access variables defined before it; variables defined after it can be assigned in the static block but cannot be read.

public class Parent {
    static {
        a = 2;                                  // allowed: assigning a variable declared later
        System.out.println("Parent init" + a);  // compile error: illegal forward reference
    }
    public static int a = 1;
}

The code above may assign a value to a inside the static block, but the assignment has no lasting effect, because the declaration that follows reassigns a to 1. Reading a class variable that is declared later from inside the static block is not allowed; the compiler reports an "illegal forward reference" error.

Unlike the class's constructor (the instance constructor <init>() method), the <clinit>() method does not need to call the parent class's <clinit>() method explicitly. The virtual machine guarantees that the parent class's <clinit>() method has finished before the subclass's <clinit>() method executes, so the first <clinit>() method executed in the virtual machine is always that of java.lang.Object. Because the parent class's <clinit>() method runs first, static blocks defined in the parent class take precedence over the subclass's variable assignments.

In the following example, the output is args = [2], because the static assignments of the parent class take place before those of the subclass.

public class Parent {
    public static  int a = 1;
    static {
        a=2;
    }
}
public class Son extends Parent{
      public static int b=a;
}
public class Main {
    public static void main(String[] args) {
        System.out.println("args = [" + Son.b + "]");
    }
}

The <clinit>() method is not mandatory. If a class has no static blocks and no assignments to class variables, the compiler need not generate a <clinit>() method for it.

Static blocks cannot be used in interfaces, but interfaces still have assignments that initialize variables, so both interfaces and classes can get a <clinit>() method. Unlike a class, however, executing an interface's <clinit>() method does not require executing the parent interface's <clinit>() method first. A parent interface is initialized only when a variable defined in it is used. Likewise, the implementation class of an interface does not execute the interface's <clinit>() method when it is initialized.

The virtual machine ensures that a class's <clinit>() method is properly locked and synchronized in a multithreaded environment. If multiple threads try to initialize a class at the same time, only one thread executes the class's <clinit>() method, and all the other threads block and wait until it has finished. This is the principle behind static singleton implementations.
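One common way to rely on this guarantee is the initialization-on-demand holder idiom. The following sketch uses hypothetical names (HolderSingletonDemo, Holder, CREATED); the counter exists only so the demo can show that the constructor runs once.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HolderSingletonDemo {
    // Counts constructor calls so the demo can show that only one happens.
    public static final AtomicInteger CREATED = new AtomicInteger();

    private HolderSingletonDemo() {
        CREATED.incrementAndGet();
    }

    // Holder is initialized on the first call to getInstance(); the virtual machine
    // serializes its <clinit>() method, so INSTANCE is created exactly once.
    private static class Holder {
        static final HolderSingletonDemo INSTANCE = new HolderSingletonDemo();
    }

    public static HolderSingletonDemo getInstance() {
        return Holder.INSTANCE;
    }

    public static int createdAfterConcurrentAccess(int threadCount) {
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(HolderSingletonDemo::getInstance);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println(createdAfterConcurrentAccess(8)); // 1
    }
}
```

No explicit lock appears in the code; the synchronization comes entirely from class initialization.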

  • Conclusion

    This article is based on the book Inside the Java Virtual Machine; readers who are interested can consult it for more detail.