This article was published on the official WeChat account BaronTalk.

In the last article we introduced the Class file structure. In this article we look at how the virtual machine loads classes.

After the compiler turns our source code into bytecode, that bytecode must be loaded into the virtual machine before it can run. The virtual machine loads the data that describes a class from the Class file into memory, verifies it, converts and resolves it, initializes it, and finally produces a Java type that the virtual machine can use directly. This is the virtual machine's class loading mechanism.

Unlike languages that link at compile time, Java loads, links, and initializes classes at runtime. This strategy adds some performance overhead to class loading, but it gives Java applications a high degree of flexibility. Java’s dynamically extensible language features rely on dynamic loading and dynamic linking at runtime.

For example, an interface-oriented application can wait until runtime to specify the actual implementation class. Users can also use Java’s predefined and custom class loaders to let a local application load a binary stream at runtime, from the network or elsewhere, as part of the program code.
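As a rough illustration of choosing the implementation class at runtime, here is a minimal sketch. The class name RuntimeBindingDemo and the helper newList are made up for this example; only Class.forName and the reflection calls are standard JDK API.

```java
import java.util.List;

class RuntimeBindingDemo {
    /** Creates a List whose implementation class is chosen by name at runtime. */
    @SuppressWarnings("unchecked")
    static List<String> newList(String implClassName) {
        try {
            // The class is located, loaded, and initialized only now, at runtime.
            Class<?> impl = Class.forName(implClassName);
            return (List<String>) impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + implClassName, e);
        }
    }

    public static void main(String[] args) {
        // The implementation name could come from a config file or the command line.
        String impl = args.length > 0 ? args[0] : "java.util.ArrayList";
        List<String> list = newList(impl);
        list.add("loaded at runtime");
        System.out.println(list.getClass().getName() + " " + list);
    }
}
```

The calling code depends only on the List interface; passing java.util.LinkedList instead of java.util.ArrayList swaps the implementation without recompiling.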

Class loading timing

The entire life cycle of a class, from being loaded into virtual machine memory to being unloaded from it, consists of seven phases: Loading, Verification, Preparation, Resolution, Initialization, Using, and Unloading. The three phases of verification, preparation, and resolution are collectively referred to as Linking. The sequence of these 7 phases is as follows:

The order of the loading, verification, preparation, initialization, and unloading phases is fixed, as the figure above shows, and the class loading process must begin in this order. The resolution phase is the exception: in some cases it can start after the initialization phase, which supports Java’s runtime binding (also called dynamic binding).

The virtual machine specification places no constraint on when the first phase of the class loading process, loading, must start. For the initialization phase, however, it strictly specifies five situations in which a class that has not yet been initialized must be initialized immediately (loading, verification, and preparation naturally have to start before that):

  1. One of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered;
  2. A method in the java.lang.reflect package is used to make a reflective call on the class;
  3. When a class is initialized and its parent class has not been initialized, initialization of the parent class must be triggered first;
  4. When the VM starts, the user specifies a main class to execute, and the VM initializes this class first;
  5. When JDK 1.7 dynamic language support is used, if the final resolution result of a java.lang.invoke.MethodHandle instance is a REF_getStatic, REF_putStatic, or REF_invokeStatic method handle, and the class that the method handle corresponds to has not been initialized.

These five scenarios are the only ones that trigger class initialization (“if and only if”), and the behavior in them is called an active reference to a class. Every other way of referring to a class does not trigger initialization and is called a passive reference. For example, the following are passive references:

  1. Referencing a static field of the parent class through a subclass does not initialize the subclass;
  2. Referencing a class through an array definition does not trigger initialization of the class;
  3. Constants are stored in the constant pool of the calling class at compile time. The calling class does not actually reference the class that defines the constant, so using the constant does not trigger initialization of that class.
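The three passive references above, plus one active reference for contrast, can be sketched as follows. All class names here (PassiveReferenceDemo, Parent, Child, Constants, Eager) are invented for the illustration; each static block simply records that its class was initialized.

```java
import java.util.ArrayList;
import java.util.List;

class PassiveReferenceDemo {
    // Records the classes whose static initializer has run, in order.
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static int value = 123;
        static { LOG.add("Parent"); }
    }

    static class Child extends Parent {
        static { LOG.add("Child"); }
    }

    static class Constants {
        static final String GREETING = "hello"; // compile-time constant
        static { LOG.add("Constants"); }
    }

    static class Eager {
        static { LOG.add("Eager"); }
    }

    static List<String> run() {
        int v = Child.value;           // passive for Child: only Parent, which declares the field, is initialized
        Child[] array = new Child[2];  // passive: creating an array does not initialize Child
        String s = Constants.GREETING; // passive: the constant was copied into this class at compile time
        new Eager();                   // active: the new instruction triggers initialization
        return new ArrayList<>(LOG);   // [Parent, Eager]
    }
}
```

Calling PassiveReferenceDemo.run() leaves Child and Constants uninitialized, which is why they never appear in the log.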

Class loading process

Loading

Loading here refers to one phase of the overall class loading process. During the loading phase, the virtual machine needs to do the following three things:

  1. Get the binary byte stream that defines a class by its fully qualified name;
  2. Convert the static storage structure represented by this byte stream into the runtime data structure of the method area;
  3. Generate a java.lang.Class object in memory that represents the class and serves as the access point to the class’s data in the method area.
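To see that obtaining the java.lang.Class object is separate from initializing the class, consider this small sketch. The names LoadingDemo, Lazy, and Eager are assumptions made for the example; the three-argument Class.forName is standard JDK API, and its second argument controls whether initialization is triggered.

```java
class LoadingDemo {
    static boolean lazyInitialized = false;
    static boolean eagerInitialized = false;

    static class Lazy {
        static { lazyInitialized = true; }
    }

    static class Eager {
        static { eagerInitialized = true; }
    }

    /** Returns {lazyInitialized, eagerInitialized} after obtaining both Class objects. */
    static boolean[] run() {
        ClassLoader loader = LoadingDemo.class.getClassLoader();
        try {
            // initialize = false: we get the Class object, but the static block does not run
            Class<?> lazy = Class.forName(Lazy.class.getName(), false, loader);
            // initialize = true: this call also triggers initialization
            Class<?> eager = Class.forName(Eager.class.getName(), true, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return new boolean[] { lazyInitialized, eagerInitialized }; // {false, true}
    }
}
```

The Class object for Lazy is fully usable for inspecting metadata even though the class itself has never been initialized.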

Verification

Verification is the first step of the linking phase. It ensures that the byte stream in the Class file meets the requirements of the current virtual machine and does not compromise the security of the virtual machine itself. The verification phase roughly consists of the following four stages:

  1. File format verification: the first stage verifies that the byte stream conforms to the Class file format specification and can be processed by the current version of the virtual machine. The checks mainly include: whether the stream starts with the magic number 0xCAFEBABE; whether the major and minor version numbers are within the range that the current VM can handle; whether the constant pool contains constants of unsupported types; whether any part of the Class file, or the file itself, has been truncated or has extra information appended; and so on.

  2. Metadata verification: the second stage performs semantic analysis on the information described by the bytecode to ensure that it conforms to the Java language specification. The checks include: whether the class has a parent class; whether the parent class is one that is not allowed to be inherited; if the class is not abstract, whether it implements all the methods required by its parent class or interfaces; whether fields and methods in the class conflict with those of the parent class; and so on.

  3. Bytecode verification: the third stage is the most complex part of the whole verification process. It uses data flow and control flow analysis to determine that the program semantics are legal and logical.

  4. Symbolic reference verification: the final stage takes place when the virtual machine converts symbolic references into direct references. That conversion happens in resolution, the third phase of linking. Symbolic reference verification can be seen as a matching check on information outside the class itself (the various symbolic references in the constant pool).
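The first two checks of file format verification (magic number and version range) can be approximated in a few lines. This is only a toy sketch, nowhere near what a real VM verifies; ClassHeaderCheck, looksLoadable, and the maxSupportedMajor parameter are hypothetical names. It relies on the Class file starting with a 4-byte magic number followed by a 2-byte minor and a 2-byte major version.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class ClassHeaderCheck {
    static final int MAGIC = 0xCAFEBABE;

    /**
     * Returns true if the bytes start with the Class file magic number and
     * the major version does not exceed maxSupportedMajor (e.g. 52 for Java 8).
     */
    static boolean looksLoadable(byte[] classBytes, int maxSupportedMajor) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != MAGIC) {
                return false;                   // not a Class file at all
            }
            int minor = in.readUnsignedShort(); // read to advance; not checked in this sketch
            int major = in.readUnsignedShort();
            return major <= maxSupportedMajor;
        } catch (IOException e) {
            return false;                       // includes EOFException for a truncated header
        }
    }
}
```

A real virtual machine goes on to check the constant pool, attribute lengths, and much more; this sketch stops after the first eight bytes.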

Preparation

The preparation phase formally allocates memory for class variables and sets their initial values. The memory used by these variables is allocated in the method area. Two easily confused points need to be emphasized here:

  • First, memory is allocated only for class variables (variables modified by static), not for instance variables, which are allocated on the Java heap along with the object when it is instantiated.

  • Second, the initial value referred to here is “normally” the zero value of the data type. Suppose a class variable is defined as public static int value = 123; then the initial value of value after the preparation phase is 0 rather than 123, because no Java method has been executed yet. The putstatic instruction that assigns 123 to value is stored in the class constructor <clinit>() method after the program is compiled, so the assignment is performed only during initialization.

The special case is a class field with a ConstantValue attribute in its field attribute table: in the preparation phase the variable is initialized to the value that the ConstantValue attribute specifies. For example, given public static final int value = 123; javac generates a ConstantValue attribute for value at compile time, and during the preparation phase the VM sets value to 123 according to that ConstantValue.
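The zero value left by the preparation phase can actually be observed from Java code, by reading a static field through a method before its own initializer has run. The sketch below uses made-up names (PreparationDemo, peekValue, peekConstant). One caveat: for the static final constant, javac inlines the value at compile time, so the 123 seen here is consistent with the ConstantValue behavior described above but does not by itself prove when the VM assigns it.

```java
class PreparationDemo {
    // These initializers run first; at that point `value` still holds the
    // zero value assigned during preparation.
    static int seenValue = peekValue();       // 0
    static int seenConstant = peekConstant(); // 123

    static int value = 123;                   // assigned later, in textual order
    static final int CONSTANT = 123;          // compile-time constant

    static int peekValue() {
        return value;
    }

    static int peekConstant() {
        return CONSTANT;
    }
}
```

After initialization completes, PreparationDemo.value is 123 as expected, while seenValue keeps the 0 it captured earlier.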

Resolution

The resolution phase is the process by which the virtual machine replaces symbolic references in the constant pool with direct references. We have mentioned symbolic references and direct references many times, but what exactly are they?

  • Symbolic reference: a set of symbols that describes the referenced target. The symbols can be literals in any form, as long as they locate the target unambiguously.

  • Direct reference: a pointer directly to the target, a relative offset, or a handle that indirectly locates the target.

Initialization

The initialization phase is the last step of the class loading process. In the earlier phases, apart from the loading phase, in which the user application can take part through a custom class loader, everything is driven and controlled entirely by the virtual machine. Only in the initialization phase does the Java program code defined in the class actually start to execute. The initialization phase is the process of executing the class constructor <clinit>() method.
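To make the initialization phase concrete, the sketch below logs the order in which static initializers run. It relies on two facts from the Java language specification that the text above does not spell out: static variable initializers and static blocks execute in textual order, and a parent class is initialized before its subclass. ClinitOrderDemo, Base, and Derived are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class ClinitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Base {
        static int a = log("Base.a");
        static { log("Base static block"); }
    }

    static class Derived extends Base {
        static int b = log("Derived.b");
        static { log("Derived static block"); }
        static int c = log("Derived.c");
    }

    // Appends an entry and returns the current log size as a dummy field value.
    static int log(String entry) {
        LOG.add(entry);
        return LOG.size();
    }

    static List<String> run() {
        new Derived(); // active reference: initializes Base first, then Derived
        return new ArrayList<>(LOG);
        // [Base.a, Base static block, Derived.b, Derived static block, Derived.c]
    }
}
```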

Class loader

The virtual machine design team deliberately placed the action of “obtaining the binary byte stream that describes a class from its fully qualified name” outside the Java virtual machine, so that the application itself can decide how to obtain the classes it needs. The code module that implements this action is called a class loader.

Classes and class loaders

For any class, its uniqueness within the Java virtual machine is established jointly by the class loader that loads it and the class itself; each class loader has its own class namespace. In other words, comparing two classes for “equality” is meaningful only if they were loaded by the same class loader. Otherwise, even if two classes come from the same Class file and are loaded by the same virtual machine, they are necessarily different classes as long as different class loaders loaded them.
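The following sketch shows two Class objects with the same name that are nevertheless different classes. It assumes the compiled .class files are available as resources through the loader that loaded the demo (true for an ordinary class path, not necessarily for every environment). ClassIdentityDemo, Payload, and IsolatingLoader are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassIdentityDemo {
    static class Payload { }

    /** Defines classes itself from the same Class files instead of sharing the application loader's copy. */
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader() {
            super(null); // parent is the bootstrap loader, which cannot find Payload
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            ClassLoader source = ClassIdentityDemo.class.getClassLoader();
            try (InputStream in = source.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                byte[] bytes = out.toByteArray();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** True if the second loader yields a class with the same name but a different identity. */
    static boolean sameNameDifferentClass() {
        try {
            Class<?> original = Payload.class;
            Class<?> copy = new IsolatingLoader().loadClass(original.getName());
            return original.getName().equals(copy.getName())
                    && original != copy
                    && original.getClassLoader() != copy.getClassLoader();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both Class objects come from the same Class file, yet an instance of the copy would not pass an instanceof Payload check in the calling code.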

Parent delegation model

From the perspective of the Java virtual machine, there are only two kinds of class loaders: one is the Bootstrap ClassLoader, which is implemented in C++ and is part of the virtual machine itself; the other comprises all other class loaders, which are implemented in Java, exist outside the virtual machine, and all inherit from the abstract class java.lang.ClassLoader.

From a Java developer’s perspective, class loaders can be divided into:

  • Bootstrap ClassLoader: this class loader is responsible for loading the libraries stored in the \lib directory into virtual machine memory. The bootstrap class loader cannot be referenced directly by Java programs. If a custom class loader needs to delegate a loading request to the bootstrap class loader, it can use null instead.

  • Extension ClassLoader: this class loader, implemented by sun.misc.Launcher$ExtClassLoader, is responsible for loading all libraries in the \lib\ext directory or in the path specified by the java.ext.dirs system variable. Developers can use the extension class loader directly.

  • Application ClassLoader: this class loader is implemented by sun.misc.Launcher$AppClassLoader. It is the return value of the getSystemClassLoader() method in ClassLoader, so it is also known as the system class loader. It is responsible for loading the libraries specified on the user’s ClassPath. Developers can use this class loader directly, and if the application has not defined its own class loader, this is usually the application’s default class loader.

All of our applications are loaded by these three kinds of class loaders working together, and we can add our own class loaders if necessary. Their relationship is shown below:

The hierarchy shown in the figure above is called the Parents Delegation Model of class loaders. The model requires every class loader except the top-level bootstrap class loader to have its own parent class loader.

The parent delegation model works like this: when a class loader receives a class loading request, it does not try to load the class itself first. Instead, it delegates the request to its parent class loader. This holds at every level, so all loading requests are eventually passed up to the top-level bootstrap class loader. Only when the parent class loader reports that it cannot complete the request (it did not find the class in its search scope) does the child loader attempt to load the class itself.

The advantage is that Java classes acquire a priority hierarchy along with their class loaders. For example, java.lang.Object is stored in rt.jar; whichever class loader is asked to load it, the request is ultimately delegated to the bootstrap class loader at the top of the model, so Object is the same class in every class loader environment of the program. Conversely, without the parent delegation model, if each class loader loaded classes on its own and a user wrote a class named java.lang.Object and placed it on the program’s ClassPath, multiple different Object classes would appear in the system, and the most basic behavior of the Java type system could not be guaranteed.
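A quick way to look at the hierarchy on your own machine is to walk getParent() upward from the system class loader. This is a small sketch with a made-up class name (LoaderChainDemo); the concrete loader class names it prints depend on the JDK version, so no expected output is given here.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChainDemo {
    /** Walks getParent() from the given loader up to the bootstrap loader, which Java code sees as null. */
    static List<String> chainOf(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        names.add("bootstrap (null)");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(chainOf(ClassLoader.getSystemClassLoader()));
        // java.lang.Object is loaded by the bootstrap loader, so this prints null.
        System.out.println(Object.class.getClassLoader());
    }
}
```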

The parent delegation model is very important for the stability of Java programs, yet its implementation is simple. The code that implements it is concentrated in the loadClass() method of java.lang.ClassLoader, and the logic is clear: first check whether the class has already been loaded. If not, call the parent loader’s loadClass() method; if the parent loader is null, use the bootstrap class loader as the parent by default. If the parent fails to load the class and throws a ClassNotFoundException, call the loader’s own findClass() method to load it.

protected Class<?> loadClass(String name, boolean resolve)
        throws ClassNotFoundException {
    // First, check whether the requested class has already been loaded
    Class<?> c = findLoadedClass(name);
    if (c == null) {
        try {
            if (parent != null) {
                c = parent.loadClass(name, false);
            } else {
                c = findBootstrapClassOrNull(name);
            }
        } catch (ClassNotFoundException e) {
            // The parent loader threw ClassNotFoundException,
            // so it cannot complete the loading request
        }

        if (c == null) {
            // The parent loader could not load the class,
            // so call this loader's own findClass method
            c = findClass(name);
        }
    }
    if (resolve) {
        resolveClass(c);
    }
    return c;
}
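To watch this delegation from the outside, we can write a loader that only overrides findClass() and records when it is reached. This is a sketch under stated assumptions: DelegationDemo and RecordingLoader are invented names, and demo.missing.NoSuchClass is a hypothetical class name assumed not to exist on the class path.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {
    /** A loader that cannot load anything itself; it only records findClass() calls. */
    static class RecordingLoader extends ClassLoader {
        final List<String> findClassCalls = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            findClassCalls.add(name); // reached only after the parent chain has failed
            throw new ClassNotFoundException(name);
        }
    }

    static List<String> run() {
        RecordingLoader loader = new RecordingLoader(ClassLoader.getSystemClassLoader());
        try {
            // Served by the parent chain (ultimately the bootstrap loader): findClass is not called.
            loader.loadClass("java.lang.String");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        try {
            // No parent can find this name, so loadClass falls back to our findClass.
            loader.loadClass("demo.missing.NoSuchClass");
        } catch (ClassNotFoundException expected) {
            // expected: our findClass gives up as well
        }
        return loader.findClassCalls; // [demo.missing.NoSuchClass]
    }
}
```

Overriding findClass() rather than loadClass() keeps the inherited delegation logic intact, so a custom loader like this one is consulted only for classes its parents cannot supply.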

With the Class file structure and class loading covered in two consecutive articles, the next article will look at the virtual machine’s bytecode execution engine.

References:

  • Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices (2nd edition)

If you like my articles, follow my WeChat official account BaronTalk or my Zhihu column, or give me a star on GitHub.

  • WeChat official account: BaronTalk
  • Zhihu column: zhuanlan.zhihu.com/baron
  • GitHub: github.com/BaronZ88
  • Personal blog: Baronzhang.com