
This article is excerpted from the third edition of “Deep Understanding of Java Virtual Machine” by Zhou Zhiming. The aim is to learn from a classic book by picking out the parts readers are likely to find interesting and reinforcing them.

Changing the output of compilation from native machine code to bytecode is a small step in the evolution of storage formats, but a giant leap in the evolution of programming languages.

1. Overview

In the previous chapter, we looked at the details of the Class file storage format. The information described in a Class file must eventually be loaded into the virtual machine before it can be run and used. This chapter explains how the virtual machine loads these Class files and what happens to the information in them once it enters the virtual machine.

The Java virtual machine loads the data describing a class from a Class file into memory, verifies it, transforms and resolves it, and initializes it, finally producing Java types that the virtual machine can use directly. This process is called the virtual machine's class loading mechanism. Unlike languages that must be linked at compile time, in Java the loading, linking, and initialization of types all happen while the program is running. This strategy makes ahead-of-time compilation harder for Java and adds a slight performance overhead to class loading, but it gives Java applications great extensibility and flexibility: Java's natural ability to be extended dynamically is built on dynamic loading and dynamic linking at run time. For example, an application written against an interface can wait until run time to choose its actual implementation class, and a user can have a local application load a binary stream from the network or elsewhere at run time, through Java's built-in or a custom class loader, as part of its program code. This kind of dynamic assembly is widely used in Java programs, from basic applets and JSP to the relatively complex OSGi technology, all of which are built on Java's runtime class loading.
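As a minimal sketch of the interface-oriented case described above, the following self-contained program picks its implementation class by name at run time and loads it with `Class.forName`. The names `DynamicLoadingDemo`, `Greeter`, `EnglishGreeter`, and `FrenchGreeter` are hypothetical, invented for illustration; in a real application the class name would come from configuration or a plugin directory rather than a hard-coded default.

```java
public class DynamicLoadingDemo {
    // The application is written against this interface only.
    public interface Greeter {
        String greet(String name);
    }

    // Hypothetical candidate implementations; which one runs is decided at run time.
    public static class EnglishGreeter implements Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }

    public static class FrenchGreeter implements Greeter {
        public String greet(String name) { return "Bonjour, " + name; }
    }

    // Loads the named class at run time through the caller's class loader and instantiates it.
    static Greeter load(String className) {
        try {
            Class<? extends Greeter> type = Class.forName(className).asSubclass(Greeter.class);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical: the name would normally come from a config file, a flag, the network, ...
        String className = args.length > 0
                ? args[0]
                : DynamicLoadingDemo.class.getName() + "$FrenchGreeter";
        System.out.println(load(className).greet("JVM"));
    }
}
```

Because `main` depends only on `Greeter`, a new implementation can be dropped onto the class path and selected by name without recompiling the caller, which is the flexibility that runtime loading and linking buy.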

To avoid ambiguity, the author sets out two conventions before the chapter formally begins:

First, in practice each Class file may represent either a class or an interface in the Java language. Wherever “type” is used below, it covers both classes and interfaces; where the two need to be distinguished, the author says so explicitly.

Second, following the convention set out earlier for the Class file format, a “Class file” in this chapter does not necessarily mean a file stored on disk. It means a stream of binary bytes in any form, including but not limited to a disk file, the network, a database, memory, or dynamically generated data.

2. Timing of class loading

From the moment a type is loaded into virtual machine memory until it is unloaded, its entire life cycle passes through seven stages: loading, verification, preparation, resolution, initialization, using, and unloading. Verification, preparation, and resolution are collectively called linking. The stages follow each other in this order: loading → verification → preparation → resolution → initialization → using → unloading.

Of these, loading, verification, preparation, initialization, and unloading occur in a fixed order: a type's loading process must start them in this sequence. Resolution is different: in some cases it can start after initialization, in order to support Java's runtime binding (also called dynamic binding or late binding). Note that I say each stage must “start” in order, not “proceed” or “finish” in order. I stress this because the stages usually overlap and interleave, and one stage typically invokes and activates another while it is running.
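The ordering rule above can be captured in a small toy model. This is only an illustrative sketch, not anything the specification or HotSpot defines: the enum `Phase` and the method `isValidStartOrder` are hypothetical names, and the model checks only the five fixed-order stages, ignoring where resolution and using fall.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ClassLifecycle {
    // The seven stages of a type's life cycle, declared in the order they normally start.
    public enum Phase {
        LOADING, VERIFICATION, PREPARATION, RESOLUTION, INITIALIZATION, USING, UNLOADING;

        // Verification, preparation and resolution are collectively called linking.
        public boolean isLinking() {
            return LINKING.contains(this);
        }

        // The five stages whose start order is fixed; resolution and using are not among them.
        public boolean hasFixedStartOrder() {
            return FIXED_ORDER.contains(this);
        }
    }

    static final Set<Phase> LINKING =
            EnumSet.of(Phase.VERIFICATION, Phase.PREPARATION, Phase.RESOLUTION);
    static final Set<Phase> FIXED_ORDER = EnumSet.of(Phase.LOADING, Phase.VERIFICATION,
            Phase.PREPARATION, Phase.INITIALIZATION, Phase.UNLOADING);

    // Checks a sequence of "stage started" events: the fixed-order stages must start in
    // declaration order; resolution is skipped, so it may start e.g. after initialization.
    public static boolean isValidStartOrder(List<Phase> starts) {
        int last = -1;
        for (Phase p : starts) {
            if (!p.hasFixedStartOrder()) {
                continue;
            }
            if (p.ordinal() < last) {
                return false;
            }
            last = p.ordinal();
        }
        return true;
    }

    public static void main(String[] args) {
        // Late binding: resolution starts after initialization, which the rule allows.
        System.out.println(isValidStartOrder(List.of(Phase.LOADING, Phase.VERIFICATION,
                Phase.PREPARATION, Phase.INITIALIZATION, Phase.RESOLUTION)));
        // Preparation may not start before verification.
        System.out.println(isValidStartOrder(List.of(Phase.LOADING, Phase.PREPARATION,
                Phase.VERIFICATION)));
    }
}
```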

The Java Virtual Machine Specification does not constrain when the first stage of class loading, “loading,” must begin; that is left to each virtual machine implementation. For the initialization stage, however, the specification strictly requires that a class be initialized immediately in exactly six situations (loading, verification, and preparation naturally have to begin before that):

  1. When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is executed and the type has not yet been initialized, its initialization must be triggered first. Typical Java code that produces these four instructions:
     • Instantiating an object with the new keyword.
     • Reading or setting a static field of a type (except static fields that are declared final and whose values were placed in the constant pool at compile time).
     • Calling a static method of a type.
  2. When a reflective call is made on a type through the methods of the java.lang.reflect package, initialization must be triggered if the type has not yet been initialized.
  3. When a class is initialized and its superclass has not yet been initialized, initialization of the superclass must be triggered first.
  4. When the virtual machine starts, the user must specify a main class to execute (the one containing the main() method), and the virtual machine initializes this main class first.
  5. When the dynamic language support added in JDK 7 is used: if a java.lang.invoke.MethodHandle instance finally resolves to a method handle of kind REF_getStatic, REF_putStatic, REF_invokeStatic, or REF_newInvokeSpecial, and the class corresponding to that handle has not yet been initialized, that class must be initialized first.
  6. When an interface declares a JDK 8 default method (an interface method marked with the default keyword) and a class implementing that interface is initialized, the interface must be initialized before that class.
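The triggers listed above can be observed directly by recording when each static initializer runs. This is a sketch under stated assumptions: all class names are hypothetical, each nested class stands in for one trigger, and the `LOG` list is just a convenient way to see the order. It also shows that `Class.forName(name, false, loader)` loads a class without initializing it, consistent with the specification leaving loading open but pinning down initialization. For the method handle case the sketch only relies on the class being initialized by the time the handle has been invoked; whether a given JVM already initializes it during `findStatic` is not something the code asserts.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

public class InitTriggers {
    // Records the order in which static initializers (<clinit>) actually run.
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent"); }
    }

    static class Child extends Parent {
        static { LOG.add("Child"); }
    }

    static class ByStaticField {
        static { LOG.add("ByStaticField"); }
        static int value = 1;
    }

    static class ByStaticMethod {
        static { LOG.add("ByStaticMethod"); }
        static int answer() { return 42; }
    }

    static class ByReflection {
        static { LOG.add("ByReflection"); }
    }

    static class ByMethodHandle {
        static { LOG.add("ByMethodHandle"); }
        static int answer() { return 7; }
    }

    static List<String> run() {
        try {
            new Child();                          // new: the superclass Parent is initialized before Child
            int value = ByStaticField.value;      // getstatic on a non-constant static field
            ByStaticMethod.answer();              // invokestatic

            // A class literal and forName(..., false, ...) load the class but do not initialize it.
            String name = ByReflection.class.getName();
            Class<?> loaded = Class.forName(name, false, InitTriggers.class.getClassLoader());
            LOG.add("loaded, not initialized: " + loaded.getSimpleName());
            Class.forName(name);                  // one-argument forName initializes the class

            // A REF_invokeStatic handle: the class is initialized at the latest on first invocation.
            MethodHandle mh = MethodHandles.lookup()
                    .findStatic(ByMethodHandle.class, "answer", MethodType.methodType(int.class));
            int result = (int) mh.invokeExact();
            return LOG;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```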

The Java Virtual Machine Specification uses the very strong qualifier “only” for these six scenarios that trigger initialization of a type, and the behavior in these six scenarios is called an **active reference** to the type. All other ways of referencing a type do not trigger initialization and are called **passive references**. The following three code listings illustrate passive references.

Listing 1 shows the first example of a passive reference

```java
// Each public class goes in its own file in package org.fenixsoft.classloading.
package org.fenixsoft.classloading;

// SuperClass.java
public class SuperClass {
    static {
        System.out.println("SuperClass init!");
    }
    public static int value = 123;
}

// SubClass.java
public class SubClass extends SuperClass {
    static {
        System.out.println("SubClass init!");
    }
}

// NonInitialization.java
public class NonInitialization {
    public static void main(String[] args) {
        System.out.println(SubClass.value);
    }
}
```

When this code runs, it prints only “SuperClass init!” and not “SubClass init!”. For a static field, only the class that directly declares the field is initialized, so referring to a static field declared in a parent class through a subclass triggers initialization of the parent class but not of the subclass. Whether the subclass is loaded and verified in this case is not specified by the Java Virtual Machine Specification and depends on the implementation. With the HotSpot virtual machine, the -XX:+TraceClassLoading option (on JDK 9 and later, -Xlog:class+load) shows that this operation does cause the subclass to be loaded.
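To check the behavior of Listing 1 in a single file, and to contrast it with a field the subclass declares itself, the sketch below records initializer runs in a list instead of printing them. `StaticFieldViaSubclass`, the field `own`, and the `LOG` list are hypothetical additions that are not part of the original listing.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticFieldViaSubclass {
    // Records the order in which static initializers run.
    static final List<String> LOG = new ArrayList<>();

    static class SuperClass {
        static { LOG.add("SuperClass init"); }
        public static int value = 123;
    }

    static class SubClass extends SuperClass {
        static { LOG.add("SubClass init"); }
        public static int own = 456; // hypothetical field declared by SubClass itself
    }

    static List<String> run() {
        // getstatic SubClass.value resolves to the field declared in SuperClass,
        // so only the declaring class SuperClass is initialized here.
        int inherited = SubClass.value;
        LOG.add("read value = " + inherited);
        // This field is declared by SubClass, so SubClass is initialized now.
        int own = SubClass.own;
        LOG.add("read own = " + own);
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
        // prints [SuperClass init, read value = 123, SubClass init, read own = 456]
    }
}
```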

Listing 2 shows the second example of a passive reference

```java
package org.fenixsoft.classloading;

public class NotInitialization {
    public static void main(String[] args) {
        // Reuses SuperClass from Listing 1.
        SuperClass[] sca = new SuperClass[10];
    }
}
```

To save space, this code reuses SuperClass from Listing 1. When it runs, it does not print “SuperClass init!”, which shows that the initialization phase of org.fenixsoft.classloading.SuperClass was not triggered. However, the code does trigger the initialization phase of another class named “[Lorg.fenixsoft.classloading.SuperClass”. This is not a valid type name in user code: it is a class generated automatically by the virtual machine that directly inherits from java.lang.Object, and its creation is triggered by the anewarray bytecode instruction.

This class represents a one-dimensional array whose element type is org.fenixsoft.classloading.SuperClass, and the array's properties and methods (of which users can directly use only the public length attribute and the clone() method) are implemented in this class. Array access in Java is safer than in C/C++ largely because this class wraps access to the array elements, whereas in C/C++ an array access translates directly into pointer arithmetic. In Java, an out-of-bounds access throws java.lang.ArrayIndexOutOfBoundsException, which prevents an illegal memory access outright.
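The following sketch checks the claims about arrays in one program. The class name `ArrayPassiveReference` and the `LOG` list are hypothetical. It shows that creating the array runs no initializer, that the array class is a VM-generated subclass of java.lang.Object whose name starts with “[L”, that length and clone() are available, and that an out-of-range index raises ArrayIndexOutOfBoundsException. The exact class name printed depends on the package the file is compiled in.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayPassiveReference {
    // Records static initializers that actually run.
    static final List<String> LOG = new ArrayList<>();

    static class SuperClass {
        static { LOG.add("SuperClass init"); }
    }

    public static void main(String[] args) {
        // anewarray creates the array class but does not initialize SuperClass.
        SuperClass[] sca = new SuperClass[10];
        System.out.println("initializers run: " + LOG); // prints "initializers run: []"

        // The array class is generated by the VM: name starts with "[L", superclass is Object.
        Class<?> arrayType = sca.getClass();
        System.out.println(arrayType.getName());
        System.out.println(arrayType.getSuperclass().getName());

        // The array-specific members user code can use: the length field and clone().
        SuperClass[] copy = sca.clone();
        System.out.println(sca.length + " " + copy.length);

        // Every element access is bounds-checked instead of touching arbitrary memory.
        try {
            sca[10] = null;
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught " + e.getClass().getName());
        }
    }
}
```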

Listing 3 shows the third example of a passive reference

```java
package org.fenixsoft.classloading;

/**
 * Constants are stored in the constant pool of the calling class at compile time.
 * The caller holds no direct reference to the class that defines the constant,
 * so initialization of that class is not triggered.
 */
public class ConstClass {
    static {
        System.out.println("ConstClass init!");
    }

    public static final String HELLOWORLD = "hello world";
}

// NonInitialization.java (same package)
public class NonInitialization {
    public static void main(String[] args) {
        System.out.println(ConstClass.HELLOWORLD);
    }
}
```

When this code runs, it does not print “ConstClass init!”. Although the Java source does refer to the constant HELLOWORLD of ConstClass, constant propagation at compile time stored the value “hello world” directly in the constant pool of the NonInitialization class, so later references from NonInitialization to ConstClass.HELLOWORLD are actually references to NonInitialization's own constant pool. In other words, NonInitialization's Class file contains no symbolic reference to ConstClass at all; once compiled into Class files, the two classes no longer have any connection.
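The difference between a compile-time constant and a final field whose value is only known at run time can be shown side by side. In this sketch, `ConstantPropagation`, `RUNTIME_VALUE`, and the `LOG` list are hypothetical; `String.valueOf(...)` is used only because a method call is not a constant expression, so javac cannot inline the value and has to emit getstatic.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantPropagation {
    // Records static initializers that actually run.
    static final List<String> LOG = new ArrayList<>();

    static class ConstClass {
        static { LOG.add("ConstClass init"); }
        // Compile-time constant: javac copies "hello world" into the caller's constant pool.
        public static final String HELLOWORLD = "hello world";
        // Also final, but a method call is not a constant expression, so it is read with getstatic.
        public static final String RUNTIME_VALUE = String.valueOf("hello runtime");
    }

    static List<String> run() {
        String inlined = ConstClass.HELLOWORLD;    // compiled to ldc "hello world": no initialization
        LOG.add("read " + inlined);
        String fetched = ConstClass.RUNTIME_VALUE; // getstatic: ConstClass is initialized first
        LOG.add("read " + fetched);
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
        // prints [read hello world, ConstClass init, read hello runtime]
    }
}
```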

Loading an interface differs slightly from loading a class, and a few points about interfaces deserve mention. Interfaces also go through initialization, just as classes do. The code above uses static blocks (“static{}”) to print initialization messages; interfaces cannot contain “static{}” blocks, but the compiler still generates a “<clinit>()” class constructor for an interface to initialize the member variables defined in it. The real difference between interfaces and classes lies in the third of the six initialization scenarios listed earlier: when a class is initialized, all of its superclasses must already have been initialized, but when an interface is initialized, its parent interfaces need not all be initialized. A parent interface is initialized only when it is actually used (for example, when a constant defined in it that is not a compile-time constant is referenced).
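The interface rules can be observed as well, with one workaround: since interfaces cannot contain static{} blocks, the sketch below uses a field initializer that calls a logging method, which forces the compiler to generate <clinit>() for the interface. All names are hypothetical. It shows three things: initializing a class does not initialize its superinterfaces, initializing an interface does not initialize its parent interface, and (the JDK 8 default method scenario from the list above) an interface that declares a default method is initialized before a class implementing it.

```java
import java.util.ArrayList;
import java.util.List;

public class InterfaceInitialization {
    // Records the order in which initializers run.
    static final List<String> LOG = new ArrayList<>();

    // A non-constant initializer, so the interface gets a <clinit>() that calls this.
    static String mark(String name) {
        LOG.add(name + " init");
        return name;
    }

    interface Parent {
        String P = mark("Parent");
    }

    interface Child extends Parent {
        String C = mark("Child");
    }

    interface WithDefault {
        String W = mark("WithDefault");
        default String hello() { return "hello"; }
    }

    static class PlainImpl implements Child {
        static { LOG.add("PlainImpl init"); }
    }

    static class DefaultImpl implements WithDefault {
        static { LOG.add("DefaultImpl init"); }
    }

    static List<String> run() {
        new PlainImpl();        // superinterfaces without default methods are not initialized
        LOG.add("PlainImpl created");
        String c = Child.C;     // initializes Child, but not its parent interface Parent
        LOG.add("read Child.C = " + c);
        new DefaultImpl();      // WithDefault declares a default method: initialized before DefaultImpl
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
        // prints [PlainImpl init, PlainImpl created, Child init, read Child.C = Child, WithDefault init, DefaultImpl init]
    }
}
```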

This article only goes as far as section 2. The next installment will continue with section 3 of the virtual machine class loading mechanism, the class loading process itself: loading, verification, preparation, resolution, and initialization.