Learning the JVM from Scratch series:

Learning the JVM from Scratch (Preface)

Learning the JVM from Scratch (Preface): Why Delay?

Learning the JVM from Scratch (1): The Java Memory Model (JMM) Is Not the Java Virtual Machine (JVM) Memory Model!

Learning the JVM from Scratch (2): Why Does Java Need a JVM (Java Virtual Machine)?

Learning the JVM from Scratch (3): Classloaders (Part 1)

Learning the JVM from Scratch (4): Classloaders (Part 2)

Learning the JVM from Scratch (5): Classloaders (Part 3)

Learning the JVM from Scratch (6): The Relationship between Threads and the JVM

Learning the JVM from Scratch (7): Runtime Data Areas (Part 1)

Learning the JVM from Scratch (8): Runtime Data Areas (Part 2)

Learning the JVM from Scratch (9): Garbage Collection (Part 1)

Learning the JVM from Scratch (10): Garbage Collection (Part 2)

Learning the JVM from Scratch (11): Garbage Collection (Part 3)

Preface

As we now know, the JVM is the foundation of the Java language: it is what Java programs run on, and it is a prerequisite for implementing Java's features.

But what exactly does the JVM do? We still don't know, so we need to take a closer look at it. That is the main purpose of this series of articles.

Let's start with a picture:

[Figure: the major components of the JVM]

As the diagram shows, the class bytecode files compiled from our Java program are loaded into the JVM by the JVM's class loader. So how does that loading process work?

Although the topic of this article is class loaders, a class loader only plays its part at one step of the class loading process. That does not mean we can skip the rest: the class loading process as a whole is very important, so we first need a general understanding of it.

The loading flow of the class

Before we look at class loaders, let’s look at the overall flow of class loading.

The life of a Java class begins when it is loaded into virtual machine memory and ends when it is unloaded. Its entire life cycle goes through seven stages: Loading, Verification, Preparation, Resolution, Initialization, Using, and Unloading. The three stages of verification, preparation, and resolution are collectively known as Linking.

[Figure: the life cycle of a class]

Generally speaking, "the class loading process" in a Java virtual machine refers to the following five stages:

  1. Loading
  2. Verification
  3. Preparation
  4. Resolution
  5. Initialization

Unlike languages that are linked at compile time, Java loads, links (verification, preparation, and resolution), and initializes its types while the program is running. This strategy makes ahead-of-time compilation harder and adds a little run-time overhead to class loading, but it gives Java applications great extensibility and flexibility. Java's dynamically extensible nature relies on dynamic loading and dynamic linking at run time.

The five stages of loading, verification, preparation, initialization, and unloading always begin in a fixed order, and the class loading process must start them step by step in that order. The resolution stage is different: in some cases it can start after initialization, which is what supports Java's runtime binding (also called dynamic binding or late binding). Note that I say these stages "start" in order, not that they "proceed" or "finish" in order. The stages are usually interleaved, with one stage invoking or activating another while it is still executing.

For example, an application written against interfaces can wait until run time to choose the actual implementation class, and a local application can use Java's built-in or custom class loaders to load a binary stream at run time, from the network or elsewhere, as part of its program code.
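The first half of that idea can be sketched in a few lines. This is only an illustration: the class name `RuntimeBinding` and the helper `newList` are made up for this example, and the implementation name is passed as a plain string where a real program might read it from configuration.

```java
import java.util.List;

class RuntimeBinding {
    // Create a List implementation whose class name is only known at run time.
    @SuppressWarnings("unchecked")
    static List<String> newList(String implClassName) throws ReflectiveOperationException {
        // Class.forName asks the class loader for the class; it is loaded,
        // linked and initialized here if that has not happened yet.
        Class<?> impl = Class.forName(implClassName);
        return (List<String>) impl.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // The name could just as well come from a config file.
        String name = args.length > 0 ? args[0] : "java.util.LinkedList";
        List<String> list = newList(name);
        list.add("hello");
        System.out.println(list.getClass().getName() + " -> " + list);
    }
}
```

The calling code only knows the `List` interface; which class actually gets loaded is decided while the program runs.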

One thing to note: of these stages, loading is the one that developers can take part in directly, by writing their own class loaders.

Next, let’s take a look at each part of the Java class loading process.

1. Class loading process: loading

The loading stage is where the class loader does most of its work: it finds the bytecode file for a class by the class's fully qualified name and uses that bytecode to create a Class object. At this stage, the JVM needs to do three things:

  1. Obtain the binary byte stream that defines the class, using its fully qualified name.
  2. Convert the static storage structure represented by this byte stream into a runtime data structure in the method area.
  3. Generate a java.lang.Class object in memory that represents the class and acts as the access point for the class's various data in the method area.
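The third point is the part we can see from Java code. As a small sketch (the class `ClassObjectDemo` and its `describe` helper are names invented for this example), the Class object lets us query a type's name, superclass, members, and loader:

```java
class ClassObjectDemo {
    // Summarize what the Class object exposes about a loaded type.
    static String describe(Class<?> type) {
        Class<?> parent = type.getSuperclass();
        ClassLoader loader = type.getClassLoader();
        return type.getName()
                + " extends " + (parent == null ? "(nothing)" : parent.getName())
                + ", " + type.getDeclaredMethods().length + " declared methods"
                // getClassLoader() returns null for classes loaded by the bootstrap loader.
                + ", loaded by " + (loader == null ? "bootstrap loader" : loader.toString());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println(describe(Class.forName("java.util.ArrayList")));
        System.out.println(describe(ClassObjectDemo.class));
    }
}
```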

Class loading is, of course, done by class loaders, and these are usually provided by the JVM. Compared with the other stages of the class loading process, though, the loading stage of a non-array type (more precisely, the action of obtaining the class's binary byte stream during loading) is the stage that developers can control the most.

What’s special about array types?

Array classes: an array class is not created by a class loader. The Java virtual machine constructs it dynamically, directly in memory. Array classes are still closely tied to class loaders, however, because the element type of an array (the type left after all dimensions are removed) is ultimately loaded by a class loader.
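This relationship is visible through reflection. The sketch below uses made-up names (`ArrayClassDemo`, `elementType`, `sharesLoaderWithElement`) and relies on the documented behavior that `getClassLoader()` on an array class reports the loader of its element type:

```java
class ArrayClassDemo {
    // Strip all dimensions from an array class to get its element type.
    static Class<?> elementType(Class<?> arrayClass) {
        Class<?> t = arrayClass;
        while (t.isArray()) {
            t = t.getComponentType();
        }
        return t;
    }

    // An array class reports the same class loader as its element type.
    static boolean sharesLoaderWithElement(Class<?> arrayClass) {
        return arrayClass.getClassLoader() == elementType(arrayClass).getClassLoader();
    }

    public static void main(String[] args) {
        Class<?> arrayClass = ArrayClassDemo[][].class;
        System.out.println("array class:  " + arrayClass.getName());
        System.out.println("element type: " + elementType(arrayClass).getName());
        System.out.println("same loader:  " + sharesLoaderWithElement(arrayClass));
    }
}
```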

The loading stage can be carried out either by the bootstrap class loader built into the Java virtual machine or by a user-defined class loader. Developers can define their own class loader to control how the byte stream is obtained, by overriding the class loader's findClass() or loadClass() method. This gives an application the flexibility to obtain its running code dynamically.
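A minimal sketch of such a loader follows. `InMemoryClassLoader` is a hypothetical class written for this article, and the idea of keeping class bytes in a map is an assumption made to keep the example short; a real loader might read them from disk, a network, or an encrypted archive.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader that serves classes from byte arrays held in memory.
class InMemoryClassLoader extends ClassLoader {
    private final Map<String, byte[]> classBytes;

    InMemoryClassLoader(Map<String, byte[]> classBytes, ClassLoader parent) {
        super(parent);
        this.classBytes = new HashMap<>(classBytes);
    }

    // The inherited loadClass() asks the parent loader first and calls
    // findClass() only when the parent cannot find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = classBytes.get(name);
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        // defineClass hands the byte stream to the JVM, which turns it into a Class object.
        return defineClass(name, bytes, 0, bytes.length);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = new InMemoryClassLoader(
                new HashMap<String, byte[]>(), ClassLoader.getSystemClassLoader());
        // Found by a parent loader, so findClass() is never consulted.
        System.out.println(loader.loadClass("java.lang.String") == String.class);
        try {
            loader.loadClass("demo.Missing"); // unknown everywhere, so findClass() throws
        } catch (ClassNotFoundException e) {
            System.out.println("not found: " + e.getMessage());
        }
    }
}
```

Overriding findClass() rather than loadClass() keeps the usual parent-first lookup intact, which is why the example takes that route.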

The Java Virtual Machine Specification deliberately leaves this stage loosely specified, so virtual machine implementers have plenty of room to experiment during loading, and users can define their own class loaders. Much of the Java technology we take for granted is built on this.


The binary data of a class can be loaded from different sources using different class loaders.

  1. Read from a ZIP archive. This has been common from early on and became the basis of the JAR, EAR, and WAR formats.
  2. Load a class file from the local file system, which is how most programs load classes.
  3. Load class files over the network.
  4. Read from an encrypted file. This is a typical protection against decompiling class files: the class file is decrypted during loading so that the program's logic is protected from snooping.
  5. Generate at run time. The most common scenario is dynamic proxies: java.lang.reflect.Proxy uses ProxyGenerator.generateProxyClass() to generate the binary byte stream of a proxy class of the form "*$Proxy" for the given interfaces.
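The last source in the list is easy to try out. The sketch below uses the public `java.lang.reflect.Proxy` API; the `Greeter` interface and the `ProxyDemo` class are invented for the example, and the exact name of the generated class is left to the JDK.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class ProxyDemo {
    interface Greeter {
        String greet(String name);
    }

    // Ask the JDK to generate, at run time, a class that implements Greeter.
    static Greeter newGreeter() {
        InvocationHandler handler = (proxy, method, methodArgs) ->
                "Hello, " + methodArgs[0] + " (via " + method.getName() + ")";
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] { Greeter.class }, handler);
    }

    public static void main(String[] args) {
        Greeter greeter = newGreeter();
        System.out.println(greeter.greet("JVM"));
        // No .class file on disk corresponds to this class; its bytes were generated in memory.
        System.out.println(greeter.getClass().getName());
        System.out.println(Proxy.isProxyClass(greeter.getClass()));
    }
}
```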

After the loading phase, the binary byte streams outside the Java virtual machine are stored in the method area in the format specified by the virtual machine.

Note that parts of the loading stage and parts of the linking stage (for example, some of the bytecode file format checks) are interleaved: linking, which covers verification, preparation, and resolution, may begin before loading has finished. However, the actions that take place in the middle of loading still belong to the linking stage, and the two stages still start in a fixed order.

2. Class loading process: verification

First of all, we know that Java is a relatively safe language compared with C and C++. In pure Java code there are some things we simply cannot do, such as:

  1. Access data outside the bounds of an array.
  2. Cast an object to a type that it does not implement.
  3. Jump to a line of code that does not exist.

If we try to do any of these, the compiler refuses to compile the code or an exception is thrown without hesitation.
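For the first two items, the refusal typically shows up as an exception at run time. A small sketch (the class and method names are made up for the example):

```java
class SafetyChecks {
    // Reading past the end of an array is caught instead of touching other memory.
    static String readPastEnd() {
        int[] data = new int[3];
        int index = 5;
        try {
            return "read " + data[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            return "rejected: " + e.getClass().getSimpleName();
        }
    }

    // Casting an object to a type it does not implement is caught as well.
    static String badCast() {
        Object value = "text";
        try {
            Integer number = (Integer) value;
            return "cast ok: " + number;
        } catch (ClassCastException e) {
            return "rejected: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(readPastEnd());
        System.out.println(badCast());
    }
}
```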

But as we all know, what the JVM actually runs is the bytecode file compiled from the Java program. If someone uses a bytecode editing tool to modify that bytecode so that it does some of the things listed above, and the JVM loads it without any verification, the program may go badly wrong or even crash. The possibility of such an attack is why the JVM must have a verification stage.

The simple conclusion, therefore, is that the purpose of verification is to ensure that the byte stream in a ".class" file fully complies with the JVM's specification, does not endanger the JVM's own security, and cannot be used for malicious intrusion.

So what is done to ensure compliance with the JVM’s specifications during validation?

  1. File format verification: ensures that the input byte stream can be parsed correctly and stored in the method area, and that its format matches what is needed to describe a Java type.

  2. Metadata verification: performs semantic analysis of the information described by the bytecode to check that it conforms to the requirements of the Java language specification.

  3. Bytecode verification: the most important verification step. It analyzes data flow and control flow to determine that the program's semantics are legal and logical. Coming after metadata verification, it checks the method bodies to ensure that the class's methods do nothing harmful at run time.

  4. Symbolic reference verification: takes place when symbolic references are converted into direct references, so it extends into the resolution stage. It mainly checks accessibility and similar properties of what is being referenced, to ensure that each reference can actually be accessed and that no "class not accessible" problems occur.
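To make the first step a little more concrete, here is a sketch of one of the simplest file format checks: every class file starts with the magic number 0xCAFEBABE, followed by the minor and major version numbers. The class `ClassFileHeader` is a made-up helper for this article, not part of any JDK API, and a real verifier checks far more than this.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassFileHeader {
    final int magic;
    final int minor;
    final int major;

    ClassFileHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    // Read the first eight bytes of a class file: u4 magic, u2 minor_version, u2 major_version.
    static ClassFileHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        return new ClassFileHeader(magic, minor, major);
    }

    boolean hasValidMagic() {
        return magic == 0xCAFEBABE;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                System.out.println("class bytes are not available as a resource here");
                return;
            }
            ClassFileHeader header = read(in);
            System.out.printf("magic=0x%08X valid=%b version=%d.%d%n",
                    header.magic, header.hasValidMagic(), header.major, header.minor);
        }
    }
}
```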

3. Class loading process: Preparation

The preparation stage is where memory is allocated for the variables defined in a class and their initial values are set.

Note that only class variables (static variables) are allocated here, not instance variables.

The initial value at this stage is normally the zero value for the data type.

The zero values of the basic data types are as follows:

Data type    Zero value
int          0
long         0L
short        (short) 0
char         '\u0000'
byte         (byte) 0
boolean      false
float        0.0f
double       0.0d
reference    null
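These values are easy to observe: a static field that is declared without an initializer is never assigned during initialization, so it keeps its zero value. A small sketch (the class name `ZeroValues` is made up):

```java
class ZeroValues {
    // None of these fields has an initializer, so each keeps its zero value.
    static int i;
    static long l;
    static short s;
    static char c;
    static byte b;
    static boolean z;
    static float f;
    static double d;
    static Object ref;

    public static void main(String[] args) {
        System.out.println("int:       " + i);
        System.out.println("long:      " + l);
        System.out.println("short:     " + s);
        System.out.println("char:      " + (int) c); // print the code point; '\u0000' itself is not printable
        System.out.println("byte:      " + b);
        System.out.println("boolean:   " + z);
        System.out.println("float:     " + f);
        System.out.println("double:    " + d);
        System.out.println("reference: " + ref);
    }
}
```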

So, is it possible that the initial value is not zero? There is.

Take a look at this code:

 private static int id = 123456789;

From what we said above, in the preparation stage the static variable id gets an initial value of zero, which is fairly obvious.

Now let's look at a second piece of code:

 private static final int id = 123456789;

This code differs from the previous one only by the added final, so what is the initial value of the variable this time?

Here, the initial value of id is 123456789.

The reason is that, because of final, the compiler generates a ConstantValue attribute for id in the field table of the class file when the Java program is compiled to bytecode.

When the virtual machine encounters this attribute during preparation, it sets the initial value of id to 123456789.

Note that although static variables declared final do get this initial value here, describing the rule as "static final variables" would be too narrow and possibly wrong. What actually matters is the ConstantValue attribute.

The initial value is set during preparation whenever a variable has the ConstantValue attribute in its field table entry after compilation.
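The difference between the two declarations can be made visible with a small trick: read the fields from initializers that run before the fields' own assignments. This is only a sketch with made-up names. Also note that javac inlines compile-time constants at their use sites in addition to recording them in a ConstantValue attribute, so the second read below never touches the field at all.

```java
class PreparationDemo {
    // Static initializers run in textual order, so these two reads happen
    // before the assignments to the fields declared further down.
    static final int plainSeenEarly = readPlain();
    static final int constSeenEarly = readConst();

    static int plain = 123456789;       // zero after preparation, assigned during initialization
    static final int CONST = 123456789; // compile-time constant, recorded as a ConstantValue

    static int readPlain() {
        return plain;
    }

    static int readConst() {
        return CONST;
    }

    public static void main(String[] args) {
        System.out.println("plain seen early: " + plainSeenEarly);
        System.out.println("CONST seen early: " + constSeenEarly);
        System.out.println("plain now:        " + plain);
    }
}
```

The early read of `plain` sees 0, the value left by preparation, while `CONST` is never observed as anything but 123456789.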

4. Class loading process: resolution

First, the definition of this stage: resolution is the process by which the Java virtual machine replaces the symbolic references in the constant pool with direct references.

This introduces a new concept, the symbolic reference. So what is a symbolic reference?

When Java code is compiled into class bytecode, the compiler does not know the actual address of a referenced class. By actual address I mean the exact location of the class in JVM memory.

Compilation still has to go ahead, though, so the class file needs some set of symbols with which to identify the target. Java therefore uses symbolic references. A symbolic reference can be any form of literal, as long as it identifies the target accurately and unambiguously.

A direct reference, on the other hand, is a pointer that points directly at the target, a relative offset, or a handle that can locate the target indirectly.
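Resolution itself happens inside the JVM and cannot be called from Java code, but the method handle API offers a loose analogy: we describe a method purely by owner class, name, and type, and the lookup turns that description into something directly invocable or fails if the member does not exist. The class `ResolutionAnalogy` and its methods are made up for this sketch, and the comparison is an illustration only, not how the JVM implements resolution.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ResolutionAnalogy {
    // "Symbolic" side: owner class, member name, and descriptor, nothing more.
    static boolean resolves(String methodName) {
        try {
            MethodHandles.lookup().findVirtual(
                    String.class, methodName, MethodType.methodType(int.class));
            return true;
        } catch (NoSuchMethodException | IllegalAccessException e) {
            // Comparable to a resolution failure for a missing or inaccessible member.
            return false;
        }
    }

    // "Direct" side: once looked up, the handle invokes the target without another name lookup.
    static int lengthViaHandle(String s) {
        try {
            MethodHandle length = MethodHandles.lookup().findVirtual(
                    String.class, "length", MethodType.methodType(int.class));
            return (int) length.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolves("length"));
        System.out.println(resolves("noSuchMethod"));
        System.out.println(lengthViaHandle("abc"));
    }
}
```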

The Java Virtual Machine Specification does not say exactly when resolution happens. It only requires that a symbolic reference be resolved before any of the following bytecode instructions that operate on symbolic references is executed:

 anewarray checkcast getfield getstatic instanceof invokedynamic
 invokeinterface invokespecial invokestatic invokevirtual ldc ldc_w
 ldc2_w multianewarray new putfield putstatic

A virtual machine implementation can therefore decide for itself whether to resolve the symbolic references in the constant pool as soon as the class is loaded, or to wait until a symbolic reference is about to be used.

In the resolution stage, the following seven kinds of symbolic references are resolved:

  1. Class or interface
  2. Field
  3. Class method
  4. Interface method
  5. Method type
  6. Method handle
  7. Call site specifier

5. Class loading process: initialization

Initialization is the last step of the class loading process. In the earlier stages, apart from loading, where the user can take part through a custom class loader, everything is controlled entirely by the Java virtual machine.

Only in the initialization stage does the Java virtual machine actually begin to execute the Java code written in the class, handing control over to the application.

So what does initialization do? Let's look again at the code we showed earlier.

 private static int id = 123456789;

In the preparation stage, the static variable id is given its initial value, which is normally the zero value for its data type, i.e. 0. Then, in the initialization stage, id is assigned 123456789.

Of course, the initialization phase does more than that. The main work in this phase is:

  1. Assign the values defined in the code to class variables (static variables).
  2. Execute static code blocks. A static block can only read static variables declared before it. A static variable declared after the block can be assigned inside the block, but cannot be read there.
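Both points can be seen in a short sketch. The class `InitOrderDemo` and its `record` helper are made up for the example; the helper simply logs the order in which the initializers run.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static int first = record("first initializer", 1);

    static {
        record("static block", 0);
        later = 42; // assigning a variable declared further down is allowed
        // System.out.println(later); // reading it here would not compile: illegal forward reference
    }

    // Runs after the static block, so it overwrites the 42 assigned above.
    static int later = record("later initializer", 7);

    static int record(String step, int value) {
        LOG.add(step);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(LOG);
        System.out.println("later = " + later);
    }
}
```

Variable initializers and static blocks run in the order they appear in the source, which is why `later` ends up as 7 rather than 42.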

It is important to note that the Java virtual machine must ensure that class initialization is properly locked and synchronized in a multithreaded environment.

If multiple threads try to initialize the same class at the same time, only one of them performs the initialization, and all the other threads block and wait until that thread has finished.

Of course, if the class's initialization code contains a very long-running operation, the other waiting threads may be blocked for a long time. In practice this kind of blocking is often well hidden, which is why I mention it here.

In fact, even leaving multithreading aside, a type is initialized only once under the same class loader.
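The once-only guarantee can be checked with a small experiment. In this sketch (all names are made up), several threads race to use a nested class for the first time, and a counter records how often its static block runs:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class InitOnceDemo {
    static final AtomicInteger INIT_RUNS = new AtomicInteger();

    static class Lazy {
        static {
            INIT_RUNS.incrementAndGet(); // counts executions of the static initializer
        }

        static void touch() {
            // Calling a static method is an active use and triggers initialization.
        }
    }

    // Start several threads that all try to use Lazy at the same moment.
    static int raceToInitialize(int threadCount) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                Lazy.touch();
            });
            threads[i].start();
        }
        start.countDown(); // release all threads together
        for (Thread t : threads) {
            t.join();
        }
        return INIT_RUNS.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("static block ran " + raceToInitialize(8) + " time(s)");
    }
}
```

However many threads take part, the static block runs exactly once; the other threads simply wait for it to finish.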

Conclusion

This article has only covered the class loading process. It has not yet explained what a class loader is or how the class loader mechanism works.

However, to understand class loaders, or the loading of Java classes in general, we first need a clear picture of what this article describes. Once the class loading process is clear, the role of class loaders becomes clear too.

(PS: As we now know, class loaders are used in the "loading" step of the class loading process.)

Writing this article was tiring. It was hard to find the most suitable angle and way of presenting the material among the many existing articles and books.

That's it. The next article, "Learning the JVM from Scratch (4): Classloaders (Part 2)", will be published as soon as possible.

Reference books

Understanding the Java Virtual Machine by Zhiming Zhou