This article begins with how Java programs are executed, which leads into a discussion of just-in-time (JIT) compilers. It then introduces the tiered (hierarchical) compilation mechanism and, finally, the impact of JIT compilers on application startup performance.
This article is based on the HotSpot virtual machine. Where behavior depends on the Java version, this is pointed out in the text.
0 Java program execution process
Are Java programs interpreted or compiled?
When we first learned Java, we probably thought that Java programs were simply compiled and then executed. In fact, Java uses both interpreted and compiled execution.
The usual execution process of a Java program is as follows:
A .java source file is compiled into .class bytecode by the javac command and then run with the java command.
Note that in compiler theory, a compiler is usually divided into a front end and a back end. The front end performs lexical, syntactic, and semantic analysis of the program and generates an intermediate representation (IR). The back end then optimizes the IR and generates the target machine code.
In Java, javac produces the intermediate representation: the bytecode in the .class file. For example:
package com.example.demo.jitdemo;

public class JITDemo2 {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}
Disassembling the compiled class with javap -c gives the following:
// javap -c JITDemo2.class
Compiled from "JITDemo2.java"
public class com.example.demo.jitdemo.JITDemo2 {
  public com.example.demo.jitdemo.JITDemo2();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #3                  // String Hello World
       5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return
}
At run time, the JVM first reads the bytecode (IR) instructions one at a time and executes them; this is interpreted execution. When the number of times a method has been called reaches the just-in-time compilation threshold, JIT compilation is triggered: the JIT compiler optimizes the method's bytecode and generates machine code for it. Later calls to the method run that machine code directly; this is compiled execution.
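To make the switch from interpreted to compiled execution concrete, here is a deliberately simplified toy model. It is not how HotSpot is implemented: the class name ToyJitMethod, the tiny threshold, and the idea of swapping one IntUnaryOperator for another are illustrative assumptions. Each interpreted call bumps a counter, and once the counter reaches the threshold, later calls take the "compiled" path.

```java
import java.util.function.IntUnaryOperator;

/**
 * Hypothetical toy model (not HotSpot internals): a method starts out
 * "interpreted" and switches to a "compiled" implementation once its
 * invocation counter reaches a threshold.
 */
public class ToyJitMethod {
    private final IntUnaryOperator interpreted; // slow path, always available
    private final IntUnaryOperator compiled;    // fast path, used once the method is hot
    private final int threshold;
    private int invocationCount = 0;
    private boolean isCompiled = false;
    private boolean lastCallCompiled = false;

    public ToyJitMethod(IntUnaryOperator interpreted, IntUnaryOperator compiled, int threshold) {
        this.interpreted = interpreted;
        this.compiled = compiled;
        this.threshold = threshold;
    }

    public int call(int arg) {
        if (isCompiled) {
            lastCallCompiled = true;
            return compiled.applyAsInt(arg); // later calls run the "machine code"
        }
        lastCallCompiled = false;
        invocationCount++;
        if (invocationCount >= threshold) {
            isCompiled = true; // "JIT compile": future calls take the compiled path
        }
        return interpreted.applyAsInt(arg);
    }

    public boolean isCompiled() { return isCompiled; }
    public boolean wasLastCallCompiled() { return lastCallCompiled; }
    public int invocationCount() { return invocationCount; }

    public static void main(String[] args) {
        ToyJitMethod square = new ToyJitMethod(x -> x * x, x -> x * x, 3);
        for (int i = 1; i <= 5; i++) {
            int result = square.call(i);
            String path = square.wasLastCallCompiled() ? "compiled" : "interpreted";
            System.out.println("call " + i + " -> " + result + " (" + path + ")");
        }
    }
}
```

With a threshold of 3, calls 1 to 3 are interpreted and calls 4 and 5 take the compiled path; both paths return the same results, which is the point: compilation changes speed, not behavior.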
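So, from the .java file to final execution, the process is roughly: javac compiles the .java file into .class bytecode; the JVM interprets the bytecode; hot methods are compiled by the JIT compiler into machine code, which is stored in the CodeCache and run directly on later calls. (The CodeCache is described below.)
So when exactly is just-in-time compilation triggered, and how does it work? Let's keep going.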
1 A first look at the Java just-in-time compiler
The HotSpot virtual machine has two JIT compilers, C1 and C2 (since Java 10, the Graal compiler has also been available as an experimental option).
The C1 compiler corresponds to the -client option. It suits short-running programs that care about startup performance.
The C2 compiler corresponds to the -server option. It suits programs that need high peak performance.
In practice, with tiered compilation (enabled by default since Java 8), both C1 and C2 take part in compilation, and the interpreter works alongside them. Running interpreted and compiled code together is called mixed mode. It is the default mode, as java -version shows:
C:\Users\Lord_X_>java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
The mixed mode in the last line illustrates this.
We can also force interpreted-only mode with the -Xint parameter. The JIT compiler is then not involved at all, and the last line of java -version shows interpreted mode.
The -Xcomp parameter selects compile-only mode: methods are compiled the first time they are invoked instead of being interpreted first. This slows down startup, but the compiled code can run faster afterward because no time is spent interpreting it. In this mode, the last line of java -version shows compiled mode.
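The mode string that java -version prints is also exposed to the running program through the standard java.vm.info system property; on HotSpot it reads, for example, "mixed mode" or "interpreted mode", sometimes followed by extra text such as ", sharing". A minimal sketch, assuming a HotSpot JVM (other JVMs may word this property differently); the class name VmModeCheck is made up for illustration:

```java
/** Reports the execution mode of the running JVM, based on HotSpot's java.vm.info wording. */
public class VmModeCheck {
    /** Classifies a java.vm.info string; returns "unknown" if no mode keyword is found. */
    static String mode(String vmInfo) {
        if (vmInfo == null) return "unknown";
        if (vmInfo.contains("interpreted mode")) return "interpreted";
        if (vmInfo.contains("compiled mode")) return "compiled";
        if (vmInfo.contains("mixed mode")) return "mixed";
        return "unknown";
    }

    public static void main(String[] args) {
        String info = System.getProperty("java.vm.info");
        System.out.println("java.vm.info = " + info);
        System.out.println("mode = " + mode(info));
    }
}
```

Running it with java -Xint VmModeCheck should report interpreted, with -Xcomp compiled, and with no flag mixed.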
The following code gives a rough comparison of the three modes:
import java.util.Random;

public class JITDemo2 {
    private static Random random = new Random();

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int count = 0;
        int i = 0;
        while (i++ < 99999999) {
            count += plus();
        }
        System.out.println("time cost : " + (System.currentTimeMillis() - start));
    }

    private static int plus() {
        return random.nextInt(10);
    }
}
- First, pure interpreted mode
VM options: -Xint -XX:+PrintCompilation (-XX:+PrintCompilation prints compilation information)
Result: no compilation information is printed, which shows that the JIT compiler is not involved.
- Then, pure compiled mode
VM options: -Xcomp -XX:+PrintCompilation
Result: a large amount of compilation information is printed.
- Finally, mixed mode
VM options: -XX:+PrintCompilation
Conclusion: in descending order of time taken, pure interpreted mode > pure compiled mode > mixed mode.
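To rerun this comparison in one go, a small driver can start the demo in a fresh child JVM under each flag set and let the child print its own "time cost" line. This is a sketch under assumptions: the class name ModeComparison is made up, the demo class must be on the driver's classpath and is passed by its fully qualified name as the first argument, and the child reuses the current JVM's java executable (found via java.home).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ModeComparison {
    /** Builds a command line that runs mainClass in a new JVM with the given VM flags. */
    static List<String> command(String mainClass, String... vmFlags) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        for (String flag : vmFlags) {
            cmd.add(flag); // VM flags must come before the main class
        }
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java ModeComparison <fully.qualified.DemoClass>");
            return;
        }
        String[] labels = { "interpreted (-Xint)", "compiled (-Xcomp)", "mixed (default)" };
        String[][] flagSets = { { "-Xint" }, { "-Xcomp" }, {} };
        for (int k = 0; k < flagSets.length; k++) {
            System.out.println("=== " + labels[k] + " ===");
            ProcessBuilder pb = new ProcessBuilder(command(args[0], flagSets[k]));
            pb.inheritIO(); // the child's "time cost" line appears directly in this console
            int exitCode = pb.start().waitFor();
            System.out.println("exit code: " + exitCode);
        }
    }
}
```

Each run still measures a single, un-warmed pass, so the numbers remain as rough as in the original experiment.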
However, this is only a very short program. For a long-running program, it is not clear whether pure compiled mode would outperform mixed mode. This test method is also not rigorous; a proper benchmark would be the right way to measure this.
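As one small step toward a more careful measurement, the standard java.lang.management API can report how much time the JIT compilers have spent, next to the wall-clock time of repeated rounds. This is a sketch, not a benchmark harness (a dedicated tool such as JMH is the usual choice for rigorous measurements); the class name JitTimeProbe and the workload, which mirrors plus() above, are illustrative. ManagementFactory.getCompilationMXBean() returns null when the JVM has no compilation system, and compilation-time monitoring is optional, so both cases are handled.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.util.Random;

public class JitTimeProbe {
    private static final Random random = new Random();

    /** Same kind of workload as the demo: sums many random ints in [0, 10). */
    static long work(int iterations) {
        long count = 0;
        for (int i = 0; i < iterations; i++) {
            count += random.nextInt(10);
        }
        return count;
    }

    /** Total JIT compilation time in ms so far, or -1 if it cannot be measured. */
    static long jitTimeMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        System.out.println("JIT: " + (jit == null ? "none" : jit.getName()));
        // Several rounds: early rounds include warm-up, later ones mostly run compiled code.
        for (int round = 1; round <= 5; round++) {
            long jitBefore = jitTimeMillis();
            long start = System.nanoTime();
            long result = work(10_000_000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            long jitAfter = jitTimeMillis();
            String jitDelta = jitBefore < 0 ? "n/a" : (jitAfter - jitBefore) + " ms";
            // Printing the result keeps the loop from being eliminated as dead code.
            System.out.println("round " + round + ": " + elapsedMs + " ms, JIT time " + jitDelta + ", result " + result);
        }
    }
}
```

Running it under -Xint, -Xcomp, and the default mode gives a slightly fairer picture than a single timed pass, because warm-up rounds and steady-state rounds are reported separately.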
2 When just-in-time compilation is triggered
Just-in-time compilation is triggered based on two counts:
- The number of times the method was called
- The number of times a loop back edge (the jump from the end of a loop body back to its start) is executed
Each time the JVM calls a method, it increments the method's counter by 1; if the method contains a loop, each iteration of the loop (each back edge) also increments the counter by 1.
When tiered compilation is disabled (tiered compilation is introduced separately), just-in-time compilation is triggered when a method's counter reaches the value specified by -XX:CompileThreshold (1500 by default for C1 and 10000 for C2).
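The counting rule above can be sketched as a simplified model: one counter per method, incremented once per invocation and once per loop back edge, and compared with a threshold that stands in for -XX:CompileThreshold. Real HotSpot keeps separate invocation and back-edge counters and applies extra rules (for example for on-stack replacement), so this only illustrates the article's description; the class and method names are hypothetical.

```java
/** Hypothetical, simplified model of a method's hotness counter with tiered compilation off. */
public class HotnessCounter {
    private final int compileThreshold; // stands in for -XX:CompileThreshold
    private int counter = 0;

    public HotnessCounter(int compileThreshold) {
        this.compileThreshold = compileThreshold;
    }

    /** Called on each invocation of the method. */
    public void onInvoke() { counter++; }

    /** Called each time a loop inside the method jumps back to its start. */
    public void onBackEdge() { counter++; }

    public boolean isHot() { return counter >= compileThreshold; }

    public int value() { return counter; }

    /**
     * Simulates repeated calls to a method whose loop runs loopIterations times per call,
     * and returns the 1-based call number on which the counter first reaches the threshold.
     */
    public static int callsUntilHot(int compileThreshold, int loopIterations) {
        HotnessCounter c = new HotnessCounter(compileThreshold);
        int calls = 0;
        while (!c.isHot()) {
            calls++;
            c.onInvoke();
            for (int i = 0; i < loopIterations; i++) {
                c.onBackEdge();
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // A loop-free method under the C2 default threshold
        System.out.println(callsUntilHot(10000, 0)); // prints 10000
        // A loop-free method under the C1 default threshold
        System.out.println(callsUntilHot(1500, 0));  // prints 1500
    }
}
```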
The following examples show when just-in-time compilation is triggered with tiered compilation turned off:
- Triggered by method calls (no loops involved)
// VM options: -XX:+PrintCompilation -XX:-TieredCompilation (disables tiered compilation)
import java.util.Random;

public class JITDemo2 {
    private static Random random = new Random();

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int count = 0;
        int i = 0;
        while (i++ < 15000) {
            System.out.println(i);
            count += plus();
        }
        System.out.println("time cost : " + (System.currentTimeMillis() - start));
    }

    // Each call increments this method's counter by 1
    private static int plus() {
        return random.nextInt(10);
    }
}
The result is as follows:
Because the counting done by the interpreter is not strictly synchronized with the compiler, compilation does not happen at exactly 10000 calls. In practice, any method that is called often enough can be treated as hot code, so strict synchronization is unnecessary.
- Triggered by loop back edges
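// VM options: -XX:+PrintCompilation -XX:-TieredCompilation
import java.util.Random;

public class JITDemo2 {
    private static Random random = new Random();

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        plus();
        System.out.println("time cost : " + (System.currentTimeMillis() - start));
    }

    // Each call increments this method's counter by 1
    private static int plus() {
        int count = 0;
        // Each loop iteration (back edge) also increments the counter by 1.
        // plus() is called only once, so its hot loop is compiled while the method
        // is still running (on-stack replacement, marked with % in the
        // -XX:+PrintCompilation output).
        for (int i = 0; i < 15000; i++) {
            System.out.println(i);
            count += random.nextInt(10);
        }
        return random.nextInt(10);
    }
}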
Execution Result:
- Triggered by method calls combined with loop back edges
PS: Each call to plus() runs 10 loop iterations, so each call adds 11 to the counter (1 for the invocation plus 10 back edges). Just-in-time compilation should therefore be triggered after roughly 10000/11 ≈ 909 calls.
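Spelled out under the article's simplified one-counter model and the C2 default threshold of 10000, with $n$ the number of calls to plus():

```latex
\text{counter}(n) = n \cdot (1 + 10) = 11n, \qquad
11n \ge 10000 \iff n \ge \frac{10000}{11} \approx 909.1
\;\Longrightarrow\; n_{\min} = 910
```

So the threshold is first reached during call 910 (after 909 calls the counter is 9999), which is what "roughly more than 909 calls" means.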
// VM options: -XX:+PrintCompilation -XX:-TieredCompilation
import java.util.Random;

public class JITDemo2 {
    private static Random random = new Random();

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int count = 0;
        int i = 0;
        while (i++ < 15000) {
            System.out.println(i);
            count += plus();
        }
        System.out.println("time cost : " + (System.currentTimeMillis() - start));
    }

    // Each call increments this method's counter by 1
    private static int plus() {
        int count = 0;
        // Each loop iteration (back edge) also increments the counter by 1,
        // so every call adds 11 in total
        for (int i = 0; i < 10; i++) {
            count += random.nextInt(10);
        }
        return random.nextInt(10);
    }
}
Execution Result:
3 CodeCache
The CodeCache is where hot code is kept: the machine code produced by the just-in-time compiler is stored there, in off-heap memory.
The -XX:InitialCodeCacheSize and -XX:ReservedCodeCacheSize parameters set the size of the CodeCache (defaults are platform-dependent; the values below are for 64-bit HotSpot on x86):
- -XX:InitialCodeCacheSize: the initial CodeCache size. The default is 2496 KB.
- -XX:ReservedCodeCacheSize: the reserved (maximum) CodeCache size. The default is 48 MB; in Java 8 it is raised to 240 MB when tiered compilation is enabled, which is the default.
PS: You can print the final values of all flags, including these defaults, with -XX:+PrintFlagsFinal.
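Besides -XX:+PrintFlagsFinal, the current values of these flags can be read inside a running HotSpot JVM through com.sun.management.HotSpotDiagnosticMXBean, a HotSpot-specific (not standard Java SE) management interface. A minimal sketch; the class name CodeCacheFlags is illustrative, and getVMOption throws IllegalArgumentException for a flag this JVM does not know, which is caught below.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class CodeCacheFlags {
    /** Returns the flag's current value as a string, or null if this JVM has no such flag. */
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = bean.getVMOption(name);
            return option.getValue();
        } catch (IllegalArgumentException e) {
            return null; // flag not present on this JVM
        }
    }

    public static void main(String[] args) {
        String[] names = { "InitialCodeCacheSize", "ReservedCodeCacheSize", "UseCodeCacheFlushing", "TieredCompilation" };
        for (String name : names) {
            System.out.println(name + " = " + flagValue(name));
        }
    }
}
```

Size flags are reported in bytes, so ReservedCodeCacheSize=240M shows up as 251658240.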
3.1 Monitoring the CodeCache Using JConsole
You can use JConsole, which ships with the JDK, to monitor CodeCache memory usage, for example:
The graph shows that a little over 4 MB of CodeCache is in use.
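JConsole reads these figures over JMX, so the same data is available programmatically through the standard MemoryPoolMXBean API. The pool names are JVM-specific: Java 8 HotSpot reports a single "Code Cache" pool, while Java 9 and later typically report either one "CodeCache" pool or, with the segmented code cache, several pools whose names start with "CodeHeap". The name matching below is an assumption about HotSpot rather than a guaranteed contract, and the class name is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class CodeCacheMonitor {
    /** True for HotSpot's code cache pool names (Java 8 and Java 9+ layouts). */
    static boolean isCodeCachePool(String name) {
        return name.equals("Code Cache") || name.equals("CodeCache") || name.startsWith("CodeHeap");
    }

    /** Returns the non-heap memory pools that hold JIT-compiled code. */
    static List<MemoryPoolMXBean> codeCachePools() {
        List<MemoryPoolMXBean> result = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP && isCodeCachePool(pool.getName())) {
                result.add(pool);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : codeCachePools()) {
            MemoryUsage usage = pool.getUsage();
            System.out.printf("%s: used=%d KB, committed=%d KB, max=%d KB%n",
                    pool.getName(), usage.getUsed() / 1024, usage.getCommitted() / 1024, usage.getMax() / 1024);
        }
    }
}
```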
3.2 What happens when the CodeCache is full
The CodeCache is usually overlooked when allocating memory for an application. It does not take up much memory and its space can be reclaimed, so it usually does not fill up. But if it does fill up, the result can be disastrous for a high-QPS, performance-sensitive application.
As described in the previous section, the JVM first interprets Java bytecode, and when a method's call count or loop back-edge count reaches a threshold, just-in-time compilation compiles the bytecode into native machine code for efficient execution. The compiled native code is cached in the CodeCache, which fills up if too much code triggers just-in-time compilation and the space is not reclaimed in time.
Once the CodeCache is full, code that has already been compiled still runs as native code, but nothing new can be compiled, so all remaining code can only be interpreted.
The earlier comparison of the three execution modes shows how large the gap between interpreted and compiled execution is, so for most applications this is disastrous.
When the CodeCache fills up, the JVM prints a warning to the log:
The JVM provides a reclamation mechanism for the CodeCache: -XX:+UseCodeCacheFlushing. This flag has been enabled by default since JDK 7u4, and the JVM attempts reclamation when the CodeCache is about to fill up. In JDK 7 this reclamation does relatively little and yields limited gains; JDK 8 improves it substantially, so upgrading to JDK 8 helps in this area.
3.3 Reclaiming the CodeCache
So when can compiled code in the CodeCache be reclaimed?
To answer that, start with how the just-in-time compiler compiles code. Take the following method:
public int method(boolean flag) {
    if (flag) {
        return 1;
    } else {
        return 0;
    }
}
When this method is interpreted, it checks flag on every call and executes whichever branch matches.
Code compiled by the just-in-time compiler does not necessarily work this way. The JIT compiler collects a lot of runtime profiling information before compiling. For example, if every call so far has passed flag as true, the JIT compiler may effectively compile the method as if it looked like this:
public int method(boolean flag) {
    return 1;
}
In other words, the compiled code contains only the flag == true path.
If flag later arrives as false, the assumption behind the compiled code no longer holds, so the JVM "deoptimizes" it: execution falls back to the interpreter, and the compiled code is marked made not entrant.
From then on, no new call can enter that compiled code. Once the JVM detects that no thread is still executing it, it marks the code made zombie, and the memory it occupies can be reclaimed. Both states can be seen in the compile log.
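The scenario above can be reproduced with a small driver: call method(true) often enough for it to become hot, then switch to false. Running it with -XX:+PrintCompilation should show the compilation and, after the switch, a "made not entrant" entry, although whether and when that happens depends on the JVM version and its profiling decisions. The class name DeoptDemo and the iteration counts are illustrative. Because method is tiny, the JIT will likely inline it into callMany, so the made not entrant entry may name callMany rather than method.

```java
public class DeoptDemo {
    /** The method from the text: the JIT may compile only the branch it has seen so far. */
    public int method(boolean flag) {
        if (flag) {
            return 1;
        } else {
            return 0;
        }
    }

    /** Calls method(flag) the given number of times and returns the sum of the results. */
    public long callMany(boolean flag, int times) {
        long sum = 0;
        for (int i = 0; i < times; i++) {
            sum += method(flag);
        }
        return sum;
    }

    public static void main(String[] args) {
        DeoptDemo demo = new DeoptDemo();
        // Phase 1: only the flag == true path runs, so that path becomes hot and gets compiled.
        long hot = demo.callMany(true, 1_000_000);
        // Phase 2: the unseen branch is taken; compiled code that assumed flag == true
        // can no longer be used and may be deoptimized (made not entrant).
        long cold = demo.callMany(false, 1_000_000);
        System.out.println("true calls sum = " + hot + ", false calls sum = " + cold);
    }
}
```

The program prints "true calls sum = 1000000, false calls sum = 0"; the interesting output is the compilation log, not the sums.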
3.4 Tuning CodeCache
Java 8 provides the JVM startup parameter -XX:+PrintCodeCache, which prints CodeCache usage when the JVM exits. You can check this output each time the application stops and gradually tune the CodeCache to the most appropriate size.
A Spring Boot demo illustrates this:
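// Startup options: -XX:ReservedCodeCacheSize=256M -XX:+PrintCodeCache
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.RestController;

@RestController
@SpringBootApplication
public class DemoApplication {
    // ... other code ...
    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
        System.out.println("start....");
        System.exit(1);
    }
}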
Here the CodeCache is set to 256 MB, and its usage is printed when the JVM exits. The log is as follows:
At most 6721 KB (max_used) is used, so most of the reserved space is wasted. In this case, you can try lowering -XX:ReservedCodeCacheSize from 256M and allocate the extra memory elsewhere.
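If adding a JVM flag is inconvenient, a rough programmatic counterpart is to record the peak usage of the code cache pools in a shutdown hook, using the same MemoryPoolMXBean API that JConsole relies on. This only approximates what -XX:+PrintCodeCache reports: the pool-name matching is an assumption about HotSpot, summing per-pool peaks gives an upper bound rather than the true simultaneous peak, and the class name CodeCachePeakReporter is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class CodeCachePeakReporter {
    /** Sums the peak usage (bytes) of HotSpot's code cache pools; 0 if none are found. */
    static long peakCodeCacheBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            boolean codeCache = name.equals("Code Cache") || name.equals("CodeCache") || name.startsWith("CodeHeap");
            if (pool.getType() == MemoryType.NON_HEAP && codeCache) {
                // Per-pool peaks may occur at different times, so the sum is an upper bound.
                total += pool.getPeakUsage().getUsed();
            }
        }
        return total;
    }

    /** Registers a hook that prints the peak code cache usage when the JVM exits. */
    public static void install() {
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                System.out.println("peak code cache used: " + peakCodeCacheBytes() / 1024 + " KB")));
    }

    public static void main(String[] args) {
        install();
        System.out.println("start....");
    }
}
```

In the Spring Boot demo, calling CodeCachePeakReporter.install() before SpringApplication.run would print a comparable figure on System.exit, which can be compared with the max_used value from -XX:+PrintCodeCache before lowering ReservedCodeCacheSize.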