This article was first published on the WeChat official account BaronTalk; you are welcome to follow it!
The pursuit of performance and efficiency is a constant theme in software development. Beyond what we do ourselves while coding, the virtual machine also optimizes our code during compilation. This article looks at code optimization from the virtual machine's point of view.
I. Early optimization (compile-time optimization)
The Java language's "compile time" is actually an ill-defined term. It may refer to the process by which a front-end compiler (such as Javac) turns *.java files into *.class files; it may also refer to the process by which a just-in-time compiler (JIT) turns bytecode into machine code at runtime; or it may refer to an ahead-of-time (AOT) compiler compiling *.java files directly into native machine code.
Compilers like Javac do little to optimize code for execution efficiency; the virtual machine design team deliberately left performance improvements to the back-end just-in-time compiler, so that class files not produced by Javac (from Groovy, Kotlin, and so on) can also benefit from compiler optimizations. What Javac does do is optimize heavily for the coding process itself: quite a few new Java syntax features are implemented as compiler "syntactic sugar" rather than as low-level improvements to the virtual machine, improving the programmer's coding style and efficiency.
In short, the just-in-time compiler's runtime optimizations matter more to program execution, while the front-end compiler's compile-time transformations matter more to program coding.
1.1 Javac compiler
The compilation process of the Javac compiler can be roughly divided into three steps:
- Parse and populate symbol tables;
- Annotation processing by pluggable annotation processors;
- Analysis and bytecode generation.
The relationship between these three steps is shown below:
Parse and populate symbol tables
The parsing step includes the lexical analysis and syntax analysis of classic compiler theory. After lexical and syntax analysis, the next step is populating the symbol table. A symbol table is a table composed of a set of symbol addresses and symbol information. During semantic analysis, the contents registered in the symbol table are used for semantic checking and intermediate code generation; during the target-code generation stage, the symbol table is the basis for allocating addresses to symbol names.
Annotation processor
Annotations were added in JDK 1.5. With the standard API for pluggable annotation processing that followed, our own code can take part in the compiler's work, for example by generating extra source or class files at compile time.
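As a minimal sketch of what such a processor looks like (the @GenerateStub annotation and class names here are made up for illustration; the javax.annotation.processing API itself is standard), a pluggable annotation processor extends AbstractProcessor and is invoked by Javac during compilation:

import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.annotation.processing.SupportedSourceVersion;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

@SupportedAnnotationTypes("com.example.GenerateStub")   // hypothetical annotation
@SupportedSourceVersion(SourceVersion.RELEASE_8)
public class StubProcessor extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement annotation : annotations) {
            for (Element e : roundEnv.getElementsAnnotatedWith(annotation)) {
                // A real processor could emit a new source file here via
                // processingEnv.getFiler().createSourceFile(...); this sketch only logs.
                processingEnv.getMessager().printMessage(Diagnostic.Kind.NOTE,
                        "Found annotated element: " + e.getSimpleName());
            }
        }
        return true; // claim the annotation so no other processor handles it
    }
}

Registered on the compiler's processor path, this class runs between parsing and bytecode generation, which is exactly how frameworks generate code at compile time.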
Semantic analysis and bytecode generation
After parsing, the compiler has an abstract syntax tree representation of the program code. The tree represents a structurally well-formed source program, but it does not guarantee that the program is semantically sound. The main task of semantic analysis is to perform context-sensitive checks on structurally correct source programs, such as type checking.
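For instance, each of the declarations below produces a well-formed syntax tree, but only the first assignment passes semantic analysis; this fragment (illustrative only, the last two lines are deliberately invalid) shows the kind of error type checking catches:

int a = 1;
boolean b = false;
char c = 2;

int d1 = a + c;   // accepted: char is promoted to int
int d2 = b + c;   // rejected by semantic analysis: boolean cannot be added to char
char d3 = a + c;  // rejected: the int result cannot be assigned to char without a cast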
Bytecode generation is the last stage of the Javac compilation process. It involves not only translating the information produced in the earlier steps (the syntax tree and the symbol table) into bytecode and writing it to disk, but also a small amount of code that the compiler adds and rewrites itself. The instance constructor <init>() method and the class constructor <clinit>() method mentioned earlier are added to the syntax tree at this stage.
During the bytecode generation phase, in addition to generating the constructors, the compiler also rewrites some code to optimize the program's implementation logic, such as replacing string concatenation with StringBuilder or StringBuffer operations.
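As a rough sketch (the exact shape of the generated code depends on the JDK version; newer JDKs emit an invokedynamic-based concatenation instead), a chain of string concatenations is rewritten into StringBuilder.append() calls roughly like this:

public String concat(String s1, String s2, String s3) {
    return s1 + s2 + s3;
}

// Approximately what the compiler generates for the method above:
public String concatDesugared(String s1, String s2, String s3) {
    return new StringBuilder().append(s1).append(s2).append(s3).toString();
}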
After the traversal and adjustment of the syntax tree are complete, the symbol table, now filled with all the required information, is handed to the com.sun.tools.javac.jvm.ClassWriter class, whose writeClass() method outputs the bytecode and produces the final class file. That is the end of the compilation process.
1.2 Java syntactic sugar
Java provides a lot of syntactic sugar to make development easier. Although syntactic sugar brings no substantial new functionality, it can improve development efficiency and syntactic rigor and reduce the chance of coding errors. Here is a look at what lies behind the sugar.
Generics and type erasure
Generics are, in essence, the application of parameterized types: the data type being operated on is specified as a parameter. This parameter can be used in the declaration of classes, interfaces, and methods, which are then called generic classes, generic interfaces, and generic methods, respectively.
Before generics existed in the Java language, type generalization could only be achieved by combining the fact that Object is the parent of all types with explicit casts. For example, the get() method of a HashMap returns an Object, so only the programmer and the running virtual machine know what type that Object really is. At compile time the compiler has no way to check whether the cast of that Object will succeed, and if correctness relies solely on the programmer, many ClassCastException risks are deferred to runtime.
In the Java language, generics exist only in the program source code. In the compiled bytecode files they have already been replaced by their raw types, with casts inserted where needed. So to the runtime, ArrayList<Integer> and ArrayList<String> are the same type. Generics in Java are therefore purely syntactic sugar, and this style of implementation is called type erasure.
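A minimal sketch of what erasure means in practice: the two lists below share a single runtime class, and the cast the compiler inserts is what actually enforces the element type.

import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> integers = new ArrayList<>();

        // Both print class java.util.ArrayList and the comparison is true:
        // the type parameters have been erased from the runtime class.
        System.out.println(strings.getClass());
        System.out.println(strings.getClass() == integers.getClass());

        strings.add("hello");
        // The compiler inserts a checkcast for us, roughly equivalent to:
        String s = (String) ((List) strings).get(0);
        System.out.println(s);
    }
}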
Autoboxing, unboxing, and the for-each loop
Autoboxing, unboxing, and the for-each loop are the most commonly used pieces of syntactic sugar in the Java language. They are easier to grasp, so let's go straight to the code:
import java.util.Arrays;
import java.util.List;

public class SyntaxSugars {
    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
        int sum = 0;
        for (int i : list) {
            sum += i;
        }
        System.out.println("sum = " + sum);
    }
}
Here is the same code after compilation, with the boxing, unboxing, and loop desugared:
public class SyntaxSugars {
    public static void main(String[] args) {
        List list = Arrays.asList(new Integer[]{
                Integer.valueOf(1),
                Integer.valueOf(2),
                Integer.valueOf(3),
                Integer.valueOf(4),
                Integer.valueOf(5)});
        int sum = 0;
        for (Iterator iterator = list.iterator(); iterator.hasNext(); ) {
            int i = ((Integer) iterator.next()).intValue();
            sum += i;
        }
        System.out.println("sum = " + sum);
    }
}
The first code contains generics, autoboxing, autounboxing, traversal loops, and variable-length arguments. The second code shows how they change after compilation.
Conditional compilation
Conditional compilation in the Java language is also implemented as syntactic sugar: based on a constant boolean condition, the compiler eliminates the branch that can never be taken.
public static void main(String[] args) {
    if (true) {
        System.out.println("block 1");
    } else {
        System.out.println("block 2");
    }
}
The decompilation result of the class file after the above code is compiled:
public static void main(String[] args) {
    System.out.println("block 1");
}
II. Late optimization (runtime optimization)
In mainstream commercial virtual machines, a Java program is initially executed by an interpreter. When the virtual machine finds that a particular method or block of code runs especially frequently, it marks that code as "hot spot code". To improve the execution efficiency of hot code, the virtual machine compiles it at runtime into machine code specific to the local platform and applies various levels of optimization; the compiler that does this is called the just-in-time compiler (JIT).
The just-in-time compiler is not a mandatory part of a virtual machine: the Java Virtual Machine Specification does not require one to exist, nor does it prescribe or guide how one should be implemented. However, JIT compilation performance and the quality of code optimization are among the most important measures of how good a commercial virtual machine is.
2.1 The just-in-time compiler in the HotSpot virtual machine
Since the Java Virtual Machine Specification does not define how the just-in-time compiler should be implemented, the content of this section depends entirely on the specific virtual machine. We use HotSpot as the example here, but what follows involves few low-level implementation details, and JIT implementations in mainstream virtual machines are broadly similar, so it also serves as a useful reference for understanding other virtual machines.
Interpreters and compilers
Although not all Java virtual machines adopt an architecture with both an interpreter and a compiler, many mainstream commercial virtual machines, such as HotSpot and J9, contain both.
The interpreter and the compiler each have their own advantages:
- When a program needs to start up and run quickly, the interpreter can take over first, executing immediately without waiting for compilation. As the program keeps running, the compiler gradually comes into play, compiling more and more code into native machine code for higher execution efficiency.
- When the program runs in an environment where memory resources are limited (such as some embedded systems), interpreted execution can be used to save memory, while compiled execution can be used to improve efficiency.
At the same time, the interpreter also serves as an "escape hatch" for the compiler's aggressive optimizations: the compiler can choose optimizations that, according to probability, speed things up most of the time, and when the assumption behind an aggressive optimization no longer holds (for example, when a newly loaded class changes the type inheritance hierarchy, or when an "uncommon trap" is hit), execution can fall back to the interpreted state through deoptimization and continue from there.
Compile objects and trigger conditions
There are two types of “hot code” that are compiled by the just-in-time compiler while a program is running:
- A method that is called multiple times;
- The body of a loop that is executed multiple times.
These two types of code that are executed repeatedly are called “hot code.”
- For a method that is called many times, the code inside the method naturally executes many times, so it is naturally hot code.
- Counting loop bodies that execute many times addresses the case where a method is called only once or a few times but contains a loop whose body runs many times. The code in that loop body is then executed repeatedly, so it too is hot code.
In the first case, since compilation is triggered by a method call, the compiler naturally takes the whole method as the compilation unit; this is the standard JIT compilation mode in a virtual machine. In the latter case, although compilation is triggered by a loop body, the compiler still compiles the whole method rather than just the loop. Because this kind of compilation happens while the method is executing, it is vividly called On-Stack Replacement (OSR compilation): the method is replaced while its stack frame is still on the stack.
We keep saying "many times", but how many is many? And how does the virtual machine count how many times a method or a piece of code has executed? Answering these two questions answers what triggers just-in-time compilation.
Determining whether a piece of code is hot code that should trigger just-in-time compilation is called "hot spot detection". Hot spot detection does not necessarily need to know exactly how many times a method has been called; there are currently two main detection approaches:
- Sample-based hot spot detection: a virtual machine using this approach periodically checks the top of each thread's stack; if a method (or methods) keeps appearing at the top of the stack, it is the "hot method". The advantage is that the implementation is simple and efficient and method call relationships are easy to obtain (just expand the call stack); the disadvantages are that it is hard to determine a method's heat precisely, and thread blocking or other external factors can easily disturb the detection.
- Counter-based hot spot detection: a virtual machine using this approach sets up a counter for each method (or even each block of code) and counts how many times it executes; once the count exceeds a certain threshold, the method is considered a "hot method". This approach is harder to implement, since a counter must be created and maintained for each method, and it cannot directly obtain method call relationships, but its results are more precise and rigorous.
The HotSpot virtual machine uses the second approach, counter-based hot spot detection, and it maintains two kinds of counters for each method: the method invocation counter and the back edge counter.
Given fixed virtual machine runtime parameters, each counter has a definite threshold; when a counter exceeds its threshold, JIT compilation is triggered.
Method invocation counter
As its name suggests, this counter records the number of times a method has been called. When a method is invoked, the virtual machine first checks whether a JIT-compiled version of the method exists; if so, the compiled native code is used preferentially. If not, the method's invocation counter is incremented by one, and then the sum of the method invocation counter and the back edge counter is checked against the method invocation counter's threshold. If the threshold is exceeded, a compilation request for the method is submitted to the just-in-time compiler.
Unless configured otherwise, the execution engine does not wait synchronously for the compilation request to complete; it continues into the interpreter and executes the bytecode in interpreted mode until the compiler finishes the submitted request. When compilation completes, the method's call entry address is automatically rewritten to the new one, and the compiled version is used the next time the method is invoked.
By default, the method invocation counter does not record the absolute number of calls but a relative execution frequency: the number of calls within a period of time. If, after a certain time limit, a method has still not been called enough times to be submitted to the just-in-time compiler, its invocation counter value is halved. This process is called the heat decay of the method invocation counter, and the period of time is called the method's statistical half-life.
You can set virtual machine parameters to turn off heat decay so that the invocation counter records the absolute number of calls; in that case, given enough running time, most methods will eventually be compiled into native code. You can also adjust the half-life with a virtual machine parameter.
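As a minimal sketch of those switches (flag names as described for the HotSpot VM in the reference book; whether a given flag exists, is settable in a product build, and what its default is can vary by JDK version, and MyApp is just a placeholder class name):

java -XX:CompileThreshold=10000 MyApp     # raise or lower the invocation-counter threshold
java -XX:-UseCounterDecay MyApp           # turn off heat decay: count absolute call numbers
java -XX:CounterHalfLifeTime=30 MyApp     # set the half-life of the decay, in seconds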
Back edge counter
The back edge counter counts the number of times the loop body code in a method executes. An instruction in the bytecode that jumps backward in the control flow is called a "back edge". The purpose of counting back edges is to trigger OSR compilation.
When the interpreter encounters a back edge instruction, it first checks whether a compiled version of the code about to be executed already exists; if so, the compiled code is executed preferentially. Otherwise it increments the back edge counter by one and then checks whether the sum of the method invocation counter and the back edge counter exceeds the back edge counter's threshold. When the threshold is exceeded, an OSR compilation request is submitted, and the back edge counter's value is lowered so that the loop can continue in the interpreter while waiting for the compiler to produce its output.
Unlike the invocation counter, the back edge counter has no heat decay, so it counts the absolute number of times the method's loops have executed. When this counter overflows, it also sets the method invocation counter to the overflowed state, so that the standard compilation process is triggered the next time the method is entered.
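A minimal sketch of the kind of code OSR compilation targets (class and constant values here are illustrative only): main() is entered just once, so the invocation counter alone would never make it hot, but the loop body runs a huge number of times and drives the back edge counter over its threshold.

public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        // The method is called once, but this loop body executes 100 million times;
        // the back edge counter is what makes it eligible for on-stack replacement.
        for (int i = 0; i < 100_000_000; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}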
2.2 Compiler optimization techniques
We all know that compiled native code executes faster than interpreted bytecode. Part of the reason is that it saves the extra time the virtual machine spends interpreting the bytecode; another part is that the virtual machine design team concentrated almost all code optimizations in the just-in-time compiler. In this section we look at the optimization techniques the HotSpot virtual machine's just-in-time compiler applies when compiling code.
Overview of optimization techniques
There are many code optimization techniques, and implementing them can be difficult, but most are fairly easy to understand. To keep the introduction concrete, let's start with a simple piece of code and see what optimizations the virtual machine performs on it.
static class B {
    int value;

    final int get() {
        return value;
    }
}

// y, z, sum and b are assumed to be fields of the enclosing class,
// omitted here for brevity.
public void foo() {
    y = b.get();
    z = b.get();
    sum = y + z;
}
The first thing to be clear about is that these optimizations are performed on some intermediate representation of the code or on machine code, not on the Java source; Java code is used here purely for demonstration.
The above code looks simple, but there are many areas where you can optimize.
The first step is method inlining, which is more important than the other optimizations here. Its purpose is twofold: first, to remove the cost of the method call itself (such as building a stack frame); second, to lay the groundwork for other optimizations. After methods are inlined and expanded, subsequent optimizations can be applied over a wider scope and achieve better results, so compilers of all kinds tend to put inlining first in the optimization sequence. The code after inlining looks like this:
public void foo() {
    y = b.value;
    z = b.value;
    sum = y + z;
}
The second step is redundant access elimination: "z = b.value;" can be replaced with "z = y;", so the field of object b does not have to be accessed a second time. If b.value is viewed as an expression, this optimization can also be regarded as common subexpression elimination. The optimized code looks like this:
public void foo() {
    y = b.value;
    z = y;
    sum = y + z;
}
The third step is copy propagation: there is no need for an extra variable z in this code, since it is exactly equivalent to y, so y can be used in place of z. The code after copy propagation looks like this:
public void foo() {
    y = b.value;
    y = y;
    sum = y + y;
}
The fourth step is dead code elimination. Dead code may be code that can never execute or code that has no effect at all, hence the vivid name "Dead Code". In the code above, "y = y" is meaningless, so after dead code elimination the code looks like this:
public void foo() {
    y = b.value;
    sum = y + y;
}
After these four optimizations, the final code achieves the same effect as the original but executes more efficiently. Implementing these compiler optimizations is complex, but the ideas behind them are easy to understand. Next, let's look at how a few of the most representative optimization techniques work:
- Common subexpression elimination;
- Array bounds check elimination;
- Method inlining;
- Escape analysis.
Common subexpression elimination
If an expression E has already been computed, and the values of all variables in E have not changed since the previous computation, then this occurrence of E is called a common subexpression. For such an expression there is no need to spend time computing it again; the previously computed result can simply be substituted for E. If the optimization is limited to a single basic block of the program, it is called local common subexpression elimination; if it spans multiple basic blocks, it is called global common subexpression elimination.
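A minimal sketch (variable and method names are illustrative): in the expression below, c * b and b * c are the same subexpression and nothing changes b or c in between, so the compiler can compute it once and reuse the result.

// Before optimization: c * b and b * c are the same subexpression.
static int calc(int a, int b, int c) {
    return (c * b) * 12 + a + (a + b * c);
}

// Conceptually what the compiler does (it may go further, e.g. algebraic
// simplification toward e * 13 + a * 2):
static int calcOptimized(int a, int b, int c) {
    int e = c * b;
    return e * 12 + a + (a + e);
}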
Array bounds check elimination
When an array element such as array[i] is accessed in Java, the upper and lower bounds of the index are checked automatically: i must satisfy i >= 0 && i < array.length, otherwise an exception is thrown: java.lang.ArrayIndexOutOfBoundsException. This is array bounds checking.
For the virtual machine's execution subsystem, every read or write of an array element carries an implicit conditional check, which is a non-trivial performance cost for code with a large number of array accesses. Array bounds checking is certainly necessary for safety, but it does not have to happen at every single access. For example, when an array is accessed inside a loop, if the compiler can determine through data-flow analysis that the loop variable always stays within the range [0, array.length), it can eliminate the upper and lower bound checks for the whole loop.
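A minimal sketch of the kind of loop where the checks can be hoisted out (the method name is illustrative): the loop condition already proves that i is a valid index, so the per-iteration bounds check is redundant.

static int sum(int[] array) {
    int total = 0;
    // i runs from 0 to array.length - 1, so every access array[i] is provably
    // in bounds; a JIT that performs this analysis can drop the per-element check.
    for (int i = 0; i < array.length; i++) {
        total += array[i];
    }
    return total;
}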
Method inlining
Method inlining has been introduced previously through code analysis and will not be covered here.
Escape analysis
Escape analysis is not a technique that optimizes code directly; rather, it is an analysis technique that provides the basis for other optimizations. The basic behaviour of escape analysis is to analyze the dynamic scope of an object: when an object is defined inside a method, it may be referenced by external methods, for example by being passed as an argument to another method; this is called method escape. It may even be accessed by other threads, for example by being assigned to a class variable or to an instance variable reachable from other threads; this is called thread escape.
If it can be proved that an object cannot escape from the method or the thread, that is, no other method or thread can access the object in any way, then some highly effective optimizations can be applied to it, such as the following (a small sketch follows the list):
- Stack allocation: if it is certain that an object will not escape from the method, its memory can be allocated on the stack, and the memory it occupies is destroyed when the stack frame pops. In a typical application the proportion of local objects that do not escape is large; if they can be allocated on the stack, the pressure on the garbage collector is greatly reduced.
- Synchronization elimination: if escape analysis can determine that a variable never escapes the thread and can never be accessed by other threads, then reads and writes to it can never be contended, and any synchronization on it can be eliminated.
- Scalar replacement: a scalar is a piece of data that cannot be broken down further; the primitive types in the Java virtual machine cannot be decomposed, so they are scalars. By contrast, data that can be decomposed further is called an aggregate, and a Java object is an aggregate. Scalar replacement means breaking a Java object apart and, based on how it is accessed, restoring the member variables it uses to primitive types for direct access. If escape analysis proves that an object cannot be accessed outside the method and the object can be decomposed, the program may not create the object at all when it executes, but instead directly create the member variables the method uses. Once the object is split apart, its member variables can be allocated and accessed on the stack, which also creates conditions for further optimizations.
III. Summary
In two parts, this article has walked through how a Java program is compiled from source code to bytecode and from bytecode to native machine code. Taken together, the work done by the Javac bytecode compiler and by the JIT compiler inside the virtual machine is essentially equivalent to what a traditional compiler does. In the next article, we'll look at how the virtual machine handles concurrency efficiently.
References:
- Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices (2nd edition)
If you like my articles, follow my WeChat official account BaronTalk, my Zhihu column, or star my projects on GitHub.
- WeChat official account: BaronTalk
- Zhihu column: zhuanlan.zhihu.com/baron
- GitHub:github.com/BaronZ88