This book is excerpted from Understanding the Java Virtual Machine in Depth, 3rd Edition.

An overview of compilation

As mentioned earlier, front-end compilation turns Java source code into Class-file bytecode, while back-end compilation converts that bytecode into machine code for the local platform. At run time the JVM hands each bytecode instruction to the interpreter, which translates it into the corresponding machine code and executes it; this is how a Java program runs

Just-in-time compiler

When the virtual machine finds a method or block of code that runs particularly frequently, it marks that code as hot code. To improve execution efficiency, the virtual machine compiles the hot code into native machine code and applies various optimizations to it. The back-end compilers that do this at run time are called just-in-time (JIT) compilers

1. Compilation targets

There are two main kinds of hot code:

  • A method that is called multiple times
  • The body of a loop that is executed multiple times

In both cases, the compilation target is the entire method body. In the first case, compilation is triggered by a method call, so naturally the whole method is compiled. In the second case, although the compiler still takes the whole method as the compilation target, execution enters the compiled code in the middle of the method rather than at its beginning; because compilation happens while the method is still on the stack, this is known as On-Stack Replacement (OSR).
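The second case can be made visible on a HotSpot JVM. The sketch below is an illustrative demo (the class and method names are invented): running it with `-XX:+PrintCompilation` should show the loop-bearing method being compiled while it runs; OSR compilations are marked with a `%` in HotSpot's compilation log.

```java
// Illustrative demo: a long-running loop that should trigger an
// on-stack replacement (OSR) compilation of hotLoop() while it is
// still executing. Run with: java -XX:+PrintCompilation OsrDemo
public class OsrDemo {

    // The loop body is the hot code; the surrounding method is what
    // actually gets compiled, entered mid-method via OSR.
    static long hotLoop(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(hotLoop(100_000_000));
    }
}
```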

2. Trigger conditions

How does the virtual machine identify hot code and decide whether it needs to be compiled on the fly? This is called hot spot detection. Hot spot detection does not necessarily require knowing exactly how many times a method has been called. There are currently two mainstream approaches:

  • Sample Based HotSpot Code Detection

    The virtual machine periodically checks the top of each thread's call stack; a method that frequently appears at the top of a stack is considered hot. The advantage of this approach is that it is simple and efficient to implement, and the method call relationships are easy to obtain (just walk the call stack). The disadvantage is that it is hard to determine a method's hotness precisely, since the sampling can be distorted by thread blocking or other external factors

  • Counter Based HotSpot Code Detection

    The virtual machine sets up a counter for each method (or even each block of code) and counts how many times it executes; once the count exceeds a certain threshold, the method is considered hot. This approach is harder to implement and cannot directly obtain method call relationships, but its results are relatively more precise

HotSpot uses counter-based hot spot detection and maintains two kinds of counters: the method invocation counter and the back edge counter (a back edge is an instruction that jumps backwards at a loop boundary). Given a fixed set of VM parameters, each counter has a definite threshold; when a counter overflows its threshold, just-in-time compilation is triggered.
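On a HotSpot VM these thresholds are exposed as VM flags and can be inspected or adjusted; for example (the flag names below are HotSpot-specific, and their exact meaning depends on whether tiered compilation is enabled):

```shell
# Print the VM's threshold-related flag defaults (HotSpot):
java -XX:+PrintFlagsFinal -version | grep -i threshold

# Raise the invocation-counter threshold; CompileThreshold is mainly
# meaningful with tiered compilation disabled. MyApp is a placeholder.
java -XX:-TieredCompilation -XX:CompileThreshold=20000 MyApp
```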

When a method is called, the virtual machine first checks whether a just-in-time compiled version of the method exists; if so, the compiled native code is used. If no compiled version exists, the method's invocation counter is incremented by one, and the VM then checks whether the sum of the invocation counter and the back edge counter exceeds the invocation counter threshold. If it does, a compilation request for the method is submitted to the just-in-time compiler

By default, the execution engine does not wait synchronously for the compilation request to finish; it continues executing the bytecode in the interpreter until the request has been processed. When compilation completes, the method's call entry address is automatically rewritten to the new value, and the next call of the method uses the compiled version
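The decision flow above can be sketched as a small simulation. This is an illustrative model only, not HotSpot's actual implementation; the threshold value and class names are invented:

```java
// Illustrative simulation of the invocation-counter trigger logic.
// Not HotSpot's real implementation; THRESHOLD is arbitrary.
public class InvocationCounterModel {
    static final int THRESHOLD = 10_000;

    int invocationCounter = 0;
    int backEdgeCounter = 0;
    boolean compiled = false;          // "compiled version exists"
    boolean compileRequested = false;  // request submitted to the JIT

    // Models what happens each time the method is called.
    void onInvoke() {
        if (compiled) {
            return; // execute the compiled native code directly
        }
        invocationCounter++;
        if (invocationCounter + backEdgeCounter > THRESHOLD) {
            compileRequested = true; // submit asynchronously and
        }                            // keep interpreting meanwhile
    }

    // Models the background compiler finishing its work.
    void onCompileFinished() {
        if (compileRequested) {
            compiled = true; // entry address rewritten to new code
        }
    }
}
```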

By default, the invocation counter records not the absolute number of calls but a relative execution frequency: the number of times the method is called within a period of time. If that period elapses and the call count is still not enough to submit the method to the just-in-time compiler, the counter is halved. This process is called counter decay, and the halving is performed incidentally while the virtual machine does garbage collection. Counter decay can also be turned off, letting the virtual machine count the absolute number of calls; in that case, given enough running time, most of the methods in a program would eventually be compiled into native code
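Counter decay can be modeled as a halving step applied at each period boundary. The sketch below is purely illustrative (in HotSpot the decay actually happens during GC, and some HotSpot versions expose it via a flag such as `UseCounterDecay`):

```java
// Illustrative model of counter decay: if a period elapses without
// the counter reaching the threshold, the counter is halved.
public class CounterDecayModel {
    static int decay(int counter) {
        return counter / 2;
    }

    // A counter that gains 'callsPerPeriod' each period and decays at
    // every period boundary settles just below 'callsPerPeriod', so a
    // steadily-but-moderately called method never triggers compilation.
    static int steadyState(int callsPerPeriod, int periods) {
        int counter = 0;
        for (int p = 0; p < periods; p++) {
            counter += callsPerPeriod;
            counter = decay(counter);
        }
        return counter;
    }
}
```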

Now consider the back edge counter, which counts how many times the loop body code in a method executes. A bytecode instruction that jumps backwards in the control flow is called a back edge. When the interpreter encounters a back edge instruction, it first looks for a compiled version of the code about to be executed; if one exists, the compiled code runs. Otherwise, it increments the back edge counter by one and then checks whether the sum of the invocation counter and the back edge counter exceeds the back edge counter threshold. When the threshold is exceeded, an on-stack replacement compilation request is submitted, and the back edge counter's value is lowered slightly so the loop can continue in the interpreter while waiting for the compiler to produce its output

The back edge counter has no counter decay, so it records the absolute number of times the method's loops execute. When this counter overflows, it also pushes the invocation counter's value into the overflow state, so that the standard compilation process is triggered the next time the method is entered

Ahead-of-time compiler

There are two distinct branches of ahead-of-time compilation research: static translation, which compiles program code to machine code before the program runs, and saving the compilation work a just-in-time compiler would do at run time so that it can be loaded and reused the next time the code runs

The first, traditional form of ahead-of-time compilation addresses the biggest weakness of just-in-time compilation: its cost in run-time compute and time. The second approach is essentially cache acceleration for the just-in-time compiler.

An ahead-of-time compiler, free from execution-time and resource constraints, can apply heavyweight optimizations without hesitation, which is a great advantage. But just-in-time compilers have advantages of their own:

  • Profile-guided optimization

    A just-in-time compiler collects performance monitoring information during execution, such as which branch of a conditional is usually taken or how many times a loop iterates. This data is generally unavailable to static analysis, or yields no clear answer there, yet at run time the program often shows very obvious biases. For example, if one path of a conditional branch executes frequently, that hot path can be optimized and allocated more resources

  • Aggressive Speculative Optimization

    Static optimization must guarantee that the program before and after optimization is equivalent in every externally observable effect, not just in its execution results. A just-in-time compiler need not be so conservative: if performance monitoring information suggests an assumption is very likely, though not guaranteed, to be correct, it can boldly optimize for the high-probability case. If execution does take a rare branch, the code can fall back to a lower-tier compiler or even the interpreter, with no irreparable consequences

  • Link time optimization

    The Java language is by nature dynamically linked, with Class files loaded into virtual machine memory at run time, so the just-in-time compiler can optimize across those link boundaries in the native code it produces

Compiler optimization techniques

The goal of a compiler is to translate program code into native machine code, but the difficulty lies not in whether it can produce machine code at all; the optimization quality of the output code is what determines whether a compiler is good or not

1. Method inlining

Method inlining copies the code of the target method into the calling method, avoiding a real method call. Inlining sounds simple, but implementing it is not, because of Java's method resolution and dispatch mechanisms. Only private methods and instance constructors invoked with the invokespecial instruction, static methods invoked with invokestatic, and methods marked final can be resolved by the compiler. All other Java methods must perform polymorphic selection of the receiver at run time and may have more than one target version

To solve this problem, the Java virtual machine introduced Class Hierarchy Analysis (CHA), a type-inheritance analysis that determines, across the currently loaded classes, whether an interface has more than one implementation, whether a class has subclasses, and whether a subclass overrides a virtual method of its parent. With this information, a non-virtual method can simply be inlined; and if a virtual call site is found to have only one reachable version, it can be inlined directly as well, a technique called guarded inlining. But because Java programs are dynamically linked and new types can be loaded at any time, guarded inlining is an aggressive speculative optimization and must keep an escape route: if the inheritance hierarchy changes, the compiled code must be discarded, execution falls back to the interpreter, or the code is recompiled

If a method has multiple versions of the target method to choose from, the virtual machine uses Inline Cache to reduce the overhead of method calls. The inline cache is a cache built before the normal entry of the target method. If no method call occurs, the inline cache state is empty. After the first call, the cache records the version information of the method receiver and compares the version of the receiver each time a method call is made. If it is the same every time, use it directly, otherwise look up the virtual method table for method dispatch
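The inline-cache idea can be sketched as a cache keyed on the receiver's class. This is an illustrative model with invented names; a real JIT patches the machine code at the call site rather than consulting a field:

```java
// Illustrative monomorphic inline cache: remembers the receiver
// class seen at a call site and only falls back to a full virtual
// dispatch (vtable lookup) when a different class shows up.
public class InlineCacheModel {
    private Class<?> cachedReceiverClass; // empty until first call
    int fastPathHits = 0;
    int slowPathDispatches = 0;

    String call(Object receiver) {
        if (receiver.getClass() == cachedReceiverClass) {
            fastPathHits++;           // cache hit: skip the lookup
        } else {
            slowPathDispatches++;     // miss: full virtual dispatch
            cachedReceiverClass = receiver.getClass();
        }
        return receiver.toString();   // stands in for the target method
    }
}
```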

2. Escape analysis

The basic principle of escape analysis is analyzing the dynamic scope of objects. When an object is defined inside a method, it may be referenced by external methods, for example by being passed as an argument to another method; this is called method escape. It may even be accessed by other threads, which is called thread escape

Depending on the degree of escape of an object, different degrees of optimization can be performed:

  • Stack allocation

    Normally, Java objects are allocated on the heap, and as long as a reference to an object is held, the object data on the heap can be accessed. If it is certain that an object will not escape the thread, the object can instead be allocated memory on the stack, so that it is destroyed together with the stack frame
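    A typical candidate looks like this: the temporary object never leaves the method, so with escape analysis enabled (`-XX:+DoEscapeAnalysis`, on by default in modern HotSpot) the JIT may avoid the heap allocation entirely. The class and method names here are invented for illustration:

```java
// The Vec2 instance below never escapes distanceSquared(), so an
// escape-analysis-capable JIT may allocate it on the stack (or
// scalar-replace it) instead of the heap.
public class EscapeDemo {
    static final class Vec2 {
        final double x, y;
        Vec2(double x, double y) { this.x = x; this.y = y; }
    }

    static double distanceSquared(double x1, double y1,
                                  double x2, double y2) {
        Vec2 d = new Vec2(x2 - x1, y2 - y1); // does not escape
        return d.x * d.x + d.y * d.y;
    }
}
```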

  • Scalar replacement

    Data that cannot be decomposed into smaller pieces, such as a primitive type, is called a scalar. By contrast, data that can be further decomposed, such as an object, is called an aggregate. If an object will not be accessed from outside its method, the object can be split apart, and the accesses to its member variables replaced by several scalars used directly by the method
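    What scalar replacement does can be shown by applying it by hand (illustrative code; the JIT performs the equivalent transformation after proving the object cannot be accessed from outside the method):

```java
// Scalar replacement conceptually: the aggregate (an object) is
// decomposed into its scalar fields, which then live in locals or
// registers instead of on the heap.
public class ScalarReplacementDemo {
    static final class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Before: allocates a Point the method never lets escape.
    static int sumViaObject(int a, int b) {
        Point p = new Point(a, b);
        return p.x + p.y;
    }

    // After scalar replacement: the fields become plain locals
    // and no object is allocated at all.
    static int sumViaScalars(int a, int b) {
        int px = a;
        int py = b;
        return px + py;
    }
}
```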

  • Synchronization elimination

    If a variable does not escape the thread, there can be no contention when reading or writing it, so any synchronization on that variable can be safely eliminated
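    The classic example is a synchronized object used only as a local: every `StringBuffer` method below locks the buffer's monitor, but since `sb` never escapes `concat`, the JIT can elide those locks entirely (the method name is invented for illustration):

```java
// sb is confined to this method, so no other thread can ever
// contend for its monitor; the JIT may remove the synchronization
// that StringBuffer's methods would otherwise perform on each call.
public class LockElisionDemo {
    static String concat(String a, String b, String c) {
        StringBuffer sb = new StringBuffer(); // thread-local in effect
        sb.append(a).append(b).append(c);
        return sb.toString();
    }
}
```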

3. Common subexpression elimination

An expression E is called a common subexpression if it has already been evaluated earlier and the values of all variables in E have not changed since that evaluation. In that case there is no need to spend time recomputing it; the previously computed result can simply be substituted for E

Suppose you have the following code:

int d = (c * b) * 12 + a + (a + b * c);

The compiler detects that c * b and b * c are the same expression and that the values of b and c have not changed, so the expression can be treated as

int d = E * 12 + a + (a + E);
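The two forms above compute the same value, which can be checked directly (a small verification sketch; the variable values in the test are arbitrary):

```java
// Checks that the hand-applied common-subexpression elimination
// preserves the original expression's value.
public class CseDemo {
    static int original(int a, int b, int c) {
        return (c * b) * 12 + a + (a + b * c);
    }

    static int optimized(int a, int b, int c) {
        int e = b * c;               // the common subexpression E
        return e * 12 + a + (a + e);
    }
}
```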

4. Array boundary check elimination

We know that arrays in Java cannot be accessed out of bounds; doing so throws a runtime exception, thanks to the automatic bounds checks the virtual machine performs. But checking on every array read or write is a burden. Array bounds safety must be guaranteed in any case, yet the virtual machine can use data-flow analysis at compile time to prove that an array index can never go out of bounds, avoiding the redundant checks
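A common case the JIT can prove safe: when the loop induction variable is bounded by `array.length`, the per-access check is redundant and can be removed or hoisted out of the loop. The effect is not directly observable from Java code, but this is the shape of loop that qualifies (class and method names invented):

```java
// The induction variable i is provably within [0, data.length), so
// data-flow analysis lets the JIT drop the bounds check performed
// for each data[i] access inside this loop.
public class BoundsCheckDemo {
    static int sum(int[] data) {
        int total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i]; // check provably redundant here
        }
        return total;
    }
}
```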