In-depth Understanding of the JVM: Front-end and Back-end Compilation

The front-end compiler

Front-end compilation is the process of compiling Java source code into bytecode. There is generally little performance optimization to be done at this stage, because it is essentially a "translation." The front end is also responsible for supporting the various kinds of syntactic sugar and for features such as type checking. These reading notes focus on performance optimization and the underlying principles, so this part is skipped. Please keep that in mind.

The back-end compiler

In general, back-end compilation is the process of compiling class files into platform-specific machine code. In Java, ahead-of-time (AOT) compilation and related technologies exist, but just-in-time (JIT) compilation is dominant, so it is inevitably discussed at length here.

Just-in-time compiler

On mainstream virtual machines, programs start out running in the interpreter; after they have run for a while, hot code is compiled just in time into native code to improve performance.

Interpreters and compilers

In addition to enabling a "quick start," the interpreter also serves as a "backup": JIT optimizations are generally aggressive (the resulting code can reach a quality comparable to C/C++ compiled at -O2), and if an aggressive optimization turns out to be invalid, execution falls back to the interpreter so the program keeps running.

The HotSpot virtual machine has two just-in-time compilers built in, and a third one is under development and may be released later. Let's start with the first two: the client compiler and the server compiler, also known as C1 and C2.

Because just-in-time compilation takes time, and deeper optimization requires performance information collected while the code runs in the interpreter, HotSpot combines the interpreter and the JIT compilers into a tiered compilation model.

Tiered compilation itself is not complicated: the interpreter, C1, and C2 cooperate, and which of them handles a piece of code depends on conditions. A method may be executed by the interpreter, compiled by C1, or compiled by C2. The selection criteria include information about the method body, runtime profiling data, and so on.

The table detailing each tiered compilation level is omitted here.

Here is a good article on advanced JVM topics: "Talk about just-in-time compilation"

Generally speaking, C1 compiles quickly but optimizes less, while C2 is the opposite. In the early stage of tiered compilation, C1 does a quick, simple compilation to buy time, and then C2 takes over for deeper optimization.
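To watch tiered compilation happen, a tiny program with one frequently called method is enough. The class name `TieredDemo` and the method `square` are made up for illustration; the program itself does nothing special beyond calling `square` a million times.

```java
// A small program with one hot method, for observing tiered compilation.
public class TieredDemo {
    // Called a million times, so it should quickly become hot.
    static int square(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += square(i % 1000);
        }
        System.out.println(sum);
    }
}
```

Running it as `java -XX:+PrintCompilation TieredDemo` prints one line per compilation event; with tiered compilation on, one column shows the tier level (1 to 3 for C1, 4 for C2), so `square` can typically be seen compiled first by C1 and later by C2. Adding `-XX:TieredStopAtLevel=1` limits compilation to C1. The exact output format differs between JDK versions.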

Compilation targets and trigger conditions

The hot code mentioned above, which gets compiled just in time, falls into two categories:

1. Methods that are called many times
2. Loop bodies that are executed many times

Although a loop body is only a block of code inside a method, compilation still uses the whole enclosing method as its unit. That is, both kinds of hot code trigger just-in-time compilation of the entire method body.

There is not much to say about compiling a hot method: it is simply compiled, and later calls use the compiled version. For a hot loop, however, on-stack replacement (OSR) is performed: while the method is still executing, part of it is replaced with just-in-time compiled code.

You might be wondering: didn't we say compilation is done per method? Why replace code on the stack? Why not just call the compiled method?

The short answer: it is a stopgap strategy that mostly helps benchmark scores!

Imagine a method that is executed only once, such as main, but that contains a loop executed 100,000 times. If compilation were triggered only by method calls, this code would never be optimized, would it? So what does the JVM do? It counts how many times the loop has run, and when a threshold such as 10,000 is reached, it triggers JIT compilation of the loop. Suppose compilation completes around iteration 10,100: from the next iteration on, execution is handed over to the compiled code, and the state accumulated so far (for example, the values of the local variables already computed) is passed to the compiled code as well. This is on-stack replacement: part of what is on the virtual machine stack is replaced with JIT code, rather than the entire method being swapped out on a new call.

This situation is uncommon in real code, because neither the JVM nor programmers are that naive, but it is very useful in benchmarks! Even with on-stack replacement, what gets compiled is still the whole method. That is why we say the unit of compilation is the method: OSR is more of an ad hoc policy.
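The main-with-a-big-loop scenario above can be reproduced with a few lines. `OsrDemo` and `loopSum` are illustrative names; `loopSum` is called only once, so its invocation count never gets high, and only its loop becomes hot.

```java
// A method that is called once but whose loop runs many times: an OSR candidate.
public class OsrDemo {
    static long loopSum(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {   // back edges of this loop are what get counted
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(loopSum(100_000_000));
    }
}
```

With `java -XX:+PrintCompilation OsrDemo`, OSR compilations are marked with a `%` and an `@ <bytecode index>` that points at the loop, which distinguishes them from ordinary whole-method compilations.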

For further reading:

"What is the mechanism of on-stack replacement (OSR)?", answer by RednaxelaFX on Zhihu

Determining whether a method or loop body is hot code is called hot spot detection. There are two main techniques:

1. Sampling-based: periodically check the top of each thread's call stack; a method that frequently appears at the top is probably a hot method.
2. Counter-based: maintain a counter for each method or code block that counts how many times it is executed.

Both approaches have advantages and disadvantages. Taking HotSpot as an example: it uses counter-based hot spot detection and maintains two counters per method, the invocation counter and the back-edge counter.

Invocation counter: counts method calls. When the sum of a method's invocation counter and back-edge counter exceeds the invocation counter threshold, just-in-time compilation of the method is triggered.

Back-edge counter: counts how many times loops in the method jump back to their start. When it reaches a threshold, just-in-time compilation for on-stack replacement is triggered.
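The interplay of the two counters can be mimicked with a toy model. Everything here is hypothetical: the class name, the hook methods `onInvoke`/`onBackEdge`, and the threshold values are made up for illustration and are not HotSpot's real implementation or defaults. It replays the scenario of a `main` method that is entered once but loops 100,000 times.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of counter-based hot spot detection (not HotSpot's real code).
public class HotSpotCounters {
    static final int INVOCATION_THRESHOLD = 10_000; // hypothetical value
    static final int BACKEDGE_THRESHOLD = 50_000;   // hypothetical value

    int invocations = 0;
    int backEdges = 0;
    boolean methodCompiled = false;
    boolean osrCompiled = false;
    final List<String> events = new ArrayList<>();

    // Interpreter hook: called each time the method is entered.
    void onInvoke() {
        invocations++;
        if (!methodCompiled && invocations + backEdges > INVOCATION_THRESHOLD) {
            methodCompiled = true;
            events.add("standard compile at invocation " + invocations);
        }
    }

    // Interpreter hook: called each time a loop jumps back to its header.
    void onBackEdge() {
        backEdges++;
        if (!osrCompiled && backEdges > BACKEDGE_THRESHOLD) {
            osrCompiled = true;
            events.add("OSR compile at back edge " + backEdges);
        }
    }

    public static void main(String[] args) {
        HotSpotCounters mainMethod = new HotSpotCounters();
        mainMethod.onInvoke();                 // main() is entered exactly once
        for (int i = 0; i < 100_000; i++) {
            mainMethod.onBackEdge();           // ...but its loop runs 100,000 times
        }
        System.out.println(mainMethod.events); // prints [OSR compile at back edge 50001]
    }
}
```

Only the back-edge counter fires here, which is exactly the case OSR exists for: the method-level check runs only on entry, and the method is entered just once.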

Once just-in-time compilation is triggered, the interpreter keeps executing the code without waiting for compilation to complete.

The build process

In general, once just-in-time compilation is triggered, it runs in a background compiler thread while the program continues executing in the interpreter. The specific compilation process depends on the virtual machine implementation; it is fairly low-level and complex, so it is not covered here.

Ahead-of-time compiler

For AOT (ahead-of-time) compilation, Android is the best reference: its ART (Android Runtime) is a good embodiment of the technique.

The pros and cons of precompilation

There are two approaches to ahead-of-time compilation:

1. Like GCC/G++, statically translate the program directly into machine code before it runs.
2. Do in advance the work the just-in-time compiler would do at run time, save the results, and load them directly when they are needed.

The first approach addresses a weakness of just-in-time compilation: compilation consumes time that would otherwise be spent executing the program.

The second approach is more straightforward and can even be described as a just-in-time compilation cache.

The first approach compiles everything directly up front; the second is more like modular compilation: when execution reaches code that has not been compiled yet, the precompiled module is simply loaded and used. Either way, all the compilation is done in advance.

By comparison, the just-in-time compiler looks like a drag, but is it? Consider the advantages of JIT compilation over AOT:

1. Profile-guided optimization, based on performance information collected at run time.
2. Aggressive speculative optimization. This makes it possible to inline most virtual methods that are worth inlining, which is crucial for optimization.
3. Link-time optimization.

Compiler optimization techniques

There are many optimization techniques, but let’s just look at some of the most important ones to know:

The most important optimization technique: method inlining
Cutting-edge optimization technique: escape analysis
Language-independent optimization technique: common subexpression elimination
Language-dependent optimization technique: array bounds check elimination

Method inlining

Method inlining is the most important optimization technique, and many other optimizations build on it. Inlining simply copies the code of the called method into the calling method, so execution continues without an actual call. Avoiding the call greatly improves performance.

P.S. Each method call means pushing a new stack frame, setting up local variables and the operand stack, and so on. Even in compiled native code, a call still requires saving state, setting up and passing arguments, and a whole bunch of similar operations.
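A source-level before/after picture can make inlining concrete. This is only an illustration under the assumption that the JIT inlines the small getter; the JIT works on compiled code, not on Java source, and `InliningDemo`, `Point`, `sumBefore` and `sumAfter` are hypothetical names.

```java
// Hand-written before/after view of method inlining (illustration only).
public class InliningDemo {
    static class Point {
        private final int x;
        Point(int x) { this.x = x; }
        int getX() { return x; }   // tiny method: an ideal inlining candidate
    }

    // Before inlining: every iteration performs a call to getX().
    static long sumBefore(Point p, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += p.getX();
        }
        return sum;
    }

    // After inlining (conceptually): the body of getX() is copied into the loop,
    // no call remains, and later optimizations can see the field access directly.
    static long sumAfter(Point p, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += p.x;
        }
        return sum;
    }

    public static void main(String[] args) {
        Point p = new Point(3);
        System.out.println(sumBefore(p, 1000) + " " + sumAfter(p, 1000)); // prints "3000 3000"
    }
}
```

To see what HotSpot actually decides, the diagnostic flags `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` print inlining decisions for compiled methods.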

Which methods can be inlined? Static methods, constructors, private methods, superclass methods called via super, and final methods are all non-virtual, so they can be inlined directly. Is that the end of it? That would make for a rather boring optimization. Virtual methods: "Don't I get a chance?"
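The list of non-virtual call kinds is easier to remember with each one side by side. The classes `CallKinds`, `Base` and `Square` are hypothetical; the comments mark which calls have a target fixed at compile time and the one call that must be dispatched at run time.

```java
// Non-virtual call targets (inlinable directly) versus a virtual one.
public class CallKinds {
    static class Base {
        Base() { }                                // constructor: non-virtual
        static int twice(int v) { return 2 * v; } // static: non-virtual
        private int secret() { return 1; }        // private: non-virtual
        final int fixed() { return 10; }          // final: cannot be overridden
        int area() { return 0; }                  // ordinary instance method: virtual
        int callPrivate() { return secret(); }    // the inner secret() call is non-virtual
    }

    static class Square extends Base {
        @Override
        int area() { return super.area() + 4; }   // super.area(): non-virtual
    }

    public static void main(String[] args) {
        Base b = new Square();                    // constructor calls
        int total = Base.twice(1)                 // 2
                + b.callPrivate()                 // 1
                + b.fixed()                       // 10
                + b.area();                       // virtual: resolves to Square.area() -> 4
        System.out.println(total);                // prints 17
    }
}
```

Only `b.area()` (and the non-final `callPrivate()` wrapper itself) needs the receiver's runtime type to pick a target, which is exactly the problem the next paragraphs address.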

To be clear, a virtual method is one whose receiver must be determined at run time (Java's runtime polymorphism decides which subclass's implementation is used). Since the target is chosen at run time, the compiler does not know at compile time which subclass's implementation it should inline.

But in practice, an application often has only one implementation of a given interface or class, so it is actually clear which method will be called. In that case the call can be inlined directly. Typical Spring code is an example: a service interface often has just one implementation class, registered as a singleton bean.

With that said, let's look at what the JVM actually does. To inline virtual methods, the JVM introduces a technique called Class Hierarchy Analysis (CHA). As the name suggests, it analyzes the class inheritance hierarchy and inlines virtual calls that have only one implementation or whose target can be uniquely determined. This is an aggressive optimization, however, because the class hierarchy can change at run time, for example through dynamically generated proxies; when that happens, the compiled code must be discarded and execution falls back to the interpreter.

In addition, if CHA finds multiple candidate implementations, the JVM uses an inline cache for further optimization: the call site remembers the receiver type it last saw and takes a fast path while that type keeps appearing. If this does not work either, because the receivers at the call site keep varying, a real virtual method table lookup is performed.
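The kind of situation CHA targets can be set up in a few lines: a hot call site through an interface that, at first, has only one implementation in use. `ChaDemo`, `Shape`, `Circle` and `Rect` are hypothetical names, and whether HotSpot actually applies CHA here, and exactly when `Rect` is loaded, depends on the JVM version; the code only sketches the scenario.

```java
// Scenario for Class Hierarchy Analysis: one implementation at first, a second one later.
public class ChaDemo {
    interface Shape { double area(); }

    static class Circle implements Shape {
        public double area() { return 3.0; }
    }

    // Not instantiated until after the warm-up loop below.
    static class Rect implements Shape {
        public double area() { return 2.0; }
    }

    // Hot call site: while Circle is the only Shape implementation in play,
    // the JIT may treat s.area() as Circle.area() and inline it.
    static double total(Shape s, int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape circle = new Circle();
        double a = 0;
        for (int i = 0; i < 20_000; i++) {
            a += total(circle, 100);   // warm-up: only Circle is used
        }
        // A second implementation breaks the "single implementation" assumption;
        // compiled code that relied on it has to be thrown away and recompiled.
        double b = total(new Rect(), 100);
        System.out.println(a + " " + b);
    }
}
```

Running with `-XX:+PrintCompilation` and looking for lines marked "made not entrant" is one way to spot compiled code being invalidated after such an assumption fails.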
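The inline cache logic can be imitated with a toy call-site object. In HotSpot the cache is patched machine code inside the VM; the `CallSite` class, its counters and the `Dog`/`Bird` types below are hypothetical and only mirror the decision steps: fill the cache on the first call, take a fast path while the receiver type matches, and degrade to full virtual dispatch once a different type shows up.

```java
// Toy model of a monomorphic inline cache at one call site (not HotSpot's code).
public class InlineCacheDemo {
    interface Animal { int legs(); }
    static class Dog implements Animal { public int legs() { return 4; } }
    static class Bird implements Animal { public int legs() { return 2; } }

    // Hypothetical call-site state; names are illustrative only.
    static class CallSite {
        private Class<?> cachedClass;   // receiver class recorded on the first call
        private boolean megamorphic;    // true once a different receiver class appears
        int cacheHits;                  // calls served by the fast path
        int slowLookups;                // calls that needed a full virtual dispatch

        int invoke(Animal receiver) {
            Class<?> k = receiver.getClass();
            if (!megamorphic) {
                if (cachedClass == null) {
                    cachedClass = k;          // empty cache: remember the type, then look up
                } else if (cachedClass == k) {
                    cacheHits++;              // same type as cached: fast path
                    return receiver.legs();   // (a real JIT would run inlined code here)
                } else {
                    megamorphic = true;       // type changed: give up on the cache
                }
            }
            slowLookups++;                    // stands in for a vtable lookup
            return receiver.legs();
        }
    }

    public static void main(String[] args) {
        CallSite site = new CallSite();
        Animal dog = new Dog();
        for (int i = 0; i < 3; i++) site.invoke(dog);  // 1 lookup + 2 fast-path hits
        site.invoke(new Bird());                       // mismatch: cache degrades
        site.invoke(dog);                              // slow path from now on
        System.out.println(site.cacheHits + " hits, " + site.slowLookups + " slow lookups");
    }
}
```

The program prints `2 hits, 3 slow lookups`: the cache pays off only while the call site stays monomorphic.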

Escape analysis

Escape analysis exists in many languages. Simply put, it analyzes the dynamic scope of an object to find out where it may be used. For example, if an object created in method A is passed as an argument to method B, or returned from A, it is said to escape from method A. Similarly, an object that can be reached by another thread, for instance through a static field, escapes its thread.

There are three degrees of escape: no escape, method escape, and thread escape. Depending on how far an object escapes, its allocation and use can be optimized to different degrees:
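The three degrees of escape are easiest to tell apart side by side. `EscapeDegrees`, `Point` and the three method names are hypothetical; each method allocates a `Point` and lets it escape a little further than the previous one.

```java
// The three degrees of escape, illustrated with a tiny Point class.
public class EscapeDegrees {
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static Point sharedPoint;   // a static field that any thread can read

    // No escape: p is created and used only inside this method.
    static int noEscape(int a, int b) {
        Point p = new Point(a, b);
        return p.x + p.y;
    }

    // Method escape: the object is handed back to the caller.
    static Point methodEscape(int a, int b) {
        return new Point(a, b);
    }

    // Thread escape: the object is published where other threads can see it.
    static void threadEscape(int a, int b) {
        sharedPoint = new Point(a, b);
    }

    public static void main(String[] args) {
        System.out.println(noEscape(1, 2));        // 3
        System.out.println(methodEscape(3, 4).y);  // 4
        threadEscape(5, 6);
        System.out.println(sharedPoint.x);         // 5
    }
}
```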

Stack allocation. If an object does not escape its thread, it can be allocated on the stack instead of the heap. More specifically, an object that does not escape its method can be allocated in that method's stack frame, and its space is reclaimed automatically when the method returns and the frame is popped, without involving the garbage collector.
Scalar replacement. Data that cannot be decomposed any further, such as primitive values and references, is called a scalar. Data that can be decomposed further, such as a Java object, is called an aggregate. Breaking an aggregate into its scalars requires that the object does not escape the method; in that case the object need not be created at all, and only the scalar fields that are actually used are created and accessed.
Synchronization elimination. If an object is known not to escape its thread, no other thread can contend for it, so synchronization on it is unnecessary and can be removed.
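A source-level picture of scalar replacement and synchronization elimination may help. The "after" method is written by hand to show what the optimized code conceptually does; the JIT performs this on compiled code, and all names here (`ScalarReplacementDemo`, `lengthSquared`, and so on) are hypothetical. The `StringBuffer` case relies on the fact that its methods are `synchronized`.

```java
// Before/after illustration of scalar replacement and lock elision.
public class ScalarReplacementDemo {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Before: allocates a Point that never leaves this method.
    static int lengthSquared(int a, int b) {
        Point p = new Point(a, b);
        return p.x * p.x + p.y * p.y;
    }

    // After scalar replacement (conceptually): no object, just two local scalars.
    static int lengthSquaredScalar(int a, int b) {
        int px = a, py = b;
        return px * px + py * py;
    }

    // StringBuffer's methods are synchronized, but sb never leaves this thread,
    // so the locking inside append()/toString() can be eliminated.
    static String concat(String s1, String s2) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        return sb.toString();
    }

    // An explicit lock on an object no other thread can ever see: also removable.
    static int lockedIncrement(int v) {
        Object lock = new Object();
        synchronized (lock) {
            return v + 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3, 4) + " " + lengthSquaredScalar(3, 4)); // 25 25
        System.out.println(concat("ab", "cd"));                                     // abcd
        System.out.println(lockedIncrement(41));                                    // 42
    }
}
```

HotSpot's C2 exposes these optimizations through the flags `-XX:+DoEscapeAnalysis`, `-XX:+EliminateAllocations` and `-XX:+EliminateLocks`; turning one off with the `-XX:-` form and comparing allocation rates is a common way to observe the effect.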

Escape analysis algorithms are computationally expensive, so the optimization is worthwhile only if its benefits outweigh its costs. This is why it took a long time to mature; in HotSpot, escape analysis has been enabled by default in the server compiler since JDK 6 Update 23.

Common subexpression elimination

If an expression E has already been evaluated, and none of the variables in E have changed since, then a later occurrence of E is called a common subexpression. Instead of evaluating it again, the compiler can simply reuse the previously computed value.
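A small arithmetic example shows the idea. In the method `original` below, `c * b` and `b * c` are the same subexpression, since integer multiplication is commutative. The three methods are hand-written stages of what a compiler might do; the class and method names are hypothetical.

```java
// Stages of common subexpression elimination, written out by hand.
public class CseDemo {
    // Original: c * b and b * c compute the same value.
    static int original(int a, int b, int c) {
        return (c * b) * 12 + a + (a + b * c);
    }

    // After CSE: the shared subexpression E = c * b is computed once.
    static int afterCse(int a, int b, int c) {
        int e = c * b;
        return e * 12 + a + (a + e);
    }

    // A compiler may go further with algebraic simplification: 12E + E = 13E, a + a = 2a.
    static int simplified(int a, int b, int c) {
        int e = c * b;
        return e * 13 + a * 2;
    }

    public static void main(String[] args) {
        // All three print 80 for a = 1, b = 2, c = 3.
        System.out.println(original(1, 2, 3) + " " + afterCse(1, 2, 3) + " " + simplified(1, 2, 3));
    }
}
```

Because Java `int` arithmetic wraps around consistently, the three forms agree even when intermediate results overflow.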

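Array bounds check elimination

Every Java array access is normally checked against the array's bounds at run time. If the compiler can prove at compile time that an index never goes out of range, for example by analyzing a loop's bounds, the check can be removed, reducing the cost of run-time bounds checking.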
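Two methods show the difference between a provably safe access and one that must stay checked. `BoundsCheckDemo`, `sum` and `at` are hypothetical names; whether a particular JIT actually removes the check in `sum` is up to the compiler, but the loop condition gives it everything it needs to do so.

```java
// Where a bounds check can be proven redundant, and where it cannot.
public class BoundsCheckDemo {
    // The loop condition guarantees 0 <= i < arr.length,
    // so every arr[i] is provably in range and the check can be dropped.
    static long sum(int[] arr) {
        long total = 0;
        for (int i = 0; i < arr.length; i++) {
            total += arr[i];
        }
        return total;
    }

    // The index comes from the caller, so the check must stay:
    // an out-of-range value still throws ArrayIndexOutOfBoundsException.
    static int at(int[] arr, int index) {
        return arr[index];
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sum(data));   // 6
        try {
            at(data, 5);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index 5 rejected by the bounds check");
        }
    }
}
```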

Another technique, called implicit exception handling, assumes that the exception being guarded against is rare. The compiled code skips the explicit check and proceeds each time as if the exception will not occur; when it does occur, the hardware raises a fault (an interrupt), and the operating-system-level handler registered by the VM deals with it. Handling such a fault is expensive, so this is only worthwhile when the exception is known to be rare.

Here’s an example of a common null check:

// Explicit null check, as written in the Java source
if (foo != null) {
    return foo.value;
} else {
    throw new NullPointerException();
}

With implicit exception handling, the compiled code effectively does the following (pseudocode):

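// Pseudocode: segment_fault and uncommon_trap() are not real Java; they stand
// for the OS segmentation-fault signal and the VM's process-level handler
try {
    return foo.value;
} catch (segment_fault) {
    uncommon_trap();
}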

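When foo is not null, accessing value costs nothing extra for the null check. When foo really is null, the access raises a segmentation fault; the VM has registered a process-level handler for this signal (uncommon_trap() in the pseudocode, which is unrelated to Java's try-catch), and that handler recovers from the interrupt and throws the NullPointerException.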