
JVM optimization of code execution can be divided into runtime optimization and just-in-time (JIT) compiler optimization. Runtime optimization mainly covers mechanisms shared by interpreted execution and dynamic compilation, such as locking mechanisms (for example, biased locking) and memory allocation mechanisms (for example, TLAB). In addition, there are several optimizations dedicated to improving the efficiency of interpreted execution, such as the template interpreter and inline caches (which optimize the dynamic binding of virtual method calls).

The JVM's just-in-time compiler optimization converts hot code into machine code, one method at a time, so that it runs directly on the underlying hardware. It applies a wide variety of optimizations, including techniques also available to static compilers, such as method inlining and escape analysis, as well as speculative optimizations based on profiles collected while the program runs. How should you understand speculative optimization? For example, take an instanceof instruction: if the class of the tested object stays the same throughout execution before compilation, the just-in-time compiler can assume it will still be that class after compilation and return the result of instanceof directly based on that class. If an object of another class appears, the compiled machine code is discarded and execution falls back to the interpreter.
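
To make this concrete, here is a minimal, hypothetical Java sketch (all class and method names are made up). It only illustrates the shape of code that this kind of speculation applies to; it does not print any JIT decisions.

abstract class Shape {
    abstract double area();
}

class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}

class Square extends Shape {
    final double s;
    Square(double s) { this.s = s; }
    double area() { return s * s; }
}

public class SpeculationDemo {
    // If profiling shows that 'shape' has so far always been a Circle, the JIT
    // may compile this method assuming the instanceof test succeeds, guarded
    // by a cheap class check. If a Square ever arrives, the guard fails, the
    // compiled code is thrown away (deoptimization) and execution falls back
    // to the interpreter.
    static double describe(Shape shape) {
        if (shape instanceof Circle) {
            return ((Circle) shape).area();
        }
        return shape.area();
    }

    public static void main(String[] args) {
        double sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += describe(new Circle(i % 10)); // only Circles during warm-up
        }
        sum += describe(new Square(3));          // a new class may trigger deoptimization
        System.out.println(sum);
    }
}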

Of course, these JVM optimizations only apply while application code is actually running. If the application code itself is blocked, for example waiting for the result of another thread in a concurrent program, that is not something JVM optimization can help with.

Analysis of the interview question

Many readers of this column have asked about this interview question, and it is also the kind of knowledge point that interviewers like to drill into.

Most Java engineers are not JVM engineers, so the knowledge has to be grounded in practice. The interviewer will most likely steer the discussion toward practical aspects, such as how you interact with JVM modules like the JIT in production, and how you actually tune them.

In today’s lecture, I’ll focus on:

Understand the process of compiling and executing Java code as a whole. The goal is an intuitive view of the basic mechanisms and flow, so that you understand the logic behind your tuning choices.

From the perspective of tuning production systems, I will discuss possible ways to apply JIT knowledge in practical work. This consists of two parts: how to collect JIT-related information, and the specific tuning approaches and tools.

Knowledge extension

First, let’s take a holistic look at the entire life cycle of Java code; you can refer to the diagram I provided.

As I mentioned, Java shields hardware differences by introducing an intermediate representation of bytecode, leaving the JVM to take care of the translation from bytecode to machine code.

Compile time refers to the process by which javac, or a compiler invoked through the related compiler APIs, transforms source code into bytecode. This stage also involves minor optimizations such as constant folding, and you can inspect the details directly with a bytecode disassembler or decompiler.
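
As a small illustration (the class name is made up), the constant expression below is folded by javac; you can confirm this with javap -c.

public class ConstantFoldingDemo {
    // javac folds this compile-time constant expression, so the class file
    // stores 86400 directly instead of computing 60 * 60 * 24 at run time.
    static final int SECONDS_PER_DAY = 60 * 60 * 24;

    public static void main(String[] args) {
        // "javap -c ConstantFoldingDemo" shows a single ldc of 86400 here,
        // not a chain of multiplications.
        System.out.println(SECONDS_PER_DAY);
    }
}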

javac's optimizations are also related to the JVM's internal optimizations; after all, javac is responsible for generating the bytecode. For example, from Java 9 on, string concatenation is translated by javac into calls to StringConcatFactory, which provides a unified entry point through which the JVM can optimize string concatenation. In real-world scenarios, you can also influence this process through different strategy options.
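
A minimal example (the class name is made up) to see this for yourself on JDK 9 or later: javap -c shows an invokedynamic instruction bootstrapped by StringConcatFactory.makeConcatWithConstants rather than the StringBuilder append chain that older javac emitted. As far as I know, the strategy behind that entry point could be selected in JDK 9-era releases with the java.lang.invoke.stringConcat system property, which is presumably what the "strategy options" above refers to.

public class ConcatDemo {
    public static void main(String[] args) {
        String who = args.length > 0 ? args[0] : "world";
        // Compiled on JDK 9+, this concatenation becomes an invokedynamic call
        // whose bootstrap method is StringConcatFactory.makeConcatWithConstants.
        String greeting = "hello, " + who + "!";
        System.out.println(greeting);
    }
}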

Today I’m going to focus on JVM runtime optimization. In general, the compiler and interpreter work together, as illustrated in the diagram below

Based on the statistics it gathers, the JVM dynamically decides which methods are compiled and which are interpreted. Even code that has already been compiled may cease to be hot at a later stage of the run, and the JVM needs to evict such code from the Code Cache, since its size is limited.

Lock optimization and the Intrinsic mechanism

The Intrinsic mechanism, also called built-in methods, means that for particularly important base methods the JDK team directly provides customized implementations, written in assembly or in the compiler's intermediate representation, which the JVM then substitutes directly at run time.

There are several reasons for doing this. For example, CPUs of different architectures differ in their instruction sets, and customized implementations can get the best out of the hardware. HotSpot provides built-in implementations for the typical string operations, array copying, and other basic methods we use every day.

The just-in-time compiler (JIT) is responsible for even more optimizations. The basic unit of JIT compilation in Java is the whole method: the JVM counts method invocations to identify hot methods and compiles them into native code. The other optimization scenario targets so-called hot loop code and is handled through On-Stack Replacement (OSR). If a method itself is not invoked often enough to reach the compilation threshold but contains a large internal loop, there is still further optimization value in it.
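
A minimal sketch of the hot-loop case (the class name is made up): main is invoked only once, so the invocation counter alone would never make it hot, and only the back-edge counter of the loop can trigger compilation. If you run it with -XX:+PrintCompilation, OSR compilations are marked with a % in HotSpot's output.

public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        // The loop's back edges are counted; once the count is high enough the
        // JVM can perform an OSR compilation and continue the loop in compiled
        // code while main is still on the stack.
        for (int i = 0; i < 100_000_000; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}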

Conceptually, the JIT can be thought of as being driven by two counters, an invocation counter and a back-edge counter, which provide the JVM with the statistics needed to locate hot code. In practice, the JIT is far more complex. Dr. Zheng has mentioned escape analysis, loop unrolling, method inlining, and other general mechanisms; these, along with the intrinsics mentioned earlier, also happen at the JIT stage.

Second, what means do we have to observe whether these optimizations are actually happening?

A few have already been introduced in this column, so let me summarize them briefly and fill in some details.

Print the details of how compilation takes place.
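
The flag in question appears to have been dropped from the text; as far as I know it is the standard HotSpot option below, which logs one line per compilation event.

-XX:+PrintCompilation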

Output more compilation details.
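
The options for this step also appear to be missing from the text; the usual diagnostic combination in HotSpot is the following, where <your_file_path> is a placeholder for the desired log location.

-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=<your_file_path>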

The JVM will generate a log file in XML form; the LogFile option is optional and controls where the output is written.

For the format of this log, you can refer to the JitWatch tool and the analysis guide provided by Ben Evans.

Print inlining decisions, using the diagnostic option below, which also needs to be explicitly unlocked.
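
The diagnostic options meant here are presumably the following; UnlockDiagnosticVMOptions is the "explicit unlock" without which the JVM rejects PrintInlining.

-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining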

How can I find out the usage status of the Code Cache?

Quite a few tools already provide specific statistics, such as JMC, JConsole, and so on, and I have described how to use NMT to monitor its usage.
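
As a sketch of the NMT route (app.jar and <pid> are placeholders): start the JVM with Native Memory Tracking enabled, then query it with jcmd; the "Code" section of the report reflects code cache usage.

java -XX:NativeMemoryTracking=summary -jar app.jar
jcmd <pid> VM.native_memory summary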

Third, what tuning angles and tools are at our disposal as application developers?

Adjust the hotspot code threshold

I have already described the default JIT compilation thresholds: 10000 for server mode and 1500 for client mode. The threshold can be tuned with the parameter below; lowering it also, in effect, reduces warm-up time.
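
The parameter meant here is presumably CompileThreshold; the value below is only illustrative, not a recommendation. Note that when tiered compilation is enabled, tier-specific thresholds govern instead, so this flag has its clearest effect with tiered compilation turned off.

-XX:CompileThreshold=5000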

Many people may wonder: if the code is hot, won't it reach the threshold sooner or later? Not necessarily, because the JVM periodically decays the counter values, which can keep the invocation counter from ever reaching the threshold. Another approach, therefore, is to turn off counter decay.
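
To my knowledge, counter decay is turned off with:

-XX:-UseCounterDecay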

If you are using a debug build of the JDK, you can also experiment with the parameter below, but production builds do not support this option.
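
If I recall correctly, the develop-level flag involved is CounterHalfLifeTime, which controls the half-life (in seconds) of the invocation counters and is only accepted by debug builds of HotSpot; <seconds> is a placeholder.

-XX:CounterHalfLifeTime=<seconds>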

Adjust the Code Cache size

We know that JIT-compiled code is stored in the Code Cache, and it is important to note that the Code Cache has a limited size that is not adjusted dynamically. This means that if the Code Cache is too small, only a fraction of the code may be JIT-compiled, while the rest has no choice but to run interpreted. A potential tuning point, then, is to adjust its size limit.
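
The size limit is set with ReservedCodeCacheSize; the value below is only an example.

-XX:ReservedCodeCacheSize=256m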

Of course, you can also adjust its initial size.
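
The corresponding flag for the initial size is InitialCodeCacheSize, again with an illustrative value.

-XX:InitialCodeCacheSize=64m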

Note that in relatively recent versions of Java, because of Tiered Compilation, the space requirements of the Code Cache increased significantly, and the default size itself was increased.

Adjust the number of compiler threads, or select the appropriate compiler mode

The number of JIT compiler threads depends on the mode we choose: in client mode there is only one compilation thread by default, while in server mode there are two. In the tiered compilation mode that is most common today, the numbers of C1 and C2 threads are calculated from the number of CPU cores. You can specify the number of compilation threads with the argument below.
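
The argument referred to is presumably CICompilerCount; for example, to cap the number of compiler threads at two:

-XX:CICompilerCount=2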

On a powerful multi-processor machine, increasing the number of compilation threads can make better use of CPU resources and speed up processes such as warm-up. Conversely, too many compilation threads may compete for resources, especially when the system is very busy. For example, if multiple Java application instances are deployed on the same system, it may make sense to reduce the number of compilation threads.

In production practice, it has also been recommended to turn off tiered compilation on the server and use the server compiler directly; this leads to a slightly slower warm-up, but may yield a small throughput gain for certain workloads.
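
Turning off tiered compilation so that only the server (C2) compiler is used looks like this:

-XX:-TieredCompilation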

Some other so-called "optimizations" whose boundaries are less clear-cut

For example, reducing safepoint entries. Strictly speaking, safepoints are by no means limited to dynamic compilation; they are entered far more frequently at other times (during GC, for example), and you can diagnose their impact with the options below.
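
The original flags seem to be missing from the text; on JDK 8 the options commonly used for this are the ones below. The first prints how long the application was stopped at each safepoint, the second prints per-safepoint statistics.

-XX:+PrintGCApplicationStoppedTime
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1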

Note that after JDK 9, PrintGCApplicationStoppedTime was removed, and you need to use the "-Xlog:safepoint" form instead.

Many optimization phases may involve safepoints, for example:

During JIT compilation, scenarios such as deoptimization require the program to be brought to a safepoint.

They can also occur in normal lock optimization phases. For example, biased locking is designed to avoid synchronization overhead when there is no contention, but once contention does occur, revoking the biased lock requires a safepoint, which is an expensive operation. For this reason, the value of biased locking in genuinely concurrent scenarios is questionable, and it is often explicitly recommended to turn biased locking off.
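
Biased locking is turned off with:

-XX:-UseBiasedLocking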
