Introduction
For most application developers, the Java compiler means the javac command that ships with the JDK. It compiles Java source files into .class files, which contain code in a format we call Java bytecode. This code format cannot run directly on hardware, but it can be interpreted by a JVM on any platform. Because interpretation is slow, the JIT (just-in-time) compiler in the JVM selectively compiles frequently executed methods into binary code at run time so that they run directly on the underlying hardware. Oracle's HotSpot VM ships with two JIT compilers implemented in C++: C1 and C2.
In contrast to the interpreter, the GC, and other JVM subsystems, a JIT compiler does not rely on low-level language features such as direct memory access. It can be viewed as a black box that takes Java bytecode as input and produces binary code as output, and its implementation language depends on the developers' requirements for productivity, maintainability, and so on. Graal is a compiler for Java bytecode written primarily in Java. It is far more modular and easier to maintain than C1 and C2, which are implemented in C++. Graal can serve as a dynamic compiler, compiling hot methods at run time, or as a static compiler for AOT compilation. It shipped with Java 10 as an experimental JIT compiler (JEP 317). This article covers Graal as a dynamic compiler; for static compilation, see JEP 295 or Substrate VM.
Tiered Compilation
Before we talk about Graal, let us look at tiered compilation in HotSpot. As mentioned earlier, HotSpot integrates two JIT compilers, C1 and C2 (also known as Client and Server). The difference between the two is that C1 does not apply aggressive optimization techniques, since those optimizations usually require lengthy code analysis. As a result, C1 compiles faster, while C2 generates faster code. Before Java 7, users had to choose the appropriate JIT compiler for their scenario: C1 for GUI client programs that value startup performance, and C2 for server programs that value peak performance.
Java 7 introduced tiered compilation, combining C1's fast startup with C2's peak performance. The two JIT compilers and the interpreter divide the way HotSpot executes code into five levels:
- Level 0: interpreted execution
- Level 1: C1 compilation, without profiling
- Level 2: C1 compilation, with only method-entry and loop back-edge profiling
- Level 3: C1 compilation, with the profiling of level 2 plus branch profiling (for branch bytecodes) and receiver-type profiling (for member method calls and type checks such as the checkcast, instanceof, and aastore bytecodes)
- Level 4: C2 compilation

Levels 1 and 4 are terminal states: HotSpot issues no further compile requests for a method compiled at these levels unless the compiled code is invalidated (usually via deoptimization).
The figure above illustrates four common (but not all) compilation modes. Typically, a method is first interpreted (level 0), then compiled by C1 with full profiling (level 3), which yields profile data, and finally compiled by C2 (level 4). If the method is simple enough that the virtual machine sees no difference between C1- and C2-compiled code, it is compiled directly by C1 without profiling instrumentation (level 1). When C1 is busy, the interpreter gathers the profile and the method is compiled directly by C2; when C2 is busy, the method is first compiled by C1 with reduced profiling (level 2) for higher execution efficiency (about 30% faster than level 3).
Graal can replace C2 as HotSpot's top-tier JIT compiler, i.e. level 4 above. Graal applies more aggressive optimizations than C2, so once a program reaches a steady state, its execution efficiency (peak performance) can be superior.
Early versions of Graal were, like C1 and C2, tightly coupled to HotSpot, which meant HotSpot had to be recompiled whenever Graal changed. JEP 243 extracted the HotSpot-dependent code from Graal into a Java-level JVM Compiler Interface (JVMCI). The interface provides the following capabilities:
- Responding to HotSpot's compile requests and dispatching them to a Java-level JIT compiler
- Allowing a Java-level JIT compiler to access the JIT-compilation-related data structures in HotSpot, including classes, fields, methods, and profiling data, through Java-level abstractions of those structures
- Providing a Java-level abstraction of HotSpot's code cache, so that a Java-level JIT compiler can install compiled binaries
By combining these three capabilities, we can integrate a Java-level compiler (not limited to Graal) into HotSpot, answering its level-4 compile requests and installing the resulting binaries into HotSpot's code cache. In addition, the third capability can be used on its own to bypass HotSpot's compilation system entirely: a Java-level compiler can install compiled binaries directly, acting as a library for the application above it. Graal's own unit tests rely on such direct installation rather than waiting for HotSpot's compile requests, and Truffle uses the same mechanism to install compiled language interpreters.
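Schematically, JVMCI's role can be pictured as the handshake below. This is a hypothetical sketch; the interface and type names (`ResolvedMethod`, `CodeCache`, `JavaLevelCompiler`, `handleCompileRequest`) are illustrative and are not the real `jdk.vm.ci` API:

```java
// Hypothetical sketch of JVMCI's contract (names are illustrative, not the
// real jdk.vm.ci types): HotSpot hands the compiler a Java-level view of a
// method plus its profiling data, and receives machine code to install.
public class JvmciSketch {
    interface ResolvedMethod {            // Java-level view of a HotSpot method
        byte[] bytecode();
        Object profilingData();           // branch/type profiles from lower tiers
    }

    interface CodeCache {                 // Java abstraction of HotSpot's code cache
        void install(ResolvedMethod method, byte[] machineCode);
    }

    interface JavaLevelCompiler {         // e.g. Graal
        byte[] compile(ResolvedMethod method);
    }

    // HotSpot's side of the handshake: dispatch a level-4 compile request
    // to the Java-level compiler, then install the result.
    static void handleCompileRequest(JavaLevelCompiler c, ResolvedMethod m, CodeCache cache) {
        cache.install(m, c.compile(m));
    }
}
```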
Graal vs. C2
As mentioned earlier, a JIT compiler does not depend on its implementation language's low-level features; it is simply a conversion from one code form to another. Therefore, any optimization implemented in C++ in C2 can, in theory, be implemented in Java in Graal, and vice versa. Indeed, many of C2's optimizations have been ported to Graal, such as the recently contributed port of the String.compareTo intrinsic. Conversely, many of Graal's proven optimizations, such as its inlining algorithm and partial escape analysis (PEA), have not been ported to C2, presumably because of the development and maintenance difficulties of C++ (my guess).
Inlining means identifying the target method of a callsite at compile time, bringing its body into the compilation scope, and replacing the callsite with that body. The simplest and most intuitive example is the getter/setter methods common in Java: inlining can reduce a callsite that invokes a getter or setter to a single memory-access instruction. Inlining is nicknamed "the mother of optimization" because it enables further optimizations. In practice, however, compilers are limited by compilation-unit size and compilation time and cannot inline recursively without bound. Inlining algorithms and policies therefore largely determine a compiler's performance, especially when running Java 8's Stream API or Scala, whose bytecode contains large numbers of deeply nested small-method calls.
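As an illustration (the `Point` class and `sumViaGetter` method below are hypothetical, not from the source), inlining a trivial getter lets the compiler collapse each call into a direct field load:

```java
// Hypothetical example: before inlining, each call to getX() is a method
// invocation; after inlining, the JIT can reduce the callsite to a single
// field load (real inlining happens on the compiler IR, not source code).
public class InlineDemo {
    static final class Point {
        private final int x;
        Point(int x) { this.x = x; }
        int getX() { return x; }   // trivial getter: a prime inlining target
    }

    // Before inlining: one method call per iteration. After the JIT
    // inlines getX(), the loop body is effectively `sum += p.x`.
    static int sumViaGetter(Point p, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += p.getX();       // callsite the JIT will inline
        }
        return sum;
    }
}
```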
Graal has two inliner implementations. The community edition's inliner takes a depth-first-search approach, backing out to the caller whenever a callsite is judged not worth inlining during the analysis of a method. Graal lets you plug in custom policies that decide whether a callsite is worth inlining; by default it uses a relatively greedy policy based on the size of the callsite's target method. Graal Enterprise's inliner instead keeps a weighted ranking of all callsites, based on the target method's size and the optimizations its inlining might enable; when a target method is inlined, the callsites in its body are added to the weighted queue. Both search strategies suit applications with deeply nested small-method calls.
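A toy sketch of such a size-based greedy decision is shown below. This is my own simplification under stated assumptions (the size cap and budget are made-up numbers), not Graal's actual policy code:

```java
// Hypothetical sketch of a greedy, size-based inlining policy: a callsite
// is inlined if the target method's bytecode size fits both a per-method
// cap and the remaining budget; each inlining consumes budget, which
// bounds how deep the recursion can go.
public class GreedyInliner {
    static final int MAX_INLINE_SIZE = 35;   // per-method size cap (made-up value)

    static boolean shouldInline(int targetSize, int remainingBudget) {
        return targetSize <= MAX_INLINE_SIZE && targetSize <= remainingBudget;
    }

    // Walk a chain of nested callsites depth-first, inlining greedily and
    // stopping at the first callsite judged not worth inlining.
    static int inlineChain(int[] calleeSizes, int budget) {
        int inlined = 0;
        for (int size : calleeSizes) {
            if (!shouldInline(size, budget)) break;
            budget -= size;
            inlined++;
        }
        return inlined;
    }
}
```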
Escape analysis (EA) is a class of program analyses that determine the dynamic scope of an object. Its common applications in compilers fall into two categories: removing locks on objects accessed by only a single thread, and converting heap allocations into stack allocations for objects accessed by only a single method (again, note the importance of inlining). The latter is usually accompanied by scalar replacement, which replaces accesses to the object's fields with accesses to virtual local operands, turning the stack allocation into a purely virtual one. This not only saves the memory otherwise needed for the object header, but with the help of the register allocator it can keep (some) object fields in registers, improving execution efficiency while saving memory (converting memory accesses into register accesses).
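Conceptually (the `Vec` class below is hypothetical, not from the source), scalar replacement turns a non-escaping allocation into plain local variables:

```java
// Hypothetical illustration of scalar replacement: the Vec allocated in
// lengthSquared() never escapes the method, so the JIT can replace its
// fields with local (virtual) operands. lengthSquaredScalarized() shows
// the source-level equivalent of what the compiler effectively produces.
public class ScalarDemo {
    static final class Vec {
        final int x, y;
        Vec(int x, int y) { this.x = x; this.y = y; }
    }

    static int lengthSquared(int x, int y) {
        Vec v = new Vec(x, y);      // non-escaping allocation
        return v.x * v.x + v.y * v.y;
    }

    // After EA + scalar replacement: no object header, no heap traffic;
    // the fields live in locals (and ultimately in registers).
    static int lengthSquaredScalarized(int x, int y) {
        int vx = x, vy = y;
        return vx * vx + vy * vy;
    }
}
```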
The for-each loop, ubiquitous in Java, is a prime target for EA. A for-each loop calls the iterable's iterator method, which returns an object implementing the Iterator interface, and iterates using its hasNext and next methods. Container classes in Java Collections, such as ArrayList, typically construct a fresh Iterator instance whose lifetime is confined to the for-each loop. If the Iterator instance's constructor and its hasNext and next calls (along with the calls in their bodies that take this as the receiver, such as checkForComodification()) are all inlined, EA can conclude that the instance does not escape and can then apply stack allocation and scalar replacement.
Ideally, `foo.bar` would be optimized to look like this:
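The concrete body of `foo.bar` is not shown in the source, so the following is a hypothetical reconstruction: assume `bar` sums an `ArrayList` with a for-each loop, and `barOptimized` is the source-level equivalent of the EA-optimized result:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction: bar() uses a for-each loop, which desugars
// to iterator()/hasNext()/next(); once those calls are inlined and the
// Iterator is proven non-escaping, EA can virtualize it, leaving code
// equivalent to barOptimized().
public class Foo {
    static int bar(List<Integer> list) {
        int sum = 0;
        for (int v : list) {        // allocates an Iterator per call...
            sum += v;               // ...unless EA virtualizes it
        }
        return sum;
    }

    // Source-level equivalent of the EA-optimized code: no Iterator
    // allocation, the cursor lives in a local variable.
    static int barOptimized(ArrayList<Integer> list) {
        int sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }
}
```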
HotSpot's C2 already implements scalar replacement with control-flow-insensitive EA. Graal's PEA adds control-flow information on top of this: it virtualizes all heap allocations and materializes an object only on the branches where it is determined to escape. Compared with C2's EA, PEA is more expensive to run, but it achieves scalar replacement on the branches where objects do not escape. In the following example, if the then-branch executes with 1% probability, the PEA-optimized code performs no heap allocation 99% of the time, whereas C2's EA allocates 100% of the time. Another typical example is the rendering engine Sunflow: running the default workload of the DaCapo benchmark suite, Graal's PEA determines that roughly 27% of the heap allocations (700 MB in total) can be virtualized, a figure far exceeding what C2's EA achieves.
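A hypothetical sketch of the kind of code PEA handles better than flow-insensitive EA (the `Box` class and the rare branch are illustrative, not from the source):

```java
// Hypothetical PEA illustration: the Box escapes only on the rare branch.
// C2's flow-insensitive EA sees one escaping use and therefore allocates
// on every call; Graal's PEA keeps the allocation virtual and materializes
// the Box only inside the rarely taken branch.
public class PeaDemo {
    static final class Box {
        final int value;
        Box(int value) { this.value = value; }
    }

    static Box escaped; // static sink that makes the object escape

    static int compute(int v, boolean rare) {
        Box b = new Box(v);     // virtualized by PEA
        if (rare) {             // taken ~1% of the time in the article's example
            escaped = b;        // PEA materializes the Box only here
        }
        return b.value;         // common path: just the local 'v', no allocation
    }
}
```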
Using Graal
In Java 10 (Linux/x64, macOS/x64), HotSpot still uses C2 by default, but C2 can be replaced with Graal by adding `-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler` to the java command line.
Graal Enterprise is included in Oracle Labs GraalVM, a JDK build released directly by Oracle Labs and based on Java 8. If you are interested in the source code, check out the GitHub repo of the Graal community edition. The source is built with the mx tool and labsJDK (note: download labsJDK from the bottom of that page; building against GraalVM directly may cause problems).
Run `mx eclipseinit`, `mx intellijinit`, or `mx netbeansinit` in the graal/compiler directory to generate Eclipse, IntelliJ, or NetBeans project configuration files, respectively.
AD time
I can't help it: many friends have been asking me to run ads for them for free, and if I didn't take this opportunity they would probably give me trouble. Take whatever is useful to you, and apologies for the interruption.
- We recently released XXFox, a JVM parameter analysis product that can generate JVM parameters with one click and check your existing JVM parameters for problems. Try it at http://xxfox.perfma.com, or click [Read the original] below.
- The author of this article, Yu Di, will speak at QCon Beijing 2018 on "GraalVM and its Ecosystem". If you are interested, keep an eye on QCon Beijing 2018.
- Our team is based in Hangzhou, building performance-analysis products around the JVM. If you are interested, come join us.
- My friend's company, CODING (https://coding.net/), based in Shenzhen, is hiring many Java and full-stack developers; the requirements boil down to three keywords: Spring, Linux, Docker. They pay well and build cool products: Git code hosting, collaboration tools for technical teams, a WebIDE, and more. For details, email [email protected] or add panpan071 on WeChat.
- Several teams at Alibaba, including Ant, also need Java engineers; there are too many to list here. If you are interested, add me on WeChat [han_quanzi], note where you would like to apply, and I will forward your resume.