Abstract: Why does C++ compile so much more slowly than Java, and how do the two compare in execution speed? You can understand these questions by looking at Java's early (front-end) and late (runtime) compilation processes.

Do you really know Java compiler optimization? Here are 15 questions to test yourself.

First of all, why does C++ compile so much more slowly than Java, and how do the two compare in execution speed? You can understand these questions by looking at Java's early (front-end) and late (runtime) compilation processes.

Here are 15 questions to check whether you really understand these topics. If any of them stump you, take a look at the "JVM compiler optimization study notes" section at the end of this article.

Early compilation process

Q: What are the three steps of Java's early (front-end) compilation process? A:

1. Lexical and syntax analysis (parsing) and filling the symbol table

2. Annotation processing

3. Semantic analysis and bytecode generation.

Q: What is the symbol table for in the steps above? A: The symbol table is a table of symbol addresses and symbol information.

  • It is used in later stages for semantic checks, where information is retrieved from the table for comparison.

  • The symbol table is also the basis for address assignment when object code is generated.

Q: What does the annotation processor do? A: The annotation processor scans the abstract syntax tree for annotated elements and updates the syntax tree. The key point is that it works directly on the syntax tree: after an update, compilation goes back to the parsing and symbol-table-filling step and reprocesses the tree.

Q: In which of the above three steps is syntactic sugar resolved (desugared)? A: The third step: desugaring is handled as part of the semantic analysis and bytecode generation stage.

Q: What is syntactic sugar? What are some examples? A:

  • Syntax that the virtual machine itself does not support; the compiler converts it into normal syntactic structures at compile time.

  • Examples include auto-boxing/unboxing and the automatic casts inserted for generics, as shown in the sketch below.
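A minimal sketch of what the auto-boxing/unboxing sugar expands to (the exact calls are chosen by javac; the valueOf/intValue form shown here is the conventional expansion):

public class DesugarDemo {
    public static void main(String[] args) {
        // Source as written, with syntactic sugar:
        Integer boxed = 10;       // auto-boxing
        int plain = boxed + 1;    // auto-unboxing

        // Roughly what javac emits after desugaring:
        Integer boxed2 = Integer.valueOf(10);
        int plain2 = boxed2.intValue() + 1;

        System.out.println(plain + " " + plain2);   // 11 11
    }
}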

Q: Is there a difference between final and non-final local variables in the generated bytecode (class file)? A: No difference. Local variables do not hold symbolic references in the constant pool, so there is no access_flags information for them. Therefore final on a local variable has no effect at run time and is checked only at compile time.

Q: At what stage is a = 1 + 2 optimized? A: Constant folding is done during the semantic analysis stage of front-end compilation, turning it into a = 3. Similarly, string concatenation with + is rewritten into StringBuilder.append() calls during front-end compilation. A sketch of both rewrites follows.
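A small illustration of the two rewrites. Note that the StringBuilder form shown below is how pre-JDK 9 javac rewrites string concatenation; newer JDKs use invokedynamic instead, but the idea is the same:

public class FoldDemo {
    public static void main(String[] args) {
        int a = 1 + 2;   // javac folds this at compile time into: int a = 3;

        // "value=" + a is rewritten by (pre-JDK 9) javac roughly into the
        // explicit StringBuilder chain shown on the next line:
        String s1 = "value=" + a;
        String s2 = new StringBuilder().append("value=").append(a).toString();

        System.out.println(a + " " + s1 + " " + s2);   // 3 value=3 value=3
    }
}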

Q: Where is the initialization order of an object's members and constructors determined?

A: In the bytecode.

  • During bytecode generation, the compiler collects the instance initialization code into the object's <init> method, which fixes the order in which member initializers and the constructor body run.

  • The initialization order of the class's static members is similarly encapsulated in the <clinit> method. The example below shows the resulting order.
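A minimal sketch showing the order that ends up encoded in <clinit> (static parts) and <init> (instance parts); the class and field names here are made up for illustration:

public class InitOrderDemo {
    static int staticField = log("static field initializer", 1);   // goes into <clinit>
    static { System.out.println("static block"); }                  // also <clinit>

    int instanceField = log("instance field initializer", 2);       // goes into <init>
    { System.out.println("instance block"); }                       // also <init>

    InitOrderDemo() {
        System.out.println("constructor body");                     // runs last inside <init>
    }

    static int log(String msg, int value) {
        System.out.println(msg);
        return value;
    }

    public static void main(String[] args) {
        new InitOrderDemo();
        // Printed order: static field initializer, static block,
        //                instance field initializer, instance block, constructor body
    }
}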

Late compiler optimization

Q: What is the difference between early and late compilation optimizations? A:

  • Early (front-end) compiler optimization happens while Java source files are converted into bytecode; some simple optimizations and desugaring are done during this conversion.

  • Late (back-end) compiler optimization refers to dynamic optimization performed while bytecode is being turned into machine code at run time, combining runtime information; many classic machine-code optimizations are applied here.

Q: When a Java program runs, is it directly converted to optimized machine code before running? A: No.

  • When the program first starts, it uses the interpreter immediately, executing the bytecode directly without much optimization.

  • After the program has been running for a while, the compiler comes into play, gradually compiling frequently executed (hot) code into machine code.

Note the difference between the compiler here and the one mentioned earlier: one compiles to bytecode, the other compiles to machine code.

Q: There are two types of late-stage (JIT) compilers:

  • Client Compiler — C1 Compiler

  • Server Compiler — C2 Compiler

What’s the difference between them?

A:

  • Difference in speed versus quality:

C1 compiler: higher compilation speed, average compilation quality.

C2 compiler: better compilation quality, but slower compilation.

  • Difference in optimization characteristics:

The C1 compiler performs optimizations that do not require runtime information.

The C2 compiler performs aggressive, dynamic optimizations based on the monitoring information provided by the interpreter.

Q: How do you select between the interpreter, C1, and C2 in Java? A: Through these JVM parameters:

  • -Xint parameter: force interpreted mode

  • -Xcomp parameter: force compiled mode (but fall back to interpretation if compilation cannot proceed)

  • When selecting the compiler, you can choose between -client, -server, and mixed mode

In mixed mode, JDK 7 introduces a tiered compilation strategy. Tier 0: interpreted execution, with performance monitoring disabled. Tier 1: C1 compilation, which compiles bytecode to native code, applies some simple optimizations, and adds performance monitoring. Tier 2: C2 compilation, which takes longer but applies aggressive optimizations based on the performance monitoring information.

Q: Under tiered compilation, how does the JVM know which running code needs JIT or OSR compilation? A:

1. Methods that are invoked many times trigger JIT compilation (tracked by the method invocation counter).

2. Loop bodies that are executed many times trigger OSR compilation (on-stack replacement), which happens while the method is still executing, so the compiled method is swapped in on the stack (tracked by the back edge counter). A sketch of such a loop follows.
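A sketch of the kind of loop whose back edge counter can trigger OSR; whether it actually compiles, and at what iteration, depends on the JVM and its thresholds:

public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        // A single, very long-running loop inside one method invocation:
        // the method counter only sees one call, but the back edge counter
        // keeps climbing, so the JVM can OSR-compile the loop and swap the
        // interpreted frame for compiled code while the loop is still running.
        for (int i = 0; i < 50_000_000; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}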

Q: Which methods can be inlined during early optimization, and which only during late optimization? A:

  • Methods that cannot be overridden through inheritance, such as private methods, constructors, and static methods, can be inlined directly during early optimization.

  • Other methods, which may be overridden in subclasses, cannot be inlined early because the compiler does not know which implementation will actually be used.

  • During late optimization, runtime information can show whether only one subclass implementation is ever executed; if so, the JIT tries to inline it, and if another subclass later turns up, it backs the inlining out (deoptimizes).

Q: Java array accesses are normally bounds-checked automatically, and an exception is thrown if the index is out of range. When is this automatic check optimized away? A: At run time, the check is optimized away if the JIT can prove that the index used to access the array can never go out of bounds, as in the sketch below.
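A sketch of a loop where the JIT can prove the index stays in range and may therefore drop the per-access bounds check (whether it actually does is up to the JIT):

public class BoundsCheckDemo {
    // i runs from 0 to data.length - 1, so data[i] can never be out of
    // bounds; the JIT may eliminate the implicit range check on each access.
    static long sum(int[] data) {
        long total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1024];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sum(data));
    }
}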

Why does C++ compile more slowly than Java, yet usually run faster?

1. Java's just-in-time compilation happens while the program is running, so a noticeable compilation delay can affect the user experience.

2. There are more virtual methods in Java than in C++, so inlining analysis requires more checks and more work.

3. Java always has to perform safety checks (such as array bounds and null checks); C++ does not, and an out-of-bounds access simply crashes or corrupts memory.

4. In C++, memory release is controlled by the programmer, so there is no garbage collector constantly checking and working in the background.

5. Java's advantage: just-in-time compilation can optimize based on runtime performance monitoring, which C++ cannot do.

JVM compiler optimization study notes

Early stage (front-end compilation)

The compilation process can be broadly divided into three stages:

1. Parse and populate the symbol table

2. Annotation processing

3. Semantic analysis and bytecode generation

Key points:

  • Lexical and syntax analysis come first, generating the symbol table

  • Annotation processing is the second step

  • Desugaring and bytecode generation both belong to step three.

Detailed explanation of the above steps:

The first step:

——- lexical analysis:

That is, converting code into tokens. For example, int a = b + 2 is broken into the token sequence: int, a, =, b, +, 2.

——- Syntax analysis (note that at this point only the syntax tree is built; no semantic verification has been done yet):

Based on the generated tokens, an abstract syntax tree is constructed.

——- Fill symbol table:

Generates a table of symbol addresses and symbol information. (The table is used in the third stage for annotation checks during semantic analysis, such as whether names are used consistently with their declarations, and for generating intermediate code.) The symbol table is also the basis for address assignment during object code generation.

The second step:

——- Annotation processor:

The annotation processor scans the abstract syntax tree for annotated elements and updates the syntax tree. After an update, compilation goes back to the parsing and symbol-table-filling step and reprocesses. Annotation processors are plug-ins to which we can keep adding our own, as in the sketch below.
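A minimal sketch of such a plug-in using the standard javax.annotation.processing API; the annotation name com.example.Traced is hypothetical. Processors like this are registered via a META-INF/services/javax.annotation.processing.Processor file (or the javac -processor option):

import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.annotation.processing.SupportedSourceVersion;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

// Reports every element carrying the (hypothetical) @com.example.Traced annotation.
@SupportedAnnotationTypes("com.example.Traced")
@SupportedSourceVersion(SourceVersion.RELEASE_8)
public class TracedProcessor extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement annotation : annotations) {
            for (Element e : roundEnv.getElementsAnnotatedWith(annotation)) {
                processingEnv.getMessager().printMessage(
                        Diagnostic.Kind.NOTE, "found @Traced on " + e.getSimpleName());
            }
        }
        // Return false so other processors can still see these annotations.
        return false;
    }
}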

Note that the two steps above merely transform the source file and do not perform any semantic checks.

Step 3:

——- Semantic analysis:

Determines whether the syntax tree is semantically correct. There are two kinds of checks:

1. Annotation check: checks whether variables have been declared before use, whether assignments are valid, and whether the data types on both sides of an expression match. Constant folding is carried out during the annotation check.

2. Data-flow and control-flow analysis further verifies the program's context-dependent logic, such as whether a method with a return value returns on every path, whether checked exceptions are handled or declared, and whether local variables are assigned before being used.

There is no difference between final local variables (or final parameters) and non-final local variables in the generated class file. Local variables do not hold symbolic references in the constant pool, so there is no access_flags information for them. The class file therefore cannot tell whether a local variable was final, so final on local variables has no effect at run time and is checked only at compile time.

——- Desugaring:

The virtual machine itself does not support syntactic sugar; the compiler converts it into normal syntactic structures at compile time (in other words, it rewrites sugared code into plain code: auto-unboxing, for example, becomes an explicit call to the wrapper's method). The sketch below shows the cast that generics gain in the same way.
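A minimal sketch of the cast that desugaring inserts for generics after type erasure; the list contents are arbitrary:

import java.util.ArrayList;
import java.util.List;

public class GenericsDesugarDemo {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("jvm");

        String first = names.get(0);
        // After type erasure, names.get(0) returns Object, so the line above is
        // compiled roughly as: String first = (String) names.get(0);
        // i.e. javac inserts the checkcast that the source never wrote.
        System.out.println(first);
    }
}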

——- bytecode generation:

The initialization order of an object actually converges into the <init> method during the bytecode generation phase. Note that the default constructor is added during the symbol-table-filling phase. String concatenation (rewriting + into StringBuilder operations) is also produced at the bytecode stage. After the syntax tree has been walked, the final symbol table is handed to the ClassWriter class, which generates the bytecode and writes out the class file.

Late stage (runtime optimization)

In HotSpot, the interpreter and the compiler coexist. When the program first starts, it uses the interpreter immediately. After the program has been running for a while, the compiler comes into play, gradually compiling more and more of the code that is actually executed. When memory resources are tight, the interpreter can be used to run the program, reducing the memory occupied by compiled code. If a compiler optimization turns out to be wrong, execution can deoptimize and fall back to the original interpreted mode.

The interpreter (Interpreter)

The compiler

There are two kinds of compilers

  • Client Compiler – C1 Compiler, higher compilation speed

  • Server Compiler – C2 Compiler, better compile quality

That is, when -client or -server is selected.

  • The default is mixed mode (MixedMode): interpreter and compiler coexist.

Parameters for these two compilers:

  • -Xint parameter: force interpreted mode

  • -Xcomp parameter: force compiled mode (but fall back to interpretation if compilation cannot proceed)

In mixed mode, the interpreter needs to collect performance information for the compiler's decisions and optimizations. Collecting this information has a cost, so JDK 7 introduces a tiered compilation strategy:

  • Tier 0: interpreted execution. Performance monitoring is not enabled.

  • Tier 1: C1 compilation, which compiles bytecode to native code, applies some simple optimizations, and adds performance monitoring.

  • Tier 2: C2 compilation, which takes longer but applies aggressive optimizations based on the performance monitoring information.

The differences between the Client Compiler (C1) and Server Compiler (C2) pipelines are as follows:

  • Client Compiler Compilation process:

Front end: bytecode → method inlining / constant propagation (basic optimizations) → HIR (high-level intermediate representation) → null check elimination / range check elimination.

Back end: HIR → LIR (low-level intermediate representation) → register allocation with the linear scan algorithm → peephole optimization → machine code (native code) generation.

These are all optimizations that can be performed without runtime information.

  • Server Compiler compilation process:

Performs all the classic optimization actions.

Makes aggressive optimizations based on the monitoring information provided by C1 or the interpreter.

Its register allocator is a global graph-coloring allocator.

Some common late-stage optimization measures (i.e., optimizations performed only at run time):

—- Hotspot code

1. Methods that are invoked many times trigger JIT compilation.

2. Loop bodies that are executed many times trigger OSR compilation (on-stack replacement), which happens while the method is executing, so the compiled method is swapped in on the stack.

HotSpot uses counter-based hot spot detection to identify hot code.

* A method invocation counter is maintained for each method; if it exceeds a threshold within one period, JIT compilation is triggered, and the method's entry point is replaced once compilation finishes.

* If the threshold is not exceeded within the period, the counter is halved (heat decay).

* If compilation is never triggered, the method simply keeps being interpreted according to its bytecode.

Counter-related parameters: -XX:-UseCounterDecay disables heat decay; -XX:CounterHalfLifeTime sets the half-life period; -XX:CompileThreshold sets the method compilation threshold.

The back edge counter counts how many times loop back edges are taken.

* It has no heat decay, but it is lowered when OSR compilation is triggered, to avoid firing repeatedly at run time.

* When it overflows, the method invocation counter is also adjusted to an overflowed state.

* The OSR threshold is calculated differently in client mode and server mode: client = CompileThreshold × OSR ratio; server = CompileThreshold × (OSR ratio − interpreter monitoring ratio).

—- Redundant access elimination:

Suppose the code originally reads:

y = b.value
z = b.value
c = z + y

Since b.value has already been loaded into y and has not changed in between, the second load is redundant and can be replaced (redundant access elimination):

y = b.value
z = y
c = z + y

Copy propagation then replaces z with y:

y = b.value
y = y
c = y + y

Useless (dead) code elimination: the y = y line above is removed.

—- Common subexpression elimination

It simplifies longer expressions by reusing subexpressions that have already been computed; for example, a + (a + b) * 2 can be optimized into a * 3 + b * 2 wherever possible.

—- Array bounds check elimination:

If it can be determined that a for loop never indexes outside the range of the array, then the [] accesses inside it do not perform the bounds check.

— Implicit exception handling:

An explicit check such as if (a != null) { use a } else { throw new NullPointerException(); } is turned into an implicit form, roughly try { use a directly } catch (implicit fault) { throw the NullPointerException }: the VM accesses a without checking, and if a really is null, the resulting hardware fault is caught by the VM and converted into the exception, so the check costs nothing in the common case.

—- method inlining:

Methods that cannot be overridden through inheritance, such as private methods, constructors, and static methods, can be inlined directly during early optimization.

Other methods, which may be overridden in subclasses, cannot be inlined by the front-end compiler because it does not know which implementation will actually run.

  • Final methods are not counted among the non-virtual methods above (why?)

  • Class hierarchy analysis (CHA): if the call is to a virtual method, CHA checks whether the method has more than one implementation in the current VM. If only one implementation is found, the call can be inlined directly.

  • If, after other classes are dynamically loaded, more than one implementation of the method is actually used, the compiled inline code is discarded and execution falls back to the interpreted state.

  • Inline caching: even if the VM finds multiple implementations of the method, it still inlines the first implementation actually used, and only gives that up when a different override is actually called (that is, even if several overrides are defined, they may never be used, so the first one observed keeps being used until another override really runs). The sketch below shows a call site that CHA can devirtualize.
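A sketch of a call site this applies to; the Shape/Circle hierarchy is made up for illustration. While Circle is the only loaded subclass, CHA lets the JIT treat s.area() as a monomorphic call and inline it; loading and actually using a second subclass would force deoptimization or an inline-cache check:

abstract class Shape {
    abstract double area();
}

class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    @Override double area() { return Math.PI * r * r; }
}

public class InlineDemo {
    // s.area() is a virtual call, but as long as only one implementation
    // exists at run time it can be devirtualized and inlined here.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[10_000];
        for (int i = 0; i < shapes.length; i++) shapes[i] = new Circle(i);
        System.out.println(totalArea(shapes));
    }
}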

—- Escape analysis:

If a newly created object is only used within a method and is never referenced outside it, optimizations become possible, such as:

* Stack allocation: instead of placing the new object on the heap, place it on the method's stack, so it disappears when the method returns.

* Scalar replacement: break the object apart into its primitive-type members and use them as local variables.

The sketch below shows an allocation that does not escape.
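A sketch of an allocation that never escapes the method; the Point class is made up for illustration. Whether the JVM actually stack-allocates or scalar-replaces it depends on escape analysis being enabled (e.g. -XX:+DoEscapeAnalysis, on by default in modern HotSpot):

public class EscapeDemo {
    // A small value holder used only inside distSquared().
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point is never returned, stored in a field, or passed elsewhere,
    // so it does not escape; escape analysis may keep it off the heap
    // (stack allocation) or dissolve it into the scalars x and y.
    static long distSquared(int x, int y) {
        Point p = new Point(x, y);
        return (long) p.x * p.x + (long) p.y * p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += distSquared(i, i + 1);
        }
        System.out.println(sum);
    }
}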

—- Java vs. C++, just-in-time vs. statically compiled:

1. Just-in-time compilation may affect the user experience if there is a significant delay at run time

2. There are more virtual methods in Java than in C++, so inlining analysis requires more checks and more work.

3. Java always has to perform safety checks (such as array bounds and null checks), which C++ does not.

4. In C++, memory release is controlled by the programmer, with no need for a garbage collector constantly checking and operating in the background.

5. Java's advantage: just-in-time compilation can be optimized using runtime performance monitoring, which C++ cannot do.
