Introduction
The execution engine subsystem is one of the most important parts of the JVM. At the beginning of this JVM series, we mentioned that the JVM is a platform built on top of a platform: a virtual machine is a concept analogous to a "physical machine", with the same capability to execute code. The biggest difference between them is this: a physical machine's execution engine is built directly on the processor's instruction set, its caches, the hardware platform, and the operating system, so it can draw on all of those resources to execute code directly; a virtual machine's execution engine, by contrast, is implemented at the software level, which lets it define its own instruction set and its own execution engine that interprets or compiles that instruction set, independent of the physical platform. It is precisely this design of the JVM that allows Java to deliver "compile once, run everywhere" regardless of the underlying hardware.
Knowledge of the execution engine is very helpful for understanding the JVM, yet existing articles and books on the JVM give this area little or no coverage. This article aims to provide a comprehensive overview of the JVM's execution engine subsystem.
1. The relationship between machine code, instruction sets, assembly language, and high-level languages
Before analyzing the JVM's execution engine, we must first understand the relationship between machine code, instruction sets, assembly language, and high-level languages. Only with this background can we properly understand how the JVM's execution engine works.
1.1 Machine code
Machine code, also known as machine instruction code, refers to instructions expressed as binary encodings (011101, 11110, and so on). In the earliest days, programmers wrote programs this way. Code written in machine code can be read directly by the CPU, and because it is closest to the hardware, it is also the fastest to execute. But these instructions are tightly coupled to the CPU: different types of CPU correspond to different machine instructions. Machine instructions are also made up entirely of binary digits, which makes them complicated, hard to understand, hard to remember, and error-prone for people. Eventually, instructions replaced this encoding method.
1.2 Instructions and instruction sets
Because machine code is composed of 0/1 instruction codes with very poor readability, instructions were gradually introduced to replace raw machine-code encodings. An instruction abbreviates a specific sequence of 0s and 1s in machine code into a corresponding mnemonic operation, such as INC, DEC, or MOV; in terms of readability, this is much better than binary sequences. However, because different hardware platforms have different architectures, the machine code corresponding to an instruction often differs between them: executing the same instruction (such as INC) produces different machine code on different hardware platforms.
At the same time, because the instructions supported by different hardware platforms differ slightly, the set of instructions a platform supports is called its instruction set. For example, the x86 instruction set corresponds to x86-architecture platforms, and the ARM instruction set corresponds to ARM-architecture platforms.
1.3 Assembly language
Although instructions and instruction sets replaced the raw 0/1 sequences of machine code, the readability of bare instructions was still relatively poor, so assembly language was invented. In assembly language, mnemonics replace the opcodes of machine instructions, and address symbols and labels replace the addresses of instructions or operands. On different platforms, assembly code corresponds to different instruction sets; and because the computer only recognizes machine code, a program written in assembly language must go through an assembly stage to become machine instruction code the computer can recognize before it can be executed.
1.4 High-level languages
To make it easier for developers to write programs, various high-level languages emerged, such as Java, Python, Go, and Rust. Compared with machine code, raw instructions, and assembly, high-level languages are more readable and easier to write. But a program written in a high-level language must go through interpretation or compilation: it is first translated into assembly-level instructions, then converted through the assembly process into machine instruction code the computer can recognize and execute.
OK, that is a brief description of machine code, instruction sets, assembly language, and high-level languages. From it we know that Java is a high-level language: to execute, its code must be compiled into assembly-level instructions and then converted into machine instructions the computer can recognize. Yet when we use Java, we never seem to see this process. Why is that?
This is because Java has a virtual platform, the JVM, whose work begins by loading the bytecode files produced by javac compilation. Bytecode cannot run directly on the operating system, because bytecode instructions are not equivalent to local machine instructions: a class file contains only bytecode instructions, symbol tables, and other auxiliary information that the JVM can recognize but the OS cannot. So what ultimately makes a Java program run on an operating system? The answer: the JVM's execution engine subsystem.
2. The JVM execution engine and source-code compilation
Java’s execution engine subsystem is primarily responsible for interpreting/compiling bytecode instructions into local machine instructions on the corresponding platform. In simple terms, the JVM execution engine acts as a “translator” between the Java virtual machine and the operating system platform.
At present, the main execution technologies include interpreted execution, static compilation, just-in-time compilation, adaptive optimization, and direct execution on the chip, defined as follows:
- Interpreted execution: each time a piece of code is used, the program converts only that code into machine code for the computer to execute.
- Static compilation: before the program starts, all of the code is compiled into the machine code of the corresponding hardware/platform.
- Just-in-time compilation: at run time the application dynamically detects frequently executed code using related techniques (such as hotspot detection in the HotSpot VM), then converts that frequently executed code into machine code and caches it; the next time that code runs, the machine code is executed directly.
- Adaptive optimization: start by interpreting all code while monitoring it, then start a background thread for frequently invoked methods, compile them into native code, and optimize them carefully. If a method is no longer used frequently, the compiled code is discarded and interpreted execution resumes.
- Direct execution on a chip: writing machine code directly, so that the code can be read and executed by the CPU without translation.
These are the main existing execution techniques. Interpreted execution belongs to first-generation JVMs, just-in-time (JIT) compilation to second-generation JVMs, and adaptive optimization (currently used by Sun's HotSpot) combines the two. Static compilation is implemented in both BEA's JRockit virtual machine and JDK 9's AOT compiler. The benefit of static compilation is that it performs best; the downsides are that it takes a long time to start up and breaks Java's "compile once, run anywhere" rule.
In fact, when Java was first born, in JDK 1.0, Java was positioned as an interpreted language: after a Java program was written, the source code was first compiled into bytecode by javac, and the generated bytecode was then interpreted and executed line by line. But this made program execution slow and startup unimpressive, because two steps were needed: first compiling the .java files, and then, since the bytecode instructions produced by compilation still cannot be recognized by the computer, interpreting them again at execution time into machine-code instructions the computer can recognize, so that the code could finally run. From this analysis, the disadvantages of JDK 1.0's interpreted execution are obvious: Java's overall performance was greatly reduced in order to achieve the "compile once, run anywhere" principle. Why? Because it adds an extra step compared to other languages: generally speaking, for a Java program to run, it must go through both compilation and interpretation before it actually executes. Now let's look at how other languages work.
Pure compiled languages: the source code is compiled in its entirety into the machine-code instructions of the target platform before the program runs.
Features: best performance, long startup time, poor portability; different platforms require rebuilt packages.
Pure interpreted languages: while the program runs, whenever a piece of code needs to be executed, it is interpreted into the machine-code instructions of the current platform and then executed by the computer.
Features: fast startup, poorer performance, good portability.
OK, after this brief look at interpreted- and compiled-language characteristics, going back to Java 1.0 we find that Java sat awkwardly in the middle because of the virtual machine: a Java program had to compile source code and then interpret it at run time, resulting in mediocre execution performance and mediocre startup speed.
Later, Java addressed this problem with the introduction of a back-end compiler in JDK 1.2, the JIT just-in-time compiler (described below), which dynamically generates native machine code during execution. Modern high-performance JVMs use an interpreter and a JIT compiler working together, which is why Java is also known as a "half-interpreted, half-compiled language."
This article analyzes the execution engine of the JVM based on the current HotSpot virtual machine. HotSpot's execution engine also uses the interpreter-plus-JIT-compiler model, and the virtual machine's execution mode is adaptive optimization.
2.1 How the execution engine works
As for the execution engine, the Java Virtual Machine Specification requires all vendors to implement the same input and output: the input to the execution engine must be a binary stream of bytecode, and the output must be the result of executing the program. Exactly which operation the execution engine performs next depends entirely on the PC register (program counter): every time the execution engine processes an instruction, the program counter is updated with the address of the next instruction to execute.
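The fetch-execute cycle driven by a program counter can be sketched in a few lines of Java. This is a toy interpreter with invented opcodes (PUSH, ADD, and RET are made up for illustration; they are not real JVM bytecodes), showing only how the pc always points at the next instruction to execute:

```java
// Minimal sketch of an interpreter dispatch loop: fetch the opcode at the
// "program counter" pc, execute it, advance pc. Opcodes are invented here.
public class TinyInterpreter {
    static final int PUSH = 0, ADD = 1, RET = 2;

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {                       // fetch opcode, advance pc
                case PUSH -> stack[sp++] = code[pc++];  // operand follows the opcode
                case ADD  -> { int b = stack[--sp]; stack[sp - 1] += b; }
                case RET  -> { return stack[--sp]; }
            }
        }
    }

    public static void main(String[] args) {
        // PUSH 3, PUSH 2, ADD, RET
        System.out.println(run(new int[]{ PUSH, 3, PUSH, 2, ADD, RET })); // prints 5
    }
}
```

The real JVM interpreter works on the same fetch-dispatch principle, only with the full bytecode instruction set and a per-thread PC register.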
While executing a Java method, the execution engine may directly access Java object instance data stored in the heap via reference information on the operand stack in the stack frame. It may also locate an object's type information through the metadata pointer (the Klass word) recorded in the instance's object header; that is, metadata pointers are used to access data in the metadata space (method area). See the diagram below:
2.1.1 Java source code compilation process
As mentioned earlier, the JVM only recognizes bytecode files, so Java source code written in .java files must be compiled by a source-code compiler (a front-end compiler) such as javac to generate .class files, which are then loaded into memory by the JVM. The source-code compilation process is as follows:
Compilation is the process of converting one language into another. Compilers usually convert a human-friendly language (a programming language) into a machine-friendly one (machine code made up of binary sequences). C/C++ and assembly languages, for example, compile source code directly into target machine code.
As the source-code compiler for the Java language, javac compiles not for a hardware platform but for the JVM. Its job is to convert Java source code into bytecode the JVM can recognize, that is, a .java file into a .class file. The task of smoothing out platform differences is left to the JVM, whose execution engine translates bytecode instructions into machine-code instructions recognizable by the platform the program is currently running on.
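We can watch this front end at work from inside a Java program via the standard compiler API (this requires running on a JDK, since it uses the bundled javac; the file names here are made up for the demo). The sketch compiles a tiny source file and checks that the output begins with the 0xCAFEBABE magic number that marks every class file:

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: invoke the javac front end programmatically and verify that it
// produced JVM bytecode (a .class file starting with 0xCAFEBABE).
public class FrontEndCompilerDemo {
    public static boolean compileAndCheck() {
        try {
            Path dir = Files.createTempDirectory("javac-demo");
            Path src = dir.resolve("Hello.java");
            Files.writeString(src, "public class Hello { int one() { return 1; } }");

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a JRE
            if (javac == null) return false;
            int exit = javac.run(null, null, null, src.toString());   // same front end as `javac`

            byte[] bytes = Files.readAllBytes(dir.resolve("Hello.class"));
            int magic = ((bytes[0] & 0xFF) << 24) | ((bytes[1] & 0xFF) << 16)
                      | ((bytes[2] & 0xFF) << 8)  |  (bytes[3] & 0xFF);
            return exit == 0 && magic == 0xCAFEBABE;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("bytecode produced: " + compileAndCheck());
    }
}
```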
The javac compilation process works as follows:
- ① Lexical analysis: first read the source code as a stream of bytes, then, according to the grammar rules of the source language, find the language keywords defined in the source code, such as if, else, while, and for; judge whether the use of these keywords is legal; and for legal tokens generate a symbol sequence for syntax analysis, while also creating a symbol table in which all identifiers are recorded. This process is known as lexical analysis. - The symbol table records the identifiers used in the source code and collects the attribute information of each identifier. - Result of lexical analysis: legal tokens are found in the source code and a Token stream is generated.
- ② Syntax analysis: the Token stream obtained from lexical analysis is checked, according to the syntax rules of the source language, to see whether these combinations of tokens conform to the Java language specification; for example, that if is followed by a Boolean expression, that else appears after if, and so on. For sequences that conform to the specification, the tokens generated in the previous step are organized into a syntax tree. - Result of syntax analysis: an abstract syntax tree conforming to the Java language specification is formed. An abstract syntax tree is a structured syntactic representation that organizes the main lexical elements of the language into a structured form, which can later be reorganized according to new rules.
- ③ Semantic analysis: after parsing confirms there are no syntax errors, semantic analysis is performed. It has two main tasks: first, checking the syntax tree produced in the previous step, including type checking, control-flow checking, and uniqueness checking; second, converting some complicated syntax into simpler syntax, rather like translating classical idioms into plain language. For example, foreach is rewritten into a for loop, loop flags are replaced by break, and so on. - Result of semantic analysis: the syntax is simplified, generating a syntax tree closer to the syntax rules of the target language.
- ④ Bytecode generation: the simplified syntax tree is converted into the Class file format; that is, bytecode is generated at this stage based on the simplified syntax tree. - Result of bytecode generation: bytecode data conforming to the virtual machine specification is produced.
Class bytecode files are then loaded into memory by the virtual machine’s class loading mechanism at startup. When a method is called during the program’s execution, the corresponding bytecode instructions are handed over to the execution engine.
In general, the execution of Java code is divided into three main stages: source-code compilation, class loading, and class-code (bytecode) execution. Next we analyze the execution stage.
2.1.2 The execution engine's execution process
The final execution of the bytecode loaded into memory is the responsibility of the execution engine. However, the JVM's execution engine does not actually execute the bytecode instructions itself; rather, it translates the bytecode instructions and hands them to the physical machine's execution engine for actual execution. The overall process is as follows:
In general, after the bytecode is loaded into memory, it goes through the steps above before being translated into local machine instructions. These optimization steps are not mandatory and can be turned off with JVM parameters when the program starts. Although optimization takes some time, it can greatly improve the program's execution speed, so in general the advantages outweigh the disadvantages.
As you can see from the figure above, the data input to the execution engine is a bytecode file. In the HotSpot virtual machine, the Class file structure is defined as follows:
struct ClassFile {
    u4 magic;                    // Magic number identifying the Class file format: 0xCAFEBABE
    u2 minor_version;            // Class file format minor version number
    u2 major_version;            // Class file format major version number
    u2 constant_pool_count;      // Number of entries in the constant pool
    cp_info **constant_pool;     // The variable-length constant pool (symbol table)
    u2 access_flags;             // Mask of the modifiers used in the class declaration
    u2 this_class;               // Constant-pool index of the class or interface name
    u2 super_class;              // Constant-pool index of the superclass name
    u2 interfaces_count;         // Number of direct superinterfaces
    u2 *interfaces;              // Constant-pool indexes of each superinterface name
    u2 fields_count;             // Number of fields of the class
    field_info **fields;         // Field data, including field-name indexes
    u2 methods_count;            // Number of methods
    method_info **methods;       // Method table: method-name index, modifier mask, etc.
    u2 attributes_count;         // Number of additional attributes of the class
    attribute_info **attributes; // Additional class attribute data, e.g. the source file name
};
Any Java source code with the .java suffix is compiled into a class bytecode file in the format above, and the execution engine accepts class files in this same format as input. It is worth noting that the JVM does not only accept .class files compiled from .java files: any bytecode file that conforms to the format specification can be accepted and executed by the JVM.
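The leading fields of that structure are easy to inspect yourself. The sketch below reads the u4 magic and u2 version numbers from its own compiled .class file on the classpath (the class name is of course just this demo's):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: read the first three fields of the ClassFile structure
// (u4 magic, u2 minor_version, u2 major_version) from a class file.
public class ClassHeaderReader {

    // Returns {magic, minor, major}, or null if the stream cannot be read.
    public static int[] readHeader(InputStream in) {
        try (DataInputStream data = new DataInputStream(in)) {
            int magic = data.readInt();           // u4 magic, expected 0xCAFEBABE
            int minor = data.readUnsignedShort(); // u2 minor_version
            int major = data.readUnsignedShort(); // u2 major_version
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        int[] h = readHeader(
                ClassHeaderReader.class.getResourceAsStream("ClassHeaderReader.class"));
        // Every valid class file starts with the magic number 0xCAFEBABE.
        System.out.printf("magic=0x%08X major=%d minor=%d%n", h[0], h[2], h[1]);
    }
}
```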
The HotSpot virtual machine is stack-based, meaning the execution engine executes a method via a stack frame containing the local variable table, operand stack, dynamic linking information, and method return address. At run time, the execution engine only executes the top stack frame, because the top frame corresponds to the method currently being executed. When the current method finishes, its frame is popped, and the next frame (the new top frame) is taken up and execution continues. As just mentioned, the method information stored in the stack frame is read from the class bytecode file; each method is described by a method_info structure, as follows:
struct method_info {
    u2 access_flags;             // Method modifier mask
    u2 name_index;               // Constant-pool index of the method name
    u2 descriptor_index;         // Constant-pool index of the method descriptor
    u2 attributes_count;         // Number of attributes of the method
    attribute_info **attributes; // Attribute table of the method (includes the Code attribute, which holds the local variable table)
};
In method_info there is an attribute_info member called attributes, whose Code attribute holds the local variable table, which stores method parameters and local variables. When the method is an instance method, slot 0 of the local variable table is used to pass a reference to the object the method belongs to, namely this. The Java virtual machine's execution engine is stack-based, and the stack in question is the operand stack; the maximum depth of the operand stack is recorded in the method's Code attribute, as is the amount of space required by the local variable table.
Here’s a simple example to get a feel for the execution engine:
/* ------ Java code ------ */
public int add() {
    int a = 3;
    int b = 2;
    int c = a + b;
    return c;
}
/* ------ bytecode as shown by javap -c -v -p (method description omitted) ------ */
0: iconst_3 // Push the constant 3 onto the operand stack
1: istore_1 // Pop the top of the operand stack into slot 1 of the local variable table
2: iconst_2 // Push the constant 2 onto the operand stack
3: istore_2 // Pop the top of the operand stack into slot 2 of the local variable table
4: iload_1  // Load the value at index 1 of the local variable table onto the stack
5: iload_2  // Load the value at index 2 of the local variable table onto the stack
6: iadd     // Pop the top two elements and add them (3 + 2)
7: istore_3 // Store the sum into slot 3 of the local variable table
8: iload_3  // Load the value at index 3 of the local variable table onto the stack
9: ireturn  // Return the loaded value of c
For the process above, we will not analyze the first four assignment instructions, but instead analyze the operation c = a + b. The specific execution is as follows:
- ① The data a is transferred over the bus from the local variable table to the operand stack.
- ② The data b is transferred over the bus from the local variable table to the operand stack.
- ③ The data a is transferred from the operand stack over the bus to the CPU.
- ④ The data b is transferred from the operand stack over the bus to the CPU.
- ⑤ When the CPU finishes the calculation, the result is transferred over the data bus back to the operand stack.
- ⑥ The result of the operation is transferred from the operand stack over the bus to the CPU.
- ⑦ The CPU transfers the data over the bus to the local variable table, assigning the value to c.
- ⑧ The calculated result is loaded from index 3 of the local variable table onto the operand stack.
- ⑨ Finally, the ireturn instruction returns the calculated result c to the method's caller.
As this shows for the stack-based virtual machine, the length of the local variable table is determined by the compiler: one this plus three local variables, for a final length of 4. As the program executes the method's code, data is filled into the local variable table in turn: this, 3, 2. Meanwhile, the program counter is continuously updated with the execution position of the code. After the add operation is performed, the result of a + b, 5, is filled into the local variable table.
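The walk-through above can be mirrored in plain Java by modeling the two data areas explicitly. This is an illustrative simulation, not real JVM internals: an int array stands in for the local variable table and a Deque for the operand stack, with one statement per bytecode instruction of add():

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simulate the operand stack and local variable table for the add() bytecode.
public class OperandStackSim {
    public static int simulateAdd() {
        int[] locals = new int[4];          // slot 0 = this, slots 1-3 = a, b, c
        Deque<Integer> stack = new ArrayDeque<>();

        stack.push(3);                          // iconst_3
        locals[1] = stack.pop();                // istore_1
        stack.push(2);                          // iconst_2
        locals[2] = stack.pop();                // istore_2
        stack.push(locals[1]);                  // iload_1
        stack.push(locals[2]);                  // iload_2
        stack.push(stack.pop() + stack.pop());  // iadd
        locals[3] = stack.pop();                // istore_3
        stack.push(locals[3]);                  // iload_3
        return stack.pop();                     // ireturn
    }

    public static void main(String[] args) {
        System.out.println("add() returns " + simulateAdd()); // prints: add() returns 5
    }
}
```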
3. The JVM execution engine subsystem
In part 2 we briefly analyzed the Java code compilation and execution processes, and mentioned that Java uses a mode in which an interpreter and a compiler coexist. This means the JVM execution engine subsystem contains both an interpreter and a compiler, as shown in the following figure:
The execution engine subsystem of the Java virtual machine consists of two kinds of executors: the interpreter and the just-in-time compiler. When the execution engine receives a javac-compiled .class bytecode file, the Interpreter converts its instructions into machine code for execution at run time. In addition, to improve efficiency, the JVM adds a just-in-time (JIT) compiler, designed to avoid repeatedly interpreting frequently executed code: the JIT compiles an entire method to platform-native machine code, greatly improving execution efficiency.
3.1 Interpreter
When a Java program executes a method or some piece of code, the interpreter finds the corresponding bytecode in the .class file and interprets each bytecode instruction line by line according to the defined specification, translating it into the platform's corresponding local machine code for execution. After a bytecode instruction has been interpreted and executed, the next instruction to execute, recorded in the PC register (program counter), is read and interpreted in turn.
In the HotSpot virtual machine, the interpreter consists mainly of the Interpreter module, which implements the core function of interpreted execution, and the Code module, which manages the local machine instructions the interpreter generates at run time.
3.2 JIT Just-in-time Compiler
Because interpreters are simple to implement and excellent for cross-platform portability, many high-level languages, such as Python, Ruby, and JavaScript, were originally implemented as interpreters; but their performance is clearly inferior to compiled languages such as C/C++ and Go. As mentioned more than once before, Java solves this performance problem with a technique called just-in-time compilation, which compiles frequently executed methods or blocks of code directly to native machine code; afterwards, when those methods or blocks are executed, the generated machine code runs instead.
OK, so what is the benchmark for the "frequently executed" code mentioned above? The answer: hotspot-detection technology.
3.3 Hotspot code detection
As the name suggests, the HotSpot VM is a virtual machine capable of detecting hotspot code, i.e. code that is invoked and executed frequently. When a method has been executed a certain number of times, it reaches a specified threshold; the JIT then compiles the method directly to machine code for the current platform and deeply optimizes it, thus improving the Java program's performance.
A method that is called many times, or a loop body that iterates many times within a piece of code, can be called hot code and thus be JIT-compiled into local machine instructions.
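Both kinds of hot code can be pictured with a deliberately hot method. The sketch below is an ordinary Java program; running it with the real HotSpot flag -XX:+PrintCompilation would show sumTo in the compilation log once the thresholds are crossed (exactly when is VM- and mode-dependent):

```java
// A typical JIT hot-spot candidate: a short method, called many times,
// containing a counted loop.
public class HotLoop {
    static long sumTo(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) sum += i; // loop back-edges drive the back-edge counter
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int call = 0; call < 20_000; call++) // repeated calls drive the invocation counter
            total += sumTo(1_000);
        System.out.println(total); // prints 10010000000
    }
}
```

In the default mixed mode, early calls to sumTo are interpreted, and once the counters described below pass their thresholds, subsequent calls run the JIT-compiled native code.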
3.3.1 On-stack replacement
Compiled languages such as C/C++ and Go are statically compiled: all source code is compiled into the platform's machine code before the program starts. The JIT in the JVM, by contrast, is a dynamic compiler, because compilation of hot code takes place at run time. When the hot code is a loop body, the method containing it is still on the stack while it is being compiled, and its interpreted stack frame is swapped for the compiled version mid-execution; this is why the technique is called On-Stack Replacement, or OSR.
3.3.2 Method invocation counter and back-edge counter
As mentioned earlier, "a method that is called many times, or a loop body that iterates many times, can be called hot code." So how many times must a method be called, or a loop body iterate, before it counts as hot? There must be a threshold; and how does the JIT determine that a piece of code has been executed that many times? This relies on hotspot code detection.
In the HotSpot VM, hotspot code detection is implemented primarily with counters. HotSpot creates two different counters for each method: the Invocation Counter and the Back Edge Counter. The invocation counter counts the number of times the method is invoked; the back-edge counter counts the number of loop iterations in the method body.
Method invocation counter
The threshold of the method invocation counter defaults to 1500 in Client mode and 10000 in Server mode; JIT compilation is triggered when a piece of code has been executed that many times. If you are not satisfied with these defaults, you can specify the threshold with the JVM parameter -XX:CompileThreshold.
As shown above, when a method is called, the VM first checks whether the method has already been JIT-compiled; if so, the native machine code generated by the earlier compilation is executed directly. Otherwise, the invocation counter is incremented and checked against the threshold. If the threshold has not been reached, the code is executed by the interpreter. If it has, a compilation request is submitted and the JIT compiles in the background. When the background thread finishes, the generated local machine-code instructions are placed in the Code Cache, and the next time the method executes, the corresponding machine code is read directly from the cache.
Back edge counter
A back-edge counter counts the number of times loop bodies are executed within a method; the bytecode instructions that jump backwards to control the loop are called "back edges". As with the method invocation counter, OSR compilation is triggered when a certain threshold is reached. See the diagram below:
The back-edge counter works in much the same way as the method invocation counter, with one point worth noting: for either counter, when the threshold is reached and the (OSR) compilation request is submitted, execution does not wait for compilation to finish before running machine code, because compilation takes a relatively long time; the interpreter keeps executing the code, and only the next execution of that code runs the compiled machine code.
3.3.3 Counter decay
In general, if a Java program starts with default parameters, the method invocation counter counts not an absolute number of executions but a relative execution frequency, i.e. the number of times a method is executed within a period of time. If that period elapses before the counter reaches the threshold for submission to the JIT compiler, the counter is halved. This process is called Counter Decay, and the period is called the counter's Half Life Time.
Counter decay is performed while the virtual machine's GC is running. It can be turned off with -XX:-UseCounterDecay, which makes the method invocation counter count absolute calls rather than relative execution frequency. With decay turned off, if a Java program runs online long enough, most of its methods will eventually be compiled to native machine code.
You can also use the -XX:CounterHalfLifeTime parameter to adjust the half-life period (in seconds).
In general, if the project is small and the product does not need iteration for a long time after launch, you can try turning off counter decay, which makes the Java program perform better the longer it runs online. Given a long enough online runtime, its performance can equal or even exceed an equivalent program written in C (because C/C++ requires manual memory management, which takes time, while a Java program does not need to worry about memory while executing; the GC mechanism takes care of it).
3.3.4 Other hotspot detection techniques
From the previous analysis we know that hotspot code detection in HotSpot is counter-based. In addition to counter-based detection, however, hotspot code detection can also be based on sampling or on traces.
- Sample-based detection: a virtual machine using this technique periodically checks the top of each thread's virtual machine stack. If a method appears frequently at the top of the stack during these checks, it is being called frequently and can be considered a hot method.
- Advantages: simple to implement and easy to identify hot (frequently called) methods.
- Disadvantages: detection cannot be precise, because the checks are periodic, and factors such as thread blocking and sleeping mean some methods cannot be detected accurately.
- Trace-based detection: a virtual machine using this approach compiles only a frequently executed path of code, taking as its compilation unit a linear, continuous set of instructions with a single entry and multiple exits. This means that hotspot code compiled from traces is not limited to a single method or block: one trace may span multiple methods, and different frequently executed paths through the code may be identified as different traces.
- Advantages: this approach makes hotspot detection more precise and avoids compiling all the code in a block, which can greatly reduce unnecessary compilation overhead, since sampling- and counter-based detection both use a whole method body or loop body as the basic unit of compilation.
- Disadvantages: trace-based detection is very complicated and difficult to implement.
The HotSpot virtual machine's counter-based approach strikes a good trade-off between implementation difficulty, compilation cost and detection accuracy. The three detection techniques compare as follows:
- Implementation difficulty: sampling detection < count detection < trace detection
- Detection accuracy: sampling detection < counting detection < trace detection
- Compilation overhead: trace probe < count probe < sampling probe
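As a toy illustration of the sampling idea (this is not HotSpot's internal mechanism — class and method names here are mine), the sketch below periodically samples the top stack frame of a busy worker thread; frames that show up in many samples correspond to "hot" methods:

```java
import java.util.HashMap;
import java.util.Map;

// A toy sampling profiler: periodically look at the top frame of another
// thread's stack; frames seen often are considered "hot".
public class SamplingSketch {
    static long busyWork(long seed) {
        long h = seed;
        for (int i = 0; i < 10_000; i++) h = h * 31 + i; // deterministic churn
        return h;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x += busyWork(x);
            }
        });
        worker.setDaemon(true);
        worker.start();

        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < 50; i++) {               // take 50 samples
            StackTraceElement[] st = worker.getStackTrace();
            if (st.length > 0) {
                String top = st[0].getClassName() + "." + st[0].getMethodName();
                counts.merge(top, 1, Integer::sum);  // count top-of-stack frames
            }
            Thread.sleep(2);                          // sampling period
        }
        System.out.println(counts); // busyWork should dominate the counts
    }
}
```

Note how the disadvantage listed above shows up directly: if the worker were sleeping or blocked during the sampling window, the counts would be misleading.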
3.4. Why doesn’t the JVM remove the interpreter?
If a program were executed by a pure JIT compiler, performance would certainly exceed the interpreter + compiler mixed mode. So why has the interpreter never been removed from the virtual machine — why keep a component that seemingly drags down the performance of Java programs? (The JRockit virtual machine mentioned in the introduction did remove the interpreter module, and executes bytecode files entirely through its just-in-time compiler.)
There are two main reasons: one is to keep Java absolutely cross-platform, the other is to guarantee startup speed for the sake of overall performance. ① Absolutely cross-platform: removing the interpreter would mean that every time a program migrates — say, from Windows to Linux — the JIT must recompile everything into machine code before the Java program can execute. With the interpreter and JIT compiler working in mixed mode there is no such concern: in the early phase, the interpreter translates bytecode instructions directly into the machine code of the current platform, which keeps Java cross-platform. ② Startup speed and overall performance: if the interpreter module were removed, all bytecode instructions would have to be compiled into native machine code at startup before the Java program could run normally. Compiling everything up front is very expensive in time; without an interpreter in the JVM, a project that needs to go online in a hurry might have to wait a long time just for compilation.
To sum up, removing the interpreter from a virtual machine has its drawbacks as well as its benefits — the previously mentioned removal of the interpreter module in JRockit VM earned it the title of "fastest ever" virtual machine.
HotSpot adopts the interpreter + JIT compiler mixed mode. The advantage of this model is that the JVM can start quickly: early on, the interpreter does the work, and there is no need to wait for the compiler to compile all bytecode instructions before execution begins, which saves a large amount of compilation time. Later, as the program runs longer online, the JIT kicks in and gradually replaces the program's hot code with native machine code, making it run more efficiently. HotSpot VM also has the concept of heat decay: when a previously hot piece of code cools down, the JIT can deoptimize it and switch it back to interpreted execution. For this reason, HotSpot's execution mode is called "adaptive optimization".
Of course, we can also specify the execution mode ourselves with JVM parameters at startup:
① -Xint: execute the program in interpreter-only mode. ② -Xcomp: execute the program entirely in just-in-time compiler mode; if the just-in-time compiler has problems, the interpreter steps in. ③ -Xmixed: execute in interpreter + JIT mixed mode (the default execution mode).
3.5. Traffic migration between hot and cold machines: points to note
Through the above analysis, we can draw a conclusion:
Compiled execution far outperforms interpreted execution.
This may sound like stating the obvious, since anyone can see this conclusion at a glance — but it is not. Consider it from the perspective of system architecture: does this conclusion make a difference to the system as a whole? It does, as follows:
Since compiled execution is more efficient than interpreted execution, system throughput during compiled execution is much higher than during interpreted execution. And HotSpot, Java's current default virtual machine, does not compile from the start — it compiles dynamically at run time with the JIT just-in-time compiler.
The simple conclusion is that machines running Java programs can be divided into two states:
- Hot machine: a machine whose Java program has been running online for a long time, with much of its code already JIT-compiled into native machine code instructions.
- Cold machine: a machine whose Java program has just started, with all code still in the interpreted execution stage.
From the above analysis, a machine in the hot state can carry far more traffic load than one in the cold state. If traffic is switched from hot machines to cold machines, the cold machine servers may go down because they cannot bear the load.
I once ran into exactly this problem during development. A service needed to scale out; based on the previous cluster sizing, expanding by about a quarter of the machines should have been enough to carry the new traffic. But after startup there was a problem: as soon as the gateway forwarded traffic to a newly started machine, the machine went down. It was the first time I had encountered this, and at first I suspected the machine or the code in the program, but nothing was wrong with either. Later I increased the expansion from 1/4 to 1/3 of the original plan, and the traffic was transferred to the new machines smoothly, without any downtime.
As the case shows, directly shifting traffic from hot machines to cold machines is not feasible. In general there are two solutions, at the hardware and software levels, for smoothly switching traffic to new machines during capacity expansion: ① use more machines than strictly needed to carry the traffic coming from the hot machines, as in the case above, and stop the surplus machines once the newly started cold machines have warmed up into the hot state; ② control traffic at the gateway side: first forward only part of the traffic to the newly started cold machines to warm them up, and after they have run for a period of time, transfer all of the planned traffic to them.
4. A comprehensive analysis of the JIT just-in-time compiler
Java compilers can be broadly divided into three categories:
- ① Front-end compiler: javac, the incremental compiler (ECJ) in Eclipse JDT, and the like — compilers that compile `.java` source code into `.class` bytecode instructions.
- ② Back-end compiler: the JIT just-in-time compiler, which compiles bytecode instructions into machine code instructions.
- ③ Static compiler: the AOT compiler introduced in Java 9 and the like — compilers that compile `.java` source code directly into machine code instructions.
The JVM typically executes in hybrid interpreter + compiler mode with a JIT compiler; static compilers are relatively rare in Java. Embedded in the HotSpot virtual machine are two JIT just-in-time compilers, the Client Compiler and the Server Compiler, commonly known as the C1 and C2 compilers. C2 (the Server Compiler) is the default for JVMs on 64-bit systems. You can also specify which compiler to use with an explicit argument at startup, as follows:
- -client: indicates that the C1 compiler is used when the JVM runs.
- The C1 compiler performs simple, reliable optimizations on the bytecode; compilation is relatively quick.
- -server: specifies that the C2 compiler is used when the JVM runs.
- The C2 compiler optimizes the bytecode aggressively; compilation takes longer in exchange for better performance of the compiled code.
The two compilers pursue different directions, so the process of optimization is also different. Here is a brief analysis of C1 and C2 compilers.
4.1 C1 Client Compiler
The C1 compiler is conservative, pursuing stability and compilation speed. Common optimizations in C1 include common subexpression elimination, method inlining, de-virtualization, redundancy elimination, and several others:
- Common subexpression elimination: if an expression E has already been evaluated, and the values of all variables in E have not changed since that evaluation, this occurrence of E is a common subexpression; it can be eliminated by reusing the result of the previous evaluation instead of evaluating it again.
- Method inlining: compiles the called method's code into the call site, which avoids stack frame creation, parameter passing and the jump process.
- De-virtualization: inlines the unique implementation class of a virtual call.
- Redundancy elimination: removes code that can never execute, determined by data-flow analysis of the bytecode instructions.
- NullCheck elimination: explicitly emitted null checks are erased and handled as implicit null checks (ImplicitNullCheck) instead.
- Autoboxing elimination: unnecessary boxing operations are removed — for example, a value that is boxed and then immediately unboxed, a useless round trip.
- Safepoint elimination: safepoints that a thread cannot reach, or will never stop at, are eliminated.
- Reflection elimination: data that can be accessed without the reflection mechanism is changed to direct access, eliminating the reflection operation.
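Two of these optimizations can be illustrated at the source level (the JIT actually performs them on its intermediate representation, not on source code; the class and method names below are mine):

```java
public class C1Sketch {
    static int helper(int x) { return x + 1; }

    // Before optimization: (a + b) is evaluated twice, and helper() is a call.
    static int before(int a, int b) {
        return helper(a + b) * (a + b);
    }

    // The conceptual equivalent of C1's work:
    // - common subexpression elimination: (a + b) evaluated only once
    // - method inlining: helper's body pasted in, no call overhead
    static int after(int a, int b) {
        int e = a + b;        // the common subexpression, computed once
        return (e + 1) * e;   // helper(e) inlined as e + 1
    }

    public static void main(String[] args) {
        System.out.println(before(2, 3)); // 30
        System.out.println(after(2, 3));  // 30 -- same result, less work
    }
}
```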
4.2 C2 Server Compiler
The C2 compiler is aggressive, pursuing the performance of compiled code. C2 builds on C1's basic optimizations: besides the optimization methods in C1, it applies several aggressive optimizations based on escape analysis, such as scalar replacement, on-stack allocation and synchronization elimination.
- Escape analysis: performed method by method, it determines whether a variable's scope extends to other stack frames or threads. If an object is created inside a method body and never leaves the method's scope before the method ends, it can be considered non-escaping. Conversely, if the object is `return`ed out of the method or assigned to an external member in the method body's logic, it escapes. The procedure for making this determination is called escape analysis. Put another way, from a thread's point of view: if an object in one thread cannot be accessed by another thread, it does not escape.
- Escape scope:
  - ① Stack-frame escape: a local variable defined in the current method escapes the current method/stack frame.
  - ② Thread escape: a local variable defined in the current method escapes the current thread and can be accessed by other threads.
- Escape types:
  - Global assignment escape: the current object is assigned to a class field or a static field.
  - Parameter assignment escape: the current object is passed as a parameter to another method.
  - Method return value escape: the current object is returned as the method's return value.
- Scalar replacement: based on escape analysis, an aggregate (an object) is replaced by the basic scalars that compose it.
  - Scalars: references and the eight primitive data types are typical scalars; a scalar generally refers to data that cannot be decomposed further.
  - Benefits:
    - ① It saves heap memory, because a scalar-replaced object can be allocated on the stack.
    - ② It avoids looking up the object reference in the heap, which is faster.
    - ③ Because allocation happens on the stack, the data is automatically destroyed when the method ends and the stack frame pops, without GC intervention.
- On-stack allocation: a non-escaping object is decomposed by scalar replacement, and the resulting scalars are allocated in the local variable table, reducing instance object creation, heap memory usage and GC frequency.
  - Conditions for allocating an object on the stack (both must be satisfied):
    - ① the object can be decomposed into scalars by scalar replacement;
    - ② the object does not escape at the stack-frame level.
- Synchronization elimination: in the case of nested `synchronized` blocks — for example, one synchronized method calling another synchronized method on the same object — the inner `synchronized` lock is eliminated, because the inner method can only be reached by the thread that has already acquired the outer lock, so there is no thread-safety issue.
  - Conditions for synchronization elimination (either one is enough):
    - ① the current object is allocated on the stack;
    - ② the current object cannot escape the thread scope.
- Null-check pruning: after data-flow analysis, null-check branches that can never execute are pruned.
  - For example, if a parameter is checked for null before being passed to a method, and the callee checks it for null again, the callee's null check is pruned.
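A minimal sketch of a non-escaping object (class and method names here are mine). The Point created inside distanceSquared is never returned, never stored in a field, and never passed to another method — so escape analysis can mark it non-escaping, and the JIT may then apply scalar replacement, keeping x and y as two plain locals and skipping the heap allocation entirely:

```java
public class EscapeDemo {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // p does not escape this method: not returned, not assigned to a field,
    // not passed elsewhere. The JIT can scalar-replace it, so no Point
    // instance needs to be allocated on the heap at all.
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        System.out.println(distanceSquared(3, 4)); // 25
    }
}
```

Whether the allocation is actually eliminated depends on the VM and on compilation reaching C2; this snippet only shows the *shape* of code that qualifies.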
As mentioned earlier, the C2 compiler is the default on 64-bit JVMs. In fact, on 64-bit machines since JDK 1.6, whether -server mode is the default or explicitly specified, the JVM enables a tiered compilation strategy in which C1 and C2 handle compilation tasks together. The general logic of tiered compilation is: while the Java program is still cold, just after startup, the C1 compiler performs simple optimizations, pursuing compilation speed and stability; once the JVM reaches the hot state, subsequent compilation requests are fully and aggressively optimized by the C2 compiler, pursuing the performance and efficiency of compiled execution.
PS: the initial CodeCache size is 2496KB in Server mode and 160KB in Client mode. You can specify the maximum CodeCache size with the -XX:ReservedCodeCacheSize parameter.
4.3. Other compilers
In JDK 10, HotSpot gained a new compiler: the Graal compiler. After several generations of updates, its performance quickly caught up with the veteran C2 compiler. In JDK 10 it can be enabled with the -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler parameters.
5. Dispatch calls
While studying Java SE, you learned the basic features of OOP: encapsulation, inheritance and polymorphism. How does polymorphism find the concrete method at run time — for overridden and overloaded methods, how is it determined which method to call? Through dispatch technology.
5.1. Method Invocation
First, method invocation is different from method execution. The main task of the method invocation stage is to determine the version of the method to call — that is, which method to invoke in the presence of overloading or overriding. In general, after a `.java` file is compiled into a `.class` file by the front-end compiler, all method calls stored in the class file are symbolic references rather than direct references (the in-memory entry address of the method at run time).
In general, a method's direct reference cannot be determined until the class loading phase or even run time. The only methods whose direct references can be determined during class loading are static methods, final methods and private methods, because these belong to the category of "knowable at compile time, immutable at run time": they are either directly associated with the class, or inaccessible and unmodifiable from outside, so their version cannot be changed by overriding. Their direct references can therefore be confirmed directly during the resolution step, which takes place during class loading.
Five instructions for method calls are provided in the JVM:
- `invokestatic`: invokes static methods.
- `invokespecial`: invokes `<init>` constructors, private methods, and parent-class methods via `super()` / `super.xxx()`.
- `invokevirtual`: invokes all virtual methods (static, private, constructor, parent-class and final methods are non-virtual).
- `invokeinterface`: invokes interface methods; the concrete implementing class's method is determined at run time.
- `invokedynamic`: dynamically resolves, at run time, the method referenced by the call-site qualifier, then executes it. For the previous four instructions the dispatch logic is hardwired into the JVM, whereas the dispatch logic of `invokedynamic` is determined by a bootstrap method supplied by the user.
In general, methods invoked by the `invokestatic` and `invokespecial` instructions can have their specific version determined during the resolution phase. Static, private, constructor, parent-class and final methods all meet this criterion, so their symbolic references are replaced with direct references during class loading. Because resolving these methods is a static process, their versions can be fully determined at compile time without deferring the work to run time; this kind of invocation is called static dispatch. However, ordinary instance methods — non-private, non-final member methods — cannot have their version determined at compile time, so they are invoked by way of dynamic dispatch. In addition, dispatch can be divided into single dispatch and multiple dispatch according to the number of "cases" it considers.
A method's "cases" are the method's owner (receiver) and its parameters; depending on how many cases the dispatch is based on, it is divided into single dispatch and multiple dispatch. To summarize:
- Nonvirtual methods: A method that can be versioned during class loading (converting symbolic references to direct references)
- Virtual method: A method that cannot determine the version (which direct reference a symbolic reference can resolve to) during the class load phase
- Static dispatch: the version of a method is determined at compile time; the symbolic reference can then be replaced during the resolution step of class loading
- Dynamic dispatch: The runtime determines the method version, and the JVM determines the exact version of the method
- Single dispatch: Method version selection based on a single method case
- The selection of dynamic dispatch is based on the version selection of the method receiver, so dynamic dispatch is a single dispatch
- Multiple dispatch: Method version selection based on multiple method cases
- Static dispatch is multiple dispatch because the version selection is based on the receiver and parameters of the method
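The taxonomy above can be seen in a single snippet (class names here are mine): the overload is chosen at compile time from both the receiver's static type and the argument's static type (multiple dispatch), while the override is chosen at run time from the receiver's actual type alone (single dispatch):

```java
class Food {}
class Fish extends Food {}

class Cat {
    String eat(Food f) { return "Cat eats food"; }
    String eat(Fish f) { return "Cat eats fish"; }
}

class Kitten extends Cat {
    @Override
    String eat(Food f) { return "Kitten eats food"; }
}

public class DispatchDemo {
    public static void main(String[] args) {
        Cat c = new Kitten();   // static type Cat, actual type Kitten
        Food f = new Fish();    // static type Food, actual type Fish
        // Static dispatch picks the overload eat(Food) from the static
        // types; dynamic dispatch then picks Kitten's override of it.
        System.out.println(c.eat(f)); // prints "Kitten eats food"
    }
}
```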
5.2. Static dispatch
Static dispatch is any dispatch action that locates the method execution version based on the static type. Static dispatch happens at compile time and is performed by the compiler, not by the virtual machine. What does static type mean?
User u = new Admin();
In the code above, User is the static type (appearance type) of the variable u, and Admin is the actual type of the variable.
The typical application of static dispatch is method overloading (Overload), where methods have the same name but different parameter lists. Consider the following case:
public class User{
    public void identity(VipUser vip){
        System.out.println("I am a VIP member user....");
    }
    public void identity(AdminUser admin){
        System.out.println("I am the administrator....");
    }
    public static void main(String[] args) {
        User user = new User();
        VipUser vip = new VipUser();
        user.identity(vip);
    }
}
class VipUser{}
class AdminUser{}
identity(VipUser) overloads identity(AdminUser). When resolving a call to the method named identity, the compiler selects the method version based on the static type (i.e. the appearance type) of the argument.
The output is: I am a VIP member user.... This is not hard to understand: when user.identity(vip) is called, the argument's static type is VipUser, so the compiler resolves the call to the identity(VipUser vip) method.
If the argument is a literal with no explicit static type (a primitive value), the compiler selects the most suitable overload for the literal, as follows:
public class User{
    public void print(char arg){
        System.out.println("char....");
    }
    public void print(long arg){
        System.out.println("long....");
    }
    public void print(int arg){
        System.out.println("int....");
    }
    // omit other overloads.......
    public void print(char... arg){
        System.out.println("char... ....");
    }
    public static void main(String[] args) {
        User user = new User();
        user.print('a');
    }
}
// Output: char....
Looking at the code above, the output is char...., which is expected. But what happens if we comment out the print(char arg) method and execute it again? An error? No — comment it out and run it, and the result is as follows:
The output is int….
As the execution result shows, although the User class no longer has an overload taking a char parameter, the compiler finds a "suitable" method through automatic type conversion of the argument. For a char argument, the matching order is: char → int → long → float → double → Character (autoboxing) → Serializable (an interface implemented by Character) → Object → char... (varargs). Each time the current overload is commented out, the compiler falls back to the next one in this order.
The compiler's type conversion and its derivation of the most appropriate overload is a concept worth knowing about, but in real development, code is rarely written this punishingly.
5.3. Dynamic dispatch
Dynamic dispatch is the way the virtual machine determines the exact version of a method call at run time, because the version cannot be determined from static types at compile time. The typical application of dynamic dispatch is method overriding (Override): identical method signatures (same method name, same parameter list). As before, consider it through an example:
public class User{
    public void identity(){
        System.out.println("I am a user....");
    }
    public static void main(String[] args) {
        User user = new VipUser();
        user.identity();
    }
}
class VipUser extends User {
    public void identity(){
        System.out.println("I am a VIP member user....");
    }
}
class AdminUser extends User {
    public void identity(){
        System.out.println("I am the administrator....");
    }
}
// Output: I am a VIP member user....
This result should come as no surprise, but how does the virtual machine locate the VipUser.identity() method at run time? Clearly not by the static type of the variable: the variable user has static type User, yet the call ends up executing VipUser.identity(). The reason is simple — the actual type of the user variable is different. So how does Java determine the method version from the variable's actual type? Let's analyze it step by step.
After decompiling the source code above with javap, look at the bytecode of User.main():
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=2, args_size=1
0: new #2 // class User$VipUser
3: dup
4: invokespecial #3 // Method User$VipUser."<init>":()V
7: astore_1
8: aload_1
9: invokevirtual #4 // Method identity:()V
12: return
LineNumberTable:
line 7: 0
line 8: 8
line 9: 12
At offset 9 of the bytecode, the invokevirtual instruction invokes the virtual method identity(). The invokevirtual instruction is resolved by the JVM's execution engine at run time, roughly as follows:
- Find the element at the top of the operand stack and determine the actual type of the object it references — here, the actual type of the variable `user`, i.e. `VipUser`.
- In the method table of `VipUser`, look for a method whose name and parameter types match the method symbolic reference of the `invokevirtual` instruction.
  - Found: the class `VipUser` has an `identity()` method; check whether the caller has access permission for it.
    - Yes: replace the symbolic reference at the call site with a direct reference to this method.
    - No: throw a `java.lang.IllegalAccessError`.
  - Not found: continue searching upward through the method tables of `VipUser`'s parent classes.
    - Found: the parent class has an `identity()` method; check whether the caller has access permission for it.
      - Yes: replace the symbolic reference at the call site with a direct reference to that method.
      - No: throw a `java.lang.IllegalAccessError`.
    - Still not found: the called method does not exist; throw a `java.lang.AbstractMethodError`.
Since the first step in executing the invokevirtual instruction is to determine the actual type of the receiver at run time, invokevirtual resolves the symbolic reference to the class method in the constant pool into different direct references at different call sites — this is the essence of method overriding in the Java language. The dispatch process that determines the method execution version at run time based on the actual type is called dynamic dispatch.
5.4. Implementation of dynamic dispatch in virtual machines
Dynamic dispatch is an action performed very frequently at run time, and determining the method version for dynamic dispatch requires searching the class metadata for the appropriate version, which carries a high performance cost. For performance reasons, therefore, most virtual machine implementations avoid performing such frequent searches.
In general, JVM implementations create a virtual method table for each class in the metaspace (formerly the method area). When resolving an invokevirtual instruction, indexing into the method table replaces searching the metadata, which improves performance.

The virtual method table stores the actual entry address of each method of the class. If a method is not overridden in a subclass, the entry for that method in the subclass's virtual method table is the same as in the parent class's table, pointing to the parent's implementation. If the subclass overrides the method, the entry in the subclass's table is replaced with the entry address of the subclass's implementation. For example, if class Xxx does not override Object's toString() method, the entry for toString() in Xxx's virtual method table points to the Object.toString() method.

In the virtual machine, methods with the same signature have the same index in the virtual method tables of parent and child classes. When the receiver's type changes, only the method table needs to be swapped: the method entry address is looked up by the same index in a different table. The method table is initialized during the linking phase of class loading: after the initial values of the class variables have been prepared, the virtual machine also initializes the class's method table.
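The index-based lookup can be mimicked in plain Java as a toy model (real virtual method tables live in native class metadata, not in Java objects — the structure below is purely illustrative, and all names are mine):

```java
import java.util.List;
import java.util.function.Supplier;

public class VtableSketch {
    // The same logical method occupies the same slot index in every
    // class's table, so dispatch never needs to search by name.
    static final int IDENTITY_SLOT = 0;

    // User's table: slot 0 points at User's own implementation.
    static final List<Supplier<String>> USER_VTABLE =
            List.of(() -> "I am a user....");

    // VipUser overrides identity(), so its table's slot 0 points at the
    // subclass implementation instead of the inherited one.
    static final List<Supplier<String>> VIPUSER_VTABLE =
            List.of(() -> "I am a VIP member user....");

    // Dynamic dispatch: no metadata search, just index into the
    // receiver's own table.
    static String invokeIdentity(List<Supplier<String>> receiverVtable) {
        return receiverVtable.get(IDENTITY_SLOT).get();
    }

    public static void main(String[] args) {
        System.out.println(invokeIdentity(USER_VTABLE));
        System.out.println(invokeIdentity(VIPUSER_VTABLE));
    }
}
```

Swapping the table passed to invokeIdentity is the analogue of the receiver's actual type changing: the same slot index yields a different method entry.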
Of course, the C2 compiler's execution mode also includes some unstable aggressive optimization strategies, such as inline caching and guarded inlining based on class hierarchy analysis.
If the method dispatch section still feels confusing, just remember: the purpose of a dispatch call is to determine the specific version of a method when it executes. The dispatch process is in fact the process of replacing symbolic references with direct references, also known in some places as method binding. Statically dispatched calls are also called early binding, because the target method is already known at compile time; dynamically dispatched calls are called late binding, because the target is unknown at compile time and must wait until run time to be bound according to the actual type.