Note source: Silicon Valley JVM complete tutorial, millions of playback, the peak of the entire network (Song Hongkang details Java virtual machine)

Update: gitee.com/vectorx/NOT…

Codechina.csdn.net/qq_35925558…

Github.com/uxiahnan/NO…


2. Bytecode instruction set

2.1. Overview

2.1.1. Execution model

Ignoring exception handling, the Java virtual machine interpreter can be understood through the following pseudocode, which describes its most basic execution model:

do {
    automatically increment the PC register by 1;
    fetch the opcode from the bytecode stream at the position indicated by the PC register;
    if (the bytecode has operands) fetch the operands from the bytecode stream;
    perform the operation defined by the opcode;
} while (bytecode length > 0);
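The loop above can be sketched as a toy interpreter. This is only an illustration under stated assumptions: the class `MiniInterpreter` and its `run` method are hypothetical names, and only four instructions are handled (their opcode values are the real JVM ones).

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MiniInterpreter {
    // Real JVM opcode values for a tiny subset of instructions.
    static final int ICONST_2 = 0x05, BIPUSH = 0x10, IADD = 0x60, IRETURN = 0xac;

    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc++] & 0xff;          // fetch the opcode, advance the PC
            switch (opcode) {
                case ICONST_2:                       // no operand to fetch
                    stack.push(2);
                    break;
                case BIPUSH:                         // one signed-byte operand follows
                    stack.push((int) code[pc++]);
                    break;
                case IADD: {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case IRETURN:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
        throw new IllegalStateException("code ended without ireturn");
    }

    public static void main(String[] args) {
        // bipush 40; iconst_2; iadd; ireturn
        byte[] code = { BIPUSH, 40, ICONST_2, IADD, (byte) IRETURN };
        System.out.println(run(code)); // prints 42
    }
}
```

A real interpreter also handles exceptions, frames, and some two hundred opcodes; the sketch keeps only the fetch, decode, and execute steps.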

2.1.2. Bytecode and data types

In the Java virtual machine instruction set, most instructions encode the data type they operate on. For example, the iload instruction loads int data from the local variable table onto the operand stack, while the fload instruction loads float data.

For most type-specific bytecode instructions, a letter in the opcode mnemonic indicates which data type the instruction serves:

  • i stands for int
  • l stands for long
  • s stands for short
  • b stands for byte
  • c stands for char
  • f stands for float
  • d stands for double

Some instructions have no letter in the mnemonic that explicitly indicates the operand type. The arraylength instruction is one example: it carries no type letter, but its operand can only ever be an object of an array type.

Other instructions, such as the unconditional jump instruction goto, are independent of data type.

Most instructions do not support the integer types byte, char, and short, or the boolean type. The compiler sign-extends byte and short data to int at compile time or run time, and zero-extends boolean and char data to int. Similarly, values loaded from arrays of type boolean, byte, short, and char are converted to int and handled by the corresponding int bytecode instructions. As a result, most operations on boolean, byte, short, and char data actually use int as the operation type.
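The extension rules can be observed from Java source. The following is a minimal sketch; the class `IntPromotionDemo` and its helper methods are hypothetical names introduced for illustration.

```java
class IntPromotionDemo {
    // byte -> int is a sign extension: the sign of the value is preserved.
    static int signExtend(byte b) { return b; }

    // char -> int is a zero extension: the result is never negative.
    static int zeroExtend(char c) { return c; }

    // Adding two bytes produces an int, because the addition runs on ints.
    static Object addBytes(byte a, byte b) { return a + b; }

    public static void main(String[] args) {
        System.out.println(signExtend((byte) 0xFF));                  // -1
        System.out.println(zeroExtend((char) 0xFFFF));                // 65535
        System.out.println(addBytes((byte) 10, (byte) 20) instanceof Integer); // true
    }
}
```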

2.1.3. Instruction analysis

It takes a lot of time to fully introduce and learn these instructions. To make it easier to get familiar with and understand these basic instructions, the set of bytecode instructions in the JVM has been broken down into nine categories, roughly divided by purpose.

  • Load and store instructions
  • Arithmetic instructions
  • Type conversion instructions
  • Object creation and access instructions
  • Method invocation and return instructions
  • Operand stack management instructions
  • Comparison and control instructions
  • Exception handling instructions
  • Synchronization control instructions

When performing value-related operations:

  • An instruction can fetch data from local variables, the constant pool, objects in the heap, method calls, system calls, and so on; this data (either a value or an object reference) is pushed onto the operand stack.
  • An instruction can also pop one or more values from the operand stack to complete an assignment, an arithmetic operation, the passing of method arguments, a system call, and so on.

2.2. Load and store instructions

2.2.1. Role

Load and store instructions are used to pass data back and forth between the local variable table of a stack frame and the operand stack.

2.2.2. Common instructions

  1. Loading a local variable onto the operand stack: xload, xload_<n> (where x is i, l, f, d, or a, and n is 0 to 3)
  2. Loading a constant onto the operand stack: bipush, sipush, ldc, ldc_w, ldc2_w, aconst_null, iconst_m1, iconst_<i>, lconst_<l>, fconst_<f>, dconst_<d>
  3. Storing a value from the operand stack into the local variable table: xstore, xstore_<n> (where x is i, l, f, d, or a, and n is 0 to 3); xastore (where x is i, l, f, d, a, b, c, or s)
  4. Extending the access index of the local variable table: wide

Some of the instruction mnemonics listed above end in angle brackets (for example, iload_<n>). Such a mnemonic actually stands for a family of instructions (for example, iload_<n> stands for iload_0, iload_1, iload_2, and iload_3). Each family is a set of special forms of a general instruction that takes one operand (such as iload). These special forms appear to have no operand, so none needs to be fetched; the operand is implicit in the instruction itself.

Apart from that, their semantics are exactly the same as those of the general instruction (for example, iload_0 means exactly the same as iload with an operand of 0). The letter between the angle brackets specifies the type of the implicit operand: <n> for a non-negative integer, <i> for int, <l> for long, <f> for float, and <d> for double.

Operations on byte, char, short, and boolean data are usually represented by instructions of type int.

2.2.3. Operand stack and local variable table

Operand Stacks

As we know, Java bytecode is the instruction set used by the Java virtual machine, so it is inseparable from the JVM's stack-based computation model. During interpreted execution, whenever a stack frame is allocated for a Java method, the Java virtual machine usually has to set aside an extra space as the operand stack, which holds the operands of computations and their results.

Specifically, before an instruction executes, the Java virtual machine requires its operands to have been pushed onto the operand stack. When executing the instruction, the Java virtual machine pops the operands it needs and pushes the result of the instruction back onto the stack.

Take the addition instruction iadd as an example. Assuming the two elements at the top of the stack are int 1 and int 2 before the instruction executes, iadd pops these two ints and pushes their sum, int 3, onto the stack.

Since iadd consumes only the two elements at the top of the stack, it does not care about the element two positions below the top (the question mark in the figure), let alone modify it.
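The iadd behaviour described above can be imitated with an ordinary Java deque. This is a sketch only: `OperandStackDemo` and its `iadd` method are hypothetical names, and the value 99 stands in for the unknown element below the operands.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class OperandStackDemo {
    // Imitates iadd: pop two ints, push their sum, touch nothing else.
    static void iadd(Deque<Integer> stack) {
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left + right);
    }

    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(99); // the element below the operands
        stack.push(1);
        stack.push(2);
        iadd(stack);
        System.out.println(stack); // [3, 99]
    }
}
```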

Local Variables

Another important part of the Java method frame is the local variable area, where bytecode programs can cache the results of calculations.

In effect, the Java virtual machine treats the local variable area like an array, holding the this pointer (non-static methods only), the parameters passed in, and the local variables in the bytecode.

As with the operand stack, values of type long and double occupy two slots, and values of every other type occupy only one.

For example:

public void foo(long l, float f) {
    {
        int i = 0;
    }
    {
        String s = "Hello, World";
    }
}

Corresponding diagram:

Here this is a reference to the current object, l (a long) occupies two slots, and f (a float) occupies one. The variables i and s share the same slot (slot reuse) because each lives in its own code block and their lifetimes do not overlap.

The part of the stack frame most relevant to performance tuning is the local variable table. Variables in the local variable table are also important garbage collection roots: any object referenced directly or indirectly from the local variable table will not be collected.

2.2.4. Local variable push instructions

iload loads an int value from a local variable
lload loads a long value from a local variable
fload loads a float value from a local variable
dload loads a double value from a local variable
aload loads a reference value from a local variable
iload_0 loads an int value from local variable 0
iload_1 loads an int value from local variable 1
iload_2 loads an int value from local variable 2
iload_3 loads an int value from local variable 3
lload_0 loads a long value from local variable 0
lload_1 loads a long value from local variable 1
lload_2 loads a long value from local variable 2
lload_3 loads a long value from local variable 3
fload_0 loads a float value from local variable 0
fload_1 loads a float value from local variable 1
fload_2 loads a float value from local variable 2
fload_3 loads a float value from local variable 3
dload_0 loads a double value from local variable 0
dload_1 loads a double value from local variable 1
dload_2 loads a double value from local variable 2
dload_3 loads a double value from local variable 3
aload_0 loads a reference value from local variable 0
aload_1 loads a reference value from local variable 1
aload_2 loads a reference value from local variable 2
aload_3 loads a reference value from local variable 3
iaload loads an int value from an array
laload loads a long value from an array
faload loads a float value from an array
daload loads a double value from an array
aaload loads a reference value from an array
baload loads a byte or boolean value from an array
caload loads a char value from an array
saload loads a short value from an array

Common local variable push instructions

xload_n xload_0 xload_1 xload_2 xload_3
iload_n iload_0 iload_1 iload_2 iload_3
lload_n lload_0 lload_1 lload_2 lload_3
fload_n fload_0 fload_1 fload_2 fload_3
dload_n dload_0 dload_1 dload_2 dload_3
aload_n aload_0 aload_1 aload_2 aload_3

Analysis of the local variable push instructions

A local variable push instruction pushes the value held in a given slot of the local variable table onto the operand stack.

These instructions fall broadly into two forms:

  • xload_<n> (x is i, l, f, d, or a; n is 0 to 3)
  • xload (x is i, l, f, d, or a)

Note: Here, the value of x indicates the data type.

The instruction xload_n pushes the local variable in slot n onto the operand stack, for example iload_1, fload_0, and aload_0. Here aload_n pushes an object reference.

The instruction xload takes the slot index as an explicit operand. It is used when the slot index is greater than 3, where no xload_n form exists, for example iload and fload.

For example:

public void load(int num, Object obj, long count, boolean flag, short[] arr) {
    System.out.println(num);
    System.out.println(obj);
    System.out.println(count);
    System.out.println(flag);
    System.out.println(arr);
}

Bytecode execution process:
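The slot assignment and the resulting load instructions for a method like the one above can be worked out mechanically. The sketch below is an illustration, not a compiler: `LoadSlotDemo` and `loadsFor` are hypothetical names, each parameter is described only by its type letter (i, l, f, d, a), and the wide form needed for slot indexes above 255 is ignored.

```java
import java.util.ArrayList;
import java.util.List;

class LoadSlotDemo {
    // Returns the load mnemonic for each parameter, in declaration order.
    static List<String> loadsFor(boolean isStatic, char... types) {
        List<String> result = new ArrayList<>();
        int slot = isStatic ? 0 : 1;                 // slot 0 holds `this` in instance methods
        for (char t : types) {
            // Slots 0 to 3 have a dedicated one-byte form; others need an explicit index.
            result.add(slot <= 3 ? t + "load_" + slot : t + "load " + slot);
            slot += (t == 'l' || t == 'd') ? 2 : 1;  // long and double take two slots
        }
        return result;
    }

    public static void main(String[] args) {
        // load(int num, Object obj, long count, boolean flag, short[] arr)
        System.out.println(loadsFor(false, 'i', 'a', 'l', 'i', 'a'));
        // [iload_1, aload_2, lload_3, iload 5, aload 6]
    }
}
```

The boolean parameter is loaded with an int instruction and the array with a reference instruction. The long in slots 3 and 4 pushes flag to slot 5, which is why flag needs the general iload form.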

2.2.5. Constant push instructions

aconst_null pushes a null object reference onto the stack
iconst_m1 pushes the int constant -1 onto the stack
iconst_0 pushes the int constant 0 onto the stack
iconst_1 pushes the int constant 1 onto the stack
iconst_2 pushes the int constant 2 onto the stack
iconst_3 pushes the int constant 3 onto the stack
iconst_4 pushes the int constant 4 onto the stack
iconst_5 pushes the int constant 5 onto the stack
lconst_0 pushes the long constant 0 onto the stack
lconst_1 pushes the long constant 1 onto the stack
fconst_0 pushes the float constant 0 onto the stack
fconst_1 pushes the float constant 1 onto the stack
dconst_0 pushes the double constant 0 onto the stack
dconst_1 pushes the double constant 1 onto the stack
bipush pushes an 8-bit signed integer onto the stack
sipush pushes a 16-bit signed integer onto the stack
ldc pushes an item from the constant pool onto the stack
ldc_w pushes an item from the constant pool onto the stack (using a wide index)
ldc2_w pushes a long or double item from the constant pool onto the stack (using a wide index)

Common constant push instructions

xconst_n | range | instructions
iconst_n | [-1, 5] | iconst_m1, iconst_0, iconst_1, iconst_2, iconst_3, iconst_4, iconst_5
lconst_n | 0, 1 | lconst_0, lconst_1
fconst_n | 0, 1, 2 | fconst_0, fconst_1, fconst_2
dconst_n | 0, 1 | dconst_0, dconst_1
aconst_n | null, String literal, Class literal | aconst_null
bipush | one byte, 2^8 values, [-2^7, 2^7 - 1], i.e. [-128, 127]
sipush | two bytes, 2^16 values, [-2^15, 2^15 - 1], i.e. [-32768, 32767]
ldc | four bytes, 2^32 values, [-2^31, 2^31 - 1]
ldc_w | wide index
ldc2_w | wide index, long or double

Analysis of the constant push instructions

A constant push instruction pushes a constant onto the operand stack. Depending on the data type and the value being pushed, these instructions divide into the const family, the push family, and the ldc instructions.

The const family: pushes a specific constant that is implicit in the instruction itself. The instructions are iconst_<i> (i from -1 to 5), lconst_<l> (l from 0 to 1), fconst_<f> (f from 0 to 2), dconst_<d> (d from 0 to 1), and aconst_null. For instance,

  • iconst_m1 pushes -1 onto the operand stack;
  • iconst_x (x from 0 to 5) pushes x onto the stack;
  • lconst_0 and lconst_1 push the long values 0 and 1, respectively;
  • fconst_0, fconst_1, and fconst_2 push the float values 0, 1, and 2, respectively;
  • dconst_0 and dconst_1 push the double values 0 and 1, respectively;
  • aconst_null pushes null onto the operand stack.

The naming follows a clear pattern. The first character of a mnemonic usually denotes the data type: i for int, l for long, f for float, d for double, and by convention a for an object reference. If an instruction carries an implicit operand, it is written after an underscore.

The push family: consists mainly of bipush and sipush. They differ in the size of the operand they accept: bipush takes an 8-bit integer and sipush takes a 16-bit integer, and each pushes its operand onto the stack.

The ldc family: if none of the instructions above fits, the general-purpose ldc instructions can be used.

  • ldc takes an 8-bit operand that is the index of an int, float, or String entry in the constant pool, and pushes the specified entry onto the stack.
  • Similarly, ldc_w takes a 16-bit operand (two bytes), so it supports a larger index range than ldc.
  • If the value to be pushed is of type long or double, ldc2_w is used in the same way.

The summary is as follows:

type | constant instruction | range
int (boolean, byte, char, short) | iconst | [-1, 5]
 | bipush | [-128, 127]
 | sipush | [-32768, 32767]
 | ldc | any int value
long | lconst | 0, 1
 | ldc2_w | any long value
float | fconst | 0, 1, 2
 | ldc | any float value
double | dconst | 0, 1
 | ldc2_w | any double value
reference | aconst | null
 | ldc | String literal, Class literal

For example:
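The int rows of the summary table can be expressed as a small selection function. This is a sketch under stated assumptions: `ConstPushDemo` and `pushInt` are hypothetical names, and the returned string prints the value after ldc for readability, whereas real bytecode carries a constant pool index there.

```java
class ConstPushDemo {
    // Picks the smallest instruction able to push the given int constant.
    static String pushInt(int v) {
        if (v == -1) return "iconst_m1";
        if (v >= 0 && v <= 5) return "iconst_" + v;
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) return "bipush " + v;
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) return "sipush " + v;
        return "ldc " + v; // assumption: shown with the value instead of a pool index
    }

    public static void main(String[] args) {
        int[] samples = { -1, 5, 6, 127, 128, 32767, 32768 };
        for (int v : samples) {
            System.out.println(v + " -> " + pushInt(v));
        }
    }
}
```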

2.2.6. Store instructions (pop from the operand stack into the local variable table)

istore stores an int value into a local variable
lstore stores a long value into a local variable
fstore stores a float value into a local variable
dstore stores a double value into a local variable
astore stores a reference or returnAddress value into a local variable
istore_0 stores an int value into local variable 0
istore_1 stores an int value into local variable 1
istore_2 stores an int value into local variable 2
istore_3 stores an int value into local variable 3
lstore_0 stores a long value into local variable 0
lstore_1 stores a long value into local variable 1
lstore_2 stores a long value into local variable 2
lstore_3 stores a long value into local variable 3
fstore_0 stores a float value into local variable 0
fstore_1 stores a float value into local variable 1
fstore_2 stores a float value into local variable 2
fstore_3 stores a float value into local variable 3
dstore_0 stores a double value into local variable 0
dstore_1 stores a double value into local variable 1
dstore_2 stores a double value into local variable 2
dstore_3 stores a double value into local variable 3
astore_0 stores a reference or returnAddress value into local variable 0
astore_1 stores a reference or returnAddress value into local variable 1
astore_2 stores a reference or returnAddress value into local variable 2
astore_3 stores a reference or returnAddress value into local variable 3
iastore stores an int value into an array
lastore stores a long value into an array
fastore stores a float value into an array
dastore stores a double value into an array
aastore stores a reference value into an array
bastore stores a byte or boolean value into an array
castore stores a char value into an array
sastore stores a short value into an array

The wide instruction

wide extends the local variable index with additional bytes

Common store instructions

xstore_n xstore_0 xstore_1 xstore_2 xstore_3
istore_n istore_0 istore_1 istore_2 istore_3
lstore_n lstore_0 lstore_1 lstore_2 lstore_3
fstore_n fstore_0 fstore_1 fstore_2 fstore_3
dstore_n dstore_0 dstore_1 dstore_2 dstore_3
astore_n astore_0 astore_1 astore_2 astore_3

Analysis of the store instructions

A store instruction pops the top element of the operand stack and writes it to the specified slot of the local variable table; it is used to assign values to local variables. These instructions mainly take the form store, such as xstore (x is i, l, f, d, or a) and xstore_n (x is i, l, f, d, or a; n is 0 to 3).

  • The instruction istore_n pops an int from the operand stack and assigns it to the local variable in slot n.
  • The instruction xstore carries no implicit operand, so it needs one byte of operand to specify the target slot in the local variable table.

Note: In general, a store instruction needs an operand that specifies where in the local variable table the popped element goes. To keep instructions as small as possible, however, the dedicated istore_1 instruction places the popped element in slot 1 of the local variable table. Similarly, istore_0, istore_2, and istore_3 pop an element from the top of the operand stack into slots 0, 2, and 3, respectively. Because the first few slots of the local variable table are used so often, this approach adds instructions to the instruction set but greatly reduces the size of the generated bytecode. If the local variable table is large and the target slot index is greater than 3, the istore instruction is used with an additional operand giving the slot.

For example:
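The following sketch shows where each store form is typically used. The class `StoreDemo` and its method are hypothetical, and the bytecode in the comments is what javac typically emits for such code; treat it as an assumption and confirm with `javap -c` on your own build.

```java
class StoreDemo {
    // Static method: k lives in slot 0, d in slots 1 and 2.
    static int store(int k, double d) {
        int m = k + 2;        // iload_0, iconst_2, iadd, istore_3
        long l = 12;          // ldc2_w <12>, lstore 4        (slots 4 and 5)
        String str = "abc";   // ldc <abc>, astore 6
        float f = 10.0F;      // ldc <10.0>, fstore 7
        d = 10;               // ldc2_w <10.0>, dstore_1
        return m;             // iload_3, ireturn
    }

    public static void main(String[] args) {
        System.out.println(store(1, 0.0)); // prints 3
    }
}
```

Only m and d land in slots 0 to 3 and get the compact one-byte forms; l, str, and f need the general form with an explicit slot operand.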

2.3. Arithmetic instructions

2.3.1. Role

Arithmetic instructions perform a particular operation on two values on the operand stack and push the result back onto the operand stack.

2.3.2. Classification

Broadly speaking, arithmetic instructions divide into two kinds: those that operate on integer data and those that operate on floating-point data.

2.3.3. Notes on the byte, short, char, and boolean types

Within each kind, there are dedicated arithmetic instructions for specific Java virtual machine data types. However, no arithmetic instructions directly support byte, short, char, or boolean; operations on these types are all handled with the int instructions. In addition, values loaded from arrays of type boolean, byte, short, and char are converted to int and handled with the corresponding int bytecode instructions.

2.3.4. Overflow during operation

Data operations can overflow; for example, adding two large positive integers can yield a negative number. The Java virtual machine specification does not treat integer overflow as an error, and no exception is thrown. When processing integer data, only the division instructions (idiv, ldiv) and the remainder instructions (irem, lrem) cause the virtual machine to throw an ArithmeticException, and only when the divisor is 0.
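Both behaviours are easy to observe from Java source. The class `OverflowDemo` and its helper methods below are hypothetical names used only for illustration.

```java
class OverflowDemo {
    // Integer addition wraps around silently on overflow.
    static int addWrapping(int a, int b) { return a + b; }

    // Returns true if integer division by the given divisor throws.
    static boolean divisionThrows(int dividend, int divisor) {
        try {
            int unused = dividend / divisor;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(addWrapping(Integer.MAX_VALUE, 1) == Integer.MIN_VALUE); // true
        System.out.println(divisionThrows(1, 0));                                   // true
        // Division can overflow too, still without an exception.
        System.out.println(divisionThrows(Integer.MIN_VALUE, -1));                  // false
    }
}
```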

2.3.5. Operation mode

Round to nearest: The JVM requires that the result of every floating-point computation be rounded to the appropriate precision. An inexact result must be rounded to the nearest exactly representable value; if two representable values are equally close, the one whose least significant bit is zero is preferred.

Round toward zero: When converting a floating-point number to an integer, this mode selects, as the most accurate result, the value of the target type that is closest to the original value but no greater than it in magnitude.

2.3.6. Use of NaN values

When a floating-point operation overflows, the result is a signed infinity; when the result of an operation is not mathematically defined, it is NaN. All arithmetic operations that have NaN as an operand return NaN.
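The two rounding modes and the special values can be checked with a few casts. This is a minimal sketch; `FloatRulesDemo`, `viaFloat`, and `towardZero` are hypothetical names. Near 2^24 a float can represent only every second integer, which makes the tie rule visible.

```java
class FloatRulesDemo {
    // int -> float -> int; the first cast rounds to the nearest float.
    static int viaFloat(int x) { return (int) (float) x; }

    // double -> int truncates toward zero.
    static int towardZero(double x) { return (int) x; }

    public static void main(String[] args) {
        double zero = 0.0;
        double nan = zero / zero;            // not mathematically defined
        double inf = Double.MAX_VALUE * 2;   // floating-point overflow

        System.out.println(viaFloat(16777217));    // 16777216 (tie, even neighbour wins)
        System.out.println(viaFloat(16777219));    // 16777220 (tie, even neighbour wins)
        System.out.println(towardZero(2.9) + " " + towardZero(-2.9)); // 2 -2
        System.out.println(inf);                   // Infinity
        System.out.println(Double.isNaN(nan + 1)); // true
    }
}
```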

2.3.7. All arithmetic instructions

Integer arithmetic

iadd performs int addition
ladd performs long addition
isub performs int subtraction
lsub performs long subtraction
imul performs int multiplication
lmul performs long multiplication
idiv performs int division
ldiv performs long division
irem computes the remainder of an int division
lrem computes the remainder of a long division
ineg negates an int value
lneg negates a long value
iinc adds a constant to a local variable of type int

Logical operations

Shift operations

ishl shifts an int value left
lshl shifts a long value left
ishr shifts an int value right (arithmetic shift)
lshr shifts a long value right (arithmetic shift)
iushr shifts an int value right (logical shift)
lushr shifts a long value right (logical shift)

Bitwise Boolean operations

iand performs a bitwise AND on int values
land performs a bitwise AND on long values
ior performs a bitwise OR on int values
lor performs a bitwise OR on long values
ixor performs a bitwise XOR on int values
lxor performs a bitwise XOR on long values

Floating-point arithmetic

fadd performs float addition
dadd performs double addition
fsub performs float subtraction
dsub performs double subtraction
fmul performs float multiplication
dmul performs double multiplication
fdiv performs float division
ddiv performs double division
frem computes the remainder of a float division
drem computes the remainder of a double division
fneg negates a float value
dneg negates a double value

Arithmetic instruction summary

Arithmetic instruction | int (boolean, byte, char, short) | long | float | double
Addition | iadd | ladd | fadd | dadd
Subtraction | isub | lsub | fsub | dsub
Multiplication | imul | lmul | fmul | dmul
Division | idiv | ldiv | fdiv | ddiv
Remainder | irem | lrem | frem | drem
Negation | ineg | lneg | fneg | dneg
Increment | iinc | | |
Bitwise OR | ior | lor | |
Bitwise AND | iand | land | |
Bitwise XOR | ixor | lxor | |
Comparison | | lcmp | fcmpg / fcmpl | dcmpg / dcmpl

Example of arithmetic instruction

Example 1

public static int bar(int i) {
	return ((i + 1) - 2) * 3 / 4;
}

Example 2

public void add() {
	byte i = 15;
	int j = 8;
	int k = i + j;
}

Example 3

public static void main(String[] args) {
	int x = 500;
	int y = 100;
	int a = x / y;
	int b = 50;
	System.out.println(a + b);
}

2.4. Type conversion instructions

Type conversion instructions convert a value of one numeric type to another numeric type.

These conversions are typically used to implement explicit type conversions in user code, or to deal with the fact that the type-specific instructions in the bytecode instruction set do not map one-to-one onto the data types.

Widening conversions

i2l converts an int to a long
i2f converts an int to a float
i2d converts an int to a double
l2f converts a long to a float
l2d converts a long to a double
f2d converts a float to a double

Narrowing conversions

i2b converts an int to a byte
i2c converts an int to a char
i2s converts an int to a short
l2i converts a long to an int
f2i converts a float to an int
f2l converts a float to a long
d2i converts a double to an int
d2l converts a double to a long
d2f converts a double to a float

Type conversion instruction set

from \ to | byte | char | short | int | long | float | double
int | i2b | i2c | i2s | - | i2l | i2f | i2d
long | l2i, i2b | l2i, i2c | l2i, i2s | l2i | - | l2f | l2d
float | f2i, i2b | f2i, i2c | f2i, i2s | f2i | f2l | - | f2d
double | d2i, i2b | d2i, i2c | d2i, i2s | d2i | d2l | d2f | -

2.4.1. Widening type conversion instructions

Widening Numeric Conversions

  1. Conversion rules

The Java virtual machine directly supports the following widening numeric conversions (safe conversions from a type with a smaller range to a type with a larger range). That is, no explicit cast is required in the source code. They include:

  • From int to long, float, or double. The corresponding instructions are i2l, i2f, and i2d.

  • From long to float or double. The corresponding instructions are l2f and l2d.

  • From float to double. The corresponding instruction is f2d.

In short: int --> long --> float --> double

  2. Precision loss

  • A widening conversion never loses information through exceeding the maximum value of the target type. For example, converting from int to long, or from int to double, loses no information at all; the value before and after the conversion is exactly equal.

  • Converting from int or long to float, or from long to double, may lose precision: some of the least significant bits of the value may be lost. The resulting floating-point value is the correctly rounded version of the integer value, obtained with IEEE 754 round-to-nearest mode.

Although a widening conversion can in fact lose precision, it never causes the Java virtual machine to throw a runtime exception.

  3. Supplementary notes

Widening conversions from byte, char, and short to int do not actually exist as instructions. When a byte is converted to an int, the virtual machine performs no real conversion; it simply passes the data through the operand stack. When a byte is converted to a long, i2l is used, which shows that internally a byte is already treated as an int; short is handled similarly. This approach has two benefits:

On the one hand, it reduces the number of actual data types. If a full set of instructions were provided for both short and byte, the number of instructions would grow considerably, yet the virtual machine is currently designed to encode an opcode in a single byte, so the total number of instructions cannot exceed 256. Treating short and byte as int is therefore a reasonable way to save opcodes.

On the other hand, since a slot in the local variable table is fixed at 32 bits, a byte or a short stored in the local variable table occupies 32 bits either way. From this point of view there is no need to distinguish these data types.
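The precision rules for widening can be checked with round trips. The sketch below uses hypothetical names (`WideningDemo`, `roundTripsThroughFloat`, `roundTripsThroughDouble`) and two sample values chosen so that they do not fit the 24-bit float significand or the 53-bit double significand. The helpers are for illustration only, not a general precision test, since values near the top of the range saturate when cast back.

```java
class WideningDemo {
    // True if the int is unchanged after i2f followed by f2i.
    static boolean roundTripsThroughFloat(int i) { return (int) (float) i == i; }

    // True if the long is unchanged after l2d followed by d2l.
    static boolean roundTripsThroughDouble(long l) { return (long) (double) l == l; }

    public static void main(String[] args) {
        int i = 123123123;
        long asLong = i;       // i2l, exact
        double asDouble = i;   // i2d, exact
        float asFloat = i;     // i2f, may round

        System.out.println(asLong == i && (int) asDouble == i);            // true
        System.out.println((int) asFloat);                                 // 123123120
        System.out.println(roundTripsThroughFloat(i));                     // false
        System.out.println(roundTripsThroughDouble(123123123123123123L));  // false
    }
}
```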

2.4.2. Narrowing type conversion instructions

Narrowing Numeric Conversions

  1. Conversion rules

The Java virtual machine also directly supports the following narrowing conversions:

  • From int to byte, char, or short. The corresponding instructions are i2b, i2c, and i2s.

  • From long to int. The corresponding instruction is l2i.

  • From float to int or long. The corresponding instructions are f2i and f2l.

  • From double to int, long, or float. The corresponding instructions are d2i, d2l, and d2f.

  2. Precision loss

A narrowing conversion can produce a result with a different sign or a different order of magnitude, so the conversion is likely to lose numeric precision.

Although narrowing conversions can cause overflow at the upper bound, overflow at the lower bound, and loss of precision, the Java Virtual Machine Specification states explicitly that a narrowing conversion of a numeric type never causes the virtual machine to throw a runtime exception.

  3. Supplementary notes

When a floating-point value is narrowed to an integer type T (T is either int or long), the following rules apply:

  • If the floating-point value is NaN, the result is 0 of type int or long.

  • If the floating-point value is not infinite, it is rounded to an integer value v using IEEE 754 round-toward-zero mode. If v is within the range representable by the target type T (int or long), the result is v. Otherwise, the result is the largest or smallest integer that T can represent, depending on the sign of v.

When a double is narrowed to a float, the value is rounded to the nearest number representable as a float using round-to-nearest mode. The final result is determined by the following three rules:

  • If the absolute value of the result is too small to be represented as a float, a positive or negative zero of type float is returned.

  • If the absolute value of the result is too large to be represented as a float, a positive or negative infinity of type float is returned.

  • A NaN of type double is converted to a NaN of type float.
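Each of these rules can be observed with a plain Java cast. In the sketch below, `NarrowingDemo` and its methods are hypothetical wrappers named after the instruction each cast compiles to.

```java
class NarrowingDemo {
    static int d2i(double d) { return (int) d; }
    static float d2f(double d) { return (float) d; }
    static byte i2b(int i) { return (byte) i; }

    public static void main(String[] args) {
        System.out.println(d2i(Double.NaN));                  // 0
        System.out.println(d2i(3.99) + " " + d2i(-3.99));     // 3 -3
        System.out.println(d2i(1e20) == Integer.MAX_VALUE);   // true
        System.out.println(d2i(-1e20) == Integer.MIN_VALUE);  // true
        System.out.println(d2f(1e40));                        // Infinity
        System.out.println(d2f(1e-50));                       // 0.0
        System.out.println(Float.isNaN(d2f(Double.NaN)));     // true
        System.out.println(i2b(200));                         // -56 (sign changes)
    }
}
```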

2.5. Object creation and access instructions

Java is an object-oriented programming language, and the virtual machine platform supports object orientation deeply at the bytecode level. A series of instructions is dedicated to object manipulation; they can be further divided into creation instructions, field access instructions, array manipulation instructions, and type checking instructions.

Object operation instructions

new creates a new object
getfield gets the value of a field from an object
putfield sets the value of a field in an object
getstatic gets the value of a static field from a class
putstatic sets the value of a static field in a class
checkcast checks that an object is of the given type: it takes a target class operand and determines whether the top stack element is an instance of that class or interface; if not, it throws an exception
instanceof determines whether an object is of the given type: it takes a target class operand and determines whether the top stack element is an instance of that class or interface; if so, it pushes 1, otherwise it pushes 0

Array manipulation instructions

newarray allocates a new array whose elements are of a primitive type
anewarray allocates a new array whose elements are of a reference type
arraylength gets the length of an array
multianewarray allocates a new multidimensional array

2.5.1. Create instructions

Creation instruction | Meaning
new | creates a class instance
newarray | creates an array of a primitive type
anewarray | creates an array of a reference type
multianewarray | creates a multidimensional array

Although class instances and arrays are both objects, the Java virtual machine creates and manipulates them with different bytecode instructions:

  1. Instruction for creating class instances:
    • The instruction is new.
    • It takes one operand, an index into the constant pool that identifies the type to create; when it completes, it pushes a reference to the new object onto the stack.
  2. Instructions for creating arrays:
    • The instructions are newarray, anewarray, and multianewarray.
    • Because objects and arrays are used so widely in Java, these creation instructions appear very frequently.
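The sketch below pairs each creation instruction with a source statement that typically produces it. `CreationDemo` and its methods are hypothetical, and the bytecode in the comments is typical javac output, an assumption worth confirming with `javap -c`.

```java
class CreationDemo {
    static Object object()       { return new Object(); }      // new, dup, invokespecial <init>
    static int[] ints()          { return new int[10]; }       // bipush 10, newarray int
    static String[] strings()    { return new String[10]; }    // bipush 10, anewarray java/lang/String
    static int[][] grid()        { return new int[2][3]; }     // iconst_2, iconst_3, multianewarray
    static String[][] rowsOnly() { return new String[10][]; }  // bipush 10, anewarray (of String[])

    public static void main(String[] args) {
        System.out.println(ints().length);         // 10
        System.out.println(grid()[1].length);      // 3
        System.out.println(rowsOnly()[0] == null); // true: inner arrays are not allocated
    }
}
```

Note that `new String[10][]` specifies only one dimension, so it is an ordinary array of references and does not need multianewarray.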

2.5.2. Field access instruction

Field access instruction    Meaning
getstatic, putstatic        Access class fields (static fields, also called class variables)
getfield, putfield          Access instance fields (non-static fields, also called instance variables)

Once an object has been created, the fields of an object instance or the elements of an array instance can be read and written through the object access instructions.

  • Instructions that access class fields (static fields, also called class variables): getstatic, putstatic
  • Instructions that access instance fields (non-static fields, also called instance variables): getfield, putfield

For example, getstatic takes one operand, an index to a Fieldref entry in the constant pool. It fetches the value or object reference of the field that the Fieldref identifies and pushes it onto the operand stack.

public void sayHello() {
    System.out.println("hello");
}

Corresponding bytecode instructions:

0 getstatic #8 <java/lang/System.out>
3 ldc #9 <hello>
5 invokevirtual #10 <java/io/PrintStream.println>
8 return
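All four field access instructions show up in the following sketch. The class, field, and method names are hypothetical, and the instruction sequences in the comments are what javac typically produces rather than a guaranteed listing.

```java
final class FieldAccessSketch {
    static int counter;   // class (static) field
    int value;            // instance field

    static int bumpCounter() {
        counter = counter + 1;   // getstatic counter, iconst_1, iadd, putstatic counter
        return counter;          // getstatic counter, ireturn
    }

    int setAndGet(int v) {
        this.value = v;          // aload_0, iload_1, putfield value
        return this.value;       // aload_0, getfield value, ireturn
    }
}
```

Note that getfield and putfield need an object reference on the operand stack (aload_0 loads this), while getstatic and putstatic do not.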


2.5.3. Array manipulation instructions

The array manipulation instructions are mainly the xaload and xastore families:

  • Instructions that load an array element onto the operand stack: baload, caload, saload, iaload, laload, faload, daload, aaload
  • Instructions that store a value from the operand stack into an array element: bastore, castore, sastore, iastore, lastore, fastore, dastore, aastore

Array type    byte (boolean)    char      short     int       long      float     double    reference
xaload        baload            caload    saload    iaload    laload    faload    daload    aaload
xastore       bastore           castore   sastore   iastore   lastore   fastore   dastore   aastore

The instruction that takes the length of an array: arraylength. It pops the array reference at the top of the stack, gets the length of the array, and pushes the length onto the stack.

The xaload instructions push an array element onto the stack. For example, saload and caload push an element of a short array and of a char array respectively. When xaload executes, the top of the operand stack must hold the array index i, with the array reference a directly below it. The instruction pops both and pushes a[i] onto the stack.

The xastore instructions store into arrays. iastore, for example, assigns a value to a given index of an int array. Before iastore executes, three elements must be ready at the top of the operand stack: the value, the index, and the array reference. iastore pops all three and stores the value at the given index of the array.
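The stack layouts described above correspond to ordinary Java array expressions. The sketch below uses hypothetical names; the comments give the instructions javac typically emits for each expression.

```java
final class ArrayOpsSketch {
    static int sum(int[] a) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {   // a.length: aload (array ref), arraylength
            total += a[i];                      // aload (array ref), iload (index), iaload
        }
        return total;
    }

    static char[] fill(char[] a, char c) {
        for (int i = 0; i < a.length; i++) {
            a[i] = c;                           // array ref, index, value pushed in that order, then castore
        }
        return a;
    }
}
```

For example, ArrayOpsSketch.sum(new int[] {1, 2, 3}) evaluates to 6.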

2.5.4. Type checking instruction

Instructions that check the type of a class instance or array: instanceof, checkcast.

  • checkcast checks whether a cast is legal. If it is, checkcast leaves the operand stack unchanged; otherwise it throws a ClassCastException.
  • instanceof tests whether a given object is an instance of a class and pushes the result onto the operand stack.

Type checking instruction    Meaning
instanceof                   Tests whether a given object is an instance of a class
checkcast                    Checks whether a cast is legal
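The difference between the two instructions is easiest to see with a failing case and with null. This is a small sketch with hypothetical names: instanceof yields a boolean result, while checkcast either lets the reference through or throws.

```java
final class TypeCheckSketch {
    // instanceof: pushes 1 or 0; a null reference always gives 0 (false)
    static boolean isString(Object o) {
        return o instanceof String;
    }

    // checkcast: leaves the reference on the stack if the cast is legal;
    // a null reference passes the check unchanged
    static String asString(Object o) {
        return (String) o;
    }

    // Reports whether the cast in asString throws ClassCastException
    static boolean castFails(Object o) {
        try {
            asString(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```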

2.6. Method invocation and return instructions

Method invocation instructions

invokevirtual invokes an instance method, dispatching at run time on the class of the object

invokespecial invokes an instance method selected by the compile-time type

invokestatic invokes a class (static) method

invokeinterface invokes an interface method

Method return instructions

ireturn returns int data from a method

lreturn returns long data from a method

freturn returns float data from a method

dreturn returns double data from a method

areturn returns reference data from a method

return returns void from a method

2.6.1. Method call instructions

Method invocation instructions: invokevirtual, invokeinterface, invokespecial, invokestatic, invokedynamic. These five instructions are used for method invocation:

  • invokevirtual invokes an instance method of an object, dispatching on the actual type of the object (virtual method dispatch), which supports polymorphism. It is the most common form of method dispatch in the Java language.
  • invokeinterface invokes an interface method: at run time it searches the particular object for the method that implements the interface method and invokes it.
  • invokespecial invokes instance methods that need special handling: instance initializer methods (constructors), private methods, and superclass methods. These methods are statically bound and are not dispatched dynamically when called.
  • invokestatic invokes a class method (static method) of the named class. It is statically bound.
  • invokedynamic invokes a dynamically bound method and was added in JDK 1.7. At run time it resolves the method referenced by the call site specifier and then executes it.
  • The dispatch logic of invokedynamic is determined by a bootstrap method supplied by the user, whereas the dispatch logic of the other four invocation instructions is fixed inside the Java virtual machine.

Method invocation instruction    Meaning
invokevirtual                    Invokes an instance method of an object
invokeinterface                  Invokes an interface method
invokespecial                    Invokes instance methods that need special handling: instance initializer methods (constructors), private methods, and superclass methods
invokestatic                     Invokes a class method (static method) of the named class
invokedynamic                    Invokes a dynamically bound method
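A single method can exercise all five invocation instructions. The sketch below uses hypothetical types (Shape, Triangle) and assumes a javac that compiles lambdas through invokedynamic (Java 8 or later); the per-line comments name the instruction typically chosen.

```java
import java.util.function.IntUnaryOperator;

final class InvokeSketch {
    interface Shape { int sides(); }

    static class Triangle implements Shape {
        public int sides() { return 3; }
        public String name() { return "triangle"; }
    }

    static int twice(int x) { return 2 * x; }

    static String demo() {
        Triangle t = new Triangle();               // new, dup, invokespecial Triangle.<init>
        Shape s = t;
        int viaInterface = s.sides();              // invokeinterface Shape.sides
        String viaClass = t.name();                // invokevirtual Triangle.name
        int viaStatic = twice(viaInterface);       // invokestatic InvokeSketch.twice
        IntUnaryOperator inc = x -> x + 1;         // invokedynamic creates the lambda object
        int viaLambda = inc.applyAsInt(viaStatic); // invokeinterface IntUnaryOperator.applyAsInt
        return viaClass + ":" + viaLambda;
    }
}
```

The choice between invokevirtual and invokeinterface depends on the static type of the receiver expression (Triangle versus Shape), not on the object that is actually there at run time.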

2.6.2. Method return instruction

Before a method call ends, it must return. The method return instructions are distinguished by the type of the return value.

  • They include ireturn (used when the return value is boolean, byte, char, short, or int), lreturn, freturn, dreturn, and areturn
  • There is also a return instruction for methods declared void, instance initializers, and the class initializers of classes and interfaces.

Return type    void      int        long       float      double     reference
xreturn        return    ireturn    lreturn    freturn    dreturn    areturn

With ireturn, the top element of the current method's operand stack is popped and pushed onto the caller's operand stack (because the caller cares about the return value), and all other elements on the current method's operand stack are discarded.

If the returning method is synchronized, an implicit monitorexit is also performed to leave the critical section.

Finally, the entire frame of the current method is discarded, the caller’s frame is restored, and control is transferred to the caller.

For example:

public int methodReturn() {
    int i = 500;
    int j = 200;
    int k = 50;

    return (i + j) / k;
}
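The return instruction is chosen from the declared return type, so the same method body style yields different xreturn instructions. A sketch with hypothetical names follows; the bytecode in the comments is the typical javac output for an instance method and should be checked with javap -c.

```java
final class ReturnSketch {
    int methodReturn() {
        int i = 500;          // sipush 500, istore_1
        int j = 200;          // sipush 200, istore_2
        int k = 50;           // bipush 50,  istore_3
        return (i + j) / k;   // iload_1, iload_2, iadd, iload_3, idiv, ireturn
    }

    boolean isEven(int x) {
        return x % 2 == 0;    // boolean results are also returned with ireturn
    }

    long widen(int x) {
        return x;             // i2l, lreturn
    }

    String label() {
        return "done";        // ldc, areturn
    }

    void nothing() {
        // a bare return instruction is emitted even though the source has no return statement
    }
}
```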


2.7. Operand stack management instructions

The JVM provides operand stack management instructions that manipulate the operand stack directly, much like operating on a stack in an ordinary data structure.

These instructions include:

  • Pop one or two slots off the top of the stack and discard them: pop, pop2
  • Duplicate one or two slots from the top of the stack and push the copy (or the double copy) back onto the stack: dup, dup2, dup_x1, dup2_x1, dup_x2, dup2_x2
  • Swap the two slots at the top of the stack: swap. The Java virtual machine provides no instruction to swap two 64-bit values (long, double).
  • The nop instruction is special: its bytecode is 0x00 and, like nop in assembly language, it does nothing. It can be used for debugging, as a placeholder, and so on.

These instructions are generic: they do not need to know the data type of the values being pushed or popped.

  • The instructions without _x copy the data at the top of the stack and push the copy onto the top. There are two: dup and dup2. The number after dup is the number of slots to copy. dup copies 1 slot, for example one int or one reference. dup2 copies 2 slots, for example one long, two ints, or one int plus one float.
  • The instructions with _x copy the data at the top of the stack and insert the copy somewhere below the top. There are four: dup_x1, dup2_x1, dup_x2, dup2_x2. To find the insertion point, add the number after dup to the number after _x. dup_x1 inserts at 1+1=2, that is, below the top 2 slots. dup_x2 inserts at 1+2=3, below the top 3 slots. dup2_x1 inserts at 2+1=3, below the top 3 slots. dup2_x2 inserts at 2+2=4, below the top 4 slots.
  • pop: removes the 1 slot at the top of the stack, for example one value of type short
  • pop2: removes the 2 slots at the top of the stack, for example one double or two ints
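The "add the two numbers" rule for the _x instructions can be checked with a tiny model of the operand stack. This is a hypothetical sketch, not JVM code: each list element stands for one slot, the end of the list is the stack top, and the single dup(n, x) method covers all six duplication instructions.

```java
import java.util.ArrayList;
import java.util.List;

final class SlotStackSketch {
    // Hypothetical model: one list element per slot; the END of the list is the stack top.
    private final List<String> slots = new ArrayList<>();

    SlotStackSketch push(String slot) {
        slots.add(slot);
        return this;
    }

    // Copies the top n slots and inserts the copy below the top (n + x) slots.
    // dup = (1,0), dup_x1 = (1,1), dup_x2 = (1,2), dup2 = (2,0), dup2_x1 = (2,1), dup2_x2 = (2,2)
    SlotStackSketch dup(int n, int x) {
        int size = slots.size();
        if (size < n + x) {
            throw new IllegalStateException("operand stack underflow");
        }
        List<String> copy = new ArrayList<>(slots.subList(size - n, size));
        slots.addAll(size - n - x, copy);
        return this;
    }

    // pop = pop(1), pop2 = pop(2)
    SlotStackSketch pop(int n) {
        for (int i = 0; i < n; i++) {
            slots.remove(slots.size() - 1);
        }
        return this;
    }

    @Override
    public String toString() {
        return slots.toString();
    }
}
```

With slots a and b pushed in that order, dup(1, 1) (that is, dup_x1) gives [b, a, b]: the copy of b sits below the top two slots. The model only counts slots; it does not enforce the JVM's rule that the two slots of a long or double must never be split.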

Generic (untyped) stack operations

nop does nothing

pop pops one word off the top of the stack

pop2 pops two words off the top of the stack

dup duplicates the one word at the top of the stack

dup_x1 duplicates the one word at the top of the stack and inserts the copy below the top two words

dup_x2 duplicates the one word at the top of the stack and inserts the copy below the top three words

dup2 duplicates the two words at the top of the stack

dup2_x1 duplicates the two words at the top of the stack and inserts the copy below the top three words

dup2_x2 duplicates the two words at the top of the stack and inserts the copy below the top four words

swap swaps the two words at the top of the stack

2.8. Control transfer instructions

Conditional control is indispensable to program flow. To support conditional jumps, the virtual machine provides a large number of bytecode instructions, which can be broadly divided into:

  • 1) comparison instructions,
  • 2) conditional jump instructions,
  • 3) comparison conditional jump instructions,
  • 4) multi-conditional branch jump instructions,
  • 5) unconditional jump instructions, etc.

Comparison instructions

lcmp compares values of type long

fcmpl compares values of type float (pushes -1 when NaN is encountered)

fcmpg compares values of type float (pushes 1 when NaN is encountered)

dcmpl compares values of type double (pushes -1 when NaN is encountered)

dcmpg compares values of type double (pushes 1 when NaN is encountered)

Conditional jump instructions

ifeq jumps if the value equals 0

ifne jumps if the value is not equal to 0

iflt jumps if the value is less than 0

ifge jumps if the value is greater than or equal to 0

ifgt jumps if the value is greater than 0

ifle jumps if the value is less than or equal to 0

Comparison conditional branch instructions

if_icmpeq jumps if two int values are equal

if_icmpne jumps if two int values are not equal

if_icmplt jumps if one int value is less than the other

if_icmpge jumps if one int value is greater than or equal to the other

if_icmpgt jumps if one int value is greater than the other

if_icmple jumps if one int value is less than or equal to the other

ifnull jumps if the reference is null

ifnonnull jumps if the reference is not null

if_acmpeq jumps if two object references are equal

if_acmpne jumps if two object references are not equal

Multi-conditional branch jump instructions

tableswitch looks up a jump table by index and jumps

lookupswitch looks up a jump table by key match and jumps

Unconditional jump instructions

goto jumps unconditionally

goto_w jumps unconditionally (wide index)

2.8.1. Comparison instructions

A comparison instruction compares the two elements at the top of the stack and pushes the result onto the stack. The comparison instructions are dcmpg, dcmpl, fcmpg, fcmpl, and lcmp. As with the earlier instructions, the first character d stands for double, f for float, and l for long.

For double and float there are two versions of the comparison instruction each, because of NaN. For float, the instructions are fcmpg and fcmpl. They differ only in the result they produce when a NaN value is encountered during the comparison.

dcmpl and dcmpg are analogous, and their meaning can be inferred from their names, so they are not described further here.

Only numeric data can be compared by magnitude. Boolean and reference values cannot.

For example,

The instructions fcmpg and fcmpl pop two operands off the stack and compare them. Let the top element be v2 and the element just below it be v1: if v1 = v2, push 0; if v1 > v2, push 1; if v1 < v2, push -1.

The difference between the two instructions is that fcmpg pushes 1 when a NaN value is encountered, while fcmpl pushes -1.
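The rules for fcmpg and fcmpl are short enough to write out as plain Java. This is an illustrative model with hypothetical method names, not how the JVM implements the instructions; v1 is the element below the top of the stack and v2 is the top element.

```java
final class FloatCompareSketch {
    // Model of fcmpg: any NaN operand produces 1
    static int fcmpg(float v1, float v2) {
        if (Float.isNaN(v1) || Float.isNaN(v2)) {
            return 1;
        }
        return compareOrdered(v1, v2);
    }

    // Model of fcmpl: any NaN operand produces -1
    static int fcmpl(float v1, float v2) {
        if (Float.isNaN(v1) || Float.isNaN(v2)) {
            return -1;
        }
        return compareOrdered(v1, v2);
    }

    // Shared result for ordinary (non-NaN) operands: 1, 0, or -1
    private static int compareOrdered(float v1, float v2) {
        if (v1 > v2) {
            return 1;
        }
        return v1 == v2 ? 0 : -1;
    }
}
```

Having both variants lets a compiler pick the one whose NaN result makes the following branch fail, so that comparisons such as a < b and a > b are both false when either side is NaN.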

2.8.2. Conditional jump instruction

Conditional jump instructions are usually used in conjunction with comparison instructions. Before the conditional jump instruction is executed, the comparison instruction can be used to prepare the top element of the stack, and then the conditional jump can be performed.

The conditional jump instructions are: ifeq, iflt, ifle, ifne, ifgt, ifge, ifnull, ifnonnull. Each of them takes a two-byte operand used to compute the jump target (a 16-bit signed integer giving the offset from the current position).

Their common meaning is: pop the top element of the stack, test whether it satisfies a condition, and if so jump to the given location.

Condition      <       <=      ==      !=      >=      >       null      not null
Instruction    iflt    ifle    ifeq    ifne    ifge    ifgt    ifnull    ifnonnull

The same rules as before apply:

  • Conditional branch comparisons on boolean, byte, char, and short are performed with the int comparison instructions
  • Conditional branch comparisons on long, float, and double first execute the comparison instruction for that type, which pushes an int result onto the operand stack, and then an int conditional branch completes the jump

Because every kind of comparison eventually becomes an int comparison, the Java virtual machine provides the richest and most powerful set of conditional branch instructions for the int type.
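The two-step pattern for long and float comparisons can be seen in simple boolean methods. The names below are hypothetical, and the instruction pairs in the comments are what javac typically emits (the branch tests the opposite condition and jumps to the false case).

```java
final class BranchSketch {
    // int against zero: a single conditional jump is enough
    static boolean isPositive(int x) {
        return x > 0;          // iload, ifle <false case>
    }

    // long: compare first, then branch on the int result
    static boolean longGreater(long a, long b) {
        return a > b;          // lload, lload, lcmp, ifle <false case>
    }

    // float "<": fcmpg pushes 1 for NaN, so ifge jumps and the result is false
    static boolean floatLess(float a, float b) {
        return a < b;          // fload, fload, fcmpg, ifge <false case>
    }

    // float ">": fcmpl pushes -1 for NaN, so ifle jumps and the result is false
    static boolean floatGreater(float a, float b) {
        return a > b;          // fload, fload, fcmpl, ifle <false case>
    }
}
```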

2.8.3. Compare conditional jump instructions

The comparison conditional jump instruction is similar to the combination of comparison instruction and conditional jump instruction, which combines the two steps of comparison and jump into one.

These are if_icmpeq, if_icmpne, if_icmplt, if_icmpgt, if_icmple, if_icmpge, if_acmpeq, and if_acmpne. After the "if_" prefix, a mnemonic beginning with "i" operates on int values (including short and byte), and a mnemonic beginning with "a" compares object references.

Condition      <            <=           ==                      !=                      >=           >
Instruction    if_icmplt    if_icmple    if_icmpeq, if_acmpeq    if_icmpne, if_acmpne    if_icmpge    if_icmpgt

Each of these instructions takes a two-byte operand used to compute the jump target. When the instruction executes, the two elements to compare must already be at the top of the stack. When the instruction completes, both elements have been popped and nothing has been pushed. If the condition holds, the jump is taken; otherwise execution continues with the next instruction.
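Two short methods are enough to trigger both families, one on int values and one on references. The names are hypothetical and the bytecode in the comments is the typical javac output, not a guarantee.

```java
final class CompareBranchSketch {
    // Two ints are compared and branched on in one instruction
    static int max(int a, int b) {
        return a > b ? a : b;      // iload a, iload b, if_icmple <else branch>
    }

    // == on references compares identity, not contents
    static boolean sameObject(Object x, Object y) {
        return x == y;             // aload x, aload y, if_acmpne <false case>
    }
}
```

Since if_acmpeq and if_acmpne compare references only, two distinct objects with equal contents are still reported as different.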

2.8.4. Multi-conditional branch instruction

The multi-conditional branch instructions are designed for switch-case statements. There are two: tableswitch and lookupswitch.

Instruction     Description
tableswitch     Used for switch jumps when the case values are consecutive
lookupswitch    Used for switch jumps when the case values are not consecutive

Both implement the switch statement. The difference is:

  • tableswitch requires the case values to be consecutive. It stores only the lowest and highest values plus a list of jump offsets. Given the index operand, the matching jump offset can be located immediately, so it is relatively efficient.
  • lookupswitch stores discrete case-offset pairs. On each execution it searches the case-offset pairs for the matching case value and computes the jump address from the corresponding offset, so it is less efficient.

The following figure illustrates tableswitch. Because the case values of tableswitch are consecutive, only the lowest value, the highest value, and the jump offset of each entry need to be recorded. Given an index value, the offset can be located directly with a simple calculation.

The lookupswitch instruction handles discrete case values, but for efficiency the case-offset pairs are sorted by case value. Given an index, the case equal to the index must be searched for to obtain its offset. If no matching case is found, control goes to default. The lookupswitch instruction is illustrated below.
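The two lookup strategies can be sketched in Java next to switch statements that typically compile to each instruction. All names here are hypothetical, the offset arrays stand in for the jump offsets stored in the bytecode, and which instruction javac actually picks for a given switch should be confirmed with javap -c.

```java
import java.util.Arrays;

final class SwitchSketch {
    // Consecutive case values: typically compiled to tableswitch
    static String dense(int k) {
        switch (k) {
            case 1: return "one";
            case 2: return "two";
            case 3: return "three";
            default: return "other";
        }
    }

    // Widely spaced case values: typically compiled to lookupswitch
    static String sparse(int k) {
        switch (k) {
            case 10: return "ten";
            case 1000: return "thousand";
            case 100000: return "hundred thousand";
            default: return "other";
        }
    }

    // tableswitch-style lookup: one range check, then direct indexing
    static int tableLookup(int key, int low, int[] offsets, int defaultOffset) {
        int high = low + offsets.length - 1;
        if (key < low || key > high) {
            return defaultOffset;
        }
        return offsets[key - low];
    }

    // lookupswitch-style lookup: keys are sorted, so they can be searched (here by binary search)
    static int sortedLookup(int key, int[] sortedKeys, int[] offsets, int defaultOffset) {
        int pos = Arrays.binarySearch(sortedKeys, key);
        return pos >= 0 ? offsets[pos] : defaultOffset;
    }
}
```

The table lookup does constant work regardless of the number of cases, while the sorted lookup needs a search over the keys, which is the efficiency difference described above.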

2.8.5. Unconditional jump instruction

At present, the main unconditional jump instruction is goto. goto takes a two-byte operand that forms a signed integer specifying the jump offset. The instruction jumps to the given offset.

If the offset is too large for a two-byte signed integer, goto_w can be used instead. It has the same effect as goto but takes a 4-byte operand and can therefore cover a larger address range.

Although jsr, jsr_w, and ret are also unconditional jumps, they were mainly used for try-finally statements and have been gradually abandoned by virtual machines, so they are not covered here.

Instruction    Description
goto           Unconditional jump
goto_w         Unconditional jump (wide index)
jsr            Jumps to the given 16-bit offset and pushes the address of the instruction following jsr onto the stack
jsr_w          Jumps to the given 32-bit offset and pushes the address of the instruction following jsr_w onto the stack
ret            Returns to the instruction address held in the given local variable (normally used together with jsr and jsr_w)
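In everyday bytecode, goto appears most often at the end of a loop body, where it jumps back to the loop condition. The sketch below uses a hypothetical method; the comments describe the layout javac typically produces for a while loop.

```java
final class GotoSketch {
    static int sumTo(int n) {
        int sum = 0;
        int i = 1;
        while (i <= n) {   // loop head: iload i, iload n, if_icmpgt <exit>
            sum += i;
            i++;           // iinc
        }                  // goto <loop head> (a negative offset)
        return sum;        // exit: iload sum, ireturn
    }
}
```

The backward jump is why the goto operand is a signed offset.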

2.9. Exception handling instructions

Exception handling instructions

athrow throws an exception or error: it throws the exception object on top of the stack

jsr jumps to a subroutine

jsr_w jumps to a subroutine (wide index)

ret returns from a subroutine

The athrow instruction

Explicitly throwing an exception in a Java program (the throw statement) is implemented by the athrow instruction.

Besides exceptions thrown explicitly with throw statements, the JVM specification also specifies that many runtime exceptions are thrown automatically when other Java virtual machine instructions detect an exceptional condition. For example, in the integer arithmetic described earlier, the virtual machine throws an ArithmeticException in the idiv or ldiv instruction when the divisor is zero.

Normally, values are pushed onto and popped off the operand stack one instruction at a time. The only exception is when an exception is thrown: the Java virtual machine clears everything on the operand stack and then pushes the exception instance onto the operand stack of the method that handles it (the caller's, if the current method has no matching handler).

Handling exceptions

In the Java virtual machine, exception handling (the catch statement) is implemented not by bytecode instructions (early implementations used the jsr and ret instructions) but by exception tables.

Exception table

If a method defines try-catch or try-finally exception handling, an exception table is created for it. The table contains one entry for each exception handler or finally block, and each entry records:

  • The start position
  • The end position
  • The offset, recorded as a program counter value, of the handler code
  • The index in the constant pool of the exception class being caught

When an exception is thrown, the JVM looks for a matching handler in the current method. If it does not find one, the method is forced to terminate, the current stack frame is popped, and the exception is rethrown in the calling method (in the caller's stack frame). If no suitable handler has been found by the time all stack frames have been popped, the thread terminates. If the exception is thrown in the last non-daemon thread, for example the main thread, the JVM itself terminates.
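The two ways an exception can arise, an explicit athrow and an instruction that detects an exceptional condition, look like this in source form. The class and method names are hypothetical, and the bytecode in the comments is the typical javac output.

```java
final class ThrowSketch {
    // Explicit throw: the exception object is created, then athrow throws it
    static void explicit(String msg) {
        throw new IllegalStateException(msg);   // new, dup, aload, invokespecial <init>, athrow
    }

    // Implicit throw: idiv itself raises ArithmeticException when b == 0; there is no athrow here
    static int divide(int a, int b) {
        return a / b;
    }

    // Runs the action and reports which exception type, if any, reached the caller
    static String describe(Runnable action) {
        try {
            action.run();
            return "ok";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```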

Whenever an exception is thrown, if an exception table entry that matches all exception types applies, execution continues in that entry's handler code. If the method completes without throwing an exception, the finally block is still executed: before returning, control goes directly to the finally code to do its work.
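The handler search within one method can be sketched as a scan over exception table rows. This is a simplified, hypothetical model (the Entry class and findHandler are not JVM APIs): the range is treated as start-inclusive and end-exclusive, a null catch type stands for "any", and the first matching row wins.

```java
import java.util.List;

final class ExceptionTableSketch {
    // One simplified exception table row; catchType == null means "any" (as used for finally)
    static final class Entry {
        final int from;
        final int to;
        final int target;
        final Class<? extends Throwable> catchType;

        Entry(int from, int to, int target, Class<? extends Throwable> catchType) {
            this.from = from;
            this.to = to;
            this.target = target;
            this.catchType = catchType;
        }
    }

    // Returns the handler position for an exception thrown at pc, or -1 if this method has no match
    static int findHandler(List<Entry> table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.from && pc < e.to;
            boolean typeMatches = e.catchType == null || e.catchType.isInstance(thrown);
            if (inRange && typeMatches) {
                return e.target;
            }
        }
        return -1;
    }
}
```

A result of -1 corresponds to the case described above where the current frame is popped and the search continues in the caller.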

2.10. Synchronize control instructions

The Java virtual machine supports two synchronization structures: method-level synchronization and synchronization of a sequence of instructions within a method. Both are implemented with monitors.

2.10.1. Method level synchronization

Method-level synchronization is implicit: it is not controlled by bytecode instructions but is carried out as part of method invocation and return. The virtual machine can tell whether a method is declared synchronized from the ACC_SYNCHRONIZED access flag in the method table structure of the method.

When a method is invoked, the invocation instruction checks whether the ACC_SYNCHRONIZED access flag of the method is set.

  • If it is set, the executing thread first acquires the synchronization lock and then executes the method. The lock is released when the method completes, whether normally or abnormally.
  • While the method is executing, the executing thread holds the synchronization lock, and no other thread can acquire the same lock.
  • If a synchronized method throws an exception during execution and does not handle it inside the method, the lock held by the method is released automatically when the exception propagates out of the method.

For example:

private int i = 0;
public synchronized void add() {
	i++;
}

Corresponding bytecode:

0  aload_0
1  dup
2  getfield #2 <com/atguigu/java1/SynchronizedTest.i>
5  iconst_1 
6  iadd
7  putfield #2 <com/atguigu/java1/SynchronizedTest.i>
10 return

This bytecode is no different from that of an ordinary unsynchronized method: it does not use monitorenter and monitorexit to control a synchronized region.

This is because, for a synchronized method, once the virtual machine determines from the access flag that the method is synchronized, it acquires the lock automatically before invoking the method. When the synchronized method completes, the virtual machine releases the lock whether the method ended normally or by throwing an exception.

Thus, for synchronized methods, monitorenter and monitorexit are implicit and do not appear in the bytecode.
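The access flag can be observed from Java through reflection: java.lang.reflect.Modifier.isSynchronized reports whether the synchronized modifier is set on a method. A small sketch with a hypothetical class follows.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class SyncFlagSketch {
    private int i = 0;

    public synchronized void add() {
        i++;
    }

    public void addUnsynchronized() {
        i++;
    }

    // Looks up a no-argument method of this class and reports whether its synchronized flag is set
    static boolean isSynchronized(String methodName) {
        try {
            Method m = SyncFlagSketch.class.getDeclaredMethod(methodName);
            return Modifier.isSynchronized(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }
}
```

The same information appears in javap -v output as ACC_SYNCHRONIZED in the flags line of the method.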

2.10.2. Synchronization of instruction sequence within method

Synchronizing a sequence of instructions: this is usually expressed in Java source by a synchronized block. The JVM instruction set includes monitorenter and monitorexit to support the semantics of the synchronized keyword.

When a thread enters a synchronized block, it requests entry with the monitorenter instruction. If the monitor counter of the object is 0, the thread is allowed to enter. If it is not 0, the virtual machine checks whether the thread holding the monitor is the current thread: if so, the thread enters; otherwise it waits until the monitor counter of the object is 0 before it is allowed to enter the synchronized block.

When a thread leaves a synchronized block, it must declare its exit with monitorexit. In the Java virtual machine, every object has a monitor associated with it that is used to determine whether the object is locked. While the monitor is held, the object is locked.

Both monitorenter and monitorexit require an object reference to have been pushed onto the top of the operand stack before they execute. They lock and release the monitor of that object.

The following figure shows how the monitor protects the critical section from being entered by several threads at once. Only after thread 4 leaves the critical section can threads 1, 2, and 3 enter.

Corresponding bytecode:

 0: aload_0
 1: dup
 2: astore_1
 3: monitorenter
 4: aload_0
 5: dup
 6: getfield #2 // Field i:I
 9: iconst_1
10: isub
11: putfield #2 // Field i:I
14: aload_1
15: monitorexit
16: goto 24
19: astore_2
20: aload_1
21: monitorexit
22: aload_2
23: athrow
24: return

Exception table:
    from    to  target  type
       4    16      19   any
      19    22      19   any

The compiler must ensure that, however the method completes, every monitorenter executed in the method is matched by its corresponding monitorexit, whether the method ends normally or abnormally.

To guarantee that monitorenter and monitorexit are still paired correctly when the method completes abnormally, the compiler automatically generates an exception handler that is declared to handle all exceptions. Its purpose is to execute monitorexit.
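The pairing guarantee can be observed from Java with Thread.holdsLock. The sketch below uses hypothetical names; subtractOne is a source shape consistent with the bytecode listing in this section (an assumption, since the original source is not shown), and the other two methods check release on an exception and reentrant entry by the same thread.

```java
final class SyncBlockSketch {
    private int i = 0;

    void subtractOne() {
        synchronized (this) {   // aload_0, dup, astore_1, monitorenter
            i--;                // getfield, iconst_1, isub, putfield
        }                       // aload_1, monitorexit on both the normal path and the handler path
    }

    int value() {
        return i;
    }

    // True if the monitor is no longer held once the block has been left by an exception
    static boolean releasedAfterException(Object lock) {
        try {
            synchronized (lock) {
                throw new IllegalStateException("leave the block abnormally");
            }
        } catch (IllegalStateException e) {
            return !Thread.holdsLock(lock);
        }
    }

    // The thread that already holds the monitor may enter it again
    static boolean reentrant(Object lock) {
        synchronized (lock) {
            synchronized (lock) {
                return Thread.holdsLock(lock);
            }
        }
    }
}
```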