Part of this article is excerpted from Understanding the Java Virtual Machine.
Introduction to bytecode instructions
Java virtual machine instructions consist of an opcode followed by zero or more operands. The opcode is a one-byte number that identifies the specific operation, and the operands are the parameters that operation needs. Because the Java virtual machine uses an operand-stack-oriented architecture rather than a register-oriented one, most instructions have no operands at all, only an opcode; their inputs are taken from the operand stack.
Since opcodes are limited to one byte (0 to 255), the instruction set can contain at most 256 opcodes. In addition, the Class file format gives up operand alignment in compiled code, so when the virtual machine handles data longer than one byte, it has to reconstruct the value from individual bytes at run time. This costs some performance, but it avoids a great deal of padding and filler bytes and keeps the compiled code as short as possible.
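To make the stack-based model and the unaligned operands concrete, here is a toy interpreter for a handful of int instructions. It is a hypothetical sketch (the class `TinyInterpreter` and its `run` method are invented for illustration), not how a real JVM is implemented; only the opcode values are the ones the JVM specification assigns. Note that `iconst_<i>` and `iadd` carry no operand bytes, while `sipush` has to reassemble its 16-bit operand from two separate bytes at run time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter: supports only a few int instructions, for illustration.
class TinyInterpreter {
    // Real JVM opcode values for the instructions handled here.
    static final int ICONST_M1 = 0x02, ICONST_5 = 0x08;
    static final int BIPUSH = 0x10, SIPUSH = 0x11;
    static final int IADD = 0x60, IMUL = 0x68, IRETURN = 0xAC;

    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();   // the operand stack
        int pc = 0;                                  // the "PC register"
        while (true) {
            int opcode = code[pc++] & 0xFF;          // opcodes are unsigned bytes (0-255)
            if (opcode >= ICONST_M1 && opcode <= ICONST_5) {
                stack.push(opcode - 0x03);           // iconst_<i>: the value is implied by the opcode
            } else if (opcode == BIPUSH) {
                stack.push((int) code[pc++]);        // one signed operand byte, sign-extended to int
            } else if (opcode == SIPUSH) {
                // Two unaligned operand bytes reassembled into a signed 16-bit value.
                int value = (short) (((code[pc] & 0xFF) << 8) | (code[pc + 1] & 0xFF));
                pc += 2;
                stack.push(value);
            } else if (opcode == IADD) {
                int b = stack.pop(), a = stack.pop();  // operands come from the stack, not the instruction
                stack.push(a + b);
            } else if (opcode == IMUL) {
                int b = stack.pop(), a = stack.pop();
                stack.push(a * b);
            } else if (opcode == IRETURN) {
                return stack.pop();
            } else {
                throw new IllegalArgumentException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        // iconst_2, sipush 300, iadd, bipush 10, imul, ireturn  =>  (2 + 300) * 10
        byte[] code = {0x05, 0x11, 0x01, 0x2C, 0x60, 0x10, 0x0A, 0x68, (byte) 0xAC};
        System.out.println(run(code)); // 3020
    }
}
```

The sequence in `main` evaluates (2 + 300) * 10 and returns 3020: four of the six instructions are a single opcode byte, and only `bipush` and `sipush` carry operands.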
Bytecode and data types
In the Java virtual machine instruction set, most instructions encode the data type they operate on, and each type is denoted by a special character in the mnemonic: i for int, l for long, s for short, b for byte, c for char, f for float, d for double, and a for reference. For example, iload loads an int from a local variable onto the operand stack, while fload loads a float. However, opcodes are only one byte long, so if every type-related instruction supported every runtime data type of the Java virtual machine, the number of instructions would likely exceed what a single byte can represent.
As a result, the Java virtual machine provides type-specific instructions only for a limited set of operations; not every operation is available for every data type. The following table shows which data types each operation supports. Replacing T in an opcode template with the type character gives the concrete mnemonic (for example, Tload becomes iload for int). An empty cell means the operation is not supported for that data type.
opcode | byte | short | int | long | float | double | char | reference |
---|---|---|---|---|---|---|---|---|
Tipush | bipush | sipush | | | | | | |
Tconst | | | iconst | lconst | fconst | dconst | | aconst |
Tload | | | iload | lload | fload | dload | | aload |
Tstore | | | istore | lstore | fstore | dstore | | astore |
Tinc | | | iinc | | | | | |
Taload | baload | saload | iaload | laload | faload | daload | caload | aaload |
Tastore | bastore | sastore | iastore | lastore | fastore | dastore | castore | aastore |
Tadd | | | iadd | ladd | fadd | dadd | | |
Tsub | | | isub | lsub | fsub | dsub | | |
Tmul | | | imul | lmul | fmul | dmul | | |
Tdiv | | | idiv | ldiv | fdiv | ddiv | | |
Trem | | | irem | lrem | frem | drem | | |
Tneg | | | ineg | lneg | fneg | dneg | | |
Tshl | | | ishl | lshl | | | | |
Tshr | | | ishr | lshr | | | | |
Tushr | | | iushr | lushr | | | | |
Tand | | | iand | land | | | | |
Tor | | | ior | lor | | | | |
Txor | | | ixor | lxor | | | | |
i2T | i2b | i2s | | i2l | i2f | i2d | | |
l2T | | | l2i | | l2f | l2d | | |
f2T | | | f2i | f2l | | f2d | | |
d2T | | | d2i | d2l | d2f | | | |
Tcmp | | | | lcmp | | | | |
Tcmpl | | | | | fcmpl | dcmpl | | |
Tcmpg | | | | | fcmpg | dcmpg | | |
if_TcmpOP | | | if_icmpOP | | | | | if_acmpOP |
Treturn | | | ireturn | lreturn | freturn | dreturn | | areturn |
As the table shows, most instructions do not support the byte, char, and short types, and none support boolean. The compiler sign-extends byte and short data to int at compile time or run time, and zero-extends boolean and char data to int; the resulting values are then processed with the corresponding int instructions. Therefore, most operations on boolean, byte, short, and char data are actually performed with int as the computational type.
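The promotion to int is visible from Java source: adding two bytes produces an int, and storing the result back into a byte needs an explicit cast. The sketch below uses a hypothetical class `Promotion`; the instructions in the comments are what javac typically emits (check with `javap -c`), and the last two methods contrast sign extension of byte with zero extension of char.

```java
// Hypothetical demo class showing int promotion of byte and char values.
class Promotion {
    // byte + byte is computed with iadd on int values, so the result type is int.
    static int addBytes(byte a, byte b) {
        return a + b;                  // iload_0, iload_1, iadd
    }

    // Narrowing the int result back to byte needs an explicit cast (i2b).
    static byte addBytesNarrowed(byte a, byte b) {
        return (byte) (a + b);         // iadd, i2b
    }

    // byte is sign-extended when widened to int: 0xFF becomes -1.
    static int widenByte(byte b) {
        return b;
    }

    // char is zero-extended when widened to int: 0xFFFF becomes 65535.
    static int widenChar(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(addBytes((byte) 100, (byte) 100));          // 200
        System.out.println(addBytesNarrowed((byte) 100, (byte) 100));  // -56
        System.out.println(widenByte((byte) 0xFF));                    // -1
        System.out.println(widenChar((char) 0xFFFF));                  // 65535
    }
}
```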
Load and store instructions
Load and store instructions transfer data back and forth between the local variable table of a stack frame and the operand stack. They include:
- Load a local variable onto the operand stack: iload, iload_<n>, lload, lload_<n>, fload, fload_<n>, dload, dload_<n>, aload, aload_<n>
- Store a value from the operand stack into the local variable table: istore, istore_<n>, lstore, lstore_<n>, fstore, fstore_<n>, dstore, dstore_<n>, astore, astore_<n>
- Load a constant onto the operand stack: bipush, sipush, ldc, ldc_w, ldc2_w, aconst_null, iconst_m1, iconst_<i>, lconst_<l>, fconst_<f>, dconst_<d>
- Extend the index used to access the local variable table: wide
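Which constant-loading instruction the compiler picks depends on the value. As a sketch, the hypothetical class `ConstantLoading` below returns one literal per method; the comments name the instruction javac typically emits for each (verify with `javap -c ConstantLoading`, since exact choices are a compiler decision, not part of the language).

```java
// Hypothetical demo class: one literal per method, to show javac's constant-loading choices.
class ConstantLoading {
    static int minusOne()   { return -1; }       // iconst_m1
    static int five()       { return 5; }        // iconst_5 (iconst_<i> covers 0..5)
    static int byteRange()  { return 100; }      // bipush 100 (fits in a signed byte)
    static int shortRange() { return 1000; }     // sipush 1000 (fits in a signed short)
    static int large()      { return 100_000; }  // ldc: value is read from the constant pool
    static long oneL()      { return 1L; }       // lconst_1
    static double big()     { return 2.5; }      // ldc2_w: long/double constant from the pool
    static Object nothing() { return null; }     // aconst_null

    public static void main(String[] args) {
        System.out.println(minusOne() + five() + byteRange() + shortRange() + large()); // 101104
    }
}
```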
Some of the mnemonics listed above end in angle brackets, such as iload_<n>. These stand for a family of instructions: iload_<n> represents iload_0, iload_1, iload_2, and iload_3. iload_0 is semantically equivalent to iload with operand 0, iload_1 to iload with operand 1, and so on. These short forms encode the operand in the opcode itself, so no operand has to be fetched; otherwise they are semantically identical to the general-purpose instruction.
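The short forms only exist for slots 0 to 3, so anything beyond that uses the general form with an explicit operand. The hypothetical class `LocalSlots` below is a sketch of how parameters map to slots; the instructions in the comments are what javac typically emits. It also relies on two facts from the JVM specification: slot 0 of an instance method holds `this`, and long and double values take two slots.

```java
// Hypothetical demo class; comments show the local-variable instructions javac typically emits.
class LocalSlots {
    // Static method: parameters a..e occupy local slots 0..4.
    static int firstAndFifth(int a, int b, int c, int d, int e) {
        return a + e;   // iload_0 (short form), iload 4 (general form with an operand), iadd
    }

    // Instance method: slot 0 holds 'this', so x lives in slot 1.
    int firstArg(int x) {
        return x;       // iload_1
    }

    // long values occupy two slots each, so q starts at slot 2.
    static long sumLongs(long p, long q) {
        return p + q;   // lload_0, lload_2, ladd
    }

    public static void main(String[] args) {
        System.out.println(firstAndFifth(1, 2, 3, 4, 5));  // 6
        System.out.println(new LocalSlots().firstArg(9));  // 9
        System.out.println(sumLongs(3L, 4L));              // 7
    }
}
```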
Arithmetic instructions
Arithmetic instructions generally pop two values from the operand stack (negation takes only one), perform an operation on them, and push the result back onto the top of the operand stack. The arithmetic instructions are:
- Addition instructions: iadd, ladd, fadd, dadd
- Subtraction instructions: isub, lsub, fsub, dsub
- Multiplication instructions: imul, lmul, fmul, dmul
- Division instructions: idiv, ldiv, fdiv, ddiv
- Remainder instructions: irem, lrem, frem, drem
- Negation instructions: ineg, lneg, fneg, dneg
- Shift instructions: ishl, ishr, iushr, lshl, lshr, lushr
- Bitwise OR instructions: ior, lor
- Bitwise AND instructions: iand, land
- Bitwise XOR instructions: ixor, lxor
- Local variable increment instruction: iinc
- Comparison instructions: dcmpg, dcmpl, fcmpg, fcmpl, lcmp
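A few behaviours of these instructions are easy to observe from Java. In the sketch below (hypothetical class `Arithmetic`), the comments name the instruction javac typically emits for each operator. Integer overflow in iadd wraps silently, integer division by zero throws ArithmeticException (idiv, ldiv, irem, and lrem are the only arithmetic instructions that throw), floating-point division produces Infinity or NaN instead, and ishr and iushr differ only in what they shift in from the left.

```java
// Hypothetical demo class; comments name the instruction javac typically emits for each operator.
class Arithmetic {
    static int add(int a, int b) { return a + b; }                   // iadd: overflow wraps silently

    static String divideInt(int a, int b) {
        try {
            return Integer.toString(a / b);                            // idiv: throws when b == 0
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    static double divideDouble(double a, double b) { return a / b; }  // ddiv: Infinity or NaN, never throws
    static int remainder(int a, int b) { return a % b; }             // irem: sign follows the dividend
    static int shiftRight(int a, int n) { return a >> n; }           // ishr: copies the sign bit in
    static int unsignedShiftRight(int a, int n) { return a >>> n; }  // iushr: shifts zeros in

    static int countTo(int n) {
        int i = 0;
        while (i < n) {
            i++;                                                      // iinc: updates the local in place
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(add(Integer.MAX_VALUE, 1));   // -2147483648
        System.out.println(divideInt(1, 0));             // ArithmeticException
        System.out.println(divideDouble(1.0, 0.0));      // Infinity
        System.out.println(shiftRight(-8, 1));           // -4
        System.out.println(unsignedShiftRight(-8, 28));  // 15
    }
}
```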
Type conversion instructions
Conversion instructions convert between two different numeric types. They are typically used to implement explicit casts in user code, or to work around the problem mentioned at the beginning of this article: the instruction set does not provide type-specific instructions for every data type.
The Java virtual machine directly supports widening numeric conversions (from a smaller type to a larger one), which need no explicit cast in source code: int to long, float, or double; long to float or double; and float to double. (The compiler still emits i2l, i2f, i2d, l2f, l2d, or f2d where the value's type actually changes.) Narrowing conversions, by contrast, must be written explicitly and use dedicated conversion instructions: i2b, i2c, i2s, l2i, f2i, f2l, d2i, d2l, and d2f. A narrowing conversion may change the sign and order of magnitude of a value and may lose precision.
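The results of narrowing can be surprising, so here is a small sketch using a hypothetical class `Conversions`; the comments name the conversion instruction javac typically emits. Integer narrowing simply keeps the low-order bits, d2i rounds toward zero and saturates (NaN becomes 0), and even the widening i2f can lose precision because a float has only 24 significand bits.

```java
// Hypothetical demo class; comments name the conversion instruction javac typically emits.
class Conversions {
    static byte toByte(int i)        { return (byte) i; }   // i2b: keeps the low 8 bits, sign-extended
    static char toChar(int i)        { return (char) i; }   // i2c: keeps the low 16 bits, zero-extended
    static short toShort(int i)      { return (short) i; }  // i2s: keeps the low 16 bits, sign-extended
    static int doubleToInt(double d) { return (int) d; }    // d2i: toward zero, saturating, NaN -> 0
    static long intToLong(int i)     { return i; }          // i2l: widening, no cast needed in source
    static float intToFloat(int i)   { return i; }          // i2f: widening, yet may round

    public static void main(String[] args) {
        System.out.println(toByte(300));                     // 44
        System.out.println((int) toChar(-1));                // 65535
        System.out.println(toShort(70_000));                 // 4464
        System.out.println(doubleToInt(-3.99));              // -3
        System.out.println(doubleToInt(1e20));               // 2147483647
        System.out.println(doubleToInt(Double.NaN));         // 0
        System.out.println((int) intToFloat(16_777_217));    // 16777216
    }
}
```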
Object creation and access instructions
Although class instances and arrays are both objects, the Java virtual machine uses different bytecode instructions to create and manipulate them. Once an object has been created, object access instructions can read or write the fields of a class instance or the elements of an array:
- Create a class instance: new
- Create an array: newarray, anewarray, multianewarray
- Access class fields (static fields, or class variables) and instance fields (non-static fields, or instance variables): getfield, putfield, getstatic, putstatic
- Load an array element onto the operand stack: baload, caload, saload, iaload, laload, faload, daload, aaload
- Store a value from the operand stack into an array element: bastore, castore, sastore, iastore, lastore, fastore, dastore, aastore
- Get the length of an array: arraylength
- Check the type of a class instance: instanceof, checkcast
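As a rough map from source code to these instructions, the sketch below (hypothetical class `ObjectAccess`) touches each group once. The comments list the instructions javac typically emits for each line; the exact sequence, including the surrounding loads of locals, is best checked with `javap -c`.

```java
// Hypothetical demo class touching each object creation/access instruction group once.
class ObjectAccess {
    static int counter;   // class variable: getstatic / putstatic
    int value;            // instance variable: getfield / putfield

    static int demo() {
        ObjectAccess o = new ObjectAccess();  // new, dup, invokespecial <init>
        o.value = 7;                          // putfield
        counter = o.value;                    // getfield, putstatic
        int[] ints = new int[3];              // newarray int
        String[] strs = new String[2];        // anewarray java/lang/String
        int[][] grid = new int[2][3];         // multianewarray
        ints[0] = counter;                    // getstatic, iastore
        grid[1][2] = ints[0] + ints.length;   // aaload, iaload, arraylength, iadd, iastore
        Object obj = "hi";
        strs[0] = (String) obj;               // checkcast, aastore
        return grid[1][2] + (obj instanceof String ? 1 : 0);  // iaload, instanceof
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 11
    }
}
```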
Operand stack management instructions
Like the stack in an ordinary data structure, the operand stack can be manipulated directly with a set of Java virtual machine instructions, including:
- Pop one or two elements off the top of the operand stack: pop, pop2
- Duplicate one or two values at the top of the stack and push the duplicate values back onto the stack: dup, dup2, dup_x1, dup2_x1, dup_x2, dup2_x2
- Swap the top two values of the stack: swap
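These instructions rarely appear by name in Java source, but the compiler uses them constantly. The sketch below (hypothetical class `StackOps`) shows three common cases, with the instructions javac typically emits in the comments: a method result that is ignored is dropped with pop (or pop2 for a long, a category 2 value), a chained assignment duplicates the value with dup, and object creation uses dup so that one reference is consumed by the constructor call while the other remains on the stack.

```java
// Hypothetical demo class; comments show where javac typically emits stack-management instructions.
class StackOps {
    static int discardResults(StringBuilder sb) {
        sb.append("x");      // invokevirtual append, then pop (the returned StringBuilder is unused)
        System.nanoTime();   // invokestatic nanoTime, then pop2 (the unused long result)
        return sb.length();
    }

    static int chained(int n) {
        int a;
        int b;
        a = b = n;           // iload_0, dup, istore_2, istore_1
        return a + b;
    }

    static String created() {
        return new StringBuilder("ab").toString();  // new, dup, ldc "ab", invokespecial <init>, invokevirtual
    }

    public static void main(String[] args) {
        System.out.println(discardResults(new StringBuilder("ab")));  // 3
        System.out.println(chained(4));                               // 8
        System.out.println(created());                                // ab
    }
}
```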
Control transfer instructions
Control transfer instructions let the Java virtual machine continue execution, conditionally or unconditionally, from an instruction at a specified location instead of from the next instruction. Conceptually, a control transfer instruction conditionally or unconditionally modifies the value of the PC register:
- Conditional branches: ifeq, iflt, ifle, ifne, ifgt, ifge, ifnull, ifnonnull, if_icmpeq, if_icmpne, if_icmplt, if_icmpgt, if_icmple, if_icmpge, if_acmpeq, and if_acmpne
- Compound conditional branches: tableswitch and lookupswitch
- Unconditional branches: goto, goto_w, jsr, jsr_w, ret
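The two compound branches differ in layout: tableswitch is an indexed jump table over a contiguous range of keys, while lookupswitch stores sorted key/offset pairs. javac chooses between them with a size/speed cost estimate, so for the clear-cut cases in the hypothetical class `Branches` below it typically picks tableswitch for the dense labels and lookupswitch for the sparse ones. The loop shows the usual shape of a counted for loop: a conditional branch at the top, iinc, and a backward goto.

```java
// Hypothetical demo class; comments describe the branch instructions javac typically emits.
class Branches {
    // Dense labels 1..4: typically compiled to tableswitch.
    static String dense(int k) {
        switch (k) {
            case 1: return "one";
            case 2: return "two";
            case 3: return "three";
            case 4: return "four";
            default: return "other";
        }
    }

    // Sparse labels: typically compiled to lookupswitch.
    static String sparse(int k) {
        switch (k) {
            case 1: return "small";
            case 1_000: return "medium";
            case 1_000_000: return "large";
            default: return "other";
        }
    }

    // if_icmpge leaves the loop, iinc increments i, goto jumps back to the test.
    static int sumBelow(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(dense(3) + " " + sparse(1_000) + " " + sumBelow(5)); // three medium 10
    }
}
```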
Method invocation and return instructions
Method invocation instructions are independent of data type, whereas method return instructions are distinguished by the type of the return value: ireturn (used for boolean, byte, char, short, and int), lreturn, freturn, dreturn, and areturn, plus a plain return used by void methods, instance initialization methods, and class or interface initialization methods. The invocation instructions are:
- invokevirtual: invokes an instance method of an object, dispatching on the object's actual (runtime) type
- invokeinterface: invokes an interface method; at run time it searches the receiving object for an implementation of that interface method and invokes the appropriate one
- invokespecial: invokes instance methods that need special handling, including instance initialization methods, private methods, and superclass methods
- invokestatic: invokes a static (class) method
- invokedynamic: resolves, at run time, the method referenced by the call site specifier and then executes it. Unlike the other four instructions, whose dispatch logic is fixed inside the virtual machine, its dispatch logic is determined by a user-specified bootstrap method
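Each of the five invocation instructions can be reached from ordinary Java code. In the sketch below (hypothetical classes `Invocations`, `Base`, and `Derived`), the comments name the invocation instruction javac typically emits for each call; `javap -c` will confirm this for a given compiler version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical demo classes; comments name the invocation instruction javac typically emits.
class Invocations {
    static class Base {
        String name() { return "base"; }
    }

    static class Derived extends Base {
        @Override
        String name() { return "derived+" + super.name(); }  // super.name(): invokespecial
    }

    static int demo() {
        Base b = new Derived();                 // new, dup, invokespecial Derived.<init>
        String n = b.name();                    // invokevirtual, dispatched on the runtime type Derived
        List<String> list = new ArrayList<>();
        list.add(n);                            // invokeinterface (static type is the List interface)
        IntSupplier size = list::size;          // invokedynamic (method reference via a bootstrap method)
        return Math.max(size.getAsInt(), n.length());  // invokeinterface, invokevirtual, invokestatic
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 12
    }
}
```

Because `b` is declared as `Base` but refers to a `Derived`, the invokevirtual call returns "derived+base", whose length (12) is larger than the list size (1).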
Exception handling instructions
Besides explicit throw statements in Java code, which are compiled to the athrow instruction, the Java Virtual Machine specification requires many runtime exceptions to be thrown automatically when other instructions detect an abnormal condition; for example, idiv and ldiv throw ArithmeticException when the divisor is zero. Catching exceptions, on the other hand, is not implemented with bytecode instructions: catch clauses are compiled into entries in the method's exception table.
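The split between throwing and catching shows up clearly in `javap -c` output, which prints an "Exception table" section for methods with catch clauses. The sketch below uses a hypothetical class `ExceptionTableDemo`; the comments describe what javac typically generates, namely an exception-table entry (start, end, handler target, exception type) for the try range and athrow for the explicit throw.

```java
// Hypothetical demo class; compile and run `javap -c ExceptionTableDemo` to see its exception table.
class ExceptionTableDemo {
    static int parseOrDefault(String s, int fallback) {
        try {
            return Integer.parseInt(s);      // this bytecode range is covered by an exception-table entry
        } catch (NumberFormatException e) {  // entry type: java/lang/NumberFormatException
            return fallback;                 // entry target: first instruction of this handler
        }
    }

    static void fail(String message) {
        throw new IllegalStateException(message);  // new, dup, aload_0, invokespecial <init>, athrow
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", -1));    // 42
        System.out.println(parseOrDefault("oops", -1));  // -1
    }
}
```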
Synchronization instructions
The Java virtual machine supports both method-level synchronization and synchronization of a sequence of instructions within a method; both are implemented with a monitor (more commonly called a lock).
Method-level synchronization is implicit: it needs no bytecode instructions and is implemented as part of method invocation and return. The virtual machine can tell whether a method is declared synchronized from the ACC_SYNCHRONIZED access flag in that method's entry in the class file's method table. When a method is invoked, the invocation instruction checks whether ACC_SYNCHRONIZED is set; if it is, the executing thread must first acquire the monitor, then executes the method, and finally releases the monitor when the method completes, whether normally or abruptly. While the method runs, the executing thread holds the monitor, and no other thread can acquire the same monitor. If a synchronized method throws an exception that it does not handle itself, the monitor it holds is released automatically when the exception propagates out of the method.
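The flag-based approach can be observed with reflection, which reports ACC_SYNCHRONIZED (0x0020) as `Modifier.SYNCHRONIZED`. The sketch below uses a hypothetical class `SyncMethodDemo`: its `increment` method carries the flag and contains no monitorenter/monitorexit of its own, yet several threads calling it still produce an exact count.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo class for method-level synchronization via ACC_SYNCHRONIZED.
class SyncMethodDemo {
    private int count;

    // javac sets ACC_SYNCHRONIZED on this method; its body has no monitorenter/monitorexit.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Starts several threads that each call increment() perThread times, then waits for them.
    static int countWithThreads(int threads, int perThread) {
        SyncMethodDemo demo = new SyncMethodDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    demo.increment();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return demo.get();
    }

    // Reflection exposes the ACC_SYNCHRONIZED access flag as Modifier.SYNCHRONIZED.
    static boolean isSynchronized(String methodName) {
        try {
            return Modifier.isSynchronized(
                    SyncMethodDemo.class.getDeclaredMethod(methodName).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000));  // 40000
        System.out.println(isSynchronized("increment"));  // true
    }
}
```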
Synchronizing a sequence of instructions is usually expressed with a synchronized block in the Java language. The Java virtual machine instruction set provides the monitorenter and monitorexit instructions to support the semantics of synchronized: the instruction sequence to be synchronized is wrapped between the two instructions. The compiler must ensure that every monitorenter executed in a method is matched by a monitorexit, whether the method completes normally or by throwing an exception.
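The matching requirement is why javac wraps a synchronized block in a compiler-generated handler: on the exceptional path, that handler executes monitorexit and rethrows. The sketch below (hypothetical class `SyncBlockDemo`) shows the block form and uses `Thread.holdsLock` to check that the lock is no longer held after an exception escapes the block; the instructions in the comments are what javac typically emits.

```java
// Hypothetical demo class for synchronized blocks compiled to monitorenter/monitorexit.
class SyncBlockDemo {
    private final Object lock = new Object();
    private int count;

    void increment() {
        synchronized (lock) {   // monitorenter on 'lock'
            count++;
        }                       // monitorexit on the normal path
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    // The compiler-generated handler runs monitorexit and rethrows, so the lock is released.
    boolean lockReleasedAfterException() {
        try {
            synchronized (lock) {
                throw new IllegalStateException("boom");
            }
        } catch (IllegalStateException e) {
            // ignored for the demo
        }
        return !Thread.holdsLock(lock);
    }

    public static void main(String[] args) {
        SyncBlockDemo demo = new SyncBlockDemo();
        for (int i = 0; i < 1000; i++) {
            demo.increment();
        }
        System.out.println(demo.get());                         // 1000
        System.out.println(demo.lockReleasedAfterException());  // true
    }
}
```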