Learning Java Virtual Machines

  • Learn Java Virtual Machine part 1: Memory area and garbage collection
  • Java Virtual Machine (2) : Class file structure

Preface

Understanding the JVM is a basic requirement for Java programmers, yet how many of us, myself included, are so busy fixing bugs and piling up layouts that we neglect the fundamentals and end up with only a fragmentary understanding of the JVM? Studying the JVM systematically may take us further down the road.

Bytecode sample

HelloWorld bytecode from the previous episode:

HP-ProDesk-680-G6-PCI-Microtower-PC:~/DEBUG$ javap -verbose HelloWorld.class
Classfile /home/mi/DEBUG/HelloWorld.class
  Last modified May 12, 2021; size 641 bytes
  MD5 checksum 1910a4531e5743c190636067d43d4bc4
  Compiled from "HelloWorld.java"
public class com.wang.javavmdemo.HelloWorld
  minor version: 0
  major version: 52
  flags: (0x0021) ACC_PUBLIC, ACC_SUPER
  this_class: #3                          // com/wang/javavmdemo/HelloWorld
  super_class: #6                         // java/lang/Object
  interfaces: 0, fields: 1, methods: 2, attributes: 1
Constant pool:
   #1 = Methodref          #6.#23         // java/lang/Object."<init>":()V
   #2 = Fieldref           #24.#25        // java/lang/System.out:Ljava/io/PrintStream;
   #3 = Class              #26            // com/wang/javavmdemo/HelloWorld
   #4 = String             #27            // Hello World!
   #5 = Methodref          #28.#29        // java/io/PrintStream.println:(Ljava/lang/String;)V
   #6 = Class              #30            // java/lang/Object
   #7 = Utf8               HELLO_WORLD
   #8 = Utf8               Ljava/lang/String;
   #9 = Utf8               ConstantValue
  #10 = Utf8               <init>
  #11 = Utf8               ()V
  #12 = Utf8               Code
  #13 = Utf8               LineNumberTable
  #14 = Utf8               LocalVariableTable
  #15 = Utf8               this
  #16 = Utf8               Lcom/wang/javavmdemo/HelloWorld;
  #17 = Utf8               main
  #18 = Utf8               ([Ljava/lang/String;)V
  #19 = Utf8               args
  #20 = Utf8               [Ljava/lang/String;
  #21 = Utf8               SourceFile
  #22 = Utf8               HelloWorld.java
  #23 = NameAndType        #10:#11        // "<init>":()V
  #24 = Class              #31            // java/lang/System
  #25 = NameAndType        #32:#33        // out:Ljava/io/PrintStream;
  #26 = Utf8               com/wang/javavmdemo/HelloWorld
  #27 = Utf8               Hello World!
  #28 = Class              #34            // java/io/PrintStream
  #29 = NameAndType        #35:#36        // println:(Ljava/lang/String;)V
  #30 = Utf8               java/lang/Object
  #31 = Utf8               java/lang/System
  #32 = Utf8               out
  #33 = Utf8               Ljava/io/PrintStream;
  #34 = Utf8               java/io/PrintStream
  #35 = Utf8               println
  #36 = Utf8               (Ljava/lang/String;)V
{
  public com.wang.javavmdemo.HelloWorld();
    descriptor: ()V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 3: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Lcom/wang/javavmdemo/HelloWorld;

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #4                  // String Hello World!
         5: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 7: 0
        line 8: 8
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       9     0  args   [Ljava/lang/String;
}
SourceFile: "HelloWorld.java"

Introduction to bytecode instructions

A Java virtual machine instruction consists of a one-byte number that identifies a particular operation (the opcode), followed by zero or more parameters that the operation needs (the operands).

Leaving exception handling aside, the basic execution model of a Java virtual machine interpreter can be described by the following pseudocode, which is simple yet works correctly and efficiently:

do {
    atomically advance the PC register by 1;
    fetch the opcode from the bytecode stream at the position indicated by the PC register;
    if (the opcode has operands) fetch the operands from the bytecode stream;
    perform the operation defined by the opcode;
} while (bytecode stream length > 0);
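The loop above can be sketched in Java as a toy interpreter. This is only an illustration under stated assumptions: ToyInterpreter and run are hypothetical names, only four instructions are modelled (using their real opcode values), and there are no stack frames, constant pool, or exception handling.

```java
// A toy interpreter for a four-instruction subset of JVM bytecode.
// ToyInterpreter and run are hypothetical names used for illustration only.
class ToyInterpreter {
    // Real JVM opcode values for the four instructions modelled here.
    static final int BIPUSH = 0x10, IADD = 0x60, IMUL = 0x68, IRETURN = 0xac;

    static int run(byte[] code) {
        int[] stack = new int[16]; // operand stack
        int sp = 0;                // index of the next free stack slot
        int pc = 0;                // program counter
        while (pc < code.length) {
            int opcode = code[pc++] & 0xff;   // fetch the opcode and advance the PC
            switch (opcode) {
                case BIPUSH:
                    stack[sp++] = code[pc++]; // one operand byte, sign-extended to int
                    break;
                case IADD: {
                    int b = stack[--sp], a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case IMUL: {
                    int b = stack[--sp], a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case IRETURN:
                    return stack[--sp];       // hand the top of the stack to the caller
                default:
                    throw new IllegalStateException(
                            "unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        throw new IllegalStateException("fell off the end of the code");
    }

    public static void main(String[] args) {
        // bipush 2; bipush 3; iadd; bipush 4; imul; ireturn  computes (2 + 3) * 4
        byte[] code = {0x10, 2, 0x10, 3, 0x60, 0x10, 4, 0x68, (byte) 0xac};
        System.out.println(run(code)); // prints 20
    }
}
```

Note how only bipush consumes an operand byte from the stream; the other three instructions take their inputs from the operand stack.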

Bytecode and data types

In the Java virtual machine instruction set, most instructions encode the data type they operate on. For example, the iload instruction loads int data from the local variable table onto the operand stack, while the fload instruction loads float data. The two instructions may well be implemented by the same piece of code inside the virtual machine, but they must have separate opcodes in the class file. For most type-related bytecode instructions, a special character in the opcode mnemonic indicates which data type the instruction serves: i for int, l for long, s for short, b for byte, c for char, f for float, d for double, and a for reference. Some instructions have no type letter in their mnemonic but are still tied to a type, such as arraylength, whose operand can only ever be an object of array type. Others, such as the unconditional jump instruction goto, are independent of data type.

The Java virtual machine instruction set provides only a limited number of type-specific instructions for each operation. Because an opcode is a single byte, there is no room for a dedicated instruction for every combination of operation and data type, so the instruction set is deliberately designed not to be fully orthogonal. (The Java Virtual Machine Specification calls this property "not orthogonal", meaning that not every data type has a corresponding instruction for every operation.) Where necessary, separate instructions convert unsupported types to supported ones.
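A small Java sketch of what this means in practice: there is no byte-specific add instruction, so byte operands are widened to int, added as ints, and narrowed back by an explicit cast. ByteArithmetic and addBytes are hypothetical names, and the bytecode named in the comments is what javac typically emits, not a guarantee.

```java
// ByteArithmetic is a hypothetical class used for illustration only.
class ByteArithmetic {
    // The operands are treated as int, added with iadd, and the cast
    // narrows the int result back to byte (typically compiled to i2b).
    static byte addBytes(byte a, byte b) {
        return (byte) (a + b); // without the cast this does not compile: a + b has type int
    }

    public static void main(String[] args) {
        System.out.println(addBytes((byte) 100, (byte) 27)); // prints 127
        System.out.println(addBytes((byte) 127, (byte) 1));  // prints -128: the int 128 wraps when narrowed
    }
}
```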

The instruction set supported by the Java virtual machine

Oracle provides a table of Java virtual machine opcodes and mnemonics grouped by type:

Docs.oracle.com/javase/spec…

Load and store instructions

Load and store instructions transfer data back and forth between the local variable table of a stack frame and the operand stack (see part 1 of this series: Memory area and garbage collection). These instructions include:

  • Load a local variable onto the operand stack: iload, iload_<n>, lload, lload_<n>, fload, fload_<n>, dload, dload_<n>, aload, aload_<n>
  • Store a value from the operand stack into the local variable table: istore, istore_<n>, lstore, lstore_<n>, fstore, fstore_<n>, dstore, dstore_<n>, astore, astore_<n>
  • Load a constant onto the operand stack: bipush, sipush, ldc, ldc_w, ldc2_w, aconst_null, iconst_m1, iconst_<i>, lconst_<l>, fconst_<f>, dconst_<d>

A mnemonic ending in a placeholder such as <n> stands for a family of instructions: iload_<n>, for example, covers iload_0 through iload_3, each a special case of iload with the operand implied by the opcode. The operand stack and the local variable table are mainly manipulated by these load and store instructions. In addition, a few other instructions, such as those that access object fields or array elements, also transfer data to and from the operand stack.
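As a rough illustration, the comments in the sketch below show the load, store, and constant instructions that javac typically emits for a few local variable assignments. LoadStoreDemo and sumOfThree are hypothetical names, and the exact bytecode can differ between compiler versions, so treat the comments as an expectation to verify with javap rather than a guarantee.

```java
// LoadStoreDemo is a hypothetical class used for illustration only.
class LoadStoreDemo {
    // A static method has no "this", so the first local variable lives in slot 0.
    static int sumOfThree() {
        int a = 5;        // iconst_5, istore_0   (small constants have dedicated opcodes)
        int b = 100;      // bipush 100, istore_1 (fits in one signed byte)
        int c = 100_000;  // ldc <constant pool entry>, istore_2 (too large for sipush)
        return a + b + c; // iload_0, iload_1, iadd, iload_2, iadd, ireturn
    }

    public static void main(String[] args) {
        System.out.println(sumOfThree()); // prints 100105
    }
}
```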

Arithmetic instructions

Arithmetic instructions perform a specific operation on two values on the operand stack and store the result back on top of the stack. Broadly, there are two kinds: instructions that operate on integer data and instructions that operate on floating-point data. Integer and floating-point arithmetic instructions also behave differently on overflow and on division by zero. Neither kind directly supports the byte, short, char, and boolean types; operations on these types are carried out with the instructions for int. The arithmetic instructions are:

  • Addition instructions: iadd, ladd, fadd, dadd
  • Subtraction instructions: isub, lsub, fsub, dsub
  • Multiplication instructions: imul, lmul, fmul, dmul
  • Division instructions: idiv, ldiv, fdiv, ddiv
  • Remainder instructions: irem, lrem, frem, drem
  • Negation instructions: ineg, lneg, fneg, dneg
  • Shift instructions: ishl, ishr, iushr, lshl, lshr, lushr
  • Bitwise OR instructions: ior, lor
  • Bitwise AND instructions: iand, land
  • Bitwise XOR instructions: ixor, lxor
  • Local variable increment instruction: iinc
  • Comparison instructions: dcmpg, dcmpl, fcmpg, fcmpl, lcmp
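The different overflow and division-by-zero behaviour of integer and floating-point arithmetic can be observed directly from Java. In this sketch ArithmeticEdgeCases and its methods are hypothetical names, and the mnemonics in the comments are what javac typically emits for each expression.

```java
// ArithmeticEdgeCases is a hypothetical class used for illustration only.
class ArithmeticEdgeCases {
    static int add(int a, int b) { return a + b; }                // iadd: wraps silently on overflow
    static int remainder(int a, int b) { return a % b; }          // irem: result takes the sign of the dividend
    static int shiftArithmetic(int v, int n) { return v >> n; }   // ishr: sign-extending shift
    static int shiftLogical(int v, int n) { return v >>> n; }     // iushr: zero-filling shift
    static double divide(double a, double b) { return a / b; }    // ddiv: never throws
    static boolean less(float a, float b) { return a < b; }       // typically fcmpg plus a branch
    static boolean greater(float a, float b) { return a > b; }    // typically fcmpl plus a branch

    public static void main(String[] args) {
        System.out.println(add(Integer.MAX_VALUE, 1)); // prints -2147483648
        System.out.println(remainder(-7, 3));          // prints -1
        System.out.println(shiftArithmetic(-8, 1));    // prints -4
        System.out.println(shiftLogical(-8, 1));       // prints 2147483644
        System.out.println(divide(1.0, 0.0));          // prints Infinity
        System.out.println(divide(0.0, 0.0));          // prints NaN
        System.out.println(less(Float.NaN, 1f));       // prints false
        System.out.println(greater(Float.NaN, 1f));    // prints false
    }
}
```

The two floating-point comparison variants exist so that a comparison involving NaN evaluates to false whichever way the branch is written.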

Type conversion instructions

Conversion instructions convert a value from one numeric type to another. They are typically used to implement explicit casts in user code, or to work around the fact, mentioned earlier, that not every data type has a corresponding instruction for every operation. The Java virtual machine directly supports the following widening numeric conversions, which need no explicit cast in source code (javac emits the matching instruction, such as i2l or f2d, where required):

  • int to long, float, or double
  • long to float or double
  • float to double

In contrast, narrowing numeric conversions must be performed explicitly with the conversion instructions i2b, i2c, i2s, l2i, f2i, f2l, d2i, d2l, and d2f. A narrowing conversion may produce a result with a different sign or order of magnitude, and may lose numeric precision.
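The effects of narrowing can be seen with a few casts. NarrowingDemo and its methods are hypothetical names; each method wraps one cast so that the instruction named in its comment is the one javac typically emits.

```java
// NarrowingDemo is a hypothetical class used for illustration only.
class NarrowingDemo {
    static byte toByte(int v) { return (byte) v; }          // i2b: keeps the low 8 bits
    static char toChar(int v) { return (char) v; }          // i2c: keeps the low 16 bits, unsigned
    static int longToInt(long v) { return (int) v; }        // l2i: keeps the low 32 bits
    static int doubleToInt(double v) { return (int) v; }    // d2i: rounds toward zero, saturates
    static float toFloat(double v) { return (float) v; }    // d2f: rounds to the nearest float

    public static void main(String[] args) {
        System.out.println(toByte(200));                // prints -56 (sign changes)
        System.out.println((int) toChar(-1));           // prints 65535
        System.out.println(longToInt(4_294_967_297L));  // prints 1 (magnitude changes)
        System.out.println(doubleToInt(3.99));          // prints 3
        System.out.println(doubleToInt(-3.99));         // prints -3
        System.out.println(doubleToInt(Double.NaN));    // prints 0
        System.out.println(doubleToInt(1e20));          // prints 2147483647
        System.out.println(toFloat(0.1) == 0.1);        // prints false (precision lost)
    }
}
```

None of these conversions throws an exception; out-of-range and NaN inputs are mapped to well-defined results instead.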

Object creation and access instructions

Although class instances and arrays are both objects, the Java virtual machine uses different bytecode instructions to create and manipulate class instances and arrays. After the object is created, fields or array elements in the object instance or array instance can be obtained by object access instructions. These instructions include:

  • Instruction for creating class instances: new
  • Instructions for creating arrays: newarray, anewarray, multianewarray
  • Instructions for accessing class fields (static fields, also called class variables) and instance fields (non-static fields, also called instance variables): getfield, putfield, getstatic, putstatic
  • Instructions for loading an array element onto the operand stack: baload, caload, saload, iaload, laload, faload, daload, aaload
  • Instructions for storing a value from the operand stack into an array element: bastore, castore, sastore, iastore, lastore, fastore, dastore, aastore
  • Instruction for taking the length of an array: arraylength
  • Instructions for checking the type of a class instance: instanceof, checkcast
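The following sketch touches most of the instructions in this list. ObjectAccessDemo and its members are hypothetical names, and the comments name the instructions javac typically emits for each statement; running javap -c on the compiled class is the way to confirm them for a given compiler.

```java
// ObjectAccessDemo is a hypothetical class used for illustration only.
class ObjectAccessDemo {
    static int created; // class (static) field
    int value;          // instance field

    static int demo() {
        ObjectAccessDemo o = new ObjectAccessDemo(); // new, dup, invokespecial <init>
        o.value = 7;                                 // putfield
        created = created + 1;                       // getstatic, putstatic
        int[] ints = new int[3];                     // newarray int
        String[] names = new String[2];              // anewarray java/lang/String
        int[][] grid = new int[2][4];                // multianewarray
        ints[0] = o.value;                           // getfield, iastore
        Object ref = names;
        boolean isArray = ref instanceof String[];   // instanceof
        String[] back = (String[]) ref;              // checkcast
        // iaload reads ints[0]; aaload reads grid[1]; arraylength reads both lengths
        return ints[0] + back.length + grid[1].length + (isArray ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 14 (7 + 2 + 4 + 1)
    }
}
```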

Operand stack management instructions

Just as with an ordinary stack data structure, the Java virtual machine provides instructions for manipulating the operand stack directly, including:

  • Pop one or two elements off the top of the operand stack: pop, pop2
  • Duplicate the top one or two values and push the copy or copies back onto the stack: dup, dup2, dup_x1, dup2_x1, dup_x2, dup2_x2
  • Swap the top two values on the stack: swap
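To make the effect of these instructions concrete, here is a small simulation on a java.util.ArrayDeque of single-slot values. It is only a model: OperandStackOps and its methods are hypothetical names, and real operand stacks also hold two-slot long and double values, which is what pop2 and the dup2 variants are for.

```java
// OperandStackOps is a hypothetical class that models a few stack instructions.
class OperandStackOps {
    // dup: ..., v1  becomes  ..., v1, v1
    static void dup(java.util.Deque<Integer> s) {
        s.push(s.peek());
    }

    // swap: ..., v2, v1  becomes  ..., v1, v2
    static void swap(java.util.Deque<Integer> s) {
        int v1 = s.pop(), v2 = s.pop();
        s.push(v1);
        s.push(v2);
    }

    // dup_x1: ..., v2, v1  becomes  ..., v1, v2, v1
    static void dupX1(java.util.Deque<Integer> s) {
        int v1 = s.pop(), v2 = s.pop();
        s.push(v1);
        s.push(v2);
        s.push(v1);
    }

    // Runs a short sequence and returns the stack, top element first.
    static String demo() {
        java.util.Deque<Integer> s = new java.util.ArrayDeque<>();
        s.push(1);
        s.push(2);  // top first: [2, 1]
        dupX1(s);   // top first: [2, 1, 2]
        swap(s);    // top first: [1, 2, 2]
        dup(s);     // top first: [1, 1, 2, 2]
        s.pop();    // pop: back to [1, 2, 2]
        return s.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 2, 2]
    }
}
```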

Control transfer instructions

A control transfer instruction makes the Java virtual machine, conditionally or unconditionally, continue execution at a specified location rather than at the instruction that follows the control transfer instruction. Conceptually, a control transfer instruction can be thought of as conditionally or unconditionally modifying the value of the PC register. Control transfer instructions include:

  • Conditional branches: ifeq, iflt, ifle, ifne, ifgt, ifge, ifnull, ifnonnull, if_icmpeq, if_icmpne, if_icmplt, if_icmpgt, if_icmple, if_icmpge, if_acmpeq, and if_acmpne
  • Compound conditional branches: tableswitch and lookupswitch
  • Unconditional branches: goto, goto_w, jsr, jsr_w, ret
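As an illustration of how source-level control flow maps onto these instructions, the sketch below contains a switch with consecutive case labels, a switch with widely spaced labels, and a loop. SwitchDemo and its methods are hypothetical names, and the instruction choices in the comments describe what javac typically does; they are worth checking with javap -c for your own compiler.

```java
// SwitchDemo is a hypothetical class used for illustration only.
class SwitchDemo {
    // Consecutive case labels: javac typically emits tableswitch (a jump table).
    static String dense(int day) {
        switch (day) {
            case 1: return "Mon";
            case 2: return "Tue";
            case 3: return "Wed";
            default: return "?";
        }
    }

    // Widely spaced case labels: javac typically emits lookupswitch (sorted key/offset pairs).
    static String sparse(int code) {
        switch (code) {
            case 1: return "one";
            case 1000: return "thousand";
            case 1_000_000: return "million";
            default: return "?";
        }
    }

    // A loop: typically a conditional branch (ifle) to leave it and a goto to jump back.
    static int countDown(int n) {
        int steps = 0;
        while (n > 0) {
            n--;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(dense(2));      // prints Tue
        System.out.println(sparse(1000));  // prints thousand
        System.out.println(countDown(3));  // prints 3
    }
}
```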

Method invocation and return instructions

  • invokevirtual: invokes an instance method of an object, dispatching on the object's actual type (virtual method dispatch). This is the most common form of method dispatch in the Java language.
  • invokeinterface: invokes an interface method. At run time it searches the object that implements the interface for the appropriate method and calls it.
  • invokespecial: invokes instance methods that require special handling, including instance initialization methods, private methods, and superclass methods.
  • invokestatic: invokes a class method (a static method).
  • invokedynamic: resolves, dynamically at run time, the method referenced by the call site specifier, and then executes it.

The dispatch logic of the first four invocation instructions is fixed inside the Java virtual machine and cannot be changed by the user, whereas the dispatch logic of invokedynamic is determined by a bootstrap method that the user supplies. Method invocation instructions are independent of data type, while method return instructions are distinguished by the type of the returned value: ireturn (used when the return type is boolean, byte, char, short, or int), lreturn, freturn, dreturn, and areturn. There is also a return instruction used by methods declared void, by instance initialization methods, and by the class initialization methods of classes and interfaces.
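A short sketch that exercises each kind of invocation. Greeter, Polite, and InvokeDemo are hypothetical names, and the comments name the instruction javac typically emits for each call; in particular, the lambda is where invokedynamic usually appears in ordinary Java code.

```java
// Greeter, Polite, and InvokeDemo are hypothetical types used for illustration only.
interface Greeter {
    String greet(String name);
}

class Polite implements Greeter {
    public String greet(String name) { return "Hello, " + name; }
    @Override public String toString() { return "Polite"; }
}

class InvokeDemo {
    static String shout(String s) { return s.toUpperCase(); }

    static String demo() {
        Polite p = new Polite();     // invokespecial Polite.<init>
        Greeter g = p;
        String a = g.greet("JVM");   // invokeinterface Greeter.greet
        String b = p.toString();     // invokevirtual Polite.toString
        String c = shout(a);         // invokestatic InvokeDemo.shout
        // invokedynamic: the call site is linked at run time by a bootstrap method
        java.util.function.Supplier<String> s = () -> b + ":" + c;
        return s.get();              // invokeinterface Supplier.get, then areturn
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints Polite:HELLO, JVM
    }
}
```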

Exception handling instructions

Besides the athrow instruction, which implements an explicit throw statement, the Java Virtual Machine Specification stipulates that many runtime exceptions are thrown automatically when other instructions detect an abnormal condition. For example, in integer arithmetic the virtual machine throws an ArithmeticException from the idiv or ldiv instruction when the divisor is zero. In the Java virtual machine, handling an exception (the catch clause) is implemented not by bytecode instructions but by exception tables (long ago, jsr and ret were used to compile finally blocks, but they are no longer used for this).
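Both cases can be shown in a few lines: an exception raised implicitly by idiv and caught through the method's exception table, and one raised explicitly through athrow. ExceptionDemo and its methods are hypothetical names.

```java
// ExceptionDemo is a hypothetical class used for illustration only.
class ExceptionDemo {
    static String divide(int a, int b) {
        try {
            return "result " + (a / b);       // idiv throws ArithmeticException when b == 0
        } catch (ArithmeticException e) {     // reached through the method's exception table
            return "caught " + e.getClass().getSimpleName();
        }
    }

    static void reject(String reason) {
        // new, dup, invokespecial <init>, then athrow
        throw new IllegalArgumentException(reason);
    }

    public static void main(String[] args) {
        System.out.println(divide(6, 3)); // prints result 2
        System.out.println(divide(1, 0)); // prints caught ArithmeticException
        try {
            reject("bad input");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints bad input
        }
    }
}
```

Running javap -c on the compiled class shows an "Exception table" entry for divide covering the offsets of the try block, with no dedicated catch instruction in the code itself.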

Synchronization instructions

The Java virtual machine supports both method-level synchronization and synchronization of a sequence of instructions within a method. Both are implemented with a monitor (more commonly referred to as a "lock"). Method-level synchronization is implicit: it needs no bytecode instructions and is implemented as part of method invocation and return. The virtual machine can tell whether a method is declared synchronized from the ACC_SYNCHRONIZED access flag in the method's method table (method_info) structure. When a method is invoked, the invocation instruction checks whether the method's ACC_SYNCHRONIZED flag is set. If it is, the executing thread must first acquire the monitor, then execute the method, and finally release the monitor when the method completes, whether normally or abruptly. While the method is executing, the executing thread holds the monitor and no other thread can acquire the same monitor. If a synchronized method throws an exception that is not handled inside the method, the monitor it holds is released automatically as the exception propagates out of the method.

Synchronization of a sequence of instructions is usually expressed in the Java language by a synchronized block. The Java virtual machine instruction set provides the monitorenter and monitorexit instructions to support the semantics of the synchronized keyword. Implementing synchronized correctly requires cooperation between the javac compiler and the Java virtual machine.

Let’s add a method to HelloWorld that holds a synchronization lock and look at its bytecode:

void doSomethingLocked() {
    synchronized (mLock) {
        doSomething();
    }
}

Run javap -verbose HelloWorld.class:

void doSomethingLocked();
  descriptor: ()V
  flags: (0x0000)
  Code:
    stack=2, locals=3, args_size=1
       0: aload_0
       1: getfield      #3    // Field mLock:Ljava/lang/Object; access the mLock object
       4: dup                 // duplicate the top-of-stack element (the reference to mLock)
       5: astore_1            // store the top-of-stack element in local variable slot 1
       6: monitorenter        // enter synchronization, using the top-of-stack element as the lock
       7: aload_0
       8: invokevirtual #10   // Method doSomething:()V
      11: aload_1             // push the element in local variable slot 1 (that is, mLock)
      12: monitorexit         // exit synchronization
      13: goto          21    // normal completion: jump to the return at 21
      16: astore_2            // the exception path starts here, see target 16 in the exception table below
      17: aload_1             // push the element in local variable slot 1 (that is, mLock)
      18: monitorexit         // exit synchronization
      19: aload_2             // push the element in local variable slot 2 (the exception object)
      20: athrow              // rethrow the exception object to the caller of doSomethingLocked()
      21: return              // the method returns normally
    Exception table:
       from    to  target type
           7    13    16   any
          16    19    16   any
    LineNumberTable:
      line 16: 0
      line 17: 7
      line 18: 11
      line 19: 21
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      22     0  this   Lcom/wang/javavmdemo/HelloWorld;
    StackMapTable: number_of_entries = 2
      frame_type = 255 /* full_frame */
        offset_delta = 16
        locals = [ class com/wang/javavmdemo/HelloWorld, class java/lang/Object ]
        stack = [ class java/lang/Throwable ]
      frame_type = 250 /* chop */
        offset_delta = 4

The compiler must ensure that every monitorenter instruction executed in a method has a matching monitorexit instruction, whether the method completes normally or abruptly. To make sure monitorenter and monitorexit are still paired correctly when the method completes with an exception, the compiler automatically generates an exception handler that catches every exception (type any in the exception table above). The sole purpose of this handler is to execute monitorexit before rethrowing the exception.
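The effect of that compiler-generated handler can be observed from Java code: after an exception escapes a synchronized block, the current thread no longer holds the lock. MonitorReleaseDemo and its members are hypothetical names; Thread.holdsLock is the standard library method used to check ownership.

```java
// MonitorReleaseDemo is a hypothetical class used for illustration only.
class MonitorReleaseDemo {
    private static final Object LOCK = new Object();

    // Records whether the lock was held while inside the synchronized block.
    static boolean heldInside;

    static void failWhileLocked() {
        synchronized (LOCK) {                        // monitorenter
            heldInside = Thread.holdsLock(LOCK);     // true while inside the block
            throw new IllegalStateException("boom"); // leaves via the generated handler (monitorexit, athrow)
        }
    }

    // Returns whether the current thread still holds LOCK after the failure.
    static boolean heldAfterFailure() {
        try {
            failWhileLocked();
        } catch (IllegalStateException expected) {
            // the exception has already passed through the handler that runs monitorexit
        }
        return Thread.holdsLock(LOCK);
    }

    public static void main(String[] args) {
        System.out.println(heldAfterFailure()); // prints false
        System.out.println(heldInside);         // prints true
    }
}
```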