
Introduction

  • This article explains how Java code is compiled into bytecode and executed on the Java virtual machine (JVM). Understanding this process is important because it helps you understand what happens to your program at runtime.

  • This understanding not only gives you a sound mental model of the language features, but also lets you weigh the trade-offs and side effects of each feature when it comes up in discussion.

The number that precedes each instruction (or opcode) in the bytecode indicates the byte position of that instruction.

  • For example, an instruction such as 1: iconst_1 is only one byte long because it has no operands, so the next instruction is at position 2.

  • By contrast, an instruction such as 1: bipush 5 takes two bytes: one byte for the opcode bipush and one byte for the operand 5.

  • The next instruction is therefore at position 3, because the operand occupies the byte at position 2.
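The position arithmetic above can be sketched in a few lines of Java. This is only an illustration: the class BytecodeOffsets and its small length table are hypothetical names made up for this example, and the table covers just the handful of opcodes that appear in this article.

```java
import java.util.Arrays;
import java.util.Map;

class BytecodeOffsets {
    // Total instruction length in bytes: one opcode byte plus any operand bytes.
    // Hypothetical lookup table covering only the opcodes used in this article.
    static final Map<String, Integer> LENGTH = Map.of(
            "iconst_1", 1,       // opcode only
            "bipush", 2,         // opcode + one-byte operand
            "aload_0", 1,        // opcode only
            "invokespecial", 3,  // opcode + two-byte constant pool index
            "putfield", 3,       // opcode + two-byte constant pool index
            "return", 1);        // opcode only

    /** Returns the byte position of each instruction, starting at position 0. */
    static int[] offsets(String... mnemonics) {
        int[] result = new int[mnemonics.length];
        int position = 0;
        for (int i = 0; i < mnemonics.length; i++) {
            result[i] = position;
            position += LENGTH.get(mnemonics[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] positions = offsets(
                "aload_0", "invokespecial", "aload_0", "bipush", "putfield", "return");
        System.out.println(Arrays.toString(positions)); // [0, 1, 4, 5, 7, 10]
    }
}
```

Each position is simply the previous position plus the length of the previous instruction.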

The Java virtual machine is a stack-based architecture. Whenever a method is invoked, including the initial main method, a frame is created on the stack. The frame holds the method's local variables and its operand stack.

Variables

Local variables

The local variable array contains all the variables used during the execution of a method, including a reference to this, all method parameters, and the variables defined in the method body.

  • For class methods (that is, static methods), the method parameters start at slot 0.

  • For instance methods, slot 0 is reserved for this, so the method parameters start at slot 1.
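As a rough sketch of the slot numbering, the hypothetical helper below (SlotLayout and parameterSlots are names invented for this example) computes the first slot of each parameter from its descriptor character. It also accounts for the JVM rule that long (J) and double (D) take two consecutive slots.

```java
import java.util.Arrays;

class SlotLayout {
    /**
     * Returns the first local variable slot used by each parameter.
     * Descriptors use the class file letters: I = int, J = long, D = double, F = float.
     */
    static int[] parameterSlots(boolean isStatic, char... descriptors) {
        int[] slots = new int[descriptors.length];
        int next = isStatic ? 0 : 1; // slot 0 holds `this` in instance methods
        for (int i = 0; i < descriptors.length; i++) {
            slots[i] = next;
            // long and double occupy two consecutive slots, everything else one
            next += (descriptors[i] == 'J' || descriptors[i] == 'D') ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // static void m(int a, long b, int c)
        System.out.println(Arrays.toString(parameterSlots(true, 'I', 'J', 'I')));  // [0, 1, 3]
        // void m(int a, long b, int c) on an instance
        System.out.println(Arrays.toString(parameterSlots(false, 'I', 'J', 'I'))); // [1, 2, 4]
    }
}
```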

Local variable types

  • boolean
  • byte
  • char
  • long
  • short
  • int
  • float
  • double
  • reference
  • returnAddress

  • All types except long and double occupy a single slot in the local variable array. long and double require two consecutive slots because they are 64-bit types.

  • When a new variable is created, its value is first pushed onto the operand stack. The value is then stored in the corresponding slot of the local variable array.

  • If the variable is not a primitive type, the slot stores only a reference. The reference points to an object stored on the heap.

For example,

int i = 5;

is compiled to the following bytecode:

0: bipush 5      (two bytes)
2: istore_0
bipush

Push a byte as an integer onto the operand stack. In this case, 5 is pushed onto the operand stack.

istore_0

This is one of a group of opcodes of the form istore_<n>, each of which stores an integer in the local variable array.

<n> is the position in the local variable array and can only be 0, 1, 2, or 3. For positions greater than 3, a different opcode, istore, is used. It takes an operand that specifies the position in the local variable array.

The above code executes in memory as follows:

The class file also contains a local variable table for each method. If this code is included in a method, you will get the following entry in the class file's local variable table for that method:

LocalVariableTable:
    Start  Length  Slot  Name   Signature
      0      1      1     i         I

Member variables (class variables)

A member variable (field) is stored on the heap as part of a class instance (object). Information about the field is added to the field_info array of the class file, whose overall structure is as follows:

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
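To make the layout more concrete, here is a minimal sketch that reads the first four fields of that structure from a real class file. ClassFileHeader and readHeader are hypothetical names, and the resource lookup assumes a top-level class whose .class file is reachable through Class.getResourceAsStream.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    /** Returns { magic, minor_version, major_version, constant_pool_count }. */
    static int[] readHeader(Class<?> type) {
        // Assumption: `type` is a top-level class, so its file is <SimpleName>.class
        String resource = type.getSimpleName() + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();                        // u4 magic
            int minor = data.readUnsignedShort();              // u2 minor_version
            int major = data.readUnsignedShort();              // u2 major_version
            int constantPoolCount = data.readUnsignedShort();  // u2 constant_pool_count
            return new int[] {magic, minor, major, constantPoolCount};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader(Object.class);
        System.out.printf("magic=0x%X minor=%d major=%d constant_pool_count=%d%n",
                header[0], header[1], header[2], header[3]);
    }
}
```

The magic number of every valid class file is 0xCAFEBABE. The version and constant pool count depend on the JDK in use.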

In addition, if the variable is initialized, the bytecode for the initialization is added to the instance constructor.

When the following code is compiled:

public class SimpleClass{
    public int simpleField = 100;
}

Running javap on the compiled class shows an extra section, demonstrating that the member variable was added to the field_info array:

public int simpleField;
Signature: I
flags: ACC_PUBLIC

The bytecode for initialization is added to the constructor as follows:

public SimpleClass();
  Signature: ()V
  flags: ACC_PUBLIC
  Code:
    stack=2, locals=1, args_size=1
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: aload_0
       5: bipush        100
       7: putfield      #2                  // Field simpleField:I
      10: return

aload_0

Pushes the object reference stored in a slot of the local variable array onto the top of the operand stack.

Although the source code above declares no constructor, the compiler generates a default constructor, and the initialization of the member variable is placed in it.
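One way to picture the generated constructor is to write it out by hand. The sketch below uses a hypothetical class name, SimpleClassExplicit, and the comments map each statement to the bytecode shown earlier. It should behave the same as the field initializer version, although that is a reading of the listing rather than javac output for this exact class.

```java
class SimpleClassExplicit {
    public int simpleField;

    public SimpleClassExplicit() {
        super();                 // aload_0, invokespecial Object.<init>
        this.simpleField = 100;  // aload_0, bipush 100, putfield simpleField
    }

    public static void main(String[] args) {
        System.out.println(new SimpleClassExplicit().simpleField); // 100
    }
}
```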

  • The first local variable therefore refers to this.

  • The aload_0 opcode pushes the this reference onto the operand stack.

  • aload_0 is one of a group of opcodes of the form aload_<n>, each of which pushes an object reference onto the operand stack.

    1. <n> is the position of the object reference in the local variable array and can only be 0, 1, 2, or 3.
    2. The similar opcodes iload_<n>, lload_<n>, fload_<n>, and dload_<n> load values rather than object references, where i stands for int, l for long, f for float, and d for double.
    3. Local variables with an index greater than 3 can be loaded using iload, lload, fload, dload, and aload, each of which takes a single operand that specifies the index of the local variable to load.
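The choice between the short forms and the forms with an operand can be summarized in a small helper. LoadInstruction and mnemonic are hypothetical names used only for this sketch, which ignores indexes above 255 (those need the wide prefix).

```java
class LoadInstruction {
    /**
     * Returns the load mnemonic for a local variable.
     * type is one of 'i', 'l', 'f', 'd', 'a'; slot is the local variable index.
     */
    static String mnemonic(char type, int slot) {
        if (slot >= 0 && slot <= 3) {
            return type + "load_" + slot;  // one-byte form, index encoded in the opcode
        }
        return type + "load " + slot;      // two-byte form, index passed as an operand
    }

    public static void main(String[] args) {
        System.out.println(mnemonic('a', 0)); // aload_0
        System.out.println(mnemonic('i', 3)); // iload_3
        System.out.println(mnemonic('d', 4)); // dload 4
    }
}
```

The same rule applies to the store opcodes (istore_<n> versus istore, and so on).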

invokespecial

The invokespecial instruction is used to call instance initialization methods (constructors), private methods, and methods of the superclass of the current class.

It is one of a group of opcodes that invoke methods in different ways:

  • invokedynamic (MethodHandle, lambda)
  • invokeinterface (interface method)
  • invokespecial (constructor, superclass method, private method)
  • invokestatic (static method)
  • invokevirtual (instance method)

In this code, the invokespecial instruction invokes the constructor of the superclass.

bipush

Push a byte as an integer onto the operand stack. In this case 100 is pushed onto the operand stack.

putfield

putfield takes a single operand, here #2, which references a member variable in the runtime constant pool, in this case the one called simpleField. The value to assign and the object that contains the member variable are both popped off the operand stack.

The preceding aload_0 instruction pushed the object that contains the member variable, and the preceding bipush instruction pushed 100 onto the operand stack. putfield then pops both of them. The end result is that the member variable simpleField of this object is set to 100.

The above code executes in memory as follows:

java_class_variable_creation_byte_code

The putfield opcode has a single operand that refers to the second entry in the constant pool.

The JVM maintains a constant pool for each type, a runtime data structure similar to a symbol table but containing more data.

Java bytecode needs data that is usually too large to store directly in the bytecode. That data is stored in the constant pool instead, and the bytecode holds a reference into the constant pool. When a class file is created, one of its sections is the constant pool, as shown below:

Constant pool:
   #1 = Methodref          #4.#16         //  java/lang/Object."<init>":()V
   #2 = Fieldref           #3.#17         //  SimpleClass.simpleField:I
   #3 = Class              #13            //  SimpleClass
   #4 = Class              #19            //  java/lang/Object
   #5 = Utf8               simpleField
   #6 = Utf8               I
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               LocalVariableTable
  #12 = Utf8               this
  #13 = Utf8               SimpleClass
  #14 = Utf8               SourceFile
  #15 = Utf8               SimpleClass.java
  #16 = NameAndType        #7:#8          //  "<init>":()V
  #17 = NameAndType        #5:#6          //  simpleField:I
  #18 = Utf8               LSimpleClass;
  #19 = Utf8               java/lang/Object

Constant fields (class constants)

A variable declared final is called a constant and is marked with ACC_FINAL in the class file.

For example:

public class SimpleClass {
    public final int simpleField = 100;
    public  int simpleField2 = 100;
}

The field description now carries the ACC_FINAL flag:

public final int simpleField;
Signature: I
flags: ACC_PUBLIC, ACC_FINAL
ConstantValue: int 100

However, initialization operations in the constructor are not affected:

4: aload_0
5: bipush        100
7: putfield      #2                  // Field simpleField2:I

Static variables

A variable declared static is called a static class variable and is marked with ACC_STATIC in the class file, as follows:

public static int simpleField;
Signature: I
flags: ACC_PUBLIC, ACC_STATIC

The bytecode that initializes static variables is not found in the instance constructor <init>. Instead, static variables are initialized in the class constructor <clinit>, which uses the putstatic opcode rather than putfield.

static {};
  Signature: ()V
  flags: ACC_STATIC
  Code:
    stack=1, locals=0, args_size=0
       0: bipush         100
       2: putstatic      #2                  // Field simpleField:I
       5: return
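The static {} block in the listing above can also be written explicitly in source. In this sketch (StaticInitDemo is a hypothetical class name) the assignment sits in a static initializer block, which is where the compiler places the initialization of a static field anyway.

```java
class StaticInitDemo {
    public static int simpleField;

    // Runs once when the class is initialized, not when an instance is constructed.
    static {
        simpleField = 100; // bipush 100, putstatic simpleField
    }

    public static void main(String[] args) {
        // No instance is ever created, yet the field already holds its value.
        System.out.println(StaticInitDemo.simpleField); // 100
    }
}
```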

Conditional statements

Conditional flow control, such as if-else and switch statements, works at the bytecode level by using an instruction that compares two values and branches to another bytecode.

  • for and while loops are implemented in a similar way, except that they usually also include a goto instruction that makes the bytecode loop.

  • do-while loops do not require a goto instruction because their conditional branch is at the end of the bytecode. For more details on loops, see the Loops section.

Some opcodes can compare two integers or two references and then perform a branch in a single instruction. Comparisons between other types, such as double, long, or float, are implemented in two steps.

First, the comparison is performed and 1, 0, or -1 is pushed onto the operand stack. Next, a branch is performed based on whether the value on the operand stack is greater than, less than, or equal to zero.

We'll start with the if-else statement as an example. The other instructions used for branching are covered afterwards.

if-else

The following code shows a simple if-else statement that compares the size of two integers.

public int greaterThen(int intOne, int intTwo) {
    if (intOne > intTwo) {
        return 0;
    } else {
        return 1;
    }
}

This method compiles to the following bytecode:

0: iload_1
1: iload_2
2: if_icmple        7
5: iconst_0
6: ireturn
7: iconst_1
8: ireturn

  • First, the two parameters are pushed onto the operand stack using iload_1 and iload_2.
  • Then, if_icmple compares the two values at the top of the operand stack.
  • If intOne is less than or equal to intTwo, execution branches to the bytecode at position 7.

Note that the test in the Java code is the exact opposite of the test in the bytecode: in the bytecode, execution jumps to the else block when the test succeeds, whereas in the Java code, execution enters the if block when the test succeeds.

In other words, if_icmple tests whether the if condition is false and, if so, skips the if block. The body of the if block is the bytecode at positions 5 and 6, and the body of the else block is the bytecode at positions 7 and 8.

java_if_else_byte_code
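The inverted test is easier to see when the bytecode is written back as Java. In the sketch below, greaterThenAsCompiled is a hypothetical hand-written mirror of the bytecode, not decompiler output. It tests the negated condition first, just as if_icmple does.

```java
class InvertedBranch {
    // The method as written in the source.
    static int greaterThen(int intOne, int intTwo) {
        if (intOne > intTwo) {
            return 0;
        } else {
            return 1;
        }
    }

    // The same logic in the order the bytecode evaluates it.
    static int greaterThenAsCompiled(int intOne, int intTwo) {
        if (intOne <= intTwo) { // 2: if_icmple 7 (branch when the source condition is false)
            return 1;           // 7: iconst_1, 8: ireturn (else block)
        }
        return 0;               // 5: iconst_0, 6: ireturn (if block)
    }

    public static void main(String[] args) {
        System.out.println(greaterThen(3, 2) == greaterThenAsCompiled(3, 2)); // true
        System.out.println(greaterThen(2, 3) == greaterThenAsCompiled(2, 3)); // true
    }
}
```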

The following code example shows a slightly more complex example that requires a two-step comparison:

public int greaterThen(float floatOne, float floatTwo) {
    int result;
    if (floatOne > floatTwo) {
        result = 1;
    } else {
        result = 2;
    }
    return result;
}

This method produces the following bytecode:

 0: fload_1
 1: fload_2
 2: fcmpl
 3: ifle          11
 6: iconst_1
 7: istore_3
 8: goto          13
11: iconst_2
12: istore_3
13: iload_3
14: ireturn

In this example, the two parameters are first pushed onto the operand stack using fload_1 and fload_2. Unlike the previous example, this one requires a two-step comparison. fcmpl first compares floatOne and floatTwo and pushes the result onto the operand stack, as follows:

floatOne > floatTwo -> 1
floatOne = floatTwo -> 0
floatOne < floatTwo -> -1
floatOne or floatTwo = NaN -> -1

Next, ifle branches to the bytecode at position 11 if the result of fcmpl is <= 0.

  • This example also differs from the previous one in that there is a single return statement at the end of the method, so a goto instruction is needed at the end of the if block to prevent the else block from being executed as well.

  • The goto branches to the bytecode at position 13, iload_3, which pushes the result stored in slot 3 of the local variable array onto the operand stack so that it can be returned by the ireturn instruction.

java_if_else_byte_code_extra_goto

Just as there are opcodes that compare numeric values, there are opcodes that compare references for equality (==), compare a reference with null (== null and != null), and test the type of an object (instanceof).

  • if_icmp<cond> (eq, ne, lt, le, gt, ge): this group of opcodes compares the two integers at the top of the operand stack and branches to a new bytecode. The possible values of <cond> are:

eq - equal to
ne - not equal to
lt - less than
le - less than or equal to
gt - greater than
ge - greater than or equal to

  • if_acmp<cond> (eq, ne): these two opcodes test whether two references are equal (eq) or not equal (ne), and then branch to the new bytecode specified by the operand.

  • ifnonnull / ifnull: these two opcodes test whether a reference is non-null or null, and then branch to the new bytecode specified by the operand.

  • lcmp: this opcode compares the two long values at the top of the operand stack and pushes an int onto the operand stack as follows:

If value1 > value2 -> push 1
If value1 = value2 -> push 0
If value1 < value2 -> push -1

The fcmp<l|g> and dcmp<l|g> opcodes compare two float or double values and push an int onto the operand stack as follows:

If value1 > value2 -> push 1
If value1 = value2 -> push 0
If value1 < value2 -> push -1

The difference between the opcodes ending in l and those ending in g is how they handle NaN.

  • fcmpg and dcmpg push the int value 1 onto the operand stack, whereas fcmpl and dcmpl push -1. This ensures that if either of the two values is NaN (Not a Number), the test fails.

    • For example, for x > y (where x and y are both of type double), if either x or y is NaN, the dcmpl instruction pushes -1 onto the operand stack.

    • The next opcode will always be an ifle instruction, which branches if the value at the top of the stack is less than or equal to zero. As a result, if either x or y is NaN, ifle skips the if block, preventing the code in the if block from being executed.

  • instanceof: if the object at the top of the operand stack is an instance of the specified class, this opcode pushes the int value 1 onto the operand stack. The operand of this opcode specifies the class by providing an index into the constant pool. If the object is null or is not an instance of the specified class, the int value 0 is pushed onto the operand stack.

The if<cond> opcodes (ifeq, ifne, iflt, ifle, ifgt, ifge) compare the value at the top of the operand stack with 0 and then branch to the bytecode at the position specified by the operand.

These instructions are always used for more complex conditional logic that cannot be done with a single instruction, such as testing the result of a method call.
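The NaN handling of the l and g variants described above can be modeled in plain Java. The class FloatCompare and its two methods are hypothetical stand-ins written for this sketch. They mimic what the fcmpl and fcmpg opcodes push, relying on the fact that every ordered comparison with NaN is false.

```java
class FloatCompare {
    /** Models fcmpl: NaN on either side yields -1. */
    static int fcmpl(float value1, float value2) {
        if (value1 > value2) return 1;
        if (value1 == value2) return 0;
        return -1; // value1 < value2, or at least one value is NaN
    }

    /** Models fcmpg: NaN on either side yields 1. */
    static int fcmpg(float value1, float value2) {
        if (value1 < value2) return -1;
        if (value1 == value2) return 0;
        return 1; // value1 > value2, or at least one value is NaN
    }

    public static void main(String[] args) {
        System.out.println(fcmpl(Float.NaN, 1.0f)); // -1
        System.out.println(fcmpg(Float.NaN, 1.0f)); // 1
        System.out.println(Float.NaN > 1.0f);       // false
    }
}
```

With fcmpl followed by ifle, a NaN operand produces -1, so the branch is taken and the if block for a > test is skipped.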

switch

The types allowed for a Java switch expression are char, byte, short, int, Character, Byte, Short, Integer, String, and enum types.

To support switch statements, the JVM uses two special instructions, tableswitch and lookupswitch, both of which operate only on integer values. Using only integer values is not a problem, because char, byte, short, and enum types can all be promoted internally to int.

String support, added in Java 7, is also implemented with integers. tableswitch is generally faster, but typically consumes more memory.

Tableswitch works by listing all possible case values between the minimum and maximum case values. Minimum and maximum values are also provided, so if the switch variable is not within the range of the listed case values, the JVM immediately jumps to the default statement block. Values for case statements not provided in the Java code are also listed, but point to the default statement block, ensuring that all values between the minimum and maximum are listed.

For example, consider the following switch statement:

public int simpleSwitch(int intOne) {
    switch (intOne) {
        case 0:
            return 3;
        case 1:
            return 2;
        case 4:
            return 1;
        default:
            return -1;
    }
}

This code produces the following bytecode:

0: iload_1
1: tableswitch   {
         default: 42
             min: 0
             max: 4
               0: 36
               1: 38
               2: 42
               3: 42
               4: 40
    }
36: iconst_3
37: ireturn
38: iconst_2
39: ireturn
40: iconst_1
41: ireturn
42: iconst_m1
43: ireturn

The tableswitch instruction has entries for 0, 1, and 4 that match the case statements in the Java code, each pointing to the bytecode of the corresponding block. It also has entries for 2 and 3, which are not case statements in the Java code; both point to the default block. When the instruction is executed, the value at the top of the operand stack is checked to see whether it lies between the minimum and the maximum. If it does not, execution jumps to the default branch, which in the above example is the bytecode at position 42. So that the default branch can be found easily, its location is always the first entry of the tableswitch instruction (after any required alignment padding). If the value does lie between the minimum and the maximum, it is used as an index into the tableswitch to find the bytecode to branch to.

For example, if the value is 1, execution jumps to the bytecode at position 38. The following image shows how this bytecode is executed:

java_switch_tableswitch_byte_code
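The lookup performed by tableswitch amounts to a range check followed by an array index. The sketch below models it with the numbers from the listing above. TableSwitchModel and branchTarget are hypothetical names, and the real instruction stores relative offsets rather than the absolute positions that javap prints.

```java
class TableSwitchModel {
    static final int MIN = 0;
    static final int MAX = 4;
    static final int DEFAULT_TARGET = 42;
    // One entry per value from MIN to MAX; the gaps 2 and 3 point at the default block.
    static final int[] TARGETS = {36, 38, 42, 42, 40};

    /** Returns the bytecode position that execution branches to for `value`. */
    static int branchTarget(int value) {
        if (value < MIN || value > MAX) {
            return DEFAULT_TARGET;   // outside the table: go straight to default
        }
        return TARGETS[value - MIN]; // inside the table: a single indexed read
    }

    public static void main(String[] args) {
        System.out.println(branchTarget(1)); // 38
        System.out.println(branchTarget(3)); // 42
        System.out.println(branchTarget(9)); // 42
    }
}
```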

If the values in the case statements are too far apart (that is, too sparse), this approach is undesirable because it takes up too much memory. When the cases of a switch are sparse, lookupswitch is used instead of tableswitch. lookupswitch lists the branch target for each case statement, but does not list all possible values.

  • When lookupswitch is executed, the value at the top of the operand stack is compared with the values in the lookupswitch to determine the correct branch address. With lookupswitch, the JVM has to search the list of matches for the correct one, which is slower than tableswitch, where the JVM can index the correct entry directly.

  • When a switch statement is compiled, the compiler must therefore trade off memory against performance in deciding which instruction to use. For the following code, the compiler uses lookupswitch:

public int simpleSwitch(int intOne) {
    switch (intOne) {
        case 10:
            return 1;
        case 20:
            return 2;
        case 30:
            return 3;
        default:
            return -1;
    }
}

This code produces bytecode as follows:

0: iload_1
1: lookupswitch  {
         default: 42
           count: 3
              10: 36
              20: 38
              30: 40
    }
36: iconst_1
37: ireturn
38: iconst_2
39: ireturn
40: iconst_3
41: ireturn
42: iconst_m1
43: ireturn

To allow search algorithms that are more efficient than a linear search, lookupswitch provides the number of matches, and the matches are sorted. The following diagram shows how the above code is executed:

java_switch_lookupswitch_byte_code
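Because the match values are sorted, a binary search is one possible way to resolve a lookupswitch. The sketch below uses the keys and targets from the listing above. LookupSwitchModel and branchTarget are hypothetical names, and an actual JVM is free to use a different search strategy.

```java
import java.util.Arrays;

class LookupSwitchModel {
    // Sorted match values and their branch targets, taken from the listing above.
    static final int[] KEYS = {10, 20, 30};
    static final int[] TARGETS = {36, 38, 40};
    static final int DEFAULT_TARGET = 42;

    /** Returns the bytecode position that execution branches to for `value`. */
    static int branchTarget(int value) {
        int index = Arrays.binarySearch(KEYS, value); // requires KEYS to be sorted
        return index >= 0 ? TARGETS[index] : DEFAULT_TARGET;
    }

    public static void main(String[] args) {
        System.out.println(branchTarget(20)); // 38
        System.out.println(branchTarget(15)); // 42
    }
}
```

Only three entries are stored here, whereas a tableswitch for the same cases would need an entry for every value from 10 to 30.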

String switch

In Java 7, the switch statement gained support for strings. The existing switch opcodes only support int, and no new opcodes were added. Instead, a string switch is compiled in two parts. First, the hash code of the string at the top of the operand stack is compared with the hash code of each case label. This step uses lookupswitch or tableswitch (depending on how sparse the hash codes are).

Each match branches to bytecode that calls String.equals() to perform an exact comparison. A second tableswitch instruction then uses the result of String.equals() to branch to the code of the correct case statement.

public int simpleSwitch(String stringOne) {
    switch (stringOne) {
        case "a":
            return 0;
        case "b":
            return 2;
        case "c":
            return 3;
        default:
            return 4;
    }
}

The string switch statement will produce the following bytecode:

 0: aload_1
 1: astore_2
 2: iconst_m1
 3: istore_3
 4: aload_2
 5: invokevirtual #2          // Method java/lang/String.hashCode:()I
 8: tableswitch   {
         default: 75
             min: 97
             max: 99
              97: 36
              98: 50
              99: 64
    }
36: aload_2
37: ldc           #3          // String a
39: invokevirtual #4          // Method java/lang/String.equals:(Ljava/lang/Object;)Z
42: ifeq          75
45: iconst_0
46: istore_3
47: goto          75
50: aload_2
51: ldc           #5          // String b
53: invokevirtual #4          // Method java/lang/String.equals:(Ljava/lang/Object;)Z
56: ifeq          75
59: iconst_1
60: istore_3
61: goto          75
64: aload_2
65: ldc           #6          // String c
67: invokevirtual #4          // Method java/lang/String.equals:(Ljava/lang/Object;)Z
70: ifeq          75
73: iconst_2
74: istore_3
75: iload_3
76: tableswitch   {
         default: 110
             min: 0
             max: 2
               0: 104
               1: 106
               2: 108
    }
104: iconst_0
105: ireturn
106: iconst_2
107: ireturn
108: iconst_3
109: ireturn
110: iconst_4
111: ireturn

The class containing this bytecode also contains the following constant pool entries, which the bytecode references. To learn more about constant pools, see the runtime constant pool section of the JVM Internals article.

Constant pool:
   #2 = Methodref          #25.#26        //  java/lang/String.hashCode:()I
   #3 = String             #27            //  a
   #4 = Methodref          #25.#28        //  java/lang/String.equals:(Ljava/lang/Object;)Z
   #5 = String             #29            //  b
   #6 = String             #30            //  c
  #25 = Class              #33            //  java/lang/String
  #26 = NameAndType        #34:#35        //  hashCode:()I
  #27 = Utf8               a
  #28 = NameAndType        #36:#37        //  equals:(Ljava/lang/Object;)Z
  #29 = Utf8               b
  #30 = Utf8               c
  #33 = Utf8               java/lang/String
  #34 = Utf8               hashCode
  #35 = Utf8               ()I
  #36 = Utf8               equals
  #37 = Utf8               (Ljava/lang/Object;)Z
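The two-part scheme in the string switch bytecode can be written back as ordinary Java. The sketch below is a hand-written approximation of what the compiler generates, not decompiler output. The class name StringSwitchDesugared and the variable name index are made up for the example.

```java
class StringSwitchDesugared {
    // The switch as written in the source.
    static int simpleSwitch(String stringOne) {
        switch (stringOne) {
            case "a": return 0;
            case "b": return 2;
            case "c": return 3;
            default:  return 4;
        }
    }

    // Roughly what the bytecode does: switch on the hash code, confirm with equals(),
    // then switch again on the index of the matching case.
    static int simpleSwitchDesugared(String stringOne) {
        int index = -1;                         // iconst_m1, istore_3
        switch (stringOne.hashCode()) {         // first tableswitch (97..99)
            case 97: if (stringOne.equals("a")) index = 0; break;
            case 98: if (stringOne.equals("b")) index = 1; break;
            case 99: if (stringOne.equals("c")) index = 2; break;
        }
        switch (index) {                        // second tableswitch (0..2)
            case 0:  return 0;
            case 1:  return 2;
            case 2:  return 3;
            default: return 4;
        }
    }

    public static void main(String[] args) {
        System.out.println(simpleSwitchDesugared("b")); // 2
        System.out.println(simpleSwitchDesugared("x")); // 4
    }
}
```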

Note that the bytecode needed to perform this switch includes two tableswitch instructions and several invokevirtual instructions that call String.equals(). For more details on invokevirtual, see the method calls section of the next article. The following illustration shows how the code executes when “b” is passed in:

If different case labels have the same hash code, for example the strings “FB” and “Ea”, which both hash to 2236, the collision is handled by slightly changing the flow of the equals calls, as shown below. Note that the bytecode at position 34, ifeq 42, branches to another call to String.equals() rather than to the lookupswitch opcode, as it would in the previous example, which had no hash collisions.

public int simpleSwitch(String stringOne) {
    switch (stringOne) {
        case "FB":
            return 0;
        case "Ea":
            return 2;
        default:
            return 4;
    }
}

The code above produces the following bytecode:

 0: aload_1
 1: astore_2
 2: iconst_m1
 3: istore_3
 4: aload_2
 5: invokevirtual #2          // Method java/lang/String.hashCode:()I
 8: lookupswitch  {
         default: 53
           count: 1
            2236: 28
    }
28: aload_2
29: ldc           #3          // String Ea
31: invokevirtual #4          // Method java/lang/String.equals:(Ljava/lang/Object;)Z
34: ifeq          42
37: iconst_1
38: istore_3
39: goto          53
42: aload_2
43: ldc           #5          // String FB
45: invokevirtual #4          // Method java/lang/String.equals:(Ljava/lang/Object;)Z
48: ifeq          53
51: iconst_0
52: istore_3
53: iload_3
54: lookupswitch  {
         default: 84
           count: 2
               0: 80
               1: 82
    }
80: iconst_0
81: ireturn
82: iconst_2
83: ireturn
84: iconst_4
85: ireturn
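The collision itself is easy to check, and the chained equals() calls map onto an if / else-if inside a single hash case. As before, this is a hand-written approximation of the bytecode, with CollidingSwitch and index as hypothetical names.

```java
class CollidingSwitch {
    static int simpleSwitchDesugared(String stringOne) {
        int index = -1;
        switch (stringOne.hashCode()) {          // lookupswitch with a single key
            case 2236:                           // shared by "Ea" and "FB"
                if (stringOne.equals("Ea")) {    // 31: invokevirtual equals, 34: ifeq 42
                    index = 1;
                } else if (stringOne.equals("FB")) { // 45: invokevirtual equals, 48: ifeq 53
                    index = 0;
                }
                break;
        }
        switch (index) {                         // second lookupswitch
            case 0:  return 0;                   // case "FB"
            case 1:  return 2;                   // case "Ea"
            default: return 4;
        }
    }

    public static void main(String[] args) {
        System.out.println("FB".hashCode());             // 2236
        System.out.println("Ea".hashCode());             // 2236
        System.out.println(simpleSwitchDesugared("FB")); // 0
        System.out.println(simpleSwitchDesugared("Ea")); // 2
    }
}
```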

Loops

  • Conditional flow control, such as if-else statements and switch statements, is implemented by using an instruction that compares two values and then branches to the corresponding bytecode. For more details on conditionals, see the Conditional statements section.

  • Loops, including for loops and while loops, are implemented in a similar way, except that they usually have a goto instruction that makes the bytecode loop. do-while loops do not require a goto instruction because their conditional branch is at the end of the bytecode.

  • Some bytecodes can compare two integers or two references and then perform a branch in a single instruction. Comparisons between other types, such as double, long, or float, require two steps. First, the comparison is performed and 1, 0, or -1 is pushed onto the operand stack. Next, a branch is performed based on whether the value at the top of the operand stack is greater than, less than, or equal to 0. For more details about the branch instructions, see above.

The while loop

A while loop consists of a conditional branch instruction such as if_icmpge or if_icmplt (as described above) and a goto instruction. The conditional instruction branches to the instruction immediately after the loop, and therefore terminates the loop, when the loop condition is no longer true. The last instruction in the loop is a goto, which jumps back to the beginning of the loop so that the bytecode keeps looping until the conditional branch is taken, as shown below:

public void whileLoop() {
    int i = 0;
    while (i < 2) {
        i++;
    }
}

Compiled into:

0: iconst_0
 1: istore_1
 2: iload_1
 3: iconst_2
 4: if_icmpge       13
 7: iinc            1, 1
10: goto            2
13: return

The if_icmpge instruction tests whether the local variable in slot 1 is greater than or equal to 2. If it is, the instruction jumps to the bytecode at position 13, ending the loop. The goto instruction ensures that the bytecode keeps looping until the if_icmpge condition becomes true, at which point execution branches to the return instruction. The iinc instruction is one of the few that updates a local variable directly, without having to load the value onto the operand stack and store it back. In this example, iinc increments the local variable in slot 1 by 1.
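The shape of that bytecode can be mirrored in Java by moving the exit test inside an endless loop. WhileLoopShape is a hypothetical class written for this sketch, and the method returns i only so that the result can be checked.

```java
class WhileLoopShape {
    static int whileLoop() {
        int i = 0;              //  0: iconst_0, 1: istore_1
        while (true) {
            if (i >= 2) {       //  2: iload_1, 3: iconst_2, 4: if_icmpge 13
                break;          //     leave the loop once the negated condition holds
            }
            i++;                //  7: iinc 1, 1
        }                       // 10: goto 2
        return i;               // 13: return
    }

    public static void main(String[] args) {
        System.out.println(whileLoop()); // 2
    }
}
```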

The for loop

The for loop and the while loop use exactly the same pattern at the bytecode level. This is not surprising, because every while loop can be rewritten as an equivalent for loop. The simple while loop above can be rewritten as the following for loop, which produces exactly the same bytecode:

public void forLoop() {
    for(int i = 0; i < 2; i++) {
    }
}

The do while loop

do-while loops are also very similar to for and while loops, except that they do not require a goto instruction, because the conditional branch is the last instruction and is used to loop back to the beginning.

public void doWhileLoop() {
    int i = 0;
    do {
        i++;
    } while (i < 2);
}

The resulting bytecode is as follows:

0: iconst_0
 1: istore_1
 2: iinc     1, 1
 5: iload_1
 6: iconst_2
 7: if_icmplt   2
10: return
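One visible consequence of placing the conditional branch at the end is that the body of a do-while loop always runs at least once. The short sketch below (LoopShapes and its two methods are hypothetical names) counts body executions for both loop forms.

```java
class LoopShapes {
    /** Counts how often the body of a while loop runs. */
    static int whileBodyRuns(int start, int limit) {
        int i = start;
        int runs = 0;
        while (i < limit) {   // condition tested before the body
            i++;
            runs++;
        }
        return runs;
    }

    /** Counts how often the body of a do-while loop runs. */
    static int doWhileBodyRuns(int start, int limit) {
        int i = start;
        int runs = 0;
        do {
            i++;
            runs++;
        } while (i < limit);  // condition tested after the body
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(whileBodyRuns(0, 2));   // 2
        System.out.println(doWhileBodyRuns(0, 2)); // 2
        System.out.println(whileBodyRuns(5, 2));   // 0
        System.out.println(doWhileBodyRuns(5, 2)); // 1
    }
}
```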