Deep understanding of Java bytecode instructions

Java bytecode instructions are a tough nut to crack in the JVM architecture, and I expect some readers are wondering, “Is Java bytecode hard to learn? Can I learn it?”

To be honest, and I'm not being modest, when I first started learning Java bytecode and the Java virtual machine I found it overwhelming too. But after a period of hard work it suddenly clicked, and it became a lot of fun, especially once I realized that this is how Java code actually executes under the hood. I came away feeling far more confident.

I have published more than 100 articles on Java on Nuggets, more than 300,000 words in total. They are humorous and easy to understand, they cover core topics such as Java syntax, the Java collections framework, Java concurrent programming, and the Java virtual machine, and they have earned the recognition and support of many beginners. To help more Java beginners, I reorganized these articles and, on an impulse, open-sourced them on GitHub under the title “Teach the Younger Sister to Learn Java”. Doesn't that sound like fun?

GitHub (welcome to star) : github.com/itwanger/jm…

HotSpot, Java's official virtual machine, executes a stack-based instruction set, not a register-based one.

A stack-based design offers better portability, shorter instructions, and a simpler implementation. On the other hand, elements inside the stack cannot be accessed at random, more instructions are needed than on a register machine to do the same work, and values have to be pushed and popped frequently.

A register-based design is faster and lends itself to speed optimization, but operands must be specified explicitly, so instructions are relatively long.

Java bytecode consists of opcodes and operands.

Opcode: one byte long (0-255, so an instruction set can contain at most 256 opcodes), representing a particular operation.
Operands: zero or more values that immediately follow the opcode and supply the parameters the operation needs.

Because the Java virtual machine is based on a stack rather than on registers, most instructions consist of an opcode alone. aload_0, for instance, has only an opcode and no operands, while invokespecial #1, which calls a constructor or member method identified by the entry at index 1 in the constant pool, consists of an opcode and an operand.
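
To make the opcode/operand split concrete, here is a minimal sketch of a decoder for the three instructions that make up a default constructor body. The class OpcodeDemo and its disassemble method are hypothetical names invented for this illustration, and only three opcodes are handled; the opcode values (0x2A for aload_0, 0xB7 for invokespecial, 0xB1 for return) are the ones defined by the JVM instruction set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a JDK API.
class OpcodeDemo {
    // Decodes a byte array that contains only aload_0, invokespecial and return.
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF; // an opcode is one unsigned byte (0-255)
            switch (opcode) {
                case 0x2A: // aload_0: opcode only, no operand bytes
                    out.add(pc + ": aload_0");
                    pc += 1;
                    break;
                case 0xB7: { // invokespecial: opcode + 2-byte constant pool index
                    int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                    out.add(pc + ": invokespecial #" + index);
                    pc += 3;
                    break;
                }
                case 0xB1: // return: opcode only
                    out.add(pc + ": return");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException("opcode not handled in this sketch: " + opcode);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] code = {0x2A, (byte) 0xB7, 0x00, 0x01, (byte) 0xB1};
        // Prints: 0: aload_0 / 1: invokespecial #1 / 4: return
        disassemble(code).forEach(System.out::println);
    }
}
```

The offsets 0, 1, and 4 show the instruction lengths directly: one byte for an opcode without operands, three bytes for invokespecial with its two-byte index.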

01. Load and store instructions

Load and store instructions are the most frequently used instructions; they move data back and forth between the local variable table and the operand stack of a stack frame.

1) Push variables in the local variable table onto the operand stack

xload_<n> (x is i, l, f, d, or a; n is 0 to 3) pushes the nth local variable onto the operand stack.
xload n (x is i, l, f, d, or a) pushes the local variable whose index is given by the operand. This form is used when the index lies beyond the four slots covered by xload_<n>.

Some explanation.

x is the letter in the opcode mnemonic that indicates the data type: i for int, l for long, f for float, d for double, and a for a reference.

Some instructions, such as arraylength, have no type letter in their mnemonic at all. Nothing in the name indicates a data type, yet the operand can only be an object of an array type.

Most instructions do not support byte, short, or char, and none supports boolean. The compiler sign-extends byte and short values to int, and zero-extends boolean and char values to int.
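
The difference between sign extension and zero extension can be observed from plain Java. This is a small sketch; the class and method names are made up for the illustration.

```java
class ExtensionDemo {
    static int widenByte(byte b) { return b; }   // byte is sign-extended to int
    static int widenShort(short s) { return s; } // short is sign-extended to int
    static int widenChar(char c) { return c; }   // char is zero-extended to int

    public static void main(String[] args) {
        System.out.println(widenByte((byte) 0xFF));     // -1
        System.out.println(widenShort((short) 0x8000)); // -32768
        System.out.println(widenChar((char) 0xFFFF));   // 65535
    }
}
```

The same all-ones bit pattern becomes -1 as a byte but 65535 as a char, because only the signed types copy their top bit into the upper bits of the int.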

For example.

private void load(int age, String name, long birthday, boolean sex) {
    System.out.println(age + name + birthday + sex);
}

Take a look at the bytecode instructions for the load() method (four arguments) with jclasslib.

iload_1: pushes the int variable at index 1 in the local variable table onto the operand stack.
aload_2: pushes the reference variable (a String here) at index 2 in the local variable table onto the operand stack.
lload_3: pushes the long variable at index 3 in the local variable table onto the operand stack. A long occupies two slots, 3 and 4.
iload 5: pushes the int variable (a boolean here) at index 5 in the local variable table onto the operand stack.

You can match these against the local variable table.

2) Push constants from the constant pool onto the operand stack

Depending on the data type and the value being pushed, these instructions fall into three groups: the const family, the push family, and the ldc instructions.

The const family pushes special constants that are implied by the instruction itself.

The push family mainly consists of bipush, which takes an 8-bit integer as its operand, and sipush, which takes a 16-bit integer.

The ldc instruction, which takes an 8-bit operand that is an index into the constant pool, comes into play when const and push are not enough.

ldc_w: takes a 16-bit index (two bytes), which gives a larger index range.
If the constant is a long or a double, use the ldc2_w instruction.
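
The choice between the three groups for an int literal can be sketched as a simple range check. PushSelector and forInt are hypothetical names for this illustration; the ranges are the ones implied by the operand widths, and the function only mimics what a compiler such as javac typically emits.

```java
class PushSelector {
    // Returns the mnemonic a compiler would typically pick for an int literal.
    static String forInt(int v) {
        if (v >= -1 && v <= 5) {
            // iconst_m1 and iconst_0..iconst_5 carry the value in the opcode itself
            return v == -1 ? "iconst_m1" : "iconst_" + v;
        }
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            return "bipush " + v; // 8-bit signed operand
        }
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            return "sipush " + v; // 16-bit signed operand
        }
        return "ldc"; // the value is stored in the constant pool
    }

    public static void main(String[] args) {
        for (int v : new int[] {-1, 5, 6, 127, 128, 32767, 32768}) {
            System.out.println(v + " -> " + forInt(v));
        }
    }
}
```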

For example.

public void pushConstLdc() {
    // range [-1, 5]
    int iconst = -1;
    // range [-128, 127]
    int bipush = 127;
    // range [-32768, 32767]
    int sipush = 32767;
    // any other int
    int ldc = 32768;
    String aconst = null;
    String ldcString = "Silent King II.";
}

Take a look at the bytecode instructions for pushConstLdc() via jclasslib.

iconst_m1: pushes -1 onto the stack. Range [-1, 5].
bipush 127: pushes 127 onto the stack. Range [-128, 127].
sipush 32767: pushes 32767 onto the stack. Range [-32768, 32767].
ldc #6 <32768>: pushes the constant 32768, at index 6 in the constant pool, onto the stack.
aconst_null: pushes null onto the stack.
ldc #7 <Silent King II.>: pushes the constant “Silent King II.”, at index 7 in the constant pool, onto the stack.

3) Remove the data at the top of the stack and load it into the local variable table

These instructions are mainly used to assign values to local variables, and they take the form xstore.

xstore_<n> (x is i, l, f, d, or a; n is 0 to 3)
xstore n (x is i, l, f, d, or a)

Once you understand xload_<n> and xload, xstore_<n> and xstore are much easier to follow: they are simply the reverse.

Let me ask you a question. Why do xstore_<n> and xload_<n> exist at all? Don't they do the same thing as xstore n and xload n?

The difference is that xstore_<n> is a bare opcode and takes up 1 byte. xstore n is an opcode plus an operand: the opcode is 1 byte and the index operand is 1 byte, 2 bytes in total (4 bytes when the wide prefix is needed for an index above 255).

Since the first few slots of the local variable table are used very often, xstore_<n> and xload_<n> make the bytecode smaller, even though they increase the number of distinct instructions.
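
The size argument can be written down as a small function. This is a sketch under the JVM specification's encoding rules: iload_<n> is a single byte, iload n is an opcode plus a one-byte index, and the wide form adds a prefix byte and a two-byte index. The class name LoadStoreSize is made up for the illustration.

```java
class LoadStoreSize {
    // Encoded length in bytes of an int load for a given local variable index.
    static int iloadLength(int index) {
        if (index < 0 || index > 0xFFFF) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        if (index <= 3) return 1;   // iload_0 .. iload_3: opcode only
        if (index <= 255) return 2; // iload n: opcode + 1-byte index
        return 4;                   // wide iload n: prefix + opcode + 2-byte index
    }

    public static void main(String[] args) {
        System.out.println(iloadLength(1));   // 1
        System.out.println(iloadLength(5));   // 2
        System.out.println(iloadLength(300)); // 4
    }
}
```

The same lengths apply to the corresponding store instructions.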

For example.

public void store(int age, String name) {
    int temp = age + 2;
    String str = name;
}

Take a look at the bytecode instructions for the store() method through jclasslib.

istore_3: pops an int from the operand stack and stores it in the local variable at index 3.
astore 4: pops a reference from the operand stack and stores it in the local variable at index 4.

You can match these against the local variable table.

02. Arithmetic instructions

Arithmetic instructions perform an operation on two values on the operand stack and push the result back onto the operand stack. They fall into two groups: integer instructions and floating-point instructions.

It is important to note that arithmetic can overflow: adding two large positive integers, for example, may well produce a negative number. The Java Virtual Machine specification does not treat integer overflow as an error; the result silently wraps around and the program reports nothing. We therefore have to be careful with additions and multiplications of large values during development.
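
A short sketch of what silent overflow looks like, and of one way to detect it. Math.addExact is a standard library method (available since Java 8) that throws ArithmeticException on overflow; the class and method names below are made up for the illustration.

```java
class OverflowDemo {
    // Plain int addition wraps around silently on overflow.
    static int wrapAdd(int a, int b) {
        return a + b;
    }

    // Reports whether a + b would overflow, using the checked library method.
    static boolean addOverflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapAdd(Integer.MAX_VALUE, 1));      // -2147483648
        System.out.println(addOverflows(Integer.MAX_VALUE, 1)); // true
        System.out.println(addOverflows(1, 2));                 // false
    }
}
```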

For floating-point operations, an overflow is represented by a signed infinity, and NaN is used when the result of an operation is not mathematically defined. Every arithmetic operation that has NaN as an operand returns NaN.

For example.

public void infinityNaN() {
    int i = 10;
    double j = i / 0.0;
    System.out.println(j); // Infinity

    double d1 = 0.0;
    double d2 = d1 / 0.0;
    System.out.println(d2); // NaN
}

Dividing any non-zero number by floating-point 0 (note: not int 0) yields Infinity.
When the non-zero dividend is replaced by 0, the result is not mathematically defined, so NaN is used.

The Java virtual machine provides two rounding modes:

Round to nearest: in floating-point operations every result must be rounded to a representable precision. An inexact result is rounded to the representable value closest to the exact one; if two representable values are equally close, the one whose least significant bit is zero is preferred (round half to even).
Round toward zero: when a floating-point number is converted to an integer, this mode picks the closest value of the target type whose magnitude is no greater than the original (that is, truncation).
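
Both modes can be observed from plain Java. In this sketch the int to float conversion illustrates round to nearest (above 2^24 a float can only represent every second integer, so odd values fall exactly between two neighbours), and the double to int cast illustrates round toward zero. The class and method names are made up for the illustration.

```java
class RoundingDemo {
    // int to float: round to nearest, ties go to the neighbour with an even last bit.
    static float toFloat(int i) {
        return i;
    }

    // double to int: round toward zero (truncation).
    static int toInt(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println((int) toFloat(16777217)); // 16777216
        System.out.println((int) toFloat(16777219)); // 16777220
        System.out.println(toInt(2.9));              // 2
        System.out.println(toInt(-2.9));             // -2
    }
}
```

Note that the two ties go in opposite directions (one down, one up), which is what distinguishes round half to even from always rounding up.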

Let me list all the arithmetic instructions:

Addition instructions: iadd, ladd, fadd, dadd
Subtraction instructions: isub, lsub, fsub, dsub
Multiplication instructions: imul, lmul, fmul, dmul
Division instructions: idiv, ldiv, fdiv, ddiv
Remainder instructions: irem, lrem, frem, drem
Increment instruction: iinc

For example.

public void calculate(int age) {
    int add = age + 1;
    int sub = age - 1;
    int mul = age * 2;
    int div = age / 3;
    int rem = age % 4;
    age++;
    age--;
}

Take a look at the bytecode instructions for the calculate() method through jclasslib.

iadd: addition
isub: subtraction
imul: multiplication
idiv: division
irem: remainder
iinc: adds 1 for the increment and -1 for the decrement

03. Type conversion instructions

Type conversions fall into two categories:

1) Widening, converting a small type to a large type along int -> long -> float -> double. The corresponding instructions are i2l, i2f, i2d, l2f, l2d, and f2d.

No precision is lost from int to long or from int to double;
Precision can be lost from int or long to float, and from long to double;
Widening conversions from byte, char, and short to int happen implicitly, which saves bytecode instructions; after all, a one-byte opcode leaves room for only 256 of them.

2) Narrowing, converting a large type to a small type. From int to byte, short, or char, the instructions are i2b, i2s, and i2c; from long to int, l2i; from float to int or long, f2i and f2l; from double to int, long, or float, d2i, d2l, and d2f.

Narrowing is likely to lose precision, since the target type has a smaller range;
The Java virtual machine does not throw a runtime exception because of this.
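
A few concrete values make the precision loss visible. This is a small sketch using ordinary Java casts, which compile to the conversion instructions named in the comments; the class and method names are made up for the illustration.

```java
class ConversionDemo {
    // i2f followed by f2i: a float has only 24 bits of precision.
    static int viaFloat(int i) {
        return (int) (float) i;
    }

    // i2b: keeps only the low 8 bits.
    static byte toByte(int i) {
        return (byte) i;
    }

    // d2i: out-of-range values saturate, NaN becomes 0, and no exception is thrown.
    static int toInt(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(viaFloat(123456789)); // 123456792
        System.out.println(toByte(300));         // 44
        System.out.println(toInt(1e20));         // 2147483647
        System.out.println(toInt(Double.NaN));   // 0
    }
}
```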

For example.

public void updown() {
    int i = 10;
    double d = i;
    
    float f = 10f;
    long ong = (long)f;
}

Take a look at the bytecode instructions for the updown() method through jclasslib.

i2d: widens int to double
f2l: narrows float to long

04. Object creation and access instructions

Java is an object-oriented programming language, so how does the Java virtual machine support it at the bytecode level?

1) Creation instructions

An array is also an object, but the bytecode instructions that create it differ from those for ordinary objects. There are three instructions for creating arrays:

newarray: creates an array of a primitive data type
anewarray: creates an array of a reference type
multianewarray: creates a multidimensional array

An ordinary object has only one creation instruction, new, which takes an operand that is an index into the constant pool and identifies the type to be created.
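
The four creation instructions map onto four kinds of Java expression. In the sketch below, the comments name the instruction that javac typically emits for each expression; you can check this on your own compiler with javap -c. The class name CreationDemo is made up for the illustration.

```java
class CreationDemo {
    static int[] primitives() {
        return new int[3];      // newarray int
    }

    static String[] references() {
        return new String[2];   // anewarray java/lang/String
    }

    static int[][] grid() {
        return new int[2][3];   // multianewarray with 2 dimensions
    }

    static Object plain() {
        return new Object();    // new, dup, invokespecial <init>
    }

    public static void main(String[] args) {
        System.out.println(primitives().length); // 3
        System.out.println(references().length); // 2
        System.out.println(grid()[1].length);    // 3
        System.out.println(plain() != null);     // true
    }
}
```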

For example.

public void newObject() {
    String name = new String("Silent King II.");
    File file = new File("The wanton man of the Wanton River. Book.");
    int[] ages = {};
}

Take a look at the bytecode instructions for the newObject() method through jclasslib.

new #13 <java/lang/String>: creates a String object.
new #15 <java/io/File>: creates a File object.
newarray 10 (int): creates an array of type int.

2) Field access instructions

Fields are either member variables or static variables, so field access instructions come in two kinds:

Accessing static variables: getstatic and putstatic.
Accessing member variables: getfield and putfield. An object has to be created before they can be accessed.

For example.

public class Writer {
    private String name;
    static String mark = "The author";

    public static void main(String[] args) {
        print(mark);
        Writer w = new Writer();
        print(w.name);
    }

    public static void print(String arg) {
        System.out.println(arg);
    }
}

Take a look at the bytecode instructions for the main() method through jclasslib.

getstatic #2 <com/itwanger/jvm/Writer.mark>: accesses the static variable mark
getfield #6 <com/itwanger/jvm/Writer.name>: accesses the member variable name

05. Method call and return instructions

There are five method call instructions, each for different scenarios:

invokevirtual: used to invoke a member method of an object; it dispatches on the actual type of the object and so supports polymorphism.
invokeinterface: used to invoke interface methods; the implementation provided by the particular object is looked up at run time.
invokespecial: used to call methods that need special handling, including constructors, private methods, and superclass methods.
invokestatic: used to invoke static methods.
invokedynamic: resolves the method referenced by the call site specifier dynamically at run time and then executes it.

For example.

import java.util.ArrayList;
import java.util.List;

public class InvokeExamples {
    private void run() {
        List ls = new ArrayList();
        ls.add("Hard roof");

        ArrayList als = new ArrayList();
        als.add("I can't learn anymore.");
    }

    public static void print() {
        System.out.println("invokestatic");
    }

    public static void main(String[] args) {
        print();
        InvokeExamples invoke = new InvokeExamples();
        invoke.run();
    }
}

Let's decompile with javap -c InvokeExamples.class.

Compiled from "InvokeExamples.java"
public class com.itwanger.jvm.InvokeExamples {
  public com.itwanger.jvm.InvokeExamples();
    Code:
       0: aload_0
       1: invokespecial #1    // Method java/lang/Object."<init>":()V
       4: return

  private void run();
    Code:
       0: new #2              // class java/util/ArrayList
       3: dup
       4: invokespecial #3    // Method java/util/ArrayList."<init>":()V
       7: astore_1
       8: aload_1
       9: ldc #4              // String Hard roof
      11: invokeinterface #5, 2   // InterfaceMethod java/util/List.add:(Ljava/lang/Object;)Z
      16: pop
      17: new #2              // class java/util/ArrayList
      20: dup
      21: invokespecial #3    // Method java/util/ArrayList."<init>":()V
      24: astore_2
      25: aload_2
      26: ldc #6              // String I can't learn anymore.
      28: invokevirtual #7    // Method java/util/ArrayList.add:(Ljava/lang/Object;)Z
      31: pop
      32: return

  public static void print();
    Code:
       0: getstatic #8        // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc #9              // String invokestatic
       5: invokevirtual #10   // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return

  public static void main(java.lang.String[]);
    Code:
       0: invokestatic #11    // Method print:()V
       3: new #12             // class com/itwanger/jvm/InvokeExamples
       6: dup
       7: invokespecial #13   // Method "<init>":()V
      10: astore_1
      11: aload_1
      12: invokevirtual #14   // Method run:()V
      15: return
}

The InvokeExamples class has four methods, including the default constructor.

1) InvokeExamples() constructor

The default constructor internally calls the constructor of the superclass Object:

invokespecial #1 // Method java/lang/Object."<init>":()V

2) The member method run()

invokeinterface #5, 2 // InterfaceMethod java/util/List.add:(Ljava/lang/Object;)Z

Since the declared type of the ls variable is the interface List, ls.add() is compiled to an invokeinterface instruction; the actual target, the add() method of ArrayList (the class implementing List), is determined at run time.

invokevirtual #7 // Method java/util/ArrayList.add:(Ljava/lang/Object;)Z

als.add() is compiled to an invokevirtual instruction because the declared type of the als variable is the class ArrayList.

3) The main() method

invokestatic #11 // Method print:()V

The print() method is static, so an invokestatic instruction is used.
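
InvokeExamples does not exercise invokespecial on a superclass method, nor invokedynamic. Here is a minimal sketch that does, assuming a Java 8 or later compiler (where lambda expressions are compiled to invokedynamic call sites). The class names are made up for the illustration, and the comments describe what javac typically emits; javap -c will show the exact output of your compiler.

```java
import java.util.function.Supplier;

class InvokeKinds {
    static class Base {
        String name() {
            return "base";
        }
    }

    static class Child extends Base {
        @Override
        String name() {
            // super.name() must bypass virtual dispatch: invokespecial
            return super.name().concat("+child");
        }
    }

    static String viaLambda() {
        // The lambda expression becomes an invokedynamic call site
        Supplier<String> s = () -> "lambda";
        // Supplier is an interface, so get() is called with invokeinterface
        return s.get();
    }

    public static void main(String[] args) {
        Base b = new Child();            // new, dup, invokespecial <init>
        System.out.println(b.name());    // invokevirtual, dispatches to Child.name()
        System.out.println(viaLambda()); // invokestatic
    }
}
```

Running main prints base+child and then lambda: the declared type of b is Base, but invokevirtual selects the override in Child.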

Method return instructions are distinguished by the return type of the method: ireturn (boolean, byte, char, short, and int), lreturn (long), freturn (float), dreturn (double), areturn (reference types), and return (void).

06. Operand stack management instructions

Common operand stack management instructions are pop, dup, and swap.

Pop one or two slots off the top of the stack and discard them: pop, pop2;
Duplicate one or two values on top of the stack and push the copies back: dup, dup2, dup_x1, dup2_x1, dup_x2, dup2_x2;
Swap the values in the two slots at the top of the stack: swap.

These instructions do not need to specify a data type because they push and pop by slot position.

For example.

public class Dup {
    int age;
    public int incAndGet() {
        return ++age;
    }
}

Look at the bytecode instructions for the incAndGet() method through jclasslib.

aload_0: pushes this onto the stack.
dup: duplicates the this reference on top of the stack.
getfield #2: pops one this reference and pushes the value of the field age, which is identified by the entry at index 2 in the constant pool.
iconst_1: pushes the constant 1 onto the stack.
iadd: pops the two values on top of the stack, adds them, and pushes the result.
dup_x1: duplicates the value on top of the stack and inserts the copy beneath this.
putfield #2: pops the top two elements (the new value and this) and assigns the value to the field age.
ireturn: pops the remaining top element and returns it.
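
The eight steps above can be traced with an explicit stack. This is a sketch only: DupTrace is a made-up class, a one-element int array stands in for the Dup object (its single cell plays the role of the age field), and java.util.ArrayDeque plays the operand stack. The bracketed comments show the stack contents after each step, bottom first.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DupTrace {
    static int incAndGet(int[] self) {
        Deque<Object> stack = new ArrayDeque<>();

        stack.push(self);                  // aload_0   [this]
        stack.push(stack.peek());          // dup       [this, this]
        int[] ref = (int[]) stack.pop();   // getfield: pop this ...
        stack.push(ref[0]);                //           ... push age   [this, age]
        stack.push(1);                     // iconst_1  [this, age, 1]
        int b = (Integer) stack.pop();
        int a = (Integer) stack.pop();
        stack.push(a + b);                 // iadd      [this, age+1]
        Object top = stack.pop();          // dup_x1: copy the top value ...
        Object second = stack.pop();
        stack.push(top);                   //           ... insert it below this
        stack.push(second);
        stack.push(top);                   //           [age+1, this, age+1]
        int value = (Integer) stack.pop(); // putfield: pop value and this
        int[] target = (int[]) stack.pop();
        target[0] = value;                 //           [age+1]
        return (Integer) stack.pop();      // ireturn
    }

    public static void main(String[] args) {
        int[] self = {41};
        System.out.println(incAndGet(self)); // 42
        System.out.println(self[0]);         // 42
    }
}
```

The trace shows why dup_x1 is needed: putfield consumes both the value and the reference, so a copy of the new value has to be tucked underneath for ireturn.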

07. Control transfer instruction

Control transfer instructions include:

Comparison instructions: compare the two values at the top of the stack and push the result of the comparison onto the stack.
Conditional jump instructions: usually paired with comparison instructions. A comparison instruction first compares the values on top of the stack, and the conditional jump then branches on the result.
Compare-and-jump instructions: combine the comparison and the jump into a single step.
Multi-way branch instructions: designed for switch-case statements.
Unconditional jump instructions: at present this mainly means goto.

1) Comparison instructions

The comparison instructions are dcmpg, dcmpl, fcmpg, fcmpl, and lcmp. Notice that there is none for int.

For double and float there are two versions of each comparison instruction because of NaN. For float these are fcmpg and fcmpl; the difference is that when either operand is NaN, fcmpg pushes 1 and fcmpl pushes -1.
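
Why two variants? With both available, a compiler can pick the one that makes any comparison involving NaN come out false. The sketch below simulates the two instructions in plain Java; FloatCompare and its methods are made-up names, and the pairing of fcmpg with "less than" and fcmpl with "greater than" reflects what javac typically does rather than anything the text above states.

```java
class FloatCompare {
    // Simulates fcmpl: pushes -1 if either operand is NaN.
    static int fcmpl(float a, float b) {
        return cmp(a, b, -1);
    }

    // Simulates fcmpg: pushes 1 if either operand is NaN.
    static int fcmpg(float a, float b) {
        return cmp(a, b, 1);
    }

    private static int cmp(float a, float b, int onNaN) {
        if (Float.isNaN(a) || Float.isNaN(b)) return onNaN;
        if (a > b) return 1;
        if (a < b) return -1;
        return 0;
    }

    // a < b: with fcmpg, NaN yields 1, so the test "result < 0" fails.
    static boolean lessThan(float a, float b) {
        return fcmpg(a, b) < 0;
    }

    // a > b: with fcmpl, NaN yields -1, so the test "result > 0" fails.
    static boolean greaterThan(float a, float b) {
        return fcmpl(a, b) > 0;
    }

    public static void main(String[] args) {
        System.out.println(lessThan(1f, 2f));           // true
        System.out.println(lessThan(Float.NaN, 2f));    // false
        System.out.println(greaterThan(Float.NaN, 2f)); // false
    }
}
```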

For example.

public void lcmp(long a, long b) {
    if(a > b){}
}

Take a look at the bytecode instructions for the lcmp() method via jclasslib.

lcmp compares two long values.

2) Conditional jump instructions

Each of these instructions (ifeq, ifne, iflt, ifge, ifgt, ifle, ifnull, ifnonnull) takes a two-byte operand, the branch offset. They all pop the top element of the stack, test whether it satisfies a certain condition, and jump to the target location if it does.

For conditional branches on long, float, and double, a comparison instruction first pushes an int result onto the operand stack, and then an int conditional jump instruction is executed.

For boolean, byte, char, short, and int, the conditional jump instructions are used directly.

For example.

public void fi() {
    int a = 0;
    if (a == 0) {
        a = 10;
    } else {
        a = 20;
    }
}

Take a look at the bytecode instructions for the fi() method through jclasslib.

3 ifne 12 (+9) means that if the value on top of the stack is not equal to 0, execution jumps to offset 12 (3 + 9), which is 12 bipush 20.

3) Compare-and-jump instructions

Instructions whose name has the character “i” after the prefix “if_” operate on int values, and those with the character “a” compare object references.

For example.

public void compare() {
    int i = 10;
    int j = 20;
    System.out.println(i > j);
}

Take a look at the bytecode instructions for the compare() method via jclasslib.

11 if_icmple 18 (+7) means: compare the two int values on top of the stack, and if the former is less than or equal to the latter, jump to offset 18 (11 + 7).

4) Multi-way branch instructions

There are two: tableswitch and lookupswitch. The former requires the case values to be contiguous; it stores only the lowest and highest values plus a table of jump offsets, so the right offset can be located immediately by indexing with the operand. The latter stores discrete case-offset pairs, sorted by case value. Each execution has to search the pairs for the matching case value and compute the jump address from the corresponding offset, so it is less efficient.
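
The two lookup strategies can be sketched as follows. SwitchLookup and its methods are made-up names, and the binary search is only one possible way for a JVM to search the sorted pairs; an implementation is free to use a different strategy.

```java
import java.util.Arrays;

class SwitchLookup {
    // tableswitch: keys low..low+offsets.length-1, found by direct indexing.
    static int tableSwitch(int key, int low, int[] offsets, int defaultOffset) {
        long i = (long) key - low; // long arithmetic avoids int overflow
        return (i >= 0 && i < offsets.length) ? offsets[(int) i] : defaultOffset;
    }

    // lookupswitch: sorted keys with parallel offsets, found by searching.
    static int lookupSwitch(int key, int[] sortedKeys, int[] offsets, int defaultOffset) {
        int i = Arrays.binarySearch(sortedKeys, key);
        return i >= 0 ? offsets[i] : defaultOffset;
    }

    public static void main(String[] args) {
        int[] table = {28, 34, 34}; // targets for the contiguous keys 1, 2, 3
        System.out.println(tableSwitch(1, 1, table, 40)); // 28
        System.out.println(tableSwitch(3, 1, table, 40)); // 34
        System.out.println(tableSwitch(9, 1, table, 40)); // 40

        int[] keys = {1, 10, 100};  // sparse keys, hypothetical targets
        int[] targets = {28, 34, 46};
        System.out.println(lookupSwitch(100, keys, targets, 52)); // 46
        System.out.println(lookupSwitch(5, keys, targets, 52));   // 52
    }
}
```

For contiguous keys the table costs one subtraction and one array access regardless of the number of cases, while the search grows with the number of pairs.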

For example.

public void switchTest(int select) {
    int num;
    switch (select) {
        case 1:
            num = 10;
            break;
        case 2:
        case 3:
            num = 30;
            break;
        default:
            num = 40;
    }
}

Take a look at the bytecode instructions for the switchTest() method through jclasslib.

Case 2 has no break, so case 2 and case 3 share the same target. Because the case values 1 to 3 are contiguous, tableswitch is used: if the value is 1, jump to offset 28; if it is 2 or 3, jump to offset 34; for default, jump to offset 40.

5) Unconditional jump instructions

The goto instruction takes a two-byte operand, which forms a signed integer that specifies the branch offset. The instruction jumps to that offset.

goto has already shown up in the earlier branching examples, so it should be easy to understand. If the offset is too large to fit in two bytes, the goto_w instruction can be used instead; it takes a 4-byte operand.

Shoulders of giants:

Segmentfault.com/a/119000003…

In addition to these instructions, there are exception handling instructions and synchronization control instructions. I'll keep you in suspense on those for now, so stay tuned ~~

The way ahead is so long without ending, yet high and low I’ll search with my will unbending.

To go further, Java bytecode is a hard piece you simply have to chew through. I hope Second Brother's sharing can help everyone ~

Previous recommendations

Second Brother has written many articles on Java on Nuggets, covering core Java syntax, the Java collections framework, Java IO, Java concurrent programming, the Java virtual machine, and more, which together form a fairly complete system.

To help more Java beginners, Second Brother open-sourced his serialized “Teach the Younger Sister to Learn Java” on GitHub. Although only 50 articles have been organized so far, they already exceed 100,000 words, and the content is easy to understand, humorous, and illustrated.

GitHub (welcome to star) : github.com/itwanger/jm…

If it is helpful, please give Second Brother a thumbs-up; it will be the strongest motivation for me to keep sharing!
