This article summarizes Chapter 6 of “Getting into the Java Virtual Machine, 3rd Edition.” You cannot really understand Java’s cross-platform nature until you understand bytecode files, and they are also a prerequisite for understanding the virtual machine’s class loading mechanism and its bytecode execution engine.

The core content of this article can be divided into two parts.

[Figure: overview of the first part]

[Figure: overview of the second part]

1. The cornerstone of independence

“Write once, run anywhere” was Java’s slogan from its birth. If every computer used a single instruction set, x86, and a single operating system, Windows, Java might never have been born. In reality, nobody wants one company to monopolize the IT industry, so different hardware architectures and operating systems will inevitably coexist for a long time. Oracle and other virtual machine vendors have released Java virtual machines for all major operating systems, which makes “write once, run anywhere” possible at the application level.

Java’s platform independence pushes high-level languages to a new level. What is the future of programming languages?

In fact, the designers of Java and its JVM have long seriously considered that programs running on the JVM may not necessarily be developed in the Java language in the future. As a result, they intentionally split the Java specification into the Java Language Specification and the Java Virtual Machine Specification. Back in 1997, the first edition of the Java Virtual Machine Specification promised that “in the future, we will extend the Java Virtual Machine appropriately to support other languages running on top of the Java Virtual Machine.”

Today that promise can be said to have been fulfilled. A number of languages now run on top of the Java virtual machine, such as Scala (the language I’m learning), Kotlin, Jython, and Groovy. Whatever the language, its compiler translates source code into bytecode that follows the Java Virtual Machine Specification, and the Java virtual machine does not care which programming language the bytecode came from.

Whether the goal is platform independence or language independence, bytecode files play the central role. The Turing-complete bytecode format is not bound to any particular language; it only defines the rules and constraints that a language must satisfy to run on the Java virtual machine. The grammar, keywords, constants, variables, operators, and other semantics of each language are all eventually expressed in the same unified format of bytecode instructions.

2. Class file structure

Why wasn’t the Class file implemented in a more readable description language like XML or some other type of character file?

<class language = "Java" version = "1.8.241">
    <className>HelloJava</className>
    <constant_info>.</constant_info>
</class>

First, character files always raise encoding concerns. Second, even though the Java Virtual Machine Specification has unified the character encoding used inside Class files, there is also the context in which Java was born: in the 1990s, network transfer was slow, and a Class file has to carry a good deal of internal information so that the code can be interpreted and run correctly. Nobody wanted a compiled Class file to be a behemoth, since that would defeat the purpose of running everywhere. A single byte can represent 256 values (0x00 to 0xFF), enough for 256 instructions, whereas a single character in a character file (a Chinese character, for example) can take up to three bytes (24 bits).

Naturally, the Java virtual machine designers chose the more compact binary form. A Class file is a binary stream organized in units of 8-bit bytes. The data items are packed tightly in a fixed order, and which bytes carry what information, and how long each item is, are strictly specified.

2.1 Data structure of Class files

According to the Java Virtual Machine Specification, the Class file format stores data in C-like pseudo-structures that contain only two kinds of data: “unsigned numbers” and “tables.” Unsigned numbers are the basic data types; u1, u2, u4, and u8 denote values 1, 2, 4, and 8 bytes long. An unsigned number can describe a number, an index reference, or a UTF-8 encoded string.

Tables are compound data types made up of multiple unsigned numbers or other tables, and their names all end with the _info suffix. The entire Class file can itself be viewed as one table that describes everything about a class; its structure is shown below. From the names you can guess what each item describes: the file version number, the constant pool, the access flags, the class itself, its parent class, its interfaces, the field tables, the method tables, and the attribute tables. The meaning of each data item is discussed later.

Type              Name                   Count
u4                magic                  1
u2                minor_version          1
u2                major_version          1
u2                constant_pool_count    1
cp_info           constant_pool          constant_pool_count-1
u2                access_flags           1
u2                this_class             1
u2                super_class            1
u2                interfaces_count       1
u2                interfaces             interfaces_count
u2                fields_count           1
field_info        fields                 fields_count
u2                methods_count          1
method_info       methods                methods_count
u2                attributes_count       1
attribute_info    attributes             attributes_count

2.2 The magic number

The first four bytes, magic, identify the file as a Class file. In hexadecimal the value is 0xCAFEBABE (which seems to hint at the origin of the Java name and its “coffee” icon; the value is said to allude to Peet’s Coffee, a famous coffee brand). The virtual machine does not accept a file as a Class file merely because it has a .class suffix, since users are free to change file name suffixes; it relies on the magic number instead. In fact, most software does not rely heavily on suffixes to tell file types apart, because identifying the type from a byte sequence inside the file is safer.

The next four bytes give the version of the Class file: minor_version is the minor version number and major_version is the major version number. The Java Virtual Machine Specification is strict about versions: a newer JDK must stay backward compatible with Class files produced by older versions, but a virtual machine must refuse to execute a Class file whose version is newer than it supports, even if the file format has not actually changed.

JDK 5 corresponds to major version 49, and the number increases by one with each major release. The following code is compiled with JDK 8:

public class TestJava {
    public static void main(String[] args) {
        int a = 1;
        int b = 2;
        int c = a + b;
        System.out.println(c);
    }
}

Opening the compiled Class file in Notepad++ with the HEX-Editor plugin shows the raw bytes:

[Figure: hex view of the compiled TestJava Class file]

The major version is 0x0034, which is 52 in decimal and corresponds exactly to JDK 8. In other words, this Class file can run on JDK 8 and above. The minor version is 0x0000; it stayed at zero from JDK 1.2 until JDK 12.
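As a minimal sketch of how these first eight bytes can be read, the following Java snippet parses magic, minor_version, and major_version from a byte array. The class name ClassHeaderDemo and its two methods are made up for illustration, and the formula "release = major_version - 44" is simply derived from the numbers above (49 for JDK 5, 52 for JDK 8).

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of any JDK API.
class ClassHeaderDemo {

    /** Reads the first eight bytes of a Class file and returns {minor_version, major_version}. */
    static int[] readVersion(byte[] classBytes) {
        ByteBuffer buffer = ByteBuffer.wrap(classBytes);   // big-endian by default, like the Class file
        int magic = buffer.getInt();                       // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a Class file, magic = 0x" + Integer.toHexString(magic));
        }
        int minor = buffer.getShort() & 0xFFFF;            // u2 minor_version (read as unsigned)
        int major = buffer.getShort() & 0xFFFF;            // u2 major_version (read as unsigned)
        return new int[] {minor, major};
    }

    /** JDK 5 is 49 and each major release adds one, so the release number is major - 44. */
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        // Magic number, minor version 0x0000 and major version 0x0034, as described above.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        int[] version = readVersion(header);
        System.out.println("minor=" + version[0] + ", major=" + version[1]
                + ", Java release=" + javaRelease(version[1]));
    }
}
```

To inspect a real file, the same method could be fed the bytes returned by java.nio.file.Files.readAllBytes for a .class file path of your choice.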

2.3 Constant pool

The constant pool is effectively the Class file’s own data warehouse. Because the number of constants is not fixed, the pool is preceded by a 2-byte count, constant_pool_count. Unlike other counts in the Class file, this one starts from 1, so the actual number of constants is constant_pool_count - 1, and the entries follow the count one after another. Index 0 is reserved to mean “refers to no constant pool entry.”

A constant pool holds two kinds of things:

The first kind is literals, such as text strings and constant values declared final.

The second kind is symbolic references, which include:

  • Packages exported or opened by modules;
  • Fully qualified names of classes and interfaces;
  • Names and descriptors of fields;
  • Names and descriptors of methods;
  • Method handles and method types (Method Handle, Method Type, Invoke Dynamic);
  • Dynamically computed call sites and dynamically computed constants (Dynamically-Computed Call Site, Dynamically-Computed Constant).

Unlike C/C++, Java has no separate link step at compile time. The virtual machine links dynamically when it loads a Class file, so symbolic references are translated into real memory addresses only at class loading time or at run time.

Each constant in the constant pool is a table named CONSTANT_XXX_info, where XXX is the type of the constant. As of JDK 13 there are 17 kinds of constant pool tables. Each has its own structure and describes different things, but every table begins with a u1 tag that identifies its type.

Type                                Tag    Description
CONSTANT_Utf8_info                  1      UTF-8 encoded string
CONSTANT_Integer_info               3      Integer literal
CONSTANT_Float_info                 4      Floating-point literal
CONSTANT_Long_info                  5      Long integer literal
CONSTANT_Double_info                6      Double-precision floating-point literal
CONSTANT_Class_info                 7      Symbolic reference to a class or interface
CONSTANT_String_info                8      String literal
CONSTANT_Fieldref_info              9      Symbolic reference to a field
CONSTANT_Methodref_info             10     Symbolic reference to a method of a class
CONSTANT_InterfaceMethodref_info    11     Symbolic reference to a method of an interface
CONSTANT_NameAndType_info           12     Partial symbolic reference (name and descriptor) to a field or method
CONSTANT_MethodHandle_info          15     Method handle
CONSTANT_MethodType_info            16     Method type
CONSTANT_Dynamic_info               17     Dynamically computed constant
CONSTANT_InvokeDynamic_info         18     Dynamically computed call site
CONSTANT_Module_info                19     Module
CONSTANT_Package_info               20     Package exported or opened by a module

Take CONSTANT_Utf8_info as an example: the names of the methods and fields in a Class file are stored in it in an abbreviated (modified) UTF-8 encoding.

Type    Name      Count
u1      tag       1
u2      length    1
u1      bytes     length

Because length is a u2, the encoded string can be at most 65535 bytes long. A method or field name longer than 64 KB therefore cannot be compiled. String literals, in addition, are represented by a separate CONSTANT_String_info entry.
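The layout of a CONSTANT_Utf8_info entry and its 64 KB limit can be sketched with java.io.DataOutputStream, whose writeUTF method happens to emit a 2-byte length followed by modified UTF-8 bytes and rejects strings whose encoded form exceeds 65535 bytes. The class Utf8EntryDemo below is a made-up illustration under that assumption, not how javac itself writes the constant pool.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper for illustration only.
class Utf8EntryDemo {

    /** Encodes text as u1 tag + u2 length + bytes; returns null if it does not fit into a u2 length. */
    static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(1);       // u1 tag: 1 means CONSTANT_Utf8_info
            out.writeUTF(text);     // u2 length, then `length` bytes of modified UTF-8
        } catch (UTFDataFormatException e) {
            return null;            // encoded form is longer than 65535 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] entry = encode("main");
        System.out.println(Arrays.toString(entry)); // [1, 0, 4, 109, 97, 105, 110]

        char[] tooLong = new char[70_000];
        Arrays.fill(tooLong, 'x');
        System.out.println(encode(new String(tooLong)) == null); // true
    }
}
```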

Even with a tool like Notepad++, parsing a Class file item by item from the byte stream is a struggle. You can use the javap tool directly to parse the TestJava class from the previous section:

Constant pool:
   #1 = Methodref          #5.#14         // java/lang/Object."<init>":()V
   #2 = Fieldref           #15.#16        // java/lang/System.out:Ljava/io/PrintStream;
   #3 = Methodref          #17.#18        // java/io/PrintStream.println:(I)V
   #4 = Class              #19            // TestJava
   #5 = Class              #20            // java/lang/Object
   #6 = Utf8               <init>
   #7 = Utf8               ()V
   #8 = Utf8               Code
   #9 = Utf8               LineNumberTable
  #10 = Utf8               main
  #11 = Utf8               ([Ljava/lang/String;)V
  #12 = Utf8               SourceFile
  #13 = Utf8               TestJava.java
  #14 = NameAndType        #6:#7          // "<init>":()V
  #15 = Class              #21            // java/lang/System
  #16 = NameAndType        #22:#23        // out:Ljava/io/PrintStream;
  #17 = Class              #24            // java/io/PrintStream
  #18 = NameAndType        #25:#26        // println:(I)V
  #19 = Utf8               TestJava
  #20 = Utf8               java/lang/Object
  #21 = Utf8               java/lang/System
  #22 = Utf8               out
  #23 = Utf8               Ljava/io/PrintStream;
  #24 = Utf8               java/io/PrintStream
  #25 = Utf8               println
  #26 = Utf8               (I)V

The constant pool also stores “extra” constants that never appear in the source code, such as <init>, ()V, and (I)V. These constants are referenced by the field tables (field_info), method tables (method_info), and attribute tables (attribute_info). Field names and method names, for example, cannot be described by flag bits, so such variable values are stored in CONSTANT_Utf8_info entries.

2.4 Access Flags

The two bytes right after the constant pool are access_flags, which record class-level or interface-level access information: whether this is a class or an interface, whether it is public, abstract, or final, and so on. The javap -v command shows the following:

public class TestJava
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool: ....

ACC_PUBLIC in flags indicates that the class is public. In addition, every Class file compiled by JDK 1.0.2 or later carries the ACC_SUPER flag.

2.5 Class index, parent class index, and interface index collection

The class index (this_class) and the parent class index (super_class) are each a single u2 value, while the interface index collection (interfaces) is a set of u2 values. Together these three describe the class’s inheritance relationships. Java does not allow multiple inheritance, so every class except java.lang.Object has exactly one parent class and a nonzero super_class index (only java.lang.Object may use 0). Java does allow a class to implement multiple interfaces, and they are stored in the interface index collection in the left-to-right order in which they are written in the source. If interfaces_count is 0, the class implements no interfaces.

2.6 Collection of field tables

The field table collection records all fields declared in the class. This includes class-level and instance-level variables, but not local variables declared inside methods. The modifiers allowed on a field include the access modifiers, static, final, volatile, and transient. Besides modifiers, a field has a data type and a name.

Each field is stored in its own field table (field_info). The modifiers form a fixed set and can be represented simply by flag bits, but data types and names are not fixed, so they are described by referencing constants in the constant pool.

Type              Name                Count
u2                access_flags        1
u2                name_index          1
u2                descriptor_index    1
u2                attributes_count    1
attribute_info    attributes          attributes_count

Each field can carry an attributes_count and an attributes table that describe additional information about the variable. The possible flag bits of access_flags are:

Flag name        Value     Meaning
ACC_PUBLIC       0x0001    Whether the field is public
ACC_PRIVATE      0x0002    Whether the field is private
ACC_PROTECTED    0x0004    Whether the field is protected
ACC_STATIC       0x0008    Whether the field is static
ACC_FINAL        0x0010    Whether the field is final
ACC_VOLATILE     0x0040    Whether the field is volatile
ACC_TRANSIENT    0x0080    Whether the field is transient
ACC_SYNTHETIC    0x1000    Whether the field is generated automatically by the compiler
ACC_ENUM         0x4000    Whether the field is an enum constant

Because of Java’s syntax rules, at most one of the three access flags ACC_PUBLIC, ACC_PRIVATE, and ACC_PROTECTED can be set. name_index and descriptor_index are both references into the constant pool: name_index refers to the simple name of the field, and descriptor_index refers to its descriptor, which describes the field’s data type.
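Since access_flags is just a bit mask, decoding it is a matter of testing each bit. The sketch below does this for the field flags in the table above; the class name FieldFlagsDemo and its decode method are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for the field access flags listed in the table above.
class FieldFlagsDemo {
    private static final int[] VALUES = {
        0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0040, 0x0080, 0x1000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_PRIVATE", "ACC_PROTECTED", "ACC_STATIC", "ACC_FINAL",
        "ACC_VOLATILE", "ACC_TRANSIENT", "ACC_SYNTHETIC", "ACC_ENUM"
    };

    /** Returns the names of all flags whose bit is set in accessFlags. */
    static List<String> decode(int accessFlags) {
        List<String> flags = new ArrayList<>();
        for (int i = 0; i < VALUES.length; i++) {
            if ((accessFlags & VALUES[i]) != 0) {
                flags.add(NAMES[i]);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // 0x0019 = 0x0001 | 0x0008 | 0x0010, i.e. a "public static final" field
        System.out.println(decode(0x0019)); // [ACC_PUBLIC, ACC_STATIC, ACC_FINAL]
    }
}
```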

The data types of fields, method parameters, and return values are each represented by a single capital letter; the “V” above, for example, indicates that the method returns void:

Descriptor character           Meaning
B                              Primitive type byte
C                              Primitive type char
D                              Primitive type double
F                              Primitive type float
I                              Primitive type int
J                              Primitive type long
S                              Primitive type short
Z                              Primitive type boolean
V                              Special type void
L{fully qualified name};       Object type

Here is the idea of a descriptor:

For a field, the descriptor is simply the type character of its data type. For a method, the descriptor is the parameter descriptors packed inside parentheses, followed by the return descriptor: (parameter descriptors)return descriptor. For example:

// ([Ljava/lang/String;)V
public static void main(String[] args) {... }

// (ILjava/lang/Integer;)I
public int add(int i1, Integer i2) {... }

An object type is written as its fully qualified name (with / in place of .), prefixed with “L” and terminated with “;”, such as Ljava/lang/String; or Ljava/lang/Object;. The symbol “V” is listed as VoidDescriptor in the Java Virtual Machine Specification and can obviously appear only as a method’s return type. Arrays in Java are object types in their own right, and an n-dimensional array is written with n leading [ characters, such as [Ljava/lang/String; and [[Ljava/lang/String;.
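One convenient way to check a descriptor by hand is java.lang.invoke.MethodType, which can print the descriptor of a method type and parse one back. The snippet below is only a small experiment with that API; the wrapper class DescriptorDemo and its descriptorOf helper are names invented for this example.

```java
import java.lang.invoke.MethodType;

// Small experiment with java.lang.invoke.MethodType; the wrapper class is hypothetical.
class DescriptorDemo {

    /** Builds the method descriptor for the given return type and parameter types. */
    static String descriptorOf(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // public static void main(String[] args)
        System.out.println(descriptorOf(void.class, String[].class));           // ([Ljava/lang/String;)V

        // public int add(int i1, Integer i2)
        System.out.println(descriptorOf(int.class, int.class, Integer.class));  // (ILjava/lang/Integer;)I

        // Parsing works in the other direction: a two-dimensional String array and a long, returning double.
        MethodType parsed = MethodType.fromMethodDescriptorString("([[Ljava/lang/String;J)D", null);
        System.out.println(parsed.parameterType(0) == String[][].class);        // true
        System.out.println(parsed.returnType());                                // double
    }
}
```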

The field table collection does not list fields inherited from a parent class or parent interface, but it may contain fields that do not exist in the source code. For example, the compiler adds a field referencing the outer class instance to an inner class so that the inner class can access the outer class freely. The Class file format is also more tolerant of duplicate field names than the Java language: two fields may share a name as long as their descriptors differ, whereas the Java language itself does not support field overloading.

2.7 Collection of method tables

The method table collection records all the methods inside the Class file, and each method is stored in a separate method table.

Type              Name                Count
u2                access_flags        1
u2                name_index          1
u2                descriptor_index    1
u2                attributes_count    1
attribute_info    attributes          attributes_count

A method also has access flags, a name index, a descriptor index, and an attribute table, so the method table and the field table are structurally identical. Only the access flags differ in part, because some modifiers apply only to methods or only to fields.

Flag name           Value     Meaning
ACC_PUBLIC          0x0001    Whether the method is public
ACC_PRIVATE         0x0002    Whether the method is private
ACC_PROTECTED       0x0004    Whether the method is protected
ACC_STATIC          0x0008    Whether the method is static
ACC_FINAL           0x0010    Whether the method is final
ACC_SYNCHRONIZED    0x0020    Whether the method is synchronized
ACC_BRIDGE          0x0040    Whether the method is a bridge method generated by the compiler
ACC_VARARGS         0x0080    Whether the method accepts a variable number of arguments
ACC_NATIVE          0x0100    Whether the method is native
ACC_ABSTRACT        0x0400    Whether the method is abstract
ACC_STRICT          0x0800    Whether the method is strictfp (strict floating point)
ACC_SYNTHETIC       0x1000    Whether the method is generated automatically by the compiler

If a subclass does not explicitly override a method of its parent class, that method does not appear in the method table collection of the subclass’s Class file. On the other hand, some methods are generated by the compiler, such as the default no-argument instance constructor <init> or the class constructor <clinit>.

In the Java Language Specification, a method signature consists of the method name plus the number, types, and order of its parameters, but not the return type. This is why Java cannot overload methods on the return type alone. The signature used in the Java Virtual Machine Specification is broader: two methods with the same name and parameters can coexist in a Class file as long as their descriptors, which include the return type, are not identical.

2.8 Attribute table collection

The Class file, field tables, and method tables can each carry an attributes_count and an attributes collection that describe additional information; each item in the collection is a separate attribute. Every attribute has the following general structure:

Type    Name                    Count
u2      attribute_name_index    1
u4      attribute_length        1
u1      info                    attribute_length

Each attribute has a name, given by an index to a CONSTANT_Utf8_info entry in the constant pool. The only requirement on any attribute is that attribute_length states how many bytes its info part occupies; the content of info is defined by the attribute itself. An attribute can also nest its own attribute table collection, as the Code attribute does.

In the Java SE 12 edition of the Java Virtual Machine Specification, the number of predefined attributes has grown to 29. Attributes are numerous and varied, and they can appear wherever additional information is needed (in field tables, in method tables, or on the Class file itself). Here I introduce only three important attributes that relate to describing methods.

2.8.1 Code attribute

A method’s access flags, descriptor, and name are stored as separate items in its method table, while its body is recorded as bytecode instructions in the Code attribute of the method table. Abstract methods have no Code attribute.

Type              Name                      Count
u2                attribute_name_index      1
u4                attribute_length          1
u2                max_stack                 1
u2                max_locals                1
u4                code_length               1
u1                code                      code_length
u2                exception_table_length    1
exception_info    exception_table           exception_table_length
u2                attributes_count          1
attribute_info    attributes                attributes_count

attribute_name_index points to a CONSTANT_Utf8_info constant holding the attribute’s name, which is always “Code”. attribute_length gives the length of the attribute value; since attribute_name_index and attribute_length together take 6 bytes, the value occupies the length of the whole attribute table minus 6 bytes.

max_stack is the maximum depth the operand stack can reach while the method executes. At run time the virtual machine sizes the operand stack in the method’s stack frame according to this value.

max_locals is the storage required by the local variable table, measured in variable slots. A slot is the smallest unit the virtual machine uses when allocating memory for local variables. Data types no longer than 32 bits (byte, char, float, int, short, boolean, reference, and returnAddress) take one slot each, while the 64-bit types long and double take two.

Besides the local variables declared in a method, the method parameters (including the hidden this parameter of every instance method) and the exception parameters of explicit handlers (the exception caught in the catch clause of a try-catch block) are also kept in the local variable table. The javac compiler reuses slots whose variables have gone out of scope to save space, so max_locals is not simply the sum of the space taken by all variables.
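The slot accounting for parameters can be sketched as follows. This only counts the slots taken by the parameters (plus this for instance methods); it does not model locals declared in the method body or the slot reuse performed by javac. The class ParameterSlotsDemo and its method are hypothetical names for this illustration.

```java
import java.lang.invoke.MethodType;

// Illustrative helper; the class and method names are made up for this sketch.
class ParameterSlotsDemo {

    /** Counts the local variable slots occupied by a method's parameters. */
    static int parameterSlots(String descriptor, boolean isStatic) {
        MethodType type = MethodType.fromMethodDescriptorString(descriptor, null);
        int slots = isStatic ? 0 : 1;                 // instance methods keep `this` in slot 0
        for (Class<?> parameter : type.parameterList()) {
            // long and double take two slots, every other type takes one
            slots += (parameter == long.class || parameter == double.class) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(parameterSlots("([Ljava/lang/String;)V", true));    // 1
        System.out.println(parameterSlots("(JD)V", false));                    // 5
        System.out.println(parameterSlots("(ILjava/lang/Integer;)I", false));  // 3
    }
}
```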

code_length and code hold the bytecode instructions compiled from the source code as a byte stream. Each instruction’s opcode is one byte, so at most 256 instructions can be defined; the Java Virtual Machine Specification currently defines about 200 of them.

Bytecode instructions can be read directly as mnemonics with the javap tool. code_length is a u4 value, so in theory a method’s bytecode could be up to 2^32 bytes long, but the Java Virtual Machine Specification actually limits it to 65535 bytes, which is the range of a u2.

Taking the TestJava class as an example, javap shows the following for the main method:

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=4, args_size=1
         0: iconst_1
         1: istore_1
         2: iconst_2
         3: istore_2
         4: iload_1
         5: iload_2
         6: iadd
         7: istore_3
         8: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
        11: iload_3
        12: invokevirtual #3                  // Method java/io/PrintStream.println:(I)V
        15: return
      LineNumberTable:
        line 9: 0
        line 10: 2
        line 11: 4
        line 12: 8
        line 13: 15
}

In the bytecode, offsets 0 to 1 and 2 to 3 push the constants 1 and 2 and store them into the local variables a and b. Offsets 4 to 6 load both variables and add them, and istore_3 at offset 7 saves the result into c. At offsets 8 to 12, as the comments show, System.out.println is called and the result is printed to the console. (For the meaning of each instruction, see the section on load and store instructions later.)

The main method takes one argument of type [Ljava/lang/String;, so args_size=1. For an instance method the minimum args_size is 1, because the compiler always passes the this reference implicitly so that instance fields can be accessed freely inside the method; for a class (static) method with no parameters, args_size is 0.

LineNumberTable records the correspondence between Java source line numbers and bytecode offsets; it is covered later.

If the source code catches exceptions with a try-catch statement, the compiled bytecode is followed by an exception table (exception_table), which is part of the Code attribute. The following code demonstrates this:

public static void main(String[] args) {
    try {
        int a = 1;
        int b = 2;
        int c = a / b;
        System.out.println(c);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

It compiles as follows:

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=4, args_size=1
         0: iconst_1
         1: istore_1
         2: iconst_2
         3: istore_2
         4: iload_1
         5: iload_2
         6: idiv
         7: istore_3
         8: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
        11: iload_3
        12: invokevirtual #3                  // Method java/io/PrintStream.println:(I)V
        15: goto          23
        18: astore_1
        19: aload_1
        20: invokevirtual #5                  // Method java/lang/Exception.printStackTrace:()V
        23: return
      Exception table:
         from    to  target type
             0    15    18   Class java/lang/Exception

The exception table reads as follows: if an exception of type java/lang/Exception is thrown between offsets 0 and 15 (15 itself excluded), execution jumps to offset 18, where the handler calls java.lang.Exception::printStackTrace. Otherwise the goto at offset 15 jumps straight to offset 23 and the method returns.

The from, to, target, and type columns of each row correspond to the following structure of an exception_table entry:

Type    Name          Count
u2      start_pc      1
u2      end_pc        1
u2      handler_pc    1
u2      catch_type    1
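How such a table is consulted can be sketched in a few lines of Java. This is a simplified model, not the virtual machine’s implementation: the Entry class and findHandler method are invented for this example, and catch_type is represented directly as a Class object instead of a constant pool index.

```java
// Simplified model of an exception table lookup; all names here are hypothetical.
class ExceptionTableDemo {

    /** One exception_table entry, with catch_type modelled as a Class instead of a pool index. */
    static final class Entry {
        final int startPc;
        final int endPc;
        final int handlerPc;
        final Class<? extends Throwable> catchType;

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    /** Returns the handler_pc of the first matching entry, or -1 if the exception is not handled here. */
    static int findHandler(Entry[] table, int pc, Throwable thrown) {
        for (Entry entry : table) {
            boolean inRange = pc >= entry.startPc && pc < entry.endPc;  // start inclusive, end exclusive
            if (inRange && entry.catchType.isInstance(thrown)) {
                return entry.handlerPc;
            }
        }
        return -1;  // no handler in this method: the exception propagates to the caller
    }

    public static void main(String[] args) {
        // The single row from the javap output above: from=0, to=15, target=18, type=java/lang/Exception
        Entry[] table = {new Entry(0, 15, 18, Exception.class)};
        System.out.println(findHandler(table, 6, new ArithmeticException()));   // 18 (idiv at offset 6)
        System.out.println(findHandler(table, 20, new ArithmeticException()));  // -1 (outside the range)
    }
}
```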

2.8.2 Exceptions attribute

The Exceptions attribute sits in the method table at the same level as the Code attribute. Do not confuse it with the exception_table described above: the Exceptions attribute records the checked exceptions that a method explicitly declares with the throws keyword, whereas exception_table describes the try-catch handlers inside the method body. Here is a method declared to throw IOException:

public void io() throws IOException {}

It compiles as follows:

  public void io() throws java.io.IOException;
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=0, locals=1, args_size=1
         0: return
      LineNumberTable:
        line 23: 0
    Exceptions:
      throws java.io.IOException

A throws clause may list several exceptions, and all of them are recorded in the Exceptions attribute. Each entry of exception_index_table is an index to a CONSTANT_Class_info constant that identifies the type of one checked exception.

Type    Name                     Count
u2      attribute_name_index     1
u4      attribute_length         1
u2      number_of_exceptions     1
u2      exception_index_table    number_of_exceptions

2.8.3 LineNumberTable attribute

The LineNumberTable attribute, which appeared in the javap output above, is a sub-attribute of the Code attribute. It is optional, but the compiler generates it by default. If you turn its generation off, the main consequences are that stack traces of exceptions thrown at run time cannot show source line numbers, and that the IDE cannot set breakpoints by source line.

Type                Name                        Count
u2                  attribute_name_index        1
u4                  attribute_length            1
u2                  line_number_table_length    1
line_number_info    line_number_table           line_number_table_length

Each entry is a line_number_info structure containing two u2 items: start_pc (the offset of a bytecode instruction) and line_number (the corresponding source line number).
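Mapping a bytecode offset back to a source line then amounts to finding the entry with the largest start_pc that does not exceed the offset. The sketch below uses a TreeMap for that lookup with the LineNumberTable of TestJava.main shown earlier; the class LineNumberDemo and the method lineFor are illustrative names only.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative lookup over a LineNumberTable; class and method names are hypothetical.
class LineNumberDemo {

    /** Returns the source line for a bytecode offset, or -1 if the table has no entry at or before it. */
    static int lineFor(TreeMap<Integer, Integer> lineNumberTable, int pc) {
        Map.Entry<Integer, Integer> entry = lineNumberTable.floorEntry(pc);  // greatest start_pc <= pc
        return entry == null ? -1 : entry.getValue();
    }

    /** Builds the table from the javap output of TestJava.main (start_pc -> line_number). */
    static TreeMap<Integer, Integer> testJavaTable() {
        TreeMap<Integer, Integer> table = new TreeMap<>();
        table.put(0, 9);
        table.put(2, 10);
        table.put(4, 11);
        table.put(8, 12);
        table.put(15, 13);
        return table;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> table = testJavaTable();
        System.out.println(lineFor(table, 6));   // 11 (iadd belongs to the line starting at offset 4)
        System.out.println(lineFor(table, 12));  // 12 (invokevirtual println)
    }
}
```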

2.9 Summary

The mapping between a Java source file and its Class file can be expressed roughly as follows:

[Figure: mapping between a Java source file and the Class file structure]

Even this intricate diagram covers only a small part of the Class file structure. There are many other attributes, each with its own scope and purpose. For reasons of length, only a few are listed here:

The SourceFile attribute records the name of the source file from which the Class file was generated;

The InnerClasses attribute records the association between inner classes and their host class;

The Signature attribute records generic signature information, which makes it possible to retrieve generic types at run time;

The Synthetic attribute marks methods, fields, or classes that were generated directly by the compiler;

The BootstrapMethods attribute is related to the invokedynamic bytecode instruction;

The StackMapTable attribute is used by the new type-checking verifier during the bytecode verification phase of class loading.

The last two, BootstrapMethods and StackMapTable, involve complex functionality that I will cover in future articles.

3. Bytecode instructions

A bytecode instruction consists of a one-byte opcode followed by zero or more operands that the instruction requires. Because the opcode is limited to one byte, the virtual machine can have at most 256 bytecode instructions. In addition, because the Java virtual machine is built around an operand stack rather than registers, most instructions consist of an opcode alone, with no operands.

The work of the Java Virtual Machine interpreter can be explained by the following basic execution model:

do {
    advance the PC register by 1;
    fetch the opcode at the position indicated by the PC;
    if (the opcode has operands) fetch the operands;
    execute the operation defined by the opcode;
} while (remaining length of the bytecode stream > 0);
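To make this loop concrete, here is a toy interpreter in Java that supports just enough opcodes to run the arithmetic part of TestJava.main (offsets 0 to 7, followed by return). It is a teaching sketch under heavy simplifications: only int values, no method calls, no verification. The class MiniInterpreter and its run method are invented for this example; the opcode values are the ones defined by the Java Virtual Machine Specification.

```java
// Toy interpreter for a handful of int opcodes; a sketch, not how a real JVM is implemented.
class MiniInterpreter {

    /** Executes the bytecode and returns the local variable table when `return` is reached. */
    static int[] run(byte[] code, int maxStack, int maxLocals) {
        int[] stack = new int[maxStack];    // operand stack
        int[] locals = new int[maxLocals];  // local variable table
        int sp = 0;                         // index of the next free stack slot
        int pc = 0;                         // program counter
        while (pc < code.length) {
            int opcode = code[pc++] & 0xFF; // fetch the opcode and advance the PC
            switch (opcode) {
                case 0x04: stack[sp++] = 1; break;                           // iconst_1
                case 0x05: stack[sp++] = 2; break;                           // iconst_2
                case 0x10: stack[sp++] = code[pc++]; break;                  // bipush <byte> (one operand)
                case 0x1b: case 0x1c: case 0x1d:
                    stack[sp++] = locals[opcode - 0x1a]; break;              // iload_1 .. iload_3
                case 0x3c: case 0x3d: case 0x3e:
                    locals[opcode - 0x3b] = stack[--sp]; break;              // istore_1 .. istore_3
                case 0x60: {                                                 // iadd
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case 0xb1: return locals;                                    // return
                default:
                    throw new IllegalArgumentException("Unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        // iconst_1, istore_1, iconst_2, istore_2, iload_1, iload_2, iadd, istore_3, return
        byte[] code = {0x04, 0x3c, 0x05, 0x3d, 0x1b, 0x1c, 0x60, 0x3e, (byte) 0xb1};
        int[] locals = run(code, 2, 4);     // stack=2, locals=4 as reported by javap
        System.out.println(locals[3]);      // 3
    }
}
```

Note that only bipush reads an operand byte here; every other supported instruction is a bare opcode, which matches the observation above that most instructions carry no operands.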

Most bytecode instructions encode the data type they operate on in their mnemonic. For example, the i in iload (0x15) means that int data is loaded from the local variable table onto the operand stack. Likewise l stands for long, s for short, b for byte, c for char, f for float, d for double, and a for reference.

Because the opcode is only one byte long, there is not enough room to give every data type its own version of every operation, so the instruction set is deliberately designed to be non-orthogonal (that is, not every combination of data type and operation has a dedicated instruction). Where necessary, separate instructions convert unsupported types into supported ones.

In fact, most instructions do not operate directly on byte, char, or short, and none supports boolean. The compiler converts byte and short data to int by sign extension, and boolean and char data to int by zero extension.
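The difference between sign extension and zero extension is easy to observe from plain Java, since widening a byte, short, or char to int follows the same rules. The class ExtensionDemo below is a hypothetical name for this small demonstration.

```java
// Demonstrates how narrow integral types are widened to int; the class name is made up.
class ExtensionDemo {

    static int widen(byte value)  { return value; }  // sign-extended: the top bit is copied
    static int widen(short value) { return value; }  // sign-extended
    static int widen(char value)  { return value; }  // zero-extended: high bits become 0

    public static void main(String[] args) {
        System.out.println(widen((byte) 0xFF));     // -1
        System.out.println(widen((short) 0x8000));  // -32768
        System.out.println(widen('\uFFFF'));        // 65535
    }
}
```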

Thus most bytecode instructions can be thought of as operating on only five kinds of data: i, l, f, d, and a. The instructions are introduced below by function.

3.1 Loading and storing instructions

These instructions transfer data back and forth between the local variable table of a stack frame and the operand stack. They include:

Load a local variable onto the operand stack: ?load, ?load_<n>;

Load a constant onto the operand stack: bipush, sipush, ldc, ldc_w, ldc2_w, aconst_null, iconst_m1, iconst_<i>, lconst_<l>, fconst_<f>, dconst_<d>;

Store a value from the operand stack into the local variable table: ?store, ?store_<n>.

Here ? stands for one of the operation types i, l, f, d, a. The _<n> suffix stands for _0, _1, and so on; iload_0, for example, loads the int value in local variable slot 0 onto the operand stack. Such opcodes express their operand implicitly, so no additional operand needs to be supplied.

3.2 Arithmetic instructions

Arithmetic instructions mainly cover the four basic operations, negation, bitwise operations, and a few others:

Addition ?add, subtraction ?sub, multiplication ?mul, division ?div, remainder ?rem, and negation ?neg, where ? is one of i, l, f, d;

Shift instructions ?shl, ?shr, ?ushr, bitwise OR ?or, bitwise AND ?and, and bitwise XOR ?xor, where ? is only i or l;

Local variable increment instruction iinc;

Comparison instructions dcmpg, dcmpl, fcmpg, fcmpl, lcmp.

Data operations can overflow, as in the following code:

int a = Integer.MAX_VALUE;
int b = Integer.MAX_VALUE;
// The result is -2
System.out.println(a + b);

This is because a finite number of bits does not fully represent the calculated value, so a mathematically incorrect result is obtained. The solution is to store data in longer bits:

int a = Integer.MAX_VALUE;
int b = Integer.MAX_VALUE;

long la = a;
long lb = b;
// 2^32 - 2 = 4294967294
System.out.println(la + lb);

The Java Virtual Machine Specification does not require an exception to be thrown when arithmetic overflows. Only the integer division instructions (idiv, ldiv) and remainder instructions (irem, lrem) throw ArithmeticException, and only when the divisor is 0; no other arithmetic instruction should throw a runtime exception. For floating-point arithmetic, the virtual machine strictly follows the IEEE 754 rules for denormalized floating-point numbers (see the footnotes) and gradual underflow, and rounds each result to the appropriate precision.
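The following sketch contrasts the cases described above. The class ArithmeticDemo and its method names are made up for illustration; Math.addExact is the standard library method that checks for overflow in library code, since the bytecode instruction itself does not.

```java
// Shows which arithmetic operations throw and which silently wrap.
class ArithmeticDemo {
    // Integer division by zero (idiv) throws ArithmeticException.
    static boolean intDivisionThrows() {
        int zero = 0;
        try {
            int unused = 1 / zero;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Floating-point division by zero (ddiv) follows IEEE 754: no exception.
    static double doubleDivision() {
        double zero = 0.0;
        return 1.0 / zero;
    }

    // Plain iadd wraps around on overflow without any exception.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Math.addExact performs the same addition but throws on overflow.
    static boolean exactAddThrows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }
}
```

When silent wrap-around is not acceptable, one option is to switch to a wider type as shown earlier; another is to use the exact-arithmetic helpers in java.lang.Math.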

3.3 Type conversion instructions

These instructions convert between numeric types, usually because the source code contains an explicit cast, or to work around the fact that the instruction set has no dedicated instructions for some data types. The virtual machine directly supports widening conversions, from a smaller range to a larger one (no explicit cast is required), namely:

  • int => long, float, double;
  • long => float, double;
  • float => double;

Narrowing conversions, by contrast, need explicit conversion instructions of the form ?2?, namely i2b, i2c, i2s, l2i, f2i, f2l, d2i, d2l and d2f. Narrowing between integer types simply discards the high-order bits; converting a floating-point value to an integer type rounds toward zero, maps NaN to 0 and clamps out-of-range values to the limits of the target type; d2f uses IEEE 754 round-to-nearest. These conversions can lose precision and can overflow or underflow, but the virtual machine never throws a runtime exception for them.
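A short sketch of these conversions in Java source follows. The class ConversionDemo and its method names are hypothetical; the comment on each cast names the instruction that javac typically emits for it.

```java
// Narrowing and widening conversions; none of them throws at run time.
class ConversionDemo {
    static byte intToByte(int v)         { return (byte) v; }   // i2b: keeps the low 8 bits
    static short intToShort(int v)       { return (short) v; }  // i2s: keeps the low 16 bits
    static char intToChar(int v)         { return (char) v; }   // i2c: low 16 bits, unsigned
    static int longToInt(long v)         { return (int) v; }    // l2i: keeps the low 32 bits
    static int doubleToInt(double v)     { return (int) v; }    // d2i: toward zero, clamped
    static float doubleToFloat(double v) { return (float) v; }  // d2f: round to nearest
    static float intToFloat(int v)       { return v; }          // i2f: widening, may lose precision
}
```

For example, intToByte(200) yields -56, doubleToInt(Double.NaN) yields 0, and doubleToInt(1e20) yields Integer.MAX_VALUE. Even the widening conversion intToFloat(16777217) loses the lowest bit, because a float has only 24 bits of significand.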

3.4 Object creation and access instructions

Class instances and arrays are both objects in Java, but the virtual machine creates them with different bytecode instructions (the creation process for objects and arrays also differs in detail). Further instructions access fields, array elements and so on.

Create a class instance: new;

Create an array: newarray, anewarray, multianewarray;

Access fields: getfield, putfield, getstatic, putstatic;

Load an array element onto the operand stack: ?aload; store a value from the operand stack into an array element: ?astore. Here ? is any of the eight prefixes b, c, s, i, l, f, d, a;

Get the length of an array: arraylength;

Check the type of a class instance: instanceof, checkcast.
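The instructions listed above map quite directly onto everyday Java statements. The sketch below is only an illustration: the class ObjectAccessDemo is invented, and the comments name the instructions javac typically emits, which can be checked with javap -c.

```java
// Each statement is annotated with the instruction javac typically emits for it.
class ObjectAccessDemo {
    static int created = 0;          // static field: getstatic / putstatic
    int value;                       // instance field: getfield / putfield

    ObjectAccessDemo(int value) {
        this.value = value;          // putfield
        created++;                   // getstatic, iconst_1, iadd, putstatic
    }

    static int run() {
        Object obj = new ObjectAccessDemo(7);            // new, dup, invokespecial <init>
        int[] ints = new int[3];                         // newarray int
        String[] names = new String[2];                  // anewarray java/lang/String
        int[][] grid = new int[2][4];                    // multianewarray
        ints[0] = 5;                                     // iastore
        names[1] = "jvm";                                // aastore
        int sum = ints[0] + grid[1].length;              // iaload, aaload, arraylength
        if (obj instanceof ObjectAccessDemo) {           // instanceof
            ObjectAccessDemo d = (ObjectAccessDemo) obj; // checkcast
            sum += d.value;                              // getfield
        }
        return sum + names.length;                       // arraylength
    }
}
```

ObjectAccessDemo.run() returns 18: 5 from the int array, 4 from the inner array length, 7 from the field and 2 from the String array length.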

3.5 Operand stack instructions

The virtual machine provides instructions to directly control the operand stack, including:

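Pop one or two elements off the top of the operand stack: pop, pop2;

Duplicate the top one or two values and push the copy (or copies) back onto the stack: dup, dup2, dup_x1, dup_x2, dup2_x1, dup2_x2;

Swap the two values at the top of the stack: swap.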
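The effect of these instructions is easiest to see on a small model. The following is a sketch under simplifying assumptions: the class ToyOperandStack is hypothetical, it holds only int values, and it ignores the two-slot (long and double) cases that pop2 and the dup2 family exist for.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Toy operand stack holding only one-slot values.
class ToyOperandStack {
    private final Deque<Integer> stack = new ArrayDeque<>();

    void push(int v) { stack.push(v); }

    // pop: discard the top value
    void pop() { stack.pop(); }

    // dup: ..., v1  ->  ..., v1, v1
    void dup() { stack.push(stack.peek()); }

    // dup_x1: ..., v2, v1  ->  ..., v1, v2, v1
    void dupX1() {
        int v1 = stack.pop();
        int v2 = stack.pop();
        stack.push(v1);
        stack.push(v2);
        stack.push(v1);
    }

    // swap: ..., v2, v1  ->  ..., v1, v2
    void swap() {
        int v1 = stack.pop();
        int v2 = stack.pop();
        stack.push(v1);
        stack.push(v2);
    }

    // Renders the contents from bottom to top, for example "[1, 2]".
    String bottomToTop() {
        StringBuilder sb = new StringBuilder("[");
        Iterator<Integer> it = stack.descendingIterator();
        while (it.hasNext()) {
            sb.append(it.next());
            if (it.hasNext()) sb.append(", ");
        }
        return sb.append("]").toString();
    }
}
```

The most common real use of dup is object creation: new pushes a reference, dup copies it, invokespecial consumes one copy to run the constructor, and the other copy remains for the assignment.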

3.6 Control transfer instructions

Left undisturbed, the program counter (PC) advances through the instructions in order. The instructions in this group make the PC jump, conditionally or unconditionally, to another position in the instruction stream. They include:

Conditional branch (compare) instructions: if?, if_icmp?, if_acmp?;

Note: cmp stands for compare. Here ? is one of eq (equal), ne (not equal), lt (less than), le (less than or equal), gt (greater than), ge (greater than or equal), much like the comparison operators of the test command in shell scripts. if_acmp? compares references and exists only in the eq and ne forms.

Null-test branch instructions: ifnull, ifnonnull;

Compound conditional branch instructions: tableswitch, lookupswitch;

Unconditional branch instructions: goto, goto_w, jsr, jsr_w, ret.
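As a rough guide to where these instructions come from, the sketch below shows Java constructs that javac typically compiles to them. The class BranchDemo and its methods are invented for illustration, and the exact instruction chosen is up to the compiler, so the comments describe usual javac behaviour rather than a guarantee.

```java
class BranchDemo {
    // Contiguous case labels: javac typically chooses tableswitch.
    static String dense(int day) {
        switch (day) {
            case 1: return "Mon";
            case 2: return "Tue";
            case 3: return "Wed";
            default: return "?";
        }
    }

    // Widely spaced case labels: javac typically chooses lookupswitch.
    static String sparse(int code) {
        switch (code) {
            case 1: return "one";
            case 1000: return "thousand";
            case 1000000: return "million";
            default: return "?";
        }
    }

    // An int comparison usually compiles to a single if_icmp<cond> branch.
    static int max(int a, int b) {
        if (a >= b) {       // typically if_icmplt, jumping past the first return
            return a;
        }
        return b;
    }

    // A null test usually compiles to ifnull or ifnonnull.
    static int lengthOrZero(String s) {
        return s == null ? 0 : s.length();
    }
}
```

Note that the compiler usually emits the inverted condition: the source tests a >= b, while the bytecode branches away when a < b.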

3.7 Method calls and return instructions

Method invocation instructions all have mnemonics prefixed with invoke. They include:

invokevirtual: invokes an instance method of an object, dispatching on the object’s actual type (virtual dispatch);

invokeinterface: invokes an interface method; at run time the virtual machine finds the object that implements the interface and calls the matching method;

invokespecial: invokes methods that need special handling, including instance initialization methods, private methods and superclass methods;

invokestatic: invokes a class (static) method;

invokedynamic (indy for short): resolves the method referenced by the call site dynamically at run time, through a method called the bootstrap method. The instruction was added to the virtual machine in JDK 7, mainly for dynamic languages running on the JVM; the Java language itself started to use it in JDK 8, for lambda expressions. A Java source file containing a lambda expression is shown below:

import java.util.function.Function;

public class JavaTest {

    public static int iHof(int a, Function<Integer, Integer> op) {
        return op.apply(a);
    }

    public static void main(String[] args) {
        System.out.println(iHof(1, (a) -> 2 * a));
    }
}

Before the call to iHof, the compiler emits an invokedynamic instruction that creates the Function object for the lambda (a) -> 2 * a at run time. A BootstrapMethods attribute also appears in the class file (that attribute is not covered in this article).

public static void main(java.lang.String[]);
  descriptor: ([Ljava/lang/String;)V
  flags: ACC_PUBLIC, ACC_STATIC
  Code:
    stack=3, locals=1, args_size=1
       0: getstatic     #6    // Field java/lang/System.out:Ljava/io/PrintStream;
       3: iconst_1
       4: invokedynamic #7, 0 // InvokeDynamic #0:apply:()Ljava/util/function/Function;
       9: invokestatic  #8    // Method iHof:(ILjava/util/function/Function;)I
      12: invokevirtual #9    // Method java/io/PrintStream.println:(I)V
      15: return
    LineNumberTable:
      line 11: 0
      line 12: 15
...
BootstrapMethods:
  0: #35 invokestatic java/lang/invoke/LambdaMetafactory.metafactory:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;
    Method arguments:
      #36 (Ljava/lang/Object;)Ljava/lang/Object;
      #37 invokestatic JavaTest.lambda$main$0:(Ljava/lang/Integer;)Ljava/lang/Integer;
      #38 (Ljava/lang/Integer;)Ljava/lang/Integer;

The details of this instruction will be covered in a later study note (see Chapter 8 of Understanding the Java Virtual Machine).

Method invocation instructions do not depend on data types, but method return instructions are distinguished by the type of the return value. They are ?return and return: in the former, ? is one of i, l, f, d, a, and values of type boolean, byte, char and short are returned as int with ireturn; the latter is used by methods declared void.
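To tie the invocation instructions to source code, here is a small sketch. The types Greeter, BaseGreeter, LoudGreeter and InvokeDemo are invented for illustration, and the comments name the instruction javac typically emits for each call; javap -c shows the actual output for a given compiler version.

```java
import java.util.function.IntUnaryOperator;

interface Greeter {
    String greet(String name);
}

class BaseGreeter implements Greeter {
    public String greet(String name) {
        return "Hello, " + name;
    }
}

class LoudGreeter extends BaseGreeter {
    @Override
    public String greet(String name) {
        return super.greet(name).toUpperCase();      // super call: invokespecial
    }
}

class InvokeDemo {
    static int twice(int x) {
        return 2 * x;                                // ireturn
    }

    static String run() {
        BaseGreeter g = new LoudGreeter();           // constructor: invokespecial <init>
        String viaClass = g.greet("jvm");            // invokevirtual, dispatches to LoudGreeter
        Greeter viaInterface = g;
        String viaIface = viaInterface.greet("jvm"); // invokeinterface
        int doubled = twice(21);                     // invokestatic
        IntUnaryOperator inc = v -> v + 1;           // invokedynamic creates the lambda object
        return viaClass + "|" + viaIface + "|" + inc.applyAsInt(doubled);   // areturn
    }
}
```

InvokeDemo.run() returns "HELLO, JVM|HELLO, JVM|43". Both greet calls end up in LoudGreeter.greet, because the target of invokevirtual and invokeinterface is chosen from the actual type of the receiver, not from the static type of the variable.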

3.8 Exception Handling Instructions

Explicit throw statements in Java code are compiled to the athrow instruction, for example:

public void iflt0(int i){
    if(i < 0) throw new ArithmeticException("need a num greater or equal 0");
}

If i is greater than or equal to 0, the ifge instruction at offset 1 jumps directly to the return instruction at offset 14; otherwise execution continues in sequence until the athrow instruction at offset 13 throws the exception.

public void iflt0(int);
  descriptor: (I)V
  flags: ACC_PUBLIC
  Code:
    stack=3, locals=2, args_size=2
       0: iload_1
       1: ifge          14
       4: new           #6    // class java/lang/ArithmeticException
       7: dup
       8: ldc           #7    // String need a num greater than 0
      10: invokespecial #8    // Method java/lang/ArithmeticException."<init>":(Ljava/lang/String;)V
      13: athrow
      14: return
    LineNumberTable:
      line 10: 0
      line 11: 14

Handling exceptions (catch clauses) is not implemented with bytecode instructions; it is recorded in the exception table (exception_table) of the Code attribute. Long ago the virtual machine used the jsr and ret instructions for this purpose, but they are no longer used.
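The split between throwing and catching can be seen in a short sketch. The class ExceptionTableDemo and its methods are hypothetical; running javap -v on the compiled class should show an "Exception table" entry for parseOrDefault and an athrow in requireNonNegative.

```java
class ExceptionTableDemo {
    // The catch clause becomes an exception table entry, not an instruction.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);       // inside the protected range
        } catch (NumberFormatException e) {      // handler the table entry points to
            return fallback;
        }
    }

    // An explicit throw statement compiles to athrow.
    static int requireNonNegative(int i) {
        if (i < 0) {
            throw new IllegalArgumentException("negative: " + i);
        }
        return i;
    }
}
```

On the normal path of parseOrDefault no exception-related instruction executes at all; the table is consulted only when something is actually thrown inside the protected range.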

3.9 Synchronization instructions

Synchronization comes in two forms: method-level synchronization and synchronization of a block of code inside a method.

Method-level synchronization is implicit. It is requested at the source level with the synchronized keyword on the method, and the virtual machine recognizes it by checking the ACC_SYNCHRONIZED flag in the method’s access flags, so no extra bytecode instructions are needed. When such a method is invoked, the executing thread first acquires the monitor (the “lock” associated with the method), runs the method, and releases the monitor when the method completes. If an exception thrown inside a synchronized method is not handled within it, the monitor is released automatically as the exception propagates out of the method.

For a synchronized block (from the compiler’s point of view, a range of bytecode instructions), the compiler inserts monitorenter (“enter the critical section”) and monitorexit (“leave the critical section”) instructions around it. Here is a simple example:

public static void main(String[] args){
    Object a = new Object();
    synchronized(a){
        System.out.println(a.toString());
    }
}

The main method compiles to the following Code attribute:

Code:
  stack=2, locals=4, args_size=1
     0: new           #9    // class java/lang/Object
     3: dup
     4: invokespecial #1    // Method java/lang/Object."<init>":()V
     7: astore_1
     8: aload_1
     9: dup
    10: astore_2
    11: monitorenter
    12: getstatic     #10   // Field java/lang/System.out:Ljava/io/PrintStream;
    15: aload_1
    16: invokevirtual #11   // Method java/lang/Object.toString:()Ljava/lang/String;
    19: invokevirtual #12   // Method java/io/PrintStream.println:(Ljava/lang/String;)V
    22: aload_2
    23: monitorexit
    24: goto          32
    27: astore_3
    28: aload_2
    29: monitorexit
    30: aload_3
    31: athrow
    32: return
  Exception table:
     from    to  target type
        12    24    27   any
        27    30    27   any

The monitorenter instruction appears at offset 11 and the first monitorexit at offset 23; the instructions between them are the ones being synchronized. On normal completion, the monitorexit at offset 23 releases the monitor and the goto at offset 24 jumps to the return at offset 32. If an exception occurs inside the synchronized range, the exception table sends execution to offset 27, where the monitor is released by the second monitorexit (offset 29) and the exception is rethrown by athrow (offset 31).

To guarantee that monitorexit is executed even when the synchronized instruction sequence completes abruptly, the compiler always generates an exception handler that catches every exception (type any); its only purpose is to execute monitorexit before the exception is rethrown.
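Both forms of synchronization can be put side by side in a small sketch. The class SyncCounter is invented for illustration; the comments describe how javac typically encodes each form, which can be confirmed with javap -v.

```java
class SyncCounter {
    private final Object lock = new Object();
    private int viaMethod = 0;
    private int viaBlock = 0;

    // Method-level: flagged ACC_SYNCHRONIZED, no monitor instructions in the body.
    synchronized void incrementMethod() {
        viaMethod++;
    }

    // Block-level: compiled to monitorenter / monitorexit plus a catch-all handler.
    void incrementBlock() {
        synchronized (lock) {
            viaBlock++;
        }
    }

    // Runs two threads that each increment both counters perThread times.
    static int[] run(int perThread) {
        SyncCounter c = new SyncCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                c.incrementMethod();
                c.incrementBlock();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new int[] { c.viaMethod, c.viaBlock };
    }
}
```

With perThread set to 10000, both counters end at 20000. Without synchronization the increments of the two threads could interleave and some updates would be lost.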

4. Reference links

The complete tables covered in Chapter 6 of the book are available at: Java Class file formats, types of constant pool items, and table structures (reposted, CSDN blog)


  1. I have discussed Java’s cross-platform nature from the perspective of the history of programming languages in: Miscellany: Why can Java be cross-platform? (juejin.cn)
  2. The abbreviated UTF-8 encoding used in Class files differs from ordinary UTF-8 as follows: characters from \u0001 to \u007f are stored in 1 byte, characters from \u0080 to \u07ff are stored in 2 bytes, and the remaining characters follow the ordinary UTF-8 encoding rules.
  3. The complete opcodes and their mnemonics are available at CSDN: bytecode instruction set
  4. Sign extension: the high-order bits are filled with copies of the original sign bit (0 or 1).
  5. Zero extension: the high-order bits are filled with zeros.
  6. For details about denormalized floating-point numbers, see: IEEE 754 floating-point representation