The transition from native machine code to bytecode as a result of code compilation is a small step in the development of storage formats, but a giant leap in the development of programming languages.

Class file structure

Because the JVM is a general-purpose, machine-independent execution platform, implementors of other languages can use the Java Virtual Machine as the foundation for their language and Class files as the delivery medium for their products. For example, the Java compiler compiles Java code into Class files that store bytecode, and compilers for other languages, such as JRuby, can compile their source code into Class files as well. The virtual machine doesn’t care which language a Class file came from.

Each Class file corresponds to the definition of exactly one class or interface, but the converse does not hold: a class or interface does not have to be defined in a file (for example, it can be generated dynamically and fed directly to the class loader).

A Class file is a binary stream organized in units of 8-bit bytes. The data items are laid out in the file in a strict, compact order with no delimiters between them, so almost everything stored in a Class file is data the program needs in order to run, with no padding.

The Class file format uses a pseudo-structure similar to the C-language structure to store data, with only two data types: “unsigned number” and “table.”

Unsigned number

An unsigned number is a basic data type. u1, u2, u4, and u8 represent unsigned numbers of 1 byte, 2 bytes, 4 bytes, and 8 bytes respectively. Unsigned numbers can describe numbers, index references, quantity values, or string values encoded in UTF-8.
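As a rough illustration of how these unsigned types might be read in Java, here is a minimal sketch. The class name UnsignedReader and its methods are hypothetical, and it relies on the fact (from the JVM specification, not stated above) that multi-byte items in a Class file are stored big-endian, which is also ByteBuffer's default byte order.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only.
class UnsignedReader {
    // u1: one byte, masked so the result is in the range 0..255
    static int readU1(ByteBuffer buf) {
        return buf.get() & 0xFF;
    }

    // u2: two bytes, big-endian, masked so the result is in the range 0..65535
    static int readU2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;
    }

    // u4: four bytes, big-endian, widened to long so the value stays non-negative
    static long readU4(ByteBuffer buf) {
        return buf.getInt() & 0xFFFFFFFFL;
    }
}
```

Java has no unsigned primitive types, so each value is widened and masked instead of being stored in a type of the same width.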

Table

A table is a compound data type made up of multiple unsigned numbers or other tables as data items. By convention, table names end with “_info” to make them easier to recognize. Tables describe hierarchical, composite data structures, and the entire Class file can essentially be viewed as one table.

On Windows 10, you can view a Class file in hexadecimal with the PowerShell command Format-Hex -Path ./demo1.class

The magic number

The first four bytes of each Class file are called the magic number. Its sole purpose is to determine whether the file is a Class file that the virtual machine can accept. The value is 0xCAFEBABE.

The version number

Minor version number

The fifth and sixth bytes hold the minor version number; here the value is 0x0000.

The major version number

The seventh and eighth bytes hold the major version number; here the value is 0x0034, which is 52 in decimal. A virtual machine refuses to execute Class files whose version number is higher than the one it supports. Major version 52 corresponds to JDK 8, and each later JDK release raises the major version by one.
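The first eight bytes described so far can be checked with a few lines of Java. This is a minimal sketch, not a full parser: the class name ClassFileHeader and its members are hypothetical, and the rule "release = major version minus 44" is an assumption that holds only for major versions 49 and above.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of reading magic number, minor version, and major version.
class ClassFileHeader {
    final int minor;
    final int major;

    ClassFileHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    static ClassFileHeader parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);      // big-endian by default
        int magic = buf.getInt();                    // bytes 1-4
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a Class file");
        }
        int minor = buf.getShort() & 0xFFFF;         // bytes 5-6
        int major = buf.getShort() & 0xFFFF;         // bytes 7-8
        return new ClassFileHeader(minor, major);
    }

    // Assumption: valid only for major >= 49 (for example 52 -> Java 8).
    int javaRelease() {
        if (major < 49) {
            throw new IllegalStateException("mapping not covered by this sketch");
        }
        return major - 44;
    }
}
```

Feeding it the bytes CA FE BA BE 00 00 00 34 yields minor version 0 and major version 52.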

Constant pool

The constant pool can be thought of as the resource repository of a Class file. It is the data structure most closely tied to the other items in the Class file, and it is usually one of the items that takes up the most space.

Constant pool capacity

The ninth and tenth bytes hold the constant pool count; here the value is 0x001E, which is 30 in decimal. Because this count starts at 1 rather than 0, it means the constant pool holds 29 constants, with indexes ranging from 1 to 29. In the Class file structure, only the constant pool count starts at 1. For the other collection types, including the interface index collection, field table collection, and method table collection, counting starts at 0 as usual.

Constant pool (constant_pool)

Immediately following the constant pool count are the constant pool entries of the Class file. The constant pool holds two main kinds of data: literals and symbolic references.

  • Literals: text strings, constant values declared final, and so on

  • Symbolic references, which include:

      • Packages exported or opened by modules

      • Fully qualified names of classes and interfaces

      • Field names and descriptors

      • Method names and descriptors

      • Method handles and method types (Method Handle, Method Type, Invoke Dynamic)

      • Dynamic call sites and dynamic constants

Each constant in the constant pool is a table. As of JDK 13, there are 17 different constant types. Every such table begins with a one-byte tag of type u1 that identifies the type of the constant.

For example, the value 0x0A that follows the constant pool count in this Class file is the tag of the first constant. It is 10 in decimal, which stands for CONSTANT_Methodref_info.

In this structure, tag is the flag that distinguishes constant types, and each index is an index into the constant pool. The first index (0x0005, decimal 5) points to the constant of type CONSTANT_Class_info at index 5 in the constant pool. The second index (0x001A, decimal 26) points to the constant of type CONSTANT_NameAndType_info at index 26.
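The five bytes of this first constant (0A 00 05 00 1A) can be decoded as follows. This is only a sketch of one constant type; the class MethodrefEntry and its field names are hypothetical, and a real parser would switch on the tag to handle every constant type.

```java
import java.nio.ByteBuffer;

// Hypothetical decoder for a single CONSTANT_Methodref_info entry.
class MethodrefEntry {
    static final int CONSTANT_METHODREF_TAG = 10;    // 0x0A

    final int classIndex;          // index of a CONSTANT_Class_info entry
    final int nameAndTypeIndex;    // index of a CONSTANT_NameAndType_info entry

    MethodrefEntry(int classIndex, int nameAndTypeIndex) {
        this.classIndex = classIndex;
        this.nameAndTypeIndex = nameAndTypeIndex;
    }

    static MethodrefEntry parse(ByteBuffer buf) {
        int tag = buf.get() & 0xFF;                  // u1 tag
        if (tag != CONSTANT_METHODREF_TAG) {
            throw new IllegalArgumentException("unexpected tag " + tag);
        }
        int classIndex = buf.getShort() & 0xFFFF;    // u2 index
        int nameAndTypeIndex = buf.getShort() & 0xFFFF; // u2 index
        return new MethodrefEntry(classIndex, nameAndTypeIndex);
    }
}
```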

Access flags (access_flags)

The access flags identify class-level or interface-level access information, including whether the Class file describes a class or an interface, whether it is declared public, whether it is declared abstract, and, if it is a class, whether it is declared final.

Class index, superclass index, and interface index collection
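The flags are stored as a bit mask, so decoding them is a matter of testing individual bits. The sketch below covers only the four properties mentioned above. The mask values are taken from the JVM specification rather than from this article, and the class name ClassAccessFlags is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for a subset of class-level access flags.
class ClassAccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;

    // Returns the names of the flags (from the subset above) that are set.
    static List<String> describe(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & ACC_PUBLIC) != 0) names.add("public");
        if ((flags & ACC_FINAL) != 0) names.add("final");
        if ((flags & ACC_INTERFACE) != 0) names.add("interface");
        if ((flags & ACC_ABSTRACT) != 0) names.add("abstract");
        return names;
    }
}
```

For instance, a value of 0x0601 decodes to public, interface, and abstract; bits outside this subset are simply ignored by the sketch.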

Class index (this_class)

The class index is used to determine the fully qualified name of the class.

Superclass index (super_class)

The superclass index is used to determine the fully qualified name of the class’s superclass. Because the Java language does not allow multiple inheritance, there is only one superclass index. Every Java class except java.lang.Object has a superclass, so java.lang.Object is the only class whose superclass index is 0.

Interface index collection

The interface index collection describes which interfaces the class implements. The implemented interfaces are stored in the collection in the left-to-right order in which they appear after the implements keyword (or after the extends keyword if the Class file represents an interface).

Field table collection
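The effect of the superclass index and the interface index collection can be observed at run time through reflection. This is a small sketch with hypothetical types (Walker, Swimmer, Duck); it shows the runtime view of a loaded class rather than the raw bytes of the Class file.

```java
// Hypothetical types used only to observe superclass and interface information.
class SuperclassDemo {
    interface Walker {}
    interface Swimmer {}
    static class Duck implements Walker, Swimmer {}

    // java.lang.Object has no superclass, so getSuperclass() returns null for it.
    static String superclassNameOf(Class<?> type) {
        Class<?> parent = type.getSuperclass();
        return parent == null ? "(none)" : parent.getName();
    }

    // getInterfaces() reports interfaces in the order of the implements clause.
    static String[] interfaceNamesOf(Class<?> type) {
        Class<?>[] ifaces = type.getInterfaces();
        String[] names = new String[ifaces.length];
        for (int i = 0; i < ifaces.length; i++) {
            names[i] = ifaces[i].getSimpleName();
        }
        return names;
    }
}
```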

The field table collection describes the variables declared in an interface or class, including class-level and instance-level variables, but not local variables declared inside a method.

A field can carry modifiers describing its scope (public, private, or protected), whether it is an instance variable or a class variable (static), its mutability (final), its visibility under concurrency (volatile, which forces reads and writes to go through main memory), and whether it is serialized (transient). A field also has a data type (primitive type, object, or array) and a name.

Method table collection

The structure of a method table is the same as that of a field table, consisting of access_flags, name_index, descriptor_index, and attributes.

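Method table structure

Type            Name              Count
u2              access_flags      1
u2              name_index        1
u2              descriptor_index  1
u2              attributes_count  1
attribute_info  attributes        attributes_count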

Where does the code inside a method go? The javac compiler compiles the Java code in a method body into bytecode instructions and stores them in an attribute named “Code” in the method’s attribute table collection.

Attribute table collection

Class files, field tables, and method tables can each carry their own attribute table collection to describe information specific to certain scenarios.

Viewing tools

You can view the structure of a Class file with the IDEA plug-in jclasslib Bytecode Viewer, or run javap -verbose demo.class to print the bytecode.

Bytecode instructions

A Java virtual machine instruction consists of a one-byte number that represents a particular operation (the opcode), followed by zero or more parameters that the operation requires (the operands).

The Java virtual machine uses a stack-oriented rather than register-oriented architecture, so most instructions contain no operands, only an opcode; their arguments are taken from the operand stack. See the Java Virtual Machine Specification for the full instruction set.

Load and store instructions

Load and store instructions are used to transfer data back and forth between the local variable table and the operand stack in a stack frame. These instructions include:

  1. Load a local variable onto the operand stack: iload, lload, fload, dload, aload
  2. Store a value from the operand stack into the local variable table: istore, lstore, fstore, dstore, astore
  3. Load a constant onto the operand stack: bipush, sipush, ldc, ldc_w, ldc2_w, aconst_null, iconst_m1
  4. Extend the access index of the local variable table: wide

Arithmetic instructions

Arithmetic instructions perform a specific operation on two values on the operand stack and store the result back on top of the stack.

  1. Addition instructions: iadd, ladd, fadd, dadd
  2. Subtraction instructions: isub, lsub, fsub, dsub
  3. Multiplication instructions: imul, lmul, fmul, dmul
  4. Division instructions: idiv, ldiv, fdiv, ddiv
  5. Remainder instructions: irem, lrem, frem, drem
  6. Negation instructions: ineg, lneg, fneg, dneg
  7. Shift instructions: ishl, ishr, iushr, lshl, lshr, lushr
  8. Bitwise OR instructions: ior, lor
  9. Bitwise AND instructions: iand, land
  10. Bitwise XOR instructions: ixor, lxor
  11. Local variable increment instruction: iinc
  12. Comparison instructions: dcmpg, dcmpl, fcmpg, fcmpl, lcmp

Object creation and access instructions

The Java virtual machine uses different bytecode instructions to create and manipulate class instances and arrays. Once an object is created, its fields or array elements can be read or written through the object access instructions.

  1. Instruction that creates a class instance: new
  2. Instructions that create arrays: newarray, anewarray, multianewarray
  3. Instructions that access class fields (static fields, or class variables) and instance fields (non-static fields, or instance variables): getfield, putfield, getstatic, putstatic
  4. Instructions that load an array element onto the operand stack: baload, caload, saload, iaload, laload, faload, daload, aaload
  5. Instructions that store a value from the operand stack into an array element: bastore, castore, sastore, iastore, lastore, fastore, dastore, aastore
  6. Instruction that takes the length of an array: arraylength
  7. Instructions that check the type of a class instance: instanceof, checkcast

Operand stack management instructions

The Java virtual machine provides instructions for manipulating the operand stack directly, just as one would operate on a stack in an ordinary data structure.

  1. Pop one or two elements off the top of the operand stack: pop, pop2
  2. Duplicate one or two values on top of the stack and push the copy, or two copies, back onto the stack: dup, dup2, dup_x1, dup2_x1, dup_x2, dup2_x2
  3. Swap the top two values of the stack: swap

Control transfer instructions

A control transfer instruction makes the Java virtual machine, conditionally or unconditionally, continue execution from a specified location rather than from the instruction that follows the control transfer instruction. Conceptually, a control transfer instruction can be thought of as conditionally or unconditionally modifying the value of the PC register.

  1. Conditional branches: ifeq, iflt, ifle, ifne, ifgt, ifge, ifnull, ifnonnull, if_icmpeq, if_icmpne, if_icmplt, if_icmpgt, if_icmple, if_icmpge, if_acmpeq, and if_acmpne
  2. Compound conditional branches: tableswitch and lookupswitch
  3. Unconditional branches: goto, goto_w, jsr, jsr_w, ret

Method invocation and return instructions

Method invocation instructions are independent of data type, while method return instructions are differentiated by the type of the return value: ireturn (used when the return value is boolean, byte, char, short, or int), lreturn, freturn, dreturn, and areturn. There is also a return instruction for methods declared void, instance initializers, and the class initializers of classes and interfaces.

  1. invokevirtual: invokes an instance method of an object, dispatching on the object’s actual type (virtual method dispatch). This is the most common form of method dispatch in the Java language.
  2. invokeinterface: invokes an interface method. At run time it searches the object that implements the interface for the appropriate method and invokes it.
  3. invokespecial: invokes instance methods that need special handling, including instance initialization methods, private methods, and superclass methods.
  4. invokestatic: invokes a class method (static method).
  5. invokedynamic: resolves, at run time, the method referenced by the call site specifier and then executes it.
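To connect these instructions to source code, the sketch below marks which invocation instruction javac typically emits for each kind of call. The types and method names (Shape, Triangle, twice) are hypothetical, and the comments describe common compiler behavior that may vary between JDK versions; running javap -c on the compiled class is the way to confirm it.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical example; comments note the instruction javac typically emits.
class InvokeDemo {
    interface Shape {
        int sides();
    }

    static class Triangle implements Shape {
        Triangle() {
            super();                               // invokespecial: superclass <init>
        }

        public int sides() {
            return 3;
        }

        @Override
        public String toString() {
            return "Triangle";
        }
    }

    static int twice(int x) {
        return x * 2;
    }

    static String run() {
        Triangle t = new Triangle();               // invokespecial: Triangle.<init>
        Shape s = t;
        int viaInterface = s.sides();              // invokeinterface: receiver typed as an interface
        String viaVirtual = t.toString();          // invokevirtual: instance method on a class type
        int viaStatic = twice(viaInterface);       // invokestatic: static method
        IntUnaryOperator inc = x -> x + 1;         // invokedynamic: lambda creation
        int viaLambda = inc.applyAsInt(viaStatic); // invokeinterface: call through the functional interface
        return viaVirtual + ":" + viaLambda;
    }
}
```

Calling InvokeDemo.run() exercises all five kinds of call in one method.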

Exception handling instructions

Every explicit throw in a Java program (the throw statement) is implemented by the athrow instruction. Besides exceptions thrown explicitly with a throw statement, the Java Virtual Machine Specification also defines many runtime exceptions that are thrown automatically when other instructions detect an abnormal condition.

In the Java virtual machine, catch clauses are handled not by bytecode instructions but by exception tables.
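The difference between an exception raised by athrow and one raised by another instruction can be seen in a short example. This is an illustrative sketch with hypothetical names; the integer division case relies on the fact that dividing an int by zero throws ArithmeticException.

```java
// Hypothetical example contrasting an explicit throw with an implicit one.
class ThrowDemo {
    static String explicitThrow() {
        try {
            throw new IllegalStateException("explicit");  // throw statement: compiled to athrow
        } catch (IllegalStateException e) {               // catch clause: an exception table entry
            return e.getMessage();
        }
    }

    static String implicitThrow(int divisor) {
        try {
            return String.valueOf(10 / divisor);          // integer division throws if divisor is 0
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }
}
```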

Synchronization instructions

The Java virtual machine supports both method-level synchronization and synchronization of a sequence of instructions within a method. Both are implemented with a monitor (more commonly referred to as a “lock”).

When a method is invoked, the invocation instruction checks whether the method’s ACC_SYNCHRONIZED access flag is set. If it is, the executing thread must first acquire the monitor before it executes the method, and it releases the monitor when the method completes.

While the method executes, the executing thread holds the monitor, and no other thread can acquire the same monitor. If a synchronized method throws an exception that it cannot handle internally, the monitor it holds is released automatically when the exception propagates out of the method.

Synchronizing a sequence of instructions is typically expressed with a synchronized block in the Java language. The monitorenter and monitorexit instructions in the Java virtual machine instruction set support the semantics of the synchronized keyword.

How bytecode is executed

The Java virtual machine uses the method as its most basic unit of execution, and the stack frame is the data structure that supports method invocation and method execution. A stack frame is also an element of the virtual machine stack in the runtime data area. Each stack frame stores a method’s local variable table, operand stack, dynamic linking information, and method return address.
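Both forms of synchronization can be written side by side in Java. The sketch below uses hypothetical names (SyncDemo, runWorkers); the comments describe how each form is typically represented in the Class file, which can be confirmed with javap -v.

```java
// Hypothetical counter guarded by the same monitor (this) in two different ways.
class SyncDemo {
    private int count;

    // Method-level synchronization: the method is marked with ACC_SYNCHRONIZED.
    synchronized void incrementViaMethod() {
        count++;
    }

    // Block-level synchronization: compiled to monitorenter / monitorexit.
    void incrementViaBlock() {
        synchronized (this) {
            count++;
        }
    }

    synchronized int count() {
        return count;
    }

    // Starts the given number of threads, alternating between the two styles.
    static int runWorkers(int threads, int perThread) {
        SyncDemo demo = new SyncDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final boolean useMethod = (i % 2 == 0);
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    if (useMethod) {
                        demo.incrementViaMethod();
                    } else {
                        demo.incrementViaBlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return demo.count();
    }
}
```

Because both styles lock the same object, no increments are lost even when threads mix the two.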

Local variable table

The local variable table is a storage area for a set of variable values; it holds the method’s parameters and the local variables defined inside the method. The capacity of the local variable table is measured in variable slots. Each slot can hold one value of type boolean, byte, char, short, int, float, reference, or returnAddress.

Operand Stack

The operand stack, often called the operation stack, is a last-in, first-out (LIFO) stack. As with the size of the local variable table, the maximum depth of the operand stack is written into the max_stack item of the Code attribute at compile time.

Dynamic linking

Each stack frame contains a reference to the method it belongs to in the runtime constant pool. This reference is held to support dynamic linking during method invocation. The constant pool of a Class file holds a large number of symbolic references, and the method invocation instructions in the bytecode take symbolic references to methods in the constant pool as their arguments.

Some of these symbolic references are converted to direct references during class loading or the first time they are used, which is called static resolution. The rest are converted to direct references during each run, which is called dynamic linking.

Method return address

Once a method begins executing, there are only two ways to exit it:

  • One is when the execution engine encounters any of the method return bytecode instructions, in which case a return value may be passed to the caller.
  • The other is when an exception occurs during method execution and is not handled within the method body.

However a method exits, execution must return to the point where the method was called so that the program can continue. On return, information stored in the stack frame may be needed to help restore the execution state of the calling method.

Bytecode execution process

Take a simple arithmetic method as an example: it assigns 100, 200, and 300 to the variables a, b, and c and then computes (a + b) * c.

1. Execute the instruction at offset 0. The bipush instruction pushes a single-byte integer constant (-128 to 127) onto the top of the operand stack; it is followed by one operand giving the constant to push, in this case 100.

2. Execute the instruction at offset 2. istore_1 pops the integer value from the top of the operand stack and stores it in local variable slot 1. The next four instructions (up to offset 11) do the same kind of work: together with the first two, they carry out the assignments of 100, 200, and 300 to the variables a, b, and c in the source code.

3. Execute the instruction at offset 11. iload_1 copies the integer value in local variable slot 1 onto the top of the operand stack.

4. Execute the instruction at offset 12. iload_2 works like iload_1, pushing the integer value in slot 2 onto the stack. This step mainly shows the state of the stack just before the iadd instruction executes.

5. Execute the instruction at offset 13. iadd pops the top two elements of the operand stack, adds them as integers, and pushes the result back onto the stack. After iadd completes, the original 100 and 200 have been removed from the stack and their sum, 300, has been pushed in their place.

6. Execute the instruction at offset 14. iload_3 pushes the 300 stored in local variable slot 3 onto the operand stack, which now holds two integers, both 300. The next instruction, imul, pops the top two elements of the operand stack, multiplies them as integers, and pushes the result back onto the stack, exactly as iadd does.

7. Execute the instruction at offset 16. ireturn is one of the method return instructions: it ends the method’s execution and returns the integer value at the top of the operand stack to the caller. At this point, the method has finished executing.
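The seven steps above can be mimicked with a toy stack frame in Java. This is a simplified sketch, not how a real virtual machine is implemented: the class TinyFrame and its methods are hypothetical, slot 0 is assumed to be reserved for this, and the instructions at offsets 3 to 10 are inferred from the walkthrough (sipush is assumed because 200 and 300 do not fit in a single signed byte).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of one stack frame: a local variable table plus an operand stack.
class TinyFrame {
    final int[] locals = new int[4];                  // slot 0 assumed to hold `this`; slots 1..3 hold a, b, c
    final Deque<Integer> stack = new ArrayDeque<>();  // operand stack (LIFO)

    void push(int value) { stack.push(value); }                    // models bipush / sipush
    void istore(int slot) { locals[slot] = stack.pop(); }          // pop into a local slot
    void iload(int slot) { stack.push(locals[slot]); }             // copy a local slot onto the stack
    void iadd() { int r = stack.pop(); int l = stack.pop(); stack.push(l + r); }
    void imul() { int r = stack.pop(); int l = stack.pop(); stack.push(l * r); }
    int ireturn() { return stack.pop(); }                          // hand the top of the stack to the caller

    static int calc() {
        TinyFrame f = new TinyFrame();
        f.push(100);         //  0: bipush 100
        f.istore(1);         //  2: istore_1
        f.push(200);         //  3: sipush 200 (inferred)
        f.istore(2);         //  6: istore_2   (inferred)
        f.push(300);         //  7: sipush 300 (inferred)
        f.istore(3);         // 10: istore_3   (inferred)
        f.iload(1);          // 11: iload_1
        f.iload(2);          // 12: iload_2
        f.iadd();            // 13: iadd  -> stack holds 300
        f.iload(3);          // 14: iload_3 -> stack holds 300, 300
        f.imul();            // 15: imul  -> stack holds 90000
        return f.ireturn();  // 16: ireturn
    }
}
```

TinyFrame.calc() returns 90000, the value of (100 + 200) * 300.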

The Class file format is platform neutral (independent of any specific hardware or operating system), compact, stable, and extensible. It is an important pillar that lets the Java technology system achieve its platform-independent and language-independent characteristics.

Reference

Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices (3rd edition)