Understanding bytecode and how the JVM executes it can help us solve problems in day-to-day development. This time, let's start with how to read bytecode.
A bytecode file is the xxx.class file produced by compiling with javac xxx.java. It is a stream of 8-bit bytes. The data types used in a class file are u1, u2, and u4: u1 means the data occupies 1 byte, u2 means 2 bytes, and u4 means 4 bytes.
Each class file contains the definition of a single class, interface, or module. The layout of the byte stream is described by a structure similar to a C struct, which the virtual machine specification gives as follows. Using this structure, we can decode the contents of the class byte stream.
Excerpt from Java Virtual Machine Specification.
ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
1. Sample code
This time we will use a simple Demo.java to walk through the process of reading and understanding bytecode. Take a look at the following code.
public class Demo {
    private int mThisIsInt = 1024;
    public static void main(String[] args) {
        System.out.println("hello world");
    }
    private int getThisIsInt() {
        return mThisIsInt;
    }
}
2. Compile to class bytecode
To get the Demo.class file, run javac Demo.java. This is the file we are going to read in this analysis.
3. Read the bytecode
Using hexdump
A hexdump plug-in is available for VSCode that displays the bytecode as hexadecimal data. After installing the plug-in, right-click the compiled Demo.class file in VSCode and select the Show Hexdump option to see the following.
We can start to interpret this content by referring to the ClassFile Structure mentioned above.
The magic part
The ClassFile structure tells us that the first item is magic. Its type u4 means it occupies four bytes, which here are CAFEBABE. This value identifies the file as a Java class file. You can also see a nod to the Java coffee icon here.
The version part
The second and third items are minor_version and major_version. Each occupies 2 bytes, corresponding to 0000 and 003A. This means the major version of the class file is 58 (0x3A) and the minor version is 0. The version number is tied to the JDK release. For details, see Table 4.1-A in the Java Virtual Machine Specification.
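As a minimal sketch of the two steps above, the following Java snippet decodes the first 8 bytes of a class file. The ClassHeader class and its method names are hypothetical helpers invented for this illustration, and the input bytes are typed in by hand to match the values read from the hexdump rather than loaded from Demo.class. The release mapping relies on the rule that, for major versions 49 and above, the Java SE release equals the major version minus 44.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of any JDK API.
class ClassHeader {
    final int magic;        // u4
    final int minorVersion; // u2
    final int majorVersion; // u2

    ClassHeader(int magic, int minorVersion, int majorVersion) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Decodes the first 8 bytes of a class file.
    static ClassHeader parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default, like class files
        int magic = buf.getInt();                // u4
        int minor = buf.getShort() & 0xFFFF;     // u2, masked to keep it unsigned
        int major = buf.getShort() & 0xFFFF;     // u2
        return new ClassHeader(magic, minor, major);
    }

    // For major versions 49 and above, the Java SE release is major - 44.
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        // Hand-written bytes: CAFEBABE, minor 0000, major 003A.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
                         0x00, 0x00, 0x00, 0x3A};
        ClassHeader h = parse(header);
        System.out.printf("magic=%08X minor=%d major=%d (Java SE %d)%n",
                h.magic, h.minorVersion, h.majorVersion, javaRelease(h.majorVersion));
        // prints: magic=CAFEBABE minor=0 major=58 (Java SE 14)
    }
}
```

ByteBuffer is used here because its default byte order is big-endian, which is the order in which multi-byte values are stored in a class file.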
The constant_pool part
The fourth and fifth items are constant_pool_count and constant_pool. constant_pool_count is the number of entries in constant_pool plus one, which is why the array is declared with constant_pool_count-1 entries. constant_pool stores the constants themselves. Each entry has the cp_info structure shown below. It consists of two parts: a 1-byte tag that identifies the type of the entry, such as 7 for Class and 9 for Fieldref, followed by an info array whose layout depends on the tag.
cp_info {
    u1 tag;
    u1 info[];
}
Take Methodref as an example. The structure of a method reference is shown below.
CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
Here tag is 10, which denotes the Methodref type. class_index is an index into the constant pool that points to a CONSTANT_Class_info entry and identifies the class the method belongs to. name_and_type_index is an index into the constant pool that points to a CONSTANT_NameAndType_info entry, which gives the name of the method and its descriptor. The descriptor describes the parameter types and return type of the method.
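To make the layout concrete, here is a small sketch that decodes one CONSTANT_Methodref_info entry from raw bytes. The MethodrefEntry class is a hypothetical helper written for this illustration, and the five input bytes are hand-written sample values (tag 10, class_index 2, name_and_type_index 3), not bytes copied from Demo.class.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of any JDK API.
class MethodrefEntry {
    static final int TAG_METHODREF = 10;

    final int classIndex;       // u2, index of a CONSTANT_Class_info entry
    final int nameAndTypeIndex; // u2, index of a CONSTANT_NameAndType_info entry

    MethodrefEntry(int classIndex, int nameAndTypeIndex) {
        this.classIndex = classIndex;
        this.nameAndTypeIndex = nameAndTypeIndex;
    }

    // Decodes a single entry: u1 tag, u2 class_index, u2 name_and_type_index.
    static MethodrefEntry parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
        int tag = buf.get() & 0xFF;              // u1
        if (tag != TAG_METHODREF) {
            throw new IllegalArgumentException("not a Methodref entry, tag=" + tag);
        }
        int classIndex = buf.getShort() & 0xFFFF;       // u2
        int nameAndTypeIndex = buf.getShort() & 0xFFFF; // u2
        return new MethodrefEntry(classIndex, nameAndTypeIndex);
    }

    @Override
    public String toString() {
        return "Methodref #" + classIndex + ".#" + nameAndTypeIndex;
    }

    public static void main(String[] args) {
        byte[] entry = {0x0A, 0x00, 0x02, 0x00, 0x03}; // hand-written sample bytes
        System.out.println(parse(entry));
        // prints: Methodref #2.#3
    }
}
```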
At this point, reading the raw hexadecimal data directly becomes tedious, so we bring in another tool to further explain how the constant pool and its indexes work.
Using javap
Run javap -v -p Demo.class to obtain a more readable disassembly. Let's take a look at part of the constant pool.
// omit content...
Constant pool:
   #1 = Methodref          #2.#3          // java/lang/Object."<init>":()V
   #2 = Class              #4             // java/lang/Object
   #3 = NameAndType        #5:#6          // "<init>":()V
// omit content...
The constant pool is an array that holds the constants. Each constant has its own index, which the javap output shows as #n.
#1 is a Methodref constant, which is defined by referring to constants #2 and #3.
#2 represents the Object class.
#3 is defined by constants #5 and #6. Both are of type CONSTANT_Utf8, with contents <init> and ()V respectively. This means the method is named <init>, takes no parameters, and returns void.
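From this we can derive the following structure: #1 (Methodref) refers to #2 (Class, which points to #4, java/lang/Object) and #3 (NameAndType), and #3 in turn refers to #5 (<init>) and #6 (()V).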
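The way these indexes chain together can be sketched in Java. The snippet below models the six constants with a deliberately simplified, hypothetical Entry class (real entries are tagged byte structures) and resolves #1 by following the indexes recursively. The contents of #4, #5, and #6 are taken from the comments in the javap output.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a constant pool for illustration only.
class ConstantPoolDemo {
    static final class Entry {
        final String kind; // "Utf8", "Class", "NameAndType" or "Methodref"
        final int a;       // first index, unused for Utf8
        final int b;       // second index, unused for Utf8 and Class
        final String text; // only set for Utf8

        Entry(String kind, int a, int b, String text) {
            this.kind = kind;
            this.a = a;
            this.b = b;
            this.text = text;
        }
    }

    // Follows the indexes until only Utf8 strings remain.
    static String resolve(Map<Integer, Entry> pool, int index) {
        Entry e = pool.get(index);
        if (e == null) {
            throw new IllegalArgumentException("no constant at #" + index);
        }
        switch (e.kind) {
            case "Utf8":
                return e.text;
            case "Class":
                return resolve(pool, e.a);
            case "NameAndType":
                return "\"" + resolve(pool, e.a) + "\":" + resolve(pool, e.b);
            case "Methodref":
                return resolve(pool, e.a) + "." + resolve(pool, e.b);
            default:
                throw new IllegalArgumentException("kind not covered by this sketch: " + e.kind);
        }
    }

    // The six constants discussed in the text.
    static Map<Integer, Entry> samplePool() {
        Map<Integer, Entry> pool = new HashMap<>();
        pool.put(1, new Entry("Methodref", 2, 3, null));
        pool.put(2, new Entry("Class", 4, 0, null));
        pool.put(3, new Entry("NameAndType", 5, 6, null));
        pool.put(4, new Entry("Utf8", 0, 0, "java/lang/Object"));
        pool.put(5, new Entry("Utf8", 0, 0, "<init>"));
        pool.put(6, new Entry("Utf8", 0, 0, "()V"));
        return pool;
    }

    public static void main(String[] args) {
        System.out.println(resolve(samplePool(), 1));
        // prints: java/lang/Object."<init>":()V
    }
}
```

The resolved string is the same text that javap prints in the comment next to #1.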
The methods part
access_flags, this_class, super_class, interfaces, and fields represent the access flags of the class, the class itself, its parent class, its interfaces, and its fields. They are interpreted in much the same way as the constant pool and will not be covered in detail.
The method_info part deserves a closer look. It contains all the methods in the class. Each method is described in a similar way to the entries above, and each of its attributes is structured as follows.
attribute_info {
    u2 attribute_name_index;
    u4 attribute_length;
    u1 info[attribute_length];
}
This is where the code of a method is stored. The attribute_name_index points to a constant whose content is Code, indicating that this is a Code attribute. attribute_length is the length of the attribute, followed by the info array, which holds the bytecode compiled from the method body. Let's take the getThisIsInt method as an example. Its Java source and the corresponding bytecode are shown below.
private int getThisIsInt() {
    return mThisIsInt;
}

private int getThisIsInt();
  descriptor: ()I
  flags: (0x0002) ACC_PRIVATE
  Code:
    stack=1, locals=1, args_size=1
       0: aload_0
       1: getfield      #7                  // Field mThisIsInt:I
       4: ireturn
    LineNumberTable:
      line 11: 0
Let’s focus on the Code section.
stack=1 indicates that the maximum depth of the operand stack is 1.
locals=1 indicates that the local variable table needs at most 1 slot.
args_size=1 indicates that the method takes 1 argument. Note that getThisIsInt declares no parameters. However, it is an instance method and can access this directly, so when it is compiled to bytecode, this is added as an implicit first argument. In other words, this is accessible inside an instance method because it is passed in as a default parameter.
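The effect of the implicit this argument can be sketched by counting the parameters in a method descriptor and adding one for instance methods. The ArgsSize class below is a hypothetical helper written for this illustration. It counts declared parameters only and does not model how wide types such as long and double are accounted for, so treat it as an assumption-laden sketch rather than a reimplementation of what javap reports.

```java
// Hypothetical helper for illustration; not part of any JDK API.
class ArgsSize {
    // Counts the parameters in a descriptor such as "(ILjava/lang/String;)V",
    // plus one for the implicit `this` when the method is not static.
    static int argCount(String descriptor, boolean isStatic) {
        int count = isStatic ? 0 : 1; // instance methods receive `this` first
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            while (descriptor.charAt(i) == '[') {
                i++; // skip array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i); // skip the class name up to ';'
            }
            count++;
            i++;
        }
        return count;
    }

    public static void main(String[] args) {
        // getThisIsInt: no declared parameters, instance method
        System.out.println(argCount("()I", false));                   // prints: 1
        // main: one declared parameter, static method
        System.out.println(argCount("([Ljava/lang/String;)V", true)); // prints: 1
    }
}
```

Both methods end up with a count of 1, but for different reasons: getThisIsInt only receives this, while main is static and only receives its String[] parameter.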
0: aload_0
1: getfield #7 // Field mThisIsInt:I
4: ireturn
Next comes the bytecode for the method body, which loads the value of the field referenced by constant #7 (mThisIsInt) and returns it. The details of execution involve the JVM instruction set, which we will cover another time.
The number n at the beginning of each line is the bytecode offset, which equals the total size of all instructions before the current one. For example, the aload_0 instruction before getfield #7 takes up 1 byte, so the offset of getfield is 1. getfield itself takes 3 bytes (a 1-byte opcode plus a 2-byte constant pool index), so ireturn is at offset 4.
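The offset arithmetic can be checked with a tiny sketch that walks the five bytes of this method body. The OffsetDemo class is a hypothetical helper that only knows the three opcodes used here (aload_0 is 0x2A, getfield is 0xB4, ireturn is 0xAC), so it is an illustration of the offset rule and not a general disassembler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; covers only the three opcodes in this example.
class OffsetDemo {
    // Total instruction length in bytes: opcode plus operands.
    static int length(int opcode) {
        switch (opcode) {
            case 0x2A: return 1; // aload_0, no operands
            case 0xB4: return 3; // getfield, followed by a u2 constant pool index
            case 0xAC: return 1; // ireturn, no operands
            default:
                throw new IllegalArgumentException("opcode not covered by this sketch: " + opcode);
        }
    }

    static String name(int opcode) {
        switch (opcode) {
            case 0x2A: return "aload_0";
            case 0xB4: return "getfield";
            case 0xAC: return "ireturn";
            default:
                throw new IllegalArgumentException("opcode not covered by this sketch: " + opcode);
        }
    }

    // Each offset is the sum of the lengths of all preceding instructions.
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int offset = 0;
        while (offset < code.length) {
            int opcode = code[offset] & 0xFF;
            lines.add(offset + ": " + name(opcode));
            offset += length(opcode);
        }
        return lines;
    }

    public static void main(String[] args) {
        // aload_0; getfield #7; ireturn
        byte[] code = {0x2A, (byte) 0xB4, 0x00, 0x07, (byte) 0xAC};
        for (String line : disassemble(code)) {
            System.out.println(line);
        }
        // prints:
        // 0: aload_0
        // 1: getfield
        // 4: ireturn
    }
}
```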
The LineNumberTable section maps bytecode offsets to lines of the Java source. The entry line 11: 0 means that line 11 of the source, return mThisIsInt;, corresponds to the bytecode starting at offset 0, that is, at aload_0.
Using jclasslib
In addition to hexdump and javap, we can also use the jclasslib tool to read bytecode. It is more intuitive and convenient to use. For example, it lets you jump between constant pool indexes, look up the JVM specification, and replace opcodes.
jclasslib is available both as a standalone app and as an IDEA plug-in. See the jclasslib GitHub repository and the IDEA plug-in page. The following figure shows a screenshot of the IDEA plug-in.
Conclusion
This time we saw how to interpret Java bytecode using hexdump, javap, and jclasslib together with the ClassFile structure from the JVM specification.
Along the way, we also learned how bytecode makes this accessible inside an instance method.
For reasons of space, the JVM instruction set and the code execution process are left for another time.