Outline of this paper: Computers only recognize zeros and ones, so the programs we write must be compiled into a binary format before a computer can execute them. Java, however, does not compile source code directly into native machine code. Instead, the javac compiler translates Java files into OS- and platform-independent bytecode stored in Class files, and the Java virtual machine loads and runs those Class files. The Java virtual machine is tied only to the Class file: a specific binary file format that contains the Java virtual machine instruction set, a symbol table, and several other pieces of auxiliary information.

For example, compiling the following code with javac yields a binary Class file.

public class Person {
    public int work() {
        int x = 1;
        int y = 2;
        int z = (x + y) * 10;
        return z;
    }

    public static void main(String[] args) {
        Person person = new Person();
        person.work();
    }
}

Compile the file with the javac command, then open the resulting Class file with the Hex view in Notepad++ to see its bytes in hexadecimal.

Identify the Class file structure

1. The concept

Each Class file holds the definition of exactly one class or interface. On the other hand, a class or interface does not have to be defined in a file (for example, it can be generated directly by a class loader). In this chapter, the format that any valid class or interface must satisfy is loosely called the “Class file format”, even though it does not necessarily exist as a file on disk. A “Class file” is simply a binary stream in any form. The stream is built from 8-bit bytes, and the data items are packed tightly in sequence. When a data item needs more than 8 bits, it is split into several 8-bit bytes and stored in big-endian order, high-order byte first. An unsigned data item occupies at most 8 bytes.

To analyze a Class file, you need to understand a few concepts. According to the Java Virtual Machine specification, the Class file format stores data in a pseudo-structure similar to a C struct, with only two data types: unsigned numbers and tables.

Unsigned number: a basic data type. u1, u2, u4, and u8 represent unsigned numbers of 1 byte, 2 bytes, 4 bytes, and 8 bytes.

Table: a composite data type consisting of multiple unsigned numbers or other tables as data items. By convention, table names end with “_info”.

2. Start exploring the Class file byte by byte

A Class file is a binary stream of 8-bit bytes in which data items are arranged in a tight sequence without any delimiters.

2.1 the magic number (magic)

The first four bytes of the Class file above form a u4 with the value “CA FE BA BE”. These four bytes are called the magic number, and the virtual machine uses them to decide whether the file is an acceptable Class file. Many file storage standards use magic numbers for identification.
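The magic number check can be sketched in a few lines of Java. This is only an illustration: the class name MagicCheck and the method hasClassMagic are made up for this example and are not part of any JDK API.

```java
import java.nio.ByteBuffer;

final class MagicCheck {
    // The four magic bytes CA FE BA BE read as one big-endian u4.
    static final int CLASS_MAGIC = 0xCAFEBABE;

    /** Returns true if the given bytes start with CA FE BA BE. */
    static boolean hasClassMagic(byte[] bytes) {
        if (bytes.length < 4) {
            return false;
        }
        // ByteBuffer reads big-endian by default, matching the Class file byte order.
        return ByteBuffer.wrap(bytes).getInt() == CLASS_MAGIC;
    }
}
```

To try it on a real file, the bytes could be loaded with java.nio.file.Files.readAllBytes and passed to hasClassMagic.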

2.2 Version Of a Class File (Minor Version, Major Version)

The four bytes after the magic number hold the version of the Class file. The first two, “00 00”, are the minor version, and the last two, “00 34”, are the major version. Viewed in the hex editor, the major version is 0x0034, which is 52 in decimal and corresponds to JDK 1.8. The following table shows the Class version number and hexadecimal value for each JDK version.

JDK version number Class version number Hexadecimal value
1.1 45 00 00 00 2D
1.2 46 00 00 00 2E
1.3 47 00 00 00 2F
1.4 48 00 00 00 30
1.5 49 00 00 00 31
1.6 50 00 00 00 32
1.7 51 00 00 00 33
1.8 52 00 00 00 34
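The table lookup can also be expressed as code. The sketch below assumes the input begins with the magic number; the names ClassVersion, majorVersion, and jdkName are hypothetical, and jdkName only covers the versions listed in the table above.

```java
import java.nio.ByteBuffer;

final class ClassVersion {
    /** Reads major_version from the start of a Class file. */
    static int majorVersion(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header);
        buf.position(6); // skip magic (4 bytes) and minor_version (2 bytes)
        return buf.getShort() & 0xFFFF; // u2, read as an unsigned value
    }

    /** Maps a major version from the table (45..52) to its JDK version string. */
    static String jdkName(int major) {
        if (major < 45 || major > 52) {
            throw new IllegalArgumentException("version not in the table: " + major);
        }
        // In the table, major 45 is JDK 1.1 and every later release adds one.
        return "1." + (major - 44);
    }
}
```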

2.3 constant pool

Immediately after the version number comes the constant pool, which you can think of as the resource repository of the Class file. The first u2 after the version number is the constant pool capacity count (constant_pool_count), which in the Class file above is “00 14”. Hexadecimal 0x0014 is 20 in decimal, but this count starts at 1 rather than 0, so there are 19 constants in the constant pool, numbered #1 to #19. The constant pool holds two main kinds of constants: literals and symbolic references. A literal is simply the value to the right of the “=” in an expression: in int i = 3, the 3 is a literal. More formally, literals are close to the Java language notion of constants, such as strings and values declared final. Symbolic references cover three kinds of constants:

  1. Fully qualified names of classes and interfaces (e.g. java/lang/Object)
  2. Field names and descriptors (e.g. a field named i with the descriptor I)
  3. Method names and descriptors (e.g. a method named work with the descriptor ()I)

Next, start analyzing the contents of the constant pool. In the figure below, the purple selected parts are the contents of the constant pool.

Two tables are needed to analyze the contents of the constant pool. The first table lists the tag of each constant pool item type:

No. Type Tag Description
1 CONSTANT_Utf8_info 1 UTF-8 encoded string
2 CONSTANT_Integer_info 3 Integer literal
3 CONSTANT_Float_info 4 Floating-point literal
4 CONSTANT_Long_info 5 Long integer literal
5 CONSTANT_Double_info 6 Double-precision floating-point literal
6 CONSTANT_Class_info 7 Symbolic reference to a class or interface
7 CONSTANT_String_info 8 String literal
8 CONSTANT_Fieldref_info 9 Symbolic reference to a field
9 CONSTANT_Methodref_info 10 Symbolic reference to a method in a class
10 CONSTANT_InterfaceMethodref_info 11 Symbolic reference to a method in an interface
11 CONSTANT_NameAndType_info 12 Partial symbolic reference to a field or method
12 CONSTANT_MethodHandle_info 15 Method handle
13 CONSTANT_MethodType_info 16 Method type
14 CONSTANT_InvokeDynamic_info 18 Dynamic method call site

The constant pool is the most tedious data because each of the 14 constant types has its own structure, which leads to the second table: the structure table of constant entries in the constant pool

With these two tables, you can analyze the constant pool contents in the Class file based on the contents of the table. First, analyze the hexadecimal data in the Class file.

00 14 -> Constant pool capacity count. Counting starts from 1, so there are 19 constants.
0a -> Decimal 10. In table 1, tag 10 is CONSTANT_Methodref_info, a symbolic reference to a method in a class. According to table 2, a CONSTANT_Methodref_info consists of the u1 tag followed by two u2 values: an index to the CONSTANT_Class_info of the class that declares the method, and an index to the CONSTANT_NameAndType_info that gives the method's name and descriptor.
00 05 -> Decimal 5, written as #5 (the class index).
00 10 -> Decimal 16, written as #16 (the name-and-type index).
Next constant:
07 -> Decimal 7. In table 1, tag 7 is CONSTANT_Class_info, a symbolic reference to a class or interface. A CONSTANT_Class_info consists of a u1 (the tag, 07) and a u2 (00 11) that is the index of the constant holding the fully qualified name.
00 11 -> Decimal 17, written as #17.
...
All subsequent data in the constant pool is parsed in this manner.
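The same walk can be automated. The following is a minimal sketch, assuming the input starts at constant_pool_count and contains only the 14 constant types from table 1; the class name ConstantPoolWalker and the method readTags are invented for this example. The payload sizes are the ones defined for each constant type in the Java Virtual Machine specification.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class ConstantPoolWalker {
    /** Returns the tag of every entry; the input starts at constant_pool_count. */
    static List<Integer> readTags(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);       // big-endian by default
        int count = buf.getShort() & 0xFFFF;           // constant_pool_count
        List<Integer> tags = new ArrayList<>();
        for (int index = 1; index < count; index++) {  // valid indexes are 1 .. count-1
            int tag = buf.get() & 0xFF;
            tags.add(tag);
            int payload;                               // bytes that follow the tag
            switch (tag) {
                case 1:                                // Utf8: u2 length, then that many bytes
                    payload = buf.getShort() & 0xFFFF;
                    break;
                case 3: case 4:                        // Integer, Float
                    payload = 4;
                    break;
                case 5: case 6:                        // Long, Double
                    payload = 8;
                    index++;                           // these occupy two pool indexes
                    break;
                case 7: case 8: case 16:               // Class, String, MethodType
                    payload = 2;
                    break;
                case 9: case 10: case 11: case 12: case 18: // refs, NameAndType, InvokeDynamic
                    payload = 4;
                    break;
                case 15:                               // MethodHandle: u1 kind + u2 index
                    payload = 3;
                    break;
                default:
                    throw new IllegalArgumentException("unknown constant pool tag " + tag);
            }
            buf.position(buf.position() + payload);    // skip to the next entry
        }
        return tags;
    }
}
```

Feeding it the bytes analyzed above (0a 00 05 00 10 07 00 11, with the count reduced to 3 so that only those two entries are read) yields the tags 10 and 7.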

This section shows how to parse a constant in the Class file's constant pool, but it also presents a problem: decoding every entry by hand is tedious. The javap tool that ships with the JDK can print the parsed contents for us when run with the -v option:

javap -v Person.class

In the figure above, the red boxes mark the method bodies and the blue boxes mark the constant pool. Let's look at them separately.

Start with the first method, the default constructor Person(), and see how the constant pool data is used when it executes.

Then analyze the main() method

As for the work() method, it only performs arithmetic on basic types, using local variables stored in the local variable table of its stack frame on the virtual machine stack. It does not use any value from the constant pool, so we won't analyze it here.

The constant pool mainly stores names and types: method names, class names, parameter and return types, and so on. The virtual machine keeps this data in the constant pool as metadata, that is, data that describes the class.

How do you tell where the constant pool ends?

The last entry in the constant pool, #19, is a UTF-8 constant with the value java/lang/Object, and we can find it in our hexadecimal Class file. See table 2 for the structure of a UTF-8 constant: the tag comes first, followed by an unsigned u2 giving the length of the string, followed by the string bytes. “01” is the tag, indicating that this constant is a UTF-8 encoded string. “00 10” is the length of the string in bytes, which is 16 in decimal. It is followed by the 16 bytes of the UTF-8 string itself: java/lang/Object. This is the end of the constant pool. Next, analyze the access flags immediately following the constant pool.
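Decoding a single UTF-8 constant as described above can be sketched like this. The class name Utf8Entry and the method read are hypothetical. One assumption is stated in the code: Class files use a modified UTF-8 encoding, which is byte-for-byte identical to standard UTF-8 for plain ASCII names such as java/lang/Object, so the sketch decodes with the standard charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class Utf8Entry {
    /** Decodes one CONSTANT_Utf8_info entry: u1 tag, u2 length, then the string bytes. */
    static String read(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);
        int tag = buf.get() & 0xFF;
        if (tag != 1) {
            throw new IllegalArgumentException("not a CONSTANT_Utf8_info, tag=" + tag);
        }
        int length = buf.getShort() & 0xFFFF; // length in bytes, not characters
        byte[] raw = new byte[length];
        buf.get(raw);
        // Assumption: the content is plain ASCII, where modified UTF-8 equals UTF-8.
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```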

2.4 Access Flags (Class Access flags)

The next u2, following the constant pool, holds the access flags (access_flags), which identify class-level or interface-level access information: whether the type is a class or an interface, whether it is public, whether it is abstract, and so on. The meaning of each flag bit is shown in the following table.

Flag name Flag value Meaning
ACC_PUBLIC 0x0001 Whether the type is public
ACC_FINAL 0x0010 Whether the type is declared final (only classes can set this)
ACC_SUPER 0x0020 Whether the new semantics of the invokespecial bytecode instruction are used. True for classes compiled after JDK 1.0.2
ACC_INTERFACE 0x0200 Marks this as an interface
ACC_ABSTRACT 0x0400 Whether the type is abstract. True for interfaces and abstract classes, false for other types
ACC_SYNTHETIC 0x1000 Marks this class as not generated from user code
ACC_ANNOTATION 0x2000 Marks this as an annotation
ACC_ENUM 0x4000 Marks this as an enumeration

In the hexadecimal Class file above this value is “00 21”. That value does not appear in the table because it is a combination of two flags:

0x0001 | 0x0020 = 0x0021. To carry out the OR operation, convert to binary: 0000 0001 | 0010 0000 = 0010 0001, which converts back to hexadecimal 21.

The Person class is public, so 0x0001 is set, and it was compiled by a compiler later than JDK 1.0.2, so 0x0020 is set as well. The combined value is 0x0021.
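Going the other way, from a combined value back to the individual flags, is a matter of testing each bit with a bitwise AND. A minimal sketch follows; the class name ClassAccessFlags and the method decode are made up for illustration, and the masks are the ones from the table above.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Lists the names of all flags whose bit is set in accessFlags. */
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) { // a non-zero AND means the bit is set
                set.add(NAMES[i]);
            }
        }
        return set;
    }
}
```

For the Person class, decode(0x0021) gives ACC_PUBLIC and ACC_SUPER.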

Attached table: binary and hexadecimal conversion

binary hexadecimal
0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 8
1001 9
1010 A
1011 B
1100 C
1101 D
1110 E
1111 F

2.5 Set of Class Indexes, parent Indexes, and interface Indexes

The class index and the parent class index are each a u2, and the interface index set is a group of u2 values. The Class file uses these three items to determine the inheritance relationships of the class. The class index identifies the fully qualified name of the class, and the parent class index identifies the fully qualified name of its parent class. In the hexadecimal Class file above, the two u2 values following the access flags are the class index and the parent class index, “00 02” and “00 05”, which correspond to constants #2 and #5 in the constant pool: Person and java/lang/Object.

Java does not allow multiple inheritance, so there is only one parent class index. Every Java class except java.lang.Object has a parent class, so java.lang.Object is the only class whose parent class index is 0.

In the interface index set, the first u2 is the interface counter, which gives the number of entries in the index table. If the class does not implement any interface, the counter is 0 and the index table that would follow occupies no bytes. The demo code above does not implement any interface, so the first item of the interface index set is “00 00”. If you change the code slightly and recompile it, you will get the following result.

public abstract class Person implements Comparable{

  }

In the resulting hexadecimal Class file, the purple mark in the figure above is the interface index set. The first u2, “00 01”, is the interface counter: the class implements one interface. The next u2, “00 04”, points to constant #4 in the constant pool, which is highlighted in red in the image below.
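The items that follow the constant pool can be read with a few u2 reads. The sketch below assumes the input starts at access_flags, that is, the constant pool has already been skipped; the class ClassIndexes and its field names are hypothetical.

```java
import java.nio.ByteBuffer;

final class ClassIndexes {
    final int accessFlags;
    final int thisClass;     // constant pool index of this class
    final int superClass;    // constant pool index of the parent class
    final int[] interfaces;  // constant pool indexes of implemented interfaces

    private ClassIndexes(int accessFlags, int thisClass, int superClass, int[] interfaces) {
        this.accessFlags = accessFlags;
        this.thisClass = thisClass;
        this.superClass = superClass;
        this.interfaces = interfaces;
    }

    /** Parses access_flags, this_class, super_class, and the interface index set. */
    static ClassIndexes parse(byte[] bytesAfterConstantPool) {
        ByteBuffer buf = ByteBuffer.wrap(bytesAfterConstantPool);
        int access = u2(buf);
        int thisClass = u2(buf);
        int superClass = u2(buf);
        int count = u2(buf);                 // interface counter
        int[] interfaces = new int[count];   // empty when the counter is 0
        for (int i = 0; i < count; i++) {
            interfaces[i] = u2(buf);
        }
        return new ClassIndexes(access, thisClass, superClass, interfaces);
    }

    private static int u2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;      // read two bytes as an unsigned value
    }
}
```

For the original Person class, the bytes 00 21 00 02 00 05 00 00 parse to class index #2, parent class index #5, and an empty interface list.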

2.5 Collection of field tables

Following the interface collection is the field table, which describes the variables declared in the interface or class. Fields include class-level variables as well as instance-level variables, but do not include local variables declared inside methods. The structure of the field table is shown below:

Type Name Count Meaning
u2 access_flags 1 Field modifiers
u2 name_index 1 Index of the field's simple name
u2 descriptor_index 1 Index of the field's descriptor
u2 attributes_count 1 Number of entries in the attribute table
attribute_info attributes attributes_count Attribute table

Change the above code to add a field, then look at the Class file again.

public abstract class Person implements Comparable{
   private int i;
  }

Compile the above code to get the result shown below

The first u2 is fields_count, a capacity counter that records how many field table entries this class has. The second u2 is the field's access flags, which work the same way as the class access flags and are described in the following table.

Flag name Flag value Meaning
ACC_PUBLIC 0x0001 Whether the field is public
ACC_PRIVATE 0x0002 Whether the field is private
ACC_PROTECTED 0x0004 Whether the field is protected
ACC_STATIC 0x0008 Whether the field is static
ACC_FINAL 0x0010 Whether the field is final
ACC_VOLATILE 0x0040 Whether the field is volatile
ACC_TRANSIENT 0x0080 Whether the field is transient
ACC_SYNTHETIC 0x1000 Whether the field is automatically generated by the compiler
ACC_ENUM 0x4000 Whether the field is an enum

In the above example, the field access flags are “00 02”, indicating that the field is private. Next comes the simple name index, which points to constant #5 in the constant pool, followed by the descriptor index, which points to constant #6.

The #5 constant is the UTF-8 string i, which is the simple name of the field. A simple name is the name of a method or field without any type or parameter information; here the simple name of the field is i. The #6 constant is also a UTF-8 string: an uppercase I. What does this I mean? It is the descriptor. A descriptor describes the data type of a field, or the parameter list (number, types, and order) and return value of a method. According to the descriptor rules, the basic data types (byte, char, double, float, int, long, short, boolean) and the void type that represents no return value are each written as a single uppercase character, while object types are written as the character L followed by the fully qualified name of the type and a semicolon. See the following table.

Identification character Meaning
B Basic type byte
C Basic type char
D Basic type double
F Basic type float
I Basic type int
J Basic type long
S Basic type short
Z Basic type boolean
V Special type void
L Object type, such as Ljava/lang/Object;

For array types, each dimension is described by a leading “[” character. For example, an array defined as “java.lang.String[][]” is written as “[[Ljava/lang/String;”, and an integer array “int[]” is written as “[I”.

When a descriptor describes a method, the parameter list comes first and the return value follows. The parameter list is enclosed in a pair of parentheses, with the parameters in their exact declared order. For example, a method that takes no parameters and returns void has the descriptor “()V”, java.lang.String toString() has the descriptor “()Ljava/lang/String;”, and int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) has the descriptor “([CII[CIII)I”.
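The descriptor rules are mechanical enough to turn into a small decoder. This is a sketch for well-formed descriptors only, with no error recovery; the class name Descriptors and the methods fieldType and methodSignature are invented for this example.

```java
final class Descriptors {
    /** Reads one type starting at pos[0], advances pos[0] past it, and returns its Java name. */
    private static String readType(String d, int[] pos) {
        int dims = 0;
        while (d.charAt(pos[0]) == '[') {   // one '[' per array dimension
            dims++;
            pos[0]++;
        }
        char c = d.charAt(pos[0]++);
        String base;
        switch (c) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'V': base = "void"; break;
            case 'L': {                     // object type: L<fully qualified name>;
                int end = d.indexOf(';', pos[0]);
                base = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                break;
            }
            default:
                throw new IllegalArgumentException("bad descriptor: " + d);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    /** Decodes a field descriptor such as "[I" into "int[]". */
    static String fieldType(String descriptor) {
        return readType(descriptor, new int[] {0});
    }

    /** Decodes a method descriptor into "returnType (param, param, ...)". */
    static String methodSignature(String descriptor) {
        int[] pos = {1};                    // skip the opening '('
        StringBuilder params = new StringBuilder();
        while (descriptor.charAt(pos[0]) != ')') {
            if (params.length() > 0) {
                params.append(", ");
            }
            params.append(readType(descriptor, pos));
        }
        pos[0]++;                           // skip ')'
        String returnType = readType(descriptor, pos);
        return returnType + " (" + params + ")";
    }
}
```

With this sketch, methodSignature("([CII[CIII)I") returns "int (char[], int, int, char[], int, int, int)", which matches the indexOf declaration above.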

Putting this together, 00 01 00 02 00 05 00 06 decodes to one field: private int i;

The fixed data items of the field table end here. However, the descriptor index is followed by an attribute table collection that stores additional information, which is discussed later in the section on attribute tables.

2.6 Collection of method tables

Following the field table is the method table collection. The Class file describes methods in almost the same way as fields: each entry consists of access flags, a name index, and a descriptor index. The only difference lies in the access flags. For example, the volatile and transient keywords cannot modify methods, so the method table has no such flags, while methods gain flags of their own such as synchronized, native, and abstract. The method access flags are summarized in the table below.

Flag name Flag value Meaning
ACC_PUBLIC 0x0001 Whether the method is public
ACC_PRIVATE 0x0002 Whether the method is private
ACC_PROTECTED 0x0004 Whether the method is protected
ACC_STATIC 0x0008 Whether the method is static
ACC_FINAL 0x0010 Whether the method is final
ACC_SYNCHRONIZED 0x0020 Whether the method is synchronized
ACC_BRIDGE 0x0040 Whether the method is a bridge method generated by the compiler
ACC_VARARGS 0x0080 Whether the method accepts a variable number of arguments
ACC_NATIVE 0x0100 Whether the method is native
ACC_ABSTRACT 0x0400 Whether the method is abstract
ACC_STRICTFP 0x0800 Whether the method is strictfp
ACC_SYNTHETIC 0x1000 Whether the method is automatically generated by the compiler

The Java code inside a method is compiled into bytecode instructions by the compiler and stored in the Code attribute of the method's attribute table. The Code attribute is explained in the next section.

2.7 Attribute tables

The Class file, the field table, and the method table can each carry their own attribute table collection to describe information specific to certain scenarios. The Java Virtual Machine specification predefines some attributes that the virtual machine recognizes, as shown in the following table. The structure of the attribute table:

Each attribute in the attribute table has its own structure. Its name must refer to a CONSTANT_Utf8_info constant in the constant pool, while the structure of the attribute value is entirely attribute-specific; a u4 length field states how many bytes the attribute value occupies. The structure of the attribute table is as follows: For details, look at the following code.

public class Test {
    final static long m = 1L;
    static int n = 2;
    static int i = 2;

    public void desc() {}

    public int inc() {
        int x;
        try {
            x = 1;
            return x;
        } catch (Exception e) {
            x = 2;
            return x;
        } finally {
            x = 3;
            return x;
        }
    }
}

Compile to a Class file:

2.7.1 Code attributes

The code in the method body of a Java program is processed by the javac compiler and eventually stored as bytecode instructions in the Code attribute. The Code attribute appears in the method table collection, but not every method table entry has it: abstract methods, such as those in interfaces or abstract classes, do not have a Code attribute. If a Code attribute exists in a method table entry, its structure looks like this:

  1. attribute_name_index is an index to a constant of type CONSTANT_Utf8_info whose value is fixed to “Code”.
  2. attribute_length is the length of the attribute value in bytes.
  3. max_stack is the maximum depth of the operand stack; at no point during the method's execution does the operand stack grow deeper than this.
  4. max_locals is the storage space required by the local variable table, measured in slots. A slot is the smallest unit the virtual machine uses when allocating memory for local variables. Data types of 32 bits or less (byte, char, float, int, short, boolean, reference, and returnAddress) use one slot each, while the two 64-bit types, double and long, use two slots. Slots can be reused: once execution leaves the scope of a local variable, its slot can be given to another local variable.
  5. code_length and code store the bytecode instructions generated by compiling the Java source. code_length is the length of the bytecode in bytes, and code is the byte stream that holds the instructions. Each opcode is a u1; after reading an opcode, the virtual machine knows which instruction it is and what operands follow. A u1 ranges from 0 to 255, so at most 256 opcodes can be expressed.
  6. code is somewhat like a CPU instruction set: for example, the + operator on ints is compiled into the iadd bytecode instruction.
  7. code_length is a u4 with a theoretical maximum of 2^32-1, but the virtual machine specification limits the code of a single method to 65535 bytes.
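The slot rule in item 4 can be illustrated by counting the slots a method's arguments occupy, which is the value javap reports as args_size. This is a sketch under the assumption of a well-formed method descriptor; the class name ArgSlots and the method argsSize are hypothetical.

```java
final class ArgSlots {
    /** Counts the local variable slots taken by a method's arguments. */
    static int argsSize(String methodDescriptor, boolean isStatic) {
        int slots = isStatic ? 0 : 1;      // a non-static method receives `this` in slot 0
        int i = 1;                         // skip the opening '('
        while (methodDescriptor.charAt(i) != ')') {
            char c = methodDescriptor.charAt(i);
            if (c == '[') {
                // An array is a reference: skip its dimensions and element type, count one slot.
                while (methodDescriptor.charAt(i) == '[') {
                    i++;
                }
                if (methodDescriptor.charAt(i) == 'L') {
                    i = methodDescriptor.indexOf(';', i);
                }
                i++;
                slots += 1;
            } else if (c == 'L') {
                i = methodDescriptor.indexOf(';', i) + 1; // object reference: one slot
                slots += 1;
            } else {
                slots += (c == 'J' || c == 'D') ? 2 : 1;  // long and double take two slots
                i++;
            }
        }
        return slots;
    }
}
```

For a non-static method with the descriptor ()I, such as inc() in the Test class above, argsSize returns 1: the only argument is the implicit this.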

There are three methods in this code: the constructor, desc(), and inc(). We get to know the attribute table by analyzing inc(), the method with the most code. In the figure above, the method table contents of inc() are highlighted in purple. The javap tool displays the constant pool and method body for comparison, and the explanation is marked in red on the right.

View the contents of the constant pool using javap: the bytecode above needs to be compared with the contents of the constant pool.

View the inc() method using javap: the bytecode analysis shown in the figure above needs to be compared with the bytecode in the method. (More on bytecode in a later section)

There are a few things to note here. max_stack is the maximum depth of the operand stack described above, not the depth of the call stack. inc() never has more than one value on its operand stack at a time, so max_stack is 1. max_locals is the number of slots the method needs for its local variables. args_size = 1 because a non-static method receives this as an implicit first argument.

In the diagram of the Code attribute structure, you can see that the u2 after code is the length of the exception table, followed by the contents of the exception table. The structure of the exception table is: Using the Class file to analyze the exception table: The explanation is indicated on the right side of the figure.
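In the Java Virtual Machine specification, each exception table entry consists of four u2 values: start_pc, end_pc, handler_pc, and catch_type, where a catch_type of 0 means the handler catches everything (this is how finally blocks are implemented). A minimal parsing sketch follows. The class name ExceptionTable is invented, the input is assumed to start at exception_table_length, and the output format merely imitates the javap layout.

```java
import java.nio.ByteBuffer;

final class ExceptionTable {
    /** Parses the exception table; the input starts at exception_table_length. */
    static String[] parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int length = u2(buf);                 // number of entries
        String[] rows = new String[length];
        for (int i = 0; i < length; i++) {
            int startPc = u2(buf);            // start of the protected range (inclusive)
            int endPc = u2(buf);              // end of the protected range (exclusive)
            int handlerPc = u2(buf);          // where execution jumps when a match is thrown
            int catchType = u2(buf);          // constant pool index of the class, 0 = any
            String type = catchType == 0 ? "any" : "#" + catchType;
            rows[i] = "from " + startPc + " to " + endPc
                    + " target " + handlerPc + " type " + type;
        }
        return rows;
    }

    private static int u2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;
    }
}
```

As a made-up input, the bytes 00 01 00 00 00 04 00 08 00 00 describe a single entry that protects offsets 0 to 4 and jumps to offset 8 for any exception.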

2.7.2 Exceptions properties

Exceptions is an attribute in the method table at the same level as the Code attribute; do not confuse it with the exception table, which is a subordinate item of Code. The Exceptions attribute lists the checked exceptions a method may throw, that is, the exceptions listed after the throws keyword in the method declaration. Its structure table is as follows:

That is, it is the attribute immediately after the Code attribute. The number_of_exceptions item indicates how many checked exceptions the method may throw. Each one is represented by an exception_index_table item, which is an index to a CONSTANT_Class_info constant in the constant pool that gives the type of the checked exception.

2.7.3 LineNumberTable properties

The LineNumberTable attribute describes the mapping between Java source line numbers and bytecode line numbers (bytecode offsets). It is not a required attribute, but if it is not generated, stack traces will not show the line number of an error when an exception is thrown, and breakpoints cannot be set by source line during debugging. line_number_table is a collection of line_number_table_length entries of type line_number_info. Each line_number_info contains two u2 data items, start_pc and line_number: the former is the bytecode line number and the latter is the Java source line number.
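To see how such a table yields the line number shown in a stack trace, here is a small lookup sketch: the source line for a bytecode offset is taken from the entry with the largest start_pc that does not exceed that offset. The class name LineNumbers, the method lineFor, and the table values used in the usage note are all invented for illustration.

```java
final class LineNumbers {
    /**
     * Finds the source line for a bytecode offset.
     * Each table row is {start_pc, line_number}; rows need not be sorted.
     * Returns -1 if no row starts at or before the offset.
     */
    static int lineFor(int[][] table, int pc) {
        int bestStart = -1;
        int line = -1;
        for (int[] row : table) {
            // Keep the row with the greatest start_pc that is still <= pc.
            if (row[0] <= pc && row[0] > bestStart) {
                bestStart = row[0];
                line = row[1];
            }
        }
        return line;
    }
}
```

With a hypothetical table {{0, 4}, {2, 5}, {7, 8}}, offset 3 falls in the range that starts at offset 2, so lineFor returns source line 5.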

2.7.4 LocalVariableTable properties

The LocalVariableTable attribute describes the relationship between the variables in the local variable table of a stack frame and the variables defined in the Java source code. It is not a required attribute. It has the following structure: The local_variable_info item represents the association between a stack frame and one local variable in the source code, structured as follows: The start_pc and length items give the bytecode offset at which the local variable's life cycle begins and the length of its scope, and together they define the scope of the local variable within the bytecode. name_index and descriptor_index are indexes to CONSTANT_Utf8_info constants in the constant pool, representing the local variable's name and descriptor, respectively. index is the slot position of this local variable in the stack frame's local variable table. If the variable is of a 64-bit type, it occupies both slot index and slot index+1.

2.7.5 SourceFile properties

The SourceFile attribute records the name of the source file from which the Class file was generated.

2.7.6 Other Attributes

There are other attributes in the attribute table, such as the ConstantValue attribute and the InnerClasses attribute, which are not explained in detail here. Reading a Class file is a matter of following the map: look up the specified structure of each table and decode it one table at a time. See the classic JVM book, Understanding the Java Virtual Machine, for more details.

2.8 Analyze the memory of the code at runtime with the Class file structure

In the previous article, LCODER's JVM series: Runtime data area, you learned that the JVM divides memory at runtime into the stack area, heap area, method area, and so on. Now that we know the structure of the Class file, we can better analyze what happens in memory when code runs. Take this simple code for example.

class Person {
    public void sayHello() {
        System.out.println("Hello");
    }
}

public class Test {
    public static void main(String[] args) {
        Person person = new Person();
        person.sayHello();
    }
}

This code is executed like this:

When main() is executed, the stack frame of the main method is pushed onto the stack, where person is a local variable of main() holding a reference. After new Person() executes, a Person instance is created in the heap. This instance holds the address of the Person class metadata in the method area, and that metadata holds the address of the sayHello() method, which is then located and executed.

To verify the above conclusion, execute javap -v to get the following file:

First, find the metadata of main() in the method area and look at line 9:

As you can see, line 9 points to position #4 in the constant pool. Find #4 in the constant pool.

This is a Methodref, a symbolic reference to a method, pointing to #2 and #17. #2 is a Class constant pointing to #16, the class name Person. #17 is a NameAndType (the method's simple name and descriptor), pointing to #20, sayHello, and #8, the descriptor ()V.

That is how Person.sayHello() is found.
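The chain of lookups just described can be sketched with a tiny in-memory model of the constant pool. Everything here is illustrative: the class MethodrefResolver is invented, and the pool is modeled as a map from index to either a string (for UTF-8 constants) or an array of indexes (for Methodref, Class, and NameAndType). The index values are the ones from the walk-through above.

```java
import java.util.HashMap;
import java.util.Map;

final class MethodrefResolver {
    /** Builds a toy constant pool holding only the entries named in the text. */
    static Map<Integer, Object> samplePool() {
        Map<Integer, Object> pool = new HashMap<>();
        pool.put(4, new int[] {2, 17});   // Methodref: class index, name-and-type index
        pool.put(2, new int[] {16});      // Class: name index
        pool.put(17, new int[] {20, 8});  // NameAndType: name index, descriptor index
        pool.put(16, "Person");           // Utf8
        pool.put(20, "sayHello");         // Utf8
        pool.put(8, "()V");               // Utf8
        return pool;
    }

    /** Follows a Methodref to "ClassName.methodName:descriptor". */
    static String resolve(Map<Integer, Object> pool, int methodrefIndex) {
        int[] methodref = (int[]) pool.get(methodrefIndex);
        int[] classInfo = (int[]) pool.get(methodref[0]);
        int[] nameAndType = (int[]) pool.get(methodref[1]);
        return pool.get(classInfo[0]) + "." + pool.get(nameAndType[0])
                + ":" + pool.get(nameAndType[1]);
    }
}
```

Resolving #4 in this toy pool produces Person.sayHello:()V.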