A class file is a binary stream made up of 8-bit bytes. Its data items are laid out in a strict, compact order with no separators between them. Together, these data items describe the class itself, its field tables, method tables, attribute tables, and so on.

The class file format stores data in a pseudo-structure similar to a C struct. There are only two kinds of data types: unsigned numbers and tables. Unsigned numbers are written u1, u2, u4, and u8 for 1, 2, 4, and 8 bytes respectively, and they can describe numbers, indexes, counts, or UTF-8 encoded string values. A table is a compound data type whose items are unsigned numbers or other tables; by convention, table names end with “_info”.

Multi-byte data items are stored in big-endian order: the most significant byte comes first (the reverse of little-endian order). This does not affect our understanding of the class file structure; it is mentioned here only for completeness.
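To make the byte order concrete, here is a minimal Java sketch that reads u2 and u4 values from a byte array by hand and, for comparison, with java.nio.ByteBuffer, whose default order is big-endian. BigEndian is only an illustrative helper name, not part of any library.

```java
import java.nio.ByteBuffer;

// Reads class-file unsigned numbers from a byte array in big-endian order.
class BigEndian {
    // u2: high byte first, then low byte.
    static int u2(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    // u4: four bytes, most significant first. Returned as a long so that
    // values above Integer.MAX_VALUE (such as 0xCAFEBABE) stay non-negative.
    static long u4(byte[] b, int off) {
        return ((long) (b[off] & 0xFF) << 24)
                | ((b[off + 1] & 0xFF) << 16)
                | ((b[off + 2] & 0xFF) << 8)
                | (b[off + 3] & 0xFF);
    }

    // ByteBuffer is big-endian by default, so it produces the same value.
    static long u4ViaByteBuffer(byte[] b, int off) {
        return Integer.toUnsignedLong(ByteBuffer.wrap(b, off, 4).getInt());
    }
}
```

Either way, the four bytes CA FE BA BE read as the u4 value 0xCAFEBABE.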

The examples use the following tools: JDK 8, Notepad++ (with the Hex_Editor plugin), javap, and bytecode-viewer. The following demo is used throughout this article.

Whenever the format needs to describe a variable number of items of the same type, whether unsigned numbers or tables, it uses a count placed in front followed by that many consecutive items; such a run of items is called a collection of that type. This is quite intuitive: if a class has several methods and you have a structure that describes them, you want one field that records how many methods there are, followed by a description of each method. That count field is an unsigned number.

The figure above lists all the data items that together describe a class’s meta-information.
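The “count followed by items” pattern described above can be sketched in Java as follows, assuming each item is a single u2 (as it is in the interface index collection). CountedU2List is a hypothetical helper name.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Reads a collection laid out as: u2 count, then `count` u2 items.
class CountedU2List {
    static List<Integer> read(ByteBuffer buf) {
        int count = Short.toUnsignedInt(buf.getShort()); // the count placed in front
        List<Integer> items = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            items.add(Short.toUnsignedInt(buf.getShort())); // each consecutive item
        }
        return items; // buf is left positioned just after the last item
    }
}
```

Collections of tables (fields, methods, attributes) follow the same shape, except that each item is a variable-length table rather than a fixed u2.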

Magic numbers and versions

The first four bytes of every class file are the magic number, 0xCAFEBABE, which identifies the file as a class file. The four bytes after the magic number hold the version, as shown below: the fifth and sixth bytes are the minor version number, and the seventh and eighth bytes are the major version number.

0x34 is 52 in decimal, which is the major version number of JDK 8.
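A minimal sketch of reading these first eight bytes, assuming the whole file is already in a byte array. ClassHeader is an illustrative name.

```java
import java.nio.ByteBuffer;

// Parses the first 8 bytes of a class file: magic, minor_version, major_version.
class ClassHeader {
    final long magic;
    final int minor;
    final int major;

    ClassHeader(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);      // big-endian by default
        magic = Integer.toUnsignedLong(buf.getInt());  // bytes 1-4
        minor = Short.toUnsignedInt(buf.getShort());   // bytes 5-6
        major = Short.toUnsignedInt(buf.getShort());   // bytes 7-8
        if (magic != 0xCAFEBABEL) {
            throw new IllegalArgumentException("not a class file");
        }
    }
}
```

For a real file you could pass java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("TestClass.class")); the file name is hypothetical. A JDK 8 class file would report major == 52.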

Constant pool

The constant pool mainly stores two kinds of constants: literals and symbolic references. Literals are close to the Java language’s notion of constants, such as text strings (for example “ABnDDD”) and the values of constants declared final. Symbolic references include the fully qualified names of classes and interfaces, field names and descriptors, and method names and descriptors.

Each constant begins with a u1 tag that identifies its type; for example, tag 1 denotes CONSTANT_Utf8_info, a UTF-8 encoded string.

Right after the minor and major version numbers comes the constant pool, which can be thought of as the class file’s resource repository, holding strings such as class names and method names. Because the number of constants is not fixed, a u2 value at the start of the pool records the count. Unlike the usual Java convention, this count starts from 1 rather than 0: index 0 is reserved to express “no reference to any constant pool item”, so a count of n means n - 1 constants with indexes 1 to n - 1.

As shown, the count is 0x0013, which is 19 in decimal, so the pool holds 18 constants with indexes 1 to 18.
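The counting rule can be expressed as a small Java helper. One detail the text does not cover: per the JVM specification, CONSTANT_Long_info (tag 5) and CONSTANT_Double_info (tag 6) entries each occupy two index slots, so this sketch accounts for that when assigning indexes. PoolIndex is a hypothetical name.

```java
// Constant pool index rules: valid indexes run from 1 to constant_pool_count - 1,
// and index 0 is reserved to mean "no constant".
class PoolIndex {
    static final int TAG_LONG = 5;
    static final int TAG_DOUBLE = 6;

    static boolean isValid(int index, int constantPoolCount) {
        return index >= 1 && index < constantPoolCount;
    }

    // Long and Double entries take two slots; every other entry takes one.
    static int slots(int tag) {
        return (tag == TAG_LONG || tag == TAG_DOUBLE) ? 2 : 1;
    }

    // Given the tags of the entries in file order, returns each entry's index.
    static int[] assignIndexes(int[] tags) {
        int[] indexes = new int[tags.length];
        int next = 1; // index 0 is never used
        for (int i = 0; i < tags.length; i++) {
            indexes[i] = next;
            next += slots(tags[i]);
        }
        return indexes;
    }

    // constant_pool_count is one more than the highest usable index.
    static int constantPoolCount(int[] tags) {
        int next = 1;
        for (int tag : tags) {
            next += slots(tag);
        }
        return next;
    }
}
```

With 18 entries and no Long or Double constants, constantPoolCount returns 19 (0x0013), matching the count above.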

For a more intuitive view, the disassembled class file information from javap -verbose is shown below.

The number in front of each constant is its index in the constant pool; think of it as an array index. These indexes are referenced constantly in the analysis that follows, so it is worth pointing this out first.

The common constant structure types are listed below. Each constant structure is analyzed in detail later.
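The tag values can be captured in a Java enum. This sketch lists the tags defined for Java SE 8 class files; later releases add CONSTANT_Dynamic (17), CONSTANT_Module (19), and CONSTANT_Package (20). The enum name CpTag is illustrative.

```java
// Constant pool tags (the leading u1 of every constant) for Java SE 8 class files.
enum CpTag {
    UTF8(1), INTEGER(3), FLOAT(4), LONG(5), DOUBLE(6), CLASS(7), STRING(8),
    FIELDREF(9), METHODREF(10), INTERFACE_METHODREF(11), NAME_AND_TYPE(12),
    METHOD_HANDLE(15), METHOD_TYPE(16), INVOKE_DYNAMIC(18);

    final int tag;

    CpTag(int tag) {
        this.tag = tag;
    }

    // Maps a raw tag byte to its constant type.
    static CpTag of(int tag) {
        for (CpTag t : values()) {
            if (t.tag == tag) {
                return t;
            }
        }
        throw new IllegalArgumentException("unknown constant pool tag: " + tag);
    }
}
```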

1. Now parse the first constant from the constant pool:

The first tag is 0x0A (10 in decimal). Look at the structure of the constant with tag 10, CONSTANT_Methodref_info:

The first item is the u1 tag. The second item, class_index, is a u2 index into the constant pool; here its value is 0x0004 (4 in decimal).

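Index 4 points to a CONSTANT_Class_info constant holding a fully qualified class name. Because this first constant is the Methodref the default constructor uses to call its superclass constructor, the class it names is the superclass java/lang/Object; the demo class itself, org/javasoft/clazz/TestClass, is named by a different CONSTANT_Class_info entry.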

The third item, name_and_type_index, is also a u2; its value is 0x000F (15 in decimal), an index to a CONSTANT_NameAndType_info constant. In that structure, the second item is the index of the constant holding the field or method name, and the third item is the index of the constant holding the field or method descriptor. As the figure shows, this CONSTANT_NameAndType_info refers to the seventh and eighth constants in the pool; what they contain becomes clear once those constants are parsed.
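CONSTANT_Methodref_info and CONSTANT_NameAndType_info (and CONSTANT_Fieldref_info) all consist of a u1 tag followed by two u2 indexes, so one small reader covers them. A minimal sketch; RefEntry is a hypothetical class name.

```java
import java.nio.ByteBuffer;

// CONSTANT_Methodref_info   { u1 tag; u2 class_index; u2 name_and_type_index; }
// CONSTANT_NameAndType_info { u1 tag; u2 name_index;  u2 descriptor_index; }
class RefEntry {
    final int tag;
    final int first;  // class_index (Methodref) or name_index (NameAndType)
    final int second; // name_and_type_index (Methodref) or descriptor_index (NameAndType)

    RefEntry(ByteBuffer buf) {
        tag = Byte.toUnsignedInt(buf.get());
        first = Short.toUnsignedInt(buf.getShort());
        second = Short.toUnsignedInt(buf.getShort());
    }
}
```

For example, the bytes 0A 00 04 00 0F read as tag 10, class_index 4, and name_and_type_index 15, and a NameAndType entry 0C 00 07 00 08 reads as name index 7 and descriptor index 8.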

2. Next, parse the second constant from the constant pool:

It is parsed the same way, so I won’t repeat the details here. The second constant is a CONSTANT_Fieldref_info.

3. Next, parse the third constant from the constant pool:

The third constant is CONSTANT_Class_info.

4. Parse the fifth constant from the constant pool (the fourth is another CONSTANT_Class_info and is parsed the same way as the third):

Its length field is 0x0001, so the string is one byte long: 0x6D, which is “m”. Methods, fields, and other items in a class file refer to CONSTANT_Utf8_info constants to describe their names.
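Conveniently, java.io.DataInputStream.readUTF() reads a u2 length followed by that many bytes of modified UTF-8, which is exactly the layout that follows the tag in CONSTANT_Utf8_info. A minimal sketch; Utf8Entry is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// CONSTANT_Utf8_info { u1 tag; u2 length; u1 bytes[length]; }
class Utf8Entry {
    static String read(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();
            if (tag != 1) {
                throw new IllegalArgumentException("expected tag 1, got " + tag);
            }
            // readUTF() consumes the u2 length and then `length` bytes of modified UTF-8.
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing the bytes 01 00 01 6D returns “m”.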

I won’t break down the remaining constants one by one; the figure below marks all 18 constants and separates the constant pool from the rest of the file. The constant pool ends at offset 0xB0.

Class, superclass, and interface index collection

A class has access flags such as public and final. The class file uses two bytes (a u2) for the access flags, which identify access information at the class or interface level, including: whether it is a class or an interface; whether it is declared public; whether it is abstract; and, for a class, whether it is declared final.

Next, continue parsing:

The value is 0x0021, which is ACC_PUBLIC (0x0001) combined with ACC_SUPER (0x0020); every other flag is false. Recall that our class is declared as “public class TestClass”.
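Decoding the flags is a matter of testing bits. The mask values below are the class access flags from the JVM specification; ClassAccessFlags is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

// Class-level access flags (u2 mask values from the JVM specification).
class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    // Returns the name of every flag whose bit is set in access_flags.
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }
}
```

decode(0x0021) yields ACC_PUBLIC and ACC_SUPER, matching the analysis above.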

To describe a class definition, the class file must record the fully qualified name of the class, the fully qualified name of its superclass, and the interfaces it implements. Because Java supports only single inheritance, there is exactly one superclass, but a class may implement multiple interfaces, so the interfaces are described with a collection.

  • Class index: a u2 used to determine the fully qualified name of this class
  • Superclass index: a u2 used to determine the fully qualified name of this class’s superclass (every Java class except java.lang.Object has a superclass, so its superclass index is nonzero)
  • Interface index collection: a set of u2 values describing which interfaces this class implements. The first entry, a u2 counter, gives the number of entries in the collection

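These indexes are parsed as described above: each one points to a CONSTANT_Class_info constant, which in turn points to a CONSTANT_Utf8_info holding the fully qualified name.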

Set of field tables

The field table describes variables declared in an interface or class, including class variables and instance variables but not local variables declared inside methods. A field is described by: its scope (the public, protected, and private modifiers), whether it is an instance variable or a class variable (static), its mutability (final), its concurrent visibility (volatile), whether it is serialized (transient), its data type (primitive type, object, or array), and its name. Each modifier is either present or absent, so it fits naturally in a flag bit; the field’s data type and name cannot be expressed with fixed-size values and can only be described by referencing constants in the constant pool.

A field table can also carry attribute tables that describe additional information about the field; more on those below.

The figure above shows the field access flags, which work like the class access flags.

The descriptor

Descriptors describe the data type of a field, and the parameter list (number, type, and order) and return value of a method. Primitive types and void are each represented by a single uppercase letter (for example, I for int, J for long, Z for boolean, and V for void), and an object type is represented by L followed by its fully qualified name and a terminating semicolon.

  • Array representation: each dimension is represented by a leading “[”. For example, “java.lang.String[][]” is recorded as “[[Ljava/lang/String;” and an integer array “int[]” is recorded as “[I”

  • Method representation: the parameter types are listed in order inside parentheses, followed by the return type. For example, void inc() is recorded as “()V”, and int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) is recorded as “([CII[CIII)I”

To put it bluntly, descriptors are meant to save space.
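A small parser shows how compact these descriptors are to decode. This sketch turns field and method descriptors into Java-source-style type names; Descriptors is a hypothetical class name, and for simplicity it accepts V anywhere even though void is only legal as a return type.

```java
import java.util.ArrayList;
import java.util.List;

// Converts JVM descriptors into Java-source-style type names.
class Descriptors {
    // Parses one type starting at pos[0] and advances pos[0] past it.
    private static String parseType(String d, int[] pos) {
        int dims = 0;
        while (d.charAt(pos[0]) == '[') { // one '[' per array dimension
            dims++;
            pos[0]++;
        }
        char c = d.charAt(pos[0]++);
        String base;
        switch (c) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'V': base = "void"; break;
            case 'L': { // L<fully/qualified/Name>;
                int end = d.indexOf(';', pos[0]);
                base = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                break;
            }
            default:
                throw new IllegalArgumentException("bad descriptor: " + d);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    static String fieldType(String d) {
        return parseType(d, new int[]{0});
    }

    // Returns the parameter types in order, with the return type as the last element.
    static List<String> methodTypes(String d) {
        if (d.charAt(0) != '(') {
            throw new IllegalArgumentException("bad method descriptor: " + d);
        }
        int[] pos = {1};
        List<String> types = new ArrayList<>();
        while (d.charAt(pos[0]) != ')') {
            types.add(parseType(d, pos));
        }
        pos[0]++;                     // skip ')'
        types.add(parseType(d, pos)); // return type
        return types;
    }
}
```

Applied to “([CII[CIII)I”, methodTypes returns seven parameter types (char[], int, int, char[], int, int, int) followed by the return type int.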

Continue parsing:

private int m;
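Each entry in the field table has the layout access_flags (u2), name_index (u2), descriptor_index (u2), attributes_count (u2), followed by the attributes. A minimal Java sketch; FieldInfo is an illustrative name, and the flag masks are the field access flags from the JVM specification. For a field like private int m, access_flags would have the ACC_PRIVATE bit set, and the name and descriptor indexes would point to the Utf8 constants “m” and “I”.

```java
import java.nio.ByteBuffer;

// field_info { u2 access_flags; u2 name_index; u2 descriptor_index;
//              u2 attributes_count; attribute_info attributes[attributes_count]; }
class FieldInfo {
    // A few field access flag masks from the JVM specification.
    static final int ACC_PUBLIC = 0x0001, ACC_PRIVATE = 0x0002, ACC_PROTECTED = 0x0004,
            ACC_STATIC = 0x0008, ACC_FINAL = 0x0010, ACC_VOLATILE = 0x0040, ACC_TRANSIENT = 0x0080;

    final int accessFlags;
    final int nameIndex;       // points to a CONSTANT_Utf8_info holding the field name
    final int descriptorIndex; // points to a CONSTANT_Utf8_info holding the descriptor
    final int attributesCount;

    FieldInfo(ByteBuffer buf) {
        accessFlags = Short.toUnsignedInt(buf.getShort());
        nameIndex = Short.toUnsignedInt(buf.getShort());
        descriptorIndex = Short.toUnsignedInt(buf.getShort());
        attributesCount = Short.toUnsignedInt(buf.getShort());
        // Skip each attribute: u2 name index, u4 length, then `length` bytes of value.
        for (int i = 0; i < attributesCount; i++) {
            buf.getShort();
            int length = buf.getInt();
            buf.position(buf.position() + length);
        }
    }

    boolean is(int mask) {
        return (accessFlags & mask) != 0;
    }
}
```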

At the Java language level, two fields with the same name are illegal, but in a class file two fields may share a name as long as their descriptors differ.

Method table collection

A method table has exactly the same structure as a field table: access flags, name index, descriptor index, and an attribute table collection.

Continue to parse:

As with field tables, if a subclass does not override a method of its parent class, the parent’s method information does not appear in the subclass’s method table collection.

At this point, we can’t help but wonder:

A method is fully defined by its access flags, name index, and descriptor index, but where is the code inside the method? The compiler turns the Java code in the method body into bytecode instructions and stores them in an attribute named “Code” in the method’s attribute table collection.

Attribute table collection

Class files, field tables, and method tables can each carry their own set of attribute tables to describe information specific to certain scenarios. Each attribute’s name is given by a reference to a CONSTANT_Utf8_info constant in the constant pool. The format of the attribute value is entirely up to the attribute; the only fixed part is a u4 length field giving the number of bytes the value occupies.
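Because every attribute starts with the same u2 name index and u4 byte length, a parser can read or skip any attribute without understanding its contents. A minimal sketch; RawAttribute is a hypothetical name.

```java
import java.nio.ByteBuffer;

// attribute_info { u2 attribute_name_index; u4 attribute_length; u1 info[attribute_length]; }
class RawAttribute {
    final int nameIndex;
    final byte[] info;

    RawAttribute(ByteBuffer buf) {
        nameIndex = Short.toUnsignedInt(buf.getShort());
        long length = Integer.toUnsignedLong(buf.getInt()); // a count of bytes, not bits
        if (length > buf.remaining()) {
            throw new IllegalArgumentException("truncated attribute");
        }
        info = new byte[(int) length];
        buf.get(info); // how `info` is laid out depends on the attribute's name
    }
}
```

A parser that only cares about some attributes can look up the name behind nameIndex and ignore the raw info bytes of all the others.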

The attribute table structure is as follows:

The Java Virtual Machine Specification (Java SE 8 edition) predefines 23 attributes:

The most important one, and the one we will look at, is the Code attribute.

Code attributes

The code in a Java method body is compiled by javac into bytecode instructions, which are stored in the Code attribute. Abstract methods, including abstract interface methods, have no Code attribute. If the information in a program is divided into code (the Java code inside methods) and metadata (classes, fields, method definitions, and other information), then in a class file the Code attribute describes the code, and all other data items describe the metadata.

  • Attribute_name_index: an index to a CONSTANT_Utf8_info constant whose value is “Code”, which is the name of this attribute

  • Attribute_length: the length of the attribute value. Because attribute_name_index and attribute_length together take 6 bytes, this value is always the total length of the attribute minus 6 bytes

  • Max_stack: the maximum depth of the operand stack. At no point during the method’s execution does the operand stack grow deeper than this value; the virtual machine uses it to size the operand stack in the stack frame

  • Max_locals: the storage required by the local variable table, measured in slots (the smallest unit the virtual machine uses to allocate memory for local variables). Once execution leaves a local variable’s scope, its slot can be reused; javac allocates slots according to variable scopes and computes max_locals from that allocation

  • Code_length: the length of the bytecode, in bytes

  • Code: the stream of bytecode instructions. Each opcode is a u1, so its value ranges from 0x00 to 0xFF (0 to 255)

  • Exception_table_length: the number of entries in the exception table

  • Exception_table: the exception table
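The layout above can be read field by field. The JVM specification also places an attributes_count and nested attributes (such as LineNumberTable) after the exception table, which is why attribute_length covers more than the fields listed here; this sketch skips those nested attributes by length and then checks that the bytes consumed match attribute_length. CodeAttribute is an illustrative name.

```java
import java.nio.ByteBuffer;

// Code_attribute: u2 attribute_name_index; u4 attribute_length; u2 max_stack;
// u2 max_locals; u4 code_length; u1 code[code_length]; u2 exception_table_length;
// exception_table[exception_table_length]; u2 attributes_count; attributes[...]
class CodeAttribute {
    int nameIndex, maxStack, maxLocals, exceptionTableLength, attributesCount;
    long attributeLength;
    byte[] code;

    CodeAttribute(ByteBuffer buf) {
        nameIndex = Short.toUnsignedInt(buf.getShort());
        attributeLength = Integer.toUnsignedLong(buf.getInt());
        int start = buf.position(); // attribute_length counts the bytes from here on
        maxStack = Short.toUnsignedInt(buf.getShort());
        maxLocals = Short.toUnsignedInt(buf.getShort());
        code = new byte[buf.getInt()]; // code_length
        buf.get(code);
        exceptionTableLength = Short.toUnsignedInt(buf.getShort());
        // Each entry is u2 start_pc, u2 end_pc, u2 handler_pc, u2 catch_type = 8 bytes.
        buf.position(buf.position() + 8 * exceptionTableLength);
        attributesCount = Short.toUnsignedInt(buf.getShort());
        for (int i = 0; i < attributesCount; i++) { // skip nested attributes by length
            buf.getShort();
            int len = buf.getInt();
            buf.position(buf.position() + len);
        }
        if (buf.position() - start != attributeLength) {
            throw new IllegalStateException("attribute_length mismatch");
        }
    }
}
```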

Continue to parse:

  • Attribute_name_index: 0x0009, which points to the constant “Code”
  • Attribute_length: 0x0000001D, which is 29 in decimal
  • max_stack: 0x0001
  • max_locals: 0x0001
  • code_length: 0x00000005
  • Code: 0x2AB70001B1, the bytecode instructions
  • Exception_table_length: 0x0000, so there is no exception table
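The five code bytes 0x2AB70001B1 decode to three instructions: aload_0 (0x2A), invokespecial (0xB7) with the u2 operand 0x0001, and return (0xB1). The operand #1 refers to constant pool entry 1, the Methodref parsed at the start of the constant pool section, and aload_0 loads this from local slot 0, the single slot counted by max_locals. The sketch below decodes only these three opcodes; TinyDisassembler is a hypothetical name, and a real disassembler needs the full opcode table from the JVM specification.

```java
import java.util.ArrayList;
import java.util.List;

// Decodes only the opcodes that appear in 0x2AB70001B1.
class TinyDisassembler {
    static List<String> decode(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x2A: // aload_0: push the reference in local slot 0 (this)
                    out.add(pc + ": aload_0");
                    pc += 1;
                    break;
                case 0xB7: { // invokespecial, followed by a u2 constant pool index
                    int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                    out.add(pc + ": invokespecial #" + index);
                    pc += 3;
                    break;
                }
                case 0xB1: // return: return void from the method
                    out.add(pc + ": return");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException("opcode not handled: 0x" + Integer.toHexString(opcode));
            }
        }
        return out;
    }
}
```

The result, 0: aload_0, 1: invokespecial #1, 4: return, uses the byte offset of each instruction as its label, in the same spirit as javap’s output.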

The remaining attributes are parsed one by one in the same way; only the most important one, the Code attribute, is shown here.

Conclusion

The class file structure is not hard to understand. Once we know the constant types, we can parse a class file item by item in the order the format prescribes (class information, fields, methods, and so on), following the counters and indexes and checking each constant against its structure type. By now you should have an overview of the class file structure. So what is the benefit of knowing it?

Understanding the structure of class files is an important prerequisite for understanding the class loading process.