A Class file is a binary stream built from 8-bit bytes. Its data items are laid out strictly in order and packed tightly, with no delimiters between them, so nearly everything in the file is data the program needs to run and there are no gaps. When a data item needs more than 8 bits, it is split into several bytes stored in big-endian order, that is, with the most significant byte first.
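To make the byte order concrete, here is a minimal sketch that combines bytes into u2 and u4 values by hand and checks the result against java.io.DataInputStream, which also reads big-endian. The class name BigEndianDemo and its helper names are illustrative, not part of any class-file API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch: combining class-file bytes in big-endian (most significant byte first) order.
final class BigEndianDemo {
    // Reads a u2 starting at offset: the first byte is the high-order byte.
    static int u2(byte[] b, int offset) {
        return ((b[offset] & 0xFF) << 8) | (b[offset + 1] & 0xFF);
    }

    // Reads a u4; returned as long so values above Integer.MAX_VALUE stay unsigned.
    static long u4(byte[] b, int offset) {
        return ((long) u2(b, offset) << 16) | u2(b, offset + 2);
    }

    // DataInputStream is big-endian too, so it must agree with the manual version.
    static int u2ViaStream(byte[] b) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(b))) {
            return in.readUnsignedShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        System.out.printf("magic=0x%X major=%d%n", u4(header, 0), u2(header, 6));
    }
}
```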
According to the Java Virtual Machine Specification, the Class file format stores data in a pseudo-structure similar to a C struct, which has only two data types: unsigned numbers and tables.
- Unsigned numbers are the basic data types. u1, u2, u4, and u8 denote unsigned numbers of 1, 2, 4, and 8 bytes respectively. Unsigned numbers can describe numbers, index references, counts, or string values encoded in UTF-8.
- A table is a composite data type made up of multiple unsigned numbers or other tables, and by convention every table name ends with “_info”. Tables describe hierarchical composite data, and the entire Class file is essentially one table.
Compile the following code to produce a Class file:
```java
public class TestClass {
    private int m;

    public int inc() {
        return m + 1;
    }
}
```
Install a hexdump extension for VS Code and open the compiled TestClass.class file.
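If no editor extension is at hand, a few lines of Java give a similar view. This is a minimal sketch; the class name HexDump and the 16-bytes-per-row layout with an 8-digit offset column are illustrative choices.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class HexDump {
    // Formats bytes as rows of 16 uppercase hex pairs, each row prefixed by its offset.
    static String format(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i % 16 == 0) {
                if (i > 0) sb.append('\n');
                sb.append(String.format("%08X ", i));
            }
            sb.append(String.format(" %02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HexDump.java TestClass.class
        System.out.println(format(Files.readAllBytes(Path.of(args[0]))));
    }
}
```

For a real class file, the first row begins with CA FE BA BE, the magic number described next.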
The magic number
The first four bytes of every Class file are the magic number, whose only purpose is to let the virtual machine decide whether the file is a Class file it can accept. Many file formats use magic numbers for identification; image formats such as GIF and JPEG, for example, carry magic numbers in their headers. Magic numbers are used instead of file extensions mainly for security, since an extension can be changed at will. A format designer may pick any magic value, as long as it is not already widely used and will not cause confusion. The magic value of the Class file is 0xCAFEBABE.
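The check a loader performs on these four bytes can be sketched as follows. This is an illustration of the idea, not HotSpot's actual code; the names MagicCheck and hasClassMagic are hypothetical.

```java
final class MagicCheck {
    // 0xCAFEBABE fits in an int literal (it is stored as a negative int).
    static final int CLASS_FILE_MAGIC = 0xCAFEBABE;

    // Returns true if the first four bytes, read big-endian, equal 0xCAFEBABE.
    static boolean hasClassMagic(byte[] data) {
        if (data.length < 4) return false;
        int magic = ((data[0] & 0xFF) << 24) | ((data[1] & 0xFF) << 16)
                  | ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
        return magic == CLASS_FILE_MAGIC;
    }
}
```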
Minor version number
The fifth and sixth bytes hold the minor version number, which here is 0x0000.
The major version number
The seventh and eighth bytes hold the major version number, 0x0034, which is 52 in decimal; major version 52 corresponds to class files produced by JDK 8.
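Reading both version fields and mapping the major version to a JDK release can be sketched like this. The mapping relies on the fact that from JDK 5 (major version 49) onward each feature release increments the major version by one; the class and method names are illustrative.

```java
import java.nio.ByteBuffer;

final class VersionInfo {
    // Reads minor_version (bytes 4-5) and major_version (bytes 6-7).
    // ByteBuffer is big-endian by default, matching the class file layout.
    static int[] readVersion(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header, 4, 4);
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        return new int[] {minor, major};
    }

    // From JDK 5 (major 49) on, the feature release number is major - 44.
    static int jdkRelease(int major) {
        if (major < 49) throw new IllegalArgumentException("pre-JDK 5 class file: " + major);
        return major - 44;
    }
}
```

With the header above, readVersion returns minor 0 and major 52, and jdkRelease(52) gives 8.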
Constant pool
Immediately after the major and minor version numbers is the entrance to the constant pool. The constant pool can be thought of as the resource repository of the Class file: it is the data type most frequently associated with other items in the Class file structure, one of the items that takes up the most space in a Class file, and the first table-type item to appear in the Class file.
Constant pool count
Because the number of constants in the constant pool is not fixed, a u2 value is placed at the entrance of the constant pool to record the constant pool count (constant_pool_count). Unlike the usual Java convention, this count starts from 1 rather than 0.
As shown in the figure above, the constant pool count is 0x0013, which is 19 in decimal. This means the constant pool holds 18 constants, with indexes 1 through 18. When the Class file format was specified, the designers deliberately left index 0 unused: some data items that normally point into the constant pool must, in certain cases, express “no reference to any constant pool item”, and they do so by using index 0. In the Class file structure, only the constant pool count starts from 1; the counts of all other collections, including the interface index collection, the field table collection, and the method table collection, start from 0 as usual.
The constant pool holds two main kinds of constants: literals and symbolic references. Literals are close to the Java-level notion of constants, such as text strings and constant values declared final. Symbolic references are a concept from compiler theory and include the following kinds of constants:
- Packages exported or opened by modules
- Fully qualified names of classes and interfaces
- Field names and descriptors
- Method names and descriptors
- Method handles and method types (Method Handle, Method Type, Invoke Dynamic)
- Dynamic call sites and dynamic constants (Dynamically-Computed Call Site, Dynamically-Computed Constant)
Unlike C and C++, Java code has no “link” step during Javac compilation; linking happens dynamically when the virtual machine loads the Class file. In other words, the Class file does not store the final memory layout of methods and fields, so the virtual machine cannot use the symbolic references to these fields or methods directly until they are translated at runtime into actual memory entry addresses. When the virtual machine runs, it fetches the symbolic references from the constant pool and resolves them into concrete memory addresses when the class is created or at runtime.
As of JDK 13, the constant pool has 17 different types of constants. All 17 table types share one feature: the table structure begins with a u1 tag that identifies which constant type the entry is. The meanings of the 17 constant types are shown in the table.
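Because every entry starts with its tag, a parser can step through the pool by reading the tag and then skipping the number of bytes that type requires. The walker below is a minimal sketch under that assumption: it uses the tag values and entry sizes from the JVM specification, records only the tags, and does not decode entry contents. The class name ConstantPoolWalker is illustrative. One detail worth noting: CONSTANT_Long_info and CONSTANT_Double_info occupy two pool indexes, so the index after them stays unused.

```java
import java.nio.ByteBuffer;

final class ConstantPoolWalker {
    // Reads constant_pool_count and the entries that follow; returns the tag per index.
    // Index 0 (and the index after a Long/Double) stays 0 because no entry lives there.
    static int[] readTags(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF; // constant_pool_count
        int[] tags = new int[count];
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;
            tags[i] = tag;
            switch (tag) {
                case 1: { // CONSTANT_Utf8_info: u2 length + bytes
                    int len = buf.getShort() & 0xFFFF;
                    buf.position(buf.position() + len);
                    break;
                }
                case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                    buf.position(buf.position() + 2);
                    break;
                case 15: // MethodHandle: u1 reference_kind + u2 reference_index
                    buf.position(buf.position() + 3);
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    // Integer, Float, Fieldref, Methodref, InterfaceMethodref,
                    // NameAndType, Dynamic, InvokeDynamic
                    buf.position(buf.position() + 4);
                    break;
                case 5: case 6: // Long, Double: 8 bytes, two pool indexes
                    buf.position(buf.position() + 8);
                    i++;
                    break;
                default:
                    throw new IllegalArgumentException("unknown tag " + tag + " at index " + i);
            }
        }
        return tags;
    }
}
```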
The first constant in the constant pool has tag 0x0A (10), so it is a CONSTANT_Methodref_info.
The second constant has tag 0x09, so it is a CONSTANT_Fieldref_info.
The third and fourth constants both have tag 0x07, so they are CONSTANT_Class_info.
Item 5 is a CONSTANT_Utf8_info whose string occupies 1 byte: 0x6D, which is 109 in decimal and corresponds to the ASCII character m.
(Figure: ASCII table)
Item 6 is a CONSTANT_Utf8_info whose string occupies 1 byte: 0x49, which is 73 in decimal and corresponds to the ASCII character I.
Item 7 is a CONSTANT_Utf8_info occupying 6 bytes.
Item 8 is a CONSTANT_Utf8_info occupying 3 bytes.
Item 9 is a CONSTANT_Utf8_info occupying 4 bytes.
Item 10 is a CONSTANT_Utf8_info occupying 15 bytes.
Item 11 is a CONSTANT_Utf8_info occupying 3 bytes.
Item 12 is a CONSTANT_Utf8_info occupying 3 bytes.
Item 13 is a CONSTANT_Utf8_info occupying 10 bytes.
Item 14 is a CONSTANT_Utf8_info occupying 14 bytes.
Items 15 and 16 are CONSTANT_NameAndType_info.
Item 17 is a CONSTANT_Utf8_info occupying 9 bytes.
Item 18 is a CONSTANT_Utf8_info occupying 16 bytes.
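After the tag byte, a CONSTANT_Utf8_info is a u2 length followed by that many bytes of modified UTF-8, which happens to be exactly the layout that java.io.DataInputStream.readUTF consumes. A minimal decoding sketch under that observation (the class name Utf8Constant is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class Utf8Constant {
    static final int CONSTANT_Utf8 = 1;

    // Decodes one CONSTANT_Utf8_info entry, starting at its tag byte.
    static String decode(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();
            if (tag != CONSTANT_Utf8) throw new IllegalArgumentException("not a Utf8 entry: tag " + tag);
            return in.readUTF(); // u2 length, then modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Applied to items 5 and 6 above, the bytes 01 0001 6D and 01 0001 49 decode to “m” and “I”.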
The JDK bin directory also ships a tool for analyzing Class file bytecode: javap. Running javap -verbose on a class prints its constant pool along with the rest of its structure.
Access flags
After the constant pool come two bytes of access_flags, which identify class- or interface-level access information: whether this Class is a class or an interface, whether it is declared public, whether it is abstract, whether a class is declared final, and so on. The flag bits and their meanings are shown in the table.
access_flags has 16 flag bits available, of which only nine are currently defined; all unused bits must be zero. TestClass is an ordinary Java class: it is not an interface, enum, annotation, or module. It is declared public but neither final nor abstract, and it was compiled by a compiler newer than JDK 1.0.2. Therefore its ACC_PUBLIC and ACC_SUPER flags are true, while ACC_FINAL, ACC_INTERFACE, ACC_ABSTRACT, ACC_SYNTHETIC, ACC_ANNOTATION, ACC_ENUM, and ACC_MODULE are false, so its access_flags value is 0x0001 | 0x0020 = 0x0021.
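Decoding the value works the same way in reverse: test each defined bit. A minimal sketch using the nine class-level flag values from the JVM specification; the class name ClassAccessFlags is illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ClassAccessFlags {
    // The nine class-level flags defined by the JVM specification, in bit order.
    static final Map<Integer, String> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put(0x0001, "ACC_PUBLIC");
        FLAGS.put(0x0010, "ACC_FINAL");
        FLAGS.put(0x0020, "ACC_SUPER");
        FLAGS.put(0x0200, "ACC_INTERFACE");
        FLAGS.put(0x0400, "ACC_ABSTRACT");
        FLAGS.put(0x1000, "ACC_SYNTHETIC");
        FLAGS.put(0x2000, "ACC_ANNOTATION");
        FLAGS.put(0x4000, "ACC_ENUM");
        FLAGS.put(0x8000, "ACC_MODULE");
    }

    // Returns the names of all flags whose bit is set in accessFlags.
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Integer, String> e : FLAGS.entrySet()) {
            if ((accessFlags & e.getKey()) != 0) names.add(e.getValue());
        }
        return names;
    }
}
```

For TestClass, decode(0x0021) yields ACC_PUBLIC and ACC_SUPER.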
Class index, superclass index, and interface index collection
this_class and super_class are each a u2 value, while interfaces is a collection of u2 values; together they determine the inheritance relationships of this type in the Class file.
The class index determines the fully qualified name of this class, and the superclass index determines the fully qualified name of its parent class. Because Java does not allow multiple inheritance of classes, there is only one superclass index. Every Java class except java.lang.Object has a parent class, so java.lang.Object is the only class whose superclass index is 0.
The interface index collection describes which interfaces the class implements; they are listed in the same left-to-right order in which they appear after the implements keyword (or after extends, if the Class file describes an interface). The class index, superclass index, and interface index collection follow the access flags in that order. The class index and superclass index are u2 index values, each pointing to a CONSTANT_Class_info class descriptor constant; the index value inside that CONSTANT_Class_info in turn locates a CONSTANT_Utf8_info constant holding the fully qualified name string.
The three u2 values are 0x0003, 0x0004, and 0x0000: the class index is 3, the superclass index is 4, and the interface index collection has size 0.
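The two-step lookup (index to CONSTANT_Class_info, then its name index to CONSTANT_Utf8_info) can be sketched over a simplified in-memory pool. The pool model here is an assumption made for illustration: a String stands for a Utf8 entry and the hypothetical ClassRef stands for a CONSTANT_Class_info.

```java
final class ClassIndexResolver {
    // Hypothetical model of a CONSTANT_Class_info: it only stores name_index.
    static final class ClassRef {
        final int nameIndex;
        ClassRef(int nameIndex) { this.nameIndex = nameIndex; }
    }

    // Follows this_class or super_class -> CONSTANT_Class_info -> CONSTANT_Utf8_info.
    // Index 0 means "no class"; for super_class that happens only for java.lang.Object
    // (and module-info classes).
    static String resolve(Object[] pool, int classIndex) {
        if (classIndex == 0) return null;
        ClassRef ref = (ClassRef) pool[classIndex];
        return (String) pool[ref.nameIndex];
    }
}
```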
Field table collection
The field table (field_info) is used to describe variables declared in an interface or class. Fields in the Java language include class-level variables as well as instance-level variables, but not local variables declared inside methods.
The information a field can carry includes:
- Field scope (public, private, protected modifiers)
- Whether it is an instance variable or a class variable (static modifier)
- Mutability (final modifier)
- Concurrency visibility (volatile modifier, whether reads and writes are forced to go through main memory)
- Whether it can be serialized (transient modifier)
- Field data type (primitive type, object, or array)
- Field name
Each modifier above is a Boolean: a field either has it or does not, which makes flag bits a natural representation. The field's name and data type, by contrast, are not fixed, so they can only be described by referring to constants in the constant pool.
The field modifiers are stored in the access_flags item, which is very similar to the class-level access_flags: it is a u2 value, and the flag bits it can set and their meanings are shown in the table.
Naturally, Java's syntax rules mean that at most one of ACC_PUBLIC, ACC_PRIVATE, and ACC_PROTECTED can be set, and ACC_FINAL and ACC_VOLATILE cannot be set together. Fields declared in an interface must have ACC_PUBLIC, ACC_STATIC, and ACC_FINAL set, as required by the Java language rules.
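The three rules in the previous paragraph translate directly into bit tests. This is a minimal sketch that checks only those rules (the JVM specification lists further constraints, omitted here); the class name FieldFlagsCheck is illustrative, and the flag values are the specification's field access flags.

```java
final class FieldFlagsCheck {
    static final int ACC_PUBLIC = 0x0001, ACC_PRIVATE = 0x0002, ACC_PROTECTED = 0x0004,
            ACC_STATIC = 0x0008, ACC_FINAL = 0x0010, ACC_VOLATILE = 0x0040;

    // Applies only the rules described in the text; other checks are omitted.
    static boolean isLegal(int flags, boolean declaredInInterface) {
        int visibility = flags & (ACC_PUBLIC | ACC_PRIVATE | ACC_PROTECTED);
        if (Integer.bitCount(visibility) > 1) return false; // at most one visibility flag
        if ((flags & ACC_FINAL) != 0 && (flags & ACC_VOLATILE) != 0) return false; // final excludes volatile
        if (declaredInInterface) {
            int required = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;
            if ((flags & required) != required) return false; // interface fields are public static final
        }
        return true;
    }
}
```

The field m in TestClass, with access_flags 0x0002 (private), passes these checks.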
access_flags is followed by two index values, name_index and descriptor_index, both references to constant pool entries; they represent the field's simple name and the field's descriptor, respectively.
Fully qualified names and simple names are easy to understand. “java/lang/String” is the fully qualified name of the String class: it is simply the full class name with each “.” replaced by “/”. When such names are used, a “;” is usually appended to mark the end of the fully qualified name, so that consecutive fully qualified names are not confused. A simple name is a method or field name without any type or parameter decoration; the simple names of the inc() method and the m field in TestClass are “inc” and “m”.
Descriptors of methods and fields are more complex than fully qualified names and simple names. A descriptor describes the data type of a field, or the parameter list (number, types, and order) and return value of a method. Under the descriptor rules, the primitive types (byte, char, double, float, int, long, short, boolean) and void, which represents no return value, are each represented by one uppercase character (B, C, D, F, I, J, S, Z, and V respectively), while object types are represented by the character L followed by the object's fully qualified name.
For array types, each dimension is indicated by a leading “[” character. For example, a two-dimensional array declared as “java.lang.String[][]” is recorded as “[[Ljava/lang/String;”, and an integer array “int[]” is recorded as “[I”.
When a descriptor describes a method, it lists the parameters first and the return value second, with the parameters placed in strict order inside a pair of parentheses “()”. For example, the descriptor of void inc() is “()V”, the descriptor of java.lang.String toString() is “()Ljava/lang/String;”, and the descriptor of int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) is “([CII[CIII)I”.
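These rules are mechanical enough to implement in a few lines. The sketch below builds descriptors from java.lang.Class objects; the class name Descriptors is illustrative. On JDK 12 and later, Class.descriptorString() returns the same field descriptor form, which makes a handy cross-check.

```java
final class Descriptors {
    // Field descriptor: one letter per primitive (V is only valid as a return type),
    // "[" per array dimension, "L<name with / separators>;" for reference types.
    static String of(Class<?> type) {
        if (type.isArray()) return "[" + of(type.getComponentType());
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == boolean.class) return "Z";
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == short.class) return "S";
        if (type == float.class) return "F";
        if (type == double.class) return "D";
        if (type == void.class) return "V";
        return "L" + type.getName().replace('.', '/') + ";";
    }

    // Method descriptor: parameter descriptors in order inside (), then the return type.
    static String method(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) sb.append(of(p));
        return sb.append(')').append(of(returnType)).toString();
    }
}
```

For example, method(int.class, char[].class, int.class, int.class, char[].class, int.class, int.class, int.class) reproduces the indexOf descriptor from the text.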
The first u2 value is the count fields_count, whose value 0x0001 indicates that the class has a single field table entry. It is followed by access_flags with value 0x0002, meaning the ACC_PRIVATE flag (value 0x0002) for the private modifier is true and all other modifiers are false. The name_index representing the field name is 0x0005; the fifth constant in the constant pool listed in Listing 6-2 is a CONSTANT_Utf8_info string with the value “m”. The descriptor_index representing the field descriptor is 0x0006, which points to the string “I” in the constant pool. From this information we can infer that the original source declares the field as “private int m;”.
descriptor_index is followed by an attribute table collection that stores additional information; a field table may attach attribute tables describing zero or more pieces of extra information. For the field m in this example, the attribute count is 0, meaning there is no extra information to describe. If the declaration were changed to “final static int m = 123;”, however, there might be an attribute named ConstantValue whose value points to the constant 123.
The field table collection does not list fields inherited from superclasses or interfaces, but it may contain fields that do not exist in the original Java source. For example, to let an inner class access its outer class, the compiler automatically adds a field pointing to the outer class instance. In addition, fields cannot be overloaded in the Java language: two fields must have different names, regardless of whether their data types and modifiers are the same. In the Class file format, however, two fields may legally share a name as long as their descriptors differ.
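The compiler-added outer-instance field can be observed through reflection, since Field.isSynthetic() reports fields the compiler introduced. In this sketch the inner class reads an outer field, so it genuinely needs the outer instance; javac typically names the field this$0, but that name is a compiler convention rather than something the specification mandates, so the sketch does not rely on it. The class names are illustrative.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class OuterFieldDemo {
    private int counter;

    // A non-static inner class that uses the outer instance's field.
    final class Inner {
        int read() { return counter; }
    }

    // Lists the fields of a class that the compiler marked as synthetic.
    static List<String> syntheticFields(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) names.add(f.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(syntheticFields(Inner.class));
    }
}
```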
Method table collection
The Class file format describes methods in almost exactly the same way it describes fields. The method table has the same structure as the field table: access_flags, name_index, descriptor_index, and attributes, as shown in the table. These data items also have meanings similar to those in the field table; the only differences lie in the available access flags and attribute table collections.
Because the volatile and transient keywords cannot modify methods, the method table's access flags lack ACC_VOLATILE and ACC_TRANSIENT. Conversely, the synchronized, native, strictfp, and abstract keywords can modify methods, so the method table's access flags add ACC_SYNCHRONIZED, ACC_NATIVE, ACC_STRICT, and ACC_ABSTRACT. All flag bits of the method table and their values are shown in the table:
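The JDK's java.lang.reflect.Modifier can render method access flags as source modifiers, with one subtlety that follows from the paragraph above: since methods never use ACC_VOLATILE or ACC_TRANSIENT, the specification reuses those bit values for other method flags (0x0040 is ACC_BRIDGE on a method). Masking with Modifier.methodModifiers() first avoids printing “volatile” for a bridge method. A minimal sketch; the class name MethodFlags is illustrative.

```java
import java.lang.reflect.Modifier;

final class MethodFlags {
    // Renders method access_flags as source-style modifiers.
    // The mask drops bits that are not source-level method modifiers, such as
    // 0x0040 (ACC_VOLATILE on a field, ACC_BRIDGE on a method).
    static String describe(int methodAccessFlags) {
        return Modifier.toString(methodAccessFlags & Modifier.methodModifiers());
    }
}
```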
Access flags, name indexes, and descriptor indexes are enough to define a method, but where is the code inside the method? The Javac compiler compiles the Java code in a method body into bytecode instructions and stores them in an attribute named “Code” within the method's attribute table collection.
The first u2 value (the count) is 0x0002, meaning the collection contains two methods: the instance constructor <init>() added by the compiler and the inc() method defined in the source. The first method's access flags are 0x0001, meaning only ACC_PUBLIC is true. Its name index is 0x0007, and the constant at that index is “<init>”; its descriptor index is 0x0008, and the corresponding constant is “()V”. The attributes_count value is 0x0001, so this method has one attribute. That attribute's name index is 0x0009, and the corresponding constant is “Code”, indicating that this attribute holds the method's bytecode.
Attribute table collection
Class files, field tables, and method tables can all carry their own attribute table collection (attribute_info) to describe information specific to certain scenarios.
Unlike the other data items in a Class file, which have strict requirements on order, length, and content, the attribute table collection is somewhat more relaxed: attributes need not appear in a strict order, and as long as a name does not clash with existing attributes, any compiler implementation may write its own attribute information into the attribute table. The Java virtual machine ignores attributes it does not recognize at runtime. To make Class files parseable correctly, the Java Virtual Machine Specification originally predefined nine attributes that all Java virtual machine implementations should recognize; in the Java SE 12 edition of the specification, the number of predefined attributes has grown to 29.
For each attribute, its name is taken from a CONSTANT_Utf8_info constant in the constant pool, while the structure of the attribute value is entirely custom; the only requirement is a u4 length field giving the number of bytes the attribute value occupies. A conforming attribute table should follow the structure defined in the table.
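Because every attribute starts with a u2 name index and a u4 length, a reader can collect or skip attributes without understanding their contents, which is exactly what lets a virtual machine ignore attributes it does not recognize. A minimal sketch (class and field names are illustrative) that reads an attributes_count followed by that many raw attributes:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class AttributeReader {
    // One raw attribute_info: u2 attribute_name_index, u4 attribute_length, then the bytes.
    static final class RawAttribute {
        final int nameIndex;
        final byte[] info;
        RawAttribute(int nameIndex, byte[] info) {
            this.nameIndex = nameIndex;
            this.info = info;
        }
    }

    // Reads attributes_count and the attributes that follow. The body of each attribute
    // is kept as opaque bytes; a caller can interpret the ones whose names it knows.
    static List<RawAttribute> readAll(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;
        List<RawAttribute> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int nameIndex = buf.getShort() & 0xFFFF;
            long length = buf.getInt() & 0xFFFFFFFFL; // u4, kept unsigned
            byte[] info = new byte[Math.toIntExact(length)];
            buf.get(info);
            result.add(new RawAttribute(nameIndex, info));
        }
        return result;
    }
}
```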
1. The Code attribute
The code in a Java method body is processed by the Javac compiler and ultimately turned into bytecode instructions stored in the Code attribute. The Code attribute appears in the attribute collection of the method table, but not every method table has it; for example, abstract methods in interfaces or abstract classes have no Code attribute. If a method table does have a Code attribute, its structure is shown in the table.
attribute_name_index is an index to a CONSTANT_Utf8_info constant whose value is fixed as “Code”, the name of the attribute. attribute_length gives the length of the attribute value. Because the attribute name index (u2) and the attribute length (u4) together take six bytes, the length of the attribute value equals the length of the whole attribute table minus six bytes.
max_stack is the maximum depth of the operand stack: at no point during the method's execution does the operand stack grow deeper than this. The virtual machine uses this value to allocate the operand stack depth in the stack frame at runtime.
max_locals is the storage space required by the local variable table. Its unit is the variable slot, the smallest unit in which the virtual machine allocates memory for local variables. Data types no larger than 32 bits (byte, char, float, int, short, boolean, and returnAddress) occupy one slot per local variable, while the two 64-bit types, double and long, need two slots. Method parameters (including the hidden parameter “this” of instance methods), explicit exception handler parameters (those defined in the catch block of a try-catch statement), and local variables defined in the method body all live in the local variable table. Note that max_locals is not simply the total number of slots used by all local variables in the method. The operand stack and the local variable table directly determine how much memory the method's stack frame uses, and an unnecessarily deep operand stack or too many slots wastes memory. The Java virtual machine therefore reuses slots in the local variable table: once execution leaves the scope of a local variable, the slot it occupied can be reused by other local variables. The Javac compiler assigns slots to variables according to their scopes, and then computes max_locals from the maximum number and types of local variables that are live at the same time.
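The slot rules for parameters give a lower bound on max_locals that is easy to compute. This minimal sketch counts the slots a method's parameters occupy on entry (one for the implicit “this” of instance methods, two for long and double, one otherwise); locals declared in the body may raise max_locals further, subject to the slot reuse described above. The names SlotCount and parameterSlots are illustrative.

```java
final class SlotCount {
    // Slots occupied by a method's parameters when it is entered.
    static int parameterSlots(boolean isStatic, Class<?>... params) {
        int slots = isStatic ? 0 : 1; // slot 0 holds "this" for instance methods
        for (Class<?> p : params) {
            slots += (p == long.class || p == double.class) ? 2 : 1;
        }
        return slots;
    }
}
```

For TestClass.inc(), an instance method with no parameters, this gives 1.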
code_length and code store the bytecode instructions generated by compiling the Java source. code_length is the length of the bytecode, and code is the byte stream that holds the bytecode instructions. As the name suggests, each bytecode instruction's opcode is a single u1 byte. When the virtual machine reads a byte from code, it can determine which instruction that byte represents, whether the instruction is followed by operands, and how those operands should be parsed. A u1 value ranges from 0x00 to 0xFF (0 to 255 in decimal), so up to 256 instructions can be expressed. Currently the Java Virtual Machine Specification defines instructions for about 200 of these opcode values; for the mapping between opcodes and instructions, see Appendix C, “Virtual Machine Bytecode Instructions Table”.
One point to note about code_length: although it is a u4 value whose theoretical maximum is 2^32 - 1, the Java Virtual Machine Specification explicitly limits a method's bytecode to at most 65535 bytes, so in practice only the range of a u2 is used. If a method exceeds this limit, the Javac compiler refuses to compile it. In general, Java code will not hit this limit unless someone deliberately writes an extremely long method to trip up the compiler. Special cases do exist, however: when compiling a complex JSP file, some JSP compilers merge the JSP content and page output into a single method, and the method's bytecode can grow large enough to make compilation fail.
The Code attribute is the most important attribute in a Class file. If the information in a Java program is divided into code (the Java code in method bodies) and metadata (classes, fields, method definitions, and other information), then the whole Class file splits into two parts: the Code attribute describes the code, and all other data items describe the metadata. Understanding the Code attribute is a prerequisite for the bytecode execution engine material in the following two chapters, and being able to read bytecode directly is an essential tool and basic skill for analyzing the semantics of Java code in practice. For this reason, the author has prepared a detailed example of how the virtual machine uses this attribute.
Both the maximum operand stack depth and the local variable table capacity are 0x0001, and the bytecode area occupies 0x0005 bytes. After reading the bytecode length, the virtual machine reads the next five bytes in order and translates them into bytecode instructions using the instruction table. The translation of “2A B7 000A B1” proceeds as follows:
- Read 2A. Looking up the table, 0x2A is aload_0, which pushes the reference-type local variable in slot 0 onto the top of the operand stack.
- Read B7. Looking up the table, 0xB7 is invokespecial. This instruction takes the reference on top of the stack as the method receiver and calls that object's instance constructor, one of its private methods, or a method of its superclass. It has a u2 operand specifying which method to call; the operand points to a CONSTANT_Methodref_info constant in the constant pool, which is the symbolic reference to that method.
- Read 000A, the operand of invokespecial, which represents a symbolic reference. Checking the constant pool shows that constant 0x000A is the symbolic reference to the instance constructor “<init>()”.
- Read B1. Looking up the table, 0xB1 is return, which returns from the method with a void result. After this instruction executes, the current method ends normally.
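The lookup steps above can be sketched as a tiny disassembler. It knows only a handful of opcodes (the three used above plus a few for a simple getter-style method such as inc()); the opcode values and operand sizes are taken from the JVM instruction set, while the class name and output format (loosely modeled on javap) are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TinyDisassembler {
    private static final Map<Integer, String> NAMES = new HashMap<>();
    private static final Map<Integer, Integer> OPERAND_BYTES = new HashMap<>();

    private static void op(int code, String name, int operandBytes) {
        NAMES.put(code, name);
        OPERAND_BYTES.put(code, operandBytes);
    }

    static {
        op(0x2A, "aload_0", 0);
        op(0xB7, "invokespecial", 2); // u2 index of a CONSTANT_Methodref_info
        op(0xB1, "return", 0);
        op(0xB4, "getfield", 2);      // u2 index of a CONSTANT_Fieldref_info
        op(0x04, "iconst_1", 0);
        op(0x60, "iadd", 0);
        op(0xAC, "ireturn", 0);
    }

    // Translates a code array into "offset: mnemonic [#operand]" lines.
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            String name = NAMES.get(opcode);
            if (name == null) {
                throw new IllegalArgumentException(String.format("unknown opcode 0x%02X at %d", opcode, pc));
            }
            int n = OPERAND_BYTES.get(opcode);
            String line = pc + ": " + name;
            if (n == 2) line += " #" + (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
            out.add(line);
            pc += 1 + n;
        }
        return out;
    }
}
```

Fed the five bytes 2A B7 000A B1, it produces aload_0 at offset 0, invokespecial #10 at offset 1, and return at offset 4.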
This bytecode is short, but it already shows that data exchange, method calls, and other operations are all based on the stack (the operand stack). We can tentatively guess that the Java virtual machine executes bytecode on a stack-based architecture. Unlike the common stack-based instruction sets whose instructions take no operands, however, some instructions here (such as invokespecial) do carry operands.
Next, we use the javap command to print the bytecode instructions of the other method in this Class file.
If you look at the args_size value printed by javap, you may wonder why args_size is 1 for both methods in the class, the instance constructor <init>() and inc(), when neither obviously takes any arguments. And if no local variables are defined in either the parameter list or the method body, why is locals equal to 1? If you are wondering this, you are probably overlooking an implicit rule of the Java language: inside any instance method, the object the method belongs to can be accessed through the “this” keyword. This mechanism is important to Java programming, and its implementation is simple: at compile time, the Javac compiler turns accesses to the this keyword into accesses to an ordinary method parameter, which the virtual machine passes in automatically when it calls the instance method. Therefore the local variable table of any instance method contains at least one local variable pointing to the current object instance, and the first slot is reserved to hold that reference, so the method's declared parameters start from slot 1. This applies only to instance methods: if inc() were declared static, args_size would be 0 rather than 1.
Following the bytecode instructions is the method's collection of explicit exception handlers (hereafter the “exception table”). The exception table is not a required part of the Code attribute; in Listing 6-4, for example, no exception table is generated. If an exception table exists, its format is shown in Table 6-16, with four fields. Their meaning is: if an exception of type catch_type, or a subclass of it, occurs between bytecode offset start_pc and offset end_pc (excluding end_pc), control transfers to offset handler_pc to continue processing. If catch_type is 0, any exception is passed to handler_pc.
The exception table is really part of the code rather than metadata. Although the bytecode originally included jump instructions designed for handling exceptions, the Java Virtual Machine Specification explicitly requires Java language compilers to implement Java's exception and finally handling with exception tables rather than jump instructions.
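The matching rule described above (offset in [start_pc, end_pc), type equal to or a subclass of catch_type, catch_type 0 matching everything, entries tried in table order) can be sketched as a lookup. For illustration, the sketch models catch_type as a java.lang.Class, with null standing for 0; in a real class file it is a constant pool index. The names ExceptionTableLookup and Entry are hypothetical.

```java
import java.util.List;

final class ExceptionTableLookup {
    // One exception_table entry; catchType == null models catch_type == 0 (catch everything).
    static final class Entry {
        final int startPc, endPc, handlerPc;
        final Class<? extends Throwable> catchType;

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns handler_pc of the first matching entry, or -1 if the exception propagates.
    static int findHandler(List<Entry> table, int pc, Class<? extends Throwable> thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.startPc && pc < e.endPc; // end_pc is exclusive
            boolean typeMatches = e.catchType == null || e.catchType.isAssignableFrom(thrown);
            if (inRange && typeMatches) return e.handlerPc;
        }
        return -1;
    }
}
```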
2. Exceptions attribute
The Exceptions attribute is an attribute at the same level as the Code attribute within the method table; do not confuse it with the exception table just described. The Exceptions attribute lists the checked exceptions that a method may throw, that is, the exceptions listed after the throws keyword in the method declaration.
The number_of_exceptions item indicates that the method may throw number_of_exceptions kinds of checked exceptions, each represented by an exception_index_table entry. Each exception_index_table entry is an index to a CONSTANT_Class_info constant in the constant pool, representing the type of the checked exception.
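From Java code, the information in this attribute surfaces through reflection: Method.getExceptionTypes() returns the exception types declared in the throws clause, in declaration order. A minimal sketch; the class ThrowsDemo and its method load() are hypothetical.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.util.Arrays;

final class ThrowsDemo {
    // The throws clause below is what javac records in this method's Exceptions attribute.
    void load() throws IOException, InterruptedException { }

    // Returns the declared exception class names of a no-argument method of this class.
    static String[] declaredExceptions(String methodName) {
        try {
            Method m = ThrowsDemo.class.getDeclaredMethod(methodName);
            return Arrays.stream(m.getExceptionTypes()).map(Class::getName).toArray(String[]::new);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```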
3. LineNumberTable attribute
The LineNumberTable attribute describes the mapping between Java source line numbers and bytecode line numbers (bytecode offsets). It is not required at runtime but is generated in the Class file by default; the Javac options -g:none and -g:lines respectively disable it or request it. If the LineNumberTable attribute is not generated, the main effects are that stack traces do not show line numbers when an exception is thrown, and breakpoints cannot be set by source line when debugging. The structure of the LineNumberTable attribute is shown in the table.
line_number_table is a collection of line_number_table_length entries of type line_number_info. Each line_number_info contains two u2 items: start_pc, the bytecode offset, and line_number, the Java source line number.
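To turn a bytecode offset into a source line, a simple approach is to pick the entry with the largest start_pc that does not exceed the offset. The sketch below implements that rule over two parallel arrays mirroring line_number_info; it is one straightforward way to do the lookup, not necessarily how a given JVM or debugger does it, and the names are illustrative.

```java
final class LineNumberLookup {
    // startPc[i] and lineNumber[i] mirror one line_number_info entry.
    // Entries need not be sorted, so the loop looks at all of them.
    static int lineFor(int[] startPc, int[] lineNumber, int pc) {
        int best = -1;
        int bestStart = -1;
        for (int i = 0; i < startPc.length; i++) {
            if (startPc[i] <= pc && startPc[i] > bestStart) {
                bestStart = startPc[i];
                best = lineNumber[i];
            }
        }
        return best; // -1 if no entry covers pc
    }
}
```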
4. LocalVariableTable and LocalVariableTypeTable attributes
The LocalVariableTable attribute describes the relationship between the variables in a stack frame's local variable table and the variables defined in the Java source. It is not required at runtime either, but is generated in the Class file by default; the Javac options -g:none and -g:vars respectively disable it or request it. If this attribute is not generated, the biggest impact is that when other code references this method, all its parameter names are lost; an IDE, for example, replaces them with placeholders such as arg0 and arg1. This does not affect program execution, but it makes writing code inconvenient, and during debugging, parameter values cannot be looked up by name. The structure of the LocalVariableTable attribute is shown in the table.
Each local_variable_info item describes the association between a local variable in the stack frame and a variable in the source code, as shown in the following table.
start_pc and length give, respectively, the bytecode offset where the local variable's lifetime begins and the length of the range it covers; together they define the variable's scope within the bytecode.
name_index and descriptor_index are indexes to CONSTANT_Utf8_info constants in the constant pool, giving the local variable's name and descriptor respectively. index is the slot position of the local variable in the stack frame's local variable table. When the variable's type is 64-bit (double or long), it occupies the two slots index and index+1.
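Putting these fields together, a debugger-style lookup asks which variable is live at a given bytecode offset in a given slot. This minimal sketch treats a variable as live when start_pc <= pc < start_pc + length and lets long and double variables claim both index and index+1; the class and field names are illustrative, and names and descriptors are stored as already-resolved strings.

```java
import java.util.List;

final class LocalVariableLookup {
    // Mirrors local_variable_info, with name and descriptor already resolved to strings.
    static final class Var {
        final int startPc, length, index;
        final String name, descriptor;

        Var(int startPc, int length, String name, String descriptor, int index) {
            this.startPc = startPc;
            this.length = length;
            this.name = name;
            this.descriptor = descriptor;
            this.index = index;
        }
    }

    // long (J) and double (D) occupy slots index and index + 1.
    static boolean occupies(Var v, int slot) {
        boolean wide = v.descriptor.equals("J") || v.descriptor.equals("D");
        return slot == v.index || (wide && slot == v.index + 1);
    }

    // Returns the name of the variable live at pc in the given slot, or null.
    static String nameAt(List<Var> table, int pc, int slot) {
        for (Var v : table) {
            boolean live = pc >= v.startPc && pc < v.startPc + v.length;
            if (live && occupies(v, slot)) return v.name;
        }
        return null;
    }
}
```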
Incidentally, when generics were introduced in JDK 5, the LocalVariableTable attribute gained a “sister attribute”, LocalVariableTypeTable. Its structure is very similar to LocalVariableTable, except that it records the field's signature in place of its descriptor. For non-generic types, descriptors and signatures carry the same information, but for generic types the descriptor cannot describe the type precisely, because parameterized types are erased in descriptors. LocalVariableTypeTable was therefore added to describe generic types using their signatures.
5.SourceFile and SourceDebugExtension properties
The SourceFile attribute records the name of the SourceFile that generated the Class file. This property is also optional and can be turned off or required to generate this information using Javac’s -g: None or -g: source options. In Java, class names and file names are the same for most classes, with some exceptions (such as inner classes). If this property is not generated, when an exception is thrown, the file name of the error code is not displayed on the stack. This property is a fixed-length property with the structure shown in the table.
The sourcefile_index data item is an index to a constant of type CONSTANT_Utf8_info in the constant pool. The constant value is the file name of the sourcefile. To make it easier to add custom content for programmers to the compiler and dynamically generated classes, the SourceDebugExtension property was added in JDK5 to store additional code debugging information. A typical scenario is that during JSP file debugging, the line number of the JSP file cannot be located through the Java stack. The JSR 45 proposal provides a standard mechanism for debugging programs written in non-Java languages that need to be compiled into bytecode and run in the Java virtual machine. The SourceDebugExtension property can be used to store debugging information added to the standard. For example, allowing the programmer to quickly locate the line number in the original JSP from the exception stack. The structure of the SourceDebugExtension attribute is shown in the table.
Here debug_extension stores the additional debugging information as a string encoded in variable-length (modified) UTF-8. A class may have at most one SourceDebugExtension attribute.
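To make the fixed-length layout concrete, here is a minimal sketch of decoding a SourceFile attribute by hand. It assumes the attribute's bytes have already been sliced out of the class file and that a small map stands in for the constant pool; the class name SourceFileAttribute, the method decode, and the pool indices are all hypothetical. ByteBuffer reads big-endian by default, which matches the byte order of Class files.

```java
import java.nio.ByteBuffer;
import java.util.Map;

// Hypothetical helper, not a JDK API: decodes one SourceFile attribute.
class SourceFileAttribute {

    static String decode(byte[] attr, Map<Integer, String> utf8Pool) {
        ByteBuffer buf = ByteBuffer.wrap(attr);                // big-endian, like the class file
        int nameIndex = Short.toUnsignedInt(buf.getShort());  // u2 attribute_name_index
        long length = Integer.toUnsignedLong(buf.getInt());   // u4 attribute_length
        if (!"SourceFile".equals(utf8Pool.get(nameIndex))) {
            throw new IllegalArgumentException("not a SourceFile attribute");
        }
        if (length != 2) {                                     // fixed-length: always 2
            throw new IllegalArgumentException("SourceFile attribute_length must be 2");
        }
        int sourceFileIndex = Short.toUnsignedInt(buf.getShort()); // u2 sourcefile_index
        return utf8Pool.get(sourceFileIndex);                  // text of the CONSTANT_Utf8_info
    }

    public static void main(String[] args) {
        // Toy constant pool; indices 13 and 14 are made up for this example.
        Map<Integer, String> pool = Map.of(13, "SourceFile", 14, "TestClass.java");
        byte[] attr = {0x00, 0x0D, 0x00, 0x00, 0x00, 0x02, 0x00, 0x0E};
        System.out.println(decode(attr, pool)); // TestClass.java
    }
}
```

A real parser would first walk the constant pool, fields, and methods to reach the class-level attribute table; that bookkeeping is omitted here.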
6. ConstantValue attribute
The ConstantValue attribute tells the virtual machine to assign a value to a static variable automatically. Only variables modified by the static keyword (class variables) can use this attribute. Definitions such as “int x = 123” and “static int x = 123” are very common in Java programs, but the virtual machine assigns these two variables in different ways and at different times. Non-static variables (instance variables) are assigned in the instance constructor <init>() method; for class variables there are two options: assignment in the class constructor <clinit>() method, or assignment via the ConstantValue attribute. The current choice of Oracle's Javac compiler is this: if a variable is modified by both final and static (“constant” is the more apt term) and its data type is a primitive type or java.lang.String, a ConstantValue attribute is generated to initialize it; if the variable is not final, or its type is neither primitive nor String, it is initialized in the <clinit>() method instead.
Although the final keyword fits the semantics of “ConstantValue” better, the Java Virtual Machine Specification does not require a field with the ConstantValue attribute to have the ACC_FINAL flag set; it only requires the ACC_STATIC flag. The requirement for final is a restriction imposed by the Javac compiler itself. As for ConstantValue being limited to primitive types and strings, this is less a restriction than a natural consequence: the attribute's value is just an index into the constant pool, and the Class file format only has literal constants corresponding to primitive types and strings, so the ConstantValue attribute could not support any other type even if it wanted to. The structure of the ConstantValue attribute is shown in the table.
As the structure shows, ConstantValue is a fixed-length attribute whose attribute_length must always be 2. The constantvalue_index data item is a reference to a literal constant in the constant pool; depending on the field type, the literal can be a CONSTANT_Long_info, CONSTANT_Float_info, CONSTANT_Double_info, CONSTANT_Integer_info, or CONSTANT_String_info constant.
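The pairing between a field's type and the constant that constantvalue_index may point to can be written as a small lookup. The sketch below assumes the field descriptor is already available as a string; the class and method names are hypothetical. It follows the specification's rule that int, short, char, byte, and boolean fields all share CONSTANT_Integer_info.

```java
// Hypothetical helper: which constant pool entry type a ConstantValue
// attribute must reference, given the field's descriptor.
class ConstantValueTypes {

    static String expectedConstant(String fieldDescriptor) {
        switch (fieldDescriptor) {
            case "J":
                return "CONSTANT_Long_info";
            case "F":
                return "CONSTANT_Float_info";
            case "D":
                return "CONSTANT_Double_info";
            case "I": case "S": case "C": case "B": case "Z":
                return "CONSTANT_Integer_info";   // int, short, char, byte, boolean
            case "Ljava/lang/String;":
                return "CONSTANT_String_info";
            default:
                // Any other type has no literal form in the constant pool.
                throw new IllegalArgumentException("no ConstantValue for " + fieldDescriptor);
        }
    }

    public static void main(String[] args) {
        // "static final int x = 123;" has the field descriptor I
        System.out.println(expectedConstant("I")); // CONSTANT_Integer_info
    }
}
```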
7. InnerClasses attribute
The InnerClasses attribute records the association between inner classes and their host class. If a class defines inner classes, the compiler generates an InnerClasses attribute for it and for the inner classes it contains. The structure of the InnerClasses attribute is shown in the table.
The number_of_classes data item records how many inner classes need to be described. Each inner class is described by an inner_classes_info table, whose structure is shown in Table 6-25.
The inner_class_info_index and outer_class_info_index items are indexes to CONSTANT_Class_info constants in the constant pool, representing symbolic references to the inner class and the host class respectively. The inner_name_index item is an index to a CONSTANT_Utf8_info constant in the constant pool that holds the inner class's name, or 0 if the class is anonymous. The inner_class_access_flags item holds the inner class's access flags; like a class's access_flags, its possible values are shown in the table.
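Part of what the InnerClasses attribute records is visible through reflection. The sketch below uses hypothetical class names. In practice (on HotSpot), getModifiers() for a member class reports the flags javac stored in inner_class_access_flags, which is how it can report private even though a class file's own access_flags can never be private; an anonymous class, whose inner_name_index is 0, has an empty simple name.

```java
import java.lang.reflect.Modifier;

// Illustration with hypothetical names.
class InnerClassesDemo {
    private static class Inner { }             // outer_class_info_index -> InnerClassesDemo

    static Class<?> innerClass() {
        return Inner.class;
    }

    static Class<?> anonymousClass() {
        Object o = new Object() { };            // anonymous: inner_name_index == 0
        return o.getClass();
    }

    public static void main(String[] args) {
        Class<?> inner = innerClass();
        System.out.println(inner.getDeclaringClass().getSimpleName()); // InnerClassesDemo
        System.out.println(Modifier.toString(inner.getModifiers()));   // private static
        Class<?> anon = anonymousClass();
        System.out.println(anon.isAnonymousClass());                   // true
        System.out.println(anon.getSimpleName().isEmpty());            // true
    }
}
```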
8. Deprecated and Synthetic attributes
The Deprecated and Synthetic attributes are both flag-type Boolean attributes: they are either present or absent, and carry no attribute value. The Deprecated attribute marks a class, field, or method that its author has declared deprecated; it can be set in code with the @Deprecated annotation. The Synthetic attribute indicates that a field or method was not produced directly from Java source code but was added by the compiler itself. Since JDK 5, a class, field, or method generated by the compiler can also be marked by setting the ACC_SYNTHETIC bit in its access flags. By generating Synthetic methods, fields, or even entire classes that do not exist in the source code, the compiler can implement unauthorized access (bypassing the private modifier) or other features that circumvent language restrictions; this counts as an early (compile-time) optimization trick. The most typical examples are the array of enum constants generated automatically in enum classes and bridge methods (Bridge Method). All classes, methods, and fields not generated from user code should have at least one of the Synthetic attribute and the ACC_SYNTHETIC flag set, with the sole exceptions of the instance constructor “<init>()” and the class constructor “<clinit>()”. The structure of the Deprecated and Synthetic attributes is very simple, as shown in the table.
The attribute_length data item must be 0x00000000, because there is no attribute value to set.
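Bridge methods are easy to observe through reflection. In the sketch below (class names are hypothetical), BridgeDemo implements Comparable<BridgeDemo>. After erasure the interface method is compareTo(Object), so javac generates a compareTo(Object) bridge that casts its argument and delegates to compareTo(BridgeDemo), and flags it as both bridge and synthetic.

```java
import java.lang.reflect.Method;

// Hypothetical example class; javac adds a synthetic bridge method to it.
class BridgeDemo implements Comparable<BridgeDemo> {
    private final int value;

    BridgeDemo(int value) {
        this.value = value;
    }

    @Override
    public int compareTo(BridgeDemo other) {   // the method written in the source
        return Integer.compare(value, other.value);
    }

    // Returns the compiler-generated compareTo(Object), or null if absent.
    static Method findBridge() {
        for (Method m : BridgeDemo.class.getDeclaredMethods()) {
            if (m.getName().equals("compareTo")
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0] == Object.class) {
                return m;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Method bridge = findBridge();
        System.out.println(bridge.isSynthetic()); // true
        System.out.println(bridge.isBridge());    // true
    }
}
```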
9. StackMapTable attribute
The StackMapTable attribute, added to the Class file specification in JDK 6, is a fairly complex variable-length attribute located in the attribute table of the Code attribute. It is used by the new type checking verifier (Type Checker) during the bytecode verification phase of class loading, with the aim of replacing the older, performance-intensive type inference verifier based on data flow analysis.
This type checking verifier was originally developed by Sheng Liang (a Chinese member of the virtual machine team) for Java ME CLDC. While still guaranteeing the validity of Class files, the new verifier skips the runtime step of verifying the bytecode's behavioral logic through data flow analysis; instead, a series of verification types (Verification Types) is recorded directly in the Class file at compile time. Checking these verification types instead of inferring types greatly improves bytecode verification performance. This verifier was first offered in JDK 6 and became mandatory in JDK 7, replacing the original verifier based on type inference. For Java SE 7, the Java Virtual Machine Specification devotes a full 120 pages to how this verifier works, using a large amount of complex formal language to analyze and prove the rigor of the new verification method.
The StackMapTable attribute contains zero or more stack map frames (Stack Map Frame). Each frame explicitly or implicitly corresponds to a bytecode offset and records the verification types of the local variable table and the operand stack at the point where execution reaches that bytecode. The type checking verifier decides whether a bytecode instruction satisfies the logical constraints by checking the types that the target method's local variables and operand stack require. The structure of the StackMapTable attribute is shown in the table.
Since Java SE 7, the Java Virtual Machine Specification has stated explicitly that for Class files with a version number of 50.0 or higher, if a method's Code attribute has no StackMapTable attribute attached, the method has an implicit stack map attribute, equivalent to a StackMapTable attribute whose number_of_entries is 0. A method's Code attribute may have at most one StackMapTable attribute; otherwise a ClassFormatError is thrown.
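The idea that a frame “explicitly or implicitly” represents a bytecode offset can be sketched in a few lines. The sketch below assumes the frame_type ranges defined for StackMapTable in the Java SE 7 and later specifications (0 to 63 is same_frame, 255 is full_frame, and so on) and the rule that the first frame's offset equals its offset_delta while each later frame's offset is the previous offset plus offset_delta plus 1. The class and method names are hypothetical, and decoding the verification types themselves is left out.

```java
// Hypothetical helper for classifying StackMapTable frames.
class StackMapFrames {

    static String kind(int frameType) {
        if (frameType < 0 || frameType > 255) throw new IllegalArgumentException("not a u1");
        if (frameType <= 63) return "same_frame";
        if (frameType <= 127) return "same_locals_1_stack_item_frame";
        if (frameType <= 246) return "reserved";
        if (frameType == 247) return "same_locals_1_stack_item_frame_extended";
        if (frameType <= 250) return "chop_frame";
        if (frameType == 251) return "same_frame_extended";
        if (frameType <= 254) return "append_frame";
        return "full_frame";
    }

    // For frame types 0-127 the offset_delta is implied by frame_type itself;
    // returns -1 otherwise (an explicit u2 offset_delta follows, or the type is reserved).
    static int implicitOffsetDelta(int frameType) {
        if (frameType >= 0 && frameType <= 63) return frameType;
        if (frameType >= 64 && frameType <= 127) return frameType - 64;
        return -1;
    }

    // Turns the offset_delta of each frame into an absolute bytecode offset.
    // The "+ 1" keeps offsets strictly increasing, so no two frames share one.
    static int[] absoluteOffsets(int[] offsetDeltas) {
        int[] offsets = new int[offsetDeltas.length];
        for (int i = 0; i < offsetDeltas.length; i++) {
            offsets[i] = (i == 0) ? offsetDeltas[0] : offsets[i - 1] + offsetDeltas[i] + 1;
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(kind(70) + " delta=" + implicitOffsetDelta(70));
        System.out.println(java.util.Arrays.toString(absoluteOffsets(new int[]{5, 3, 0})));
    }
}
```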
10. The Signature attribute
The Signature attribute, added to the Class file specification in JDK 5, is an optional fixed-length attribute that can appear in the attribute tables of the class file, field table, and method table structures. After JDK 5 greatly enhanced the Java language syntax, any class, interface, initializer, or member whose generic signature contains type variables (Type Variable) or parameterized types (Parameterized Type) has that generic signature recorded in a Signature attribute. A dedicated attribute is needed to record generic types because Java generics are pseudo-generics implemented by erasure: all generic information in the bytecode (type variables, parameterized types) is erased after compilation. The benefits of erasure are a simple implementation (mainly changes to the Javac compiler, with few changes inside the virtual machine), easy backporting (Backport), and memory savings at run time for some types. The drawback is that, unlike languages with true generics such as C#, the runtime cannot treat generic types as first-class user-defined types; for example, runtime reflection cannot obtain generic information. The Signature attribute was added to make up for this shortcoming, and it is now the ultimate source of the generic type information that Java's reflection API can retrieve. A more concrete discussion of Java generics, the Signature attribute, and type erasure appears in Chapter 10, which covers compiler optimization. The structure of the Signature attribute is shown in the table.
The value of signature_index must be a valid index into the constant pool, and the constant pool entry at that index must be a CONSTANT_Utf8_info structure representing a class signature, method type signature, or field type signature. If the current Signature attribute belongs to the class file, the structure is a class signature; if it belongs to a method table, it is a method type signature; if it belongs to a field table, it is a field type signature.
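The effect of the Signature attribute is visible through reflection. In the sketch below (class and field names are hypothetical), the field's descriptor is only Ljava/util/List;, so getType() can report nothing beyond the erased List, while getGenericType() relies on the field's Signature attribute, Ljava/util/List<Ljava/lang/String;>;, to recover the type argument. This is the same descriptor versus signature distinction that LocalVariableTypeTable draws for local variables.

```java
import java.lang.reflect.Field;
import java.util.List;

// Illustration with hypothetical names.
class SignatureDemo {
    // Descriptor: Ljava/util/List;   Signature: Ljava/util/List<Ljava/lang/String;>;
    List<String> names;

    private static Field namesField() {
        try {
            return SignatureDemo.class.getDeclaredField("names");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    static String erasedType() {              // derived from the descriptor
        return namesField().getType().getName();
    }

    static String genericType() {             // derived from the Signature attribute
        return namesField().getGenericType().getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(erasedType());  // java.util.List
        System.out.println(genericType()); // java.util.List<java.lang.String>
    }
}
```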
11. The BootstrapMethods attribute
The BootstrapMethods attribute, added to the Class file specification in JDK 7, is a complex variable-length attribute located in the attribute table of the class file. It stores the bootstrap method specifiers referenced by invokedynamic instructions.
According to the Java Virtual Machine Specification (Java SE 7), if the constant pool of a class file contains even one CONSTANT_InvokeDynamic_info constant, the class file's attribute table must contain an explicit BootstrapMethods attribute; and even if CONSTANT_InvokeDynamic_info constants appear many times in the constant pool, the class file's attribute table may contain at most one BootstrapMethods attribute. The BootstrapMethods attribute is closely tied to the invokedynamic instruction of JSR-292 and to the java.lang.invoke package, so its purpose only becomes clear once one understands how the invokedynamic instruction works.
Although the invokedynamic instruction already existed in JDK 7, the Javac compiler of that version could not yet emit invokedynamic instructions or generate BootstrapMethods attributes, so using them required unconventional means. Only with the arrival of lambda expressions and interface default methods in JDK 8 did the invokedynamic instruction find a place in Class files generated from the Java language. The structure of the BootstrapMethods attribute is shown in the table.
The bootstrap_method structure referenced here is shown in the table.
In the BootstrapMethods attribute, the num_bootstrap_methods item gives the number of bootstrap method specifiers in the bootstrap_methods[] array. Each member of the bootstrap_methods[] array contains an index into the constant pool pointing to a CONSTANT_MethodHandle_info structure, which represents a bootstrap method, together with a (possibly empty) sequence of static arguments for that bootstrap method. Each member of the bootstrap_methods[] array must contain the following three items:
- bootstrap_method_ref: must be a valid index into the constant pool, and the constant pool entry at that index must be a CONSTANT_MethodHandle_info structure.
- num_bootstrap_arguments: gives the number of members of the bootstrap_arguments[] array.
- bootstrap_arguments[]: each member must be a valid index into the constant pool, and the constant pool entry at that index must be one of the following structures: CONSTANT_String_info, CONSTANT_Class_info, CONSTANT_Integer_info, CONSTANT_Long_info, CONSTANT_Float_info, CONSTANT_Double_info, CONSTANT_MethodHandle_info, or CONSTANT_MethodType_info.
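For lambda expressions, javac points bootstrap_method_ref at java.lang.invoke.LambdaMetafactory.metafactory. The sketch below does not emit an invokedynamic instruction (plain Java source cannot); it only simulates linkage by calling a hand-written bootstrap method directly with the three arguments the JVM always supplies (a Lookup, the call site's name, and its MethodType) plus one static argument standing in for a CONSTANT_Integer_info entry in bootstrap_arguments[]. IndyDemo, bootstrap, and plusTen are hypothetical names.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Simulation only: performs by hand what the JVM does when it first links
// an invokedynamic instruction. All names are hypothetical.
class IndyDemo {

    private static int add(int a, int b) {
        return a + b;
    }

    // Bootstrap method: lookup, name and type come from the JVM; "addend"
    // plays the role of one static argument from bootstrap_arguments[].
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type, int addend)
            throws NoSuchMethodException, IllegalAccessException {
        MethodHandle add = lookup.findStatic(IndyDemo.class, "add",
                MethodType.methodType(int.class, int.class, int.class));
        MethodHandle target = MethodHandles.insertArguments(add, 1, addend); // now (int)int
        return new ConstantCallSite(target.asType(type));
    }

    static int plusTen(int x) {
        try {
            CallSite site = bootstrap(MethodHandles.lookup(), "plus",
                    MethodType.methodType(int.class, int.class), 10);
            return (int) site.dynamicInvoker().invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(plusTen(5)); // 15
    }
}
```

A real invokedynamic call site links only once; later executions go straight to the CallSite's target, which is why a ConstantCallSite is the natural choice for a binding that never changes.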
12. MethodParameters attribute
MethodParameters, added to the Class file format in JDK 8, is a variable-length attribute used in method tables. It records the names and related information of a method's parameters.
Originally, to save storage space, Class files did not store method parameter names by default: the names make no difference to execution, so it was enough to name parameters sensibly in the source code. As Java became popular, however, this caused real inconvenience for distributing and reusing programs. Because Class files contain no parameter names, an IDE cannot offer intelligent hints for method calls when you use methods from a package without its JavaDoc attached, which hinders the spread of JAR packages. Later, “-g:vars” became the default option with which Javac and many IDEs compile classes, writing method parameter names into the LocalVariableTable attribute. However, LocalVariableTable is a child attribute of the Code attribute: without a method body there is no LocalVariableTable, and some methods, such as abstract methods and interface methods, naturally have no method body. For such method signatures there was still no place to keep parameter names intact. The MethodParameters attribute added in JDK 8 therefore lets the compiler (when compiling with the -parameters option) write parameter names into the Class file as well. MethodParameters is an attribute of the method table, at the same level as the Code attribute, and it can be read at run time through the reflection API. The structure of MethodParameters is shown in the table, and the parameter structure it references is shown in the following table.
Here name_index is an index to a CONSTANT_Utf8_info constant in the constant pool, representing the parameter's name, and access_flags indicates the parameter's state, which can include one or more of the following three flags:
- 0x0010 (ACC_FINAL): indicates that the parameter is modified by final.
- 0x1000 (ACC_SYNTHETIC): indicates that the parameter does not appear in the source file and was generated automatically by the compiler.
- 0x8000 (ACC_MANDATED): indicates that the parameter is implicitly declared in the source file. A typical case in the Java language is the implicit reference to the enclosing instance (the outer this) that is passed to an inner class constructor.
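Whether parameter names survive compilation can be checked with java.lang.reflect.Parameter. The sketch below (names hypothetical) uses an abstract interface method, which has no Code attribute and therefore no LocalVariableTable. If the file is compiled with javac -parameters, the MethodParameters attribute supplies width and height; otherwise isNamePresent() returns false and reflection falls back to synthesized names such as arg0 and arg1.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

// Illustration with hypothetical names.
class ParameterNamesDemo {
    // area() is abstract, so it has no Code attribute and no LocalVariableTable.
    interface Shape {
        double area(double width, double height);
    }

    private static Parameter[] areaParameters() {
        Method area = Shape.class.getDeclaredMethods()[0]; // Shape declares only area()
        return area.getParameters();
    }

    // True only when a MethodParameters attribute supplied the names.
    static boolean namesPresent() {
        return areaParameters()[0].isNamePresent();
    }

    static String[] names() {
        Parameter[] params = areaParameters();
        String[] result = new String[params.length];
        for (int i = 0; i < params.length; i++) {
            result[i] = params[i].getName();  // real name, or a synthesized argN
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(namesPresent() + " " + String.join(", ", names()));
    }
}
```

Parameter also offers isImplicit() and isSynthetic(), which correspond to the ACC_MANDATED and ACC_SYNTHETIC flags when a MethodParameters attribute is present.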
13. Modularity-related attributes
A major feature of JDK 9 is Java modularity. Because the module descriptor file (module-info.java) is ultimately compiled into a separate Class file, the Class file format was extended with the Module, ModulePackages, and ModuleMainClass attributes to support modularity-related functionality. The Module attribute is a complex variable-length attribute: besides the module's name, version, and flags, it stores the entire contents of the module's requires, exports, opens, uses, and provides definitions. Its structure is shown in the table.
Here module_name_index is an index to a CONSTANT_Utf8_info constant in the constant pool, representing the module name. module_flags indicates the module's state and can include one or more of the following three flags:
- 0x0020 (ACC_OPEN): indicates that the module is open.
- 0x1000 (ACC_SYNTHETIC): indicates that the module does not appear in the source file and was generated automatically by the compiler.
- 0x8000 (ACC_MANDATED): indicates that the module is implicitly defined in the source file.
module_version_index is an index to a CONSTANT_Utf8_info constant in the constant pool, representing the module's version number. The items that follow record the module's requires, exports, opens, uses, and provides definitions respectively. Because their structures are broadly similar, only exports is introduced here to save space; its structure is shown in the table.
Each element of the exports table represents a package exported by the module. exports_index is an index to a CONSTANT_Package_info constant in the constant pool, representing the exported package. exports_flags indicates the state of the exported package and can include one or more of the following two flags:
- 0x1000 (ACC_SYNTHETIC): indicates that the exported package does not appear in the source file and was generated automatically by the compiler.
- 0x8000 (ACC_MANDATED): indicates that the exported package is implicitly defined in the source file.
exports_to_count is the number of target modules for the exported package, that is, the length of the exports_to_index array. If it is zero, the export is unqualified (Unqualified): code in any other module may access the contents of the package. If it is nonzero, exports_to_index is an array of that length, each element of which is an index to a CONSTANT_Module_info constant in the constant pool, and only the modules listed in the array may access the contents of the exported package.
ModulePackages is another variable-length attribute that supports Java modularity. It describes all packages in the module, whether or not they are exported or opened. Its structure is shown in the table.
package_count is the counter for the package_index array, each element of which is an index to a CONSTANT_Package_info constant representing one package in the current module. The last attribute, ModuleMainClass, is a fixed-length attribute that identifies the module's main class (Main Class); its structure is shown in the table.
Here main_class_index is an index to a CONSTANT_Class_info constant in the constant pool, representing the module's main class.
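The JDK models these attributes in java.lang.module.ModuleDescriptor, and ModuleDescriptor.read(InputStream) can parse a compiled module-info.class. To stay self-contained, the sketch below builds a descriptor in memory instead of reading one; the module and package names are hypothetical. Roughly, an export with target modules corresponds to a nonzero exports_to_count, packages() to ModulePackages, and mainClass() to ModuleMainClass.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;

// Illustration with hypothetical module and package names.
class ModuleDescriptorDemo {

    static ModuleDescriptor describe() {
        return ModuleDescriptor.newModule("com.example.app")
                .exports("com.example.api")                                  // unqualified export
                .exports("com.example.internal", Set.of("com.example.test")) // qualified export
                .packages(Set.of("com.example.app"))                         // package that is not exported
                .mainClass("com.example.app.Main")                           // ModuleMainClass
                .build();
    }

    // True if the export of pkg names target modules (exports_to_count != 0).
    static boolean isQualified(ModuleDescriptor d, String pkg) {
        for (ModuleDescriptor.Exports e : d.exports()) {
            if (e.source().equals(pkg)) {
                return e.isQualified();
            }
        }
        throw new IllegalArgumentException(pkg + " is not exported");
    }

    public static void main(String[] args) {
        ModuleDescriptor d = describe();
        System.out.println(isQualified(d, "com.example.api"));      // false
        System.out.println(isQualified(d, "com.example.internal")); // true
        System.out.println(d.packages());                           // all three packages
        System.out.println(d.mainClass().orElse("none"));           // com.example.app.Main
    }
}
```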
14. Runtime annotation attributes
Back in JDK 5, the Java language syntax was enhanced in several ways, including support for annotations. To store annotation information from the source code, the Class file format gained four corresponding attributes: RuntimeVisibleAnnotations, RuntimeInvisibleAnnotations, RuntimeVisibleParameterAnnotations, and RuntimeInvisibleParameterAnnotations. In JDK 8, the reach of Java annotations was extended further with type annotations (JSR 308), so the Class file format likewise gained the RuntimeVisibleTypeAnnotations and RuntimeInvisibleTypeAnnotations attributes. Because these six attributes are very similar in both structure and function, they are discussed together here, using RuntimeVisibleAnnotations as the representative.
RuntimeVisibleAnnotations is a variable-length attribute that records the runtime-visible annotations on the declaration of a class, field, or method. When we use the reflection API to obtain the annotations on a class, field, or method, the return values are taken from this attribute. The structure of the RuntimeVisibleAnnotations attribute is shown in the table.
num_annotations is the counter for the annotations array. Each element of annotations represents one runtime-visible annotation, stored in the Class file as an annotation structure, which is shown in the table.
type_index is an index to a CONSTANT_Utf8_info constant in the constant pool, whose value should be the annotation type written in field descriptor form. num_element_value_pairs is the counter for the element_value_pairs array; each element of element_value_pairs is a key-value pair representing one of the annotation's parameters and its value.
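An annotation's retention policy decides which of these attributes javac writes. The sketch below (annotation and class names are hypothetical) marks one class with a RUNTIME-retention annotation, stored in RuntimeVisibleAnnotations and therefore visible to reflection, and a CLASS-retention annotation, stored in RuntimeInvisibleAnnotations and therefore not returned by reflection. The value "api" travels as an element_value_pairs entry whose key is value.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Illustration with hypothetical names.
class AnnotationDemo {
    @Retention(RetentionPolicy.RUNTIME)   // written to RuntimeVisibleAnnotations
    @interface Visible {
        String value();
    }

    @Retention(RetentionPolicy.CLASS)     // written to RuntimeInvisibleAnnotations
    @interface Invisible { }

    @Visible("api")
    @Invisible
    static class Annotated { }

    static boolean visiblePresent() {
        return Annotated.class.isAnnotationPresent(Visible.class);
    }

    static boolean invisiblePresent() {
        return Annotated.class.isAnnotationPresent(Invisible.class);
    }

    static String visibleValue() {        // read from the element_value_pairs entry
        return Annotated.class.getAnnotation(Visible.class).value();
    }

    public static void main(String[] args) {
        System.out.println(visiblePresent());   // true
        System.out.println(invisiblePresent()); // false
        System.out.println(visibleValue());     // api
    }
}
```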