Let’s start with a few questions.

  1. What is the maximum length of a method/field name in Java?
  2. Which literals go into the constant pool?
  3. What is the default constructor for an inner class?
  4. How is the code inside the method stored?
  5. When does the inner class get a reference to the outer class?

The transition from native machine code to bytecode as a result of code compilation is a small step in the development of storage formats, but a giant leap in the development of programming languages.

If every computer in the world had only one instruction set, x86, and only one operating system, Windows, the Java language might not have existed.

The Java virtual machine executes Class files rather than Java source code, and this is what makes it language independent: any language that can be compiled into Class files can run on it.

Class file structure mind map

A Class file is a binary stream organized in units of 8-bit bytes. The data items are laid out compactly and in strict order, with no delimiters between them, so almost everything stored in a Class file is data the program needs in order to run, and there are no gaps. When a data item needs more than 8 bits of space, it is split into several bytes and stored with the high-order byte first (big-endian).

The Class file format uses a pseudo-structure similar to the C-language structure to store data, with only two data types: “unsigned number” and “table.” The rest of the parsing will be based on these two data types.

Unsigned number

Unsigned numbers are the basic data types. u1, u2, u4, and u8 represent unsigned numbers of 1, 2, 4, and 8 bytes respectively. Unsigned numbers can be used to describe numbers, index references, quantity values, or string values encoded in UTF-8.

Table

A table is a composite data type consisting of multiple unsigned numbers or other tables as data items. To facilitate differentiation, all table names are customarily terminated with “_info”. Tables are used to describe hierarchical composite structures of data, and the entire Class file can essentially be viewed as a table composed of data items in a strict order as shown in the following table.

Magic number and Class file version

The first four bytes of each Class file are called the Magic Number, whose sole purpose is to determine whether the file is an acceptable Class file for the virtual machine. Many file format standards, not just Class files, use magic numbers for identification. Image formats, such as GIF or JPEG, have magic numbers in their headers. The use of magic numbers rather than extensions for identification is primarily for security reasons, since file extensions can be changed at will. The author of the file format is free to choose the magic value, as long as it has not been widely adopted and does not cause confusion.

The four bytes that follow the magic number store the version number of the Class file: bytes 5 and 6 are the minor version, and bytes 7 and 8 are the major version.
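The first eight bytes can be read with very little code. The following is a minimal sketch, not a full parser: ClassHeader is a hypothetical helper name, and the magic value 0xCAFEBABE is not stated in the text above but is the value every valid Class file starts with. ByteBuffer is used because it reads big-endian by default, which matches the high-order-first layout of the Class file.

```java
import java.nio.ByteBuffer;

// Illustrative sketch; ClassHeader is a hypothetical helper, not a JDK class.
final class ClassHeader {
    final int magic;  // u4, bytes 1-4
    final int minor;  // u2, bytes 5-6
    final int major;  // u2, bytes 7-8

    private ClassHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    static ClassHeader parse(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes);   // big-endian by default
        int magic = in.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a Class file");
        }
        int minor = in.getShort() & 0xFFFF;            // treat the u2 as unsigned
        int major = in.getShort() & 0xFFFF;
        return new ClassHeader(magic, minor, major);
    }

    public static void main(String[] args) {
        // Hand-written header: magic number, minor version 0, major version 52.
        byte[] firstEightBytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        ClassHeader h = parse(firstEightBytes);
        System.out.println(Integer.toHexString(h.magic) + " " + h.major + "." + h.minor);
    }
}
```

To try it on a real file, the same parse method can be fed the bytes returned by java.nio.file.Files.readAllBytes for any compiled .class file.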

Constant pool

Immediately after the major and minor version numbers comes the constant pool. The constant pool can be thought of as the resource repository of the Class file. It is the data item most closely tied to the other items in the Class file structure, it is usually one of the largest consumers of space in a Class file, and it is also the first table-type data item to appear in the file.

Since the number of constants in a constant pool is not fixed, a u2 item is placed at the entrance of the constant pool to hold the constant pool capacity count (constant_pool_count). This count starts at 1 instead of 0, so the number of constants actually stored is constant_pool_count - 1.

There are two main types of constants in the constant pool: literals and symbolic references.

Literal

Literals are close to the notion of a constant at the Java language level, such as text strings and constant values declared final.

Symbolic References

Symbolic references are a concept from compiler theory and include the following kinds of constants:

  • Packages exported or opened by a module
  • Fully qualified names of classes and interfaces
  • Names and descriptors of fields
  • Names and descriptors of methods
  • Method handles and method types (Method Handle, Method Type, Invoke Dynamic)
  • Dynamically computed call sites and dynamically computed constants (Dynamically-Computed Call Site, Dynamically-Computed Constant)

Unlike C and C++, Java code has no separate “link” step after javac compilation; instead, linking happens dynamically when the virtual machine loads the Class file (see Chapter 7). In other words, the Class file does not store the final memory layout of methods and fields, so the symbolic references to those fields and methods cannot be used by the virtual machine until they are translated at run time. When the virtual machine loads a class, it takes the symbolic references from the constant pool and resolves them into concrete memory addresses, either when the class is created or at run time.

Each constant in the constant pool is itself a table. There are 17 constant types in total, and each type has its own data structure.

Take the CONSTANT_Class_info structure as an example. It consists of a u1 tag followed by a u2 name_index.

tag is the flag that distinguishes constant types; name_index is a constant pool index that points to a constant of type CONSTANT_Utf8_info in the constant pool.

A CONSTANT_Utf8_info constant consists of a u1 tag, a u2 length, and then the string data.

The length value gives the length of the UTF-8 encoded string in bytes. It is followed by length consecutive bytes holding the string in a compact (modified) UTF-8 encoding.
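As a rough illustration of how these two structures fit together, the sketch below reads a tiny hand-built constant pool and follows a CONSTANT_Class_info entry to its name. TinyConstantPool and its methods are hypothetical names, only tags 1 (Utf8) and 7 (Class) are handled, and the sample bytes are made up for the example. Note that because length is a u2, a single CONSTANT_Utf8_info constant can hold at most 65535 bytes, which is also what limits the length of a method or field name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch; handles only the two constant types discussed above.
final class TinyConstantPool {
    static final int TAG_UTF8 = 1;   // CONSTANT_Utf8_info
    static final int TAG_CLASS = 7;  // CONSTANT_Class_info

    /** Reads constant_pool_count and the entries; slot 0 of the result stays unused. */
    static Object[] read(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readUnsignedShort();        // constant_pool_count
            Object[] pool = new Object[count];         // valid indexes: 1 .. count - 1
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case TAG_UTF8:
                        pool[i] = in.readUTF();        // u2 length + modified UTF-8 bytes
                        break;
                    case TAG_CLASS:
                        pool[i] = in.readUnsignedShort();  // name_index, kept as an Integer
                        break;
                    default:
                        throw new IllegalArgumentException("tag not handled in this sketch: " + tag);
                }
            }
            return pool;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Follows a CONSTANT_Class_info entry to the Utf8 entry that holds its name. */
    static String className(Object[] pool, int classIndex) {
        int nameIndex = (Integer) pool[classIndex];
        return (String) pool[nameIndex];
    }

    /** Builds a made-up pool: #1 = Utf8 "java/lang/Object", #2 = Class pointing at #1. */
    static byte[] sample() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeShort(3);                  // constant_pool_count = entries + 1
            out.writeByte(TAG_UTF8);
            out.writeUTF("java/lang/Object");   // writes the u2 length, then the bytes
            out.writeByte(TAG_CLASS);
            out.writeShort(1);                  // name_index
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        Object[] pool = read(sample());
        System.out.println("entries: " + (pool.length - 1));
        System.out.println("class #2: " + className(pool, 2));
    }
}
```

A complete reader would also need the other 15 tags, and would have to account for the fact that long and double constants take up two constant pool indexes each.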

Summary of the structures of the 17 constant types in the constant pool

Constant pool UTF-8 string structure

Access flags

After the constant pool ends, the next two bytes represent access_flags, which are used to identify some class or interface level access information. The specific flag bits and meanings are shown in the following table:

Access flag example

Class index, superclass index, and interface index collection

The class index (this_class) and the superclass index (super_class) are each a single u2 value, while the interface index collection (interfaces) is a set of u2 values.

The class index, superclass index, and interface index collection are arranged in that order after the access flags. The class index and superclass index are each a u2 index value pointing to a class descriptor constant of type CONSTANT_Class_info. The fully qualified name string, defined in a constant of type CONSTANT_Utf8_info, can then be found through the index value stored in the CONSTANT_Class_info constant.

For the interface index collection, the first item, of type u2, is the interface counter (interfaces_count), which gives the capacity of the index table. If the class does not implement any interface, the counter value is 0 and the interface index table that would follow occupies no bytes.

Class index, superclass index, and interface index collection example

Field table collection

The field table (field_info) is used to describe variables declared in an interface or class. Fields in the Java language include class-level variables as well as instance-level variables, but not local variables declared inside methods.

Field table structure

  • access_flags holds the field modifiers, such as the field’s access scope
  • name_index is the index of the field name in the constant pool
  • descriptor_index is the index of the field descriptor in the constant pool (the constants referenced are all strings), indicating the field type

Field access flags (access_flags)

When flags that are not mutually exclusive are present at the same time, their values are added together. For example, if ACC_PRIVATE and ACC_STATIC are both set, the flag value is 0x0002 + 0x0008 = 0x000A. Because every flag occupies its own bit, each combination of flags produces a unique value and no two combinations can conflict.
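The same arithmetic can be checked from Java, since java.lang.reflect.Modifier exposes constants with these values. This is only a small sketch (FieldFlagsDemo is a hypothetical name); it uses bitwise OR, which gives the same result as addition here because the bits do not overlap.

```java
import java.lang.reflect.Modifier;

// Illustrative sketch; FieldFlagsDemo is a hypothetical name.
final class FieldFlagsDemo {
    /** Combines the two flags from the text; OR and + agree because the bits do not overlap. */
    static int privateStatic() {
        return Modifier.PRIVATE | Modifier.STATIC;   // 0x0002 | 0x0008
    }

    public static void main(String[] args) {
        int flags = privateStatic();
        System.out.printf("0x%04X%n", flags);            // prints 0x000A
        System.out.println(Modifier.toString(flags));    // prints private static
        System.out.println(Modifier.isStatic(flags));    // prints true
    }
}
```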

Meaning of the descriptor identifier characters

For array types, each dimension is described by a leading “[” character. For example, a two-dimensional array declared as “java.lang.String[][]” is recorded as “[[Ljava/lang/String;”, and an integer array “int[]” is recorded as “[I”.
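One quick way to see these array descriptors is through reflection: for array classes, Class.getName() returns the descriptor form, except that it uses “.” instead of “/” in class names. The sketch below relies on that; ArrayDescriptorDemo and descriptorOfArray are hypothetical names.

```java
// Illustrative sketch; ArrayDescriptorDemo is a hypothetical name.
final class ArrayDescriptorDemo {
    /** For array classes, getName() is the descriptor with '.' in place of '/'. */
    static String descriptorOfArray(Class<?> arrayType) {
        if (!arrayType.isArray()) {
            throw new IllegalArgumentException("array type expected: " + arrayType);
        }
        return arrayType.getName().replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(descriptorOfArray(String[][].class));  // prints [[Ljava/lang/String;
        System.out.println(descriptorOfArray(int[].class));       // prints [I
    }
}
```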

The field table collection does not list fields inherited from a superclass or superinterface, but it may contain fields that do not appear in the Java source. For example, to preserve access to the outer class, the compiler automatically adds to an inner class a field that points to the outer class instance.

Field table structure example

Method table collection
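The compiler-added outer reference can be observed with reflection. The sketch below is an illustration under assumptions, not a statement about every compiler: OuterRefDemo is a hypothetical class, the inner class deliberately uses a field of the outer instance (as far as I know, recent javac versions may omit the hidden field when the outer instance is never used), and the result depends on javac marking the field as synthetic. It also shows that the default constructor of a non-static inner class takes the outer instance as its parameter, which is how the inner class receives the reference.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;

// Illustrative sketch; OuterRefDemo is a hypothetical class.
final class OuterRefDemo {
    int base = 10;

    class Inner {                                  // non-static inner class
        int plus(int x) { return base + x; }       // uses the outer instance
    }

    /** Parameter types of Inner's compiler-generated default constructor. */
    static Class<?>[] innerConstructorParameters() {
        Constructor<?> ctor = Inner.class.getDeclaredConstructors()[0];
        return ctor.getParameterTypes();
    }

    /** Counts compiler-added (synthetic) fields of Inner that hold an OuterRefDemo. */
    static int syntheticOuterFields() {
        int count = 0;
        for (Field f : Inner.class.getDeclaredFields()) {
            if (f.isSynthetic() && f.getType() == OuterRefDemo.class) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("constructor takes: " + innerConstructorParameters()[0].getName());
        System.out.println("hidden outer fields: " + syntheticOuterFields());
        System.out.println(new OuterRefDemo().new Inner().plus(5));   // prints 15
    }
}
```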

Method table structure

  • access_flags holds the method modifiers, such as the method’s access scope
  • name_index is the index of the method name in the constant pool
  • descriptor_index is the index of the method descriptor in the constant pool (the constants referenced are all strings), representing the method’s parameters and return type

When a descriptor describes a method, the parameter list comes first, followed by the return value. The parameters are placed, in strict order, inside a pair of parentheses “()”. For example, the descriptor of void inc() is “()V”, the descriptor of java.lang.String toString() is “()Ljava/lang/String;”, and the method int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) has the descriptor “([CII[CIII)I”.
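The three descriptors above can be reproduced with java.lang.invoke.MethodType, whose toMethodDescriptorString() method produces this format. The wrapper below is just a convenience sketch; MethodDescriptorDemo and descriptor are hypothetical names.

```java
import java.lang.invoke.MethodType;

// Illustrative sketch; MethodDescriptorDemo is a hypothetical name.
final class MethodDescriptorDemo {
    /** Builds the descriptor string for the given return type and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor(void.class));                 // prints ()V
        System.out.println(descriptor(String.class));               // prints ()Ljava/lang/String;
        System.out.println(descriptor(int.class,
                char[].class, int.class, int.class,
                char[].class, int.class, int.class, int.class));    // prints ([CII[CIII)I
    }
}
```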

Method access flag

Method table structure example

As with the field table collection, methods inherited from a superclass do not appear in the method table collection unless the subclass overrides them. Likewise, the compiler may add methods automatically, the most common being the class constructor “<clinit>()” method and the instance constructor “<init>()” method.

Attribute table collection

Class files, field tables, and method tables can all carry their own attribute table collections to describe information specific to certain scenarios.

Attributes predefined by the VM specification

For each attribute, the name is represented by a constant of type CONSTANT_Utf8_info from the constant pool, while the structure of the attribute value is entirely up to the attribute itself; the only requirement is a u4 length item stating how many bytes the attribute value occupies.

Attribute table structure

Code attribute

The code in the method body of a Java program is processed by the javac compiler and ends up as bytecode instructions stored in the Code attribute. The Code attribute appears in the attribute collection of the method table, but not every method table has one. For example, abstract methods in interfaces or abstract classes do not have a Code attribute. If a method table has the Code attribute, its structure will look like the following table.

  • attribute_name_index is an index to a constant of type CONSTANT_Utf8_info. The constant value is fixed to “Code”, which is the name of the attribute.
  • attribute_length gives the length of the attribute value. Because the attribute name index (u2) and the attribute length (u4) together take six bytes, the length of the attribute value is always the length of the entire attribute table minus six bytes.
  • max_stack is the maximum depth of the operand stack. At no point during the method’s execution will the operand stack grow deeper than this value, and the virtual machine uses it to allocate the operand stack in the stack frame at run time.
  • max_locals is the storage space required by the local variable table. Its unit is the variable slot, the smallest unit the virtual machine uses when allocating memory for local variables. Data types of up to 32 bits (byte, char, float, int, short, boolean, and returnAddress) occupy one variable slot each, while the two 64-bit data types, double and long, need two variable slots.
  • code_length gives the length of the bytecode.
  • code is a stream of bytes that stores the bytecode instructions. As the name “bytecode instruction” suggests, each instruction opcode is a single u1 byte. When the virtual machine reads a byte from code, it can determine which instruction that byte represents, whether operands follow the instruction, and how those operands should be parsed.
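The slot rule for max_locals can be made concrete with a small calculation. The sketch below only counts the slots taken by the receiver and the parameters, so it gives a lower bound on max_locals rather than the value javac would write; LocalSlots and its methods are hypothetical names, and the assumption that an instance method keeps this in slot 0 is not stated in the text above.

```java
// Illustrative sketch; LocalSlots is a hypothetical name.
final class LocalSlots {
    /** Slots for one value: long and double take two, every other type takes one. */
    static int slotsFor(Class<?> type) {
        return (type == long.class || type == double.class) ? 2 : 1;
    }

    /** Slots needed for the receiver and the parameters; a lower bound on max_locals. */
    static int parameterSlots(boolean isStatic, Class<?>... parameterTypes) {
        int slots = isStatic ? 0 : 1;   // assumption: instance methods keep `this` in slot 0
        for (Class<?> type : parameterTypes) {
            slots += slotsFor(type);
        }
        return slots;
    }

    public static void main(String[] args) {
        // Instance method taking (int, long, double, String): 1 + 1 + 2 + 2 + 1 slots.
        System.out.println(parameterSlots(false, int.class, long.class, double.class, String.class)); // prints 7
        // Static method taking a single long.
        System.out.println(parameterSlots(true, long.class));                                          // prints 2
    }
}
```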

The Code attribute is the most important attribute in a Class file. If the information in a Java program is divided into two parts, code (the Java code in method bodies) and metadata (classes, fields, method definitions, and other information), then the Code attribute describes the code and all other data items in the Class file describe the metadata.

Attribute table structure of the exception table

The meaning of these fields is as follows: if an exception of type catch_type, or of a subclass of catch_type, occurs between bytecode line start_pc and line end_pc (excluding line end_pc), execution continues at line handler_pc. When catch_type is 0 (as used for finally), an exception of any type transfers control to handler_pc.

Note: the “line” of the bytecode here is a figurative way of describing the bytecode offset from the beginning of the method body, not the line number of the Java source code. The same applies below.
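The lookup rule for the exception table can be sketched as follows. This is a simplified model, not how any particular virtual machine implements it: ExceptionTableDemo and Entry are hypothetical names, catch_type is modeled as a Class object with null standing in for the value 0, and the table is searched in order, returning the first match, which is my understanding of the JVM's behavior rather than something stated above.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; ExceptionTableDemo and Entry are hypothetical names.
final class ExceptionTableDemo {
    static final class Entry {
        final int startPc;
        final int endPc;
        final int handlerPc;
        final Class<? extends Throwable> catchType;   // null models catch_type == 0

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    /** Returns handler_pc of the first matching entry, or -1 if no entry matches. */
    static int findHandler(List<Entry> table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.startPc && pc < e.endPc;   // end_pc is exclusive
            boolean typeMatches = e.catchType == null || e.catchType.isInstance(thrown);
            if (inRange && typeMatches) {
                return e.handlerPc;
            }
        }
        return -1;
    }

    /** Made-up table: a typed handler at 20 and a catch-all handler at 30. */
    static List<Entry> sampleTable() {
        return Arrays.asList(
                new Entry(0, 10, 20, IllegalArgumentException.class),
                new Entry(0, 10, 30, null));
    }

    public static void main(String[] args) {
        List<Entry> table = sampleTable();
        System.out.println(findHandler(table, 5, new NumberFormatException()));  // prints 20 (subclass match)
        System.out.println(findHandler(table, 5, new IllegalStateException()));  // prints 30 (catch-all)
        System.out.println(findHandler(table, 10, new RuntimeException()));      // prints -1 (outside range)
    }
}
```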

Exceptions attribute

The Exceptions attribute discussed here sits at the same level as the Code attribute in the method table; do not confuse it with the exception table just described. The Exceptions attribute lists the checked exceptions that a method may throw, that is, the exceptions listed after the throws keyword in the method declaration. Its structure is shown in Table 6-17.

Table 6-17 Exceptions attribute structure

The number_of_exceptions item indicates that the method may throw number_of_exceptions kinds of checked exception, each represented by one exception_index_table item. Each exception_index_table item is an index to a constant of type CONSTANT_Class_info in the constant pool, representing the type of that checked exception.

LineNumberTable attribute

The LineNumberTable attribute describes the mapping between Java source line numbers and bytecode line numbers (bytecode offsets). You can use the -g:none or -g:lines options of javac to suppress or require this information. If the LineNumberTable attribute is not generated, the main effect is that the stack trace of a thrown exception does not show the line number of the error, and breakpoints cannot be set by source line when debugging. Table 6-18 shows the structure of the LineNumberTable attribute.

Table 6-18 LineNumberTable attribute structure

line_number_table is a collection of line_number_table_length items of type line_number_info. The line_number_info table contains two u2 data items: start_pc, the bytecode line number, and line_number, the Java source line number.
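To show how such a table is used, for example when a stack trace needs a source line, here is a small lookup sketch. It picks the entry with the largest start_pc that does not exceed the given bytecode offset; LineNumberLookup is a hypothetical name, each row is modeled as a two-element array {start_pc, line_number}, and the sample values are made up.

```java
// Illustrative sketch; LineNumberLookup is a hypothetical name.
final class LineNumberLookup {
    /** Rows are {start_pc, line_number}; returns -1 if no row covers the offset. */
    static int lineFor(int[][] table, int pc) {
        int bestStart = -1;
        int line = -1;
        for (int[] row : table) {
            // Keep the row that starts closest to, but not after, the given offset.
            if (row[0] <= pc && row[0] >= bestStart) {
                bestStart = row[0];
                line = row[1];
            }
        }
        return line;
    }

    public static void main(String[] args) {
        int[][] table = {{0, 3}, {4, 4}, {9, 6}};   // made-up sample data
        System.out.println(lineFor(table, 0));      // prints 3
        System.out.println(lineFor(table, 5));      // prints 4
        System.out.println(lineFor(table, 9));      // prints 6
    }
}
```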

LocalVariableTable and LocalVariableTypeTable attributes

LocalVariableTable attribute

The LocalVariableTable attribute describes the relationship between the variables in the local variable table of a stack frame and the variables defined in the Java source code. You can use the -g:none or -g:vars options of javac to suppress or require this information. If the attribute is not generated, the biggest impact is that all parameter names are lost when someone else references the method; an IDE, for example, will substitute placeholders such as arg0 and arg1. This has no effect on program execution, but it makes writing code inconvenient, and parameter values cannot be looked up by name from the context during debugging. Table 6-19 shows the structure of the LocalVariableTable attribute.

Table 6-19 LocalVariableTable attribute structure

The local_variable_info item represents the association between a stack frame and a local variable in the source code, as shown in Table 6-20.

Table 6-20 Structure of the local_variable_info item

  • start_pc and length: the bytecode offset at which the local variable’s life cycle begins and the length of the range it covers. Together they describe the scope of the local variable within the bytecode.
  • name_index: the name of the local variable, as an index to a constant of type CONSTANT_Utf8_info in the constant pool.
  • descriptor_index: the descriptor of the local variable, also as an index to a constant of type CONSTANT_Utf8_info in the constant pool.
  • index: the slot position of this local variable in the local variable table of the stack frame. When the variable’s data type is 64-bit (double or long), it occupies both slot index and slot index+1.

LocalVariableTypeTable attribute (generics)

The structure of this attribute is very similar to that of LocalVariableTable; it merely replaces the descriptor_index of each record with the field’s signature. For non-generic types, the descriptor and the signature describe the same information. Once generics were introduced, however, the parameterized types of generics are erased in descriptors, so descriptors can no longer describe generic types accurately. Hence the LocalVariableTypeTable attribute, which uses the signature to complete the description of generic types.

SourceFile and SourceDebugExtension attributes

SourceFile attribute

The SourceFile attribute records the name of the source file from which the Class file was generated. This attribute is also optional; you can use the -g:none or -g:source options of javac to suppress or require it. In Java, the class name and the file name are the same for most classes, with some exceptions (such as inner classes). If this attribute is not generated, the stack trace of a thrown exception does not show the file name of the offending code. The attribute is a fixed-length attribute. Table 6-21 shows its structure.

Table 6-21 Structure of the SourceFile attribute

  • The sourcefile_index data item is an index to a constant of type CONSTANT_Utf8_info in the constant pool. The constant value is the file name of the source file.

SourceDebugExtension attribute

To make it easier for compilers and dynamically generated classes to carry custom content from the programmer, the SourceDebugExtension attribute was added in JDK 5 to store additional debugging information. A typical scenario is JSP debugging, where the line number in the JSP file cannot be located from the Java stack trace. The JSR 45 proposal provides a standard mechanism for debugging programs that are written in non-Java languages but compiled into bytecode and run on the Java virtual machine, and the SourceDebugExtension attribute can store the debugging information added under that standard. This lets the programmer, for example, quickly locate the line number in the original JSP from the exception stack trace. Table 6-22 shows the structure of the SourceDebugExtension attribute.

Table 6-22 SourceDebugExtension attribute structure

  • debug_extension stores the additional debugging information, which is a string represented in variable-length UTF-8 format. At most one SourceDebugExtension attribute is allowed in a class.

ConstantValue attribute

The ConstantValue attribute tells the virtual machine to assign a value to a static variable automatically. The javac compiler currently implemented by Oracle generates a ConstantValue attribute for initialization only when a variable of a primitive type or of type java.lang.String is declared with both final and static.

Table 6-23 ConstantValue attribute structure

From the data structure, you can see that the ConstantValue attribute is a fixed-length attribute whose attribute_length value must be 2. The constantvalue_index data item is a reference to a literal constant in the constant pool.

InnerClasses attribute

The InnerClasses attribute records the association between inner classes and their host class. If a class defines inner classes, the compiler generates an InnerClasses attribute for it and for the inner classes it contains. Table 6-24 shows the structure of the InnerClasses attribute.

Table 6-24 InnerClasses attribute structure

The number_of_classes data item states how many inner classes need to be recorded. Each inner class is described by one inner_classes_info table. Table 6-25 shows the structure of the inner_classes_info table.

Table 6-25 inner_classes_info table structure

  • inner_class_info_index: an index to a constant of type CONSTANT_Class_info in the constant pool, representing the symbolic reference to the inner class.
  • outer_class_info_index: an index to a constant of type CONSTANT_Class_info in the constant pool, representing the symbolic reference to the host class.
  • inner_name_index: an index to a constant of type CONSTANT_Utf8_info in the constant pool, representing the name of the inner class. If the inner class is anonymous, this value is 0.
  • inner_class_access_flags: the access flags of the inner class, similar to the access_flags of a class. Table 6-26 lists the possible values.

Table 6-26 inner_class_access_flags flags
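Some of the information recorded in InnerClasses is visible through reflection. The sketch below is illustrative only (InnerClassesDemo and Helper are hypothetical names): it prints the modifiers of a nested class, which reflect its inner class access flags, and shows that an anonymous class has an empty simple name, matching the rule that inner_name_index is 0 for anonymous classes.

```java
import java.lang.reflect.Modifier;

// Illustrative sketch; InnerClassesDemo and Helper are hypothetical names.
final class InnerClassesDemo {
    private static class Helper { }          // named member class

    /** Returns an instance of an anonymous inner class. */
    static Object anonymous() {
        return new Object() { };
    }

    /** Modifiers of the nested class, as reported by reflection. */
    static String helperModifiers() {
        return Modifier.toString(Helper.class.getModifiers());
    }

    /** The host class that declares Helper. */
    static Class<?> helperHost() {
        return Helper.class.getDeclaringClass();
    }

    public static void main(String[] args) {
        System.out.println(helperModifiers());                         // prints private static
        System.out.println(helperHost().getSimpleName());              // prints InnerClassesDemo
        Class<?> anon = anonymous().getClass();
        System.out.println(anon.isAnonymousClass());                   // prints true
        System.out.println("[" + anon.getSimpleName() + "]");          // prints []
    }
}
```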

Deprecated and Synthetic attributes

The Deprecated and Synthetic attributes are both marker-type Boolean attributes: they are either present or absent, and they carry no attribute value.

  • Deprecated: indicates that a class, field, or method has been deprecated by the program author. It can be set in code with the “@Deprecated” annotation.
  • Synthetic: indicates that a field or method was not produced directly from Java source code but was added by the compiler. All classes, methods, and fields that do not come from user code should have at least one of the Synthetic attribute or the ACC_SYNTHETIC flag set, with the exception of the instance constructor “<init>()” and class constructor “<clinit>()” methods.

The Deprecated and Synthetic attributes have very simple structures, as shown in Table 6-27.

The value of the attribute_length data item must be 0x00000000, because there is no attribute value to set.

Note: some attributes are not covered yet and will be added later.