1 Source

  • Source: Java Virtual Machine JVM Fault Diagnosis and Performance Optimization — Ge Yiming
  • Chapter: Chapter 9

These are my notes on Chapter 9.

2 Overview

This article introduces the main components of a Class file, including the magic number, version numbers, constant pool, access flags, and so on.

3 Class file overview

According to the JVM specification, a Class file can be described very rigorously as:

ClassFile{
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

Each of these fields is described in detail below, in order.

4 The magic number

The magic number marks a file as a Class file for the JVM. It is a 4-byte unsigned integer fixed to 0xCAFEBABE. If a Class file does not begin with 0xCAFEBABE, the JVM refuses to load it and throws a java.lang.ClassFormatError.

On Linux, you can open a Class file with vim. For example, to open Test.class in binary mode, run the first command below, then enter the second one inside vim to switch to a hex dump:

vim -b Test.class
:%!xxd

In the hex view, the first four bytes of the file are the magic number, CA FE BA BE.
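As a rough illustration of the check described in this section, the following sketch tests whether a byte array starts with the magic number. The names MagicCheck and hasClassMagic are hypothetical and chosen only for this example, and the sample bytes are made up rather than read from a real file.

```java
import java.nio.ByteBuffer;

class MagicCheck {
    static final int CLASS_MAGIC = 0xCAFEBABE;

    /** Returns true if the data starts with the four bytes CA FE BA BE. */
    static boolean hasClassMagic(byte[] data) {
        if (data.length < 4) {
            return false;                       // too short to hold a magic number
        }
        // ByteBuffer is big-endian by default, which matches the Class file byte order.
        return ByteBuffer.wrap(data).getInt() == CLASS_MAGIC;
    }

    public static void main(String[] args) {
        byte[] classLike = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x37};
        byte[] other = {0x50, 0x4B, 0x03, 0x04};
        System.out.println(hasClassMagic(classLike)); // prints true
        System.out.println(hasClassMagic(other));     // prints false
    }
}
```

To check a real file, the byte array could come from java.nio.file.Files.readAllBytes.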

5 Version

The magic number is followed by the minor version and the major version of the Class file, which indicate the compiler version that generated it. Both the minor and major versions occupy two bytes. In the example file they are:

  • 0000 is the minor version number
  • 0037 is the major version number; in decimal it is 55, which corresponds to the JDK 11 compiler
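The decoding above can be sketched in a few lines. ClassVersion, readVersion and javaRelease are hypothetical names for this example. The "major minus 44" rule is a convenience that gives the Java release number for major versions 49 and up (49 is Java 5, 52 is Java 8, 55 is Java 11).

```java
import java.nio.ByteBuffer;

class ClassVersion {
    /** Reads minor_version and major_version, which follow the 4-byte magic. */
    static int[] readVersion(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header);  // big-endian by default
        buf.getInt();                              // skip the magic number
        int minor = buf.getShort() & 0xFFFF;       // u2, masked so it reads as unsigned
        int major = buf.getShort() & 0xFFFF;
        return new int[] {minor, major};
    }

    /** Maps a major version to a Java release number, e.g. 52 to 8 and 55 to 11. */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x37};
        int[] v = readVersion(header);
        System.out.println("minor=" + v[0] + " major=" + v[1] + " release=" + javaRelease(v[1]));
        // prints: minor=0 major=55 release=11
    }
}
```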

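6 Constant pool

The version number is followed by the constant pool count and then the constant pool entries. As the constant_pool[constant_pool_count-1] declaration in the ClassFile structure shows, valid constant pool indices run from 1 to constant_pool_count-1. Each entry begins with a one-byte tag that identifies its type. For example: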

  • tag 3: the type is CONSTANT_Integer
  • tag 4: the type is CONSTANT_Float

CONSTANT_Integer, for example, has the following structure:

CONSTANT_Integer_info {
    u1 tag;
    u4 bytes;
}

That is, a one-byte tag followed by four bytes that hold the int value in big-endian order. Most of the other types are similar; since space is limited, see the JVM specification for details.
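To make the tag-then-payload layout concrete, here is a minimal sketch that decodes a single CONSTANT_Integer or CONSTANT_Float entry. The class and method names are hypothetical, the sample bytes are hand-written, and all other tags are deliberately left unhandled.

```java
import java.nio.ByteBuffer;

class ConstantDecoder {
    static final int TAG_INTEGER = 3;
    static final int TAG_FLOAT = 4;

    /** Decodes one CONSTANT_Integer or CONSTANT_Float entry at the buffer's current position. */
    static Object readNumericConstant(ByteBuffer buf) {
        int tag = buf.get() & 0xFF;           // u1 tag
        switch (tag) {
            case TAG_INTEGER:
                return buf.getInt();          // u4 bytes interpreted as a big-endian int
            case TAG_FLOAT:
                return buf.getFloat();        // u4 bytes interpreted as an IEEE 754 float
            default:
                throw new IllegalArgumentException("tag " + tag + " is not handled in this sketch");
        }
    }

    public static void main(String[] args) {
        byte[] intEntry = {3, 0x00, 0x00, 0x01, 0x00};           // tag 3, value 256
        byte[] floatEntry = {4, 0x3F, (byte) 0x80, 0x00, 0x00};  // tag 4, value 1.0f
        System.out.println(readNumericConstant(ByteBuffer.wrap(intEntry)));   // prints 256
        System.out.println(readNumericConstant(ByteBuffer.wrap(floatEntry))); // prints 1.0
    }
}
```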

7 Access flags

The access flags use two bytes to record access information about the class, such as whether it is public or abstract. The flags are as follows:

  • ACC_PUBLIC: 0x0001, the class is public
  • ACC_FINAL: 0x0010, the class is final
  • ACC_SUPER: 0x0020, superclass methods are treated specially when invoked with the invokespecial instruction
  • ACC_INTERFACE: 0x0200, the type is an interface
  • ACC_ABSTRACT: 0x0400, the class is abstract
  • ACC_SYNTHETIC: 0x1000, the class was generated by the compiler and has no corresponding source code
  • ACC_ANNOTATION: 0x2000, the type is an annotation
  • ACC_ENUM: 0x4000, the type is an enumeration
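Because each flag occupies its own bit, a class's access_flags value is the bitwise OR of the flags that apply. The following sketch (AccessFlags and decode are hypothetical names) turns such a value back into flag names using the masks listed above.

```java
import java.util.ArrayList;
import java.util.List;

class AccessFlags {
    // Masks and names in the same order as the list above.
    private static final int[] MASKS = {0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000};
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Returns the names of all class access flags set in the given u2 value. */
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // prints [ACC_PUBLIC, ACC_SUPER]
        System.out.println(decode(0x0601)); // prints [ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]
    }
}
```

The value 0x0021 is what a typical public class carries, and 0x0601 is typical of a public interface.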

8 Current class, parent class, and interfaces

The format is as follows:

u2             this_class;                                    
u2             super_class;
u2             interfaces_count;
u2             interfaces[interfaces_count];

Here this_class and super_class are both two-byte unsigned integers that index a CONSTANT_Class entry in the constant pool, representing the current class and its parent class. Because a class can implement multiple interfaces, their constant pool indices are stored in an array. If the class implements no interfaces, interfaces_count is 0.
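A minimal sketch of reading these four items follows. It assumes the buffer is already positioned at this_class, that is, just past access_flags, and it returns raw constant pool indices; resolving them to class names would need a parsed constant pool, which is not shown. ClassHierarchyIndexes is a hypothetical name.

```java
import java.nio.ByteBuffer;

class ClassHierarchyIndexes {
    final int thisClass;      // constant pool index of the current class
    final int superClass;     // constant pool index of the parent class
    final int[] interfaces;   // constant pool indices of the implemented interfaces

    ClassHierarchyIndexes(int thisClass, int superClass, int[] interfaces) {
        this.thisClass = thisClass;
        this.superClass = superClass;
        this.interfaces = interfaces;
    }

    /** Reads this_class, super_class, interfaces_count and interfaces[] in order. */
    static ClassHierarchyIndexes read(ByteBuffer buf) {
        int thisClass = buf.getShort() & 0xFFFF;
        int superClass = buf.getShort() & 0xFFFF;
        int count = buf.getShort() & 0xFFFF;
        int[] interfaces = new int[count];
        for (int i = 0; i < count; i++) {
            interfaces[i] = buf.getShort() & 0xFFFF;
        }
        return new ClassHierarchyIndexes(thisClass, superClass, interfaces);
    }

    public static void main(String[] args) {
        // Hand-written bytes: this_class=2, super_class=4, two interfaces at indices 6 and 8.
        byte[] bytes = {0, 2, 0, 4, 0, 2, 0, 6, 0, 8};
        ClassHierarchyIndexes h = read(ByteBuffer.wrap(bytes));
        System.out.println(h.thisClass + " " + h.superClass + " " + h.interfaces.length);
        // prints: 2 4 2
    }
}
```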

9 Fields

The format of the fields is as follows:

u2             fields_count;
field_info     fields[fields_count];

fields_count is an unsigned two-byte integer that gives the number of fields; it is followed by the information for each field. Each field is a field_info structure, as follows:

field_info {
    u2             access_flags;                         // Access flags, similar to the class access flags; can represent public, private, static, and so on
    u2             name_index;                           // A two-byte index pointing to a CONSTANT_Utf8 entry in the constant pool that holds the field name
    u2             descriptor_index;                     // Also a two-byte index pointing to a CONSTANT_Utf8 entry; it describes the field type
    u2             attributes_count;                     // Number of attributes
    attribute_info attributes[attributes_count];         // Attributes, such as the initial value or annotation information, stored as attribute_info structures
}

attribute_info {
    u2 attribute_name_index;                             // Attribute name, pointing to the constant pool index
    u4 attribute_length;                                 // Attribute length
    u1 info[attribute_length];                           // The information represented by the byte array
}
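The attribute_length item is what lets a reader skip attributes it does not understand. The sketch below reads one field_info and steps over its attributes without interpreting them. FieldInfoReader and FieldInfo are hypothetical names, the sample bytes are hand-written, and the code assumes attribute_length fits in a Java int.

```java
import java.nio.ByteBuffer;

class FieldInfoReader {
    static final class FieldInfo {
        int accessFlags;
        int nameIndex;
        int descriptorIndex;
        int attributesCount;
    }

    private static int u2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;            // read an unsigned two-byte value
    }

    /** Reads one field_info and leaves the buffer positioned just after it. */
    static FieldInfo readField(ByteBuffer buf) {
        FieldInfo f = new FieldInfo();
        f.accessFlags = u2(buf);
        f.nameIndex = u2(buf);
        f.descriptorIndex = u2(buf);
        f.attributesCount = u2(buf);
        for (int i = 0; i < f.attributesCount; i++) {
            u2(buf);                               // attribute_name_index, ignored here
            int length = buf.getInt();             // attribute_length (u4)
            buf.position(buf.position() + length); // skip info[attribute_length]
        }
        return f;
    }

    public static void main(String[] args) {
        // private static final field (0x001A), name #5, descriptor #6,
        // one attribute named by #7 with a 2-byte body.
        byte[] bytes = {0, 0x1A, 0, 5, 0, 6, 0, 1, 0, 7, 0, 0, 0, 2, 0, 8};
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        FieldInfo f = readField(buf);
        System.out.println(f.nameIndex + " " + f.descriptorIndex + " " + buf.remaining());
        // prints: 5 6 0
    }
}
```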

10 Methods

10.1 Basic structure of the method

The format of the method is as follows:

u2             methods_count;
method_info    methods[methods_count];

Each of these method_info structures represents a method:

method_info {
    u2             access_flags;                            // Access flags: marks the method as public, private, and so on
    u2             name_index;                              // The method name, an index into the constant pool
    u2             descriptor_index;                        // The method descriptor, also an index to a CONSTANT_Utf8 entry
    u2             attributes_count;                        // Number of attributes
    attribute_info attributes[attributes_count];            // Like fields, methods can carry attributes: a count followed by an array of attribute_info structures
}

10.2 Code attribute

The main content of a method is stored in its attributes. The most important of these is Code, which stores the method's bytecode and related information. The structure is as follows:

Code_attribute {
    u2 attribute_name_index;                      // Attribute name pointing to the constant pool index
    u4 attribute_length;                          // Attribute length, excluding the first 6 bytes (u2+u4)
    u2 max_stack;                                 // Maximum depth of operand stack
    u2 max_locals;                                // Size of the local variable table
    u4 code_length;                               // Bytecode length
    u1 code[code_length];                         // The bytecode content itself
    u2 exception_table_length;                    // Number of exception table entries
    {   u2 start_pc;                              // If an exception of type catch_type is thrown
        u2 end_pc;                                // at a bytecode offset between start_pc and end_pc,
        u2 handler_pc;                            // execution jumps to handler_pc
        u2 catch_type;                            // Constant pool index of the exception class to catch
    } exception_table[exception_table_length];    // Exception table
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
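The exception table lookup described in the comments can be sketched as follows. This is a simplification under stated assumptions: the range is treated as start_pc inclusive and end_pc exclusive, a catch_type of 0 is treated as matching any exception, and types are compared by constant pool index only, whereas a real JVM also accepts subclasses of the caught type. ExceptionTableLookup and its members are hypothetical names.

```java
class ExceptionTableLookup {
    static final class Entry {
        final int startPc;
        final int endPc;
        final int handlerPc;
        final int catchType;

        Entry(int startPc, int endPc, int handlerPc, int catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    /** Returns handler_pc of the first matching entry, or -1 if no entry handles the exception. */
    static int findHandler(Entry[] table, int pc, int thrownTypeIndex) {
        for (Entry e : table) {
            boolean inRange = pc >= e.startPc && pc < e.endPc;
            boolean typeMatches = e.catchType == 0 || e.catchType == thrownTypeIndex;
            if (inRange && typeMatches) {
                return e.handlerPc;               // entries are searched in table order
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Entry[] table = {
            new Entry(0, 10, 20, 5),   // catches the type at constant pool index 5
            new Entry(0, 10, 30, 0)    // catches anything
        };
        System.out.println(findHandler(table, 4, 5));  // prints 20
        System.out.println(findHandler(table, 4, 9));  // prints 30
        System.out.println(findHandler(table, 10, 5)); // prints -1
    }
}
```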

The Code attribute can itself contain other attributes that store additional information, including:

  • LineNumberTable
  • LocalVariableTable
  • StackMapTable

10.2.1 LineNumberTable

LineNumberTable records the mapping between bytecode offsets and line numbers. The structure is as follows:

LineNumberTable_attribute {
    u2 attribute_name_index;                             // Index to the constant pool
    u4 attribute_length;                                 // Attribute length
    u2 line_number_table_length;                         // Number of entries
    {   u2 start_pc;                                     // Bytecode offset
        u2 line_number;                                  // The source line number for that bytecode offset
    } line_number_table[line_number_table_length];       // Table array; each element is a <start_pc, line_number> tuple
}
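One way to use this table, for example when building a stack trace, is to find the entry with the largest start_pc that does not exceed the current bytecode offset. The sketch below assumes the table has already been parsed into {start_pc, line_number} pairs; LineNumberLookup and lineForPc are hypothetical names and the sample table is made up.

```java
class LineNumberLookup {
    /**
     * Each row of the table is {start_pc, line_number}.
     * Returns the line number for the given bytecode offset, or -1 if none applies.
     */
    static int lineForPc(int[][] table, int pc) {
        int bestStart = -1;
        int line = -1;
        for (int[] entry : table) {
            // Keep the entry with the greatest start_pc that is still <= pc.
            if (entry[0] <= pc && entry[0] > bestStart) {
                bestStart = entry[0];
                line = entry[1];
            }
        }
        return line;
    }

    public static void main(String[] args) {
        int[][] table = {{0, 5}, {8, 6}, {16, 7}};
        System.out.println(lineForPc(table, 0));  // prints 5
        System.out.println(lineForPc(table, 10)); // prints 6
        System.out.println(lineForPc(table, 16)); // prints 7
    }
}
```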

10.2.2 LocalVariableTable

This attribute, also known as the local variable table, records the local variables of a method and is structured as follows:

LocalVariableTable_attribute {
    u2 attribute_name_index;                                     // The current attribute name, pointing to the constant pool index
    u4 attribute_length;                                         // Attribute length
    u2 local_variable_table_length;                              // Number of local variable table entries
    {   u2 start_pc;                                             // Bytecode offset at which the local variable's scope starts
        u2 length;                                               // Length of that scope in bytes (used to calculate the end position)
        u2 name_index;                                           // The local variable name pointing to the constant pool index
        u2 descriptor_index;                                     // The type description of the local variable, pointing to the constant pool index
        u2 index;                                                // The slot of the local variable in the local variable table of the current stack frame
    } local_variable_table[local_variable_table_length];         
}
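Because slots can be reused by variables with disjoint scopes, finding a variable's name takes both the slot index and the bytecode offset. The following sketch treats a variable as live in the interval from start_pc (inclusive) to start_pc + length (exclusive). It assumes name_index and descriptor_index have already been resolved to strings; LocalVariableLookup and its members are hypothetical names and the sample entries are made up.

```java
class LocalVariableLookup {
    static final class Entry {
        final int startPc;
        final int length;
        final String name;        // resolved from name_index
        final String descriptor;  // resolved from descriptor_index
        final int index;          // slot in the local variable table

        Entry(int startPc, int length, String name, String descriptor, int index) {
            this.startPc = startPc;
            this.length = length;
            this.name = name;
            this.descriptor = descriptor;
            this.index = index;
        }
    }

    /** Returns the name of the variable occupying the slot at the given offset, or null. */
    static String nameAt(Entry[] table, int slot, int pc) {
        for (Entry e : table) {
            if (e.index == slot && pc >= e.startPc && pc < e.startPc + e.length) {
                return e.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Entry[] table = {
            new Entry(0, 20, "this", "LExample;", 0),
            new Entry(2, 10, "i", "I", 1),
            new Entry(12, 8, "s", "Ljava/lang/String;", 1)  // reuses slot 1 after i goes out of scope
        };
        System.out.println(nameAt(table, 1, 5));  // prints i
        System.out.println(nameAt(table, 1, 15)); // prints s
        System.out.println(nameAt(table, 1, 0));  // prints null
    }
}
```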

10.2.3 StackMapTable

StackMapTable contains stack map frame data. This data is not needed to execute the method; it is used only for type checking when the Class file is verified. The structure is as follows:

StackMapTable_attribute {
    u2              attribute_name_index;                         // Constant pool index, always "StackMapTable"
    u4              attribute_length;                             // Attribute length
    u2              number_of_entries;                            // The number of stack mapping frames
    stack_map_frame entries[number_of_entries];                   // Specific stack mapping frame
}

union stack_map_frame {                                           // Each stack map frame is one of the following frame types
    same_frame;                                                   // See the JVM specification for what each type means:
    same_locals_1_stack_item_frame;                               // https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.7.4
    same_locals_1_stack_item_frame_extended;
    chop_frame;
    same_frame_extended;
    append_frame;
    full_frame;
}

Each stack map frame describes the types in effect at a particular bytecode offset, including the types of the local variables and the types of the values on the operand stack.
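According to section 4.7.4 of the JVM specification linked above, every frame starts with a one-byte frame_type whose value range selects one of the union members. The sketch below maps a frame_type value to the member name; StackMapFrameKind and kindOf are hypothetical names, and the frame bodies themselves are not parsed.

```java
class StackMapFrameKind {
    /** Maps the u1 frame_type tag (0 to 255) to the name of the frame type it selects. */
    static String kindOf(int frameType) {
        if (frameType < 0 || frameType > 255) {
            throw new IllegalArgumentException("frame_type must fit in one unsigned byte");
        }
        if (frameType <= 63) return "same_frame";
        if (frameType <= 127) return "same_locals_1_stack_item_frame";
        if (frameType <= 246) return "reserved";   // 128 to 246 are reserved for future use
        if (frameType == 247) return "same_locals_1_stack_item_frame_extended";
        if (frameType <= 250) return "chop_frame";
        if (frameType == 251) return "same_frame_extended";
        if (frameType <= 254) return "append_frame";
        return "full_frame";
    }

    public static void main(String[] args) {
        System.out.println(kindOf(10));  // prints same_frame
        System.out.println(kindOf(252)); // prints append_frame
        System.out.println(kindOf(255)); // prints full_frame
    }
}
```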

Appendix: Basic ASM usage

ASM is a Java bytecode manipulation library that many well-known libraries rely on, such as AspectJ and CGLIB. Because ASM works closer to the bytecode level, it outperforms higher-level bytecode libraries such as CGLIB and is more flexible and powerful.

Here is a simple example that uses ASM to generate a class that prints Hello World:

package com.company;

import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class Main extends ClassLoader implements Opcodes {
    public static void main(String[] args) throws Exception{
        // Create the ClassWriter. COMPUTE_MAXS asks ASM to compute the maximum operand stack depth
        // and local variable count; COMPUTE_FRAMES asks it to compute the stack map frames as well.
        ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_MAXS | ClassWriter.COMPUTE_FRAMES);
        // Use ClassWriter to set the basic information of the class, such as the public access flag. The class name is Example
        cw.visit(V11, ACC_PUBLIC, "Example", null, "java/lang/Object", null);
        // Generate the constructor for Example
        MethodVisitor mw = cw.visitMethod(ACC_PUBLIC, "<init>", "()V", null, null);
        mw.visitVarInsn(ALOAD, 0);
        mw.visitMethodInsn(INVOKESPECIAL, "java/lang/Object", "<init>", "()V", false);
        mw.visitInsn(RETURN);
        mw.visitMaxs(0, 0);
        mw.visitEnd();

        // Generate public static void main(String[] args) and the bytecode for main(),
        // which calls System.out.println() at runtime to print "Hello world!":
        mw = cw.visitMethod(ACC_PUBLIC + ACC_STATIC, "main", "([Ljava/lang/String;)V", null, null);
        mw.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
        mw.visitLdcInsn("Hello world!");
        mw.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
        mw.visitInsn(RETURN);
        mw.visitMaxs(0, 0);
        mw.visitEnd();

        // Get the binary representation
        byte[] code = cw.toByteArray();
        Main m = new Main();
        // Load the class into the JVM, call its main() method through reflection, and print the result
        Class<?> mainClass = m.defineClass("Example", code, 0, code.length);
        // Look main() up by name: getMethods() also returns methods inherited from Object, in no guaranteed order
        mainClass.getMethod("main", String[].class).invoke(null, new Object[]{null});
    }
}