
Class file structure

I previously wrote an article about why class files matter, which explained their composition from a macro perspective. Here is the direct link: (www.juejin.im/post/684490)…

In this article we’ll take a closer look at the class file, starting with what the bytecode looks like.

The following bytecode analysis focuses on this simple example. Every class file starts with the bytes CAFE BABE. Java is a big fan of coffee, just like I love Big Powers.
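The example source itself does not appear in the text here, so the class below is only a hypothetical reconstruction, inferred from the javap output shown later in this article (fields a and b, the add() and multi() methods, and a main that prints multi()). Treat the exact layout and initial values as assumptions rather than the original file.

```java
// Hypothetical reconstruction of the example class, inferred from the javap
// output in this article. The original source may differ in layout.
public class TestJVM {
    private int a = 3;  // javap shows iconst_3 followed by "putfield a:I"
    public int b = 4;   // javap shows iconst_4 followed by "putfield b:I"

    public int add() {
        return a + b;   // getfield a, getfield b, iadd, ireturn
    }

    public int multi() {
        return a * b;   // getfield a, getfield b, imul, ireturn
    }

    public static void main(String[] args) {
        // new, dup, invokespecial <init>, invokevirtual multi, invokevirtual println
        System.out.println(new TestJVM().multi());
    }
}
```

Compiling this with javac and running javap -verbose on the result should give a constant pool of the same shape as the one analyzed in this article, though indexes may differ between compiler versions.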

It is the combination of class files and the JVM that allows code written in any language to run on the JVM, as long as it is compiled into a class file the JVM can recognize. This makes Java, and every other language that runs on the JVM, platform-independent. It also makes the JVM language-independent: the class file's job is to tell the JVM what to run and how to run it.

A class file is a binary stream based on 8-bit bytes (usually displayed in hexadecimal), with no delimiters between data items. Because there are no separators, the order and number of the data items, and the meaning, length, and sequence of every byte, are strictly fixed and may not change: the JVM relies on exactly these lengths and this order to interpret the class file. Once you know which bytes form one piece of information and which bytes form the next, the design of the class file becomes much clearer.

The class file stores data in a pseudo-structure similar to a C struct. It contains virtual machine instructions, symbol tables, and other auxiliary information. In effect, the entire content of any class file is one big table.

A brief introduction to the class file structure:

The pseudo-structure has only two data types: unsigned numbers and tables.

  1. Unsigned numbers are the basic data types (just as Java classes have primitive types). u1, u2, u4, and u8 represent 1, 2, 4, and 8 bytes respectively, and can describe numbers, index references, quantity values, or string values.
  2. A table is like an object reference type in a Java class: its members can be basic data types (u1, u2, and so on) or other objects (other tables). Just as parameter entities in a Java project often end with "_Param", class file tables conventionally end with "_info".
  3. The data items appear in the order strictly required by the class file format.
  4. Each counter gives the number of entries in the table that follows it. For example, the method counter methods_count gives the number of method_info entries in the method table.
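To make the two data types concrete, here is a minimal sketch of reading unsigned numbers and a counter-prefixed table from raw bytes. The class name, helper names, and sample bytes are all made up for illustration; only ByteBuffer and its big-endian default are standard Java.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class UnsignedReaderSketch {
    // ByteBuffer is big-endian by default, which matches the class file byte order.
    static int u1(ByteBuffer buf) { return buf.get() & 0xFF; }
    static int u2(ByteBuffer buf) { return buf.getShort() & 0xFFFF; }
    static long u4(ByteBuffer buf) { return buf.getInt() & 0xFFFFFFFFL; }

    /** Reads a u2 counter followed by that many u2 entries (the "counter + table" pattern). */
    static int[] readU2Table(ByteBuffer buf) {
        int count = u2(buf);
        int[] entries = new int[count];
        for (int i = 0; i < count; i++) {
            entries[i] = u2(buf);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical sample bytes: one u1, one u4, then a table of two u2 entries.
        byte[] data = {
            (byte) 0xFF,
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x02, 0x00, 0x05, 0x00, 0x09
        };
        ByteBuffer buf = ByteBuffer.wrap(data);
        System.out.println("u1 = " + u1(buf));
        System.out.println("u4 = 0x" + Long.toHexString(u4(buf)).toUpperCase());
        System.out.println("table = " + Arrays.toString(readU2Table(buf)));
    }
}
```

The masks (& 0xFF and so on) are needed because Java's byte, short, and int are signed, while the class file's u1, u2, and u4 are not.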

Java code is endlessly varied, and yet it all boils down to a single table? If I can't figure out this one table, I swear I'll quit being a programmer!! Hopefully that doesn't come back to slap me in the face.

Class magic number and version

The first four bytes of every class file are called the magic number. Its sole function is to determine whether the file is a class file acceptable to the virtual machine. Its value is 0xCAFEBABE ("cafe babe").

The four bytes following the magic number hold the version of the class file: bytes 5-6 are the minor version and bytes 7-8 are the major version.

    J2SE 6.0 = 50 (0x32 hex)
    J2SE 5.0 = 49 (0x31 hex)
    JDK 1.4  = 48 (0x30 hex)
    JDK 1.2  = 46 (0x2E hex)
    JDK 1.1  = 45 (0x2D hex)

The version is stored in hexadecimal. In our example the major version is 0x34, which is 52 in decimal and corresponds to JDK 1.8. Newer JDK versions are backward compatible with older class files, but a JDK cannot run class files compiled for a version newer than itself.
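As a rough sketch of the magic number and version check, the snippet below parses the first eight bytes of a class file. The class and method names are hypothetical. The mapping from major version to release follows the table above, extended by the rule that major version 49 and later correspond to Java (major - 44).

```java
import java.nio.ByteBuffer;

class ClassHeaderSketch {
    static final int MAGIC = 0xCAFEBABE;

    /** Returns {minor, major}, or throws if the magic number does not match. */
    static int[] readVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF; // bytes 5-6
        int major = buf.getShort() & 0xFFFF; // bytes 7-8
        return new int[] {minor, major};
    }

    /** Maps a major version to a release name: 45..48 are JDK 1.1..1.4, 49 and up are Java 5 and up. */
    static String release(int major) {
        if (major < 45) {
            throw new IllegalArgumentException("unknown major version " + major);
        }
        return major <= 48 ? "JDK 1." + (major - 44) : "Java " + (major - 44);
    }

    public static void main(String[] args) {
        // The first eight bytes of the example class: CAFEBABE, minor 0, major 0x34.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        int[] version = readVersion(header);
        System.out.println("minor=" + version[0] + " major=" + version[1]
                + " (" + release(version[1]) + ")");
    }
}
```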

Constant pool

The constant pool can be thought of as the resource repository of the class file. It comes right after the minor and major version numbers. Because the number of constants is not fixed, the pool starts with a u2 value, the constant pool count. Counting starts from 1 rather than 0: in our bytecode the count is 0x002D (decimal 45), so there are 44 constants with index values 1 to 44, and index 0 is left unused. Index 0 is reserved so that data elsewhere that points into the constant pool can express "refers to no constant pool item" when it needs to.

There are two main kinds of constants in the constant pool: literals and symbolic references.

  1. Literals are close to the concept of constants at the Java language level, such as text strings and constant values declared final.
  2. Symbolic references contain three types of constants:
  • Fully qualified names of classes and interfaces

org.springframework….. Bean

  • The name and descriptor of the field

private/public/protected

  • The name and descriptor of the method

private/public/protected

For example, tag 10 marks a symbolic reference to a method of a class. In our bytecode screenshot the tag byte of the first constant is 0x0A, which corresponds to 10 in the table of constant types, so this constant is a symbolic reference to a method in a class.

Each constant type has its own structure. They mainly cover literals and the symbolic references to classes, fields, and class or interface methods, and each has its own job.

Structure of CONSTANT_Methodref_info (tag 10):

    Type    Name                   Quantity
    u1      tag                    1
    u2      class_index            1
    u2      name_and_type_index    1

Both indexes point into the constant pool. class_index points to the CONSTANT_Class_info of the class or interface that declares the method, and name_and_type_index points to the CONSTANT_NameAndType_info that holds the method's name and descriptor. In our bytecode the two u2 values are 0x0009 (decimal 9) and 0x001D (decimal 29), which matches the #9.#29 that javap prints for constant #1 below.
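The five bytes of that first constant can be decoded by hand. Here is a small sketch, assuming the field names from the JVM specification (class_index and name_and_type_index). The class and method names are invented for the example.

```java
import java.nio.ByteBuffer;

class MethodrefSketch {
    static final int CONSTANT_METHODREF = 10;

    /** Decodes one CONSTANT_Methodref_info entry and returns {class_index, name_and_type_index}. */
    static int[] decode(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);
        int tag = buf.get() & 0xFF;                     // u1 tag
        if (tag != CONSTANT_METHODREF) {
            throw new IllegalArgumentException("expected tag 10 but found " + tag);
        }
        int classIndex = buf.getShort() & 0xFFFF;       // u2 class_index
        int nameAndTypeIndex = buf.getShort() & 0xFFFF; // u2 name_and_type_index
        return new int[] {classIndex, nameAndTypeIndex};
    }

    public static void main(String[] args) {
        // Bytes of constant #1 in the example: 0A 0009 001D
        int[] ref = decode(new byte[] {0x0A, 0x00, 0x09, 0x00, 0x1D});
        System.out.println("Methodref #" + ref[0] + ".#" + ref[1]);
    }
}
```

The printed form mirrors how javap writes the same entry.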

The next tag byte is 0x09, which marks a field symbolic reference (Fieldref). Its structure is the same as that of CONSTANT_Methodref_info, and the contents and indexes of all 44 constants can be worked out in turn the same way.

Rather than decoding everything by hand, we can let javap do the work: javap -verbose TestJVM

    Classfile /Users/zengzhiqin/Desktop/daima/leetcode/out/production/leetcode/TestJVM.class
      Last modified 2020-9-20; size 731 bytes
      MD5 checksum 73a774d54f51805cb2319a2133c47c04
      Compiled from "TestJVM.java"
    public class TestJVM
      minor version: 0
      major version: 52
      flags: ACC_PUBLIC, ACC_SUPER
    Constant pool:
       #1 = Methodref          #9.#29         // java/lang/Object."<init>":()V
       #2 = Fieldref           #5.#30         // TestJVM.a:I
       #3 = Fieldref           #5.#31         // TestJVM.b:I
       #4 = Fieldref           #32.#33        // java/lang/System.out:Ljava/io/PrintStream;
       #5 = Class              #34            // TestJVM
       #6 = Methodref          #5.#29         // TestJVM."<init>":()V
       #7 = Methodref          #5.#35         // TestJVM.multi:()I
       #8 = Methodref          #36.#37        // java/io/PrintStream.println:(I)V
       #9 = Class              #38            // java/lang/Object
      #10 = Utf8               a
      #11 = Utf8               I
      #12 = Utf8               b
      #13 = Utf8               <init>
      #14 = Utf8               ()V
      #15 = Utf8               Code
      #16 = Utf8               LineNumberTable
      #17 = Utf8               LocalVariableTable
      #18 = Utf8               this
      #19 = Utf8               LTestJVM;
      #20 = Utf8               add
      #21 = Utf8               ()I
      #22 = Utf8               multi
      #23 = Utf8               main
      #24 = Utf8               ([Ljava/lang/String;)V
      #25 = Utf8               args
      #26 = Utf8               [Ljava/lang/String;
      #27 = Utf8               SourceFile
      #28 = Utf8               TestJVM.java
      #29 = NameAndType        #13:#14        // "<init>":()V
      #30 = NameAndType        #10:#11        // a:I
      #31 = NameAndType        #12:#11        // b:I
      #32 = Class              #39            // java/lang/System
      #33 = NameAndType        #40:#41        // out:Ljava/io/PrintStream;
      #34 = Utf8               TestJVM
      #35 = NameAndType        #22:#21        // multi:()I
      #36 = Class              #42            // java/io/PrintStream
      #37 = NameAndType        #43:#44        // println:(I)V
      #38 = Utf8               java/lang/Object
      #39 = Utf8               java/lang/System
      #40 = Utf8               out
      #41 = Utf8               Ljava/io/PrintStream;
      #42 = Utf8               java/io/PrintStream
      #43 = Utf8               println
      #44 = Utf8               (I)V

We can see a lot of entries such as I, V, <init>, and LineNumberTable that we don't recognize and that never appear in the source code. They are referenced by the field, method, and attribute tables. They describe things that cannot conveniently be expressed in a fixed number of bytes, such as what a method returns, how many parameters it has, and the type of each parameter. Such variable-length information has to be expressed through symbolic references into the constant pool.

When a method is added and referenced, four constants end up in the constant pool (the same goes for fields). In the listing above, multi(), which main calls, has all four, while add(), which is never called, only has its name and descriptor:

  1. The CONSTANT_Methodref_info symbolic reference to the method
  2. The CONSTANT_NameAndType_info that the method reference points to
  3. The method name
  4. The method descriptor

Access flags

The two bytes after the constant pool are the access flags (access_flags), which identify class-level or interface-level access information: whether this is a class or an interface, whether it is public, abstract, final, and so on. Each flag occupies one bit.

TestJVM is an ordinary class declared public, so only ACC_PUBLIC (0x0001) and ACC_SUPER (0x0020) are set and every other flag is false. Its access_flags should therefore be 0x0001 | 0x0020 = 0x0021, and that is indeed the value in the bytecode.
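The bit arithmetic can be checked with a few lines of Java. This is only a sketch: the class name is made up, and it lists a subset of the class-level flags from the JVM specification, not all of them.

```java
import java.util.ArrayList;
import java.util.List;

class AccessFlagsSketch {
    // A subset of the class-level access flags defined by the JVM specification.
    static final String[] NAMES = {"ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE", "ACC_ABSTRACT"};
    static final int[] MASKS = {0x0001, 0x0010, 0x0020, 0x0200, 0x0400};

    /** Returns the names of all flags whose bit is set in accessFlags. */
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        int flags = 0x0001 | 0x0020;
        System.out.println(String.format("0x%04X", flags) + " -> " + decode(flags));
    }
}
```

For 0x0021 the decoded names are ACC_PUBLIC and ACC_SUPER, the same two flags javap reports for TestJVM.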

Class index, parent class index, and interface index collection

After the access flags come the class index (this_class), the parent class index (super_class), and the interface index collection (interfaces), in that order. The class file uses these three items to determine the inheritance relationship of the class.

The class index and the parent class index are each a u2 index value pointing to a class descriptor constant of type CONSTANT_Class_info. The index stored in that CONSTANT_Class_info leads in turn to a CONSTANT_Utf8_info constant holding the fully qualified name string, and that is how the class is found.

  1. Both the class index and the parent class index are u2 data. In the javap output above, the Class constant for this class is #5 (TestJVM) and the one for its parent is #9 (java/lang/Object).
  2. The first item of the interface index collection is a u2 interface counter (interfaces_count), which gives the capacity of the index table (that is, how many interfaces are implemented). If the class implements no interface, the counter is 0 and the interface index table that follows takes up no bytes. Here the counter is 0x0000, because the class implements no interface.
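The two-step lookup (index to CONSTANT_Class_info, then to CONSTANT_Utf8_info) can be sketched with two small maps. The maps below are a hand-copied excerpt of the constant pool printed by javap earlier. The class and method names are hypothetical, and a real parser would of course build these tables from the bytes.

```java
import java.util.HashMap;
import java.util.Map;

class ClassIndexSketch {
    // Excerpt of the javap listing: Class entries map to the index of their Utf8 name.
    static final Map<Integer, Integer> CLASS_INFO = new HashMap<>();
    static final Map<Integer, String> UTF8 = new HashMap<>();
    static {
        CLASS_INFO.put(5, 34);
        CLASS_INFO.put(9, 38);
        UTF8.put(34, "TestJVM");
        UTF8.put(38, "java/lang/Object");
    }

    /** Follows class index -> CONSTANT_Class_info -> CONSTANT_Utf8_info. */
    static String resolveClassName(int classIndex) {
        Integer nameIndex = CLASS_INFO.get(classIndex);
        if (nameIndex == null) {
            throw new IllegalArgumentException("#" + classIndex + " is not a Class constant in this excerpt");
        }
        String name = UTF8.get(nameIndex);
        if (name == null) {
            throw new IllegalArgumentException("#" + nameIndex + " is not a Utf8 constant in this excerpt");
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println("this_class  #5 -> " + resolveClassName(5));
        System.out.println("super_class #9 -> " + resolveClassName(9));
    }
}
```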

Field table collection

I'm tired of writing, so here is a song recommendation for readers: "No Man" by Little A Qi. It sounds very good.

Following the interface index collection is the field counter, which says how many fields there are, followed by the field table collection. A field table (field_info) describes a variable declared in an interface or class.

Fields include class-level variables as well as instance-level variables. The information a field can carry is:

  1. Field scope (public, private, protected modifiers)
  2. Instance variable or class variable (static modifier)
  3. Mutability (final)
  4. Concurrency visibility (volatile)
  5. Whether it can be serialized (transient)
  6. Field data type (primitive, object, array)
  7. Field name

Each modifier is a Boolean, either present or absent, so it fits naturally in a flag bit. The name and the type of a field, however, are not fixed in length, so they can only be expressed as references to constants in the constant pool. From this information the field table structure is abstracted as access_flags, name_index, descriptor_index, attributes_count, and attributes. name_index and descriptor_index are both references into the constant pool, representing the simple name of the field and its descriptor respectively. The terms fully qualified name, simple name, and descriptor mean the following:

  1. Fully qualified name

ai/yunxi/vm/TestClasss is the fully qualified name of a class: simply take the full class name and replace each "." with "/".

  2. Simple name

The name of a method or field without any type or parameter decoration. For example, the simple names of add() and int m are add and m.

  3. Descriptor

Describes the data type of a field, or the parameter list (number, types, and order) and return value of a method.
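If you want to see descriptors without reading bytes, the standard java.lang.invoke.MethodType class can print them. The wrapper class and method names below are just for this sketch.

```java
import java.lang.invoke.MethodType;

class DescriptorSketch {
    /** Builds the method descriptor for the given return type and parameter types. */
    static String describe(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    /** Turns a class into its fully qualified name in class file form ('.' replaced by '/'). */
    static String internalName(Class<?> type) {
        return type.getName().replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(describe(int.class));                  // like add() and multi()
        System.out.println(describe(void.class, int.class));      // like println(int)
        System.out.println(describe(void.class, String[].class)); // like main(String[])
        System.out.println(internalName(String.class));
    }
}
```

The three descriptors printed are ()I, (I)V, and ([Ljava/lang/String;)V, the same strings that appear as Utf8 constants in the javap listing.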

Field access flags

In the bytecode, 0x0002 indicates that the field is private. The field table bytes read as follows:

    u2  0x0002  fields_count      two fields
    u2  0x0002  access_flags      private is true
    u2  0x000A  name_index        points to constant #10, the string a
    u2  0x000B  descriptor_index  points to constant #11, the string I; this descriptor character means the basic type int
    u2  0x0000  attributes_count  the attribute table collection is empty; 0 means there is no extra description
                attribute_info    no content, takes no bytes

The descriptor character above is I, which the descriptor table maps to the basic type int.

Putting it together, the field is private int a; which is consistent with the source code.
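The field entry can also be decoded in code. In this sketch the eight bytes are an assumption reconstructed from the javap listing (#10 is the string a, #11 is the string I), and the class name is invented. Modifier.toString is the standard reflection helper, and its constants happen to use the same bit values as the field access flags in the class file.

```java
import java.lang.reflect.Modifier;
import java.nio.ByteBuffer;

class FieldInfoSketch {
    /** Decodes a field_info entry that has no attributes and returns a readable summary. */
    static String describe(byte[] fieldInfo) {
        ByteBuffer buf = ByteBuffer.wrap(fieldInfo);
        int accessFlags = buf.getShort() & 0xFFFF;      // u2 access_flags
        int nameIndex = buf.getShort() & 0xFFFF;        // u2 name_index
        int descriptorIndex = buf.getShort() & 0xFFFF;  // u2 descriptor_index
        int attributesCount = buf.getShort() & 0xFFFF;  // u2 attributes_count
        return Modifier.toString(accessFlags)
                + " name=#" + nameIndex
                + " descriptor=#" + descriptorIndex
                + " attributes=" + attributesCount;
    }

    public static void main(String[] args) {
        // Assumed bytes for "private int a": 0002 000A 000B 0000
        byte[] fieldA = {0x00, 0x02, 0x00, 0x0A, 0x00, 0x0B, 0x00, 0x00};
        System.out.println(describe(fieldA));
        // Flags combine bitwise: public (0x0001) | static (0x0008) | final (0x0010)
        System.out.println(Modifier.toString(0x0001 | 0x0008 | 0x0010));
    }
}
```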

Method table collection

Once the field table is understood, the method table is easy: its structure is almost identical, and the access flags, name index, and descriptor index are enough to express a method's definition clearly. The differences lie in the flags, since some modifiers can apply to methods but not to fields (synchronized, native, abstract), and some can apply to fields but not to methods (volatile, transient).

To overload a method, the new method must:

  1. Have the same simple name as the original method
  2. Have a different signature from the original method. The signature is the collection of symbolic references in the constant pool for each of the method's parameters. Because the return value is not part of the signature, a different return value alone cannot be used to overload a method.
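A tiny illustration of the overload rule, with a made-up class name. The commented-out method shows the case the rule forbids.

```java
class OverloadSketch {
    static int add(int x, int y) {
        return x + y;
    }

    // A valid overload: same simple name, different parameter types.
    static double add(double x, double y) {
        return x + y;
    }

    // Not a valid overload: same name and parameter types, only the return type differs.
    // Uncommenting this makes javac reject the class.
    // static long add(int x, int y) { return x + y; }

    public static void main(String[] args) {
        System.out.println(add(1, 2));     // picks add(int, int)
        System.out.println(add(1.5, 2.0)); // picks add(double, double)
    }
}
```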

The first method (the constructor) appears in the bytecode as follows:

    u2  0x0004  methods_count         four methods: the constructor, add(), multi(), and main()
    u2  0x000E  descriptor_index      points to constant #14, ()V: () means no parameters and V means a void return
    u2  0x0001  attributes_count      one attribute, used to store additional information
    u2  0x000F  attribute_name_index  points to constant #15, which javap shows as the string "Code", meaning this attribute holds the method's bytecode

Attribute table collection

So far we have only talked about fields and method headers, all of which can be expressed with access flags, name indexes, descriptors, and similar metadata. So where is the body of the method? This is where the attribute table comes on stage!

Sharp-eyed readers may have noticed that attribute tables already showed up in both the field table and the method table. They describe information specific to certain scenarios. Unlike the other data items in the class file, which have strict requirements on order, length, and content, the attribute table is more relaxed: attributes do not have to appear in a strict order, and any compiler may write its own attributes into the attribute table as long as their names do not clash with existing attribute names. The JVM ignores any attribute it does not recognize at run time.

The code in a Java method body is compiled by javac into bytecode instructions, which are stored in the Code attribute. The Code attribute appears in the attribute collection of the method table. Not every method table has a Code attribute, though: abstract methods in abstract classes or interfaces have none.

The structure of the Code attribute table is as follows:

  1. attribute_name_index points to a CONSTANT_Utf8_info constant whose value is fixed as "Code"
  2. attribute_length is the total length of the attribute value
  3. max_stack is the maximum depth of the operand stack
  4. max_locals is the storage space required by the local variable table, measured in slots
  5. code_length and code store the bytecode instructions produced by compiling the Java source. code_length is the length of the bytecode, and code is the byte stream holding the instructions. Each instruction opcode is a single byte of type u1, and reading it tells the JVM what the instruction means and whether operands follow. A u1 ranges from 0x00 to 0xFF, that is 0 to 255, so at most 256 instructions can be expressed. The JVM specification currently defines about 200 of them.
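To show what "one byte per opcode, sometimes followed by operands" means in practice, here is a toy disassembler. It only knows the five opcodes that add() and multi() use, and the ten input bytes are my own encoding of the add() instructions that javap prints in this article, so treat both as assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class MiniDisassemblerSketch {
    /** Decodes a code array containing only aload_0, getfield, iadd, imul, and ireturn. */
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF; // every opcode is a single u1
            switch (opcode) {
                case 0x2A: lines.add(pc + ": aload_0"); pc += 1; break;
                case 0x60: lines.add(pc + ": iadd");    pc += 1; break;
                case 0x68: lines.add(pc + ": imul");    pc += 1; break;
                case 0xAC: lines.add(pc + ": ireturn"); pc += 1; break;
                case 0xB4: {
                    // getfield is followed by a u2 constant pool index
                    int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                    lines.add(pc + ": getfield #" + index);
                    pc += 3;
                    break;
                }
                default:
                    throw new IllegalArgumentException("opcode not covered by this sketch: " + opcode);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // add(): aload_0, getfield #2, aload_0, getfield #3, iadd, ireturn
        byte[] addCode = {0x2A, (byte) 0xB4, 0x00, 0x02, 0x2A, (byte) 0xB4, 0x00, 0x03, 0x60, (byte) 0xAC};
        for (String line : disassemble(addCode)) {
            System.out.println(line);
        }
    }
}
```

The offsets it prints (0, 1, 4, 5, 8, 9) line up with the javap output for add(), because getfield takes three bytes and the other instructions take one.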

There are many attributes. The Java Virtual Machine Specification predefines 21 of them, and we run into them in day-to-day work.

Again, javap -verbose TestJVM prints the rest of the class, where you can see each method's description and its instructions:

    {
      public int b;
        descriptor: I
        flags: ACC_PUBLIC

      public TestJVM();
        descriptor: ()V
        flags: ACC_PUBLIC
        Code:
          stack=2, locals=1, args_size=1
             0: aload_0
             1: invokespecial #1                  // Method java/lang/Object."<init>":()V
             4: aload_0
             5: iconst_3
             6: putfield      #2                  // Field a:I
             9: aload_0
            10: iconst_4
            11: putfield      #3                  // Field b:I
            14: return
          LineNumberTable:
            line 5: 0
            line 6: 4
            line 7: 9
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0      15     0  this   LTestJVM;

      public int add();
        descriptor: ()I
        flags: ACC_PUBLIC
        Code:
          stack=2, locals=1, args_size=1
             0: aload_0
             1: getfield      #2                  // Field a:I
             4: aload_0
             5: getfield      #3                  // Field b:I
             8: iadd
             9: ireturn
          LineNumberTable:
            line 10: 0
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0      10     0  this   LTestJVM;

      public int multi();
        descriptor: ()I
        flags: ACC_PUBLIC
        Code:
          stack=2, locals=1, args_size=1
             0: aload_0
             1: getfield      #2                  // Field a:I
             4: aload_0
             5: getfield      #3                  // Field b:I
             8: imul
             9: ireturn
          LineNumberTable:
            line 14: 0
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0      10     0  this   LTestJVM;

      public static void main(java.lang.String[]);
        descriptor: ([Ljava/lang/String;)V
        flags: ACC_PUBLIC, ACC_STATIC
        Code:
          stack=3, locals=1, args_size=1
             0: getstatic     #4                  // Field java/lang/System.out:Ljava/io/PrintStream;
             3: new           #5                  // class TestJVM
             6: dup
             7: invokespecial #6                  // Method "<init>":()V
            10: invokevirtual #7                  // Method multi:()I
            13: invokevirtual #8                  // Method java/io/PrintStream.println:(I)V
            16: return
          LineNumberTable:
            line 18: 0
            line 20: 16
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0      17     0  args   [Ljava/lang/String;
    }
    SourceFile: "TestJVM.java"

args_size is 1, even though neither the instance constructor nor add() and multi() declare any parameters. The reason is that every instance method can refer to the object it belongs to through the this keyword. javac implements this by turning access to this into access to an ordinary method parameter at compile time, and the virtual machine passes that parameter in automatically when it invokes an instance method. The local variable table of an instance method therefore always holds at least one local variable pointing to the current object instance: the first slot (slot 0) is reserved for that reference, and the declared method parameters are numbered from 1.
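The counting rule for the methods in this example can be sketched as "declared parameters, plus one for this unless the method is static". The class and method names below are hypothetical, and the sketch only covers the simple parameter types that appear in this article. MethodType.fromMethodDescriptorString is the standard JDK parser for descriptors.

```java
import java.lang.invoke.MethodType;

class ArgsSizeSketch {
    /** Counts declared parameters and adds the implicit "this" for instance methods. */
    static int argsSize(String descriptor, boolean isStatic) {
        MethodType type = MethodType.fromMethodDescriptorString(
                descriptor, ArgsSizeSketch.class.getClassLoader());
        int implicitThis = isStatic ? 0 : 1; // instance methods receive "this" in slot 0
        return type.parameterCount() + implicitThis;
    }

    public static void main(String[] args) {
        System.out.println("add()  -> " + argsSize("()I", false));
        System.out.println("main() -> " + argsSize("([Ljava/lang/String;)V", true));
    }
}
```

Both calls give 1: add() has no declared parameters but receives this, while the static main has no this but takes the args array.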

Bytecode analysis, starting at the location of the method attribute table above: attribute_name_index is a constant index pointing to a CONSTANT_Utf8_info constant whose value is fixed as "Code". It gives the name of the attribute.

The exception table of the attribute table

Take a look at a simple piece of code that uses exception syntax:

    /**
     * @author by zengzhiqin
     * 2020-09-13
     */
    public class TestException {
        public int inc() {
            int x;
            try {
                x = 1;
                return x;
            } catch (Exception e) {
                x = 2;
                return x;
            } finally {
                x = 3;
            }
        }
    }

(In the bytecode of inc() shown further down, the range 0 to 4 assigns the integer 1 to the variable x and copies the value of x into the last slot of the local variable table. The value in that slot is reloaded onto the top of the operand stack just before ireturn executes and becomes the method's return value. We will call this slot returnValue.)

    0: iconst_0   // push the int constant 0 onto the operand stack
    1: istore_2   // pop the top of the operand stack and store it in local variable slot 2
    2: iload_0    // push the int in local variable slot 0 onto the operand stack
    3: iload_1    // push the int in local variable slot 1 onto the operand stack
    4: iadd       // pop the top two ints, add them, and push the result onto the operand stack
    5: istore_2   // pop the top of the operand stack and store it in local variable slot 2
    6: iload_2    // push the int in local variable slot 2 onto the operand stack
    7: ireturn    // return the int on top of the operand stack
    8: aload      // push an object reference from the given local variable slot onto the operand stack

Above are explanations of some of the instructions we will need.

    javap -c TestException
    Compiled from "TestException.java"
    public class TestException {
      public TestException();
        Code:
           0: aload_0
           1: invokespecial #1   // Method java/lang/Object."<init>":()V
           4: return

      public int inc();
        Code:
           0: iconst_1    // try block: x = 1, push 1 onto the operand stack
           1: istore_1    // pop 1 and store it in slot 1, so x = 1
           2: iload_1     // load the value in slot 1 onto the top of the operand stack
           3: istore_2    // pop 1 and store it in slot 2
           4: iconst_3    // finally block: x = 3, push 3 onto the operand stack
           5: istore_1    // pop 3 and store it in slot 1
           6: iload_2     // load the value 1 from slot 2 onto the top of the stack, ready for ireturn
           7: ireturn     // normal path returns 1, as expected
           8: astore_2    // store the caught exception (Exception e in catch) in slot 2
           9: iconst_2    // catch block: x = 2, push 2 onto the operand stack
          10: istore_1    // pop 2 and store it in slot 1
          11: iload_1     // load 2 from slot 1 onto the top of the stack
          12: istore_3    // pop 2 and store it in slot 3
          13: iconst_3    // finally block: x = 3, push 3 onto the operand stack
          14: istore_1    // pop 3 and store it in slot 1
          15: iload_3     // load the value 2 from slot 3 onto the top of the stack, ready for ireturn
          16: ireturn
          17: astore 4    // an exception that is not java.lang.Exception or one of its subclasses lands here
          19: iconst_3    // finally block: x = 3, push 3 onto the operand stack
          20: istore_1    // pop 3 and store it in slot 1
          21: aload 4     // put the exception reference on top of the stack
          23: athrow      // rethrow the exception
        Exception table:
           from    to  target type
               0     4     8   Class java/lang/Exception
               0     4    17   any
               8    13    17   any
              17    19    17   any
    }

As you may have guessed, the Java virtual machine executes bytecode on a stack-based architecture, as I explained in my last post.

When exceptions are involved, the finally block is copied onto every normal and exceptional path. In this bytecode, iconst_3 corresponds to the finally block, and there are three copies of it. So even when the try or catch block contains a return statement, the finally block still runs. Here the method returns 1 on the normal path, because the return value was saved before the finally block ran. If the finally block itself returned x, the method would return 3.
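Both behaviors are easy to confirm by running them. The sketch below uses static methods and a made-up class name for brevity. The first method has the same body as inc() from the example, and the second adds a return to the finally block.

```java
class FinallyReturnSketch {
    /** Same logic as inc() in the example: finally changes x, but the saved value is returned. */
    static int inc() {
        int x;
        try {
            x = 1;
            return x;
        } catch (Exception e) {
            x = 2;
            return x;
        } finally {
            x = 3;
        }
    }

    /** Variant where finally itself returns, which overrides the value saved by try. */
    @SuppressWarnings("finally")
    static int incReturningInFinally() {
        int x;
        try {
            x = 1;
            return x;
        } catch (Exception e) {
            x = 2;
            return x;
        } finally {
            x = 3;
            return x;
        }
    }

    public static void main(String[] args) {
        System.out.println(inc());                   // 1
        System.out.println(incReturningInFinally()); // 3
    }
}
```

Returning from finally is generally discouraged because it also swallows any pending exception. It appears here only to show the difference.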

From the exception table we can summarize how it works (from and to are bytecode offsets, and to is exclusive):

  • The range 0-4 assigns the integer 1 to the variable x and saves it as the return value
  • If no exception occurs there, execution continues with offsets 4-7, which run the finally code and return
  • If an Exception (or one of its subclasses) occurs in the range 0-4, the PC register jumps to offset 8
  • If any other exception occurs in the range 0-4, jump to offset 17
  • If any exception occurs in the range 8-13, jump to offset 17
  • If any exception occurs in the range 17-19, jump to offset 17
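The lookup the JVM performs against this table can be imitated in a few lines. This is only a model, with invented class and method names: entries are checked in order, the to offset is treated as exclusive, and a null type stands for "any".

```java
class ExceptionTableSketch {
    static final class Entry {
        final int from, to, target;
        final Class<? extends Throwable> type; // null means "any"

        Entry(int from, int to, int target, Class<? extends Throwable> type) {
            this.from = from;
            this.to = to;
            this.target = target;
            this.type = type;
        }
    }

    // The exception table of inc(), copied from the javap output.
    static final Entry[] TABLE = {
        new Entry(0, 4, 8, Exception.class),
        new Entry(0, 4, 17, null),
        new Entry(8, 13, 17, null),
        new Entry(17, 19, 17, null),
    };

    /** Returns the handler offset for an exception thrown at pc, or -1 if no entry matches. */
    static int findHandler(int pc, Throwable thrown) {
        for (Entry e : TABLE) {
            boolean inRange = pc >= e.from && pc < e.to; // "to" is exclusive
            boolean typeMatches = e.type == null || e.type.isInstance(thrown);
            if (inRange && typeMatches) {
                return e.target; // the first matching entry wins
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findHandler(2, new RuntimeException()));  // 8: caught by catch (Exception e)
        System.out.println(findHandler(2, new Error()));             // 17: not an Exception, handled by "any"
        System.out.println(findHandler(10, new RuntimeException())); // 17: thrown inside the catch block
        System.out.println(findHandler(5, new RuntimeException()));  // -1: no entry covers offset 5
    }
}
```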

As you can see, the exception table is really part of the compiled Java code: the compiler uses exception tables rather than simple jump instructions to implement Java's exception and finally handling.

Exceptions are the attribute we use most often day to day. If you are interested, you can look into the other attributes yourself. Writing this much has really worn me out, and there's an 80% chance you jumped straight down here to read this sentence. If you're passing by, please leave a like. You can find me as "code path of forrest gump". If you're interested, come have a look~~~