Contents

• Write it first

• Unsigned numbers and tables

• Versions of magic numbers and Class files

• constant pool

• Access flags

• A collection of class indexes, parent indexes, and interface indexes

• Collection of field tables

• Collection of method tables


• Write it first

When Java is mentioned, the first thing that comes to mind is probably the slogan “write once, run anywhere”, which sums up Java's platform independence. The foundation of that property is bytecode: Java source is compiled into bytecode files, and any virtual machine can load and execute the same platform-independent bytecode. Look at it from another angle, though. The virtual machine loads bytecode files and never executes Java source directly, so it does not care which language a bytecode file was compiled from; give it a valid bytecode file and it will run it. This is the language independence of the Java virtual machine. Many languages run on the Java virtual machine, such as Clojure, JRuby, Jython, and Scala, all of which are compiled into bytecode files and executed there. The Java virtual machine is not bound to any language, Java included. It is tied only to a specific binary file format, the “Class file”, which contains the Java virtual machine instruction set, a symbol table, and some other auxiliary information. Because a class file can be produced from any language, the format imposes many mandatory syntactic and structural constraints for the sake of security. Each Class file holds the definition of exactly one class or interface, but conversely a class or interface does not have to be defined in a file; for example, it can be generated directly by a class loader (see my other article on class loaders).

A Class file is a binary stream organized in 8-bit bytes. The data items are laid out in strict order and packed tightly, with no delimiters between them, so almost everything stored in a Class file is data the program needs in order to run. A data item that needs more than 8 bits is split into several bytes and stored with the high-order byte first (big-endian).
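To make the high-order-byte-first rule concrete, here is a minimal sketch of how multi-byte values could be reassembled from raw bytes. The class name `BigEndianDemo` and the helper names `readU2` and `readU4` are my own for illustration; the sample bytes are the class file magic number followed by an arbitrary two-byte value.

```java
class BigEndianDemo {
    // Combine two bytes into an unsigned 16-bit value, high-order byte first.
    static int readU2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    // Combine four bytes into an unsigned 32-bit value.
    // The result is returned as a long so that it never becomes negative.
    static long readU4(byte[] data, int offset) {
        return ((long) readU2(data, offset) << 16) | readU2(data, offset + 2);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x36};
        System.out.println(Long.toHexString(readU4(bytes, 0))); // cafebabe
        System.out.println(readU2(bytes, 4));                   // 54
    }
}
```

The `& 0xFF` masks matter because Java's `byte` is signed; without them a byte such as 0xCA would be sign-extended and corrupt the result.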

• Unsigned numbers and tables

According to the Java Virtual Machine specification, the Class file format stores data in a pseudo-structure similar to a C struct, and that pseudo-structure has only two data types: unsigned numbers and tables.

Unsigned numbers are the basic data type. u1, u2, u4, and u8 denote unsigned numbers of 1, 2, 4, and 8 bytes respectively. They are used to describe numbers, index references, quantity values, or string values encoded in UTF-8.

A table is a composite data type made up of multiple unsigned numbers or other tables. By convention, table names end with “_info”. Tables describe hierarchically structured data.

For both unsigned numbers and tables, when a variable number of items of the same type has to be described, the usual form is a leading counter followed by that many consecutive items. Such a run of consecutive items of one type is called a collection of that type.
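The counter-plus-items layout can be sketched as follows. This is only an illustration under my own assumptions: the class and method names are made up, and the items are taken to be plain u2 values (as in the interface index collection described later). `java.nio.ByteBuffer` reads big-endian by default, which matches the class file byte order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class CountedCollectionDemo {
    // Reads a u2 counter, then exactly that many u2 items.
    static int[] readU2Collection(ByteBuffer in) {
        int count = in.getShort() & 0xFFFF;      // leading counter
        int[] items = new int[count];
        for (int i = 0; i < count; i++) {
            items[i] = in.getShort() & 0xFFFF;   // consecutive items of one type
        }
        return items;
    }

    public static void main(String[] args) {
        // Counter 0x0002 followed by the two items 0x0005 and 0x0009.
        byte[] raw = {0x00, 0x02, 0x00, 0x05, 0x00, 0x09};
        System.out.println(Arrays.toString(readU2Collection(ByteBuffer.wrap(raw)))); // [5, 9]
    }
}
```

Because there are no delimiters, the counter is the only thing that tells the reader where the collection ends.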

Before going through the individual data items, it is worth repeating: because a class file has no delimiters, the order and number of the items, and even the byte order in which data is stored, are all strictly fixed. Which bytes mean what, how long each item is, and what order they come in may not change. For the analysis that follows I opened a class file in WinHex. It is a class file I picked at random from an earlier project; you can open any class file of your own to check and verify. I post an overview first and then partial screenshots for the analysis of each data item.

• Versions of magic numbers and Class files

The first four bytes of every Class file are the magic number. Its only purpose is to determine whether the file is a class file the virtual machine can accept. A magic number identifies a file format, and many file formats use one, including image formats such as GIF and JPEG. Magic numbers are used instead of file extensions mainly for security reasons, since an extension can be changed at will. Whoever designs a file format is free to choose its magic number, as long as it cannot be confused with that of an existing format. The magic number of a class file is 0xCAFEBABE, which is exactly what the first four bytes of my file show in the hex editor.

The four bytes after the magic number store the version of the class file: the fifth and sixth bytes are the minor version, and the seventh and eighth bytes are the major version. Java major version numbers start at 45, and each major JDK release after that raises the major version number by 1 (JDK 1.0 through 1.1 used 45.0 to 45.3). A newer JDK stays backward compatible with older class files, but a JDK cannot run class files of a later version: even if the file format has not changed, the virtual machine must refuse to execute a class file whose version number is higher than the one it supports. My file was compiled with JDK 1.8, so its major version is 52 (0x0034).
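The first eight bytes can be checked with a few lines of code. The sketch below is not how any particular virtual machine does it; `ClassHeaderDemo`, `readVersion`, and `isLoadable` are names I made up, and the compatibility check deliberately ignores the minor version to keep things short.

```java
import java.nio.ByteBuffer;

class ClassHeaderDemo {
    static final int MAGIC = 0xCAFEBABE;

    // Returns {minor, major}; rejects input that does not start with the magic number.
    static int[] readVersion(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header);   // big-endian by default
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;        // bytes 5 and 6
        int major = buf.getShort() & 0xFFFF;        // bytes 7 and 8
        return new int[] {minor, major};
    }

    // Simplified rule: a VM accepts a file only if the file's major version
    // is not higher than the highest major version the VM supports.
    static boolean isLoadable(int fileMajor, int maxSupportedMajor) {
        return fileMajor <= maxSupportedMajor;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        int[] version = readVersion(header);
        System.out.println("minor=" + version[0] + " major=" + version[1]); // minor=0 major=52
        System.out.println(isLoadable(version[1], 52));                      // true
    }
}
```

The sample header matches the values javap reports for my file (minor version 0, major version 52).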

• constant pool

Immediately after the minor and major version numbers comes the constant pool. The constant pool can be thought of as the resource repository of the class file. It is the data item that the rest of the class file structure refers to most, it is usually one of the largest consumers of space in the file, and it is also the first table-type data item to appear. Since the number of constants is not fixed, a u2 item at the start of the constant pool holds the constant pool count. Contrary to the usual Java convention, this count starts at 1 rather than 0. The class file I opened has a constant pool count of 0x0036, which is 54 in decimal, so the pool contains 53 constants with indexes from 1 to 53. Index 0 is left free so that an item which needs to express “refers to no constant pool entry” can use the value 0.
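The off-by-one is easy to get wrong, so here is the arithmetic as a tiny sketch. The class and method names are hypothetical. One caveat I should state: the simple "count minus one" figure holds for my file because it contains no long or double constants, which occupy two index slots each.

```java
class ConstantPoolCountDemo {
    // The stored count is one larger than the highest valid index.
    static int highestIndex(int constantPoolCount) {
        return constantPoolCount - 1;
    }

    // Valid indexes run from 1 up to count - 1; 0 means "no constant pool entry".
    static boolean isValidIndex(int index, int constantPoolCount) {
        return index >= 1 && index < constantPoolCount;
    }

    public static void main(String[] args) {
        int count = 0x0036;                           // value read from my class file
        System.out.println(highestIndex(count));      // 53
        System.out.println(isValidIndex(0, count));   // false
        System.out.println(isValidIndex(53, count));  // true
    }
}
```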

The constant pool holds two main kinds of constants: literals and symbolic references. Literals are things like text strings and constant values declared final. Symbolic references are a concept from compiler theory and cover the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods. Java code is not linked at javac compile time; linking is done dynamically when the virtual machine loads the class file. The class file therefore stores no information about the final memory layout of methods and fields, and a symbolic reference to a field or method cannot be used as a real memory address until it has been converted at run time. When the virtual machine runs, it obtains the symbolic references from the constant pool and resolves them into concrete memory addresses when the class is created or at run time. Each constant in the constant pool is itself a table, and each of these tables begins with a u1 tag that identifies the type of the constant.

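To find out what kind of constant an entry is, look up its tag in the table of constant types. In my example the first tag is 0x0A, which is 10 in decimal, and tag 10 is CONSTANT_Methodref_info. Knowing the type, we can interpret the next four bytes as two u2 values: the first is the index of the CONSTANT_Class_info entry for the class that declares the method, and the second is the index of a CONSTANT_NameAndType_info entry holding the method's name and descriptor.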
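Decoding such an entry by hand looks roughly like this. It is a sketch under assumptions: `MethodrefDemo` and `describe` are names of my own, and the five sample bytes are reconstructed from the javap listing for constant #1 (Methodref #10.#32) rather than copied from the hex editor.

```java
import java.nio.ByteBuffer;

class MethodrefDemo {
    static final int CONSTANT_METHODREF = 10;   // tag value 0x0A

    // Decodes one CONSTANT_Methodref_info entry: u1 tag, u2 class index, u2 name-and-type index.
    static String describe(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);
        int tag = buf.get() & 0xFF;
        if (tag != CONSTANT_METHODREF) {
            throw new IllegalArgumentException("unexpected tag " + tag);
        }
        int classIndex = buf.getShort() & 0xFFFF;
        int nameAndTypeIndex = buf.getShort() & 0xFFFF;
        return "Methodref #" + classIndex + ".#" + nameAndTypeIndex;
    }

    public static void main(String[] args) {
        byte[] entry = {0x0A, 0x00, 0x0A, 0x00, 0x20};
        System.out.println(describe(entry)); // Methodref #10.#32
    }
}
```

Following index 10 and index 32 through the constant pool then leads to java/lang/Object and "<init>":()V, which is what javap prints in its comment column.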

The remaining constants are worked out the same way. If you would rather not do this by hand, the javap tool (see my other article on the JDK command line tools) prints the constant pool when run with the -verbose option. The output is too long for a screenshot, so I paste it here directly; just skim it.

C:\Program Files\Java\jdk1.8.0_191\bin>javap -verbose Main
Classfile /C:/Program Files/Java/jdk1.8.0_191/bin/Main.class
  Last modified 2019-12-19; size 965 bytes
  MD5 checksum f0e541356c7d5365134c19dd1c16e9ac
  Compiled from "Main.java"
public class Main
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #10.#32        // java/lang/Object."<init>":()V
   #2 = Class              #33            // com/dbc/leecode/Algorithm/Reclass/ListNode
   #3 = Methodref          #2.#34         // com/dbc/leecode/Algorithm/Reclass/ListNode."<init>":(I)V
   #4 = Fieldref           #2.#35         // com/dbc/leecode/Algorithm/Reclass/ListNode.next:Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
   #5 = Fieldref           #36.#37        // java/lang/System.out:Ljava/io/PrintStream;
   #6 = Integer            -2147483648
   #7 = Methodref          #38.#39        // com/dbc/leecode/Algorithm/Solution21_30/Solution30.divide:(II)I
   #8 = Methodref          #40.#41        // java/io/PrintStream.println:(I)V
   #9 = Class              #42            // Main
  #10 = Class              #43            // java/lang/Object
  #11 = Utf8               <init>
  #12 = Utf8               ()V
  #13 = Utf8               Code
  #14 = Utf8               LineNumberTable
  #15 = Utf8               LocalVariableTable
  #16 = Utf8               this
  #17 = Utf8               LMain;
  #18 = Utf8               main
  #19 = Utf8               ([Ljava/lang/String;)V
  #20 = Utf8               args
  #21 = Utf8               [Ljava/lang/String;
  #22 = Utf8               listNode1
  #23 = Utf8               Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
  #24 = Utf8               listNode2
  #25 = Utf8               listNode3
  #26 = Utf8               listNode4
  #27 = Utf8               listNode5
  #28 = Utf8               s
  #29 = Utf8               [I
  #30 = Utf8               SourceFile
  #31 = Utf8               Main.java
  #32 = NameAndType        #11:#12        // "<init>":()V
  #33 = Utf8               com/dbc/leecode/Algorithm/Reclass/ListNode
  #34 = NameAndType        #11:#44        // "<init>":(I)V
  #35 = NameAndType        #45:#23        // next:Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
  #36 = Class              #46            // java/lang/System
  #37 = NameAndType        #47:#48        // out:Ljava/io/PrintStream;
  #38 = Class              #49            // com/dbc/leecode/Algorithm/Solution21_30/Solution30
  #39 = NameAndType        #50:#51        // divide:(II)I
  #40 = Class              #52            // java/io/PrintStream
  #41 = NameAndType        #53:#44        // println:(I)V
  #42 = Utf8               Main
  #43 = Utf8               java/lang/Object
  #44 = Utf8               (I)V
  #45 = Utf8               next
  #46 = Utf8               java/lang/System
  #47 = Utf8               out
  #48 = Utf8               Ljava/io/PrintStream;
  #49 = Utf8               com/dbc/leecode/Algorithm/Solution21_30/Solution30
  #50 = Utf8               divide
  #51 = Utf8               (II)I
  #52 = Utf8               java/io/PrintStream
  #53 = Utf8               println
{
  public Main();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 9: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   LMain;

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=4, locals=7, args_size=1
         0: new           #2                  // class com/dbc/leecode/Algorithm/Reclass/ListNode
         3: dup
         4: iconst_1
         5: invokespecial #3                  // Method com/dbc/leecode/Algorithm/Reclass/ListNode."<init>":(I)V
         8: astore_1
         9: new           #2                  // class com/dbc/leecode/Algorithm/Reclass/ListNode
        12: dup
        13: iconst_2
        14: invokespecial #3                  // Method com/dbc/leecode/Algorithm/Reclass/ListNode."<init>":(I)V
        17: astore_2
        18: new           #2                  // class com/dbc/leecode/Algorithm/Reclass/ListNode
        21: dup
        22: iconst_3
        23: invokespecial #3                  // Method com/dbc/leecode/Algorithm/Reclass/ListNode."<init>":(I)V
        26: astore_3
        27: new           #2                  // class com/dbc/leecode/Algorithm/Reclass/ListNode
        30: dup
        31: iconst_4
        32: invokespecial #3                  // Method com/dbc/leecode/Algorithm/Reclass/ListNode."<init>":(I)V
        35: astore        4
        37: new           #2                  // class com/dbc/leecode/Algorithm/Reclass/ListNode
        40: dup
        41: iconst_5
        42: invokespecial #3                  // Method com/dbc/leecode/Algorithm/Reclass/ListNode."<init>":(I)V
        45: astore        5
        47: aload_1
        48: aload_2
        49: putfield      #4                  // Field com/dbc/leecode/Algorithm/Reclass/ListNode.next:Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
        52: aload_2
        53: aload_3
        54: putfield      #4                  // Field com/dbc/leecode/Algorithm/Reclass/ListNode.next:Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
        57: aload_3
        58: aload         4
        60: putfield      #4                  // Field com/dbc/leecode/Algorithm/Reclass/ListNode.next:Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
        63: aload         4
        65: aload         5
        67: putfield      #4                  // Field com/dbc/leecode/Algorithm/Reclass/ListNode.next:Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
        70: bipush        6
        72: newarray       int
        74: dup
        75: iconst_0
        76: iconst_1
        77: iastore
        78: dup
        79: iconst_1
        80: iconst_0
        81: iastore
        82: dup
        83: iconst_2
        84: iconst_m1
        85: iastore
        86: dup
        87: iconst_3
        88: iconst_0
        89: iastore
        90: dup
        91: iconst_4
        92: bipush        -2
        94: iastore
        95: dup
        96: iconst_5
        97: iconst_2
        98: iastore
        99: astore        6
       101: getstatic     #5                  // Field java/lang/System.out:Ljava/io/PrintStream;
       104: ldc           #6                  // int -2147483648
       106: iconst_m1
       107: invokestatic  #7                  // Method com/dbc/leecode/Algorithm/Solution21_30/Solution30.divide:(II)I
       110: invokevirtual #8                  // Method java/io/PrintStream.println:(I)V
       113: return
      LineNumberTable:
        line 11: 0
        line 12: 9
        line 13: 18
        line 14: 27
        line 15: 37
        line 16: 47
        line 17: 52
        line 18: 57
        line 19: 63
        line 21: 70
        line 22: 101
        line 23: 113
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0     114     0  args   [Ljava/lang/String;
            9     105     1 listNode1   Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
           18      96     2 listNode2   Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
           27      87     3 listNode3   Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
           37      77     4 listNode4   Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
           47      67     5 listNode5   Lcom/dbc/leecode/Algorithm/Reclass/ListNode;
          101      13     6     s   [I
}
SourceFile: "Main.java"

• Access flags

After the constant pool, the next two bytes are the access flags, which carry class-level or interface-level access information: whether this Class is a class or an interface, whether it is public, whether it is abstract, and, if it is a class, whether it is declared final. Each flag occupies one bit of the two-byte value, so decoding it is again a matter of matching the hexadecimal number against the flag table. For my file, javap reports ACC_PUBLIC and ACC_SUPER.
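Since the flags are just bits, decoding them is a loop over bit masks. In the sketch below the mask values are the class-level flags from the JVM specification; the class name `AccessFlagsDemo`, the method `decode`, and the sample value 0x0021 (inferred from the flags javap reports for my file) are my own assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class AccessFlagsDemo {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    // Returns the names of all flags whose bit is set in the u2 access_flags value.
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
    }
}
```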

• A collection of class indexes, parent indexes, and interface indexes

The class index and the parent class index are each a single u2 value, while the interface index collection is a collection of u2 values. Together these three items determine the inheritance relationship of the class, and they are laid out in that order right after the access flags. The class index gives the fully qualified name of this class, and the parent class index gives the fully qualified name of its parent class. Because Java does not allow multiple inheritance of classes, there is only one parent class index. Every Java class except java.lang.Object has a parent class, so no class other than java.lang.Object has a parent class index of 0. The interface index collection describes which interfaces the class implements; they are listed in the order in which they appear after the implements keyword. The class index and parent class index each point to a class descriptor constant of type CONSTANT_Class_info, and the index stored in that CONSTANT_Class_info constant in turn leads to a CONSTANT_Utf8_info constant holding the fully qualified name string. The whole process is the same table lookup as above, so I won't walk through it again.
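The two-step lookup (class index to CONSTANT_Class_info, then to CONSTANT_Utf8_info) can be sketched with a toy constant pool. Everything here is illustrative: the maps stand in for a parsed constant pool, the names are my own, and the entries are the four relevant lines of the javap output (#9, #10, #42, #43). That this_class is 9 and super_class is 10 in my file is an assumption based on that listing.

```java
import java.util.HashMap;
import java.util.Map;

class ClassNameResolverDemo {
    // Toy model: CONSTANT_Class_info index -> name_index, and CONSTANT_Utf8_info index -> string.
    static final Map<Integer, Integer> CLASS_ENTRIES = new HashMap<>();
    static final Map<Integer, String> UTF8_ENTRIES = new HashMap<>();

    static {
        CLASS_ENTRIES.put(9, 42);    // #9  = Class #42
        CLASS_ENTRIES.put(10, 43);   // #10 = Class #43
        UTF8_ENTRIES.put(42, "Main");
        UTF8_ENTRIES.put(43, "java/lang/Object");
    }

    // Resolves a class index to a fully qualified name; index 0 means "no class".
    static String resolveClassName(int classIndex) {
        if (classIndex == 0) {
            return null;             // only java.lang.Object has a parent class index of 0
        }
        Integer nameIndex = CLASS_ENTRIES.get(classIndex);
        if (nameIndex == null) {
            throw new IllegalArgumentException("#" + classIndex + " is not a Class entry");
        }
        return UTF8_ENTRIES.get(nameIndex);
    }

    public static void main(String[] args) {
        System.out.println(resolveClassName(9));   // Main
        System.out.println(resolveClassName(10));  // java/lang/Object
    }
}
```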

• Collection of field tables

Field tables describe the variables declared in an interface or class. Fields include class-level variables and instance-level variables, but not local variables declared inside methods. What information can describe a field in Java? The access scope (public, private, protected), whether it is an instance variable or a class variable (static), mutability (final), concurrent visibility (volatile, that is, whether reads and writes are forced to go to main memory), whether it can be serialized (transient), the field's data type, and the field's name. Each of the modifiers is a yes-or-no property, either present or absent, which makes it a good fit for flag bits. The field's name and data type, on the other hand, have no fixed set of values and can only be described by referring to constants in the constant pool.
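For the field modifiers listed above, the JDK already ships a decoder: `java.lang.reflect.Modifier` uses the same bit values as the class file for public, private, protected, static, final, volatile, and transient. The wrapper class `FieldFlagsDemo` and the sample value 0x001A are hypothetical; the value is not taken from my class file.

```java
import java.lang.reflect.Modifier;

class FieldFlagsDemo {
    // Turns a field's u2 access_flags value into Java modifier keywords.
    // Only meaningful for field flags: for methods some of the same bits mean other things.
    static String describe(int fieldAccessFlags) {
        return Modifier.toString(fieldAccessFlags);
    }

    public static void main(String[] args) {
        // 0x0002 (private) | 0x0008 (static) | 0x0010 (final)
        System.out.println(describe(0x001A)); // private static final
    }
}
```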

Three concepts need explaining here: fully qualified name, simple name, and descriptor. The first two are easy. In the javap output above, “com/dbc/leecode/Algorithm/Reclass/ListNode” is the fully qualified name of a class; it is simply the full class name with each “.” replaced by “/”. To keep consecutive fully qualified names from running together, a “;” is usually appended where the name is used, marking its end. A simple name is a method or field name with no type or parameter decoration; for example, a method inc() and a field m have the simple names “inc” and “m”. Descriptors for methods and fields are more involved. A descriptor describes the data type of a field, or the parameter list and return value of a method. Primitive types and void are each written as a single uppercase character (B for byte, C for char, D for double, F for float, I for int, J for long, S for short, Z for boolean, V for void), and an object type is written as the character L followed by the fully qualified name of the class.

Array types are a special case: each dimension is written as a leading “[” character. A two-dimensional array declared as “java.lang.String[][]” is recorded as “[[Ljava/lang/String;”, and an integer array “int[]” is recorded as “[I”. A method descriptor lists the parameter list first and the return value after it, with the parameters placed in strict order inside a pair of parentheses “()”. The descriptor of java.lang.String.toString() is “()Ljava/lang/String;”, and the descriptor of int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) is “([CII[CIII)I”.
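Reading descriptors becomes mechanical once the rules are written down as code. The following is a minimal sketch that turns a descriptor back into readable Java types; `DescriptorDemo` and its methods are names I invented, and the parser does no validation beyond what it needs to walk the string.

```java
import java.util.ArrayList;
import java.util.List;

class DescriptorDemo {
    // Reads one type starting at pos[0] and advances pos[0] past it.
    static String readType(String d, int[] pos) {
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return readType(d, pos) + "[]";   // one "[" per array dimension
            case 'L': {
                int end = d.indexOf(';', pos[0]);        // object types end with ";"
                if (end < 0) {
                    throw new IllegalArgumentException("missing ';' in " + d);
                }
                String name = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("bad descriptor character '" + c + "'");
        }
    }

    static String describeField(String descriptor) {
        return readType(descriptor, new int[] {0});
    }

    // Renders a method descriptor as "returnType (param, param, ...)".
    static String describeMethod(String descriptor) {
        if (descriptor.charAt(0) != '(') {
            throw new IllegalArgumentException("method descriptor must start with '('");
        }
        int[] pos = {1};
        List<String> params = new ArrayList<>();
        while (descriptor.charAt(pos[0]) != ')') {
            params.add(readType(descriptor, pos));
        }
        pos[0]++;                                        // skip ')'
        String returnType = readType(descriptor, pos);
        return returnType + " (" + String.join(", ", params) + ")";
    }

    public static void main(String[] args) {
        System.out.println(describeField("[[Ljava/lang/String;")); // java.lang.String[][]
        System.out.println(describeMethod("([CII[CIII)I"));
        // int (char[], int, int, char[], int, int, int)
    }
}
```

Note that the trailing “;” is what lets the parser tell where an object type ends and the next parameter begins.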

• Collection of method tables

The method table is structured almost exactly like the field table, so I won't go into it here.