Java Virtual Machine Series 1: How are Java files loaded and executed

Java Virtual Machine Series 2: Class bytecode detailed analysis

Java Virtual Machine Series 2: Runtime data area parsing

One. Start from the problem

  • What is class bytecode?
  • How is bytecode executed by the JVM?
  • How are classes, member variables, methods, and local variables represented?
  • How do methods call other methods?
  • How does a method return?
  • What's the difference between a stack-based virtual machine and a register-based virtual machine?

The entire analysis is based on class bytecode, which is quite different from Android's dex bytecode. Class bytecode is organized one class per file, whereas a dex file holds a collection of classes.

What is class bytecode?

Bytecode is a set of structures defined by the JVM specification to describe the content of a class. Because this description is independent of any platform's instruction set, any language that can be compiled to bytecode can be executed by the virtual machine: Java, Groovy, Kotlin, and so on. The HotSpot virtual machine, shipped with OpenJDK, is the reference implementation of the JVM specification.

1. Here’s the simplest example

Seeing is believing, so let's define a class in Test.java:

public class Test {
    static String a = "hucaihua";
}

When you compile it to Test.class and open the result in 010 Editor, you can see that the content is organized in bytes; it is essentially a binary stream of 0s and 1s.

For readability, the editor displays the content in hexadecimal. Each hexadecimal digit represents four bits, so each two-digit group shown here is one byte (eight bits).
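As a small illustration of the hex-to-bits relationship, the sketch below expands the first four bytes of the dump into their binary form. The class and method names (`HexByteDemo`, `toBits`) are made up for this example.

```java
public class HexByteDemo {
    /** Converts one two-digit hex group, such as "CA", to its 8-bit binary string. */
    static String toBits(String hexPair) {
        int value = Integer.parseInt(hexPair, 16);        // 0..255
        String bits = Integer.toBinaryString(value);      // no leading zeros
        return String.format("%8s", bits).replace(' ', '0'); // left-pad to 8 bits
    }

    public static void main(String[] args) {
        // The first four bytes of the dump: the magic number.
        for (String pair : "CA FE BA BE".split(" ")) {
            System.out.println(pair + " -> " + toBits(pair));
        }
    }
}
```

Each two-digit group yields exactly eight bits, which is why the editor can show one byte per group.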

CA FE BA BE 00 00 00 37 00 14 0A 00 05 00 0F 08
00 10 09 00 04 00 11 07 00 12 07 00 13 01 00 01
61 01 00 12 4C 6A 61 76 61 2F 6C 61 6E 67 2F 53
74 72 69 6E 67 3B 01 00 06 3C 69 6E 69 74 3E 01
00 03 28 29 56 01 00 04 43 6F 64 65 01 00 0F 4C
69 6E 65 4E 75 6D 62 65 72 54 61 62 6C 65 01 00
08 3C 63 6C 69 6E 69 74 3E 01 00 0A 53 6F 75 72
63 65 46 69 6C 65 01 00 09 54 65 73 74 2E 6A 61
76 61 0C 00 08 00 09 01 00 08 68 75 63 61 69 68
75 61 0C 00 06 00 07 01 00 04 54 65 73 74 01 00
10 6A 61 76 61 2F 6C 61 6E 67 2F 4F 62 6A 65 63
74 00 21 00 04 00 05 00 00 00 01 00 08 00 06 00
07 00 00 00 02 00 01 00 08 00 09 00 01 00 0A 00
00 00 1D 00 01 00 01 00 00 00 05 2A B7 00 01 B1
00 00 00 01 00 0B 00 00 00 06 00 01 00 00 00 01
00 08 00 0C 00 09 00 01 00 0A 00 00 00 1E 00 01
00 00 00 00 00 06 12 02 B3 00 03 B1 00 00 00 01
00 0B 00 00 00 06 00 01 00 00 00 02 00 01 00 0D
00 00 00 02 00 0E

The official description of the class structure

We know that a class file is a formatted binary stream; that is, the stream represents the class content in a format defined by the JVM specification. The specification defines the class file as follows:

ClassFile {
    u4             magic;                                // Magic number, fixed value 0xCAFEBABE
    u2             minor_version;                        // Minor version number
    u2             major_version;                        // Major version number
    u2             constant_pool_count;                  // Constant pool count (number of entries plus one)
    cp_info        constant_pool[constant_pool_count-1]; // Constant pool contents
    u2             access_flags;                         // Class access flags
    u2             this_class;                           // Constant pool index of the current class
    u2             super_class;                          // Constant pool index of the superclass
    u2             interfaces_count;                     // Number of interfaces
    u2             interfaces[interfaces_count];         // Interface content
    u2             fields_count;                         // Number of fields
    field_info     fields[fields_count];                 // Field content
    u2             methods_count;                        // Number of methods
    method_info    methods[methods_count];               // Method content
    u2             attributes_count;                     // Number of attributes
    attribute_info attributes[attributes_count];         // Attribute content
}

u4 and u2 are fixed-length unsigned fields: u4 is 4 bytes long and u2 is 2 bytes long.
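To make the fixed-length fields concrete, here is a minimal sketch that reads the first four fields of the structure from the first 10 bytes of the dump above. It assumes `java.io.DataInputStream`, which reads big-endian values just as the class file stores them; the class and method names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ClassHeaderDemo {
    // First 10 bytes of the Test.class dump shown earlier.
    static final byte[] HEADER = {
        (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
        0x00, 0x00, 0x00, 0x37, 0x00, 0x14
    };

    /** Returns {magic, minor_version, major_version, constant_pool_count}. */
    static long[] readHeader(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            long magic = in.readInt() & 0xFFFFFFFFL; // u4, kept unsigned in a long
            int minor = in.readUnsignedShort();      // u2
            int major = in.readUnsignedShort();      // u2
            int cpCount = in.readUnsignedShort();    // u2
            return new long[] {magic, minor, major, cpCount};
        }
    }

    public static void main(String[] args) throws IOException {
        long[] h = readHeader(HEADER);
        System.out.printf("magic=0x%X minor=%d major=%d constant_pool_count=%d%n",
                h[0], h[1], h[2], h[3]);
    }
}
```

For this dump the major version is 0x37 (55), which corresponds to Java 11, and constant_pool_count is 0x14 (20), meaning the pool holds entries #1 through #19.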

cp_info, field_info, method_info, and attribute_info are structures, each with its own definition and a variable length.

2. Description of key structures

2.1 cp_info (Description of 17 constant types)

cp_info is usually the largest block in a class file in terms of bytes. As of Java SE 11 it defines 17 constant types with different structures, each identified by a one-byte tag.

The following figure analyzes the actual content in CONSTANT_Class_info. The other content is found the same way.
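The same tag-driven lookup can be sketched in code. The example below walks the constant pool of the Test.class dump above and prints each entry. It is only a sketch: it handles just the six tags that occur in this file (Utf8, Class, String, Fieldref, Methodref, NameAndType) and rejects everything else, and all class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ConstantPoolDemo {
    // Header plus constant pool bytes, copied from the Test.class dump above.
    static final String HEX =
          "CA FE BA BE 00 00 00 37 00 14 0A 00 05 00 0F 08 "
        + "00 10 09 00 04 00 11 07 00 12 07 00 13 01 00 01 "
        + "61 01 00 12 4C 6A 61 76 61 2F 6C 61 6E 67 2F 53 "
        + "74 72 69 6E 67 3B 01 00 06 3C 69 6E 69 74 3E 01 "
        + "00 03 28 29 56 01 00 04 43 6F 64 65 01 00 0F 4C "
        + "69 6E 65 4E 75 6D 62 65 72 54 61 62 6C 65 01 00 "
        + "08 3C 63 6C 69 6E 69 74 3E 01 00 0A 53 6F 75 72 "
        + "63 65 46 69 6C 65 01 00 09 54 65 73 74 2E 6A 61 "
        + "76 61 0C 00 08 00 09 01 00 08 68 75 63 61 69 68 "
        + "75 61 0C 00 06 00 07 01 00 04 54 65 73 74 01 00 "
        + "10 6A 61 76 61 2F 6C 61 6E 67 2F 4F 62 6A 65 63 "
        + "74";

    static byte[] parseHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** Returns one description per constant pool entry, in index order (#1 first). */
    static List<String> readConstantPool(byte[] classBytes) throws IOException {
        List<String> entries = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            in.skipBytes(8);                    // magic + minor_version + major_version
            int count = in.readUnsignedShort(); // constant_pool_count
            for (int i = 1; i < count; i++) {   // valid indexes are 1 .. count-1
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1:  entries.add("Utf8 " + in.readUTF()); break; // u2 length + bytes
                    case 7:  entries.add("Class #" + in.readUnsignedShort()); break;
                    case 8:  entries.add("String #" + in.readUnsignedShort()); break;
                    case 9:  entries.add("Fieldref #" + in.readUnsignedShort()
                                     + ".#" + in.readUnsignedShort()); break;
                    case 10: entries.add("Methodref #" + in.readUnsignedShort()
                                     + ".#" + in.readUnsignedShort()); break;
                    case 12: entries.add("NameAndType #" + in.readUnsignedShort()
                                     + ":#" + in.readUnsignedShort()); break;
                    default: throw new IOException("tag not handled in this sketch: " + tag);
                }
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        List<String> pool = readConstantPool(parseHex(HEX));
        for (int i = 0; i < pool.size(); i++) {
            System.out.println("#" + (i + 1) + " = " + pool.get(i));
        }
    }
}
```

Entry #4 comes out as `Class #18`, and #18 is the Utf8 string `Test`: a CONSTANT_Class_info stores only an index, and the class name itself lives in a separate Utf8 entry.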

2.2 field_info (Field Description)

The definition of field_info is as follows:

field_info {
    u2             access_flags;                 // Access flags
    u2             name_index;                   // Index of the field name in the constant pool
    u2             descriptor_index;             // Index of the field descriptor in the constant pool
    u2             attributes_count;             // Number of attributes
    attribute_info attributes[attributes_count]; // Attribute list
}
  • access_flags: the access flags, which take the following nine values; commonly used ones are public, static, private, and protected:
Flag Name	Value	Interpretation
ACC_PUBLIC	0x0001	Declared public; may be accessed from outside its package.
ACC_PRIVATE	0x0002	Declared private; usable only within the defining class.
ACC_PROTECTED	0x0004	Declared protected; may be accessed within subclasses.
ACC_STATIC	0x0008	Declared static.
ACC_FINAL	0x0010	Declared final; never directly assigned to after object construction (JLS §17.5).
ACC_VOLATILE	0x0040	Declared volatile; cannot be cached.
ACC_TRANSIENT	0x0080	Declared transient; not written or read by a persistent object manager.
ACC_SYNTHETIC	0x1000	Declared synthetic; not present in the source code.
ACC_ENUM	0x4000	Declared as an element of an enum.
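Because access_flags is a bit mask, several flags can be combined in one u2 value. The following sketch decodes a value into the flag names from the table above; the class and method names are hypothetical helpers, not part of any JDK API.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldAccessFlagsDemo {
    // Flag values and names taken from the table above.
    static final int[] MASKS = {
        0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0040, 0x0080, 0x1000, 0x4000
    };
    static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_PRIVATE", "ACC_PROTECTED", "ACC_STATIC", "ACC_FINAL",
        "ACC_VOLATILE", "ACC_TRANSIENT", "ACC_SYNTHETIC", "ACC_ENUM"
    };

    /** Returns the names of all flags whose bit is set in accessFlags. */
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // 0x0008 is the value stored for field "a" in the Test.class dump.
        System.out.println(decode(0x0008));
        // 0x0019 = 0x0001 | 0x0008 | 0x0010, e.g. a public static final field.
        System.out.println(decode(0x0019));
    }
}
```

In the dump, the field_info for `a` starts with `00 08`, so only ACC_STATIC is set, matching the declaration `static String a`.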

2.3 attribute_info (Attribute Description)

Attribute description information can be attached to ClassFile, field_info, method_info, and Code_attribute.

Common features such as generic signatures and annotations are stored as attribute_info.

It is defined as follows:

attribute_info {
    u2 attribute_name_index;   // Constant pool index of the attribute name, e.g. Signature for generics, RuntimeVisibleAnnotations for annotations
    u4 attribute_length;       // Attribute length
    u1 info[attribute_length]; // Attribute information
}
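Since every attribute shares this name/length/payload shape, a reader can load any attribute without understanding its contents. The sketch below reads the last 8 bytes of the Test.class dump, which form the class-level SourceFile attribute. `RawAttribute` and `read` are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class AttributeInfoDemo {
    // Last 8 bytes of the Test.class dump: the class-level SourceFile attribute.
    static final byte[] SOURCE_FILE_ATTR = {0x00, 0x0D, 0x00, 0x00, 0x00, 0x02, 0x00, 0x0E};

    /** Undecoded attribute: the name index plus the raw info bytes. */
    static final class RawAttribute {
        final int nameIndex;
        final byte[] info;

        RawAttribute(int nameIndex, byte[] info) {
            this.nameIndex = nameIndex;
            this.info = info;
        }
    }

    static RawAttribute read(DataInputStream in) throws IOException {
        int nameIndex = in.readUnsignedShort(); // u2 attribute_name_index
        int length = in.readInt();              // u4 attribute_length (assumed to fit in an int here)
        byte[] info = new byte[length];
        in.readFully(info);                     // u1 info[attribute_length]
        return new RawAttribute(nameIndex, info);
    }

    public static void main(String[] args) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(SOURCE_FILE_ATTR));
        RawAttribute attr = read(in);
        // For SourceFile, the two info bytes are themselves a constant pool index.
        int sourceFileIndex = ((attr.info[0] & 0xFF) << 8) | (attr.info[1] & 0xFF);
        System.out.println("name=#" + attr.nameIndex + " length=" + attr.info.length
                + " sourcefile=#" + sourceFileIndex);
    }
}
```

In the constant pool of this file, #13 is the Utf8 string `SourceFile` and #14 is `Test.java`, so the attribute records which source file the class was compiled from.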

The following figure uses an example to illustrate attributes

2.4 method_info (Method Description)

method_info has the same layout as field_info, so the fields are not described again:

method_info {
    u2             access_flags;
    u2             name_index;
    u2             descriptor_index;
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

The biggest difference between a method description (method_info) and a field description (field_info) lies in their attributes.

Every method_info that is neither abstract nor native contains an attribute called Code, which holds the instruction information for the method.
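To show what that instruction information looks like, here is a rough sketch that decodes the Code attribute of the `<clinit>` method from the Test.class dump above. It recognizes only the five opcodes that occur in this file (`aload_0`, `ldc`, `return`, `putstatic`, `invokespecial`); the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CodeAttributeDemo {
    // Start of the Code attribute of <clinit>, copied from the dump above.
    // The exception table and nested LineNumberTable that follow are omitted.
    static final byte[] CLINIT_CODE_ATTR = {
        0x00, 0x0A,                   // attribute_name_index -> #10 "Code"
        0x00, 0x00, 0x00, 0x1E,       // attribute_length = 30
        0x00, 0x01,                   // max_stack = 1
        0x00, 0x00,                   // max_locals = 0
        0x00, 0x00, 0x00, 0x06,       // code_length = 6
        0x12, 0x02, (byte) 0xB3, 0x00, 0x03, (byte) 0xB1
    };

    /** Returns one line of text per instruction in the code array. */
    static List<String> disassemble(byte[] codeAttr) throws IOException {
        List<String> out = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(codeAttr))) {
            in.skipBytes(2 + 4 + 2 + 2); // name index, length, max_stack, max_locals
            int codeLength = in.readInt();
            int pc = 0;
            while (pc < codeLength) {
                int opcode = in.readUnsignedByte();
                switch (opcode) {
                    case 0x12: out.add("ldc #" + in.readUnsignedByte()); pc += 2; break;
                    case 0x2A: out.add("aload_0"); pc += 1; break;
                    case 0xB1: out.add("return"); pc += 1; break;
                    case 0xB3: out.add("putstatic #" + in.readUnsignedShort()); pc += 3; break;
                    case 0xB7: out.add("invokespecial #" + in.readUnsignedShort()); pc += 3; break;
                    default: throw new IOException("opcode not handled in this sketch: " + opcode);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String line : disassemble(CLINIT_CODE_ATTR)) {
            System.out.println(line);
        }
    }
}
```

The three instructions load constant #2 (the string "hucaihua"), store it into the static field referenced by #3 (`Test.a`), and return, which is exactly what `static String a = "hucaihua";` compiles to.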

The following figure shows the instruction information in the GET method we defined