Java Virtual Machine Series 1: How are Java files loaded and executed

Java Virtual Machine Series 2: Class bytecode detailed analysis

Java Virtual Machine Series 2: Runtime data area parsing

One. Start from the problem

Learning the JVM really means understanding how Java files are compiled, loaded, and executed. Execution is the most critical part, and it involves key problems such as data storage and instruction parsing.

This article mainly answers three questions:

  • How are Java files compiled and loaded?
  • Where are the instructions and how are they executed?
  • Are all instruction sets the same? How do stack-based instructions differ from register-based instructions?

How are Java files compiled and loaded

To distinguish them from the JVM, the virtual machines in Android, such as Dalvik and ART, are referred to here as the Android Virtual Machine, or AVM for short.

1. Java file to class file

Java files are compiled into class files by javac. Let's create an empty Test.java:

public class Test {
    
}


Run this command to generate a file named Test.class in the current directory:

javac Test.java

The class file is laid out like a C struct, containing the magic number, version number, constant pool, field information, method information, and so on. The detailed structure of the class file, as viewed in 010 Editor, is explained in Java Virtual Machine Series 2: Class bytecode detailed analysis.
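To make the struct-like layout concrete, here is a minimal sketch that reads the first few fixed-size fields of a class file: the 4-byte magic number, the 2-byte minor and major versions, and the 2-byte constant pool count. The class name ClassFileHeader and its parse method are hypothetical helpers written for this illustration, not part of any JDK API.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration: decodes the fixed-size start of a class file.
final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE; // every valid class file starts with these 4 bytes

    final int magic;
    final int minorVersion;
    final int majorVersion;      // for example, 52 corresponds to Java 8
    final int constantPoolCount;

    private ClassFileHeader(int magic, int minor, int major, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        if (classBytes.length < 10) {
            throw new IllegalArgumentException("Too short to be a class file");
        }
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buffer = ByteBuffer.wrap(classBytes);
        int magic = buffer.getInt();
        if (magic != MAGIC) {
            throw new IllegalArgumentException("Bad magic number");
        }
        int minor = buffer.getShort() & 0xFFFF;   // u2 fields are unsigned
        int major = buffer.getShort() & 0xFFFF;
        int cpCount = buffer.getShort() & 0xFFFF;
        return new ClassFileHeader(magic, minor, major, cpCount);
    }
}
```

One way to try it is to pass in the bytes of the generated file, for example `ClassFileHeader.parse(Files.readAllBytes(Paths.get("Test.class")))`.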

2. Class file to dex file

What is a class file

In a normal Java virtual machine, a class file holds instructions from the JVM instruction set. The JVM can interpret and execute them directly by translating each JVM instruction into platform-specific instructions.

What is a dex file

On the Android virtual machine (AVM), class files are compiled into a dex file by the dx tool. The dex file stores instructions from the dex instruction set, which is specific to this virtual machine, and these are also interpreted and executed.

Purpose and significance of the emergence of DEX

A dex file is more compact than a class file and can hold the information of many classes, which reduces the number of I/O operations the virtual machine has to perform.

Dex format and description

The Android SDK provides the d8 tool (it supports Java 8 features, compiles quickly, and produces small output; it is usually found in the build-tools directory of the Android SDK installation) to convert class files into dex files. Dex is the executable format used by the Android virtual machine.

Run the following command to obtain the classes.dex file in the same directory:

d8 Test.class

Opening it in 010 Editor, we can see that it is also laid out like a C struct. Compared with a class file, a single dex file can describe multiple classes and is more compact, so fewer I/O operations are needed during loading.

See the official dex format documentation for a description of each field.

How do I view the instructions contained in dex

baksmali.jar can disassemble a dex file and generate smali instructions. These instructions can be interpreted and executed by the AVM, and their relationship with dex is as follows:

  • Dex: a storage format that contains class information and instruction information
  • Smali: a syntax for writing instructions that can be executed by Dalvik/ART

You can view AVM instructions directly in Android Studio through the java2smali plugin.

The instruction set is explained in detail in another article.

How is the dex file loaded

On the Android platform, dex files are also loaded and parsed through the parent delegation mechanism.

The class responsible for loading the class is called ClassLoader, and its hierarchy looks like this

ClassLoader: abstract class, the parent class
    BootClassLoader: mainly used to load classes in the Android Framework
    BaseDexClassLoader
        PathClassLoader: loads dex from system directories
        DexClassLoader: loads external dex
        InMemoryDexClassLoader: loads dex from memory

The parent delegation mechanism uses a decorator-style structure to solve the delegation problem: every subclass holds a reference to the same base type, ClassLoader, as its parent. The loadClass method of a subclass first checks whether the parent has already loaded the class, and only if it has not does the subclass look for the class itself. That lookup involves loading the dex, converting it into dexElements, converting those into DexFile objects, and searching them for the class. Each class contains the instructions to be executed by the AVM. The whole process is complicated and changes between Android versions.
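The delegation order can be sketched with a simplified model. This is not the real java.lang.ClassLoader or any Android class: SketchLoader is a hypothetical class that only tracks class names, and loadClass returns the name of the loader that ends up defining the class, which makes the "parent first" rule easy to observe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of parent delegation (names only, no real bytecode).
final class SketchLoader {
    private final String name;
    private final SketchLoader parent;     // null for the top-most loader
    private final Set<String> ownClasses;  // class names this loader can find by itself
    private final Map<String, String> loaded = new HashMap<>();

    SketchLoader(String name, SketchLoader parent, Set<String> ownClasses) {
        this.name = name;
        this.parent = parent;
        this.ownClasses = ownClasses;
    }

    /** Returns the name of the defining loader, or null if no loader in the chain finds the class. */
    String loadClass(String className) {
        // 1. Has this loader already resolved the class?
        String definingLoader = loaded.get(className);
        if (definingLoader != null) {
            return definingLoader;
        }
        // 2. Delegate to the parent first.
        if (parent != null) {
            definingLoader = parent.loadClass(className);
        }
        // 3. Only if the parent chain failed, search our own classes.
        if (definingLoader == null) {
            definingLoader = findClass(className);
        }
        if (definingLoader != null) {
            loaded.put(className, definingLoader);
        }
        return definingLoader;
    }

    private String findClass(String className) {
        return ownClasses.contains(className) ? name : null;
    }
}
```

With a "boot" loader as the parent of an "app" loader, a class that both loaders could find is always defined by "boot", which is why an application cannot replace a framework class simply by shipping one with the same name.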

Where are the instructions and how are they executed?

Where are the instructions?

After the dex file is loaded into memory, it is parsed into an array of Element objects, each holding a DexFile. This array is searched to find the class in which the instructions are stored.

How are instructions executed

When we call a method, that method's bytecode in the class is loaded into memory, and a block of space called a stack frame is allocated on the stack. The stack frame contains the method's local variable table, its operand stack, its dynamic link, and its return address. The execution engine then reads the instruction at the location the PC (program counter) points to and interprets it.

Let’s look at a simple Java method that generates instructions:

public int add(int a, int b) {
    a = 100;
    b = 300;
    return a + b;
}

The resulting set of instructions looks straightforward:

# virtual methods
.method public add(II)I
    .registers 4
    .param p1, "a"    # I
    .param p2, "b"    # I

    .prologue
    .line 7
    const/16 p1, 0x64

    .line 8
    const/16 p2, 0x12c

    .line 9
    const/16 v0, 0x190

    return v0
.end method

Let's analyze the execution flow, following the order of the instructions.

# virtual methods

Methods fall into two categories:

  • Direct methods (private methods and constructors) are invoked with invoke-direct
  • Virtual methods, that is, non-private methods, are invoked with invoke-virtual

.method public add(II)I
.end method

These two lines mark the start and end of a method definition. A definition begins with .method and has the following form:

.method public/private [static] [final] methodName()<type>

Here it defines a public method called add that takes two int arguments (II) and returns an int (I).

.registers 4

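This declares that the method uses 4 registers: v0 for a local value, p0 for this, and p1 and p2 for the two parameters. The Android instruction set is register-based. Hardware registers are much faster to access than memory, as explained in any operating systems course. Note, however, that the registers here are virtual registers managed by the virtual machine, so the benefit comes mainly from needing fewer instructions.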

.param p1, "a"    # I 
.param p2, "b"    # I

Two parameter registers, p1 and p2, store a and b respectively. The # that follows starts a comment indicating that the stored value is an int.

.prologue
.line 7
const/16 p1, 0x64

.line 8
const/16 p2, 0x12c

.line 9
const/16 v0, 0x190

.prologue marks the start of the code segment.

.line 7 gives the corresponding line number in the Java source file.

const/16 p1, 0x64 puts the 16-bit constant 0x64 (100) into register p1.

const/16 p2, 0x12c puts the 16-bit constant 0x12c (300) into register p2.

const/16 v0, 0x190 puts the 16-bit constant 0x190 (400) into the local register v0. Because a and b are both constants at this point, the compiler has already folded a + b into 400, so no add instruction is emitted.

return v0

Returns the value in register v0

When the method returns, the PC is set to the return address saved in the method's stack frame, and the current frame is popped off the stack. The caller's frame becomes the top of the virtual machine stack again, and the PC advances to the caller's next instruction.

Stack based instruction set and register based instruction set

The Java HotSpot instruction stream uses a stack-based instruction set architecture, while Android's Dalvik uses a register-based instruction set architecture.

Instructions are generally designed as follows:

Opcode Operand 1 ... Operand n (n is usually less than 3)

Java’s opcodes are single-byte (8-bit), so up to 256 operations are supported.

Stack-based instruction set features:

  • Simple to design and implement, suitable for resource-constrained systems
  • Highly portable
  • The same code needs more instructions and more memory accesses, so it is less efficient
  • Operands are pushed onto and popped from the stack, so instructions are zero-address

Features of register-based instruction sets:

  • Design and implementation depend on the hardware, so system resources can be used fully
  • Poor portability
  • The same code needs fewer instructions and fewer memory accesses, so it is more efficient
  • Operands are placed in registers, and an instruction generally contains 1 to 3 addresses

Here’s an example:

public void get(T t){
    int a = 10;
    int b = a + 20;
}

Stack-based instructions

0 bipush 10   // push the constant 10 onto the operand stack
2 istore_2    // pop the top element and store it in slot 2 of the local variable table
3 iload_2     // load slot 2 of the local variable table onto the operand stack
4 bipush 20   // push the constant 20 onto the operand stack
6 iadd        // pop the top two elements, add them, and push the result
7 istore_3    // pop the result and store it in slot 3 of the local variable table
8 return      // return to the exit address stored in the stack frame

Register-based instruction

const/16 v0, 0xA            // put the constant 10 into register v0
add-int/lit8 v1, v0, 0x14   // add the literal 20 to v0 and store the result in v1

Comparing the two listings, the register-based version needs fewer instructions, so less work goes into fetching and dispatching them. Register-based instructions name their registers explicitly as operands, while stack-based instructions are zero-address: the local variable slot is encoded in the mnemonic itself, as in istore_2 and iload_2.
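The difference can be made tangible with two toy interpreters. This is a minimal sketch under stated assumptions, not how HotSpot or Dalvik are implemented: the class TinyVms, its text-based instruction format, and the simplified mnemonics (`istore 2` instead of `istore_2`, `const` instead of `const/16`, `add-lit` instead of `add-int/lit8`) are all invented for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreters: one stack-based, one register-based.
final class TinyVms {
    // int a = 10; int b = a + 20; expressed for each machine
    static final String[] STACK_PROGRAM = {
        "bipush 10", "istore 2", "iload 2", "bipush 20", "iadd", "istore 3", "return"
    };
    static final String[] REGISTER_PROGRAM = {
        "const v0 10", "add-lit v1 v0 20", "return-void"
    };

    /** Runs a stack-based program and returns the local variable table. */
    static int[] runStack(String[] program, int localCount) {
        int[] locals = new int[localCount];
        Deque<Integer> operands = new ArrayDeque<>();
        for (int pc = 0; pc < program.length; pc++) {
            String[] parts = program[pc].split(" ");
            switch (parts[0]) {
                case "bipush": operands.push(Integer.parseInt(parts[1])); break;
                case "istore": locals[Integer.parseInt(parts[1])] = operands.pop(); break;
                case "iload":  operands.push(locals[Integer.parseInt(parts[1])]); break;
                case "iadd":   operands.push(operands.pop() + operands.pop()); break;
                case "return": return locals;
                default: throw new IllegalArgumentException("Unknown opcode: " + parts[0]);
            }
        }
        return locals;
    }

    /** Runs a register-based program and returns the register file. */
    static int[] runRegister(String[] program, int registerCount) {
        int[] regs = new int[registerCount];
        for (int pc = 0; pc < program.length; pc++) {
            String[] parts = program[pc].split(" ");
            switch (parts[0]) {
                case "const":
                    regs[reg(parts[1])] = Integer.parseInt(parts[2]);
                    break;
                case "add-lit": // destination, source register, literal
                    regs[reg(parts[1])] = regs[reg(parts[2])] + Integer.parseInt(parts[3]);
                    break;
                case "return-void":
                    return regs;
                default:
                    throw new IllegalArgumentException("Unknown opcode: " + parts[0]);
            }
        }
        return regs;
    }

    private static int reg(String name) {
        return Integer.parseInt(name.substring(1)); // "v1" -> 1
    }
}
```

Both programs compute the same values, but the stack version moves every operand through the operand stack, while the register version names its source and destination directly in each instruction.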

Easter eggs at the end

1. Why does reflection take longer than object calls?

Reflection is typically used like this:

Class cls = Class.forName("com.hch.Test");
Object ob = cls.newInstance();
Method method = cls.getDeclaredMethod("test");
method.invoke(ob ,xxxx );

  • 1. Class.forName searches the dex for the class and loads its bytecode into memory
  • 2. cls.getDeclaredMethod

The Class converts all of its methods into a list of ArtMethod objects. ArtMethod is a class that holds the method's instructions and its hotness counter. getDeclaredMethod searches this list for the method that matches the given name and parameters.

  • 3. method.invoke converts the arguments to their concrete types and performs a runtime access check internally. Once the number of reflective calls reaches a threshold (15), bytecode is generated dynamically and loaded into memory, and this path benefits from neither compiler optimization nor JIT optimization

The obvious ways to optimize reflection are:

  • 1. Cache the result of Class.forName
  • 2. Cache the result of cls.getDeclaredMethod
  • 3. Call method.setAccessible(true) to disable the access check
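The second and third optimizations can be combined in a small cache. The following is a minimal sketch, assuming no-argument methods only: CachedReflection and Greeter are hypothetical names made up for this example, and the cache key ignores class loaders, which a production version would need to consider.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical target class used only for the demonstration.
class Greeter {
    private String test() {
        return "hello";
    }
}

// Hypothetical helper: looks up each Method once, then reuses it.
final class CachedReflection {
    private static final Map<String, Method> METHODS = new ConcurrentHashMap<>();

    /** Invokes a no-argument method by name, caching the Method lookup. */
    static Object call(Object target, String methodName) {
        Class<?> cls = target.getClass();
        String key = cls.getName() + "#" + methodName;
        Method method = METHODS.computeIfAbsent(key, k -> {
            try {
                Method found = cls.getDeclaredMethod(methodName); // expensive search, done once
                found.setAccessible(true);                        // skip the access check on later calls
                return found;
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(e);
            }
        });
        try {
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static int cachedCount() {
        return METHODS.size();
    }
}
```

Calling `CachedReflection.call(new Greeter(), "test")` repeatedly performs the getDeclaredMethod search only on the first call; later calls pay only for the invoke itself.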

2. Why is it fast to call a method directly from an object?

There are two cases, depending on the method being called:

  • 1. The static methods, private methods, and constructors (init) of a class are bound statically: the call target is fixed at compile time, so calling them involves almost no lookup overhead. In the bytecode, a private method is called with invoke-direct, for example invoke-direct {p0, v1, v2}, LTest;->add(II)I, and a static method is called with invoke-static, for example invoke-static {}, LTest;->mStatic()V
  • 2. The target of other methods in a class (protected, public) must be determined dynamically at run time, a process called dynamic linking. A dynamic call locates the bytecode address by looking up the method's index in the runtime constant pool, based on the object being called. This is slower than a static method call, but still much faster than a reflective lookup.
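The observable difference between the two kinds of binding can be shown in plain Java. This is a small sketch with hypothetical classes Shape and Circle: the static call is chosen by the name written in the source, the private call always reaches the method of the class it is written in, and the ordinary instance call is chosen by the runtime type of the object.

```java
// Hypothetical classes for illustration only.
class Shape {
    static String family() {
        return "Shape";
    }

    String describe() {           // overridable, dispatched on the runtime type
        return "generic shape";
    }

    private String id() {         // private methods are never overridden
        return "shape-id";
    }

    String idViaBase() {
        return id();              // always reaches Shape.id
    }
}

class Circle extends Shape {
    static String family() {      // hides Shape.family, does not override it
        return "Circle";
    }

    @Override
    String describe() {
        return "circle";
    }

    private String id() {         // unrelated to Shape.id despite the same name
        return "circle-id";
    }
}
```

For `Shape s = new Circle();`, the call `s.describe()` yields "circle" because it is resolved at run time, while `Shape.family()` yields "Shape" and `s.idViaBase()` yields "shape-id" because those targets do not depend on the runtime type.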

Look at a simple invocation example

public class Test {
    public int del(int a, int b) {
        return a - b;
    }

    private int add(int a, int b) {
        a = del(a, b);
        return a + b;
    }

    public final void mFinal() {
        int a = 10;
        return;
    }

    public static void mStatic() {
        int a = 100;
        return;
    }

    public void get() {
        add(30, 40);
        mFinal();
        mStatic();
        Person person = new Person();
        person.print();
        return;
    }
}

Compiled bytecode

.class public LTest;
.super Ljava/lang/Object;
.source "Test.java"

# direct methods
.method public constructor <init>()V
    .registers 1

    .prologue
    .line 1
    invoke-direct {p0}, Ljava/lang/Object;-><init>()V

    return-void
.end method

.method private add(II)I
    .registers 4
    .param p1, "a"    # I
    .param p2, "b"    # I

    .prologue
    .line 7
    invoke-virtual {p0, p1, p2}, LTest;->del(II)I

    move-result p1

    .line 8
    add-int v0, p1, p2

    return v0
.end method

.method public static mStatic()V
    .registers 1

    .prologue
    .line 17
    const/16 v0, 0x64

    .line 18
    .local v0, "a":I
    return-void
.end method

# virtual methods
.method public del(II)I
    .registers 4
    .param p1, "a"    # I
    .param p2, "b"    # I

    .prologue
    .line 3
    sub-int v0, p1, p2

    return v0
.end method

.method public get()V
    .registers 4

    .prologue
    .line 22
    const/16 v1, 0x1e

    const/16 v2, 0x28

    invoke-direct {p0, v1, v2}, LTest;->add(II)I

    .line 23
    invoke-virtual {p0}, LTest;->mFinal()V

    .line 24
    invoke-static {}, LTest;->mStatic()V

    .line 25
    new-instance v0, LPerson;

    invoke-direct {v0}, LPerson;-><init>()V

    .line 26
    .local v0, "person":LPerson;
    invoke-virtual {v0}, LPerson;->print()Ljava/lang/String;

    .line 27
    return-void
.end method

.method public final mFinal()V
    .registers 2

    .prologue
    .line 12
    const/16 v0, 0xa

    .line 13
    .local v0, "a":I
    return-void
.end method

Where the get method call looks like this

.method public get()V
    .registers 4

    .prologue
    .line 22
    const/16 v1, 0x1e

    const/16 v2, 0x28

    invoke-direct {p0, v1, v2}, LTest;->add(II)I

    .line 23
    invoke-virtual {p0}, LTest;->mFinal()V

    .line 24
    invoke-static {}, LTest;->mStatic()V

    .line 25
    new-instance v0, LPerson;

    invoke-direct {v0}, LPerson;-><init>()V

    .line 26
    .local v0, "person":LPerson;
    invoke-virtual {v0}, LPerson;->print()Ljava/lang/String;

    .line 27
    return-void
.end method

You can see that mStatic is called with invoke-static, the private add method is called with invoke-direct, and both mFinal and Person's print method are called with invoke-virtual.