An overview of the

Why do you need to know about the Dex file

After understanding the Dex file, I can have a deeper understanding of some problems encountered in daily development. For example: APK slimming, hotfix, plug-in, application reinforcement, Android reverse engineering, 64 K method number limit.

What is a Dex file

Before you understand what a Dex file is, take a look at the JVM, Dalvik, and ART. The JVM is a JAVA virtual machine used to run JAVA bytecode programs. Dalvik is a runtime environment designed by Google for Android platform, which is suitable for systems with limited memory and processor speed in mobile environment. ART, or Android Runtime, is a new Android Runtime environment designed by Google to replace Dalvik and released in Android 4.4. ART performs better than Dalvik. Android programs are generally developed using Java language, but Dalvik VM does not support the direct execution of Java bytecode, so it will translate, reconstruct, interpret and compress the compiled.class files. This process is processed by DX. After the processing is complete, the generated product will end with.dex and is called dex file. Dex file format is a compression format designed for Dalvik. Dex files are the product of many.class files that can be executed in the Android runtime environment.

How is the Dex file generated

The process of converting Java code into DEX file is shown in the figure below. Of course, the real process is not so simple. Here is just a visual display:

Note: The picture is from the Internet

Now let’s use a simple example to convert Java code to a dex file.

From the Java to the class

Let’s create a hello.java file and write some simple code for analysis. The code is as follows:

public class Hello {
    private String helloString = "hello! youzan";

    public static void main(String[] args) {
        Hello hello = new Hello();
        hello.fun(hello.helloString);
    }

    public void fun(String a) { System.out.println(a); }}Copy the code

Compile the Java file using the JDK’s Javac in its sibling directory.

javac Hello
Copy the code

After the javac command is executed, a hello. class file is generated in the current directory. The hello. class file can be directly executed on the JVM. Use the use command to execute the file.

java Hello
Copy the code

Execution should print “Hello! Youzan”

It is also possible to disassemble the hello. class file by executing javap commands.

javap -c Hello
Copy the code

The result is as follows:

public class Hello {
  public Hello();
    Code:
       0: aload_0
       1: invokespecial #1 // Method java/lang/Object."
      
       ":()V
      
       4: aload_0
       5: ldc           #2 // String hello! youzan
       7: putfield      #3 // Field helloString:Ljava/lang/String;
      10: return

  public static void main(java.lang.String[]);
    Code:
       0: new           #4 // class Hello
       3: dup
       4: invokespecial #5 // Method "
      
       ":()V
      
       7: astore_1
       8: aload_1
       9: aload_1
      10: getfield      #3 // Field helloString:Ljava/lang/String;
      13: invokevirtual #6 // Method fun:(Ljava/lang/String;) V
      16: return

  public void fun(java.lang.String);
    Code:
       0: getstatic     #7 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: aload_1
       4: invokevirtual #8 // Method java/io/PrintStream.println:(Ljava/lang/String;) V
       7: return
}
Copy the code

Code is followed by specific instructions for the JVM to execute. You can refer to the official JAVA documentation for the specific meaning of this directive.

From the class to the dex

The.class files generated above are ready to run in the JVM, but there is a special processing required to run in the Android runtime: the DX processing, which translates, reconstructs, interprets, compresses, and so on.

The dx.jar file is located in your SDK root directory, build-tools, or any version of your SDK. Use the dx tool to process the hello. class file generated above. In the hello. class directory, use the following command:

dx --dex --output=Hello.dex Hello.class
Copy the code

After the command is executed, a hello. dex file is generated in the current directory. The.dex file can be executed directly in the Android runtime environment, usually through the PathClassLoader to load the dex file. Now perform dexdump naming in the current directory to decompile:

dexdump -d Hello.dex
Copy the code

The result is as follows (the meanings of some areas are described below) :

Processing 'Hello.dex'. Opened'Hello.dex', DEX version '035'------ here is the information about the hello. Java Class written ------ Class# 0 -
  Class descriptor  : 'LHello; '
  Access flags      : 0x0001 (PUBLIC)
  Superclass        : 'Ljava/lang/Object; '
  Interfaces        -
  Static fields     -
  Instance fields   -
    #0 : (in LHello;)
      name          : 'helloString'
      type          : 'Ljava/lang/String; 'Access: 0x0002 (PRIVATE) ------ The following field describes the constructor information. Numbers like 7010 0400 0100 1a00 0b00 are instructions translated from the code in the method. Dalvik uses a 16-bit code unit, so this is a group of four digits, each in hexadecimal. Invoke-direct These are the mnemonics corresponding to previous instructions and represent the actual operation of those instructions. If interested in these instructions can go to https://source.android.com/devices/tech/dalvik/instruction-formats to view -- -- -- -- -- -- Direct the methods#0 : (in LHello;)
      name          : '<init>'Method name: this is the obvious constructortype          : '()V'- method prototypes, () said the ginseng, behind () said the return value, V represents the void - access: 0 x10001 (PUBLIC CONSTRUCTOR) - method access type code - registers: 2 -- Number of registers used by a method -- ins: 1 -- Method input parameters. Methods take a special parameter by default in addition to the parameters we defined -- outs: 1 insns size: August 16 - bit code units - size - 000148 orders, | [000148] Hello. < init > : (V) : 000158 7010 0400 0100 | 0000: invoke-direct {v1}, Ljava/lang/Object; .<init>:()V // method@0004 00015e: 1a00 0b00 |0003: const-string v0,"hello! youzan"// string@000b 000162: 5b10 0000 |0005: iput-object v0, v1, LHello; .helloString:Ljava/lang/String; // field@0000 000166: 0e00 |0007:return-void
      catches       : (none)
      positions     :
        0x0000 line=1
        0x0003 line=2
      locals        :
        0x0000 - 0x0008 reg=1 this LHello;

    #1 : (in LHello;)
      name          : 'main'
      type          : '([Ljava/lang/String;)V'
      access        : 0x0009 (PUBLIC STATIC)
      code          -
      registers     : 3
      ins           : 1
      outs          : 2
      insns size    : 11 16-bit code units
000168:                                        |[000168] Hello.main:([Ljava/lang/String;)V
000178: 2200 0000                              |0000: new-instance v0, LHello; // type@0000 00017c: 7010 0000 0000 |0002: invoke-direct {v0}, LHello; .<init>:()V // method@0000 000182: 5401 0000 |0005: iget-object v1, v0, LHello; .helloString:Ljava/lang/String; // field@0000 000186: 6e20 0100 1000 |0007: invoke-virtual {v0, v1}, LHello; .fun:(Ljava/lang/String;) V // method@0001 00018c: 0e00 |000a:return-void
      catches       : (none)
      positions     :
        0x0000 line=5
        0x0005 line=6
        0x000a line=7
      locals        :

  Virtual methods   -
    #0 : (in LHello;)
      name          : 'fun'
      type          : '(Ljava/lang/String;) V'access : 0x0001 (PUBLIC) code - registers : 3 ins : 2 outs : 2 insns size : 6 16-bit code units 000190: |[000190] Hello.fun:(Ljava/lang/String;) V 0001a0: 6200 0100 |0000: sget-object v0, Ljava/lang/System; .out:Ljava/io/PrintStream; // field@0001 0001a4: 6e20 0300 2000 |0002: invoke-virtual {v0, v2}, Ljava/io/PrintStream; .println:(Ljava/lang/String;) V // method@0003 0001aa: 0e00 |0005:return-void
      catches       : (none)
      positions     :
        0x0000 line=10
        0x0005 line=11
      locals        :
        0x0000 - 0x0006 reg=1 this LHello;

  source_file_idx   : 1 (Hello.java)
Copy the code

At this point, you have completed the transformation of the Java code into a Dalvik executable file, the dex.

Dex File format

Now let’s analyze the specific formats of Dex files. Just like MP3, MP4, JPG, PNG files, Dex files also have their own formats. Only by following these formats can they be correctly recognized by the Android runtime environment.

The overall layout of the Dex file is as follows:

The following will be a brief introduction to the file header, index area and class definition area respectively. Other areas can be found on the Android website.

The file header

The header area determines how the file is read. The format is shown in the following table (the order in which files are arranged is the order in the table below) :

Id area

The ID area stores the offset of the real data of string, type, prototype, field and method resources in the file. We can find the real data corresponding to the ID according to the offset of the ID area.

String ID field

The block is a list of offsets, each of which corresponds to a real string resource, each of which is 32 bits. We can find the corresponding actual string data by the offset. The specific format is as follows:

Type the id area

This block is a list of indexes whose values correspond to an item in the list of offsets for the string ID field. Data format is as follows:

Method prototype ID field

This block is a list of method prototype ids in the format:

Member id area

This block stores a list of prototype ids in the format:

Methods to id

This block stores a list of method ids in the format: this block stores a list of prototype ids in the format:

The class definition area

This section stores a list of class definitions with the following data structure:

Tool for parsing dex files

010 Editor, a tool that can parse dex files, is recommended. It allows us to understand the format of dex file more clearly through the preset template.

Application of Dex file in Android Tinker hotfix

Among the current mainstream Android hotfix solutions, Tinker has the advantages of free, open source and large number of users, so Youzan also builds Android hotfix services based on Tinker. The main principle of Tinker hot repair is to generate patch packages by comparing the dex files of the old APK and the dex files of the new APK, and then synthesize the new DEX files through the patch packages and the dex files of the old APK in APP. The process is as follows:

Note: The image is from Tinker website

Generation of patch packages

Tinker officially uses its own synthetic solution, DexDiff. Based on the Dex file format, it has the advantages of small patch package and small memory consumption. In the DexDiff algorithm, Dex files are divided into different block classes according to the format of Dex files, as shown in the following figure:

private void readHeader(Dex.Section headerIn) throws UnsupportedEncodingException {
 byte[] magic = headerIn.readByteArray(8); 
 int apiTarget = DexFormat.magicToApi(magic);
 checksum = headerIn.readInt(); 
 signature = headerIn.readByteArray(20);
 fileSize = headerIn.readInt();
 int headerSize = headerIn.readInt();
 int endianTag = headerIn.readInt();
 linkSize = headerIn.readInt();
 linkOff = headerIn.readInt();
 mapList.off = headerIn.readInt();
 stringIds.size = headerIn.readInt();
 stringIds.off = headerIn.readInt();
 typeIds.size = headerIn.readInt();
 typeIds.off = headerIn.readInt();
 protoIds.size = headerIn.readInt();
 protoIds.off = headerIn.readInt();
 fieldIds.size = headerIn.readInt();
 fieldIds.off = headerIn.readInt();
 methodIds.size = headerIn.readInt();
 methodIds.off = headerIn.readInt();
 classDefs.size = headerIn.readInt();
 classDefs.off = headerIn.readInt();
 dataSize = headerIn.readInt();
 dataOff = headerIn.readInt();
}
Copy the code

After reading the specific data of each block of the old and new Dex files from the file, the patch package can be compared and generated. Because the data structure of each block is inconsistent, each block has its own diFF algorithm to deal with patch generation and synthesis of each block. The algorithm list is as follows:

Patch composition

After receiving the patch file, the client uses the same reading method to convert the old Dex file to the relevant data structure, and then uses the operation instructions in the patch package to modify the old Dex data and generate the new Dex data. Finally, the data is written into the file and the new Dex file is generated. In this way, the patch is synthesized.

Write in the last

There is nothing particularly in-depth in this article, nor is there a complete description of the dex file format. Mainly to share a dex file structure, and some practical applications. Let everyone encounter related problems in the future, you can have some direction to understand the DEX file, and then solve the problem. Finally, if you have any suggestions or comments, welcome feedback.

The resources

  • Android Official Information
  • Tinker introduction
  • Dalvik versus Java bytecode