
Class file structure in-depth parsing

What is a class file

A class file is a file format that the JVM can recognize, load, and execute. It is similar to the MP3 format in that it is meant for one kind of consumer: an MP3 file is played by a media player, whereas a class file is loaded by the JVM.

Java is not the only language that can generate class files. Other JVM languages, such as Kotlin, Scala, and Groovy, compile to class files as well.

How do I generate a class file

Build in an IDE: the IDE generates the class files automatically, and its Run action executes them

Generate class files with the javac command and execute them with the java command

Function of the class file

A class file records all the information about a class, including the class name, methods, variables, and so on. It holds more than what is written explicitly in the Java source code. For example, we never define this and super in a class, but we can use them because the compiler records the current class and its superclass in the class file.

Class file structure

A binary stream made up of 8-bit bytes

Data items are packed tightly in order, with no gaps between them. Some other file formats leave gaps to make the data easier to read; class files do not, which keeps them small.

Each class or interface occupies a separate class file

The class file format stores data in a structure similar to a C struct. This structure has only two kinds of data: unsigned numbers and tables. Unsigned numbers are the basic data types u1, u2, u4, and u8, representing 1, 2, 4, and 8 bytes respectively. They can describe numbers, index references, quantity values, or UTF-8 encoded string values. Tables are compound data structures composed of multiple unsigned numbers or other tables, and their names all end in _info. Tables describe hierarchical, compound data. (In my view, a table is more like a struct.)

Let’s look at the structure

| Type | Name | Count | Description |
|---|---|---|---|
| u4 | magic | 1 | Magic number (0xCAFEBABE) that identifies the file as a class file |
| u2 | minor_version | 1 | Minor version of the class file format |
| u2 | major_version | 1 | Major version of the class file format; a JVM must support at least this version to load the file |
| u2 | constant_pool_count | 1 | Number of entries in the constant pool, plus one |
| cp_info | constant_pool | constant_pool_count - 1 | The constant pool itself, which holds a lot of content, as explained below |
| u2 | access_flags | 1 | Access flags, such as whether the class is public, final, and so on (see the access_flags section below) |
| u2 | this_class | 1 | Constant pool index of the current class. The compiler adds this item when it generates the class file, which is why we can use this without defining it |
| u2 | super_class | 1 | Constant pool index of the superclass, recorded in the same way as this_class |
| u2 | interfaces_count | 1 | Number of entries in interfaces |
| u2 | interfaces | interfaces_count | The interfaces implemented by the current class. Only direct interfaces are counted; interfaces implemented by the parent class are not |
| u2 | fields_count | 1 | Number of entries in fields |
| field_info | fields | fields_count | All member variables of the current class, together with information such as their type and access flags |
| u2 | methods_count | 1 | Number of entries in methods |
| method_info | methods | methods_count | All methods of the class, together with their additional information |
| u2 | attributes_count | 1 | Number of entries in attributes |
| attribute_info | attributes | attributes_count | Any class-level information not covered above, such as annotations |

If we look at this table, the class file defines a lot of fields, and these fields contain a lot of content. Through these fields, the JVM can find all the contents of our class
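The first rows of the table can be read back with a few lines of Java. This is only a minimal sketch: the class name ClassHeaderSketch, the Header holder, and the hand-written sample bytes are assumptions made for illustration and do not come from a real class file.

```java
import java.nio.ByteBuffer;

public class ClassHeaderSketch {
    static final int MAGIC = 0xCAFEBABE;

    /** Holds the items that follow the magic number, in file order. */
    static final class Header {
        final int minorVersion;      // u2
        final int majorVersion;      // u2
        final int constantPoolCount; // u2

        Header(int minorVersion, int majorVersion, int constantPoolCount) {
            this.minorVersion = minorVersion;
            this.majorVersion = majorVersion;
            this.constantPoolCount = constantPoolCount;
        }
    }

    static Header readHeader(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt(); // u4
        if (magic != MAGIC) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;   // u2, masked so it stays unsigned
        int major = buf.getShort() & 0xFFFF;   // u2
        int cpCount = buf.getShort() & 0xFFFF; // u2
        return new Header(minor, major, cpCount);
    }

    public static void main(String[] args) {
        // Hypothetical sample: magic, minor 0, major 0x34, constant_pool_count 0x1D.
        byte[] sample = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x00,
            0x00, 0x34,
            0x00, 0x1D
        };
        Header h = readHeader(sample);
        System.out.println("minor=" + h.minorVersion
                + " major=" + h.majorVersion
                + " constant_pool_count=" + h.constantPoolCount);
    }
}
```

To try it on a real file, the bytes could be loaded with java.nio.file.Files.readAllBytes and passed to readHeader in the same way.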

access_flags: access flags

This diagram shows the values that the access_flags field can hold.
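Because access_flags is a bit mask, several flags can be set at once. The sketch below decodes a value into flag names using the class-level flag values defined in the JVM specification. The class name AccessFlagsSketch and the decode helper are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class AccessFlagsSketch {
    // Class-level flag masks and their names, in ascending order.
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Returns the names of all flags whose bit is set in accessFlags. */
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // 0x0021 is a typical value for an ordinary public class.
        System.out.println(decode(0x0021));
        // 0x0601 is a typical value for a public interface.
        System.out.println(decode(0x0601));
    }
}
```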

constant_pool: the constant pool. Almost all of the symbolic information in a class is stored here. Some common entry types:

CONSTANT_Integer_info // An int constant

CONSTANT_Long_info // A long constant

CONSTANT_String_info // A string constant

CONSTANT_Class_info // Class information, such as a reference to the class by name

CONSTANT_Fieldref_info // A reference to a field of a class

CONSTANT_Methodref_info // A reference to a method of a class

You can view the contents of the class file using a tool called 010 Editor.

In struct cp_info constant_pool[0], each u2 is a two-byte unsigned number. For example, u2 class_index is an index into the constant pool that identifies the class this method belongs to.

Notice the u1 tag:

There are 14 types of constant pool entry (as of Java 8). Each one is a table with its own layout, and they all share one feature: every entry starts with a tag, a u1 unsigned number that identifies its type, as shown in the following table:

For struct cp_info constant_pool[0], the u1 tag shows that the entry is a Methodref. In the same way, we can look up any u1 tag to see which type it represents.
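The tag lookup described above can be written as a small function. The tag values are the ones defined for these 14 entry types in the JVM specification; the class name ConstantTagSketch and the method tagName are hypothetical names used only for this sketch.

```java
public class ConstantTagSketch {
    /** Maps the u1 tag at the start of a cp_info entry to its table name. */
    static String tagName(int tag) {
        switch (tag) {
            case 1:  return "CONSTANT_Utf8_info";
            case 3:  return "CONSTANT_Integer_info";
            case 4:  return "CONSTANT_Float_info";
            case 5:  return "CONSTANT_Long_info";
            case 6:  return "CONSTANT_Double_info";
            case 7:  return "CONSTANT_Class_info";
            case 8:  return "CONSTANT_String_info";
            case 9:  return "CONSTANT_Fieldref_info";
            case 10: return "CONSTANT_Methodref_info";
            case 11: return "CONSTANT_InterfaceMethodref_info";
            case 12: return "CONSTANT_NameAndType_info";
            case 15: return "CONSTANT_MethodHandle_info";
            case 16: return "CONSTANT_MethodType_info";
            case 18: return "CONSTANT_InvokeDynamic_info";
            default: return "unknown tag " + tag;
        }
    }

    public static void main(String[] args) {
        // A Methodref entry, like constant_pool[0] in the text, starts with tag 10.
        System.out.println(tagName(10));
    }
}
```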

The counts match the structure table above in the same way: for example, methods_count = 2 is followed by two method_info structures.

Drawbacks of class files

Large memory footprint, which makes them a poor fit for mobile devices

Stack-based loading model, so loading is slow

Heavy file I/O and slow class lookup: every time a class is loaded, its file has to be found and read

Dex file structure in-depth parsing

What is a dex file

A dex file is a file format that the DVM (Dalvik virtual machine) can recognize, load, and execute. Dex files can also be generated in C and C++.

How to generate a DEX file

Build in an IDE, which generates the dex file for us automatically

Manually generate the dex file by running the dx command

Function of dex file

A dex file records the information of all class files in the entire project, whereas a class file records only the information of a single class.

Dex file structure

A binary stream made up of 8-bit bytes

Data items are packed tightly in order, with no gaps between them

All classes compiled from the application's Java source files are placed in one dex file; multidex is not considered here

Dex file header

  1. magic: generally called the magic number. It determines whether the current dex file is valid. Its size is 8 bytes, that is, eight 1-byte unsigned numbers.

  2. checksum: the Adler-32 checksum of the dex file (everything after the magic and checksum fields), used to determine whether the file is damaged

  3. signature[]: a 20-byte SHA-1 hash of the rest of the file (everything after the signature field), used to verify the dex file

  4. file_size: the size of the whole file, occupying 4 bytes

  5. header_size: the size of the dex header, occupying 4 bytes

  6. endian_tag: specifies the byte order used in the dex file. The default value is 0x12345678

  7. link_size and link_off: the size of the link segment and its offset in the file, respectively; both are usually 0

  8. map_off: the file offset of the DexMapList

  9. string_ids_size and string_ids_off: the number of strings used in the dex file and the offset of the string index. In this example, converting the hexadecimal values to decimal gives a size of 16 and an offset of 112.
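The header fields listed above sit at fixed byte offsets and are stored in little-endian order, so they can be read with a ByteBuffer. The sketch below assumes the standard 0x70-byte header layout; the class DexHeaderSketch, its helper methods, and the synthetic header built by sampleHeader are hypothetical and exist only to make the example self-contained. The sample reuses the article's values (size 16, offset 112) but is not a loadable dex file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.Adler32;

public class DexHeaderSketch {
    static final int HEADER_SIZE = 0x70;

    // Byte offsets of some header fields.
    static final int CHECKSUM_OFF = 8;
    static final int FILE_SIZE_OFF = 32;
    static final int HEADER_SIZE_OFF = 36;
    static final int ENDIAN_TAG_OFF = 40;
    static final int STRING_IDS_SIZE_OFF = 56;
    static final int STRING_IDS_OFF_OFF = 60;

    /** Reads a 4-byte little-endian unsigned value at the given offset. */
    static long readU4(byte[] dex, int offset) {
        return ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN).getInt(offset) & 0xFFFFFFFFL;
    }

    /** Adler-32 over everything after the magic (8 bytes) and checksum (4 bytes). */
    static long computeChecksum(byte[] dex) {
        Adler32 adler = new Adler32();
        adler.update(dex, 12, dex.length - 12);
        return adler.getValue();
    }

    /** Builds a synthetic header for illustration only. */
    static byte[] sampleHeader() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        byte[] magic = {'d', 'e', 'x', '\n', '0', '3', '5', 0};
        buf.put(magic);
        buf.putInt(FILE_SIZE_OFF, HEADER_SIZE);
        buf.putInt(HEADER_SIZE_OFF, HEADER_SIZE);
        buf.putInt(ENDIAN_TAG_OFF, 0x12345678);
        buf.putInt(STRING_IDS_SIZE_OFF, 0x10); // 16 strings
        buf.putInt(STRING_IDS_OFF_OFF, 0x70);  // string index starts at 112
        byte[] bytes = buf.array();
        // The checksum is written last because it covers the fields set above.
        buf.putInt(CHECKSUM_OFF, (int) computeChecksum(bytes));
        return bytes;
    }

    public static void main(String[] args) {
        byte[] dex = sampleHeader();
        System.out.println("string_ids_size = " + readU4(dex, STRING_IDS_SIZE_OFF));
        System.out.println("string_ids_off  = " + readU4(dex, STRING_IDS_OFF_OFF));
        boolean intact = readU4(dex, CHECKSUM_OFF) == computeChecksum(dex);
        System.out.println("checksum matches: " + intact);
    }
}
```

Changing any byte after offset 12 makes the stored and computed checksums differ, which is how a damaged file is detected.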

In the hex view, the size field reads 10 00 00 00. Dex stores multi-byte values in little-endian order, so this is 10h, which is 16. The offset of 112 is 70h, so next we go to address 70h.

Now let’s look at 70h

All the selected bytes, starting at 70h and running through the row at A0h, are string offsets, by which we can find the strings. If you look at the second box above, which is the string index, you can see that it also starts at 70h and is 40h bytes long (16 entries of 4 bytes each), so it ends just before B0h. The string index is stored in this area. Looking at 70h, the first four bytes are 92 01 00 00; read in little-endian order, they give the offset address 0192h, so next we go to 0192h.

As you can see, I have selected 8 bytes in total, of which 6 are the string itself. The leading 06 is the length of the string, and the trailing 00 marks the end of the string. Let's convert the six bytes:

| Hexadecimal | 3C | 69 | 6E | 69 | 74 | 3E |
|---|---|---|---|---|---|---|
| Decimal | 60 | 105 | 110 | 105 | 116 | 62 |
| ASCII | < | i | n | i | t | > |


In this way we can find the hexadecimal string and convert it to the corresponding string.
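The same lookup can be sketched in Java: read the 4-byte offset from the string index, read the length prefix at that offset, then decode the bytes that follow. The class DexStringSketch and its sample buffer are hypothetical. The buffer only reproduces the two spots discussed above (the entry at 70h and the string at 0192h), and the sketch assumes ASCII-only strings, for which the length prefix equals the number of bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class DexStringSketch {
    /** Reads string number `index` through the string index at stringIdsOff. */
    static String readString(byte[] dex, int stringIdsOff, int index) {
        // Each string index entry is a 4-byte little-endian offset into the data area.
        int pos = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN)
                .getInt(stringIdsOff + 4 * index);

        // The length prefix is ULEB128: 7 bits per byte, high bit set on all but the last.
        int length = 0;
        int shift = 0;
        int b;
        do {
            b = dex[pos++] & 0xFF;
            length |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);

        // Assumption: ASCII only, so `length` characters occupy `length` bytes.
        if (dex[pos + length] != 0) {
            throw new IllegalArgumentException("string is not terminated by 00");
        }
        return new String(dex, pos, length, StandardCharsets.US_ASCII);
    }

    /** Synthetic buffer: entry 0 at 70h points to 0192h, where "<init>" is stored. */
    static byte[] sample() {
        byte[] dex = new byte[0x19A];
        ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN).putInt(0x70, 0x192);
        byte[] item = {0x06, 0x3C, 0x69, 0x6E, 0x69, 0x74, 0x3E, 0x00};
        System.arraycopy(item, 0, dex, 0x192, item.length);
        return dex;
    }

    public static void main(String[] args) {
        System.out.println(readString(sample(), 0x70, 0));
    }
}
```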

The other strings work the same way. In practice we do not need to go through these steps by hand, because we can find the corresponding value directly from the index area. Take the example above:

The characters of the string start one byte after the offset, because the leading 06 is not part of the string; it only says that six more bytes follow.

The header fields after item 9 follow much the same pattern. The method above can be used to find the specific location for each of them.

The index area

Type index:

For example the index of the Hellow class, of Object, and of String. These are all the types that we reference.

Method prototype Index:

Field index:

Method index:

For example the main method above, the PrintStream print method, and so on. This area records the indices of the methods referenced by the current class, including inherited methods.

Class definition index:

Indexes all classes defined in the entire dex file

The map list:

A list of all the sections in the file, which can be used to cross-check the header

Data area

The data area holds the actual values that the indexes point to.

Finally, take a look at the overall layout of the entire dex file

Difference between Class and Dex

Each class file is a single table-like structure that records information about only one Java class.

A dex file is divided into three areas (header, index area, and data area) that store information about all the Java classes in the project, so the advantage of dex grows as the number of classes grows. Only one dex file is needed, and many areas can be shared between classes, which reduces the size of the dex file.

In essence, they are the same. Dex evolved from the class file format: class files contain a lot of redundant information, and dex removes that redundancy and merges what remains.


If this article has helped you, we are honored. If you find mistakes or have questions about the article, you are welcome to point them out!