This article is participating in “Java Theme Month – Java Development in Action”.
Class file structure in-depth parsing
What is a class file
A class file is a file format that the JVM can recognize, load, and execute. It is similar to the MP3 format in that it is only useful to a specific consumer: an MP3 file needs a player, and a class file needs a JVM to load it.
Java is not the only language that can generate class files. Other JVM languages, such as Kotlin, Scala, and Groovy, compile to class files as well.
How do I generate a class file
Class files are generated automatically by the IDE and executed through its Run action
Generate class files with the javac command and execute them with the java command
What a class file does
A class file records all the information about a class, including the class name, methods, fields, and so on. It holds more than what is written explicitly in the Java source code. For example, we never declare this or super in a class, yet we can use them; the compiler records the class itself and its direct superclass in the class file (the this_class and super_class entries).
Class file structure
A binary file made up of a stream of 8-bit bytes
The data items are packed one after another with no gaps or padding, unlike some file formats that leave space between items to make them easier to read. The benefit is that class files are small.
Each class or interface occupies a separate class file
The class file format uses a struct-like layout to store data. This layout has only two kinds of data: unsigned numbers and tables. Unsigned numbers are the basic data types u1, u2, u4, and u8, representing 1, 2, 4, and 8 bytes respectively. They can describe numbers, index references, quantity values, or UTF-8 encoded string values. Tables are compound data structures composed of multiple unsigned numbers or other tables, and by convention their names end in _info. Tables describe hierarchical, compound structured data. (But I think they are more like structs.)
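To make the unsigned types concrete, here is a minimal sketch of how u1, u2, and u4 values could be read from a byte array. The class name UnsignedReader and its methods are hypothetical helpers written for this article, not part of any JDK API. The one fact it relies on is that multi-byte items in a class file are stored in big-endian order, high byte first.

```java
/** Hypothetical helper: reads big-endian u1/u2/u4 items from a byte array. */
final class UnsignedReader {
    private final byte[] data;
    private int pos;

    UnsignedReader(byte[] data) {
        this.data = data;
    }

    /** One unsigned byte, 0..255. */
    int u1() {
        return data[pos++] & 0xFF;
    }

    /** Two bytes, high byte first, 0..65535. */
    int u2() {
        return (u1() << 8) | u1();
    }

    /** Four bytes, high byte first; returned as long so the value stays non-negative. */
    long u4() {
        return ((long) u2() << 16) | u2();
    }

    public static void main(String[] args) {
        // The first eight bytes of a class file compiled for Java 8.
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        UnsignedReader reader = new UnsignedReader(bytes);
        // prints "magic=0xCAFEBABE minor=0 major=52"
        System.out.printf("magic=0x%08X minor=%d major=%d%n", reader.u4(), reader.u2(), reader.u2());
    }
}
```

The u4 value is returned as a long because Java has no unsigned 32-bit int; a table is then just a sequence of such reads in a fixed order.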
Let’s look at the structure
Type | Name | Count | Role |
---|---|---|---|
u4 (unsigned, four bytes) | magic | 1 | Magic number 0xCAFEBABE; identifies the file as a class file so the JVM can reject anything that is not one |
u2 | minor_version | 1 | Minor version of the class file format |
u2 | major_version | 1 | Major version of the class file format; a JVM can load the file only if it supports this version (for example, 52 corresponds to Java 8) |
u2 | constant_pool_count | 1 | Number of constant pool entries plus one; valid indexes run from 1 to constant_pool_count - 1 |
cp_info | constant_pool | constant_pool_count - 1 | The constant pool itself, which holds a lot of content, as explained below |
u2 | access_flags | 1 | Access flags, such as whether the class is public, final, an interface, and so on; described in the access_flags section below |
u2 | this_class | 1 | Constant pool index of the CONSTANT_Class_info entry for this class; the compiler writes it when the class file is generated |
u2 | super_class | 1 | Same as this_class, but for the direct superclass |
u2 | interfaces_count | 1 | Number of interfaces directly implemented by the class |
u2 | interfaces | interfaces_count | The interfaces implemented by the current class. Note: only direct ones are counted; indirect ones, such as interfaces implemented by the parent class, are not. |
u2 | fields_count | 1 | Number of fields |
field_info | fields | fields_count | All fields declared in the current class, together with other information such as their type and access flags |
u2 | methods_count | 1 | Number of methods |
method_info | methods | methods_count | All methods declared in the class, together with additional information |
u2 | attributes_count | 1 | Number of attributes |
attribute_info | attributes | attributes_count | Class-level attributes that are not covered above, such as annotations |
As this table shows, the class file defines many fields, and each field carries a lot of content. Through these fields the JVM can find everything it needs to know about our class.
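As a rough sketch of how the first rows of the table could be read in practice, the following code parses magic, minor_version, major_version, and constant_pool_count from a byte array. ClassHeader is a hypothetical class written for illustration, and the ten bytes in main are hand-written sample data rather than a complete class file. The javaVersion helper assumes the usual rule that major version 49 and above equals the Java release number plus 44.

```java
import java.nio.ByteBuffer;

/** Hypothetical holder for the first four items of a class file. */
final class ClassHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassHeader parse(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // ByteBuffer is big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("magic number mismatch: not a class file");
        }
        int minor = buf.getShort() & 0xFFFF; // u2 values are unsigned
        int major = buf.getShort() & 0xFFFF;
        int count = buf.getShort() & 0xFFFF;
        return new ClassHeader(minor, major, count);
    }

    /** Java release for major versions 49 and above (49 = Java 5, 52 = Java 8). */
    int javaVersion() {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        // Hand-written sample: magic, minor 0, major 52, constant_pool_count 16.
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34, 0, 0x10};
        ClassHeader header = ClassHeader.parse(sample);
        System.out.println("Java " + header.javaVersion()
                + ", constant pool entries: " + (header.constantPoolCount - 1));
    }
}
```

For a real file, the byte array could come from something like Files.readAllBytes on a compiled class; everything after these ten bytes depends on walking the constant pool first.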
access_flags: access flags
This diagram shows the possible values of the access_flags field.
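The flag values can also be decoded in code. The sketch below maps a class-level access_flags value to the flag names defined by the JVM specification; the AccessFlags class itself is a hypothetical helper, not a JDK API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder for the class-level access_flags bit mask. */
final class AccessFlags {
    // Bit masks and names for class-level flags, as listed in the JVM specification.
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000, 0x8000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE", "ACC_ABSTRACT",
        "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM", "ACC_MODULE"
    };

    /** Returns the names of all flags set in the given value, in bit order. */
    static List<String> decode(int flags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 0x0021 is what javac typically writes for a plain public class.
        System.out.println(decode(0x0021)); // prints [ACC_PUBLIC, ACC_SUPER]
    }
}
```

Because each flag is a single bit, a combination such as public final is simply the bitwise OR of the individual values.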
constant_pool: the constant pool. Several common entry types:
CONSTANT_Integer_info // An int constant
CONSTANT_Long_info // A long constant
CONSTANT_String_info // A string literal
// Almost all of the symbolic information of a class is stored in the constant pool.
CONSTANT_Class_info // A class or interface, referenced by name
CONSTANT_Fieldref_info // A reference to a field of a class
CONSTANT_Methodref_info // A reference to a method of a class
You can view the contents of the class file using a tool called 010 Editor.
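In struct cp_info constant_pool[0], u2 denotes a two-byte unsigned number. A field such as u2 class_index identifies the class to which the method belongs; its value is an index into the constant pool.
Notice the u1 tag:
As of Java 8 the constant pool has 14 entry types (later versions add a few more). Each type is a table with its own layout, and they all share one feature: every entry starts with a one-byte tag of type u1 that identifies its type, as shown in the following table:
Type | Tag |
---|---|
CONSTANT_Utf8_info | 1 |
CONSTANT_Integer_info | 3 |
CONSTANT_Float_info | 4 |
CONSTANT_Long_info | 5 |
CONSTANT_Double_info | 6 |
CONSTANT_Class_info | 7 |
CONSTANT_String_info | 8 |
CONSTANT_Fieldref_info | 9 |
CONSTANT_Methodref_info | 10 |
CONSTANT_InterfaceMethodref_info | 11 |
CONSTANT_NameAndType_info | 12 |
CONSTANT_MethodHandle_info | 15 |
CONSTANT_MethodType_info | 16 |
CONSTANT_InvokeDynamic_info | 18 |
In the 010 Editor view, struct cp_info constant_pool[0] is shown as a Methodref. Its u1 tag (10 for a Methodref) tells us which type the entry is.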
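The role of the u1 tag becomes clearer when walking the pool in code: the tag is the only way to know how many bytes each entry occupies. The sketch below walks the constant pool and then resolves this_class to a class name. ConstantPoolWalker is a hypothetical helper, the bytes in sampleClassPrefix are a hand-built fragment for a made-up class called Hello (not a complete class file), and names are decoded as plain UTF-8, which is an assumption that only holds for strings without embedded nulls or supplementary characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical walker: uses each entry's u1 tag to step through the constant pool. */
final class ConstantPoolWalker {
    static String thisClassName(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        buf.position(8);                              // skip magic, minor_version, major_version
        int count = buf.getShort() & 0xFFFF;          // constant_pool_count
        String[] utf8 = new String[count];
        int[] classNameIndex = new int[count];
        for (int i = 1; i < count; i++) {             // pool indexes start at 1
            int tag = buf.get() & 0xFF;
            switch (tag) {
                case 1: {                             // Utf8: u2 length, then the bytes
                    byte[] raw = new byte[buf.getShort() & 0xFFFF];
                    buf.get(raw);
                    utf8[i] = new String(raw, StandardCharsets.UTF_8);
                    break;
                }
                case 7:                               // Class: u2 name_index
                    classNameIndex[i] = buf.getShort() & 0xFFFF;
                    break;
                case 8: case 16: case 19: case 20:    // String, MethodType, Module, Package
                    skip(buf, 2);
                    break;
                case 15:                              // MethodHandle
                    skip(buf, 3);
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(buf, 4);                     // Integer, Float, the *ref types, NameAndType, (Invoke)Dynamic
                    break;
                case 5: case 6:                       // Long, Double take two pool slots
                    skip(buf, 8);
                    i++;
                    break;
                default:
                    throw new IllegalArgumentException("unknown constant pool tag " + tag);
            }
        }
        buf.getShort();                               // access_flags
        int thisClass = buf.getShort() & 0xFFFF;      // index of a Class entry
        return utf8[classNameIndex[thisClass]];
    }

    private static void skip(ByteBuffer buf, int bytes) {
        buf.position(buf.position() + bytes);
    }

    /** Hand-built fragment: header, two pool entries, access_flags, this_class, super_class. */
    static byte[] sampleClassPrefix() {
        return new byte[] {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34,
            0, 3,                                     // constant_pool_count = 3 (entries #1 and #2)
            7, 0, 2,                                  // #1 Class, name_index = #2
            1, 0, 5, 'H', 'e', 'l', 'l', 'o',         // #2 Utf8 "Hello"
            0, 0x21,                                  // access_flags
            0, 1,                                     // this_class = #1
            0, 0                                      // super_class (left as 0 in this fragment)
        };
    }

    public static void main(String[] args) {
        System.out.println(thisClassName(sampleClassPrefix())); // prints Hello
    }
}
```

Note that this_class holds an index, not a name: it points at a Class entry, which in turn points at the Utf8 entry holding the text.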
The same view shows methods_count (2 in this example) followed by the corresponding method_info structures, and the fields are laid out in the same way. This corresponds to the table above.
Drawbacks of class files
Large memory footprint, which makes them a poor fit for mobile devices
Stack-based loading model, which is slow
Many file I/O operations, so class lookup is slow: the class has to be found and read every time one is loaded
Dex file structure in-depth parsing
What is a dex file
A dex file is a file format that the DVM (Dalvik virtual machine) can recognize, load, and execute. Dex files can also be generated from C and C++.
How to generate a DEX file
Built automatically for us by the IDE
Generated manually by running the dx command (newer Android build tools use d8 instead)
What a dex file does
It records the information of all class files in the entire project, whereas a class file only records the information of a single class
Dex file structure
A binary stream file made up of 8-bit bytes
The data items are packed one after another with no gaps
All Java classes of the entire application are placed in a single dex file; multidex is not considered here
Dex file header
- magic: generally called the magic number; it determines whether the current dex file is valid. It occupies 8 bytes, that is, eight 1-byte unsigned numbers: the characters "dex" and a newline, followed by a version number and a terminating zero byte.
- checksum: the Adler-32 checksum of the dex file, computed over everything after this field and used to detect whether the file is damaged; occupies 4 bytes
- signature[]: a SHA-1 signature computed over everything after this field, used to verify and identify the dex file; occupies 20 bytes
- file_size: the size of the whole file, occupying 4 bytes
- header_size: the size of the dex header, occupying 4 bytes
- endian_tag: specifies the byte order used in the dex file. The default value is 0x12345678, which indicates little-endian.
- link_size and link_off: the size of the link section and its offset in the file, respectively; usually 0
- map_off: the file offset of the DexMapList
- string_ids_size and string_ids_off: the number of strings used in the dex file and the file offset of the string index. In this example the size converts to 16 and the offset to 112 (70h).
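The header fields listed above sit at fixed offsets, so they can be read directly. The following sketch reads a few of them and recomputes the checksum with java.util.zip.Adler32. DexHeader is a hypothetical class written for this article, and sampleHeader builds a header-only stand-in (112 bytes, signature left as zeros) that is not a loadable dex file; it only exists so the example can run without a real file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.Adler32;

/** Hypothetical reader for a few fixed-offset fields of the dex header. */
final class DexHeader {
    static final int HEADER_SIZE = 0x70;

    final long checksum;
    final int fileSize;
    final int headerSize;
    final int endianTag;
    final int stringIdsSize;
    final int stringIdsOff;

    private DexHeader(ByteBuffer buf) {
        checksum = buf.getInt(8) & 0xFFFFFFFFL; // u4 after the 8-byte magic
        fileSize = buf.getInt(32);              // after the 20-byte signature
        headerSize = buf.getInt(36);
        endianTag = buf.getInt(40);
        stringIdsSize = buf.getInt(56);         // after link_size, link_off, map_off
        stringIdsOff = buf.getInt(60);
    }

    static DexHeader parse(byte[] dex) {
        if (dex.length < HEADER_SIZE
                || dex[0] != 'd' || dex[1] != 'e' || dex[2] != 'x' || dex[3] != '\n') {
            throw new IllegalArgumentException("not a dex file");
        }
        // Dex files store multi-byte values in little-endian order.
        return new DexHeader(ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN));
    }

    /** Adler-32 over everything after the magic and checksum fields. */
    static long computeChecksum(byte[] dex) {
        Adler32 adler = new Adler32();
        adler.update(dex, 12, dex.length - 12);
        return adler.getValue();
    }

    /** Builds a header-only stand-in for demonstration; not a loadable dex file. */
    static byte[] sampleHeader() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(new byte[] {'d', 'e', 'x', '\n', '0', '3', '5', 0});
        buf.putInt(32, HEADER_SIZE);  // file_size (header only in this stand-in)
        buf.putInt(36, HEADER_SIZE);  // header_size
        buf.putInt(40, 0x12345678);   // endian_tag
        buf.putInt(56, 16);           // string_ids_size, as in the example above
        buf.putInt(60, 0x70);         // string_ids_off, as in the example above
        byte[] bytes = buf.array();
        buf.putInt(8, (int) computeChecksum(bytes));
        return bytes;
    }

    public static void main(String[] args) {
        byte[] dex = sampleHeader();
        DexHeader header = parse(dex);
        System.out.println("strings: " + header.stringIdsSize + " at offset " + header.stringIdsOff);
        System.out.println("checksum ok: " + (header.checksum == computeChecksum(dex)));
    }
}
```

The checksum can be written last because it does not cover its own four bytes; the signature field could be verified in a similar way with a SHA-1 digest over the bytes from offset 32 onward.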
The bytes 10 00 00 00 are stored in little-endian order, so they read as 10h, which is 16 in decimal; the trailing 00 bytes are simply the high-order bytes of the 4-byte value. Now let's look at offset 70h.
The selected bytes, starting at 70h and running through the row at A0h, are the string offsets, through which we can find each string. The second box above is the string index: it also starts at 70h and is 40h bytes long (16 entries of 4 bytes each), so it covers the rows up to A0h. The actual string offsets are stored in this area. At 70h the first four bytes are 92 01 00 00, which in little-endian order is the offset 0192h, so next we look at 0192h.
As you can see, I have selected 8 bytes in total, 6 of which are the string itself. The leading 06 is the length of the string, and the trailing 00 marks the end of the string. Let's convert the six bytes in between:
Hexadecimal | 3C | 69 | 6E | 69 | 74 | 3E |
---|---|---|---|---|---|---|
Decimal | 60 | 105 | 110 | 105 | 116 | 62 |
ASCII | < | i | n | i | t | > |
In this way we can locate the hexadecimal bytes of a string and convert them to the corresponding text.
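The manual lookup above can be sketched in code as follows. DexStrings is a hypothetical helper, and sampleDex builds a mostly empty buffer that only reproduces the two spots from the walkthrough: the offset 92 01 00 00 at 70h and the string bytes at 0192h. The length prefix is read as a ULEB128 value, which is how the dex format encodes it, and the characters are decoded as ASCII, an assumption that is enough for names like this one but not for arbitrary strings.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: follows a string index entry to its string data. */
final class DexStrings {
    static String stringAt(byte[] dex, int stringIdsOff, int index) {
        ByteBuffer buf = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN);
        int dataOff = buf.getInt(stringIdsOff + 4 * index); // each index entry is a 4-byte offset
        buf.position(dataOff);
        int length = readUleb128(buf);                      // the leading 06 in the example
        byte[] chars = new byte[length];                    // ASCII assumption: one byte per character
        buf.get(chars);
        if (buf.get() != 0) {
            throw new IllegalArgumentException("string is not terminated by a zero byte");
        }
        return new String(chars, StandardCharsets.US_ASCII);
    }

    /** Reads an unsigned LEB128 value: 7 bits per byte, high bit set means more bytes follow. */
    static int readUleb128(ByteBuffer buf) {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = buf.get() & 0xFF;
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    /** Stand-in buffer containing only the bytes discussed in the walkthrough. */
    static byte[] sampleDex() {
        byte[] dex = new byte[0x19A];
        ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN).putInt(0x70, 0x192); // 92 01 00 00
        byte[] data = {0x06, 0x3C, 0x69, 0x6E, 0x69, 0x74, 0x3E, 0x00};
        System.arraycopy(data, 0, dex, 0x192, data.length);
        return dex;
    }

    public static void main(String[] args) {
        System.out.println(stringAt(sampleDex(), 0x70, 0)); // prints <init>
    }
}
```

The second argument corresponds to string_ids_off from the header and the third is the position within the string index, so index 0 is the first entry at 70h.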
The other strings work the same way. We don't actually need to search by hand like this; we can find the corresponding value directly in the index area. Take the example above:
This entry is shown as starting at 163, because it does not include the leading 06 length byte; the six bytes of the string follow from there.
The rest, from 9 onward, follow much the same pattern. The method above can be used to find each specific location.
The index area
Type index:
For example, the index of the Hellow class, of Object, and of String. These are all the types that we reference.
Method prototype index:
Field index:
Method index:
For example, the main method above, the PrintStream print method, and so on. It records the methods referenced by the current class as well as the inherited methods.
Class definition index:
Indexes all classes defined in the entire dex file
The map list:
It lists the sections of the file and largely repeats the header information, so it can serve as a check on the header
Data area
The data area holds the actual values that the indexes point to.
Finally, take a look at the rough overall layout of the entire dex file
Difference between Class and Dex
Each class file is a single table that only records the information of the current Java class.
A dex file is divided into three areas (header, index area, and data area) that store the information of all Java classes in the project, so the advantage of dex grows as the number of classes grows. Only one dex file is needed, and many areas can be shared, which reduces the overall size.
In essence they are the same: dex evolved from the class file. Class files contain a lot of redundant information, and dex removes that redundancy and merges everything into one file.
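The size argument can be illustrated with a toy calculation. The sketch below compares how many constant entries are stored when every class keeps its own pool with how many are stored when all classes share one pool, as dex does. PoolSharing and the two sample classes A and B are made up for illustration, and the numbers say nothing about real file sizes; they only show where the savings come from.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy comparison of per-class constant pools and one shared pool. */
final class PoolSharing {
    /** Class-file style: every class stores its own copy of each constant. */
    static int separatePools(List<List<String>> classes) {
        int total = 0;
        for (List<String> constants : classes) {
            total += new HashSet<>(constants).size();
        }
        return total;
    }

    /** Dex style: one pool for the whole file, each constant stored once. */
    static int sharedPool(List<List<String>> classes) {
        Set<String> pool = new HashSet<>();
        for (List<String> constants : classes) {
            pool.addAll(constants);
        }
        return pool.size();
    }

    public static void main(String[] args) {
        // Two hypothetical classes that both extend Object and have a default constructor.
        List<List<String>> classes = Arrays.asList(
                Arrays.asList("A", "java/lang/Object", "<init>", "()V"),
                Arrays.asList("B", "java/lang/Object", "<init>", "()V"));
        System.out.println("separate pools: " + separatePools(classes)); // prints separate pools: 8
        System.out.println("shared pool: " + sharedPool(classes));       // prints shared pool: 5
    }
}
```

Strings such as java/lang/Object and <init> appear in nearly every class, which is why the shared pool pays off more as the project grows.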
If this article has been helpful to you, I am honored. If you find mistakes or have questions, you are welcome to point them out!