Many people use Java, and many use Kotlin, but the class files they compile to get little attention, because they are hard to read and rarely matter in day-to-day business development. That does not make them useless; they are just unfamiliar.
On to the main text.
View the class file
First, we need to write a Java file:
public class Test {
    public void sayHello() {
        System.out.println("Hello");
    }
}
Then compile it with javac (javac Test.java), which produces Test.class.
Don't worry if that is unclear; I have even drawn the flow chart for you:
Open with Vim:
vim -b Test.class
What happened? Why does this class file look nothing like what we expected?
Ok, let’s switch to hexadecimal:
:%!xxd
It’s much more comfortable. All right, let’s get started.
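If you would rather stay in Java than in Vim, a hex view is easy to produce yourself. The following is only a minimal sketch: the class name HexDump is made up for illustration, the layout only loosely resembles what xxd prints, and the default file name assumes Test.class is in the working directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class HexDump {
    /** Formats bytes as rows of 16 two-digit hex values, each row prefixed by its offset. */
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i % 16 == 0) {
                if (i > 0) {
                    sb.append('\n');
                }
                sb.append(String.format("%08x:", i));
            }
            // Mask with 0xff so that bytes above 0x7f are not printed as negative numbers.
            sb.append(String.format(" %02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Test.class is in the working directory unless a path is given.
        String file = args.length > 0 ? args[0] : "Test.class";
        System.out.println(dump(Files.readAllBytes(Paths.get(file))));
    }
}
```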
Class file structure
First, consider what compilation does: Java code is compiled into bytecode, so the class file must carry everything the Java code expresses. Given that, what is the most efficient way to store the data? Separate items with newlines? Use special delimiter characters? Neither is ideal.
A class file is actually stored as a byte stream divided into sections, like a stalk of bamboo: each section holds a different kind of data, and the sections need not be the same length. It looks roughly like this:
Some readers may ask: if it is divided into sections like bamboo, how do we know how long each section is?
There are generally two solutions:
- Fixed number of bytes
- A fixed number of bytes is used at the beginning of the section to indicate the length of the section
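Both strategies can be sketched in a few lines. This is only an illustration on a made-up byte stream, not a class file parser; the class name SectionReader and the sample bytes are assumptions. It relies on java.nio.ByteBuffer, which reads big-endian by default, the same byte order the class file format uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SectionReader {
    /** Strategy 1: the section always has the same known size (here 4 bytes). */
    static int readFixed(ByteBuffer buf) {
        return buf.getInt();
    }

    /** Strategy 2: a fixed 2-byte prefix says how many bytes of data follow. */
    static byte[] readLengthPrefixed(ByteBuffer buf) {
        int length = buf.getShort() & 0xFFFF; // treat the prefix as unsigned
        byte[] body = new byte[length];
        buf.get(body);
        return body;
    }

    public static void main(String[] args) {
        // Made-up stream: one fixed 4-byte section, then a length-prefixed section.
        byte[] stream = {0, 0, 0, 42, 0, 2, 'h', 'i'};
        ByteBuffer buf = ByteBuffer.wrap(stream);
        System.out.println(readFixed(buf));
        System.out.println(new String(readLengthPrefixed(buf), StandardCharsets.US_ASCII));
    }
}
```

Neither strategy needs a delimiter, which is why the data itself may contain any byte value.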
Divided this way, the class file can be described as follows:
The magic number
Always 4 bytes, and always the value cafebabe. Some of you might wonder: why cafebabe?
The Java logo (a cup of coffee) and Google Translate should answer that.
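Checking the magic number is a one-liner once the first four bytes are read as a big-endian int. A minimal sketch, with the class and method names (MagicCheck, looksLikeClassFile) invented for illustration:

```java
import java.nio.ByteBuffer;

final class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;

    /** Returns true if the data starts with the 4-byte class file magic number. */
    static boolean looksLikeClassFile(byte[] data) {
        if (data.length < 4) {
            return false;
        }
        // ByteBuffer is big-endian by default, matching the class file byte order.
        return ByteBuffer.wrap(data).getInt() == MAGIC;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println(looksLikeClassFile(header));
    }
}
```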
The version number
Fixed 4 bytes, the first two bytes being the minor version and the last two bytes being the major version:
Minor version: 0x0000. Major version: 0x0034, which is 52 in decimal.
Let’s check the table:
So this class file was compiled for Java SE 8.
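The lookup can also be done in code. A small sketch, assuming the bytes start at the beginning of the class file; VersionDecoder and its method names are made up. The mapping relies on major version 45 belonging to JDK 1.1 with each later release adding one, so subtracting 44 gives the release number.

```java
import java.nio.ByteBuffer;

final class VersionDecoder {
    /** Reads minor_version and major_version, the 4 bytes right after the magic number. */
    static int[] readVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        buf.position(4);                     // skip the magic number
        int minor = buf.getShort() & 0xFFFF; // two bytes, unsigned
        int major = buf.getShort() & 0xFFFF;
        return new int[] {minor, major};
    }

    /** Maps a major version to a release number: 52 becomes 8, 55 becomes 11, and so on. */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        // The first 8 bytes of the Test.class discussed above.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        int[] version = readVersion(header);
        System.out.println("minor=" + version[0] + " major=" + version[1]
                + " release=" + javaRelease(version[1]));
    }
}
```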
Constant pool
Variable length. The number of tables that follow is determined by the first two bytes (the stored count is one more than the number of entries, because indexes start at 1).
The constant pool here has nothing to do with the constants or static variables in our Java code. It is a pool made up of tables like these:
Let me pull out the CONSTANT_Utf8_info table as an example:
- tag: one byte that identifies the type of table. Every constant pool table begins with this tag
- length: two bytes, giving the number of bytes that follow
- bytes[length]: length bytes that hold the actual data
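The three fields above map directly onto a few reads from the stream. Here is a hedged sketch that decodes a single hand-built CONSTANT_Utf8_info entry; the class name Utf8EntryReader is invented. It leans on DataInputStream.readUTF, which happens to expect exactly this shape: a two-byte length followed by that many bytes of modified UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class Utf8EntryReader {
    static final int CONSTANT_UTF8 = 1; // tag value of CONSTANT_Utf8_info

    /** Decodes one entry laid out as tag (1 byte), length (2 bytes), bytes[length]. */
    static String read(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();
            if (tag != CONSTANT_UTF8) {
                throw new IllegalArgumentException("not a Utf8 entry, tag=" + tag);
            }
            // readUTF consumes the 2-byte length, then that many bytes of modified UTF-8.
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hand-built entry for "Hello": tag 01, length 0005, then the 5 data bytes.
        byte[] entry = {0x01, 0x00, 0x05, 'H', 'e', 'l', 'l', 'o'};
        System.out.println(read(entry));
    }
}
```

The "table" is nothing more than the order in which these reads happen on the byte stream.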
At this point, some of you might be confused.
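Huh? Weren't we talking about bytecode? Isn't bytecode stored as a stream? What does that have to do with tables, and how can a table represent a stream of data?
Let me make it simple: a table is just a description of how a run of consecutive bytes in the stream is laid out, field after field.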
Access flags
Two bytes that mark the access information of the current class file. For example: is it a class, an interface, or an enum? Does it carry the public modifier? The possible values are:
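The class-level flags are bit masks, so several can be set at once. The sketch below lists the masks defined by the JVM specification for classes (ACC_MODULE, added in Java 9, is left out) and decodes a flags value into names; the class name AccessFlags and the describe method are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class AccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Returns the names of all class access flags set in the two-byte value. */
    static List<String> describe(int flags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 0x0021 is what javac writes for an ordinary public class such as Test.
        System.out.println(describe(0x0021));
    }
}
```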
Class/superclass/interface:
- Class: two bytes holding the class index, which identifies a table in the constant pool. For example, 0x0007 refers to table #7 in the constant pool
- Superclass: two bytes holding the superclass index. Same as above.
- Interface: variable length. The first two bytes give the number of interface indexes; the rest are the interface indexes, two bytes each.
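Read in order, these three items look like the following sketch. It assumes the buffer is already positioned just after the access flags; ClassRefsReader, ClassRefs and the sample bytes are invented for illustration, and resolving the indexes against the constant pool is left out.

```java
import java.nio.ByteBuffer;

final class ClassRefsReader {
    /** The values read; every index points at a table in the constant pool. */
    static final class ClassRefs {
        final int thisClass;
        final int superClass;
        final int[] interfaces;

        ClassRefs(int thisClass, int superClass, int[] interfaces) {
            this.thisClass = thisClass;
            this.superClass = superClass;
            this.interfaces = interfaces;
        }
    }

    /** Reads one unsigned two-byte value. */
    static int u2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;
    }

    /** Reads the class index, the superclass index, the interface count and the interface indexes. */
    static ClassRefs read(ByteBuffer buf) {
        int thisClass = u2(buf);
        int superClass = u2(buf);
        int[] interfaces = new int[u2(buf)];
        for (int i = 0; i < interfaces.length; i++) {
            interfaces[i] = u2(buf);
        }
        return new ClassRefs(thisClass, superClass, interfaces);
    }

    public static void main(String[] args) {
        // Made-up bytes: class #7, superclass #2, one interface at #9.
        byte[] bytes = {0x00, 0x07, 0x00, 0x02, 0x00, 0x01, 0x00, 0x09};
        ClassRefs refs = read(ByteBuffer.wrap(bytes));
        System.out.println(refs.thisClass + " " + refs.superClass + " " + refs.interfaces.length);
    }
}
```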
Field description set
Variable length. The first two bytes give the number of field tables, followed by the field tables themselves. Each field table has this layout:
- access_flags: fixed two bytes, the field's access flags, which include the following:
- name_index: fixed two bytes, the field name index, identifying a table in the constant pool.
- descriptor_index: fixed two bytes, the field descriptor index, identifying a table in the constant pool.
- attributes_count: fixed two bytes, giving the number of attribute tables that follow.
- attributes[attributes_count]: the attribute tables:
I won't go further here; it is all the same pattern, and you can look up the details if you are interested.
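To show that it really is the same pattern, here is a rough sketch that reads the field tables and simply skips over each field's attributes. It assumes the buffer is positioned at the two-byte field count; FieldTableReader and FieldInfo are made-up names. Each attribute starts with a two-byte name index and a four-byte length, which is what makes skipping possible without understanding the attribute. Method tables use the same layout.

```java
import java.nio.ByteBuffer;

final class FieldTableReader {
    /** The fixed-size part of one field table. */
    static final class FieldInfo {
        final int accessFlags;
        final int nameIndex;
        final int descriptorIndex;
        final int attributesCount;

        FieldInfo(int accessFlags, int nameIndex, int descriptorIndex, int attributesCount) {
            this.accessFlags = accessFlags;
            this.nameIndex = nameIndex;
            this.descriptorIndex = descriptorIndex;
            this.attributesCount = attributesCount;
        }
    }

    static int u2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;
    }

    /** Reads the field count and every field table, skipping attribute bodies. */
    static FieldInfo[] readFields(ByteBuffer buf) {
        FieldInfo[] fields = new FieldInfo[u2(buf)];
        for (int i = 0; i < fields.length; i++) {
            int accessFlags = u2(buf);
            int nameIndex = u2(buf);
            int descriptorIndex = u2(buf);
            int attributesCount = u2(buf);
            for (int a = 0; a < attributesCount; a++) {
                u2(buf);                               // attribute name index, ignored here
                int length = buf.getInt();             // four-byte attribute length
                buf.position(buf.position() + length); // jump over the attribute body
            }
            fields[i] = new FieldInfo(accessFlags, nameIndex, descriptorIndex, attributesCount);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up data: one private field (#5, descriptor #6) with one 2-byte attribute.
        byte[] bytes = {0, 1, 0, 2, 0, 5, 0, 6, 0, 1, 0, 7, 0, 0, 0, 2, 0, 8};
        FieldInfo[] fields = readFields(ByteBuffer.wrap(bytes));
        System.out.println(fields.length + " field(s), name index " + fields[0].nameIndex);
    }
}
```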
Method description set
- methods_count: fixed two bytes, giving the number of method tables that follow.
- methods[methods_count]: the method tables
Attribute description set
- attributes_count: fixed two bytes, giving the number of attribute tables that follow.
- attributes[attributes_count]: the attribute tables:
As long as you understood what I explained earlier, the rest follows the same pattern. There is no need to memorize it; knowing the overall logical structure is enough. If you are interested, you can read the details through the link: document
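Every attribute table shares the same outer shape: a two-byte name index into the constant pool, a four-byte length, then that many bytes of content. A minimal sketch that reads this generic shape without interpreting the content; AttributeReader and Attribute are hypothetical names, and the buffer is assumed to sit at the two-byte attribute count.

```java
import java.nio.ByteBuffer;

final class AttributeReader {
    /** One raw attribute: where its name lives in the constant pool, plus its bytes. */
    static final class Attribute {
        final int nameIndex;
        final byte[] info;

        Attribute(int nameIndex, byte[] info) {
            this.nameIndex = nameIndex;
            this.info = info;
        }
    }

    /** Reads the attribute count and then each attribute's name index, length and bytes. */
    static Attribute[] readAttributes(ByteBuffer buf) {
        Attribute[] attributes = new Attribute[buf.getShort() & 0xFFFF];
        for (int i = 0; i < attributes.length; i++) {
            int nameIndex = buf.getShort() & 0xFFFF;
            // The length is four bytes; this sketch assumes it fits in a positive int.
            byte[] info = new byte[buf.getInt()];
            buf.get(info);
            attributes[i] = new Attribute(nameIndex, info);
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Made-up data: one attribute named by constant #10 with a 2-byte body.
        byte[] bytes = {0, 1, 0, 10, 0, 0, 0, 2, 0, 11};
        Attribute[] attributes = readAttributes(ByteBuffer.wrap(bytes));
        System.out.println(attributes[0].nameIndex + " " + attributes[0].info.length);
    }
}
```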
Bytecode isn’t that hard after all, just follow the rules.
However, hexadecimal bytecode is not meant for developers to read. We can disassemble it with javap to make it clearer.
Decompiling the class file
javap -v Test.class
See whether the output matches the structure described above. Some parts are omitted because the class has no data for them, such as the field description set (Test declares no fields).