Bytecode structure analysis

Many people write Java, and many write Kotlin, but the class files they compile to get little attention: they are hard to read and rarely matter in day-to-day business development. That does not make them useless, though. They only seem that way until you understand them.

Let's begin.

View the class file

First, we need to write a Java file:

public class Test {
    public void sayHello() {
        System.out.println("Hello");
    }
}

Then compile it with javac (javac Test.java), which produces Test.class.

In case that is not clear, I have even drawn a flow chart for you:

Open with Vim:

vim -b Test.class

What is this? Why does the class file look nothing like the source file we just wrote?

Ok, let’s switch to hexadecimal:

:%!xxd

That is much easier to read. All right, let's get started.

Class file structure

First, consider what compiling Java code into bytecode implies: the bytecode must contain everything that was in the Java source. So what is the most efficient way to store that data? Separate items with newlines? Use special delimiter characters? Neither seems ideal.

Bytecode is actually stored as a continuous stream, divided into segments like a stalk of bamboo. Each segment stores a different kind of data, and the segments are not necessarily the same length. Roughly, it is stored like this:

Some readers may ask: if the file is a series of segments like bamboo, how do we know how long each segment is?

There are generally two solutions:

Fixed number of bytes
A fixed number of bytes is used at the beginning of the section to indicate the length of the section
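
The two approaches can be sketched in Java as follows. This is only an illustration: the class SectionReader and its methods are hypothetical names, and the sample layout (one 4-byte section followed by one section with a 2-byte length prefix) is an assumption chosen to mirror what the class file does later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only, not part of any library.
class SectionReader {

    // Approach 1: the section always occupies a fixed number of bytes (4 here).
    static int readFixed(ByteBuffer in) {
        return in.getInt(); // consumes exactly 4 bytes, big-endian
    }

    // Approach 2: a fixed 2-byte prefix says how many bytes the section holds.
    static String readLengthPrefixed(ByteBuffer in) {
        int length = in.getShort() & 0xFFFF; // treat the prefix as unsigned
        byte[] data = new byte[length];
        in.get(data);                        // consumes exactly `length` bytes
        return new String(data, StandardCharsets.US_ASCII);
    }

    // Reads one fixed section followed by one length-prefixed section.
    static String describe(byte[] stream) {
        ByteBuffer in = ByteBuffer.wrap(stream);
        int fixed = readFixed(in);
        String text = readLengthPrefixed(in);
        return Integer.toHexString(fixed) + ":" + text;
    }
}
```

In both cases the reader always knows where the next section starts, so no delimiter characters are needed.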

Divided up this way, a class file can be laid out as follows:

Magic number

A fixed 4 bytes with the fixed value 0xCAFEBABE. Some of you might be wondering: why cafebabe?

The Java logo (a steaming cup of coffee) and Google Translate should answer that:
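
As a small sketch, checking the magic number in Java could look like this. MagicCheck and isClassFile are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only.
class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;

    // True if the data starts with the 4-byte class file magic number.
    static boolean isClassFile(byte[] bytes) {
        if (bytes.length < 4) {
            return false; // too short to even hold the magic number
        }
        // ByteBuffer reads big-endian by default, matching the class file format.
        return ByteBuffer.wrap(bytes).getInt() == MAGIC;
    }
}
```

To try it on the file compiled earlier, the bytes could be loaded with java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("Test.class")).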

Version number

Fixed 4 bytes, the first two bytes being the minor version and the last two bytes being the major version:

Minor version: 0x0000. Major version: 0x0034, which is 52 in decimal.

Let’s check the table:

So this class file targets Java SE 8.
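
The lookup can also be done in code. The sketch below reads the two version fields from the first 8 bytes of a class file and maps the major version to a Java release. ClassVersion and its method names are hypothetical. The mapping (45 to 48 are JDK 1.1 to 1.4, and from 49 onward the release is major minus 44) matches the releases published so far.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only.
class ClassVersion {

    // Returns "major.minor", for example "52.0", from the start of a class file.
    static String read(byte[] classFile) {
        ByteBuffer in = ByteBuffer.wrap(classFile);
        in.getInt();                        // skip the 4-byte magic number
        int minor = in.getShort() & 0xFFFF; // first two bytes: minor version
        int major = in.getShort() & 0xFFFF; // last two bytes: major version
        return major + "." + minor;
    }

    // Maps a major version to the Java release name.
    static String javaRelease(int major) {
        if (major < 45) {
            throw new IllegalArgumentException("unknown major version: " + major);
        }
        if (major <= 48) {
            return "1." + (major - 44);     // 45..48 are JDK 1.1 to 1.4
        }
        return String.valueOf(major - 44);  // 49 is Java 5, 52 is Java 8
    }
}
```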

Constant pool

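Variable length. The first two bytes (constant_pool_count) determine how many tables follow. Note that entries are indexed from 1, so the valid indexes run from 1 to constant_pool_count - 1.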

The constant pool has nothing to do with the static variables in our Java code. It is a pool made up of the following tables:

Let's take the CONSTANT_Utf8_info table as an example:

tag: one byte that identifies the kind of table. Every table in the constant pool starts with a tag.
length: two bytes, giving the number of bytes that follow.
bytes[length]: length bytes that store the actual data.
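
Here is a minimal sketch of reading one such entry in Java. Utf8Entry is a hypothetical name. It relies on the fact that DataInputStream.readUTF reads a 2-byte length followed by that many bytes of modified UTF-8, which is the same layout as length plus bytes[length]. The tag value 1 for CONSTANT_Utf8_info comes from the JVM specification.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only.
class Utf8Entry {
    static final int CONSTANT_UTF8 = 1; // tag value of CONSTANT_Utf8_info

    // Decodes a single CONSTANT_Utf8_info entry: tag, length, bytes[length].
    static String read(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte(); // 1 byte: which kind of table this is
            if (tag != CONSTANT_UTF8) {
                throw new IllegalArgumentException("not a CONSTANT_Utf8_info entry, tag=" + tag);
            }
            // Reads the 2-byte length, then exactly that many data bytes.
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, the bytes 01 00 05 48 65 6c 6c 6f decode to the string Hello.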

At this point, some of you might be confused.

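Huh? Weren't we talking about bytecode? Isn't bytecode stored as a stream? What does that have to do with tables, and how can a table describe a stream of data?

Put simply, a table is only a description of the byte layout: each row names a field and its size, and the fields appear in the stream in that order: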

Access flags

Two bytes that mark the access information of the current class file. For example: is it a class, an interface, or an enum? Does it have the public modifier? The possible values are:
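
Because each flag is a single bit, the two bytes can be decoded with bit masks. The sketch below uses the class-level flag values from the Java SE 8 JVM specification. ClassAccessFlags and decode are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class ClassAccessFlags {
    // Class-level flag masks and names (Java SE 8 JVM specification).
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    // Returns the names of all flags whose bit is set in the 2-byte value.
    static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }
}
```

A public class compiled by a modern javac usually carries the value 0x0021, which decodes to ACC_PUBLIC and ACC_SUPER.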

Class/superclass/interfaces:

Class: two bytes holding the class index, which identifies a table in the constant pool. For example, 0x0007 refers to entry 7 of the constant pool.
Superclass: two bytes holding the superclass index, interpreted in the same way.
Interfaces: variable length. The first two bytes give the number of interface indexes, and the rest are the interface indexes themselves, two bytes each.
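
These three items can be read in one pass. The following is a sketch that assumes the input starts exactly at the class index, right after the access flags. ClassRefs and its field names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical holder for illustration only.
class ClassRefs {
    final int thisClass;    // constant pool index of this class
    final int superClass;   // constant pool index of the superclass
    final int[] interfaces; // constant pool indexes of the interfaces

    ClassRefs(int thisClass, int superClass, int[] interfaces) {
        this.thisClass = thisClass;
        this.superClass = superClass;
        this.interfaces = interfaces;
    }

    // Assumes `section` begins at the class index (just after the access flags).
    static ClassRefs read(byte[] section) {
        ByteBuffer in = ByteBuffer.wrap(section);
        int thisClass = in.getShort() & 0xFFFF;  // 2 bytes
        int superClass = in.getShort() & 0xFFFF; // 2 bytes
        int count = in.getShort() & 0xFFFF;      // 2 bytes: number of interfaces
        int[] interfaces = new int[count];
        for (int i = 0; i < count; i++) {
            interfaces[i] = in.getShort() & 0xFFFF; // 2 bytes per interface index
        }
        return new ClassRefs(thisClass, superClass, interfaces);
    }
}
```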

Field description set

Variable length. The first two bytes give the number of field tables, followed by the field tables themselves. Each field table contains:

access_flags: fixed two bytes, the access flags of the field, which include the following:

name_index: fixed two bytes, the field name index, identifying a table in the constant pool.
descriptor_index: fixed two bytes, the field descriptor index, identifying a table in the constant pool.
attributes_count: fixed two bytes, giving the number of attribute tables that follow.
attributes[attributes_count]: the attribute tables:

I will not go any further here. The attribute tables follow the same pattern, and you can look them up if you are interested.
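
To show that the field tables really do follow the same pattern, here is a sketch that reads them and skips over their attributes. FieldInfo and readAll are hypothetical names. The attribute layout it skips (a 2-byte name index, a 4-byte length, then that many bytes) comes from the JVM specification rather than from the text above, and the sketch assumes attribute lengths fit in a signed int.

```java
import java.nio.ByteBuffer;

// Hypothetical holder for illustration only.
class FieldInfo {
    final int accessFlags;
    final int nameIndex;
    final int descriptorIndex;
    final int attributesCount;

    FieldInfo(int accessFlags, int nameIndex, int descriptorIndex, int attributesCount) {
        this.accessFlags = accessFlags;
        this.nameIndex = nameIndex;
        this.descriptorIndex = descriptorIndex;
        this.attributesCount = attributesCount;
    }

    // Assumes `section` begins at the 2-byte count of field tables.
    static FieldInfo[] readAll(byte[] section) {
        ByteBuffer in = ByteBuffer.wrap(section);
        int fieldsCount = in.getShort() & 0xFFFF;
        FieldInfo[] fields = new FieldInfo[fieldsCount];
        for (int i = 0; i < fieldsCount; i++) {
            int accessFlags = in.getShort() & 0xFFFF;
            int nameIndex = in.getShort() & 0xFFFF;
            int descriptorIndex = in.getShort() & 0xFFFF;
            int attributesCount = in.getShort() & 0xFFFF;
            for (int a = 0; a < attributesCount; a++) {
                in.getShort();                       // attribute name index (2 bytes)
                int length = in.getInt();            // attribute length (4 bytes)
                in.position(in.position() + length); // skip the attribute body
            }
            fields[i] = new FieldInfo(accessFlags, nameIndex, descriptorIndex, attributesCount);
        }
        return fields;
    }
}
```

The length prefix on each attribute is what lets a reader skip attributes it does not understand and still land on the start of the next table.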

Method description set

methods_count: fixed two bytes, giving the number of method tables that follow.
methods[methods_count]: the method tables, which have the same layout as the field tables.

Attribute description set

attributes_count: fixed two bytes, giving the number of attribute tables that follow.
attributes[attributes_count]: the attribute tables.

As long as you understood what I explained earlier, you can work through the remaining tables by following the same pattern. There is no need to memorize anything; knowing the overall logical structure is enough. If you are interested, you can read the details through the link: document

Bytecode isn't that hard after all. It just follows the rules.

However, raw hexadecimal bytecode is not meant for developers to read. We can disassemble it with javap to make it clearer.

Decompiling the class

javap -v Test.class

See how closely the output matches the structure described above. Sections that hold no data, such as the field description set, are omitted from the output.