🔉 Introduction

Compiling code to bytecode instead of native machine code may look like one small step for storage formats, but it was one giant leap for programming languages. After so many years of development, our computers still have no intelligence or common sense: they only understand 1s and 0s, so a program ultimately has to be turned into machine code before the computer can run it. However, with the rise of virtual machines and the many languages built on top of them, compiling straight to machine code is no longer the only choice. More and more languages choose an operating-system-independent, platform-neutral format as the storage format for compiled programs.

If x86 were the only instruction set in the world and Windows the only operating system, there would have been no need for the Java language, which was created to be written once and run everywhere. Java virtual machines are built at the application layer of the operating system, on different platforms and different hardware. All of them can load and run the same platform-independent bytecode, which breaks the platform limit and truly achieves write once, run everywhere.

Bytecode

Bytecode (Byte Code) is a storage format that virtual machines on every platform support, and it is the foundation of language independence. The Java virtual machine is not tied to any language, Java included; it is only associated with the Class file, a specific binary format. For security reasons, the Java Virtual Machine Specification also imposes many mandatory syntactic and structural constraints on Class files.

The variables, keywords, and operators of the Java language are all ultimately expressed as combinations of bytecode instructions, so the expressive power of bytecode is necessarily at least as strong as that of the Java language itself. A language feature that Java cannot express may still be expressible in bytecode, which gives other programming languages room to implement features that Java does not have.

Structure of the Class file

I remember that when I first read this material it felt like reading gibberish: extremely boring, with no obvious use. But it is very important. The Class file structure is one of the foundations of the Java virtual machine and a key to understanding it, and there is no shortcut. If you want to study the Java virtual machine in more depth, you have to push through it.

The Java virtual machine has kept very good backward compatibility thanks to the stable structure of the Class file. The first Java Virtual Machine Specification was published in 1997, and over more than a dozen releases since then the details of the Class file definition have barely changed. Even where changes were made, they were extensions on top of the existing structure and did not alter anything already defined.

A Class file is a binary stream whose basic unit is the 8-bit byte. The data items are arranged in strict order with no separators and no gaps between them. Any item larger than one byte is stored in big-endian order, with the high-order byte first.

The Java Virtual Machine Specification describes Class files with a pseudo-structure similar to a C-language struct, which contains only two kinds of data: unsigned numbers and tables. These two basic elements are essential to everything that follows.

What is an unsigned number? An unsigned number is a basic data type. u1, u2, u4, and u8 represent unsigned numbers of 1, 2, 4, and 8 bytes respectively. They can describe numbers, index references, quantity values, or string values encoded in UTF-8.

What is a table? A table is a composite data type made up of several unsigned numbers or other tables. To make tables easy to recognize, their names conventionally end with _info. The entire Class file is essentially one table.

Whether for unsigned numbers or tables, when several items of the same type must be described and their number is not fixed, a collection is used: a leading capacity counter followed by that many consecutive data items.

A Class file is not like the familiar XML format. Because it has no delimiters, the order, the count, and even the byte order of the stored data items are all strictly defined. To what extent? What each byte means, how long each item is, and the order in which items appear may not be changed.
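To make the two building blocks concrete, here is a minimal sketch of how u1, u2, and u4 items and a counter-prefixed collection could be read from raw bytes. The class UnsignedReader and its method names are hypothetical helpers invented for illustration; only java.nio.ByteBuffer is a real JDK class.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not a JDK API): reads the unsigned items
// that the Class file format is built from.
final class UnsignedReader {
    private final ByteBuffer buf;

    UnsignedReader(byte[] data) {
        // ByteBuffer is big-endian by default, which matches the Class file format.
        this.buf = ByteBuffer.wrap(data);
    }

    // u1: one byte, widened to int so values above 127 stay positive.
    int u1() { return buf.get() & 0xFF; }

    // u2: two bytes, high-order byte first.
    int u2() { return buf.getShort() & 0xFFFF; }

    // u4: four bytes, returned as long because Java's int is signed.
    long u4() { return buf.getInt() & 0xFFFFFFFFL; }

    // A collection: a leading u2 counter followed by that many u2 items.
    int[] u2Collection() {
        int count = u2();
        int[] items = new int[count];
        for (int i = 0; i < count; i++) {
            items[i] = u2();
        }
        return items;
    }
}
```

Java has no unsigned primitive types, so the sketch masks each value into a wider signed type instead.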

Magic number in the Class file

The first four bytes of every Class file are the magic number, 0xCAFEBABE. Its only job is to identify the file as a Class file that a virtual machine can accept. A magic number is used instead of the file extension for security reasons, because an extension can be changed at will. Java is not alone here: familiar image formats have magic numbers too. The figure is as follows:

The version number in the Class file

Right after the magic number comes the version number of the Class file. For Java 8, the version is 52.0. To show that I'm not bragging, let's look at the actual file.

Bytes 5 and 6 are the minor version number, and bytes 7 and 8 are the major version number.
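The first eight bytes can be checked with a few lines of code. The following is a minimal sketch, assuming the whole Class file has already been read into a byte array; ClassHeader is a hypothetical name, and the javaRelease() mapping (major version minus 44) is the usual rule of thumb for Java 1.2 and later rather than something taken from this article.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not a JDK API): parses the first 8 bytes of a Class file.
final class ClassHeader {
    final int minor;
    final int major;

    private ClassHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    static ClassHeader parse(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        int magic = buf.getInt();                     // bytes 1-4
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a Class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF;          // bytes 5-6
        int major = buf.getShort() & 0xFFFF;          // bytes 7-8
        return new ClassHeader(minor, major);
    }

    // Assumption: valid for major >= 46, e.g. 52 -> 8.
    int javaRelease() {
        return major - 44;
    }
}
```

To try it on a real file, the bytes could be loaded with Files.readAllBytes(Paths.get("Test.class")), where Test.class is a placeholder file name.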

Constant pool

The number of constants in the constant pool is not fixed, so the pool begins with a u2 item that holds the constant pool count. The count starts from 1 rather than 0, so the stored value is one greater than the highest valid index. In our example there are 18 constants, with index values from 1 to 18. Index 0 is deliberately left blank: if some later index into the constant pool needs to express "does not refer to any constant pool item", it can use the value 0.

The constant pool mainly stores two kinds of constants: literals, which are close to the notion of constants at the Java language level, and symbolic references, which are a concept from compiler theory. Symbolic references mainly include the following:

  1. Packages exported or exposed by modules (Package)
  2. Fully qualified names of classes and interfaces (Fully Qualified Name)
  3. Field name and descriptor (Descriptor)
  4. Method name and descriptor
  5. Method handles and method types (Method Handle, Method Type, Invoke Dynamic)
  6. Dynamic call points and dynamic constants (Dynamically-Computed Call Site, Dynamically-Computed Constant)

Note that a Class file does not store the final memory layout of any method or field, so the symbolic references to fields and methods cannot be used directly. The virtual machine has to resolve the symbolic references in the constant pool into actual memory addresses, either when the class is loaded and created or at run time.

Every constant in the constant pool is itself a table. Remember that table names end with _info.
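Since every constant is a table that starts with a one-byte tag, the pool can be walked entry by entry. Below is a minimal sketch that only records the tag of each slot and skips the payload; ConstantPoolWalker is a hypothetical name, the payload sizes are the ones defined by the Java Virtual Machine Specification, and the input is assumed to start at the constant pool count.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not a JDK API): walks a constant pool and records tags.
final class ConstantPoolWalker {

    // poolBytes must start at the u2 constant pool count.
    static int[] readTags(byte[] poolBytes) {
        ByteBuffer buf = ByteBuffer.wrap(poolBytes);
        int count = buf.getShort() & 0xFFFF;
        int[] tags = new int[count];            // slot 0 is never used
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;
            tags[i] = tag;
            switch (tag) {
                case 1:                         // Utf8: u2 length + that many bytes
                    skip(buf, buf.getShort() & 0xFFFF);
                    break;
                case 3: case 4:                 // Integer, Float
                    skip(buf, 4);
                    break;
                case 5: case 6:                 // Long, Double occupy two slots
                    skip(buf, 8);
                    i++;
                    break;
                case 7: case 8: case 16: case 19: case 20:
                    skip(buf, 2);               // Class, String, MethodType, Module, Package
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(buf, 4);               // Fieldref, Methodref, InterfaceMethodref,
                    break;                      // NameAndType, Dynamic, InvokeDynamic
                case 15:                        // MethodHandle
                    skip(buf, 3);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown constant tag " + tag);
            }
        }
        return tags;
    }

    private static void skip(ByteBuffer buf, int n) {
        buf.position(buf.position() + n);
    }
}
```

The loop starts at index 1 and stops before the count, which is why a count of N describes indexes 1 to N - 1.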

Constant pool item table

My mom is calling me home for dinner, so I'll slip away first…

Back from dinner. These constant types each have their own independent data structure, and apart from the leading tag byte they have nothing in common, so there is no way to give one general description. We can only go through them one by one; let's see who the lucky first one is.

So the table of constant types alone is not enough; we also need the table that shows the structure of each type.

Constant item structure table

CONSTANT_Methodref_info represents a symbolic reference to a method in a class. Its tag is 10; the tag is a one-byte (u1) flag that identifies the type of the constant. The first index is 04, which points to a CONSTANT_Class_info constant in the constant pool, and the second index is 15, which points to a CONSTANT_NameAndType_info constant in the constant pool.

Next, let's test whether I'm just bragging. How? We can run the javap command (its -verbose option prints the constant pool), as shown below:

Interesting, isn't it? From this output you can see that # followed by a number is a reference into the constant pool. But some constants never appeared in our Java code at all, for example I, V, LineNumberTable, and LocalVariableTable. They were generated automatically by the compiler, and they are referenced by the field tables, method tables, and attribute tables that come later. They describe things that are awkward to express in a fixed number of bytes, such as a method's return value or the number and types of its parameters. The set of possible Java classes is unbounded, so these cannot be encoded as fixed unsigned numbers. Whenever a method needs to describe such information, it simply refers to constants in the constant pool.
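The layout just described (a u1 tag followed by two u2 indexes) can be sketched in code as follows. MethodrefEntry is a hypothetical class for illustration, and the usage note assumes that the index values 04 and 15 quoted above are decimal.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not a JDK API): decodes one CONSTANT_Methodref_info entry.
final class MethodrefEntry {
    static final int TAG_METHODREF = 10;

    final int classIndex;        // points to a CONSTANT_Class_info
    final int nameAndTypeIndex;  // points to a CONSTANT_NameAndType_info

    private MethodrefEntry(int classIndex, int nameAndTypeIndex) {
        this.classIndex = classIndex;
        this.nameAndTypeIndex = nameAndTypeIndex;
    }

    // entryBytes must start at the tag byte of the entry.
    static MethodrefEntry parse(byte[] entryBytes) {
        ByteBuffer buf = ByteBuffer.wrap(entryBytes);
        int tag = buf.get() & 0xFF;                     // u1 tag
        if (tag != TAG_METHODREF) {
            throw new IllegalArgumentException("Unexpected tag: " + tag);
        }
        int classIndex = buf.getShort() & 0xFFFF;       // u2 class_index
        int nameAndTypeIndex = buf.getShort() & 0xFFFF; // u2 name_and_type_index
        return new MethodrefEntry(classIndex, nameAndTypeIndex);
    }
}
```

Under that assumption, the five bytes 0A 00 04 00 0F decode to a class index of 4 and a name-and-type index of 15.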

Access flags

The constant pool is finally over, but our journey is not. The constant pool is followed by the two-byte access flags, which record class-level or interface-level access information: is this Class a class or an interface? Is it abstract? Is it declared final? The access flags are as follows:

Access flag table

There are 16 flag bits available, of which only 9 are currently defined. Our test class is just an ordinary class, so it uses only two flags, ACC_PUBLIC (0x0001) and ACC_SUPER (0x0020), and the value is 0x0001 | 0x0020 = 0x0021.
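The bit arithmetic can be reproduced with a small decoder. This is a minimal sketch: ClassAccessFlags is a hypothetical name, while the nine flag names and values are the class-level access flags defined by the Java Virtual Machine Specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a JDK API): decodes class-level access flags.
final class ClassAccessFlags {
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE", "ACC_ABSTRACT",
        "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM", "ACC_MODULE"
    };
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400,
        0x1000, 0x2000, 0x4000, 0x8000
    };

    // Returns the names of all flags whose bit is set, in ascending bit order.
    static List<String> decode(int accessFlags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }
}
```

For the test class discussed above, decode(0x0021) yields ACC_PUBLIC and ACC_SUPER.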

Class, superclass, interface index collection

Knowing what kind of class we have is not enough. Java supports inheritance, so how is that relationship represented in a Class file? With indexes. The class index and the superclass index are each a u2, and the interface index collection is a collection of u2 items. Each index is used to determine a fully qualified class name. Java does not support multiple inheritance, so a class has exactly one superclass index.

A class can implement several interfaces, so interfaces are represented by an index collection, with the implemented interfaces listed in left-to-right order as they appear after the implements keyword. The class index, the superclass index, and the interface index collection are arranged in that order after the access flags. In our example, the values 03, 04, and 00 are the class index, the superclass index, and the interface index collection (empty, since its counter is 0) respectively.

Field table collection

Field tables describe the variables declared in an interface or class. Fields in the Java language include class-level variables and instance-level variables, but not local variables declared inside methods. The information a field can carry includes its access scope (public, private, protected), whether it is a class variable (static), its mutability (final), its concurrent visibility (volatile), whether it can be serialized (transient), its data type, and its name.

Each modifier is a boolean: it is either present or absent, which makes modifiers a good fit for flag bits. The field name and the field type, on the other hand, have no fixed size, so they can only be described by referring to constants in the constant pool.

The field table is as follows:

Field access flags look like this:

Because of syntax rules, at most one of ACC_PUBLIC, ACC_PRIVATE, and ACC_PROTECTED can be set. Fields in an interface must have the ACC_PUBLIC, ACC_STATIC, and ACC_FINAL flags. These restrictions come from the rules of the Java language.

The access flags are followed by two items, name_index and descriptor_index. Both are references into the constant pool, representing the simple name of the field and its descriptor respectively. Let's explain simple names, descriptors, and the fully qualified names mentioned earlier all at once.

Fully qualified names: replace the "." in the class name with "/". When several fully qualified names appear in a row, a ";" is appended to each one to separate them.

Simple names: the name of a method or field without any type or parameter decoration.

Method and field descriptors: describe the data type of a field, or the parameter list and return value of a method. Under the descriptor rules, the basic types and the void type (meaning no return value) are each represented by a single uppercase character, and object types are represented by the character L followed by the fully qualified name of the class.
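The two rules above are easy to express as bit checks. The following is a minimal sketch; FieldFlagCheck and its method names are hypothetical, while the flag values are the field access flags defined by the Java Virtual Machine Specification.

```java
// Hypothetical helper (not a JDK API): checks the field flag rules described above.
final class FieldFlagCheck {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_PRIVATE = 0x0002;
    static final int ACC_PROTECTED = 0x0004;
    static final int ACC_STATIC = 0x0008;
    static final int ACC_FINAL = 0x0010;

    // At most one of the three visibility bits may be set.
    static boolean hasValidVisibility(int flags) {
        int visibility = flags & (ACC_PUBLIC | ACC_PRIVATE | ACC_PROTECTED);
        return Integer.bitCount(visibility) <= 1;
    }

    // An interface field must be public, static, and final at the same time.
    static boolean isValidInterfaceField(int flags) {
        int required = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;
        return (flags & required) == required && hasValidVisibility(flags);
    }
}
```

A field with no visibility bit at all is still valid: it simply has package-private access.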

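Meaning of the descriptor identifier characters: B is byte, C is char, D is double, F is float, I is int, J is long, S is short, Z is boolean, V is void, and L marks an object type such as Ljava/lang/Object;.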

Note: in the Java Virtual Machine Specification, the void type is described separately as VoidDescriptor. Out of laziness, I list it together with the basic types.

With that settled, what about array types? Each dimension of an array is described by one leading "[" character. For example, a two-dimensional "java.lang.String[][]" is recorded as "[[Ljava/lang/String;", and an integer array "int[]" is recorded as "[I".

What about methods? How does a descriptor describe a method? Before giving the answer, let's guess. A method has a parameter list and a return value, and there must be an order between them: the parameter list comes first, followed by the return value. The parameters are written in their declared order inside a pair of parentheses, and the return type is either one character or a fully qualified class name, so the form should be (parameter list) followed by the return type. Now let's look at reality: the descriptor of toString() is ()Ljava/lang/String;. Let's do a harder one:

The method int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex) has the descriptor ([CII[CIII)I.
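Descriptors do not have to be worked out by hand. As a minimal sketch, the JDK class java.lang.invoke.MethodType can build them from Class objects through its toMethodDescriptorString() method; the wrapper class DescriptorDemo and its method names are hypothetical and exist only for this illustration.

```java
import java.lang.invoke.MethodType;

// Hypothetical demo class: builds method descriptors with the JDK's MethodType.
final class DescriptorDemo {

    // String toString()
    static String toStringDescriptor() {
        return MethodType.methodType(String.class).toMethodDescriptorString();
    }

    // int indexOf(char[], int, int, char[], int, int, int)
    static String indexOfDescriptor() {
        return MethodType.methodType(
                int.class,
                char[].class, int.class, int.class,
                char[].class, int.class, int.class, int.class
        ).toMethodDescriptorString();
    }

    // void m(String[][]) shows how a two-dimensional array parameter is written.
    static String arrayParameterDescriptor() {
        return MethodType.methodType(void.class, String[][].class).toMethodDescriptorString();
    }
}
```

The three methods return ()Ljava/lang/String;, ([CII[CIII)I, and ([[Ljava/lang/String;)V respectively.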

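Back in our example, the field table reads: the count 01 means 1 field, the name index 05 points to the constant m, and the descriptor index 06 points to the constant I. Isn't that fun, hahaha.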

After the descriptor comes an attribute table collection, which stores extra information about the field; attribute tables are covered in detail later. For now, just note that the field table collection never lists fields inherited from a superclass or superinterface, but it may contain fields that do not exist in the Java source, such as the reference to the enclosing instance that the compiler adds to an inner class. Spooky? Not really. In addition, the Java language does not allow two fields with the same name, but a Class file does, as long as their descriptors differ.

Method table

Once the field table is understood, the method table is easy, because the two are almost identical: each entry contains access flags, a name index, a descriptor index, and an attribute table collection, as shown below:

Method table structure

Method access flags

You may have a question by now. We have described the shell of a method very clearly, but where did the method body go? The method body is compiled by the javac compiler into bytecode instructions and stored in an attribute named Code inside the method's attribute table collection. We'll cover that later, so don't worry about it now.

Let's go back to our example and read it the same way. The first entry is the method count, 2. Then come the access flags, 1, meaning public; the name index, 7, which points to the instance initializer <init> in the constant pool (a method the compiler added); the descriptor index, 8; and the attribute count, 1. The attribute name index is 9, which points to the constant Code.

In the Java language, to overload a method, the new method must have the same simple name as the original and a different signature. The signature is the collection of the field symbolic references in the constant pool for each of the method's parameters. The return value is not part of the signature, so the Java language cannot overload a method based on return type alone. In the Class file format, however, the scope of the signature is wider: the return value is counted too, so two methods with the same name and parameters can coexist in one Class file as long as their descriptors differ.
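The fixed part of a method table entry (and of a field table entry, which has the same shape) is four consecutive u2 items. Here is a minimal sketch of reading it; MemberHeader is a hypothetical name, and the input is assumed to start right after the method count, at the first entry.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not a JDK API): the fixed-size head of a field or method entry.
final class MemberHeader {
    final int accessFlags;
    final int nameIndex;
    final int descriptorIndex;
    final int attributesCount;

    private MemberHeader(int accessFlags, int nameIndex, int descriptorIndex, int attributesCount) {
        this.accessFlags = accessFlags;
        this.nameIndex = nameIndex;
        this.descriptorIndex = descriptorIndex;
        this.attributesCount = attributesCount;
    }

    // entryBytes must start at the access flags of one entry.
    static MemberHeader parse(byte[] entryBytes) {
        ByteBuffer buf = ByteBuffer.wrap(entryBytes); // big-endian by default
        int accessFlags = buf.getShort() & 0xFFFF;     // u2 access_flags
        int nameIndex = buf.getShort() & 0xFFFF;       // u2 name_index
        int descriptorIndex = buf.getShort() & 0xFFFF; // u2 descriptor_index
        int attributesCount = buf.getShort() & 0xFFFF; // u2 attributes_count
        return new MemberHeader(accessFlags, nameIndex, descriptorIndex, attributesCount);
    }
}
```

With the values from the example (access flags 1, name index 7, descriptor index 8, attribute count 1), the input would be the eight bytes 00 01 00 07 00 08 00 01.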

📝 Digression

The last remaining part is the attribute table. I'm afraid reading it now would tire you out, so it gets a separate chapter.

I am a simple, ordinary person, and I hope that after experiencing the cruelty of society I will still be simple and ordinary.

I don't know when it started, but I have begun to feel the value of life. Every day I see 120 emergency ambulances go by, and I know that new lives keep arriving while others pass away. Life can be long or short. Have you ever wondered what you would choose if you had only five years left? Or, more simply, what are you going to do in the next five years? I'd like to hear your answer and see whether you have a long-term plan or just live one day at a time. Life is so short; are you really willing to just live one day at a time?

Let's look at the answer from Kuli, drawn from his own experience. Today he lives in a second-tier city and enjoys financial freedom. At the age of 25, he left JD.com for Baidu.

At Baidu he met the architect Jiu Ge (T9 level), and because of something Jiu Ge said, Kuli went from an ordinary engineer to an expert in other people's eyes.

At that time, because of business needs, Jiu Ge was helping Kuli, and he once said to him:

“If you spent five years studying databases, could you become an expert in them?” he asked. “You are 25 now and will be 30 in five years. You could be an expert in a field at 30, so why not do it? Look around at how many 30-year-olds have not accomplished anything yet, while you would already be a database expert.”

So what would you like to spend five years doing? Tell me your answer.