Preface

Hi, I’m Kunge

Starting today I plan to write a whole series of articles on advanced Java fundamentals, which I hope you will find helpful. Great oaks grow from little acorns: with a solid foundation we can go further. For example, I once came across this sentence in a Kafka article by Wu Ge: “In addition, the page cache has a huge advantage. Anyone who has used Java knows that, without the page cache, the memory overhead of objects can be very high (often several times the size of the actual data, or more).” If you don’t understand how Java objects are represented in memory, that sentence will leave you puzzled. On the other hand, if you know the principles behind object layout, GC, NIO and so on in Java, it is easy to understand how these frameworks work and why they are designed the way they are.

Another reason I decided to write this series is that readers often ask me about learning paths. I have written some outlines before, but never expanded them point by point. This time I plan to explain each topic in detail and then compile everything into a PDF, so that the next time someone asks, I can just send them the PDF. You are welcome to add me on WeChat: Geekoftaste

Each article in the series will be illustrated with diagrams to keep things simple. For example, we mentioned above that objects carry a lot of overhead, but how much exactly? I will walk you through the analysis step by step with diagrams, and you will see why an int[128][2] has 246% more overhead than an int[256]. As another example, we all know that a young GC or an old GC is triggered when the Eden or Tenured region fills up, but there are many possible reasons why GC pauses become too long. Once you have read the general approach I have summarized, you will be able to track such problems down quickly. I believe this series will be very useful for strengthening your grasp of Java internals.

The outline of the Java series is as follows:

In this article, we’ll take a look at bytecode. After all, it is the fundamental reason why Java is cross-platform, and understanding it helps unlock the secrets of how the JVM runs.

Can you briefly introduce the features of Java

Java is an object-oriented, statically typed language. Unlike C and C++, which are compiled directly to native code and require manual memory management, Java is compiled to bytecode that the JVM interprets, and it offers platform independence and automatic garbage collection. So how exactly is its cross-platform support achieved?

We know that a computer can only execute binary machine code. So no matter which high-level language is used, the program must eventually be translated into machine language before the CPU can execute it. A compiled language such as C++ is translated in a single step into an executable file for the target platform (that is, machine language instructions). In Java, the source file is first compiled into bytecode by the compiler, and the bytecode is then interpreted into machine instructions by the virtual machine (JVM) at runtime, as shown in the following figure

In other words, Java achieves cross-platform support through bytecode, which is interpreted and executed by a JVM implemented for each platform. The JVM hides the differences between operating systems. Java projects are deployed as JAR packages (collections of class files), so a JAR package can run on any platform that has a JVM, where it is interpreted and executed by that platform’s JVM. This is why Java is cross-platform.

This is also why the JVM can run languages such as Scala, Groovy and Kotlin: not because the JVM executes them directly, but because they are ultimately compiled to bytecode that the JVM can execute. You may also have noticed that bytecode applies the idea of layering from computer science: adding an intermediate layer such as bytecode effectively shields the upper layers from the differences in the layers below.

How does the JVM execute bytecode

Before answering that, let’s take a look at the JVM’s overall memory structure to get a general picture, and then see how the JVM executes bytecode

JVM memory is divided into stacks, the heap, non-heap memory, and memory used by the JVM itself. The heap is where class instances and arrays are allocated. Non-heap memory includes the method area, memory needed for processing or optimization inside the JVM (such as the code cache for JIT-compiled code), per-class structures (such as the runtime constant pool, field data and method data), and the code of methods and constructors.
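As a rough illustration of the heap/non-heap split, the standard java.lang.management API reports the usage of both areas. This is only a minimal sketch: the class name MemoryAreas is made up for this example, and the numbers it prints depend on the JVM and its settings.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical demo class: prints how much heap and non-heap memory is in use.
class MemoryAreas {
    // Bytes currently used on the heap (class instances and arrays).
    static long heapUsed() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getHeapMemoryUsage().getUsed();
    }

    // Bytes currently used outside the heap (method area, code cache, ...).
    static long nonHeapUsed() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getNonHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.out.println("heap used:     " + heapUsed() + " bytes");
        System.out.println("non-heap used: " + nonHeapUsed() + " bytes");
    }
}
```

Thread stacks are not included in either figure; they are allocated separately for each thread.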

We will focus on the stack. A thread is the smallest unit of CPU scheduling, and in the JVM every thread is assigned its own thread stack as soon as it is created
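To make the per-thread stack a little more tangible, the sketch below counts the entries on the current thread’s stack using Thread.getStackTrace(). The class and method names (ThreadStacks, depthAfter) are hypothetical, and the absolute depths depend on the JVM; the point is that every nested call adds one entry and that a new thread starts with a stack of its own.

```java
// Hypothetical demo class: observes the calling thread's stack growing with each call.
class ThreadStacks {
    // Returns how many entries are on the calling thread's stack
    // once n additional nested calls have been made.
    static int depthAfter(int n) {
        if (n == 0) {
            return Thread.currentThread().getStackTrace().length;
        }
        return depthAfter(n - 1); // each call adds one more entry
    }

    public static void main(String[] args) throws InterruptedException {
        int base = depthAfter(0);
        System.out.println("extra entries after 3 nested calls: " + (depthAfter(3) - base));

        // A new thread gets its own stack, independent of the main thread's.
        Thread worker = new Thread(() ->
                System.out.println("depth in worker thread: " + depthAfter(0)));
        worker.start();
        worker.join();
    }
}
```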

Now we are getting close to how the JVM actually executes code: the JVM executes methods in units of stack frames, and each stack frame is made up of four parts

  • The return value
  • Local variables: store the local variables used by the method
  • Dynamic linking: in bytecode, all variables and methods are stored as symbolic references in the constant pool of the class file. For example, when one method calls another, the callee is represented by a symbolic reference in the constant pool. The job of dynamic linking is to convert these symbolic references into direct references to the methods being called. If this is unclear, run the command javap -verbose Demo.class to see what the constant pool in bytecode looks like

Note: only some of the symbolic references in the constant pool are listed above

#4 refers to #19, and #19 refers to Object; #16 refers to #7.#8, where #7 is the name of the method and #8 is its descriptor, ()V. When the bytecode is loaded, the class information is placed in the method area (implemented by the metaspace since Java 8), and dynamic linking replaces these symbolic references with direct references to the methods being called, as shown below

Dynamic linking exists to support polymorphism in Java. Suppose we declare a variable Father f = new Son(). When f.method() is executed, the call is bound to Son’s method() (if Son overrides it). This is made possible by dynamic linking, also known as late binding, in which the target method is determined at run time. Its counterpart is static linking (also known as early binding), which happens at compile time: the method is bound before the program executes. In Java, only final, static, private and constructor methods are bound early. Almost all other methods are bound at run time through dynamic linking

Let me give you an example of the difference between the two

class Animal {
    public void eat() {
        System.out.println("Animal feeding.");
    }
}

class Cat extends Animal {
    @Override
    public void eat() {
        super.eat(); // early binding (static linking): the target is fixed at compile time
        System.out.println("Cat feeding");
    }
}

public class AnimalTest {
    public void showAnimal(Animal animal) {
        animal.eat(); // late binding (dynamic linking): the target depends on the runtime type
    }
}
  • Operand stack: a program consists mainly of instructions and operands. An instruction says what operation to perform, such as addition or multiplication, and the operands are the data the instruction works on. Instruction set architectures come in two kinds, stack-based and register-based, and the JVM instruction set belongs to the former: every operation is carried out through a stack. A stack-based instruction set is easier to make cross-platform, because the stack is allocated in memory, whereas registers are tied to the hardware and differ from one architecture to another, which works against portability. The disadvantages of a stack-based instruction set are just as clear: it needs more instructions to do the same work (the stack is just a FILO structure, so values must be pushed and popped frequently), and registers sit inside the CPU and are much faster to access than a stack in memory, so stack-based execution is considerably slower. This is the performance price paid for being cross-platform; you can’t have your cake and eat it too.
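The stack-based style described in the last item can be sketched with a toy evaluator. This is not how the JVM is implemented; the instruction names push, add and mul and the class StackMachine are invented for this illustration. It evaluates (1 + 2) * 5 purely by pushing and popping.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack machine (hypothetical): every operation takes its operands from the stack.
class StackMachine {
    // Runs a program made of the invented instructions "push N", "add" and "mul".
    static int run(String... program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instruction : program) {
            String[] parts = instruction.split(" ");
            switch (parts[0]) {
                case "push":
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "add":
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "mul":
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unknown instruction: " + instruction);
            }
        }
        return stack.pop(); // the result is whatever is left on top
    }

    public static void main(String[] args) {
        // (1 + 2) * 5 written as five stack instructions
        System.out.println(run("push 1", "push 2", "add", "push 5", "mul")); // prints 15
    }
}
```

A register-based machine could express the same computation in fewer instructions, for example one add and one multiply that name their registers directly, which is the instruction-count trade-off mentioned above.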

Introduction to Java bytecode technology

Note that each thread also has its own PC register (program counter). It records the position in the bytecode that the current thread has reached, that is, the address of the next instruction to execute, and the execution engine reads its next instruction from there. Let’s take a look at what bytecode looks like. Suppose we have the following Java code

package com.mahai;

public class Demo {
    private int a = 1;

    public static void foo() {
        int a = 1;
        int b = 2;
        int c = (a + b) * 5;
    }
}

After running javac Demo.java, you get a class file containing the following bytecode

Raw bytecode is meant for the JVM, so we need to translate it into something humans can read. Fortunately the JDK ships with a disassembler, javap, which can decode the Code area (the instructions), the local variable table, the exception table, the line number table that maps bytecode offsets to source lines, the constant pool, and more. Run javap -verbose to see the full disassembled output of a class file. Here we only care about how the Code section is executed, so we use javap -c

javap -c Demo.class

This form is much more readable, but what do aload_0 and invokespecial mean, and how does javap work out these instructions from the bytecode?

First we need to understand what an instruction is: instruction = opcode + operands. The opcode says what the instruction does, such as addition, subtraction, multiplication or division, and the operands are the values the opcode operates on. In 1 + 2, for example, the opcode is addition and the operands are 1 and 2. In Java bytecode every opcode is represented by a single byte, and each opcode has a corresponding mnemonic such as aload_0, invokespecial or iconst_1. Some opcodes already imply their operand: the mnemonic for byte 0x04 is iconst_1, which pushes the int value 1 onto the top of the stack, so the opcode alone is a complete instruction. Others need explicit operands to form an instruction: byte 0x10 is bipush, which is followed by a one-byte operand and pushes that single-byte constant (-128 to 127) onto the top of the stack. Some examples of bytecodes and their mnemonics are listed below

Bytecode   Mnemonic        Meaning
0x04       iconst_1        Push the int constant 1 onto the top of the stack
0xb7       invokespecial   Invoke superclass methods, instance initialization methods (constructors) and private methods
0x1a       iload_0         Push the int in local variable slot 0 onto the top of the stack
0x10       bipush          Push a single-byte constant (-128 to 127) onto the top of the stack
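The difference between opcodes that imply their operand and opcodes that need one shows up in how an int constant gets pushed. The helper below is a hypothetical sketch (PushInstruction.forInt is not a JDK API) that mirrors the choice javac typically makes based on the size of the value.

```java
// Hypothetical helper: names the instruction typically used to push an int constant.
class PushInstruction {
    static String forInt(int value) {
        if (value == -1) {
            return "iconst_m1"; // operand implied by the opcode
        }
        if (value >= 0 && value <= 5) {
            return "iconst_" + value; // operand implied by the opcode
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value; // opcode + 1 operand byte
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value; // opcode + 2 operand bytes
        }
        // Simplification: the real ldc operand is a constant pool index, not the value.
        return "ldc " + value;
    }

    public static void main(String[] args) {
        System.out.println(forInt(1));  // prints iconst_1
        System.out.println(forInt(69)); // prints bipush 69
    }
}
```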

Now we can see what javap does: it essentially looks up the mnemonic for each bytecode and shows it to us. Let’s briefly look at how the bytes of the default constructor above map to mnemonics:

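The number on the left is the offset of each instruction within the Code area, and it is this offset that is held in the PC register. For example, if the current instruction is at offset 1, the next one is at offset 4, because invokespecial occupies three bytes: one for the opcode and two for its operand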
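The opcode-to-mnemonic lookup and the offsets can be reproduced with a very small disassembler. This is a sketch under assumptions: MiniJavap is a made-up class that knows only six opcodes, and the CONSTRUCTOR bytes are what javac typically emits for Demo’s default constructor, not bytes copied from the figure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini disassembler covering only the opcodes discussed in this article.
class MiniJavap {
    // Assumed Code bytes of Demo's default constructor.
    static final int[] CONSTRUCTOR = {0x2a, 0xb7, 0x00, 0x01, 0x2a, 0x04, 0xb5, 0x00, 0x02, 0xb1};

    static List<String> disassemble(int[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0; // offset of the current instruction
        while (pc < code.length) {
            String mnemonic;
            int operandBytes;
            switch (code[pc]) {
                case 0x04: mnemonic = "iconst_1";      operandBytes = 0; break;
                case 0x10: mnemonic = "bipush";        operandBytes = 1; break;
                case 0x2a: mnemonic = "aload_0";       operandBytes = 0; break;
                case 0xb1: mnemonic = "return";        operandBytes = 0; break;
                case 0xb5: mnemonic = "putfield";      operandBytes = 2; break;
                case 0xb7: mnemonic = "invokespecial"; operandBytes = 2; break;
                default:
                    throw new IllegalArgumentException("unsupported opcode at offset " + pc);
            }
            StringBuilder line = new StringBuilder(pc + ": " + mnemonic);
            if (operandBytes == 1) {
                line.append(' ').append((byte) code[pc + 1]); // signed one-byte constant
            } else if (operandBytes == 2) {
                // two operand bytes form a constant pool index
                line.append(" #").append((code[pc + 1] << 8) | code[pc + 2]);
            }
            lines.add(line.toString());
            pc += 1 + operandBytes; // the next offset skips the opcode and its operands
        }
        return lines;
    }

    public static void main(String[] args) {
        disassemble(CONSTRUCTOR).forEach(System.out::println);
    }
}
```

The offsets jump from 1 to 4 and from 6 to 9 because invokespecial and putfield each carry two operand bytes.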

We did not define a default constructor in the source code, yet one appears in the bytecode. You will also notice that although private int a = 1; is written as a field declaration in the source, the assignment is actually performed inside the constructor (as we will see below). This is why understanding bytecode matters: it reflects the logic the JVM actually executes, while the source code is only its outward form.
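That the field initializer really runs inside the constructor, after the call to the parent constructor, can be observed from plain Java. In this sketch (InitOrder, Parent and Child are made-up names) the parent constructor calls an overridden method, which is bad practice in real code but makes the ordering visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: shows when a field initializer actually runs.
class InitOrder {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        Parent() {
            // Runs before Child's field initializer has been executed.
            LOG.add("Parent constructor sees a=" + value());
        }

        int value() {
            return -1;
        }
    }

    static class Child extends Parent {
        private int a = 1; // compiled into Child's constructor, right after the super() call

        Child() {
            LOG.add("Child constructor body sees a=" + a);
        }

        @Override
        int value() {
            return a;
        }
    }

    static List<String> run() {
        LOG.clear();
        new Child();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The parent constructor still sees the default value 0, while the body of the child constructor already sees 1.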

Let’s take a look at how the constructor’s instructions are executed, starting with a look at how instructions are executed in the JVM.

  1. First, the JVM allocates a local variable table for each method. Think of it as an array in which each slot is assigned to one of the method’s variables. For an instance method these are this, the method parameters and the local variables declared in the method. The values held there are of types such as int, long, reference and return address. Each slot is 4 bytes, so 8-byte types such as long and double occupy 2 slots. If the method is an instance method, the first slot holds the this reference; a static method has no this

  2. Once the local variable table has been allocated, any assignment or arithmetic (addition, subtraction, multiplication, division) in the method relies on the operand stack: the operands of these instructions are pushed onto and popped off the stack to carry the instructions out
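The slot rules from step 1 can be written down as a small calculation. LocalSlots.count is a hypothetical helper, not a JVM API: it takes whether the method is static and the types of its parameters and local variables, and returns how many slots they occupy, ignoring the slot reuse a real compiler may apply.

```java
// Hypothetical helper: counts local variable slots using the rules described above.
class LocalSlots {
    static int count(boolean isStatic, String... variableTypes) {
        int slots = isStatic ? 0 : 1; // instance methods keep this in slot 0
        for (String type : variableTypes) {
            // long and double are 8 bytes wide and take two slots each
            slots += (type.equals("long") || type.equals("double")) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(count(true, "int", "int", "int")); // static foo(): a, b, c -> prints 3
        System.out.println(count(false));                     // default constructor: this -> prints 1
        System.out.println(count(false, "long", "int"));      // this + long + int -> prints 4
    }
}
```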

For example, a statement such as int i = 69 corresponds to the following bytecode instructions

0: bipush 69
2: istore_0

Its operation in memory is as follows

There are two main steps: first, bipush pushes the int value 69 onto the operand stack; then istore_0 pops it off the stack and stores it in the local variable table slot that belongs to i. The 0 in istore_0 refers to slot 0 of the local variable table

With that in mind, how are the bytecode instructions of the default constructor executed?

First we need to understand the above instructions

  • aload_0: loads the object reference in slot 0 of the local variable table onto the top of the operand stack. The 0 means slot 0, which holds this
  • invokespecial: used to call constructors, and also private methods of the same class and visible superclass methods. Here it calls the constructor of the parent class (the symbolic reference #1 points to the corresponding <init> method)
  • iconst_1: pushes the int value 1 onto the top of the stack
  • putfield: takes an operand that refers to a field through the run-time constant pool, in this case a. When the instruction executes, the value to assign and the reference to the object that contains the field are both popped off the top of the operand stack. Earlier, aload_0 pushed the object containing the field (this) onto the operand stack and iconst_1 pushed 1. putfield then pops these two values, and as a result the a field of this object is set to 1

Let’s explain in detail what these mnemonics mean

  • The first instruction, aload_0, loads the object reference in slot 0 of the local variable table onto the top of the operand stack, i.e. it pushes this onto the stack, as follows

  • The second instruction, invokespecial #1, pops this off the stack and invokes the method that #1 refers to. The meaning of #1 can be read from the comment that follows it (Method java/lang/Object."<init>":()V): it calls the initialization method of the parent class. A subclass always starts its initialization with that of its parent class
  • The subsequent instructions, aload_0, iconst_1 and putfield #2, are illustrated below

Some readers may wonder what #2 means in 6: putfield #2 above. Running javap -verbose path/Demo.class shows what #2 stands for. A numeric index like this is a symbolic reference, which is converted to a direct reference at runtime

It follows that #2 represents the a field of the Demo class, as follows

Let’s look at the execution flow of foo as well, this time as a GIF; I’m sure you get the idea by now

The only thing to notice in this example is that foo is a static method, so the local variable area does not have this.

As you can see, the JVM executes bytecode in much the same way as a CPU executes machine code. It goes through four steps: fetch, decode, execute, and store the result. First, the program counter points to the next instruction to be executed and that instruction is fetched. The execution engine then decodes the bytecode instruction into machine code, executes it, and stores the result in the local variable area
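The fetch, decode, execute and store cycle can be sketched as a loop over the bytecode of foo. This is a toy, not the JVM’s real execution engine: TinyInterpreter is a made-up class that supports only the opcodes foo needs, and the FOO bytes are the code javac typically emits for that method (an assumption, since the article shows it only as a figure).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter (hypothetical) for the handful of opcodes used by Demo.foo().
class TinyInterpreter {
    // Assumed bytecode of foo(): iconst_1, istore_0, iconst_2, istore_1,
    // iload_0, iload_1, iadd, iconst_5, imul, istore_2, return
    static final int[] FOO = {0x04, 0x3b, 0x05, 0x3c, 0x1a, 0x1b, 0x60, 0x08, 0x68, 0x3d, 0xb1};

    // Executes the code and returns the local variable table at the time of return.
    static int[] execute(int[] code, int maxLocals) {
        int[] locals = new int[maxLocals];           // local variable table
        Deque<Integer> stack = new ArrayDeque<>();   // operand stack
        int pc = 0;                                  // program counter
        while (true) {
            int opcode = code[pc++];                 // fetch, then advance the pc
            switch (opcode) {                        // decode and execute
                case 0x04: stack.push(1); break;                       // iconst_1
                case 0x05: stack.push(2); break;                       // iconst_2
                case 0x08: stack.push(5); break;                       // iconst_5
                case 0x1a: stack.push(locals[0]); break;               // iload_0
                case 0x1b: stack.push(locals[1]); break;               // iload_1
                case 0x3b: locals[0] = stack.pop(); break;             // istore_0 (store result)
                case 0x3c: locals[1] = stack.pop(); break;             // istore_1
                case 0x3d: locals[2] = stack.pop(); break;             // istore_2
                case 0x60: stack.push(stack.pop() + stack.pop()); break; // iadd
                case 0x68: stack.push(stack.pop() * stack.pop()); break; // imul
                case 0xb1: return locals;                              // return
                default:
                    throw new IllegalStateException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        int[] locals = execute(FOO, 3);
        System.out.println("a=" + locals[0] + ", b=" + locals[1] + ", c=" + locals[2]); // prints a=1, b=2, c=15
    }
}
```

Because foo is static, slot 0 holds a rather than this, so the three local variables occupy slots 0 to 2.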

I recommend two tools for bytecode

  • One is Hex Fiend, a nice hexadecimal editor for looking at bytecodes
  • The other is jclasslib Bytecode Viewer, a plug-in for IntelliJ IDEA. It shows the constant pool, interfaces, Code and other data that the javap -verbose command prints, in a very intuitive form that is a great help when analyzing bytecode

Next up: how bytecode is loaded