This is the 29th day of my participation in the August More Text Challenge

Multiple languages compile to bytecode that runs on the JVM

A computer cannot run Java code directly. It first has to run the Java virtual machine (JVM), and the JVM then runs the compiled Java code. That compiled code is the Java bytecode this article introduces.

The JVM cannot run Java source directly because, at the CPU level, everything a computer does is a sequence of instructions. Java is a high-level language whose logic is readable by humans but not by machines, so Java source has to be compiled into bytecode files first. The JVM recognizes the translated instructions and executes them.

  • Java code is translated into bytecode as an intermediate step, and the files that store the bytecode are read and executed by JVMs running on different platforms. This is how "write once, run anywhere" is achieved.

  • The JVM no longer supports only Java. A number of JVM-based programming languages have appeared, such as Groovy, Scala, and Kotlin.

The variables, keywords, and operators in source code are ultimately compiled into sequences of bytecode instructions. Bytecode instructions can express more than the Java language itself, which is why other JVM-based languages can offer features that Java does not support.

Java bytecode files

A Class file is essentially a binary stream of 8-bit bytes in which the data items are packed tightly, one after another. The JVM parses this binary data according to fixed rules to obtain the information it needs.

A Class file stores data in a pseudo-structure that has only two data types:

  1. Unsigned number
  • Unsigned numbers are the basic data types. u1, u2, u4, and u8 represent unsigned numbers of 1, 2, 4, and 8 bytes respectively.
  • Unsigned numbers can describe numbers, index references, quantities, or string values encoded in UTF-8.
  2. Table
  • A table is a compound data type made up of unsigned numbers or other tables as data items; table names conventionally end in "_info". Tables describe hierarchical composite data, and the entire Class file is essentially one table consisting of the following data items.
Type             Name                  Count
u4               magic                 1
u2               minor_version         1
u2               major_version         1
u2               constant_pool_count   1
cp_info          constant_pool         constant_pool_count-1
u2               access_flags          1
u2               this_class            1
u2               super_class           1
u2               interfaces_count      1
u2               interfaces            interfaces_count
u2               fields_count          1
field_info       fields                fields_count
u2               methods_count         1
method_info      methods               methods_count
u2               attributes_count      1
attribute_info   attributes            attributes_count

Class file structure:

class ClassFile {
    u4 magic;                                     // magic number, 0xCAFEBABE
    u2 minor_version;                             // minor version number
    u2 major_version;                             // major version number
    u2 constant_pool_count;                       // constant pool counter
    cp_info constant_pool[constant_pool_count-1]; // constant pool; indices start at 1
    u2 access_flags;                              // access flags
    u2 this_class;                                // class index
    u2 super_class;                               // superclass index
    u2 interfaces_count;                          // interface counter
    u2 interfaces[interfaces_count];              // interface table
    u2 fields_count;                              // field counter
    field_info fields[fields_count];              // field table
    u2 methods_count;                             // method counter
    method_info methods[methods_count];           // method table
    u2 attributes_count;                          // attribute counter
    attribute_info attributes[attributes_count];  // attribute table
}
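The u1, u2, and u4 types in the structure above are stored big-endian (most significant byte first). As a rough illustration, here is a minimal sketch of how they could be read from a byte array. The class name ClassFileBytes and its method names are hypothetical, chosen only for this example.

```java
// Minimal sketch: reading class-file unsigned numbers (big-endian) from a byte array.
// ClassFileBytes and its method names are hypothetical, not part of any JDK API.
final class ClassFileBytes {

    /** u1: one unsigned byte. */
    static int u1(byte[] data, int offset) {
        return data[offset] & 0xFF;
    }

    /** u2: two bytes, most significant byte first. */
    static int u2(byte[] data, int offset) {
        return (u1(data, offset) << 8) | u1(data, offset + 1);
    }

    /** u4: four bytes; returned as a long so the value stays unsigned. */
    static long u4(byte[] data, int offset) {
        return ((long) u2(data, offset) << 16) | u2(data, offset + 2);
    }

    public static void main(String[] args) {
        // First eight bytes of a class file: magic, minor_version, major_version.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        System.out.printf("magic=0x%X minor=%d major=%d%n",
                u4(header, 0), u2(header, 4), u2(header, 6));
    }
}
```

Returning u4 as a long is a design choice of this sketch: Java has no unsigned int, and 0xCAFEBABE would otherwise appear as a negative number.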

The structural properties of the Class file

The magic number

The first four bytes of every Class file are called the magic number and have the value 0xCAFEBABE. Its purpose is identification: when the JVM loads a file, it reads the first four bytes and checks whether they are 0xCAFEBABE. Only if they match does the JVM treat the file as a class file that can be loaded and used.

The version number

  • The four bytes following the magic number store the version number of the Class file:

    • The fifth and sixth bytes are the minor version.
    • The seventh and eighth bytes are the major version.
  • Major version numbers start at 45.

Constant pool

Next comes the constant pool data area.

  • The first two bytes hold the constant pool counter (constant_pool_count), which determines how many constant pool entries (cp_info) follow.

  • The counter is followed by constant_pool_count - 1 constant pool entries (cp_info), because constant pool indices start at 1.

  • The constant pool holds two main kinds of constants, literals and symbolic references:

    • Literals are close to the Java language's notion of constants, such as text strings and constant values declared final.

    • Symbolic references cover three kinds of constants: fully qualified names of classes and interfaces, field names and descriptors, and method names and descriptors.

  • Every constant in the constant pool is a table. All 14 table types share one feature: the first byte of the table is a u1 tag (see the "Tag" column of the following table) that identifies the type of the constant.

Constant pool item types

Type                               Tag   Description
CONSTANT_Utf8_info                 1     UTF-8 encoded string
CONSTANT_Integer_info              3     Integer literal
CONSTANT_Float_info                4     Floating-point literal
CONSTANT_Long_info                 5     Long integer literal
CONSTANT_Double_info               6     Double-precision floating-point literal
CONSTANT_Class_info                7     Symbolic reference to a class or interface
CONSTANT_String_info               8     String literal
CONSTANT_Fieldref_info             9     Symbolic reference to a field
CONSTANT_Methodref_info            10    Symbolic reference to a method in a class
CONSTANT_InterfaceMethodref_info   11    Symbolic reference to a method in an interface
CONSTANT_NameAndType_info          12    Partial symbolic reference to a field or method
CONSTANT_MethodHandle_info         15    Method handle
CONSTANT_MethodType_info           16    Method type
CONSTANT_InvokeDynamic_info        18    Call site of a dynamic method
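Since the tag byte alone tells a parser which kind of entry follows, a constant pool reader typically starts with a lookup like the one below. This is a minimal sketch that only mirrors the table above; the class name ConstantTags and the method nameOf are hypothetical.

```java
// Minimal sketch: mapping a constant pool tag (u1) to the table type it introduces.
// ConstantTags and nameOf are hypothetical names; the values mirror the table above.
final class ConstantTags {

    static String nameOf(int tag) {
        switch (tag) {
            case 1:  return "CONSTANT_Utf8_info";
            case 3:  return "CONSTANT_Integer_info";
            case 4:  return "CONSTANT_Float_info";
            case 5:  return "CONSTANT_Long_info";
            case 6:  return "CONSTANT_Double_info";
            case 7:  return "CONSTANT_Class_info";
            case 8:  return "CONSTANT_String_info";
            case 9:  return "CONSTANT_Fieldref_info";
            case 10: return "CONSTANT_Methodref_info";
            case 11: return "CONSTANT_InterfaceMethodref_info";
            case 12: return "CONSTANT_NameAndType_info";
            case 15: return "CONSTANT_MethodHandle_info";
            case 16: return "CONSTANT_MethodType_info";
            case 18: return "CONSTANT_InvokeDynamic_info";
            default: return "unknown tag " + tag;
        }
    }

    public static void main(String[] args) {
        // A tag byte of 0x0a (10) introduces a method reference.
        System.out.println(nameOf(0x0a));
    }
}
```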

Access flags

access_flags is a bit mask that records the access permissions and basic properties of a class or interface.

Flag name        Flag value   Meaning
ACC_PUBLIC       0x0001       Whether the type is public
ACC_FINAL        0x0010       Whether the type is declared final (only classes can set this)
ACC_SUPER        0x0020       Whether the new semantics of the invokespecial bytecode instruction are used
ACC_INTERFACE    0x0200       Marks an interface
ACC_ABSTRACT     0x0400       Whether the type is abstract; set for interfaces and abstract classes, clear for all other types
ACC_SYNTHETIC    0x1000       Marks a class that is not generated from user code
ACC_ANNOTATION   0x2000       Marks an annotation type
ACC_ENUM         0x4000       Marks an enumeration
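Because access_flags is a mask, decoding it means testing each bit in turn. The following is a minimal sketch using the values from the table above; the class name ClassAccessFlags and the method decode are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: decoding a class-level access_flags mask into flag names.
// ClassAccessFlags and decode are hypothetical names; the values mirror the table above.
final class ClassAccessFlags {

    private static final int[] VALUES = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Returns the names of all flags whose bit is set in the mask. */
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < VALUES.length; i++) {
            if ((accessFlags & VALUES[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        // 0x0021 = 0x0001 | 0x0020
        System.out.println(decode(0x0021)); // prints [ACC_PUBLIC, ACC_SUPER]
    }
}
```

For example, a mask of 0x0021 decodes to ACC_PUBLIC plus ACC_SUPER, the combination javap reports for an ordinary public class.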

Class index, superclass index, and interface index collection

  • this_class and super_class are each a single u2 value, while interfaces is a set of u2 values. Together they determine the inheritance relationships of the class described by the Class file.

  • The class index identifies the fully qualified name of the class, and the superclass index identifies the fully qualified name of its parent class. Every Java class except java.lang.Object has a parent class, so only java.lang.Object has a superclass index of 0. The interface index collection describes which interfaces the class implements.

  • The class index, superclass index, and interface index collection follow the access flags in that order. The class index and the superclass index are each a u2 index that points to a class descriptor constant of type CONSTANT_Class_info. The index stored in that CONSTANT_Class_info constant in turn leads to the fully qualified name string, which is held in a constant of type CONSTANT_Utf8_info.

  • In the interface index collection, the first item is a u2 interface counter (interfaces_count) that gives the size of the index table. If the class implements no interfaces, the counter is 0 and the interface index table that follows occupies no bytes.

Field table

Each member of the fields[] array must be a field_info structure that gives a complete description of one field in the current class or interface. The fields[] array describes all fields declared by the current class or interface, but not those inherited from a parent class or interface.

Type             Name               Count              Note
u2               access_flags       1                  Field access flags
u2               name_index         1                  Simple name of the field (constant pool reference)
u2               descriptor_index   1                  Field descriptor (constant pool reference)
u2               attributes_count   1
attribute_info   attributes         attributes_count

field_info {
    u2 access_flags;   // bit mask of the field's access permissions and basic properties
    u2 name_index;
    u2 descriptor_index;
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}

Meaning of the descriptor identifier characters

Identifier character   Meaning
B                      Basic type byte
C                      Basic type char
D                      Basic type double
F                      Basic type float
I                      Basic type int
J                      Basic type long
S                      Basic type short
Z                      Basic type boolean
V                      void
L                      Object type, such as Ljava/lang/Object;
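To make the table concrete, here is a minimal sketch that turns one descriptor into a readable Java type name. It only covers the characters listed above (array descriptors, for instance, are not handled), and the class name Descriptors and the method typeName are hypothetical.

```java
// Minimal sketch: translating a single descriptor into a readable type name.
// Descriptors and typeName are hypothetical names; only the characters from the
// table above are handled, so array descriptors are rejected.
final class Descriptors {

    static String typeName(String descriptor) {
        switch (descriptor.charAt(0)) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case 'L':
                if (!descriptor.endsWith(";")) {
                    throw new IllegalArgumentException("Object descriptor must end with ';': " + descriptor);
                }
                // Lpkg/Name; -> pkg.Name
                return descriptor.substring(1, descriptor.length() - 1).replace('/', '.');
            default:
                throw new IllegalArgumentException("Unsupported descriptor: " + descriptor);
        }
    }

    public static void main(String[] args) {
        System.out.println(typeName("I"));
        System.out.println(typeName("Ljava/lang/Object;"));
    }
}
```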

Method table

Each member of the methods[] array must be a method_info structure that gives a complete description of one method in the current class or interface. If neither ACC_NATIVE nor ACC_ABSTRACT is set in the access_flags item of a method_info structure, the method body is supplied in the current Class file, so the Java virtual machine can load it directly without referring to other classes. The method_info structure can represent every method defined in a class or interface, including instance methods, class methods, instance initialization methods (<init>), and class or interface initialization methods (<clinit>). The methods[] array describes only the methods declared in the current class or interface, not those inherited from a parent class or interface.

Attribute table

Each item in the attributes table must be an attribute_info structure.

In the Java 7 specification, the attributes table of the ClassFile structure can contain the following predefined attributes: InnerClasses, EnclosingMethod, Synthetic, Signature, SourceFile, SourceDebugExtension, Deprecated, RuntimeVisibleAnnotations, RuntimeInvisibleAnnotations, and BootstrapMethods.

General format

attribute_info {
u2 attribute_name_index;
u4 attribute_length;
u1 info[attribute_length];
}

Attribute name                         Location                                           Meaning
Code                                   Method table                                       Bytecode instructions compiled from Java code
ConstantValue                          Field table                                        Constant value defined with the final keyword
Deprecated                             Class, method table, field table                   Methods and fields declared deprecated
Exceptions                             Method table                                       Exceptions the method declares it throws
EnclosingMethod                        Class file                                         Present only when a class is local or anonymous; identifies the method that encloses the class
InnerClasses                           Class file                                         List of inner classes
LineNumberTable                        Code attribute                                     Mapping between Java source line numbers and bytecode instructions
LocalVariableTable                     Code attribute                                     Description of the method's local variables
StackMapTable                          Code attribute                                     Added in JDK 1.6; lets the new type checker verify that the local variable and operand stack types of the target method match what is required
Signature                              Class, method table, field table                   Records signatures when generics are involved
SourceFile                             Class file                                         Records the source file name
SourceDebugExtension                   Class file                                         Stores additional debugging information
Synthetic                              Class, method table, field table                   Marks methods or fields generated automatically by the compiler
LocalVariableTypeTable                 Code attribute                                     Uses signatures instead of descriptors; added to describe generic parameterized types after generics were introduced
RuntimeVisibleAnnotations              Class, method table, field table                   Supports dynamic annotations; records which annotations are visible at runtime
RuntimeInvisibleAnnotations            Class, method table, field table                   Records which annotations are not visible at runtime
RuntimeVisibleParameterAnnotations     Method table                                       Like RuntimeVisibleAnnotations, but applies to method parameters
RuntimeInvisibleParameterAnnotations   Method table                                       Like RuntimeInvisibleAnnotations, but applies to method parameters
AnnotationDefault                      Method table                                       Records the default value of an annotation type element
BootstrapMethods                       Class file                                         Holds the bootstrap method specifiers referenced by invokedynamic instructions
RuntimeVisibleTypeAnnotations          Class, method table, field table, Code attribute   Records which type annotations are visible at runtime
RuntimeInvisibleTypeAnnotations        Class, method table, field table, Code attribute   Records which type annotations are not visible at runtime
MethodParameters                       Method table                                       Supports (with the -parameters compiler option) storing method parameter names in the Class file so they can be retrieved at run time
Module                                 Class file                                         Records the name of a module and related information
ModulePackages                         Class file                                         Records the packages exported or opened by a module
ModuleMainClass                        Class file                                         Specifies the main class of a module
NestHost                               Class file                                         Supports the reflection and access-control APIs for nested classes; an inner class uses it to identify its host class
NestMembers                            Class file                                         Supports the reflection and access-control APIs for nested classes; a host class uses it to list its inner classes
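Every attribute in this table shares the attribute_info layout (name index, length, payload), so a reader can load or skip any attribute without understanding its contents. The following is a minimal sketch of that idea; AttributeReader, RawAttribute, and readFrom are hypothetical names. The sample bytes are the last eight bytes of the Main.class hex dump shown in the example section (000d 0000 0002 000e).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Minimal sketch: reading one generic attribute_info structure.
// AttributeReader, RawAttribute and readFrom are hypothetical names.
final class AttributeReader {

    /** Raw parts of one attribute_info structure. */
    static final class RawAttribute {
        final int nameIndex; // u2 attribute_name_index (constant pool index)
        final byte[] info;   // u1 info[attribute_length]

        RawAttribute(int nameIndex, byte[] info) {
            this.nameIndex = nameIndex;
            this.info = info;
        }
    }

    static RawAttribute read(DataInputStream in) throws IOException {
        int nameIndex = in.readUnsignedShort();   // u2
        long length = in.readInt() & 0xFFFFFFFFL; // u4, kept unsigned
        if (length > Integer.MAX_VALUE) {
            throw new IOException("attribute too large for this sketch");
        }
        byte[] info = new byte[(int) length];
        in.readFully(info);                       // payload, not interpreted here
        return new RawAttribute(nameIndex, info);
    }

    /** Convenience wrapper for in-memory data. */
    static RawAttribute readFrom(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x0D, 0x00, 0x00, 0x00, 0x02, 0x00, 0x0E};
        RawAttribute a = readFrom(bytes);
        System.out.println("name_index=" + a.nameIndex + " length=" + a.info.length);
    }
}
```

Interpreting the payload requires looking up the name at attribute_name_index in the constant pool first; the sketch deliberately stops before that step.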

Example

The following walks through bytecode with a simple example.

// Main.java
package com.rhythm7;

public class Main {

    private int m;

    public int inc() {
        return m + 1;
    }
}

Run the following command to generate a Main.class file in the current directory.

javac Main.java

Open the generated class file in a hex viewer; its content is as follows:

cafe babe 0000 0034 0013 0a00 0400 0f09
0003 0010 0700 1107 0012 0100 016d 0100
0149 0100 063c 696e 6974 3e01 0003 2829
5601 0004 436f 6465 0100 0f4c 696e 654e
756d 6265 7254 6162 6c65 0100 0369 6e63
0100 0328 2949 0100 0a53 6f75 7263 6546
696c 6501 0009 4d61 696e 2e6a 6176 610c
0007 0008 0c00 0500 0601 0010 636f 6d2f
7268 7974 686d 372f 4d61 696e 0100 106a
6176 612f 6c61 6e67 2f4f 626a 6563 7400
2100 0300 0400 0000 0100 0200 0500 0600
0000 0200 0100 0700 0800 0100 0900 0000
1d00 0100 0100 0000 052a b700 01b1 0000
0001 000a 0000 0006 0001 0000 0003 0001
000b 000c 0001 0009 0000 001f 0002 0001
0000 0007 2ab4 0002 0460 ac00 0000 0100
0a00 0000 0600 0100 0000 0800 0100 0d00
0000 0200 0e


Apart from the cafe babe at the beginning, the hexadecimal content of the file roughly translates as: what on earth is this?

Don't panic. Let's start with the cafe babe we already know. The first four bytes of the file are the magic number; the virtual machine only accepts class files that start with 0xCAFEBABE, so these four bytes identify the file as bytecode. The next four bytes, 0000 0034, are the version: the minor version is 0 and the major version is 0x0034, which is 52 in decimal. Java version numbers start at 45: JDK 1.0 and 1.1 use 45.x, and each major release after that increases the major version by one. A major version of 52 therefore means the class file was compiled by JDK 1.8. You can check this by running the java -version command.

Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

The results were verified.
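The same check can be done in code. Below is a minimal sketch that validates the magic number, reads the two version fields, and applies the "major version minus 44" rule implied above (45 for JDK 1.0/1.1, plus one per major release). The class name ClassHeader and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Minimal sketch: validating the magic number and reading the class file version.
// ClassHeader, readVersion and javaRelease are hypothetical names.
final class ClassHeader {

    static final int MAGIC = 0xCAFEBABE;

    /** Returns {minor_version, major_version}; rejects data without the magic number. */
    static int[] readVersion(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != MAGIC) {
                throw new IllegalArgumentException("not a class file: bad magic number");
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            return new int[] {minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Major version 45 is JDK 1.0/1.1 and each later major release adds one. */
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        // First eight bytes of the hex dump above: cafe babe 0000 0034
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        int[] version = readVersion(header);
        System.out.println("major=" + version[1] + " -> Java " + javaRelease(version[1]));
    }
}
```

To inspect a real file, the byte array could come from java.nio.file.Files.readAllBytes applied to the path of Main.class.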

Decompile bytecode files

Bytecode files can be disassembled with javap, the disassembler that ships with the JDK. Run javap -help to see its usage:

Usage: javap <options> <classes>
where possible options include:
  -help  --help  -?        Output this usage message
  -version                 Version information
  -v  -verbose             Output additional information
  -l                       Output line number and local variable tables
  -public                  Display only public classes and members
  -protected               Display protected/public classes and members
  -package                 Display package/protected/public classes and members (default)
  -p  -private             Display all classes and members
  -c                       Disassemble the code
  -s                       Output internal type signatures
  -sysinfo                 Display system information (path, size, date, MD5 hash)
  -constants               Display final constants
  -classpath <path>        Specify where to find user class files
  -cp <path>               Specify where to find user class files
  -bootclasspath <path>    Override the location of bootstrap class files

Run javap -verbose -p Main.class to view the output.

Classfile /E:/JavaCode/TestProj/out/production/TestProj/com/rhythm7/Main.class
  Last modified 2018-4-7; size 362 bytes
  MD5 checksum 4aed8540b098992663b7ba08c65312de
  Compiled from "Main.java"
public class com.rhythm7.Main
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #4.#18         // java/lang/Object."<init>":()V
   #2 = Fieldref           #3.#19         // com/rhythm7/Main.m:I
   #3 = Class              #20            // com/rhythm7/Main
   #4 = Class              #21            // java/lang/Object
   #5 = Utf8               m
   #6 = Utf8               I
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               LocalVariableTable
  #12 = Utf8               this
  #13 = Utf8               Lcom/rhythm7/Main;
  #14 = Utf8               inc
  #15 = Utf8               ()I
  #16 = Utf8               SourceFile
  #17 = Utf8               Main.java
  #18 = NameAndType        #7:#8          // "<init>":()V
  #19 = NameAndType        #5:#6          // m:I
  #20 = Utf8               com/rhythm7/Main
  #21 = Utf8               java/lang/Object
{
  private int m;
    descriptor: I
    flags: ACC_PRIVATE

  public com.rhythm7.Main();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 3: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Lcom/rhythm7/Main;

  public int inc();
    descriptor: ()I
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
         1: getfield      #2                  // Field m:I
         4: iconst_1
         5: iadd
         6: ireturn
      LineNumberTable:
        line 8: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       7     0  this   Lcom/rhythm7/Main;
}
SourceFile: "Main.java"

Method table collection

After the constant pool come the descriptions of the fields and methods inside the class, which the bytecode represents as collections of tables. We will skip the hexadecimal content of the bytecode file and go straight to the disassembled output.

private int m;
  descriptor: I
  flags: ACC_PRIVATE

This declares a private field m whose descriptor I means it is of type int.

public com.rhythm7.Main();
   descriptor: ()V
   flags: ACC_PUBLIC
   Code:
     stack=1, locals=1, args_size=1
        0: aload_0
        1: invokespecial #1                  // Method java/lang/Object."<init>":()V
        4: return
     LineNumberTable:
       line 3: 0
     LocalVariableTable:
       Start  Length  Slot  Name   Signature
           0       5     0  this   Lcom/rhythm7/Main;

This is the constructor Main(). Its descriptor ()V means it takes no arguments and returns void, and it is public. The main items in the Code attribute are:

  • stack:

The maximum depth of the operand stack. The JVM allocates the operand stack of the stack frame based on this value, which is 1 here.

  • locals:

The storage space required by local variables, measured in slots. A slot is the smallest unit (4 bytes) the virtual machine uses when allocating memory for local variables. Method parameters (including the hidden this parameter of instance methods), explicit exception handler parameters (the exceptions declared in the catch blocks of try-catch), and local variables defined in the method body are all stored in the local variable table. Note that locals does not necessarily equal the sum of the slots occupied by all local variables, because slots in the local variable table can be reused.

  • args_size:

The number of method arguments. It is 1 here because every instance method has a hidden this parameter.

  • attribute_info:

The method body. 0, 1, and 4 are bytecode offsets (the "line numbers" of the bytecode). The code pushes the first reference-type local variable (this) onto the top of the stack, then invokes an instance method on it, namely the method referenced by constant pool entry #1, which the comment shows as java/lang/Object."<init>":()V, and finally executes return to end the method.

  • LineNumberTable:

This attribute describes the mapping between source line numbers and bytecode line numbers (bytecode offsets). The -g:none and -g:lines compiler options disable or enable its generation. Without a LineNumberTable, the source line number cannot be shown when the program throws an exception, and you cannot debug the program by source line.

  • LocalVariableTable:

This attribute describes the relationship between the local variables in the stack frame and the variables defined in the source code. The -g:none and -g:vars options disable or enable its generation. Without it, parameter names are lost when someone else references the method, and placeholders such as arg0 and arg1 are used instead. Start is the bytecode offset at which the local variable becomes visible, Length is the number of bytecode bytes over which it stays visible, Slot is its position in the local variable table of the stack frame, Name is the variable name, and Signature is its type signature.
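As an illustration of how slots add up, the sketch below counts the slots that a method's parameters occupy at the start of the local variable table. It relies on a rule from the JVM specification that this article does not spell out: long (J) and double (D) take two slots, everything else takes one. The class name LocalSlots and its methods are hypothetical.

```java
// Minimal sketch: counting the local variable slots taken by a method's parameters.
// LocalSlots, slotsFor and parameterSlots are hypothetical names. The two-slot rule
// for long (J) and double (D) comes from the JVM specification, not from this article.
final class LocalSlots {

    /** Slots needed for one value with the given descriptor. */
    static int slotsFor(String descriptor) {
        char kind = descriptor.charAt(0);
        return (kind == 'J' || kind == 'D') ? 2 : 1;
    }

    /** Slots used by the hidden this parameter (instance methods only) plus all parameters. */
    static int parameterSlots(boolean isStatic, String... parameterDescriptors) {
        int slots = isStatic ? 0 : 1; // slot 0 holds this for instance methods
        for (String descriptor : parameterDescriptors) {
            slots += slotsFor(descriptor);
        }
        return slots;
    }

    public static void main(String[] args) {
        // An instance method without parameters, such as Main() or inc(), needs one slot.
        System.out.println(parameterSlots(false));
        // A hypothetical instance method taking (long, int) needs 1 + 2 + 1 slots.
        System.out.println(parameterSlots(false, "J", "I"));
    }
}
```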

The inc() method of the Main class can be read the same way: this is pushed onto the stack, field #2 is fetched and placed on top of the stack, the int constant 1 is pushed, the top two values are added, and the resulting int is returned.
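To see the operand stack at work, the steps of inc() can be replayed by hand with an explicit stack. This is only a simulation written for illustration, not how the JVM is implemented; the class name IncSimulation and the method simulateInc are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch: replaying the bytecode of inc() on an explicit operand stack.
// IncSimulation and simulateInc are hypothetical names used only for illustration.
final class IncSimulation {

    static int simulateInc(int m) {
        Deque<Integer> stack = new ArrayDeque<>();
        int fieldM = m; // stands in for the field this.m

        // 0: aload_0 pushes this; 1: getfield #2 pops it and pushes this.m.
        // The sketch models this pair as a single push of the field value.
        stack.push(fieldM);
        stack.push(1);            // 4: iconst_1
        int right = stack.pop();  // 5: iadd pops two ints ...
        int left = stack.pop();
        stack.push(left + right); //    ... and pushes their sum
        return stack.pop();       // 6: ireturn returns the int on top of the stack
    }

    public static void main(String[] args) {
        System.out.println(simulateInc(41)); // prints 42
    }
}
```

The stack never holds more than two values at once, which matches the stack=2 reported by javap for inc().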

In practice

Analysis of try-catch-finally

The simple example above gives an idea of what source code looks like once it is compiled into bytecode. Now let's use that knowledge to analyze a Java question:

public class TestCode {
    public int foo() {
        int x;
        try {
            x = 1;
            return x;
        } catch (Exception e) {
            x = 2;
            return x;
        } finally {
            x = 3;
        }
    }
}

Question: what does foo() return when no exception occurs, and what does it return when an exception occurs?

javac TestCode.java
javap -verbose TestCode.class

View the bytecode of the foo method:

public int foo();
    descriptor: ()I
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=5, args_size=1
         0: iconst_1        // push int 1                                  -> top of stack = 1
         1: istore_1        // store the top int into the second local     -> local 2 = 1
         2: iload_1         // push the second local (int)                 -> top of stack = 1
         3: istore_2        // !! store the top int into the third local   -> local 3 = 1
         4: iconst_3        // push int 3                                  -> top of stack = 3
         5: istore_1        // store the top int into the second local     -> local 2 = 3
         6: iload_2         // !! push the third local (int)               -> top of stack = 1
         7: ireturn         // return the int on top of the stack          -> 1
         8: astore_2        // -> local 3 = Exception
         9: iconst_2        // -> top of stack = 2
        10: istore_1        // -> local 2 = 2
        11: iload_1         // -> top of stack = 2
        12: istore_3        // !! -> local 4 = 2
        13: iconst_3        // -> top of stack = 3
        14: istore_1        // -> local 2 = 3
        15: iload_3         // !! -> top of stack = 2
        16: ireturn         // -> 2
        17: astore        4 // store the reference on top of the stack into the fifth local = any
        19: iconst_3        // push int 3 -> top of stack = 3
        20: istore_1        // store the top int into the second local -> local 2 = 3
        21: aload         4 // push the fifth local (reference type) onto the stack
        23: athrow          // throw the exception on top of the stack
      Exception table:
         from    to  target type
             0     4     8   Class java/lang/Exception   // an Exception raised at offsets 0 to 4 is handled at offset 8
             0     4    17   any                         // anything other than Exception
             8    13    17   any
            17    19    17   any

The same operations appear at offsets 4-5 and 13-14 of the bytecode: push the int 3 onto the operand stack and store it into the second local variable. This is exactly the content of the finally block in the source. In other words, the compiled bytecode repeats the finally block on every possible branch.

Analyzing the bytecode step by step gives the final results:

  • If no exception occurs: returns 1
  • If an Exception (or one of its subclasses) occurs: returns 2
  • If a throwable that is not an Exception or one of its subclasses is thrown: it is rethrown and no value is returned
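The first two results can be confirmed by running the code. The foo() above cannot actually throw, so the sketch below adds a hypothetical fail parameter that forces an exception inside the try block; the class name FinallyDemo is also made up for this example.

```java
// Minimal sketch: confirming the return values derived from the bytecode.
// FinallyDemo and the fail parameter are hypothetical additions; the original
// foo() has no way to trigger an exception.
final class FinallyDemo {

    static int foo(boolean fail) {
        int x;
        try {
            x = 1;
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            return x; // the value 1 is saved before finally runs
        } catch (Exception e) {
            x = 2;
            return x; // the value 2 is saved before finally runs
        } finally {
            x = 3;    // runs on every path, but does not change the saved return value
        }
    }

    public static void main(String[] args) {
        System.out.println(foo(false)); // prints 1
        System.out.println(foo(true));  // prints 2
    }
}
```

The assignment x = 3 still executes on both paths; it has no visible effect because the return value was already copied into a separate local variable, as the istore_2 and istore_3 instructions in the listing show.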

How Kotlin extension functions are implemented

Kotlin offers extension functions, a language feature that lets you add custom methods to any type. The following example adds a sayHello method to Any, Kotlin's counterpart of Object:

//SayHello.kt
package com.rhythm7

fun Any.sayHello() {
    println("Hello")
}

After compiling, use javap to view the bytecode of the generated SayHelloKt.class file.

Classfile /E:/JavaCode/TestProj/out/production/TestProj/com/rhythm7/SayHelloKt.class
Last modified 2018-4-8; size 958 bytes
 MD5 checksum 780a04b75a91be7605cac4655b499f19
 Compiled from "SayHello.kt"
public final class com.rhythm7.SayHelloKt
 minor version: 0
 major version: 52
 flags: ACC_PUBLIC, ACC_FINAL, ACC_SUPER
Constant pool:
    // constant pool omitted
{
 public static final void sayHello(java.lang.Object);
   descriptor: (Ljava/lang/Object;)V
   flags: ACC_PUBLIC, ACC_STATIC, ACC_FINAL
   Code:
     stack=2, locals=2, args_size=1
        0: aload_0
        1: ldc           #9                  // String $receiver
        3: invokestatic  #15                 // Method kotlin/jvm/internal/Intrinsics.checkParameterIsNotNull:(Ljava/lang/Object;Ljava/lang/String;)V
        6: ldc           #17                 // String Hello
        8: astore_1
        9: getstatic     #23                 // Field java/lang/System.out:Ljava/io/PrintStream;
       12: aload_1
       13: invokevirtual #28                 // Method java/io/PrintStream.println:(Ljava/lang/Object;)V
       16: return
     LocalVariableTable:
       Start  Length  Slot  Name   Signature
           0      17     0 $receiver   Ljava/lang/Object;
     LineNumberTable:
       line 4: 6
       line 5: 16
   RuntimeInvisibleParameterAnnotations:
     0:
       0: #7()
}
SourceFile: "SayHello.kt"

Looking at the header, Kotlin has generated a class named com.rhythm7.SayHelloKt for the file SayHello.kt. The output shows no constructor for SayHelloKt, which matches our intent: when we wrote SayHello.kt we never meant it to be a class that can be instantiated. Now look at its only method. The concrete implementation of Any.sayHello() turns out to be a static final method:

public static final void sayHello(java.lang.Object);

So when we call Any.sayHello() elsewhere, we are essentially calling the Java method SayHelloKt.sayHello(Object). Incidentally, because the receiver type is Any, which is non-null, the compiler inserts a null check at the beginning of the method body: it calls kotlin.jvm.internal.Intrinsics.checkParameterIsNotNull(Object value, String paramName) to check whether the object passed in is null. If we declare the extension function as Any?.sayHello() instead, this check does not appear in the compiled bytecode.
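Written out in Java, the compiled extension function is roughly equivalent to the sketch below. It is an approximation for illustration only: the class name SayHelloKtSketch is hypothetical, the private constructor stands in for the missing constructor, and Objects.requireNonNull is used as a stand-in for Kotlin's Intrinsics.checkParameterIsNotNull, which may throw a different exception type.

```java
import java.util.Objects;

// Minimal sketch: a rough Java equivalent of the compiled Any.sayHello() extension.
// SayHelloKtSketch is a hypothetical name. Objects.requireNonNull is only a stand-in
// for kotlin.jvm.internal.Intrinsics.checkParameterIsNotNull, which may throw a
// different exception type.
final class SayHelloKtSketch {

    private SayHelloKtSketch() {
        // not meant to be instantiated
    }

    /** The receiver of the extension function becomes the first parameter. */
    public static void sayHello(Object receiver) {
        Objects.requireNonNull(receiver, "$receiver"); // null check inserted for non-null Any
        System.out.println("Hello");
    }

    public static void main(String[] args) {
        sayHello("any object"); // what a call like "any object".sayHello() compiles down to
    }
}
```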

References:

The Java Virtual Machine Specification, Java SE 7 Edition (Chinese version)