1. Bytecode

1.1 What is bytecode?

Java can be “compiled once, run anywhere” because JVM implementations exist for a variety of operating systems and platforms, and because source code is compiled into fixed-format bytecode (.class files) that the JVM on any of those platforms can run. This shows how important bytecode is to the Java ecosystem. It is called bytecode because the file is a sequence of bytes, each usually written as two hexadecimal digits, and the JVM reads it one byte at a time. In Java, source code is typically compiled into a bytecode file with the javac command; the path of a .java file from compilation to execution is shown in Figure 1.

For developers, understanding bytecode leads to a more accurate and intuitive grasp of deeper aspects of the Java language, such as how the volatile keyword shows up at the bytecode level. Bytecode enhancement techniques are also used frequently in Spring AOP, various ORM frameworks, and hot deployment, so it helps to understand how they work. Finally, the JVM specification only requires compliant bytecode as input, so any language that eventually produces compliant bytecode can run on the JVM. This gives the languages that run on the JVM (Scala, Groovy, Kotlin) an opportunity to add features or syntactic sugar that Java does not have. Once you understand bytecode, learning these languages from the bytecode perspective means working from the source, which makes them easier to pick up.

This article focuses on bytecode enhancement technology. It works upward from the bytecode itself, layer by layer: from the JVM bytecode instruction set to the Java frameworks for manipulating bytecode, and then to the principles and applications of the frameworks we are familiar with, each introduced in turn.

1.2 Bytecode structure

A .java file is compiled by javac into a .class file. For example, take a simple ByteCodeDemo class, shown on the left side of Figure 2 below:

Compilation generates the ByteCodeDemo.class file. When opened, it shows a mass of hexadecimal digits grouped into bytes, as shown in the right part of Figure 2. As mentioned earlier, the JVM specification places requirements on bytecode, so what structure does this seemingly messy hexadecimal follow? The JVM specification requires that each bytecode file consist of ten parts in a fixed order, as shown in Figure 3. The following sections introduce each of the ten parts:

(1) Magic Number

The first four bytes of every .class file are the magic number, with the fixed value 0xCAFEBABE. Because it sits at the start of the file, the JVM can use it to decide whether the file is likely to be a .class file and, if so, proceed with subsequent operations.

Interestingly, the magic number’s fixed value was chosen by James Gosling, the father of Java, to read “cafe babe”, and Java’s icon is a cup of coffee.

(2) Version number

The version number occupies the four bytes after the magic number. The first two bytes are the minor version and the last two bytes are the major version. In Figure 2 the version number is 00 00 00 34: the minor version is 0 in decimal and the major version is 52 in decimal. According to Oracle’s official documentation, major version 52 corresponds to Java 1.8, so this file’s Java version is 1.8.0.
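As a minimal sketch (not taken from the original article), the first eight bytes described above can be decoded with the JDK’s ByteBuffer. The class name ClassFileVersion and its methods are hypothetical, and javaRelease() assumes the usual rule that, from major version 49 (Java 5) onward, the Java release is the major version minus 44.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: decodes the magic number and version fields of a class file.
class ClassFileVersion {
    final int minor;
    final int major;

    private ClassFileVersion(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    static ClassFileVersion read(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF; // u2 minor_version
        int major = buf.getShort() & 0xFFFF; // u2 major_version
        return new ClassFileVersion(minor, major);
    }

    // Assumption: only meaningful for major >= 49 (49 is Java 5, 52 is Java 8).
    int javaRelease() {
        return major - 44;
    }
}
```

For the header bytes CA FE BA BE 00 00 00 34 from the example, read() yields minor 0 and major 52, and javaRelease() yields 8.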

(3) Constant Pool

The bytes immediately after the major version are the entry to the constant pool. The constant pool stores two kinds of constants: literals and symbolic references. Literals are constant values, such as those declared final in the code; symbolic references are the fully qualified names of classes and interfaces, field names and descriptors, and method names and descriptors. The constant pool has two parts: the constant pool counter and the constant pool data area, as shown in Figure 4 below.

  • Constant pool counter (constant_pool_count): because the number of constants is not fixed, two bytes are set aside to hold the constant pool capacity count. The first 10 bytes of the bytecode for the example in Figure 2 are shown in Figure 5 below. Converting hexadecimal 24 to decimal gives 36; index 0 is not used, so the class file has 35 constants.

  • Constant pool data area: the data area consists of (constant_pool_count - 1) cp_info structures, each corresponding to one constant. Bytecode has 14 types of cp_info (see Figure 6 below), and the structure of each type is fixed.

Take CONSTANT_Utf8_info as an example. Its structure is shown on the left side of Figure 7 below. The first byte, “tag”, takes its value from the tag of the corresponding entry in Figure 6; because the type is Utf8_info, the value is “01”. The next two bytes, length, give the length of the string in bytes, and the length bytes that follow hold the string’s value. The right side of Figure 7 shows one cp_info structure extracted from the bytecode in Figure 2: a string constant of type UTF8 with a length of one byte and the data “a”.

The other cp_info types are not described in detail in this article. Their overall structure is similar: a tag identifies the type, followed by n bytes that describe the length and/or data. Once you understand this, you can run javap -verbose ByteCodeDemo to view the decompiled constant pool, as shown in Figure 8 below. The decompiled output clearly presents the type and value of each cp_info structure.
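To make the tag/length/data layout concrete, here is a small sketch that decodes a single CONSTANT_Utf8_info entry. It relies on the fact that DataInputStream.readUTF() reads a two-byte length followed by that many bytes of modified UTF-8, which is the same layout the entry uses after its tag byte. The class name Utf8Constant is hypothetical, and the sample entry 01 00 01 61 assumes the one-byte string in the example is a lowercase “a”.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: decodes one CONSTANT_Utf8_info entry (tag, length, bytes).
class Utf8Constant {
    static final int CONSTANT_UTF8_TAG = 1;

    static String decode(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();          // u1 tag
            if (tag != CONSTANT_UTF8_TAG) {
                throw new IllegalArgumentException("not a CONSTANT_Utf8_info entry, tag=" + tag);
            }
            return in.readUTF();                       // u2 length + modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // constant_pool_count is one larger than the number of entries, because index 0 is unused.
    static int entryCount(int constantPoolCount) {
        return constantPoolCount - 1;
    }
}
```

With these definitions, decode(new byte[] {0x01, 0x00, 0x01, 0x61}) returns the string a, and entryCount(0x24) returns 35, matching the count worked out above.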

(4) Access flags

The two bytes after the end of the constant pool describe whether the file defines a class or an interface, and whether it is modified by public, abstract, final, and so on. The JVM specification defines the access flags (access_flags) shown in Figure 9 below. Note that the JVM does not enumerate every combination of access flags; instead, combinations are expressed with the bitwise OR operator. For example, a class declared public final has the access flags ACC_PUBLIC | ACC_FINAL, that is, 0x0001 | 0x0010 = 0x0011.
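The bitwise OR combination can be sketched in a few lines. The class AccessFlags and its describe() method are hypothetical; the four flag values are the ones the JVM specification assigns to these class-level flags, and only a subset of the flags is modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: combines and decodes a few class-level access flags.
class AccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;

    // Tests each bit with a bitwise AND and lists the modifiers that are set.
    static List<String> describe(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & ACC_PUBLIC) != 0) names.add("public");
        if ((flags & ACC_FINAL) != 0) names.add("final");
        if ((flags & ACC_INTERFACE) != 0) names.add("interface");
        if ((flags & ACC_ABSTRACT) != 0) names.add("abstract");
        return names;
    }
}
```

Because every flag occupies its own bit, ACC_PUBLIC | ACC_FINAL is 0x0011, and describe(0x0011) gives back the two modifiers public and final.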

(5) Name of the current class

The two bytes after the access flag describe the fully qualified name of the current class. These two bytes hold the value of the index in the constant pool, from which the fully qualified name of the class can be found.

(6) Name of the parent class

The two bytes after the name of the current class describe the fully qualified name of the parent class, again holding the index value in the constant pool.

(7) Interface information

The parent class name is followed by a two-byte interface counter that gives the number of interfaces the class implements directly (or, if the file describes an interface, the number it extends). It is followed by one two-byte constant pool index per interface, each pointing to that interface’s name.
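The parts described so far (magic number, version, constant pool, access flags, current class, parent class, interface count) can be walked in order with a short parser. This is a sketch rather than a complete class file reader: the class ClassFileSummary is hypothetical, it keeps only the Utf8 and Class constants and skips the rest by their fixed sizes, and bytesOf() assumes the class bytes are reachable as a resource and that Java 9 or later is available for readAllBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: reads the leading parts of a class file in their fixed order.
class ClassFileSummary {
    final int accessFlags;
    final String thisClass;
    final String superClass;
    final int interfaceCount;

    private ClassFileSummary(int accessFlags, String thisClass, String superClass, int interfaceCount) {
        this.accessFlags = accessFlags;
        this.thisClass = thisClass;
        this.superClass = superClass;
        this.interfaceCount = interfaceCount;
    }

    static ClassFileSummary parse(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) throw new IllegalArgumentException("bad magic number");
            in.readUnsignedShort();                    // minor version
            in.readUnsignedShort();                    // major version
            int count = in.readUnsignedShort();        // constant_pool_count
            String[] utf8 = new String[count];
            int[] classNameIndex = new int[count];
            for (int i = 1; i < count; i++) {          // index 0 is unused
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                      // Utf8
                    case 7: classNameIndex[i] = in.readUnsignedShort(); break;  // Class
                    case 8: case 16: case 19: case 20: in.skipBytes(2); break;
                    case 15: in.skipBytes(3); break;
                    case 3: case 4: case 9: case 10: case 11: case 12:
                    case 17: case 18: in.skipBytes(4); break;
                    case 5: case 6: in.skipBytes(8); i++; break;                // long/double use two slots
                    default: throw new IllegalArgumentException("unknown constant tag " + tag);
                }
            }
            int accessFlags = in.readUnsignedShort();
            int thisIndex = in.readUnsignedShort();
            int superIndex = in.readUnsignedShort();
            int interfaceCount = in.readUnsignedShort();
            String superName = superIndex == 0 ? null : utf8[classNameIndex[superIndex]];
            return new ClassFileSummary(accessFlags, utf8[classNameIndex[thisIndex]], superName, interfaceCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads the compiled bytes of a class through its own resource lookup.
    static byte[] bytesOf(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) throw new IllegalArgumentException("class bytes not found: " + resource);
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, ClassFileSummary.parse(ClassFileSummary.bytesOf(Runnable.class)) resolves the current class to java/lang/Runnable and the parent class to java/lang/Object by following the two constant pool indexes.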

(8) Field table

Field tables are used to describe variables declared in classes and interfaces, including class-level variables and instance variables, but not local variables declared inside methods. The field table is also divided into two parts. The first part is two bytes, describing the number of fields. The second part is fields_info with details for each field. The structure of the field table is as follows:

Take the field table of the bytecode in Figure 2 as an example, as shown in Figure 11 below. The field’s access flag is 0002, which according to Figure 9 corresponds to private. Looking up the name and descriptor indexes in the constant pool in Figure 8 yields the field name “a” and the descriptor “I” (representing int). Together these identify the field declared in the class as private int a.
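Field descriptors such as “I” follow a compact grammar defined by the JVM specification: one letter per primitive type, L...; for a class, and a leading [ for each array dimension. A minimal sketch of decoding them into Java type names follows; the class FieldDescriptor and its method name are hypothetical.

```java
// Hypothetical helper: turns a JVM field descriptor into a readable Java type name.
class FieldDescriptor {
    static String toJavaType(String descriptor) {
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {
            dims++;                                   // each '[' adds one array dimension
        }
        String element = descriptor.substring(dims);
        String base;
        switch (element.charAt(0)) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'L':                                  // Lpackage/Name; becomes package.Name
                base = element.substring(1, element.length() - 1).replace('/', '.');
                break;
            default:
                throw new IllegalArgumentException("bad descriptor: " + descriptor);
        }
        StringBuilder type = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            type.append("[]");
        }
        return type.toString();
    }
}
```

Here toJavaType("I") returns int, and toJavaType("Ljava/lang/String;") returns java.lang.String, the same type that appears in the println descriptor used later in this article.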

(9) Method table

The method table is also composed of two parts. The first part is two bytes describing the number of methods. The second part provides detailed information for each method. Method details are complex, including method access flags, method names, method descriptors, and method attributes, as shown in the following figure:

The access flags of a method can still be looked up in Figure 9. The method name and method descriptor are both constant pool indexes, which can be resolved by looking them up in the constant pool. The “method attributes” part is more complicated, so we decompile it directly with javap -verbose into human-readable form, as shown in Figure 13. You can see that the attributes include the following three parts:

  • “Code” area: the JVM instruction opcodes that correspond to the source code. This is the part that bytecode enhancement focuses on.

  • “LineNumberTable”: the line number table, which maps opcodes in the Code area to line numbers in the source code. It is used for debugging (when stepping over one line of source, how many JVM opcodes need to run).

  • “LocalVariableTable”: the local variable table, which contains this and the local variables. this can be used inside every method because the JVM implicitly passes it as the first argument to each method. This applies to non-static methods only, of course.

(10) Additional attribute table

The last part of the bytecode holds basic information about the attributes defined by the class or interface in this file.

1.3 Bytecode instruction set

In Figure 13, the entries numbered 0 to 17 in red in the Code area are the opcodes that the method’s source code in the .java file compiles to and that the JVM actually executes. To make them easier to read, the decompiler shows the mnemonic for each hexadecimal opcode. The mapping between hexadecimal opcodes and mnemonics, and the use of each opcode, are covered in the official Oracle documentation, which you can study and consult when necessary. For example, the first mnemonic in the figure, iconst_2, corresponds to the byte 0x05 in Figure 2 and pushes the int value 2 onto the operand stack. Read the same way, the mnemonics from 0 to 17 make up the complete implementation of the add() method.
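The opcode-to-mnemonic mapping can be illustrated with a tiny disassembler that knows only a handful of opcodes. Everything here is a sketch: the class TinyDisassembler is hypothetical, and ADD_METHOD_CODE is an assumed byte sequence for an add() method that computes a + 2 and prints the result (the figures are not available here, and the constant pool indexes #2, #3, #4 are made up). The opcode values themselves are those listed in the JVM instruction set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps a few opcodes to mnemonics and lists them by offset.
class TinyDisassembler {
    // Assumed encoding of an add() method; not copied from the article's figures.
    static final byte[] ADD_METHOD_CODE = {
        0x05, 0x3c, 0x2a, (byte) 0xb4, 0x00, 0x02, 0x1b, 0x60, 0x3d,
        (byte) 0xb2, 0x00, 0x03, 0x1c, (byte) 0xb6, 0x00, 0x04, 0x1c, (byte) 0xac
    };

    static String mnemonic(int opcode) {
        switch (opcode) {
            case 0x05: return "iconst_2";
            case 0x1b: return "iload_1";
            case 0x1c: return "iload_2";
            case 0x2a: return "aload_0";
            case 0x3c: return "istore_1";
            case 0x3d: return "istore_2";
            case 0x60: return "iadd";
            case 0xac: return "ireturn";
            case 0xb2: return "getstatic";
            case 0xb4: return "getfield";
            case 0xb6: return "invokevirtual";
            default: return null;                      // opcode not covered by this sketch
        }
    }

    // getstatic, getfield and invokevirtual carry a two-byte constant pool index.
    static boolean hasIndexOperand(int opcode) {
        return opcode == 0xb2 || opcode == 0xb4 || opcode == 0xb6;
    }

    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            String name = mnemonic(opcode);
            if (name == null) {
                throw new IllegalArgumentException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
            if (hasIndexOperand(opcode)) {
                int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                lines.add(pc + ": " + name + " #" + index);
                pc += 3;
            } else {
                lines.add(pc + ": " + name);
                pc += 1;
            }
        }
        return lines;
    }
}
```

Disassembling ADD_METHOD_CODE yields twelve instructions whose offsets run from 0 (iconst_2) to 17 (ireturn), which is why the numbering in a javap listing has gaps: instructions with operands occupy more than one byte.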

1.4 Operand stack and bytecode

The JVM instruction set is stack-based rather than register-based. A stack-based design is more portable across platforms (register-based instruction sets tend to be tied to the hardware), but the downside is that the same operation needs more instructions on a stack machine, because the stack is a FILO structure and values must be pushed and popped frequently. In addition, the stack lives in memory while registers live inside the CPU, so a stack-based design is generally slower; this is another price paid for portability.

The opcodes, or instruction set, mentioned above in fact drive the JVM’s operand stack. To give a more intuitive sense of how opcodes control the operand stack, and of what the constant pool and the local variable table do, the operation of the add() method on the operand stack has been made into a GIF, shown in Figure 14 below. The figure shows only the referenced part of the constant pool; it starts with the instruction iconst_2 and ends with the instruction ireturn, corresponding to instructions 0 to 17 of the Code area in Figure 13:

1.5 Tools for viewing bytecode

Running javap every time you want to look at decompiled bytecode is tedious. An IntelliJ IDEA plugin is recommended instead: jclasslib. After the code is compiled, select “Show Bytecode With jclasslib” from the “View” menu, and you can see the class information, constant pool, methods, and other information of the current bytecode file at a glance.
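Since the animation is not reproduced here, the same idea can be sketched in code: each instruction either pushes a value onto the operand stack, pops values off it, or moves a value between the stack and the local variable table. The class OperandStackDemo is hypothetical and assumes that add() computes a + 2 for an int field a; the aload_0/getfield pair is collapsed into one push, and the println call is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical walk-through of an add() method on a simulated operand stack.
class OperandStackDemo {
    static int runAdd(int fieldA) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        int[] locals = new int[3];                 // slot 0 would hold `this`; slots 1 and 2 are locals

        stack.push(2);                             // iconst_2: push the constant 2
        locals[1] = stack.pop();                   // istore_1: pop into local slot 1
        stack.push(fieldA);                        // aload_0 + getfield: push the field value
        stack.push(locals[1]);                     // iload_1: push local slot 1
        int right = stack.pop();                   // iadd: pop two ints...
        int left = stack.pop();
        stack.push(left + right);                  // ...and push their sum
        locals[2] = stack.pop();                   // istore_2: pop the sum into local slot 2
        stack.push(locals[2]);                     // iload_2: push it again for the return
        return stack.pop();                        // ireturn: pop and return the top of the stack
    }
}
```

With a field value of 1, runAdd(1) returns 3. Note how even this small method needs several push/pop pairs, which is the instruction-count cost of a stack-based design mentioned above.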

2. Bytecode enhancement

The sections above introduced the structure of bytecode, which lays the foundation for understanding how bytecode enhancement is implemented. Bytecode enhancement is a class of techniques that modify existing bytecode or dynamically generate entirely new bytecode files. Next, we look closely at the implementations that manipulate bytecode most directly.

2.1 ASM

When bytecode needs to be manipulated by hand, you can use ASM. It can either generate .class bytecode files directly or dynamically modify a class’s behavior before the class is loaded into the JVM (as shown in Figure 17 below). Application scenarios for ASM include AOP (cglib is based on ASM), hot deployment, modifying classes in other JARs, and so on. Of course, working at such a low level is also harder to implement. Next, this article introduces ASM’s two APIs and uses ASM to implement a fairly crude form of AOP. Before that, it is highly recommended that you read up on the visitor pattern, which makes the ASM workflow much quicker to understand. In short, the visitor pattern is used to modify or operate on data whose structure is stable. As we learned in Chapter 1, the structure of a bytecode file is fixed by the JVM, so the visitor pattern is a good fit for modifying bytecode files.
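For readers who have not met the visitor pattern, here is a minimal sketch in plain Java, independent of ASM. All of the names (MemberVisitor, ClassModel, MethodNameCollector) are hypothetical, simplified stand-ins: the structure being visited is fixed (fields first, then methods), and each new operation is added by writing another visitor rather than by changing the structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor interface: one callback per kind of element in the structure.
interface MemberVisitor {
    void visitField(String name, String descriptor);
    void visitMethod(String name, String descriptor);
}

// Hypothetical stable structure: it only knows how to walk itself in a fixed order.
class ClassModel {
    private final List<String[]> fields = new ArrayList<>();
    private final List<String[]> methods = new ArrayList<>();

    ClassModel field(String name, String descriptor) {
        fields.add(new String[] {name, descriptor});
        return this;
    }

    ClassModel method(String name, String descriptor) {
        methods.add(new String[] {name, descriptor});
        return this;
    }

    // Drives the visitor: fields first, then methods.
    void accept(MemberVisitor visitor) {
        for (String[] f : fields) visitor.visitField(f[0], f[1]);
        for (String[] m : methods) visitor.visitMethod(m[0], m[1]);
    }
}

// One concrete operation: collect method names and ignore fields.
class MethodNameCollector implements MemberVisitor {
    final List<String> names = new ArrayList<>();

    @Override
    public void visitField(String name, String descriptor) {
        // fields are not interesting for this operation
    }

    @Override
    public void visitMethod(String name, String descriptor) {
        names.add(name);
    }
}
```

A caller builds the structure once, for example new ClassModel().field("a", "I").method("<init>", "()V").method("add", "()I"), and passes any visitor to accept(). ASM’s ClassVisitor and MethodVisitor play the role of MemberVisitor here, with many more callbacks.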

2.1.1 ASM API

2.1.1.1 Core API

The ASM Core API is comparable to the SAX approach to parsing XML: it streams through the bytecode file without reading in the whole class structure. The advantage is that it uses very little memory; the disadvantage is that it is harder to program against. For performance reasons, however, the Core API is what is typically used. It has several key classes:

  • ClassReader: reads compiled .class files.

  • ClassWriter: Used to rebuild a compiled class, such as changing the class name, attributes, and methods, or to generate a bytecode file for a new class.

  • The Visitor classes: as mentioned above, the Core API processes the bytecode from top to bottom, with a different Visitor for each area of the bytecode file, such as MethodVisitor for methods, FieldVisitor for class variables, and AnnotationVisitor for annotations. To implement AOP, the one to focus on is MethodVisitor.

2.1.1.2 Tree API

The ASM Tree API is comparable to the DOM approach to parsing XML: it reads the entire class structure into memory. The disadvantage is that it uses more memory, but it is simpler to program against. Unlike the Core API, the Tree API maps each area of the bytecode to a Node class, which is easiest to understand by analogy with DOM nodes.

2.1.2 Using ASM directly to implement AOP

We use ASM’s Core API to enhance a class. We will not dwell on AOP terminology such as aspects and advice here; we simply add logic before and after a method call, which keeps things easy to follow. Start by defining the Base class to be enhanced: it contains a single process() method that prints the line “process”. After the enhancement, we expect “start” to be printed before the method runs and “end” after it.

public class Base {
    public void process(){
        System.out.println("process");
    }
}

To implement AOP with ASM, two classes need to be defined. One is MyClassVisitor, which visits and modifies the bytecode. The other is Generator, which creates the ClassReader and ClassWriter: the ClassReader reads the bytecode and hands it to MyClassVisitor for processing, and once processing is done the ClassWriter writes out the bytecode, replacing the old bytecode. The Generator class is the simpler of the two, so we look at its implementation first, shown below, and then focus on MyClassVisitor.

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;

import java.io.File;
import java.io.FileOutputStream;

public class Generator {
    public static void main(String[] args) throws Exception {
        // Read the existing bytecode of Base
        ClassReader classReader = new ClassReader("meituan/bytecode/asm/Base");
        ClassWriter classWriter = new ClassWriter(ClassWriter.COMPUTE_MAXS);
        // Process it with our visitor, which forwards the result to the writer
        ClassVisitor classVisitor = new MyClassVisitor(classWriter);
        classReader.accept(classVisitor, ClassReader.SKIP_DEBUG);
        byte[] data = classWriter.toByteArray();
        // Write the new bytecode over the old class file
        File f = new File("operation-server/target/classes/meituan/bytecode/asm/Base.class");
        FileOutputStream fout = new FileOutputStream(f);
        fout.write(data);
        fout.close();
        System.out.println("now generator cc success!!!!!");
    }
}

MyClassVisitor extends ClassVisitor and observes the bytecode. It contains an inner class, MyMethodVisitor, which extends MethodVisitor and observes the methods within the class. The complete code is as follows:

import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class MyClassVisitor extends ClassVisitor implements Opcodes {
    public MyClassVisitor(ClassVisitor cv) {
        super(ASM5, cv);
    }

    @Override
    public void visit(int version, int access, String name, String signature,
                      String superName, String[] interfaces) {
        cv.visit(version, access, name, signature, superName, interfaces);
    }

    @Override
    public MethodVisitor visitMethod(int access, String name, String desc,
                                     String signature, String[] exceptions) {
        MethodVisitor mv = cv.visitMethod(access, name, desc, signature, exceptions);
        // Base has two methods: the no-argument constructor and process().
        // The constructor is not enhanced here.
        if (!name.equals("<init>") && mv != null) {
            mv = new MyMethodVisitor(mv);
        }
        return mv;
    }

    class MyMethodVisitor extends MethodVisitor implements Opcodes {
        public MyMethodVisitor(MethodVisitor mv) {
            super(Opcodes.ASM5, mv);
        }

        @Override
        public void visitCode() {
            super.visitCode();
            // At the start of the method body, print "start"
            mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
            mv.visitLdcInsn("start");
            mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
        }

        @Override
        public void visitInsn(int opcode) {
            if ((opcode >= Opcodes.IRETURN && opcode <= Opcodes.RETURN)
                    || opcode == Opcodes.ATHROW) {
                // Before the method returns, print "end"
                mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
                mv.visitLdcInsn("end");
                mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
            }
            mv.visitInsn(opcode);
        }
    }
}

With this class in place, the bytecode can be modified. Reading the code closely, the steps for modifying the bytecode are:

  • The visitMethod method of MyClassVisitor checks which method the bytecode reader has currently reached. The constructor “<init>” is skipped, and the methods that need to be enhanced are handed to the inner class MyMethodVisitor for processing.

  • Next comes the visitCode method of the inner class MyMethodVisitor. ASM calls it when it starts visiting the Code area of a method, so we override visitCode and place the AOP pre-logic there.

  • MyMethodVisitor then continues reading bytecode instructions. Every time ASM visits an instruction that takes no operands, it calls the visitInsn method of MyMethodVisitor. We check whether the current instruction is a no-operand “return” instruction and, if it is, add some instructions in front of it, which is where the AOP post-logic goes.

  • To sum up, AOP can be implemented by overriding these two methods of MyMethodVisitor; overriding them requires writing or modifying bytecode by hand in ASM’s style. Bytecode is inserted by calling the visitXXXXInsn() methods of MethodVisitor, where XXXX is the mnemonic type of the corresponding opcode. For example, mv.visitLdcInsn(“end”) corresponds to the instruction ldc “end”, which pushes the string “end” onto the stack.

After completing the two Visitor classes, run the main method of Generator to enhance the bytecode of the Base class. The result can be viewed in the Base.class file in the compiled target folder, where the decompiled code has changed (as shown on the left side of Figure 18). Then write a test class, MyTest, that creates a new Base() and calls its process() method; the AOP effect can be seen on the right side of the figure below:

2.1.3 ASM tools

When writing bytecode by hand with ASM, you use a series of visitXXXXInsn() methods to write the corresponding mnemonics. That means first converting each line of source code into mnemonics, and then converting those into visitXXXXInsn() calls in ASM’s syntax. The first step, converting source code to mnemonics, is troublesome enough: if you are not familiar with the bytecode instruction set, you have to compile the code and then decompile it to obtain the mnemonics for the source. The second step, writing the bytecode with ASM, raises the equally troublesome question of what arguments to pass. The ASM community is aware of both problems and provides the ASM Bytecode Outline tool.

Once it is installed, right-click in the code and choose “Show Bytecode Outline”, then select the “ASMified” tab in the new panel, as shown in Figure 19, to see the ASM calls that correspond to the code in this class. The upper and lower red boxes correspond to the pre-logic and post-logic of the AOP example, respectively. Copy these two pieces directly into visitCode() and visitInsn() of the Visitor.

2.2 Javassist

ASM manipulates bytecode at the instruction level, and the impression left by the section above is that frameworks working at the instruction level are somewhat obscure to use. So we also briefly introduce another kind of framework: Javassist, which emphasizes manipulating bytecode at the source level.

When you implement bytecode enhancement with Javassist, you do not need to worry about the rigid structure of bytecode, so it has the advantage of being simple to program. You can change the structure of a class, or generate a class dynamically, by writing ordinary Java code, without needing to understand virtual machine instructions. The most important classes are ClassPool, CtClass, CtMethod, and CtField:

  • CtClass (compile-time class): compile-time class information, an abstract representation of a class file in code. A CtClass object representing a class file can be obtained from the class’s fully qualified name.

  • ClassPool: from a development perspective, ClassPool is a hash table that stores CtClass information. The key is the class name and the value is the CtClass object for that name. When we need to modify a class, we get its CtClass from the pool with pool.getCtClass(“className”).

  • CtMethod and CtField: these two are easier to understand. They correspond to the methods and fields of a class.

With these four classes in mind, we can write a small demo that shows how simple and quick Javassist is. We again enhance the process() method of Base so that “start” and “end” are printed before and after the method call. All we need to do is get the CtClass object and its method from the pool, then call the method’s insertBefore and insertAfter methods, passing the Java code to insert as a string.

import com.meituan.mtrace.agent.javassist.*;

import java.io.IOException;

public class JavassistTest {
    public static void main(String[] args) throws NotFoundException, CannotCompileException, IllegalAccessException, InstantiationException, IOException {
        ClassPool cp = ClassPool.getDefault();
        CtClass cc = cp.get("meituan.bytecode.javassist.Base");
        CtMethod m = cc.getDeclaredMethod("process");
        // Source-level snippets inserted before and after the method body
        m.insertBefore("{ System.out.println(\"start\"); }");
        m.insertAfter("{ System.out.println(\"end\"); }");
        // Load the enhanced class and also write it to disk
        Class c = cc.toClass();
        cc.writeFile("/Users/zen/projects");
        Base h = (Base) c.newInstance();
        h.process();
    }
}

3. Reloading classes at runtime

3.1 Posing the problem

The previous chapter introduced two different kinds of bytecode manipulation frameworks, and with each of them we built a fairly crude AOP implementation. To make the bytecode enhancement easier to follow, we split the ASM implementation across two main methods: the first uses MyClassVisitor to modify the compiled class file, and the second creates the object with new and calls it. No reloading of the class by the running JVM is involved: the first main method uses ASM to replace the bytecode of the already compiled class, and the second main method simply uses the replaced class. Likewise, the Javassist implementation loads the Base class only once, so no class is reloaded at run time there either.

What happens if a class has already been loaded in a JVM, and we then enhance its bytecode and load it again? To simulate this, we simply add Base b = new Base() as the first line of the main() method in the Javassist demo above, so that the JVM loads the Base class before the enhancement. An error is then thrown when cc.toClass() is executed, as shown in Figure 20 below. Following the call into cc.toClass(), we find that the error is raised at the end, when the ClassLoader’s native method defineClass() is called. In other words, the JVM does not allow an already loaded class to be loaded again dynamically at run time.

Obviously, if a class can only be enhanced before it is loaded, the uses of bytecode enhancement become very narrow. The effect we want is this: in a continuously running JVM that has already loaded all its classes, bytecode enhancement can be used to replace a class’s behavior and reload the class. To simulate this situation, we rewrite the Base class, giving it a main method that calls process() every five seconds, with process() printing the line “process”.

Our goal is to replace the process() method while the JVM is running, so that “start” and “end” are printed before and after it. That is, at run time, what is printed every five seconds changes from “process” to “start process end”. How can we work around the fact that the JVM does not allow class information to be reloaded at run time? To find out, we first go through the Java class libraries we need, one by one.
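The underlying restriction can be reproduced with the JDK alone. The sketch below is an illustration, not the article’s code: it hand-builds the bytes of an empty class (using the class file layout from Chapter 1), defines it through a small ClassLoader subclass, and then tries to define it a second time in the same loader. DuplicateDefinitionDemo, ByteLoader, and the class name GeneratedDemo are all hypothetical; the Javassist failure in Figure 20 is presumably a wrapped form of the same LinkageError.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo: one class loader may not define the same class twice.
class DuplicateDefinitionDemo {
    // Exposes the protected defineClass so the demo can call it directly.
    static final class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Builds the bytes of an empty public class (no fields, methods or attributes).
    static byte[] emptyClass(String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);                  // magic number
            out.writeShort(0);                         // minor version
            out.writeShort(52);                        // major version (Java 8)
            out.writeShort(5);                         // constant_pool_count: 4 entries + 1
            out.writeByte(1); out.writeUTF(name);                 // #1 Utf8: class name
            out.writeByte(7); out.writeShort(1);                  // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");   // #3 Utf8: parent name
            out.writeByte(7); out.writeShort(3);                  // #4 Class -> #3
            out.writeShort(0x0021);                    // access flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                         // this_class
            out.writeShort(4);                         // super_class
            out.writeShort(0);                         // interfaces_count
            out.writeShort(0);                         // fields_count
            out.writeShort(0);                         // methods_count
            out.writeShort(0);                         // attributes_count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns true if the second definition in the same loader is rejected.
    static boolean secondDefinitionRejected() {
        byte[] classBytes = emptyClass("GeneratedDemo");
        ByteLoader loader = new ByteLoader();
        loader.define("GeneratedDemo", classBytes);    // first definition succeeds
        try {
            loader.define("GeneratedDemo", classBytes);
            return false;
        } catch (LinkageError e) {                     // duplicate class definition
            return true;
        }
    }
}
```

The first define() call succeeds and the second is rejected with a LinkageError, which is why enhancing a class after it has been loaded needs a different mechanism from simply defining it again.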

import java.lang.management.ManagementFactory;

public class Base {
    public static void main(String[] args) {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        String s = name.split("@")[0];
        // Print the current PID
        System.out.println("pid:" + s);
        while (true) {
            try {
                Thread.sleep(5000L);
            } catch (Exception e) {
                break;
            }
            process();
        }
    }

    public static void process() {
        System.out.println("process");
    }
}

3.2 Instrument

Instrument is a library provided by the JVM that can modify classes that have already been loaded; it exists to support instrumentation services written in the Java language. It relies on JVMTI’s Attach API mechanism, which is introduced in the next section. Before JDK 1.6, Instrument could only take effect while the JVM was loading classes at startup; since JDK 1.6, it also supports changing class definitions at run time. To use Instrument’s class modification capability, we implement the ClassFileTransformer interface it provides and define a class file transformer. The interface’s transform() method is called when a class file is loaded; inside it, we can use ASM or Javassist, as described above, to rewrite or replace the bytecode passed in, then generate and return a new bytecode array.

We define a class TestTransformer that implements the ClassFileTransformer interface and again uses Javassist to enhance the process() method of the Base class, printing “start” before it and “end” after it, as follows:

import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

// Javassist classes, imported from the same package as in the earlier example
import com.meituan.mtrace.agent.javassist.*;

public class TestTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        System.out.println("Transforming " + className);
        // className uses the internal form (slashes); returning null leaves other classes unchanged
        if (!"meituan/bytecode/jvmti/Base".equals(className)) {
            return null;
        }
        try {
            ClassPool cp = ClassPool.getDefault();
            CtClass cc = cp.get("meituan.bytecode.jvmti.Base");
            CtMethod m = cc.getDeclaredMethod("process");
            m.insertBefore("{ System.out.println(\"start\"); }");
            m.insertAfter("{ System.out.println(\"end\"); }");
            return cc.toBytecode();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}

Now that we have a Transformer, how is it injected into a running JVM? We also need to define an Agent and use the Agent’s capabilities to inject Instrument into the JVM. Agents are introduced in the next section; for now we introduce another class that the Agent uses, Instrumentation. Since JDK 1.6, Instrumentation supports instrumenting after startup, instrumenting native code, dynamically changing the classpath, and more. We can add the Transformer defined above to the Instrumentation and specify the classes to be reloaded, as shown below. In this way, when the Agent is attached to a JVM, the bytecode is replaced and the JVM reloads the class.

import java.lang.instrument.Instrumentation;

public class TestAgent {
    public static void agentmain(String args, Instrumentation inst) {
        // Register our Transformer, which uses Javassist to replace the bytecode
        inst.addTransformer(new TestTransformer(), true);
        try {
            // Retransform the class so that the new bytecode is loaded
            inst.retransformClasses(Base.class);
            System.out.println("Agent Load Done.");
        } catch (Exception e) {
            System.out.println("agent load failed!");
        }
    }
}

3.3 JVMTI & Agent & Attach API

The previous section gave the code of the Agent class. To trace where it comes from, we need to introduce JPDA (Java Platform Debugger Architecture). If JPDA is enabled when the JVM starts, classes are allowed to be reloaded: the old version of an already loaded class can be unloaded and the new version loaded in its place. As the Debugger in its name suggests, JPDA is a set of standards for debugging Java programs, and any JDK must implement it.

JPDA defines a complete system that divides the debugging architecture into three parts and specifies the communication interfaces among them. From the lowest level to the highest, the three parts are the Java Virtual Machine Tool Interface (JVMTI), the Java Debug Wire Protocol (JDWP), and the Java Debug Interface (JDI), as shown in the following figure:

Back to the main topic: we can use some of JVMTI’s capabilities to help reload class information dynamically. JVM TI (JVM Tool Interface) is a set of tool interfaces that the JVM provides for operating on the JVM itself. Through JVMTI you can perform many kinds of operations on the JVM, and you can register hooks for various events. When a JVM event fires, the corresponding predefined hook fires with it, which is how a response to JVM events is implemented. The events include class file loading, exceptions being thrown and caught, thread start and end, entering and exiting critical sections, member variable modification, GC start and end, method entry and exit, contention and waiting on critical sections, VM start and exit, and so on.

The Agent is an implementation of JVMTI. There are two ways to start an Agent: the first is to start it together with the Java process; the second is to load it at run time, using the Attach API to dynamically attach the module (a JAR package) to the Java process with a given process ID.

The Attach API provides communication between JVM processes. For example, to make another JVM process dump the threads of an online service, we run the jstack or jmap process and pass it a pid argument that tells it which process to dump; this is what the Attach API does. Below, we use the Attach API’s loadAgent() method to dynamically attach the packaged Agent JAR to the target JVM. The specific steps are as follows:

  • Define the Agent and implement the agentmain method in it, as in the TestAgent class defined in the previous section;

  • Package the TestAgent class into a JAR that contains a MANIFEST.MF file, in which the Agent-Class attribute is set to the fully qualified name of TestAgent, as shown below.

  • Finally, use the Attach API to attach the packaged JAR to the JVM with the specified PID, as follows:

import com.sun.tools.attach.AgentInitializationException;
import com.sun.tools.attach.AgentLoadException;
import com.sun.tools.attach.AttachNotSupportedException;
import com.sun.tools.attach.VirtualMachine;

import java.io.IOException;

public class Attacher {
    public static void main(String[] args) throws AttachNotSupportedException, IOException, AgentLoadException, AgentInitializationException {
        // Pass in the PID of the target JVM
        VirtualMachine vm = VirtualMachine.attach("39333");
        vm.loadAgent("/Users/zen/operation_server_jar/operation-server.jar");
    }
}

  • Because Agent-Class is specified in MANIFEST.MF, after the attach the target JVM runs the agentmain() method defined in the TestAgent class. In that method we use Instrumentation to replace the bytecode of the specified class, Base, through the class transformer we defined, TestTransformer (which uses Javassist), and to reload the class. This achieves the goal of “changing a class’s bytecode and reloading the class information while the JVM is running”.
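Since the manifest from the packaging step is not reproduced here, the sketch below shows what it would plausibly contain, built with the JDK’s java.util.jar API. The class AgentManifest is hypothetical and the agent class name is a parameter, because the article does not state TestAgent’s package. Agent-Class is the attribute described in the steps above; Can-Retransform-Classes is an assumption on my part, added because the java.lang.instrument documentation lists it as the attribute that enables retransformation, which the agent’s retransformClasses call relies on.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: builds the manifest for a dynamically attached agent JAR.
class AgentManifest {
    static Manifest build(String agentClassName) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be present, otherwise the manifest is written out empty.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // The class whose agentmain method runs after the agent is attached.
        main.put(new Attributes.Name("Agent-Class"), agentClassName);
        // Assumption: needed so that the agent may retransform already loaded classes.
        main.put(new Attributes.Name("Can-Retransform-Classes"), "true");
        return manifest;
    }
}
```

The resulting Manifest can be passed to a java.util.jar.JarOutputStream when writing the agent JAR, or the same three attributes can be written by hand into META-INF/MANIFEST.MF.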

Here is what reloading the class at run time looks like: first run the main() method of Base, which starts a JVM; the console shows “process” printed every five seconds. Then run the main() method of Attacher, passing in the PID of the first JVM. Back in the first console, you can now see “start” printed before “process” and “end” printed after it every five seconds: the bytecode has been enhanced and the class reloaded at run time.

3.4 Application Scenarios

At this point, bytecode enhancement is no longer limited to the time before the JVM loads a class. With the class libraries above, we can modify and reload classes inside a running JVM. This makes many things possible:

  • Hot deployment: modify an online service without redeploying it, for example to add instrumentation points or extra logging.

  • Mock: Mock certain services during testing.

  • Performance diagnostic tools: BTrace, for example, uses Instrument to trace a running JVM non-intrusively, monitoring state information at the class and method level.

4. Summary

Bytecode enhancement is like a master key to the running JVM: with it, you can make dynamic changes to a running program and also trace the state of a program running inside the JVM. In addition, the dynamic proxies and AOP we use every day are closely related to bytecode enhancement; in essence they all use various means to generate compliant bytecode files. In short, mastering bytecode enhancement lets you locate and quickly fix certain tricky problems efficiently (such as online performance problems, or methods whose input and output parameters are unknown and need emergency logging), and it can also cut redundant code during development, greatly improving development efficiency.

5. References

  • “ASM4 – Guide”

  • Oracle: The class File Format

  • Oracle: The Java Virtual Machine Instruction Set

  • Javassist tutorial

  • JVM Tool Interface, Version 1.2

About the author

Zayn, engineer on Meituan’s in-store accommodation business development team.
