Curiosity about Debug

When I first learned Java, I was curious about the debugger in IDEA. Not only could I inspect the context at a breakpoint, I could also use its Evaluate function there to execute statements directly, perform calculations, or change the value of a variable.

At first I was not familiar with the syntax and often wrote incorrect code. Repackaging and redeploying took a long time, so I simply developed inside the debugger: set a breakpoint at the start of the method to be written, run the method body in the Evaluate box again and again while adjusting it, copy the code into IDEA once it worked, and then move on to the next method. It felt like writing an interpreted language such as PHP, where code runs immediately, which was very convenient.

But Java is a statically compiled language: code must be compiled before it can run. Is the code I type being compiled on the fly and “injected” into the service I’m debugging?

As I became more familiar with Java, I learned about reflection, bytecode, and related technologies. Then, a few days ago, a colleague gave a talk on the use and implementation of BTrace and mentioned the Java ASM framework and the JVM TI interface. The way BTrace modifies code has a lot in common with the debugger’s Evaluate, which caught my interest.

A talk is only an introduction: you learn the surface from it, and to really understand the subject you have to study it yourself. So I read up on the material and wrote some code to learn how it is actually implemented.

ASM

The first problem Evaluate has to solve is how to change the behavior of existing code. In Java this is done with what is known as dynamic bytecode technology.

Dynamic generation of bytecode

We know that Java code we write has to be compiled into bytecode to be executed in the JVM, and bytecode can be interpreted and executed once it is loaded into the virtual machine.

Bytecode files (.class) are ordinary binary files generated by the Java compiler. Like any file, they can be changed: if we parse the original bytecode according to the class file format and then modify or redefine it, we change the behavior of the code.
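To make the "ordinary binary file" point concrete, here is a minimal sketch that parses the first eight bytes of a class file: the magic number 0xCAFEBABE followed by the minor and major version. The class name ClassFileHeader and the method parseHeader are made up for this illustration and are not part of any library.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ClassFileHeader {

    /** Parses the first 8 bytes of a class file and returns {minor, major}. */
    public static int[] parseHeader(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        return new int[] { minor, major };
    }

    public static void main(String[] args) throws IOException {
        // Read the header of this class's own .class file from the class path
        byte[] header = new byte[8];
        try (InputStream in = ClassFileHeader.class.getResourceAsStream("ClassFileHeader.class")) {
            if (in == null) {
                throw new IOException("class file not found on the class path");
            }
            new DataInputStream(in).readFully(header);
        }
        int[] version = parseHeader(header);
        System.out.println("minor=" + version[0] + " major=" + version[1]);
    }
}
```

The major version printed depends on the compiler that produced the file. Everything after these eight bytes (constant pool, fields, methods) follows the same kind of fixed layout, which is what libraries such as ASM parse for us.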

There are many technologies in the Java ecosystem that can generate bytecode dynamically, such as BCEL, Javassist, ASM, and CGLib, each with its own strengths. Some are complex to use but powerful; others are simple to use but perform worse.

The ASM framework

ASM is the most powerful of them all. It allows you to dynamically modify classes, methods, and even redefine classes.

Of course, it has a high barrier to entry: it requires some knowledge of the class file format and familiarity with the JVM’s bytecode instructions. Although I am not familiar with JVM bytecode, there is an IDEA plug-in that can display it: ASM Bytecode Outline. Right-click in a class file and select Show Bytecode Outline, and the bytecode we want to generate appears in the tool window on the right. Working from that output, we can easily write Java code that manipulates bytecode.

Switch to the ASMified tab and you can even get the corresponding ASM code directly.

Commonly used method

In ASM’s design, the most visible feature is the visitor pattern: ASM wraps reading and manipulating code in visitors, which are invoked as the bytecode handed over by the JVM is parsed.

ClassReader is ASM’s entry point for parsing binary bytecode. We construct it from the class bytes and pass a ClassVisitor to its accept() method. In that visitor, we can override methods such as visitMethod() and visitAnnotation() to define how class structures such as methods, fields, and annotations are handled.

ClassWriter is itself a subclass of ClassVisitor. When we instantiate our own ClassVisitor, we pass the ClassWriter into it, so that everything the visitor forwards is written out as the new class.
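The reader, visitor, and writer chain can be hard to picture, so here is a toy model of it in plain Java. It does not use ASM at all: MemberVisitor, CollectingWriter, and RenamingVisitor are hypothetical stand-ins for ClassVisitor, ClassWriter, and a custom visitor, and the "class" is reduced to a list of method names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VisitorChainSketch {

    /** Toy counterpart of ClassVisitor: by default, forwards each event to the next visitor. */
    static class MemberVisitor {
        private final MemberVisitor next;

        MemberVisitor(MemberVisitor next) {
            this.next = next;
        }

        void visitMethod(String name) {
            if (next != null) {
                next.visitMethod(name);
            }
        }
    }

    /** Toy counterpart of ClassWriter: the last visitor in the chain, collecting the output. */
    static class CollectingWriter extends MemberVisitor {
        final List<String> output = new ArrayList<>();

        CollectingWriter() {
            super(null);
        }

        @Override
        void visitMethod(String name) {
            output.add(name);
        }
    }

    /** A custom visitor that changes one event and passes everything else through. */
    static class RenamingVisitor extends MemberVisitor {
        RenamingVisitor(MemberVisitor next) {
            super(next);
        }

        @Override
        void visitMethod(String name) {
            super.visitMethod(name.equals("printSomething") ? "printSomethingPatched" : name);
        }
    }

    /** Toy counterpart of ClassReader.accept(): walks the input and fires one event per method. */
    public static List<String> transform(List<String> methodNames) {
        CollectingWriter writer = new CollectingWriter();
        MemberVisitor visitor = new RenamingVisitor(writer);
        for (String name : methodNames) {
            visitor.visitMethod(name);
        }
        return writer.output;
    }

    public static void main(String[] args) {
        // prints [main, printSomethingPatched]
        System.out.println(transform(Arrays.asList("main", "printSomething")));
    }
}
```

The real ASM API is richer (visitMethod() returns a MethodVisitor that receives the instructions, for example), but the data flow is the same: the reader fires events, our visitor changes the ones it cares about, and the writer at the end of the chain records the result.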

Instrument

Introduction

The JVM loads bytecode files through its class loaders at run time, and once a class is loaded, later changes to the class file are ignored. To change a class that is already loaded, we need another part of the Java platform: the java.lang.instrument package.

Instrument is an API provided by the JDK that can modify the bytecode of classes. Before Java 1.6, it only took effect when classes were loaded after JVM startup; since 1.6, it also supports changing class definitions at run time.

Usage

To use Instrument’s class modification capability, we implement its ClassFileTransformer interface. Its transform() method is called when a class is loaded, and again when a loaded class is redefined or retransformed. In transform(), we can rewrite or replace the bytecode passed in and return a new byte array, which the JVM then uses to define the class. Returning null leaves the class unchanged.
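As a minimal sketch of the transform() contract, the transformer below only reacts to one class and ignores all others. SelectiveTransformer is a made-up name, and the "rewrite" step is a placeholder that returns an unmodified copy of the bytes; a real transformer would hand them to a bytecode library at that point.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

public class SelectiveTransformer implements ClassFileTransformer {

    // Internal class name, with slashes instead of dots (for example "asm/TransformTarget")
    private final String targetInternalName;

    public SelectiveTransformer(String targetInternalName) {
        this.targetInternalName = targetInternalName;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // className may be null for some classes; equals() handles that safely
        if (!targetInternalName.equals(className)) {
            return null; // null tells the JVM that no transformation was performed
        }
        // Placeholder for the real rewrite: return a copy of the original bytes
        return classfileBuffer.clone();
    }

    public static void main(String[] args) {
        SelectiveTransformer transformer = new SelectiveTransformer("TransformTarget");
        byte[] bytes = {1, 2, 3};
        System.out.println(transformer.transform(null, "java/lang/String", null, null, bytes) == null);
        System.out.println(transformer.transform(null, "TransformTarget", null, null, bytes).length);
    }
}
```

Filtering by name matters because a registered transformer is consulted for every class the JVM loads afterwards, not only for the one we intend to change.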

JVM TI

After defining how the bytecode is modified and redefined, how do we get the JVM to call the class transformer we provide? This brings us to JVM TI.

Introduction

JVM TI (JVM Tool Interface) is a powerful native interface that the JVM provides for tools. Through it we can inspect and control many parts of the JVM, such as heap memory, classes, and threads.

JVM TI is built around an event mechanism: a tool registers hooks for the events it is interested in, and when one of those events occurs in the JVM, the corresponding hook is called, so the tool can observe and respond to individual JVM events.

Agent

An agent is the way JVM TI is put to use. In a C project, we link a library into the program so that the program can call the library’s functions; an agent plays a similar role for the JVM. A native agent can be written in C or C++, compiled into a .dll or .so file, and loaded when the JVM starts.

For example, in the JVM option -agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:3333, the -agentlib option specifies the agent to load, and jdwp is the name of the agent. On Linux, you can find the libjdwp.so library in the JRE directory.

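Java’s debugging architecture (JPDA) is layered, from top to bottom, as JDI -> JDWP -> JVM TI. The debugger issues commands through JDI, JDWP is the protocol that carries them to the target JVM, and the lowest layer, JVM TI, ultimately implements the operations on the JVM.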

Usage

A Java agent is packaged as a jar file. To load one at startup, pass the agent jar to the JVM with the -javaagent option.

To implement code changes, we need to implement an Instrument agent, which is done by adding a premain() or agentmain() method to a class. To use the dynamic Instrument features available since 1.6, implement the agentmain() method.
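The two entry points have fixed signatures, sketched below. EntryPointsSketch and its loadedVia field are made up for this illustration; the field only records which entry point was called.

```java
import java.lang.instrument.Instrumentation;

public class EntryPointsSketch {

    /** Records which entry point was used; purely illustrative. */
    public static volatile String loadedVia = "none";

    /**
     * Called before the application's main() when the jar is passed with -javaagent.
     * The jar's manifest must name this class in its Premain-Class attribute.
     */
    public static void premain(String agentArgs, Instrumentation inst) {
        loadedVia = "premain";
    }

    /**
     * Called when the jar is loaded into a JVM that is already running.
     * The jar's manifest must name this class in its Agent-Class attribute.
     */
    public static void agentmain(String agentArgs, Instrumentation inst) {
        loadedVia = "agentmain";
    }
}
```

In both cases the JVM supplies the Instrumentation instance, which is the handle used to register transformers and to request retransformation.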

In the agentmain() method, we call Instrumentation.retransformClasses() to have the target class transformed again.

The VirtualMachine class in tools.jar provides the ability to attach to a local JVM; it requires the PID of the target JVM. On JDK 8 and earlier, tools.jar can be found in the JDK’s lib directory; since JDK 9, the same API lives in the jdk.attach module.

Generating the agent

We also need to pay attention to how the agent is packaged. The jar’s manifest must contain an Agent-Class attribute naming the class that contains the agentmain() method.

Some other attributes in the MANIFEST.MF file must also be set to allow us to redefine and retransform classes. If your agent references other class libraries, they need to be packaged into the same jar. Here is my POM file configuration.

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifestEntries>
            <Agent-Class>asm.TestAgent</Agent-Class>
            <Can-Redefine-Classes>true</Can-Redefine-Classes>
            <Can-Retransform-Classes>true</Can-Retransform-Classes>
            <Manifest-Version>1.0</Manifest-Version>
            <Permissions>all-permissions</Permissions>
          </manifestEntries>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
    </plugin>
  </plugins>
</build>

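In addition, the agent has to be built as a jar-with-dependencies by running the mvn assembly:assembly command during packaging. (In 3.x versions of the plugin, where that goal is no longer available, mvn package assembly:single serves the same purpose.)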
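To check what the packaged jar's manifest should contain, the agent-related attributes can be reproduced with the JDK's java.util.jar.Manifest class. This is only a sketch for inspection, not part of the build; AgentManifestSketch is a made-up name, and asm.TestAgent is the agent class from the POM configuration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class AgentManifestSketch {

    /** Builds a manifest holding the attributes an attachable agent jar needs. */
    public static Manifest build(String agentClass) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Set Manifest-Version first; without it the other attributes may not be written out
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Agent-Class", agentClass);
        main.putValue("Can-Redefine-Classes", "true");
        main.putValue("Can-Retransform-Classes", "true");
        return manifest;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        build("asm.TestAgent").write(out);
        System.out.print(out.toString("UTF-8"));
    }
}
```

Comparing this output with META-INF/MANIFEST.MF inside the built jar is a quick way to confirm that the assembly step picked up the manifestEntries.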

Code implementation

Using the techniques above, I wrote a demo that performs a simple dynamic bytecode modification.

The modified class

TransformTarget is the class to be modified. When run normally, it prints “hello” every three seconds.

public class TransformTarget {
    public static void main(String[] args) {
        while (true) {
            try {
                Thread.sleep(3000L);
            } catch (Exception e) {
                break;
            }
            printSomething();
        }
    }

    public static void printSomething() {
        System.out.println("hello");
    }
}

Agent

The agent does the actual modification: it rewrites a method of the TransformTarget class using ASM and submits the change to the JVM through the Instrument API.

The entry class, which is also the Agent-Class of the agent:

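import java.lang.instrument.Instrumentation;

public class TestAgent {
    // Entry point the JVM calls when the agent is attached to a running process
    public static void agentmain(String args, Instrumentation inst) {
        // true: this transformer may be used for retransformation
        inst.addTransformer(new TestTransformer(), true);
        try {
            // Run the registered transformers again on the already loaded class
            inst.retransformClasses(TransformTarget.class);
            System.out.println("Agent Load Done.");
        } catch (Exception e) {
            System.out.println("agent load failed!");
        }
    }
}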

The classes that perform the bytecode modification and transformation:

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.Label;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class TestTransformer implements ClassFileTransformer {

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer)
            throws IllegalClassFormatException {
        // Only rewrite the target class; returning null leaves every other class unchanged.
        // endsWith() is used because the target's package is not fixed in this demo.
        if (className == null || !className.endsWith("TransformTarget")) {
            return null;
        }
        System.out.println("Transforming " + className);
        ClassReader reader = new ClassReader(classfileBuffer);
        ClassWriter classWriter = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
        ClassVisitor classVisitor = new TestClassVisitor(Opcodes.ASM5, classWriter);
        reader.accept(classVisitor, ClassReader.SKIP_DEBUG);
        return classWriter.toByteArray();
    }

    class TestClassVisitor extends ClassVisitor implements Opcodes {

        TestClassVisitor(int api, ClassVisitor classVisitor) {
            super(api, classVisitor);
        }

        @Override
        public MethodVisitor visitMethod(int access, String name, String desc,
                                         String signature, String[] exceptions) {
            MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);
            if (name.equals("printSomething")) {
                // Emit a new body: System.out.println("bytecode replaced!");
                mv.visitCode();
                Label l0 = new Label();
                mv.visitLabel(l0);
                mv.visitLineNumber(19, l0);
                mv.visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
                mv.visitLdcInsn("bytecode replaced!");
                mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println",
                        "(Ljava/lang/String;)V", false);
                Label l1 = new Label();
                mv.visitLabel(l1);
                mv.visitLineNumber(20, l1);
                mv.visitInsn(Opcodes.RETURN);
                mv.visitMaxs(2, 0);
                mv.visitEnd();
                // Return null so the reader does not append the original body after ours
                return null;
            }
            return mv;
        }
    }
}

Attacher

Use the attach API in tools.jar to load the agent into the target JVM dynamically.

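import java.io.IOException;

import com.sun.tools.attach.AgentInitializationException;
import com.sun.tools.attach.AgentLoadException;
import com.sun.tools.attach.AttachNotSupportedException;
import com.sun.tools.attach.VirtualMachine;

public class Attacher {
    public static void main(String[] args) throws AttachNotSupportedException, IOException,
            AgentLoadException, AgentInitializationException {
        VirtualMachine vm = VirtualMachine.attach("34242"); // Target JVM PID
        vm.loadAgent("/path/to/agent.jar");
        vm.detach(); // Release the connection once the agent is loaded
    }
}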

To run the demo, start the TransformTarget class, obtain its PID, put the PID and the path of the agent jar into Attacher, and run Attacher to attach the agent to TransformTarget. The output then changes from “hello” to “bytecode replaced!”, the string we wanted.

Summary

Once you’ve mastered dynamic bytecode modification, it becomes clearer how BTrace works when you look back at it. With a little experimentation, you can implement a simplified version of BTrace yourself. The same goes for the many Java performance analysis tools written by well-known developers: they are built on the same techniques. With this knowledge, we can write our own tools, or at least modify other people’s.

I have to say that the Java ecosystem is really flourishing. It is broad and deep: looking up material on one module always leads to a lot of new concepts, and there are always new things to learn.