ASM introduction

ASM is a general-purpose Java bytecode manipulation and analysis framework. It can be used to modify existing classes or to generate classes dynamically, directly in binary form. ASM provides common bytecode transformation and analysis algorithms from which you can build custom transformations and code analysis tools. It offers functionality similar to other Java bytecode frameworks, but with a focus on performance. Because it is designed and implemented to be as small and fast as possible, it is well suited to dynamic systems (and can of course also be used statically, for example in compilers).

ASM is used in many projects, including the following:

  • OpenJDK, to generate lambda call sites, and in the Nashorn compiler;
  • The Groovy and Kotlin compilers;
  • Cobertura and JaCoCo, to instrument classes in order to measure code coverage;
  • CGLIB, to dynamically generate proxy classes;
  • Gradle, to generate some classes at run time.

For more information, see asm.ow2.io.

IDE plug-ins

ASM works directly with bytecode. If you are not familiar with the bytecode instruction set, ASM code can be difficult to write. The following IDE plug-ins help:

  • IDEA: ASM Bytecode Outline
  • Eclipse: BytecodeOutline

Taking IDEA as an example, right-click in the corresponding class and choose Show Bytecode Outline.

The panel contains three tabs:

  • Bytecode: the bytecode corresponding to the class;
  • ASMified: the ASM code that generates that bytecode;
  • Groovified: the bytecode instructions corresponding to the class.

ASM API

The ASM library provides two APIs for generating and transforming compiled classes. The core API represents classes in an event-based form; the tree API represents them in an object-based form. The two can be compared to the two ways of parsing XML, SAX and DOM: the core API corresponds to SAX and the tree API to DOM. Each has its own pros and cons:

  • The event-based API is faster and requires less memory than the object-based API, but class transformations can be more difficult to implement with it;
  • The object-based API loads the entire class into memory.
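The SAX/DOM analogy can be made concrete with the XML parsers that ship with the JDK. The sketch below does not use ASM at all; the class name SaxVsDomSketch and the sample XML are invented for illustration. It collects element names once through SAX callbacks (events arrive one at a time, nothing is retained) and once through a DOM tree (the whole document is loaded first, then navigated).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: contrasts event-based and object-based parsing.
final class SaxVsDomSketch {
    // Invented sample document standing in for "a class with a field and a method".
    static final String XML =
        "<class name=\"Demo\"><field name=\"id\"/><method name=\"query\"/></class>";

    private static InputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    // Event-based (SAX): the parser calls us back for each element as it is read.
    static List<String> elementNamesWithSax() throws Exception {
        List<String> names = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(input(), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                names.add(qName);
            }
        });
        return names;
    }

    // Object-based (DOM): the whole document is built in memory, then queried.
    static List<String> elementNamesWithDom() throws Exception {
        NodeList nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(input()).getElementsByTagName("*");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNamesWithSax());
        System.out.println(elementNamesWithDom());
    }
}
```

Both methods report the same element names; the difference is that the SAX version never holds the document, which mirrors the memory trade-off described for the two ASM APIs.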

The ASM library is organized in packages that are distributed in separate JAR files:

  • org.objectweb.asm and org.objectweb.asm.signature packages: define the event-based API and provide the class parser and writer components; contained in asm.jar;
  • org.objectweb.asm.util package: provides various tools based on the core API that can be used while developing and debugging ASM applications; contained in asm-util.jar;
  • org.objectweb.asm.commons package: provides several useful predefined class transformers, mostly based on the core API; contained in asm-commons.jar;
  • org.objectweb.asm.tree package: defines the object-based API and provides tools for converting between the event-based and object-based representations; contained in asm-tree.jar;
  • org.objectweb.asm.tree.analysis package: provides a class analysis framework and several predefined class analyzers, based on the tree API; contained in asm-analysis.jar.

Core API

Before studying the core API, it helps to review the visitor pattern, which is how ASM traverses and analyzes bytecode.

Visitor pattern

The visitor pattern suggests putting new behavior into a separate class, called a visitor, rather than trying to integrate it into existing classes. The object on which the operation is to be performed is passed as a parameter to the visitor's methods, which gives those methods access to whatever data of the object they need. Common application scenarios:

  • If you need to perform an operation on all elements of a complex object structure, such as a tree of objects, use the visitor pattern;
  • The visitor pattern can be used to separate auxiliary behavior from business logic;
  • The pattern can be used when a behavior makes sense only for some classes in a class hierarchy and not for others.

Bytecode is one such complex object structure. SQL parsing in Sharding-JDBC also uses the visitor pattern. In both cases the data has a stable structure and a fixed syntax.
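A minimal sketch of the pattern in plain Java, with no ASM dependency, may help. All names here (VisitorSketch, FieldElement, MethodElement, Describer) are invented for illustration; the shape loosely mimics a class made of fields and methods, where the describing behavior lives in the visitor and not in the elements.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of the visitor pattern; not part of ASM.
final class VisitorSketch {
    /** An element of the object structure; it only knows how to accept a visitor. */
    interface Element {
        void accept(Visitor visitor);
    }

    /** One callback per element kind. */
    interface Visitor {
        void visitField(FieldElement field);
        void visitMethod(MethodElement method);
    }

    static final class FieldElement implements Element {
        final String name;
        FieldElement(String name) { this.name = name; }
        @Override public void accept(Visitor visitor) { visitor.visitField(this); }
    }

    static final class MethodElement implements Element {
        final String name;
        MethodElement(String name) { this.name = name; }
        @Override public void accept(Visitor visitor) { visitor.visitMethod(this); }
    }

    /** New behavior added without modifying FieldElement or MethodElement. */
    static final class Describer implements Visitor {
        final StringBuilder out = new StringBuilder();
        @Override public void visitField(FieldElement field) {
            out.append("field ").append(field.name).append(';');
        }
        @Override public void visitMethod(MethodElement method) {
            out.append("method ").append(method.name).append(';');
        }
    }

    /** Walks the structure and lets the visitor handle each element. */
    static String describe(List<Element> elements) {
        Describer describer = new Describer();
        for (Element element : elements) {
            element.accept(describer);
        }
        return describer.out.toString();
    }

    static String demo() {
        List<Element> elements = Arrays.<Element>asList(
            new FieldElement("id"), new MethodElement("query"));
        return describe(elements);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Adding another operation, say counting methods, would mean writing another Visitor implementation and leaving the element classes untouched.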

More references: Visitor pattern

Classes

The visitor pattern has two core roles: the standalone visitor, and the event producer that accepts the visitor. In ASM these are ClassVisitor and ClassReader respectively.

ClassVisitor

The ASM API for generating and transforming compiled classes is based on the ClassVisitor abstract class, each method in which corresponds to a class file structure of the same name:

public abstract class ClassVisitor {
    public ClassVisitor(int api);
    public ClassVisitor(int api, ClassVisitor cv);
    public void visit(int version, int access, String name, String signature, String superName, String[] interfaces);
    public void visitSource(String source, String debug);
    public void visitOuterClass(String owner, String name, String desc);
    public AnnotationVisitor visitAnnotation(String desc, boolean visible);
    public void visitAttribute(Attribute attr);
    public void visitInnerClass(String name, String outerName, String innerName, int access);
    public FieldVisitor visitField(int access, String name, String desc, String signature, Object value);
    public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions);
    public void visitEnd();
}

Sections whose content can be of arbitrary length and complexity are visited through auxiliary visitor classes returned by these methods: AnnotationVisitor, FieldVisitor, and MethodVisitor. See the Java Virtual Machine Specification for details of the class file structure.

All of the above methods are called by the event producer, ClassReader, which also supplies all of their arguments. The methods must be called in the following order:

visit visitSource? visitOuterClass? ( visitAnnotation | visitAttribute )* ( visitInnerClass | visitField | visitMethod )* visitEnd

That is, visit is called first, followed by at most one call to visitSource, followed by at most one call to visitOuterClass, followed by any number of calls to visitAnnotation and visitAttribute in any order, followed by any number of calls to visitInnerClass, visitField, and visitMethod in any order, and finally a single call to visitEnd.
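The ordering rule above is a regular grammar, so it can be checked with a regular expression. The following JDK-only sketch is an illustration, not something ASM provides: the class VisitOrderSketch and its method are invented, and it encodes only the grammar quoted in this article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: checks a recorded sequence of ClassVisitor method names
// against the call-order grammar quoted in the text.
final class VisitOrderSketch {
    // Every event name is followed by one space, so "visit" cannot
    // accidentally match the prefix of "visitSource".
    private static final Pattern ORDER = Pattern.compile(
        "visit (visitSource )?(visitOuterClass )?"
            + "((visitAnnotation|visitAttribute) )*"
            + "((visitInnerClass|visitField|visitMethod) )*"
            + "visitEnd ");

    static boolean isValidOrder(List<String> events) {
        return ORDER.matcher(String.join(" ", events) + " ").matches();
    }

    public static void main(String[] args) {
        // Valid: fields and methods come after the optional header events.
        System.out.println(isValidOrder(
            Arrays.asList("visit", "visitSource", "visitField", "visitMethod", "visitEnd")));
        // Invalid: visitSource may not follow visitField.
        System.out.println(isValidOrder(
            Arrays.asList("visit", "visitField", "visitSource", "visitEnd")));
    }
}
```

A recording visitor that appends each method name to a list could feed this check while debugging a custom event producer.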

ClassReader

The main job of this class is to read a class file and report the data it reads to a ClassVisitor. The class file can be supplied in several ways:

  • public ClassReader(final InputStream inputStream): as an input stream;
  • public ClassReader(final String className): by fully qualified class name; the class file is then looked up as a class loader resource;
  • public ClassReader(final byte[] classFile): as a byte array holding the class file contents.
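To make "the class file" less abstract, here is a JDK-only sketch that locates a class file by name as a class loader resource and reads the first fields of its header. It does not use ClassReader; the class ClassFileHeaderSketch and its method are invented for illustration, and the header layout (a 4-byte magic number, then the minor and major versions) comes from the class file format in the Java Virtual Machine Specification.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads the header of a class file found on the class path.
final class ClassFileHeaderSketch {
    /** Returns {magic, minorVersion, majorVersion} for the named class. */
    static int[] readHeader(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();            // always 0xCAFEBABE
            int minor = data.readUnsignedShort();  // minor_version
            int major = data.readUnsignedShort();  // major_version
            return new int[] {magic, minor, major};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] header = readHeader("java.lang.String");
        System.out.printf("magic=%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

The same bytes, read fully into a byte[] or left as an InputStream, are what the ClassReader constructors listed above consume.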

The common usage modes are as follows:

ClassReader classReader = new ClassReader("com/zh/asm/TestService");
ClassWriter classVisitor = new ClassWriter(ClassWriter.COMPUTE_MAXS);
classReader.accept(classVisitor, 0);

The accept method of ClassReader takes the visitor together with a parsingOptions argument, which can combine the following flags:

  • SKIP_CODE: skips the visit of compiled code (useful if you only need the class structure);
  • SKIP_DEBUG: does not visit debug information, nor create artificial labels for it;
  • SKIP_FRAMES: skips the stack map frames;
  • EXPAND_FRAMES: expands the compressed stack map frames.

ClassWriter

The example above uses ClassWriter, which extends ClassVisitor and produces the bytes of a class. It can also be used on its own, as follows:

// Assumes: import org.objectweb.asm.ClassWriter;
//          import static org.objectweb.asm.Opcodes.*;
ClassWriter cw = new ClassWriter(0);
// Class header: public abstract interface pkg.Comparable extends pkg.Mesurable
cw.visit(V1_5, ACC_PUBLIC + ACC_ABSTRACT + ACC_INTERFACE, "pkg/Comparable", null, "java/lang/Object", new String[]{"pkg/Mesurable"});
// Three int constants; "I" is the type descriptor of int
cw.visitField(ACC_PUBLIC + ACC_FINAL + ACC_STATIC, "LESS", "I", null, Integer.valueOf(-1)).visitEnd();
cw.visitField(ACC_PUBLIC + ACC_FINAL + ACC_STATIC, "EQUAL", "I", null, Integer.valueOf(0)).visitEnd();
cw.visitField(ACC_PUBLIC + ACC_FINAL + ACC_STATIC, "GREATER", "I", null, Integer.valueOf(1)).visitEnd();
// Abstract method: int compareTo(Object)
cw.visitMethod(ACC_PUBLIC + ACC_ABSTRACT, "compareTo", "(Ljava/lang/Object;)I", null, null).visitEnd();
cw.visitEnd();
byte[] b = cw.toByteArray();

// Write the generated bytes to a class file
FileOutputStream fileOutputStream = new FileOutputStream(new File("F:/asm/Comparable.class"));
fileOutputStream.write(b);
fileOutputStream.close();

ClassWriter builds the class, toByteArray returns it as a byte array, and FileOutputStream writes it to a file. Decompiling that file gives:

package pkg;

public interface Comparable extends Mesurable {
    int LESS = -1;
    int EQUAL = 0;
    int GREATER = 1;

    int compareTo(Object var1);
}
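The access arguments in the ClassWriter example are bit masks, which is why they can be combined with +. As a JDK-only illustration (the class AccessFlagsSketch is invented here), java.lang.reflect.Modifier defines constants with the same numeric values as the class file flags for public, static, final, abstract, and interface, so Modifier.toString can render such a combination as source-level keywords.

```java
import java.lang.reflect.Modifier;

// Hypothetical helper: renders access-flag bit masks as Java keywords.
final class AccessFlagsSketch {
    // Same bits as ACC_PUBLIC + ACC_ABSTRACT + ACC_INTERFACE in the example above.
    static final int INTERFACE_FLAGS = Modifier.PUBLIC | Modifier.ABSTRACT | Modifier.INTERFACE;
    // Same bits as ACC_PUBLIC + ACC_FINAL + ACC_STATIC in the example above.
    static final int CONSTANT_FLAGS = Modifier.PUBLIC | Modifier.FINAL | Modifier.STATIC;

    static String describe(int accessFlags) {
        return Modifier.toString(accessFlags);
    }

    public static void main(String[] args) {
        System.out.println(describe(INTERFACE_FLAGS)); // public abstract interface
        System.out.println(describe(CONSTANT_FLAGS));  // public static final
    }
}
```

This matches the decompiled result: the generated type is an interface, and its three fields are public static final constants.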

A flags argument is required to instantiate ClassWriter. The options are:

  • COMPUTE_MAXS: the sizes of the local variables and of the operand stack are computed for you. You must still call visitMaxs, but its arguments can be anything: they are ignored and recomputed. With this option you must still compute the frames yourself;
  • COMPUTE_FRAMES: everything is computed automatically. You no longer need to call visitFrame, but you must still call visitMaxs (its arguments are ignored and recomputed);
  • 0: nothing is computed automatically; you must compute the frames and the local variable and operand stack sizes yourself.

The example above uses ClassWriter on its own, but it is more useful to combine the three core classes.

Conversion operations

A ClassVisitor is introduced between the class reader and the class writer, and the code structure is roughly as follows:

ClassReader classReader = new ClassReader("com/zh/asm/TestService");
ClassWriter classWriter = new ClassWriter(ClassWriter.COMPUTE_MAXS);
// The adapter sits between the reader and the writer
ClassVisitor classVisitor = new AddFieldAdapter(classWriter, ...);
classReader.accept(classVisitor, 0);

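The architecture corresponding to the above code is a chain: the ClassReader produces events, the adapter (a ClassVisitor) filters or extends them, and the ClassWriter consumes the result.

The following adapter adds a field to a class. It overrides visitEnd and emits the new field there: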

public class AddFieldAdapter extends ClassVisitor {
    private int fAcc;
    private String fName;
    private String fDesc;
    // Whether a field with the same name already exists
    private boolean isFieldPresent;

    public AddFieldAdapter(ClassVisitor cv, int fAcc, String fName,
                           String fDesc) {
        super(ASM4, cv);
        this.fAcc = fAcc;
        this.fName = fName;
        this.fDesc = fDesc;
    }

    @Override
    public FieldVisitor visitField(int access, String name, String desc,
                                   String signature, Object value) {
        // Check if there is a field with the same name. If not, add it to visitEnd
        if (name.equals(fName)) {
            isFieldPresent = true;
        }
        return cv.visitField(access, name, desc, signature, value);
    }

    @Override
    public void visitEnd() {
        // Add the field only if no field with the same name was visited
        if (!isFieldPresent) {
            FieldVisitor fv = cv.visitField(fAcc, fName, fDesc, null, null);
            if (fv != null) {
                fv.visitEnd();
            }
        }
        cv.visitEnd();
    }
}

Given the order in which the ClassVisitor methods are called, visitField is called once per field in the class. Each call checks whether the field to be added already exists and records the result in the isFieldPresent flag; visitEnd then uses the flag to decide whether the new field must be added. The adapter is used as follows:

ClassVisitor classVisitor = new AddFieldAdapter(classWriter, ACC_PUBLIC + ACC_FINAL + ACC_STATIC, "id", "I");

This adds a public static final int id field. Writing the resulting byte array to a class file and decompiling it gives:

public class TestService {
    public static final int id;
    ...
}

Utility classes

In addition to the above core classes, ASM also provides some utility classes for users to use:

  • Type: a Type object represents a Java type. It can be constructed either from a type descriptor or from a Class object. The Type class also contains static variables that represent the primitive types;
  • TraceClassVisitor: extends ClassVisitor and builds a textual representation of the visited class. Use TraceClassVisitor to get a readable trace of what is actually generated;
  • CheckClassAdapter: ClassWriter does not check that its methods are called in the right order and with valid arguments, so it is possible to generate invalid classes that the Java virtual machine verifier rejects. To detect some of these errors as early as possible, use CheckClassAdapter;
  • ASMifier: an alternative backend for TraceClassVisitor (which by default uses a Textifier backend that produces a textual listing). With this backend, each method of TraceClassVisitor prints the Java code that was used to call it.
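The type and method descriptors that appear throughout the examples, such as "I" and "(Ljava/lang/Object;)I", can also be produced with the JDK alone. The sketch below uses java.lang.invoke.MethodType and not ASM's Type class; the wrapper class DescriptorSketch is invented for illustration.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper: builds JVM method descriptors from Class objects.
final class DescriptorSketch {
    static String methodDescriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // int compareTo(Object)
        System.out.println(methodDescriptor(int.class, Object.class));
        // void println(String)
        System.out.println(methodDescriptor(void.class, String.class));
        // String[] f(int, long)
        System.out.println(methodDescriptor(String[].class, int.class, long.class));
    }
}
```

Generating descriptors this way avoids typos such as a stray space inside the string, which would make the descriptor invalid.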

Methods

As described above, ClassVisitor visits the more complex parts of a class through the auxiliary visitors it returns: AnnotationVisitor, FieldVisitor, and MethodVisitor. Before introducing MethodVisitor, take a look at the Java virtual machine's execution model.

Execution model

Each time a method is executed, the Java virtual machine creates a stack frame to store its local variables, operand stack, dynamic linking information, method return address, and so on. The span from a method's invocation to its completion corresponds to its stack frame being pushed onto and then popped off the virtual machine stack. Each frame contains two parts:

  • Local variables: contains variables that can be accessed by index, in any order;
  • Operand stack: a stack of values used as operands by bytecode instructions.

Consider an execution stack with 3 frames:

Frame 1: 3 local variables; operand stack of maximum size 4, currently holding 2 values;

Frame 2: 2 local variables; operand stack of maximum size 3, currently holding 2 values;

Frame 3: 4 local variables; operand stack of maximum size 2, currently holding 2 values.

Bytecode instruction

A bytecode instruction consists of an opcode that identifies the instruction and a fixed number of arguments:

  • Opcode: an unsigned byte value, identified by a mnemonic. For example, the opcode value 0 is designated by the mnemonic NOP and corresponds to the instruction that does nothing.
  • Arguments: static values that define the precise behavior of the instruction. They are given immediately after the opcode.

Bytecode instructions fall into two categories:

  • A small set of instructions transfers values from the local variables to the operand stack, and vice versa;
  • The other instructions work only on the operand stack: they pop some values from the stack, compute a result from those values, and push it back onto the stack.

Instructions that load a local variable onto the operand stack:

  • ILOAD: loads a boolean, byte, char, short, or int local variable;
  • LLOAD, FLOAD, DLOAD: load a long, float, or double value, respectively;
  • ALOAD: loads any non-primitive value, that is, object and array references.

Instructions that pop a value from the operand stack and store it in a local variable:

  • ISTORE: pops a boolean, byte, char, short, or int value from the operand stack and stores it in the local variable designated by its index i;
  • LSTORE, FSTORE, DSTORE: pop a long, float, or double value, respectively;
  • ASTORE: pops any non-primitive value.

Instructions that work only on the operand stack include field access and method invocation:

  • GETFIELD, PUTFIELD: GETFIELD owner name desc pops an object reference and pushes the value of its name field; PUTFIELD owner name desc pops a value and an object reference and stores the value in its name field. In both cases the object must be of type owner and its field must be of type desc. GETSTATIC and PUTSTATIC are similar instructions, but for static fields;
  • INVOKEVIRTUAL, INVOKESTATIC, INVOKESPECIAL, INVOKEINTERFACE, INVOKEDYNAMIC: INVOKEVIRTUAL owner name desc invokes the name method defined in class owner, whose method descriptor is desc. INVOKESTATIC is used for static methods, INVOKESPECIAL for private methods and constructors, and INVOKEINTERFACE for methods defined in interfaces. Finally, for Java 7 classes, INVOKEDYNAMIC is used for the new dynamic method invocation mechanism.
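The interplay between local variables and the operand stack can be simulated in a few lines of plain Java. This is a toy model for intuition only, not how the JVM or ASM is implemented; the class FrameSketch and its methods are invented, and only int values and three instructions (ILOAD, ISTORE, IADD) are modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of one stack frame: an indexed local variable table plus an operand stack.
final class FrameSketch {
    final int[] locals;
    final Deque<Integer> operandStack = new ArrayDeque<>();

    FrameSketch(int maxLocals) {
        this.locals = new int[maxLocals];
    }

    // ILOAD index: push the value of a local variable onto the operand stack.
    void iload(int index) {
        operandStack.push(locals[index]);
    }

    // ISTORE index: pop the top of the operand stack into a local variable.
    void istore(int index) {
        locals[index] = operandStack.pop();
    }

    // IADD: works only on the operand stack; pops two ints and pushes their sum.
    void iadd() {
        int right = operandStack.pop();
        int left = operandStack.pop();
        operandStack.push(left + right);
    }

    // Simulates "int c = a + b" as ILOAD 0, ILOAD 1, IADD, ISTORE 2.
    static int addLocals(int a, int b) {
        FrameSketch frame = new FrameSketch(3);
        frame.locals[0] = a;
        frame.locals[1] = b;
        frame.iload(0);
        frame.iload(1);
        frame.iadd();
        frame.istore(2);
        return frame.locals[2];
    }

    public static void main(String[] args) {
        System.out.println(addLocals(2, 3));
    }
}
```

In this model the operand stack never holds more than two values during addLocals, which is the kind of bound that visitMaxs (or COMPUTE_MAXS) records for a real method.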

MethodVisitor

The ASM API for generating and transforming compiled methods is based on the MethodVisitor abstract class, which is returned by the visitMethod method of ClassVisitor. This class also defines one method per bytecode instruction category, grouped by the number and type of the instructions' arguments. These methods must be called in the following order:

visitAnnotationDefault? ( visitAnnotation | visitParameterAnnotation | visitAttribute )* ( visitCode ( visitTryCatchBlock | visitLabel | visitFrame | visitXxxInsn | visitLocalVariable | visitLineNumber )* visitMaxs )? visitEnd

Let’s look at an example of transforming an existing method to add a start and end log to the method.

  1. Prepare the class to be transformed; the goal is to log before and after the query method runs:

    public class TestService {
        public void query(int param) {
            System.out.println("service handle...");
        }
    }
  2. Override the visitMethod in ClassVisitor

    public class MyClassVisitor extends ClassVisitor implements Opcodes {
        public MyClassVisitor(ClassVisitor cv) {
            super(ASM5, cv);
        }
    
        @Override
        public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
            MethodVisitor methodVisitor = cv.visitMethod(access, name, desc, signature,
                    exceptions);
            if (!name.equals("<init>") && methodVisitor != null) {
                methodVisitor = new MyMethodVisitor(methodVisitor);
            }
            return methodVisitor;
        }
    }

    The <init> method (the constructor) is filtered out; every other method is wrapped in a MyMethodVisitor, which overrides the relevant MethodVisitor methods.

  3. Extend MethodVisitor:

    public class MyMethodVisitor extends MethodVisitor implements Opcodes {
        public MyMethodVisitor(MethodVisitor mv) {
            super(Opcodes.ASM5, mv);
        }

        @Override
        public void visitCode() {
            super.visitCode();
            // Emit System.out.println("start") at the beginning of the method
            mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
            mv.visitLdcInsn("start");
            mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
        }

        @Override
        public void visitInsn(int opcode) {
            if ((opcode >= Opcodes.IRETURN && opcode <= Opcodes.RETURN)
                    || opcode == Opcodes.ATHROW) {
                // Emit System.out.println("end") just before the method returns or throws
                mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
                mv.visitLdcInsn("end");
                mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
            }
            mv.visitInsn(opcode);
        }
    }

    visitCode is called before the method's instructions are visited, so the start log is emitted there. visitInsn must check whether the opcode is a return: a method normally ends with a call such as mv.visitInsn(RETURN), which can be recognized by its opcode.

  4. View the newly generated class file, decompiled:

    public class TestService {
        public TestService() {
        }

        public void query(int var1) {
            System.out.println("start");
            System.out.println("service handle...");
            System.out.println("end");
        }
    }

Utility classes

Some utility classes are also provided for methods:

  • LocalVariablesSorter: this method adapter renumbers the local variables used in a method in the order in which they appear in the method, and provides a newLocal method for creating new local variables;
  • AdviceAdapter: this method adapter is an abstract class that can be used to insert code at the beginning of a method and just before any RETURN or ATHROW instruction. Its main advantage is that it also works for constructors, where code must not be inserted at the very beginning but after the call to the super constructor.

Usage scenarios

ASM is used in many projects. Two common usage scenarios are AOP and replacing reflection.

AOP

Aspect-oriented programming is mainly used to address system-level concerns in program development, such as logging, transactions, and permissions. Its key technique is the proxy, either static or dynamic, and there are several implementations:

  • AspectJ: static weaving, based on the static proxy principle;
  • JDK dynamic proxy: has two core classes, Proxy and InvocationHandler;
  • Cglib dynamic proxy: wraps ASM and can generate new classes dynamically at run time; it is more powerful than the JDK dynamic proxy.

The Cglib dynamic proxy relies on ASM, and we have seen the bytecode enhancements of ASM in the example above.
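For comparison with the bytecode approach, the JDK dynamic proxy entry in the list above can be sketched with Proxy and InvocationHandler alone. The names ProxySketch, Service, and withLogging are invented for illustration; the sketch records "start" and "end" around each call, much like the MyMethodVisitor example does at the bytecode level.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of a logging proxy built with the JDK's dynamic proxy support.
final class ProxySketch {
    /** The JDK proxy can only implement interfaces, so the target needs one. */
    interface Service {
        String query(int param);
    }

    /** Collected log lines, kept in memory so the behavior is easy to inspect. */
    static final List<String> LOG = new ArrayList<>();

    static Service withLogging(Service target) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add("start " + method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Rethrow the exception thrown by the target, not the reflective wrapper.
                throw e.getCause();
            } finally {
                LOG.add("end " + method.getName());
            }
        };
        return (Service) Proxy.newProxyInstance(
            Service.class.getClassLoader(), new Class<?>[] {Service.class}, handler);
    }

    public static void main(String[] args) {
        Service service = withLogging(param -> "row " + param);
        System.out.println(service.query(7));
        System.out.println(LOG);
    }
}
```

Unlike the ASM transformation, this leaves the original class untouched and routes each call through reflection, and it only works when the target is used through an interface.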

Replacing reflection

FastJson is known for its speed, which comes partly from using ASM instead of Java reflection. There is also a ReflectASM package designed specifically to replace Java reflection.

ReflectASM is a very small Java library that provides high-performance reflection through code generation. It generates an access class for getting and setting fields and for calling methods; the access class uses generated bytecode rather than Java's reflection, so it is very fast.

Take a look at a simple usage of ReflectASM:

TestBean testBean = new TestBean(1, "zhaohui", 18);
MethodAccess methodAccess = MethodAccess.get(TestBean.class);
String[] mns = methodAccess.getMethodNames();

for (int i = 0; i < mns.length; i++) {
    System.out.println(methodAccess.invoke(testBean, mns[i]));
}

Internally, ASM is used to generate a temporary TestBeanMethodAccess class that overrides the invoke method. Decompiled, the method looks like this:

public Object invoke(Object var1, int var2, Object... var3) {
    TestBean var4 = (TestBean)var1;
    switch(var2) {
    case 0:
        return var4.getName();
    case 1:
        return var4.getId();
    case 2:
        return var4.getAge();
    default:
        throw new IllegalArgumentException("Method not found: " + var2);
    }
}

As you can see, invoke simply makes ordinary method calls internally, which is why it is faster than going through Java reflection.
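The idea behind the generated access class can be sketched by hand in plain Java. Everything below is hypothetical (AccessSketch, its nested TestBean, and the index numbering are invented, and nothing is generated at run time); it only contrasts a reflective lookup with the switch-based dispatch shown in the decompiled method.

```java
import java.lang.reflect.Method;

// Hypothetical comparison of reflective access and index-based direct dispatch.
final class AccessSketch {
    /** Stand-in bean; the real TestBean from the article is not shown in full. */
    static final class TestBean {
        private final String name;
        private final int age;
        TestBean(String name, int age) { this.name = name; this.age = age; }
        public String getName() { return name; }
        public int getAge() { return age; }
    }

    // Reflection: the method is looked up by name and invoked indirectly.
    static Object viaReflection(TestBean bean, String methodName) throws Exception {
        Method method = TestBean.class.getMethod(methodName);
        return method.invoke(bean);
    }

    // Hand-written equivalent of a generated access class: plain calls behind a switch.
    static Object viaSwitch(TestBean bean, int methodIndex) {
        switch (methodIndex) {
            case 0:
                return bean.getName();
            case 1:
                return bean.getAge();
            default:
                throw new IllegalArgumentException("Method not found: " + methodIndex);
        }
    }

    public static void main(String[] args) throws Exception {
        TestBean bean = new TestBean("zhaohui", 18);
        System.out.println(viaReflection(bean, "getName") + " / " + viaSwitch(bean, 0));
        System.out.println(viaReflection(bean, "getAge") + " / " + viaSwitch(bean, 1));
    }
}
```

Both paths return the same values; the switch version avoids the per-call reflective machinery, at the cost of needing one such class per bean type, which is exactly the part a code generator automates.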

Reference documentation

asm4-guide.pdf

ASM4 Manual Chinese version
