Java Programming Dynamics, Part 7

Bytecode engineering with BCEL

Apache BCEL lets you drill down into the details of class manipulation in the JVM assembly language

Series contents:

This is part 7 of this seven-part series: Java Programming dynamics

In the last three articles in this series, I showed you how to manipulate classes using the Javassist framework. This time I’ll manipulate bytecode in a very different way, using the Apache Byte Code Engineering Library (BCEL). Unlike the source code interface supported by Javassist, BCEL operates at the level of actual JVM instructions. This low-level approach makes BCEL useful when you want to control every step of a program’s execution, but it also makes BCEL much more complicated to use than Javassist in cases where either would do the job.

I’ll start with the basic BCEL architecture, and then spend most of this article on an example of rebuilding my first Javassist class manipulation example with BCEL. I’ll finish with a brief look at some of the tools provided in the BCEL package and some of the applications developers have built with BCEL.

BCEL class access

BCEL gives you all the basic capabilities that Javassist provides for analyzing, editing, and creating Java binary classes. One obvious difference with BCEL is that everything is designed to work at the level of JVM assembly language rather than the source code interface provided by Javassist. Beyond the superficial differences, there are some deeper ones, including the use of two different hierarchies of components in BCEL: one for examining existing code, the other for creating new code. I assume that you are already familiar with Javassist from the previous articles in this series, so I’ll focus on the differences that might confuse you when you start using BCEL.

Like Javassist, BCEL’s capabilities for class analysis essentially duplicate those provided directly by the Java platform through the Reflection API. This duplication is necessary for class manipulation toolkits because you generally don’t want to load the classes you are working on before they have been modified.
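For comparison, here is a minimal sketch of the kind of class analysis the standard Reflection API provides. It uses only the JDK; the ReflectionSketch and Sample classes and the describeMethods helper are hypothetical names made up for this illustration. Note that referencing Sample.class loads the class into the JVM, which is exactly what a class manipulation toolkit needs to avoid.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; uses only the JDK Reflection API.
class ReflectionSketch {

    // Stand-in for a class under analysis (names are made up for this sketch).
    static class Sample {
        private String buildString(int length) { return ""; }
        public static void main(String[] argv) { }
    }

    // Describes each declared method as "<modifiers> <name>/<parameter count>".
    static List<String> describeMethods(Class<?> clas) {
        List<String> lines = new ArrayList<>();
        for (Method m : clas.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue; // skip compiler-generated helpers
            }
            lines.add(Modifier.toString(m.getModifiers()) + " "
                + m.getName() + "/" + m.getParameterCount());
        }
        Collections.sort(lines); // getDeclaredMethods() order is unspecified
        return lines;
    }

    public static void main(String[] argv) {
        // Referencing Sample.class loads the class into the JVM.
        for (String line : describeMethods(Sample.class)) {
            System.out.println(line);
        }
    }
}
```

Reflection gets you names, modifiers, and signatures, but only for a class that is already loaded and therefore can no longer be changed.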

BCEL defines some basic constants in the org.apache.bcel package, but apart from these definitions, all the analysis-related code is in the org.apache.bcel.classfile package. The starting point in this package is the JavaClass class. This class plays the same role in accessing class information with BCEL that java.lang.Class plays with regular Java reflection. JavaClass defines methods to get the field and method information for the class, as well as structural information about its superclass and interfaces. Unlike java.lang.Class, JavaClass also provides access to the internal information of the class, including the constant pool and attributes, as well as the complete binary class representation as a byte stream.

JavaClass instances are typically created by parsing the actual binary class. BCEL provides the org.apache.bcel.Repository class to handle the parsing for you. By default, BCEL parses and caches representations of classes found on the JVM classpath, getting the actual binary class representations from an org.apache.bcel.util.Repository instance (note the difference in the package name). org.apache.bcel.util.Repository is actually an interface for a source of binary class representations. You can replace the default source, which uses the classpath, with one that looks up class files on a different path or accesses class information in some other way.
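To give a feel for the internal information that reflection does not expose, the following sketch reads the first few fields of a binary class representation using nothing but the JDK. It is not BCEL code, and JavaClass goes far beyond it. The ClassHeaderSketch class and its readHeader and headerOf methods are hypothetical names used only for this illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical JDK-only helper; BCEL's JavaClass exposes far more detail than this.
class ClassHeaderSketch {

    // Reads { magic, minor version, major version, constant pool count }
    // from the start of a binary class representation.
    static int[] readHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        int poolCount = data.readUnsignedShort();
        return new int[] { magic, minor, major, poolCount };
    }

    // Locates the class file for a known class as a resource and reads its header.
    static int[] headerOf(Class<?> clas) {
        String name = clas.getName();
        String resource = name.substring(name.lastIndexOf('.') + 1) + ".class";
        try (InputStream in = clas.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("No class file found for " + name);
            }
            return readHeader(in);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] argv) {
        int[] header = headerOf(Object.class);
        System.out.println("magic: 0x" + Integer.toHexString(header[0]));
        System.out.println("version: " + header[2] + "." + header[1]);
        System.out.println("constant pool count: " + header[3]);
    }
}
```

Everything after these header fields (the constant pool entries, fields, methods, and attributes) is what a toolkit such as BCEL parses into objects for you.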

Change the class

In addition to reflection-style access to class components, org.apache.bcel.classfile.JavaClass also provides methods for changing the class. You can use these methods to set any of the components to new values. They are generally not used directly, though, because the other classes in the package do not support building new versions of components in any reasonable way. Instead, a completely separate set of classes in the org.apache.bcel.generic package provides editable versions of the same components represented by the org.apache.bcel.classfile classes.

Just as org.apache.bcel.classfile.JavaClass is the starting point for analyzing existing classes with BCEL, org.apache.bcel.generic.ClassGen is the starting point for creating new classes. It is also used to modify existing classes: to handle this case, there is a constructor that takes a JavaClass instance and uses it to initialize the ClassGen class information. After you modify the class, you can get a usable class representation from the ClassGen instance by calling a method that returns a JavaClass, which can in turn be converted to a binary class representation.

Sound a bit messy? I think so. In fact, switching back and forth between the two packages is one of the main disadvantages of using BCEL. The duplicated class structures tend to get in the way, so if you use BCEL a lot, you might want to write wrapper classes that hide some of these differences. In this article, I’ll work mainly with the org.apache.bcel.generic package classes and avoid using wrappers, but keep this in mind for your own development.

In addition to ClassGen, the org.apache.bcel.generic package defines classes that manage the construction of the different class components. These construction classes include ConstantPoolGen for handling the constant pool, FieldGen and MethodGen for fields and methods, and InstructionList for working with sequences of JVM instructions. Finally, the org.apache.bcel.generic package also defines classes that represent every type of JVM instruction. You can create instances of these classes directly or, in some cases, by using the org.apache.bcel.generic.InstructionFactory helper class. The advantage of using InstructionFactory is that it handles many of the bookkeeping details of instruction construction (including adding items to the constant pool as required by the instructions). You’ll see how to make all of these classes work together in the next section.

Use BCEL for class operations

As an example of using BCEL, I’ll use the Javassist example from Part 4 that measures the time it takes to execute a method. I’ll even use the same approach I used with Javassist: I create a copy of the original method to be timed under a changed name, and then replace the body of the original method with code that wraps the timing calculation around a call to the renamed method.

Choose an experiment

Listing 1 shows the example method I’ll use for demonstration purposes: the buildString method of the StringBuilder class. As I said in Part 4, this method builds a String in exactly the way that all Java performance experts warn you against: it repeatedly appends a single character to the end of a string to create a longer string. Because strings are immutable, each pass through the loop builds a new string, copying the data from the old string and adding a single character at the end. The overall effect is that the method incurs more and more overhead as you use it to create longer strings.

Listing 1. Methods to be timed

public class StringBuilder
{
    private String buildString(int length) {
        String result = "";
        for (int i = 0; i < length; i++) {
            result += (char)(i%26 + 'a');
        }
        return result;
    }
     
    public static void main(String[] argv) {
        StringBuilder inst = new StringBuilder();
        for (int i = 0; i < argv.length; i++) {
            String result = inst.buildString(Integer.parseInt(argv[i]));
            System.out.println("Constructed string of length " +
                result.length());
        }
    }
}
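The growing overhead can be made concrete with a rough cost model. The sketch below counts how many characters get copied from old strings when a string is built one character at a time, as buildString does. It is a simplified model that ignores JVM-specific optimizations and allocation costs, and ConcatCostSketch and copiedChars are hypothetical names introduced just for this illustration.

```java
// Hypothetical cost model for building a string one character at a time.
class ConcatCostSketch {

    // Characters copied from old strings while building a string of the given
    // length by repeated appends: 0 + 1 + ... + (length - 1).
    static long copiedChars(int length) {
        long total = 0;
        for (int i = 0; i < length; i++) {
            total += i; // the i characters built so far are copied again
        }
        return total;
    }

    public static void main(String[] argv) {
        int[] lengths = { 1000, 2000, 4000 };
        for (int length : lengths) {
            System.out.println(length + " characters: "
                + copiedChars(length) + " characters copied");
        }
    }
}
```

Under this model, doubling the length roughly quadruples the copying work, which is the pattern to look for in the timing results.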

Listing 2 shows the source code equivalent of a class-manipulation change with BCEL. Here the wrapper method simply saves the current time, then calls the renamed original method and prints the time report before returning the result of the original method call.

Listing 2. Add timing to the original method

public class StringBuilder
{
    private String buildString$impl(int length) {
        String result = "";
        for (int i = 0; i < length; i++) {
            result += (char)(i%26 + 'a');
        }
        return result;
    }
     
    private String buildString(int length) {
        long start = System.currentTimeMillis();
        String result = buildString$impl(length);
        System.out.println("Call to method buildString$impl took " +
            (System.currentTimeMillis()-start) + " ms.");
        return result;
    }
     
    public static void main(String[] argv) {
        StringBuilder inst = new StringBuilder();
        for (int i = 0; i < argv.length; i++) {
            String result = inst.buildString(Integer.parseInt(argv[i]));
            System.out.println("Constructed string of length " +
                result.length());
        }
    }
}

Write conversion code

Now it’s time to implement the code that adds method timing, using the BCEL APIs I described in the BCEL class access section. Working at the JVM instruction level makes the code much longer than the Javassist example in Part 4, so I’m going to go through it a piece at a time before providing the full implementation. In the final code, all the pieces form a single method that takes two parameters: cgen, an instance of the org.apache.bcel.generic.ClassGen class initialized with the existing information for the class being modified, and method, an org.apache.bcel.classfile.Method instance for the method to be timed.

Listing 3 is the first piece of code for the transformation method. As you can see from the comments, the first part just initializes the basic BCEL components I’m going to use, which includes initializing a new org.apache.bcel.generic.MethodGen instance with the information for the method to be timed. I set an empty instruction list for this MethodGen, which I’ll populate with the actual timing code later. In the second part, I create a second org.apache.bcel.generic.MethodGen instance from the original method, and then delete the original method from the class. On the second MethodGen instance, I just change the name by adding a “$impl” suffix, and then call getMethod() to convert the modifiable method information into the fixed form of an org.apache.bcel.classfile.Method instance. I then call addMethod() to add the renamed method to the class.

Listing 3. Adding interceptor methods

// set up the construction tools
InstructionFactory ifact = new InstructionFactory(cgen);
InstructionList ilist = new InstructionList();
ConstantPoolGen pgen = cgen.getConstantPool();
String cname = cgen.getClassName();
MethodGen wrapgen = new MethodGen(method, cname, pgen);
wrapgen.setInstructionList(ilist);
     
// rename a copy of the original method
MethodGen methgen = new MethodGen(method, cname, pgen);
cgen.removeMethod(method);
String iname = methgen.getName() + "$impl";
methgen.setName(iname);
cgen.addMethod(methgen.getMethod());

Listing 4 shows the next piece of code for the transformation method. The first part computes how much space the method call parameters take up on the stack. I need this because, in order to store the start time on the stack frame before calling the wrapped method, I have to know what offset can be used for the local variable (note that I could get the same effect using BCEL’s local variable handling, but in this article I chose to take the explicit approach). The second part of the code generates the call to java.lang.System.currentTimeMillis() to get the start time, and saves it at the computed local variable offset in the stack frame.

You may wonder why I check whether the method is static at the start of the parameter size calculation, initializing the stack frame slot to zero if it is and to one if it is not. This relates to how the Java language handles method calls. For non-static methods, the first (hidden) parameter of every call is the this reference to the target object, which I need to take into account when calculating the size of the full parameter set in the stack frame.

Listing 4. Setting up the wrapper call

// compute the size of the calling parameters
Type[] types = methgen.getArgumentTypes();
int slot = methgen.isStatic() ? 0 : 1;
for (int i = 0; i < types.length; i++) {
    slot += types[i].getSize();
}
     
// save time prior to invocation
ilist.append(ifact.createInvoke("java.lang.System",
    "currentTimeMillis", Type.LONG, Type.NO_ARGS,
    Constants.INVOKESTATIC));
ilist.append(InstructionFactory.createStore(Type.LONG, slot));
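The slot arithmetic in Listing 4 can be tried out on its own. The sketch below mirrors it using plain JDK Class objects in place of BCEL Type instances, so it is an illustration rather than BCEL code. SlotSketch, slotSize, and firstFreeSlot are hypothetical names. It relies on the JVM rule that long and double values occupy two local variable slots while every other type occupies one.

```java
// Hypothetical JDK-only helper mirroring the slot arithmetic in Listing 4.
class SlotSketch {

    // Local variable slots used by one value: 2 for long and double, otherwise 1.
    static int slotSize(Class<?> type) {
        return (type == long.class || type == double.class) ? 2 : 1;
    }

    // First local variable slot not occupied by this or the call parameters.
    static int firstFreeSlot(boolean isStatic, Class<?>... paramTypes) {
        int slot = isStatic ? 0 : 1; // non-static methods hold this in slot 0
        for (Class<?> type : paramTypes) {
            slot += slotSize(type);
        }
        return slot;
    }

    public static void main(String[] argv) {
        // buildString(int) is an instance method: this in slot 0, length in slot 1
        System.out.println(firstFreeSlot(false, int.class));
        // main(String[]) is static: argv in slot 0
        System.out.println(firstFreeSlot(true, String[].class));
    }
}
```

For buildString(int) the first free slot is 2, so the start time (a long) occupies slots 2 and 3. That is why the transformation code later stores the wrapped method’s result at slot+2.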

Listing 5 shows the code that generates the call to the wrapper method and saves the result, if any. The first part of this code checks again if the method is static. If the method is not static, generate code that loads the reference to this object onto the stack and sets the method call type to virtual (instead of static). The for loop then generates code that copies all of the call parameter values onto the stack, the createInvoke() method generates the actual call to the wrapped method, and finally the if statement stores the result value in another local variable located in the stack frame (if the result type is not void).

Listing 5. Calling wrapped methods

// call the wrapped method
int offset = 0;
short invoke = Constants.INVOKESTATIC;
if (!methgen.isStatic()) {
    ilist.append(InstructionFactory.createLoad(Type.OBJECT, 0));
    offset = 1;
    invoke = Constants.INVOKEVIRTUAL;
}
for (int i = 0; i < types.length; i++) {
    Type type = types[i];
    ilist.append(InstructionFactory.createLoad(type, offset));
    offset += type.getSize();
}
Type result = methgen.getReturnType();
ilist.append(ifact.createInvoke(cname,
    iname, result, types, invoke));
     
// store result for return later
if (result != Type.VOID) {
    ilist.append(InstructionFactory.createStore(result, slot+2));
}

Now comes the wrap-up. Listing 6 generates the code that computes the number of milliseconds elapsed since the start time and prints it out as part of a formatted message. This part may look complicated, but most of the operations are really just writing out the pieces of the output message. It does show several types of operations that I didn’t use in the previous code, including field access (to java.lang.System.out) and several different instruction types. Most of these are easy to understand if you think of the JVM as a stack-based processor, so I won’t go into details here.

Listing 6. Calculate and print the time used

// print time required for method call
ilist.append(ifact.createFieldAccess("java.lang.System", "out", 
    new ObjectType("java.io.PrintStream"), Constants.GETSTATIC));
ilist.append(InstructionConstants.DUP);
ilist.append(InstructionConstants.DUP);
String text = "Call to method " + methgen.getName() + " took ";
ilist.append(new PUSH(pgen, text));
ilist.append(ifact.createInvoke("java.io.PrintStream", "print",
    Type.VOID, new Type[] { Type.STRING }, Constants.INVOKEVIRTUAL));
ilist.append(ifact.createInvoke("java.lang.System", 
    "currentTimeMillis", Type.LONG, Type.NO_ARGS, 
    Constants.INVOKESTATIC));
ilist.append(InstructionFactory.createLoad(Type.LONG, slot));
ilist.append(InstructionConstants.LSUB);
ilist.append(ifact.createInvoke("java.io.PrintStream", "print", 
    Type.VOID, new Type[] { Type.LONG }, Constants.INVOKEVIRTUAL));
ilist.append(new PUSH(pgen, " ms."));
ilist.append(ifact.createInvoke("java.io.PrintStream", "println", 
    Type.VOID, new Type[] { Type.STRING }, Constants.INVOKEVIRTUAL));
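One way to follow Listing 6 with the stack-based view in mind is to track how many operand stack slots each generated instruction pushes or pops. The sketch below is a hand-written, JDK-only model of that bookkeeping, not BCEL code. The per-instruction effects are my own reading of the instruction sequence, with long values counted as two slots, and StackDepthSketch, EFFECTS, and simulate are hypothetical names.

```java
// Hypothetical model of the operand stack depth across the Listing 6 instructions.
class StackDepthSketch {

    // Net stack effect, in slots, of each instruction in generation order.
    static final int[] EFFECTS = {
        +1, // getstatic System.out: push the PrintStream reference
        +1, // dup: second copy of the reference
        +1, // dup: third copy of the reference
        +1, // push the "Call to method ... took " string
        -2, // invokevirtual print(String): pop reference and string
        +2, // invokestatic currentTimeMillis: push a long
        +2, // lload: push the saved start time
        -2, // lsub: pop two longs, push their difference
        -3, // invokevirtual print(long): pop reference and long
        +1, // push the " ms." string
        -2  // invokevirtual println(String): pop reference and string
    };

    // Returns { final depth, maximum depth } after applying the effects in order.
    static int[] simulate(int[] effects) {
        int depth = 0;
        int max = 0;
        for (int effect : effects) {
            depth += effect;
            if (depth < 0) {
                throw new IllegalStateException("Operand stack underflow");
            }
            max = Math.max(max, depth);
        }
        return new int[] { depth, max };
    }

    public static void main(String[] argv) {
        int[] result = simulate(EFFECTS);
        System.out.println("final depth " + result[0] + ", max depth " + result[1]);
    }
}
```

The model shows why the PrintStream reference is duplicated twice: each of the three print calls consumes one copy, and the stack ends up empty again once the message has been written.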

Once the timing message code is generated, all that is left for Listing 7 is to return the saved result value from the wrapped method call (if any), and then finish off the wrapper method being built. This last part involves several steps. Calling stripAttributes(true) just tells BCEL not to generate debugging information for the constructed method, while the setMaxStack() and setMaxLocals() calls compute and set the stack usage information for the method. Once that is done, I can actually generate the final version of the method and add it to the class.

Listing 7. Complete the wrapper

// return result from wrapped method call
if (result != Type.VOID) {
    ilist.append(InstructionFactory.createLoad(result, slot+2));
}
ilist.append(InstructionFactory.createReturn(result));
     
// finalize the constructed method
wrapgen.stripAttributes(true);
wrapgen.setMaxStack();
wrapgen.setMaxLocals();
cgen.addMethod(wrapgen.getMethod());
ilist.dispose();

Complete code

Listing 8 shows the complete code (formatted slightly to fit the display width), including the main() method that takes the name of the class file and the method to convert:

Listing 8. The complete conversion code

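import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.bcel.Constants;
import org.apache.bcel.classfile.ClassParser;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;
import org.apache.bcel.generic.ClassGen;
import org.apache.bcel.generic.ConstantPoolGen;
import org.apache.bcel.generic.InstructionConstants;
import org.apache.bcel.generic.InstructionFactory;
import org.apache.bcel.generic.InstructionList;
import org.apache.bcel.generic.MethodGen;
import org.apache.bcel.generic.ObjectType;
import org.apache.bcel.generic.PUSH;
import org.apache.bcel.generic.Type;

public class BCELTiming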
{
    private static void addWrapper(ClassGen cgen, Method method) {
         
        // set up the construction tools
        InstructionFactory ifact = new InstructionFactory(cgen);
        InstructionList ilist = new InstructionList();
        ConstantPoolGen pgen = cgen.getConstantPool();
        String cname = cgen.getClassName();
        MethodGen wrapgen = new MethodGen(method, cname, pgen);
        wrapgen.setInstructionList(ilist);
         
        // rename a copy of the original method
        MethodGen methgen = new MethodGen(method, cname, pgen);
        cgen.removeMethod(method);
        String iname = methgen.getName() + "$impl";
        methgen.setName(iname);
        cgen.addMethod(methgen.getMethod());
        Type result = methgen.getReturnType();
         
        // compute the size of the calling parameters
        Type[] types = methgen.getArgumentTypes();
        int slot = methgen.isStatic() ? 0 : 1;
        for (int i = 0; i < types.length; i++) {
            slot += types[i].getSize();
        }
         
        // save time prior to invocation
        ilist.append(ifact.createInvoke("java.lang.System",
            "currentTimeMillis", Type.LONG, Type.NO_ARGS, 
            Constants.INVOKESTATIC));
        ilist.append(InstructionFactory.
            createStore(Type.LONG, slot));
         
        // call the wrapped method
        int offset = 0;
        short invoke = Constants.INVOKESTATIC;
        if (!methgen.isStatic()) {
            ilist.append(InstructionFactory.
                createLoad(Type.OBJECT, 0));
            offset = 1;
            invoke = Constants.INVOKEVIRTUAL;
        }
        for (int i = 0; i < types.length; i++) {
            Type type = types[i];
            ilist.append(InstructionFactory.
                createLoad(type, offset));
            offset += type.getSize();
        }
        ilist.append(ifact.createInvoke(cname, 
            iname, result, types, invoke));
         
        // store result for return later
        if (result != Type.VOID) {
            ilist.append(InstructionFactory.
                createStore(result, slot+2));
        }
         
        // print time required for method call
        ilist.append(ifact.createFieldAccess("java.lang.System",
            "out",  new ObjectType("java.io.PrintStream"),
            Constants.GETSTATIC));
        ilist.append(InstructionConstants.DUP);
        ilist.append(InstructionConstants.DUP);
        String text = "Call to method " + methgen.getName() +
            " took ";
        ilist.append(new PUSH(pgen, text));
        ilist.append(ifact.createInvoke("java.io.PrintStream",
            "print", Type.VOID, new Type[] { Type.STRING },
            Constants.INVOKEVIRTUAL));
        ilist.append(ifact.createInvoke("java.lang.System", 
            "currentTimeMillis", Type.LONG, Type.NO_ARGS, 
            Constants.INVOKESTATIC));
        ilist.append(InstructionFactory.
            createLoad(Type.LONG, slot));
        ilist.append(InstructionConstants.LSUB);
        ilist.append(ifact.createInvoke("java.io.PrintStream",
            "print", Type.VOID, new Type[] { Type.LONG },
            Constants.INVOKEVIRTUAL));
        ilist.append(new PUSH(pgen, " ms."));
        ilist.append(ifact.createInvoke("java.io.PrintStream",
            "println", Type.VOID, new Type[] { Type.STRING },
            Constants.INVOKEVIRTUAL));
             
        // return result from wrapped method call
        if (result != Type.VOID) {
            ilist.append(InstructionFactory.
                createLoad(result, slot+2));
        }
        ilist.append(InstructionFactory.createReturn(result));
         
        // finalize the constructed method
        wrapgen.stripAttributes(true);
        wrapgen.setMaxStack();
        wrapgen.setMaxLocals();
        cgen.addMethod(wrapgen.getMethod());
        ilist.dispose();
    }
     
    public static void main(String[] argv) {
        if (argv.length == 2 && argv[0].endsWith(".class")) {
            try {
             
                JavaClass jclas = new ClassParser(argv[0]).parse();
                ClassGen cgen = new ClassGen(jclas);
                Method[] methods = jclas.getMethods();
                int index;
                for (index = 0; index < methods.length; index++) {
                    if (methods[index].getName().equals(argv[1])) {
                        break;
                    }
                }
                if (index < methods.length) {
                    addWrapper(cgen, methods[index]);
                    FileOutputStream fos =
                        new FileOutputStream(argv[0]);
                    cgen.getJavaClass().dump(fos);
                    fos.close();
                } else {
                    System.err.println("Method " + argv[1] + 
                        " not found in " + argv[0]);
                }
            } catch (IOException ex) {
                ex.printStackTrace(System.err);
            }
             
        } else {
            System.out.println
                ("Usage: BCELTiming class-file method-name");
        }
    }
}

Give it a try

Listing 9 shows the result of first running the StringBuilder program in unmodified form, then running the BCELTiming program to add timing information, and finally running the StringBuilder program after modification. You can see how StringBuilder starts reporting execution times after modification, and how time increases faster than the length of the string being built, due to the inefficiency of the string building code.

Listing 9. Run the program

[dennis]$ java StringBuilder 1000 2000 4000 8000 16000
Constructed string of length 1000
Constructed string of length 2000
Constructed string of length 4000
Constructed string of length 8000
Constructed string of length 16000
[dennis]$ java -cp bcel.jar:. BCELTiming StringBuilder.class buildString
[dennis]$ java StringBuilder 1000 2000 4000 8000 16000
Call to method buildString$impl took 20 ms.
Constructed string of length 1000
Call to method buildString$impl took 79 ms.
Constructed string of length 2000
Call to method buildString$impl took 250 ms.
Constructed string of length 4000
Call to method buildString$impl took 879 ms.
Constructed string of length 8000
Call to method buildString$impl took 3875 ms.
Constructed string of length 16000
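To check the claim that the time grows faster than the string length, you can compare each reported time in Listing 9 with the one before it. The short sketch below does that arithmetic. GrowthSketch and ratios are hypothetical names, and the input values are simply the five timings from the listing.

```java
// Hypothetical helper applied to the timings reported in Listing 9.
class GrowthSketch {

    // Ratio of each measurement to the one before it.
    static double[] ratios(long[] values) {
        double[] result = new double[values.length - 1];
        for (int i = 1; i < values.length; i++) {
            result[i - 1] = (double) values[i] / values[i - 1];
        }
        return result;
    }

    public static void main(String[] argv) {
        long[] millis = { 20, 79, 250, 879, 3875 }; // timings from Listing 9
        for (double ratio : ratios(millis)) {
            System.out.printf("%.2f%n", ratio);
        }
    }
}
```

Each doubling of the string length multiplies the time by somewhere between about 3 and 4.5, rather than the factor of 2 that linear scaling would give.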

Wrapping up BCEL

BCEL includes a lot more than the basic class manipulation support I’ve covered in this article. It also includes a complete verifier implementation to make sure that a binary class is valid according to the JVM specification (see org.apache.bcel.verifier.VerifierFactory), a disassembler that generates a nicely framed and linked JVM-level view of a binary class, and even a BCEL program generator that outputs source code for a BCEL program that will build a supplied class. (The org.apache.bcel.util.BCELifier class is not included in the Javadocs, so you need to look at the source code to see how to use it. This feature is intriguing, but the output is probably too cryptic for most developers.)

When I use BCEL myself, I find the HTML disassembler particularly useful. To try it out, just execute the org.apache.bcel.util.Class2HTML class from the BCEL JAR, passing the path to the class file to be disassembled as a command-line argument. It generates HTML files in the current directory. For example, here I’ll disassemble the StringBuilder class used in the timing example:

[dennis]$ java -cp bcel.jar org.apache.bcel.util.Class2HTML StringBuilder.class
Processing StringBuilder.class... Done.

Figure 1 is a screenshot of the framed output generated by the disassembler. In this screenshot, the large frame at the upper right shows the disassembly of the timing wrapper method added to the StringBuilder class. The full HTML output is in the download file. To view it, just open the StringBuilder.html file in a browser window.

Figure 1. Disassemble StringBuilder

Currently, BCEL is probably the most widely used framework for Java class manipulation. The BCEL Web site lists a number of other projects that use BCEL, including the Xalan XSLT compiler, the AspectJ extension to the Java programming language, and several JDO implementations. Many other projects not listed also use BCEL, including my own JiBX XML data binding project. However, several of the projects listed by BCEL have since moved on to other libraries, so don’t take this list as an absolute guide to BCEL’s popularity.

The biggest benefits of BCEL are its business-friendly Apache license and its rich JVM instruction-level support. These features, combined with its stability and longevity, have made it a very popular choice for class manipulation applications. However, BCEL does not appear to be designed to be especially fast or easy to use. For the most part, Javassist offers a friendlier API and is nearly as fast (or faster), at least in my simple tests. If your project can use the Mozilla Public License (MPL) or the GNU Lesser General Public License (LGPL), Javassist is probably a better choice (it is available under both licenses).

The next article

I’ve already introduced Javassist and BCEL, and the next article in this series will delve into more versatile class manipulation applications than we’ve covered so far. In Part 2, I showed that method call reflection is much slower than direct calls. In Part 8, I’ll show you how to use Javassist and BCEL to dramatically improve performance by replacing reflection calls with dynamically generated code at runtime. Come back next month for another article on Java programming dynamics for more details.
