Bytecode and reference detection

1.1 Java bytecode

In this article, bytecode means Java bytecode, the instruction format executed by the Java virtual machine. You can inspect the bytecode of a class file with the command javap -c -v xxx.class (where xxx.class is the path to the class file), as shown in the following figure:

1.2 Bytecode detection

Bytecode detection essentially means analyzing and checking the class files produced by compiling .java or .kt files. Before turning to the principles and practice of bytecode analysis for reference detection, we first introduce the background that motivated the pre-research.

2. Background of bytecode detection

The background starts with the software architecture of the APP the author is responsible for: the domestic official website APP.

2.1 Software architecture of the domestic official website APP

The domestic official website APP currently has 12 sub-repositories, each compiled independently into an AAR file for the APP project to consume. The software architecture diagram is as follows:

Beneath the APP shell, the top layer is the business layer (light blue), the middle layer is the component layer (green), and the bottom layer is the infrastructure layer (dark blue):

  • Business layer: the business modules (such as mall, community, and service) at the top of the architecture, divided by business line and corresponding to the product's businesses.

  • Component layer: the basic functions of the APP (such as login and self-upgrade) and components shared between businesses (such as sharing, address management, and video playback), providing a degree of reuse.

  • Infrastructure layer: components that are completely independent of the business (such as third-party frameworks and self-encapsulated common capabilities), providing full reuse.

2.2 Development mode of the domestic official website APP client

  • The official website APP currently has three main business lines, and developing several business versions in parallel is the norm, so modularization is essential.

  • The modular sub-repositories of the official website APP are consumed by the APP as AARs, and upper-layer AARs depend on lower-layer AARs.

  • Modularization and repository optimization work is interleaved with the business versions that are developed in parallel, so the lower-layer repositories are inevitably modified.

  • When business versions are developed in parallel, usually only the repositories whose code must change for the current version are branched; the other repositories keep depending on the AARs of the old version.

2.3 Runtime crashes caused by class, method, or property reference errors

Assume the following scenario:

Suppose that during development of version 5.0 of the official website APP, the HardWare repository has no business changes, so the project keeps using its previous 4.9.0.0 AAR (during version development, only the repositories that need changes are branched; the rest stay on their old versions). The Core repository does have code changes, so we create a new 5.0 branch for it and, among other modifications, remove the fun1 method from the CoreUtils class, as shown below:

Note: the HardWare module's V4.9.0.0 AAR calls the fun1 method of CoreUtils.class in the Core repository. No other repository, including the main APP project, uses fun1.

Will this scenario cause a compilation problem?

A: No, the project compiles without problems.

The main APP repository depends on the AAR compiled from the HardWare repository at version 4.9.0.0. That AAR was built back in version 4.9 and has not been touched since, so nothing in HardWare is recompiled and it cannot fail to compile.

The main APP repository depends on Core 5.0.0.0, while HardWare depends on Core 4.9.0.0. Dependency resolution picks the higher version, so Core 5.0.0.0 is what goes into the APP build. The APP repository itself does not use the deleted fun1 method, so it compiles cleanly as well.

Will there be a problem at runtime once the project has been built?

A: Yes.

When the APP executes the HardWare code that calls fun1 on the CoreUtils class, it crashes at runtime with a Method Not Found error.

The Core 5.0.0.0 that was packaged into the APP no longer contains fun1, so the call cannot be resolved at runtime.

Real cases:

1) Method not found

2) Class not found

Fortunately, these problems were found and fixed during development and testing. Had they reached production, the crash in the affected feature would have been very serious.

If all modules of your APP are source dependencies, the compiler will alert you to any reference problem, so in general there is nothing to worry about (unless an underlying SDK has a reference problem). With an architecture like that of the official website APP, however, you need to pay attention.

2.4 Current situation analysis and thinking

Runtime exceptions caused by reference problems had already shown up during local testing. Manual checking alone is not enough to catch them; an automated detection tool is needed. Traditional tools such as FindBugs and Lint are static source code checkers and cannot detect the runtime exceptions that potential reference problems cause, so static code checking does not solve this problem. Research into an automated detection tool was therefore urgent.

3. Bytecode detection solutions

The idea is this: while the APK is being built, an automated tool goes through every class in all the JAR and AAR packages and checks whether the methods and fields it uses can still be referenced. Suspected problems are reported at compile time; where necessary, the build is aborted with an error and an error log is written to alert the developers, so that these runtime exceptions never reach production.

Principle: once the Java (or Kotlin) classes of each sub-repository have been compiled into an AAR or JAR, that AAR or JAR contains the class files of all its classes. What we actually need to analyze are these compiled class files.
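To make "analyzing the class file" concrete, here is a minimal sketch that reads the constant pool of a class file with plain JDK streams and lists every class the file refers to. It is not the plugin's implementation, which relies on a bytecode library; the class name ConstantPoolReader and its methods are made up for illustration, and the tag values follow the class file format in the JVM specification.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (hypothetical name): lists the classes a .class file refers to. */
final class ConstantPoolReader {

    /** Returns the internal names (e.g. "java/util/Arrays") of all CONSTANT_Class entries. */
    static List<String> referencedClasses(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a class file");
        }
        in.readUnsignedShort();             // minor_version
        in.readUnsignedShort();             // major_version
        int count = in.readUnsignedShort(); // entries are numbered 1..count-1
        String[] utf8 = new String[count];
        int[] classNameIndex = new int[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:  // Utf8: u2 length + modified UTF-8, the same layout readUTF expects
                    utf8[i] = in.readUTF(); break;
                case 7:  // Class: index of the Utf8 entry holding the class name
                    classNameIndex[i] = in.readUnsignedShort(); break;
                case 8: case 16: case 19: case 20: // String, MethodType, Module, Package
                    in.readUnsignedShort(); break;
                case 15: // MethodHandle: u1 kind + u2 index
                    in.readUnsignedByte(); in.readUnsignedShort(); break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    in.readInt(); break;   // all of these occupy four bytes
                case 5: case 6: // Long and Double occupy eight bytes and two pool slots
                    in.readLong(); i++; break;
                default:
                    throw new IOException("Unknown constant pool tag " + tag);
            }
        }
        List<String> result = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            if (classNameIndex[i] != 0) {
                result.add(utf8[classNameIndex[i]]);
            }
        }
        return result;
    }

    /** Convenience wrapper: reads a class file from the classpath or the JDK itself. */
    static List<String> referencedClassesOf(String resource) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("No such resource: " + resource);
            }
            return referencedClasses(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints every class that java.util.ArrayList refers to
        referencedClassesOf("java/util/ArrayList.class").forEach(System.out::println);
    }
}
```

A reference checker would take each name in this list and verify that the class still exists among the JARs and AARs that go into the APK. Libraries such as Javassist and ASM do this parsing for you and also expose the method and field references.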

How to do bytecode analysis on Class files?

We recommend Javassist or ASM. The Android build is driven mainly by Gradle, so to analyze class file bytecode during the build we need to implement a Gradle Transform. Here we package it directly as a Gradle plugin.

During compilation, the plugin automatically analyzes the class bytecode for method references, field references, and class references that cannot be found or that the current class is not allowed to access. If a problem is found, the build is stopped and the relevant log is printed for the developer to analyze. The plugin's behavior is configurable.

At this point the overall framework of the solution is fairly clear, as shown in the figure below:

3.1 Principle of method and attribute reference detection

Identifying method and attribute reference problems:

How do we identify a problem with a method reference?

  • The method was deleted, so no method with that name can be found.

  • No method with the same signature can be found, that is, the number or types of the parameters do not match.

  • The method is non-public and the current class has no access to it.

How do we identify a problem with an attribute (field) reference?

  • The field was deleted and can no longer be found.

  • The field is non-public and the current class has no access to it.

Permission modifiers:

Bytecode detection of method and attribute references: with Javassist, ASM, or another library that supports bytecode manipulation, we can scan the methods and fields of all classes and analyze whether any method call or field access has a reference problem.
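The three method-reference failure modes listed in this section can be tried out with nothing but the JDK. The sketch below is not the plugin's implementation, which scans class files with Javassist at build time. It swaps in java.lang.invoke.MethodHandles.publicLookup(), which resolves a method by owner, name, and exact signature inside the running JVM, much as the linker does. The class name MethodRefCheck and the Result enum are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

/** Illustrative helper (hypothetical name): classifies a reference to an instance method. */
final class MethodRefCheck {

    enum Result { OK, NOT_FOUND, NOT_ACCESSIBLE }

    /**
     * Resolves owner.name with exactly the given return and parameter types,
     * from the point of view of an unrelated class that can only see public members.
     */
    static Result checkVirtual(Class<?> owner, String name, MethodType type) {
        try {
            MethodHandles.publicLookup().findVirtual(owner, name, type);
            return Result.OK;
        } catch (NoSuchMethodException e) {
            // The method was deleted, or no method has this exact signature
            return Result.NOT_FOUND;
        } catch (IllegalAccessException e) {
            // The method exists but is not public
            return Result.NOT_ACCESSIBLE;
        }
    }

    public static void main(String[] args) {
        // String.trim() exists and is public
        System.out.println(checkVirtual(String.class, "trim",
                MethodType.methodType(String.class)));            // OK
        // There is no String.fun1(): the "method was deleted" case
        System.out.println(checkVirtual(String.class, "fun1",
                MethodType.methodType(void.class)));              // NOT_FOUND
        // trim exists, but not with an int parameter: the "signature mismatch" case
        System.out.println(checkVirtual(String.class, "trim",
                MethodType.methodType(String.class, int.class))); // NOT_FOUND
        // Object.clone() is protected: the "no access" case
        System.out.println(checkVirtual(Object.class, "clone",
                MethodType.methodType(Object.class)));            // NOT_ACCESSIBLE
    }
}
```

Unlike a build-time scan, this only works for classes that are loadable in the current JVM, so it illustrates the rules rather than replacing the bytecode check.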

3.2 Method and attribute reference detection in practice

The following code is written in Kotlin. It omits the Gradle plugin and Transform plumbing and shows only the check itself. Method and field reference detection:

// The Gradle plugin and custom Transform plumbing is not shown here.
// classObj (CtClass), errorInfo (String), isError (Boolean) and the String extensions
// isNeedCheck() / isNotWarn() are defined elsewhere in the plugin.
import javassist.expr.ExprEditor
import javassist.expr.FieldAccess
import javassist.expr.MethodCall

// Method and field reference check:
// walk every method declared in the class (including constructors)
classObj.declaredBehaviors.forEach { ctMethod ->
    ctMethod.instrument(object : ExprEditor() {
        // Called back once for every method call inside ctMethod
        override fun edit(m: MethodCall?) {
            super.edit(m)
            try {
                // Not every method needs validating: skip system methods, third-party
                // SDK methods and so on, and only validate our own business code
                if (!ctMethod.declaringClass.name.isNeedCheck()) {
                    return
                }
                if (m == null) {
                    throw Exception("MethodCall is null")
                }
                // Packages and classes that do not need to be checked
                if (m.className.isNotWarn() || classObj.name.isNotWarn()) {
                    return
                }
                // If the method cannot be found, this throws directly; that covers
                // both a deleted method and a signature that no longer matches
                m.method.instrument(ExprEditor())
                // The method is not public and is not visible to the calling class
                if (!m.method.visibleFrom(classObj)) {
                    throw Exception("${m.method.name} is not visible to class ${classObj.name}")
                }
            } catch (e: Exception) {
                e.message?.let { errorInfo += "$it\n" }
                errorInfo += "-- Method analysis exception in class ${ctMethod.declaringClass.name}, " +
                        "method ${ctMethod.name}, line ${m?.lineNumber}\n"
                errorInfo += "------------------------------------------------\n"
                isError = true
            }
        }

        /**
         * Field access analysis mainly covers:
         * - a field that cannot be found because it was deleted
         * - a private field, which only its declaring class may access
         * - a protected field, which only the declaring class, its subclasses
         *   and classes in the same package may access
         */
        override fun edit(f: FieldAccess?) {
            super.edit(f)
            try {
                if (f == null) {
                    throw Exception("FieldAccess is null")
                }
                // Packages and classes that do not need to be checked
                if (f.className.isNotWarn() || classObj.name.isNotWarn()) {
                    return
                }
                // No null check needed: if the field was deleted, f.field throws NotFoundException
                val field = f.field
                // Only handle methods defined in this class. Otherwise methods of the base
                // class are handled too and raise false errors, although this class never
                // actually accesses the base class's private fields.
                if (ctMethod.declaringClass.name == classObj.name && !field.visibleFrom(classObj)) {
                    throw Exception("${field.name} is not visible to class ${classObj.name}")
                }
            } catch (e: Exception) {
                e.message?.let { errorInfo += "-- Exception message: $it\n" }
                errorInfo += "-- Field analysis exception in class ${classObj.name}, " +
                        "field ${f?.fieldName}, line ${f?.lineNumber}\n"
                errorInfo += "------------------------------------------------\n"
                isError = true
            }
        }
    })
}

The implementation above walks every method and checks the method calls and field accesses inside it. But how do we check fields that are initialized where they are declared (the "global variables" of a class)?

class BillActivity {
    ...
    private String mTest1 = CreateNewAddressActivity.TAG;
    private static String mTest2 = new CreateNewAddressActivity().getFormatProvinceInfo("a", "b", "c");
    ...
}

For example, how do we check the references in the initial values of mTest1 and mTest2 above? This question puzzled the author for a long time. Neither Javassist nor ASM seemed to offer an API for getting the initial value of a field, and no ideas or material could be found on analyzing a field's value directly from the class bytecode.

After studying class bytecode, running many experiments, and adding a great deal of logging, a solution slowly emerged.

Let’s start with a bit of bytecode for BillActivity:

Here the field mTest1 is defined, and in the method list on the right you will notice an <init> method. After compilation, Java generates an <init> method in the bytecode file, called the instance constructor. Instance initializer blocks, field initializers, and the call to the superclass constructor are all gathered into <init>. What about the static field mTest2?

Searching the bytecode shows that mTest2 is assigned inside a static block, so at first glance the assignment does not appear to be wrapped in any method, as shown below:

In fact, as further reading shows, Java also generates a <clinit> method in the bytecode file, called the class constructor. Static initializer blocks and static field initializers are gathered into <clinit>. No method named <clinit> appears in the javap output above because javap renders the class constructor as a static {} block rather than as a named method.

Experimental logging confirmed that the initialization of mTest2 does happen in the <clinit> method, and the bytecode view in ASMPlugin shows the same bytecode under the <clinit> identifier, as shown in the following figure:

At this point we know that the assignments to mTest1 and mTest2 actually happen in the <init> and <clinit> methods. Because the check walks every method in the class, the method and attribute reference checks also cover these field initializers.
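The claim that field initializers are compiled into <init> and <clinit> is easy to confirm without reading bytecode. In the following sketch each initializer asks the JVM which method it is running in. The classes InitializerDemo and Bill are hypothetical stand-ins for BillActivity; only JDK calls are used.

```java
/** Illustrative demo (hypothetical names): shows where field initializers execute. */
final class InitializerDemo {

    /** Returns the bytecode-level name of the method that called this one. */
    static String callerMethodName() {
        // Frame 0 is callerMethodName itself; frame 1 is its caller
        return new Throwable().getStackTrace()[1].getMethodName();
    }

    /** Stand-in for BillActivity with one instance field and one static field. */
    static final class Bill {
        final String mTest1 = callerMethodName();        // instance field initializer
        static final String mTest2 = callerMethodName(); // static field initializer
    }

    public static void main(String[] args) {
        System.out.println("mTest1 assigned in: " + new Bill().mTest1); // <init>
        System.out.println("mTest2 assigned in: " + Bill.mTest2);       // <clinit>
    }
}
```

The stack frames report <init> and <clinit> even though the source code never declares a constructor or a static block, which is why a scan over all compiled methods also sees the calls made by field initializers.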

The problem seemed to be solved perfectly, but another look at the field code revealed a new problem:

class BillActivity {
    ...
    private String mTest1 = CreateNewAddressActivity.TAG;
    private static String mTest2 = new CreateNewAddressActivity().getFormatProvinceInfo("a", "b", "c");
    ...
}

So far we only cared about the reference to the TAG field and to the getFormatProvinceInfo method; we did no reference check on the CreateNewAddressActivity class itself. If that class were private, for example, there would still be a problem. So class references must be checked too.

3.3 Principle of Class Reference Check

How do I identify a problem with a class reference?

  • The class was deleted and the related class could not be found.

  • The class is non-public and the current class has no access to it.
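The two rules above can be sketched with plain reflection. This is an illustration only: the plugin works on class files at build time, whereas this sketch asks the running JVM, and it treats every non-public class as package-private, which is how nested classes also appear at the class file level. The names ClassRefCheck, Result, and check are hypothetical.

```java
import java.lang.reflect.Modifier;

/** Illustrative helper (hypothetical name): classifies a reference to a class. */
final class ClassRefCheck {

    enum Result { OK, NOT_FOUND, NOT_ACCESSIBLE }

    /**
     * @param className   binary name of the referenced class, e.g. "java.util.ArrayList"
     * @param fromPackage package of the class that holds the reference
     */
    static Result check(String className, String fromPackage) {
        Class<?> cls;
        try {
            // initialize=false: we only want to know whether the class exists
            cls = Class.forName(className, false, ClassRefCheck.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return Result.NOT_FOUND; // the class was deleted or cannot be loaded
        }
        if (Modifier.isPublic(cls.getModifiers())) {
            return Result.OK;
        }
        // A non-public class is only visible from inside its own package
        int dot = className.lastIndexOf('.');
        String ownerPackage = dot < 0 ? "" : className.substring(0, dot);
        return ownerPackage.equals(fromPackage) ? Result.OK : Result.NOT_ACCESSIBLE;
    }

    public static void main(String[] args) {
        System.out.println(check("java.util.ArrayList", "com.example"));
        System.out.println(check("com.example.DeletedClass", "com.example"));
        // AbstractStringBuilder is a package-private class in java.lang
        System.out.println(check("java.lang.AbstractStringBuilder", "com.example"));
        System.out.println(check("java.lang.AbstractStringBuilder", "java.lang"));
    }
}
```

The third call is the same situation as a class in one package referencing a package-private class in another package: the class exists, but the reference is not allowed.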

3.4 Class reference detection in practice

Class reference check

// Class reference check
if (classObj.packageName.isNeedCheck()) {
    classObj.refClasses?.toList()?.forEach { refClassName ->
        try {
            if (refClassName.toString().isNotWarn() || classObj.name.isNotWarn()) {
                return@forEach
            }
            // The class was deleted
            val refClass = classPool.getCtClass(refClassName.toString())
                ?: throw NotFoundException("Class cannot be found: $refClassName")
            // Check permissions
            // ... omitted: same as the permission checks for methods and fields, not repeated here
        } catch (e: Exception) {
            e.message?.let { errorInfo += "-- Class reference analysis exception message: $it\n" }
            errorInfo += "-- Class reference analysis exception: class ${classObj.name} references $refClassName\n"
            errorInfo += "------------------------------------------------\n"
            isError = true
        }
    }
}

That concludes the principles and practice of bytecode reference detection.

3.5 Reflection on the solution

After implementing reference detection in the buildSrc of the domestic official website APP, I learned that many other apps had also been modularized. It occurred to me that they might use a modular architecture similar to ours and suffer from the same pain points. The implementation at that point could not be adopted by other projects, and I felt strongly that the work was not finished: having solved the problem for our own APP, we should make the solution available to other apps and solve a pain point shared across the wider team. That is how the independent Gradle plugin described next came about.

4. Independent Gradle plugin

If your APP's modules need reference detection at compile time, you are welcome to adopt the bytecode reference detection Gradle plugin I developed.

4.1 Independent Gradle plugin target

1) An independent Gradle plugin that any app can adopt easily;

2) Support for common configuration items: a switch for the plugin's functionality and configuration for skipping known exceptions;

3) Reference checks on bytecode compiled from both Java and Kotlin. When the APK is built on CI or Jenkins and a reference problem is found, the build reports an error and prints the details of the problem for developers to analyze and fix.

4.2 Plug-in Functions

1) Method reference detection;

2) Attribute (field) reference detection;

3) Class reference detection;

4) Common configuration options; the plugin can be switched on or off.

For example, Class Not Found, Method Not Found, and Field Not Found problems can all be detected. The plugin adds very little to the build: for the domestic official website APP it runs for about 2.3 seconds per build, so there is no need to worry about longer compilation times.

4.3 Plug-in Access

Add the dependency to build.gradle in the root directory of the main project:

dependencies {
        ...
        classpath "com.byteace.refercheck:byteace-refercheck:35-SNAPSHOT" // The current version is a trial run, the version needs iteration; We welcome your suggestions and questions to help improve the plugin functionality
}

Apply the plugin in the APP project's build.gradle and configure it:

// Bytecode reference check plugin developed by the official website team
apply plugin: 'com.byteace.refercheck'
// Bytecode reference check plugin: configuration
referCheckConfig {
        enable true // Whether to enable reference checking
        strictMode true // Whether the build stops when a problem is found
        check "com.abc.def" // Package names of the classes to check. Projects use many SDKs and third-party libraries that we usually do not check; list only the packages you care about
        notWarn "org.apache.http,com.core.videocompressor.VideoController" // Package or class names that have been reviewed manually and should not be reported
}

4.4 Plug-in Configuration Items

enable: whether to enable the reference check. If false, no reference check is performed.

strictMode: if enabled, the build is aborted as soon as a reference problem is detected. If disabled, the problem is only recorded in the build log and the build continues.

Suggestion: set both enable and strictMode to true in build.gradle when packages are released from Jenkins or CI.

check: the package names to check. Usually only the APP's own package name is needed. To check third-party SDKs as well, add their package names here.

notWarn: a whitelist of references that are not reported even when a problem is found. If a developer has reviewed a problem reported by the plugin and confirmed that it cannot cause a crash, the referenced class name can be added here to skip the check. For example, if class A cannot resolve a method in class B, adding class B's name here suppresses the error.
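How the check and notWarn values are matched against class names is not shown in this article. One plausible reading is simple prefix matching on comma-separated entries, sketched below. The PackageFilter class and its matching rules are assumptions made for illustration, not the plugin's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: matches class names against comma-separated package or class names. */
final class PackageFilter {

    private final List<String> prefixes = new ArrayList<>();

    PackageFilter(String commaSeparated) {
        for (String entry : commaSeparated.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                prefixes.add(trimmed);
            }
        }
    }

    /** True if className is a configured entry, or a class or nested class below one. */
    boolean matches(String className) {
        for (String prefix : prefixes) {
            if (className.equals(prefix)
                    || className.startsWith(prefix + ".")
                    || className.startsWith(prefix + "$")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PackageFilter check = new PackageFilter("com.abc.def");
        PackageFilter notWarn = new PackageFilter(
                "org.apache.http,com.core.videocompressor.VideoController");

        String caller = "com.abc.def.ui.BillActivity";       // made-up class name
        String referenced = "org.apache.http.HttpResponse";
        // Report only if the caller is in a checked package and neither side is whitelisted
        boolean report = check.matches(caller)
                && !notWarn.matches(caller)
                && !notWarn.matches(referenced);
        System.out.println("report problem: " + report);
    }
}
```

Appending "." or "$" before comparing keeps com.abc.def from accidentally matching a sibling package such as com.abc.defx.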

4.5 notWarn configuration in the domestic official website APP

The domestic official website APP adds org.apache.http and com.core.videocompressor.VideoController to the notWarn whitelist. The org.apache.http package is provided by the Android system and is not packaged into the APK at build time. Without this entry the plugin reports an error, although nothing actually fails at runtime.

Without the com.core.videocompressor.VideoController entry, the plugin reports that FileProcessFactory references the CompressProgressListener class, which it cannot access. Looking at the FileProcessFactory code, line 138 calls the convertVideo method and passes null as the last argument, the listener.

The compiled class file looks like this; the compiler automatically inserts a cast of the null passed as the last convertVideo argument:

CompressProgressListener is not public; it has default (package-private) visibility. FileProcessFactory and CompressProgressListener are not in the same package, so an error is reported. At runtime, however, nothing crashes, so the class name has to be added to the whitelist.

If the plugin reports something that should not be an error, you can skip it through the whitelist. Please also send the case back to me so that I can analyze it and improve the plugin.

5. Summary

During the pre-research, bytecode turned out to be a deep subject. There are plenty of tutorials online about bytecode instrumentation and code generation, but very little material on bytecode analysis. You have to become familiar with bytecode, experiment and explore patiently in practice, and polish the details step by step.

Throughout the pre-research I thought about how to make the solution general and configurable, eventually built a general-purpose Gradle plugin, promoted its adoption by other modules, and took the opportunity to share the technology across teams for the benefit of the wider team.

Two apps have adopted the plugin so far. It will continue to be maintained and iterated on, and once it is stable it will be integrated into CI and Jenkins. Apps that need reference detection are welcome to adopt the Gradle plugin. We hope it helps any app and team that struggles with reference problems.

Author: Vivo official website mall client team -Qi Haoxin