This is day 14 of my participation in the November Writing Challenge. Event details: The Last Writing Challenge of 2021

Java compiles to highly abstract bytecode, which makes Java programs easy to decompile and disassemble, and naturally there are anti-decompilation measures too. Today I read a related article and learned a lot from it: know yourself and know your enemy! I'm interested in Java decompilation because I often need to learn from other people's work (you know…). Perhaps decompiling someone else's code isn't ethical, well…

Commonly used protection techniques

Java bytecode is relatively easy to decompile because of its high level of abstraction. This section describes several common methods for protecting Java bytecode from decompilation. In general, these methods cannot prevent a program from being decompiled; they only make it more difficult, and each has its own applicable context and weaknesses.

1. Isolate Java programs

The simplest approach is to make the Java Class files inaccessible to users. This is the most fundamental form of protection, and it can be implemented in several ways. For example, a developer can keep the critical Java classes on the server side, and the client obtains the service by calling the server's interface rather than accessing the Class files directly. Hackers then have no Class files to decompile. More and more standards and protocols support providing services through interfaces, such as HTTP, Web Services, and RPC. However, many applications are not suited to this type of protection; for example, a stand-alone Java application cannot be isolated this way. This type of protection is shown in Figure 1.
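As a rough illustration of the isolation idea, here is a minimal sketch using the JDK's built-in `com.sun.net.httpserver` server. The class names, the `/check` path, and the toy validation rule are all hypothetical and not from the original article; the point is only that the sensitive rule exists on the server, while the client ships nothing but a thin HTTP caller.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Server side: the only place where the sensitive rule exists (hypothetical example). */
final class RegistrationServer {
    private final HttpServer server;

    RegistrationServer() {
        try {
            // Port 0 asks the OS for any free port on the loopback interface.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/check", exchange -> {
            String code = exchange.getRequestURI().getQuery(); // e.g. "ABCD-1234"
            byte[] body = String.valueOf(isValid(code)).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }

    /** Toy stand-in for the logic worth protecting; it never ships to the client. */
    private static boolean isValid(String code) {
        return code != null && code.length() == 9 && code.charAt(4) == '-';
    }

    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }
}

/** Client side: knows only the URL and the wire format, not the rule itself. */
final class RegistrationClient {
    static boolean check(int port, String code) {
        URI uri = URI.create("http://127.0.0.1:" + port + "/check?" + code);
        try (InputStream in = uri.toURL().openStream()) {
            return Boolean.parseBoolean(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Decompiling `RegistrationClient` reveals only the endpoint, which is why this approach is strong but limited to applications that can rely on a server.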

Figure 1 Isolation of a Java program schematic

2. Encrypt the Class file

To prevent Class files from being decompiled directly, many developers encrypt key Class files, such as the classes that handle registration codes and serial number management. Before using these encrypted classes, the program first decrypts them and then loads them into the JVM. Decryption can be done either in hardware or in software.

In practice, developers usually load encrypted classes through a custom ClassLoader (note that applets do not support custom ClassLoaders for security reasons). The custom ClassLoader first locates the encrypted class, decrypts it, and finally loads the decrypted class into the JVM. The custom ClassLoader is the key class in this scheme. Because it is not itself encrypted, it is likely to be the first target for hackers. Once the decryption key and algorithm are recovered, the encrypted classes can be easily decrypted. The schematic diagram of this protection mode is shown in Figure 2.
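The find, decrypt, and load steps can be sketched roughly as follows. This is only an illustration under assumptions: the article does not name a cipher, so AES-GCM from the standard `javax.crypto` API is used here, and the class name `DecryptingClassLoader`, the in-memory map of encrypted classes, and the IV-prefix layout are hypothetical choices.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Loads classes whose bytes are stored AES-GCM encrypted (illustrative sketch). */
final class DecryptingClassLoader extends ClassLoader {
    private static final int IV_LENGTH = 12; // bytes, prepended to each ciphertext
    private static final int TAG_BITS = 128;

    private final byte[] key;                           // 16, 24, or 32 bytes
    private final Map<String, byte[]> encryptedClasses; // class name -> IV + ciphertext

    DecryptingClassLoader(ClassLoader parent, byte[] key, Map<String, byte[]> encryptedClasses) {
        super(parent);
        this.key = key.clone();
        this.encryptedClasses = encryptedClasses;
    }

    /** Called by loadClass() only after the parent loader fails to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] blob = encryptedClasses.get(name);
        if (blob == null) {
            throw new ClassNotFoundException(name);
        }
        byte[] classBytes = decrypt(key, blob);
        return defineClass(name, classBytes, 0, classBytes.length);
    }

    /** Build-time helper: returns the IV followed by ciphertext and authentication tag. */
    static byte[] encrypt(byte[] key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plain);
            byte[] blob = Arrays.copyOf(iv, IV_LENGTH + sealed.length);
            System.arraycopy(sealed, 0, blob, IV_LENGTH, sealed.length);
            return blob;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Run-time helper: reads the IV from the front of the blob and decrypts the rest. */
    static byte[] decrypt(byte[] key, byte[] blob) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, blob, 0, IV_LENGTH));
            return cipher.doFinal(blob, IV_LENGTH, blob.length - IV_LENGTH);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

Note that the sketch also exposes the weakness described above: the loader itself is ordinary bytecode, so anyone who decompiles it sees exactly where the key comes from and how `decrypt` works.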

Figure 2. Encrypt the Class file

3. Convert to native code

Converting programs to native code is also an effective way to prevent decompilation. Native code is often difficult to decompile. Developers can choose to convert the entire application to native code, or only the key modules. If only the key modules are converted, the Java program needs to use JNI to call them.

Of course, by using this technique to protect Java programs, you sacrifice the cross-platform nature of Java. We need to maintain different versions of native code for different platforms, which adds to the software support and maintenance effort. However, for some critical modules, this solution is sometimes necessary.

To ensure that the native code is not modified or replaced, it is often necessary to digitally sign it. Before the native code is used, its signature is usually verified to ensure that it has not been changed by hackers. If the signature check passes, the relevant JNI method is called. The schematic diagram of this protection mode is shown in Figure 3.
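A minimal sketch of the "verify the signature, then call into JNI" step might look like this. It assumes an RSA signature over the library file checked with the standard `java.security.Signature` API; the class name `NativeLibraryVerifier`, the method names, and the choice of `SHA256withRSA` are illustrative assumptions, since the article does not specify a scheme.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class NativeLibraryVerifier {
    // Assumed algorithm; any signature scheme supported by the JCA would do.
    private static final String ALGORITHM = "SHA256withRSA";

    /** Vendor side, at build time: sign the compiled library bytes. */
    static byte[] sign(byte[] libraryBytes, PrivateKey vendorKey) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(vendorKey);
            signer.update(libraryBytes);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("signing failed", e);
        }
    }

    /** Client side: true only if the bytes match the vendor's signature. */
    static boolean isAuthentic(byte[] libraryBytes, byte[] signature, PublicKey vendorKey) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(vendorKey);
            verifier.update(libraryBytes);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            return false; // malformed signature or unusable key counts as "not authentic"
        }
    }

    /** Loads the native library only after the signature check passes. */
    static void loadVerified(Path library, byte[] signature, PublicKey vendorKey) {
        byte[] bytes;
        try {
            bytes = Files.readAllBytes(library);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!isAuthentic(bytes, signature, vendorKey)) {
            throw new SecurityException("native library failed signature check: " + library);
        }
        System.load(library.toAbsolutePath().toString());
    }
}
```

After `loadVerified` returns, the program can call its `native` methods as usual. The check itself still runs in Java, so it is worth obfuscating as well.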

Figure 3. Converting to native code diagram

4. Code obfuscation

Code obfuscation reorganizes and processes a Class file so that the processed code performs the same function (is semantically equivalent) as the original code. However, the obfuscated code is difficult to decompile; that is, the decompiled result is obscure and very hard to understand, so it is difficult for whoever decompiles it to recover the true semantics of the program.

In theory, obfuscated code can still be cracked by a hacker with enough time, and deobfuscation tools are even being developed. In practice, however, thanks to the diversified development of obfuscation techniques and the maturity of obfuscation theory, obfuscated Java code still resists decompilation well. We'll cover obfuscation in detail below, because it is an important technique for protecting Java programs. Figure 4 is a diagram of code obfuscation.

Figure 4. Code obfuscation diagram

A summary of several techniques

The above technologies have different application environments and each has its own weaknesses. Table 1 is the comparison of relevant characteristics.

Table 1 Comparison table of different protection technologies

So far, obfuscation is the most basic protection method for Java programs. There are numerous Java obfuscation tools, including commercial, free, and open source ones. Sun also offers its own obfuscation tool. Most of them obfuscate Class files, but a few strengthen the obfuscation by processing the source code first and the Class files afterwards.

Commercially successful obfuscation tools include JProof's 1stBarrier series, Eastridge's JShrink, and 4thPass.com's SourceGuard. By objective, the main obfuscation techniques can be classified as Lexical Obfuscation, Data Obfuscation, Control Obfuscation, and Preventive Transformation.

Symbol obfuscation

A Class file contains a lot of information that has nothing to do with the execution of the program itself, such as method names, variable names, and other meaningful symbols. For example, if a method is called getKeyLength(), it most likely returns the length of a key. Symbol obfuscation scrambles this information into meaningless representations, such as numbering all variables starting from variant_001 and all methods starting from method_001. This makes the decompiled code harder to understand. For private functions and local variables, it is usually possible to change the symbols without affecting the operation of the program. However, for interface names, public functions, and member variables that external modules reference, the names often have to be kept; otherwise the external modules will not find the methods and variables with these names. Therefore, most obfuscation tools provide rich options for symbol obfuscation, allowing the user to choose whether and how to apply it.
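To make the effect concrete, here is a small hand-written before-and-after pair. Both classes are hypothetical illustrations rather than the output of any particular tool; the numbering style follows the variant_001 / method_001 convention mentioned above.

```java
/** Before: the names tell a reader what the class is for. */
final class KeyInfo {
    private final byte[] keyBytes;

    KeyInfo(byte[] keyBytes) {
        this.keyBytes = keyBytes.clone();
    }

    int getKeyLength() {
        return keyBytes.length;
    }
}

/** After: identical behaviour, but every symbol is a meaningless number. */
final class Class_001 {
    private final byte[] variant_001;

    Class_001(byte[] variant_002) {
        this.variant_001 = variant_002.clone();
    }

    int method_001() {
        return variant_001.length;
    }
}
```

A decompiler recovers both classes equally well; what is lost is the hint that `method_001()` has anything to do with keys. If an external module called `getKeyLength()` by name, that method would have to be excluded from renaming.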

Data obfuscation

Figure 5. Changing data access

Data obfuscation obfuscates the data used by a program. There are many methods, which fall mainly into two groups: storage and encoding transformations, and access transformations.

Changing how data is stored and encoded disrupts the data layout used by the program. For example, split an array of 10 elements into 10 variables and scramble the variable names, or convert a two-dimensional array into a one-dimensional array. Complex data structures can be scrambled as well, for example by replacing one complex class with multiple classes.

Another approach is to change data access. For example, when accessing the index of an array, we can perform certain calculations, as shown in Figure 5.

In practical obfuscation processing, these two approaches are usually used together, disrupting both the storage of data and the way it is accessed. By obfuscating the data, the semantics of the program become complicated, which makes decompilation more difficult.
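A small sketch that combines both ideas: a logical 2 x 5 grid is stored as a flat one-dimensional array (storage transformation), and every index passes through a calculation before use (access transformation). The class name and the particular formula `(7 * i + 3) % 10` are assumptions chosen for illustration; the calculation in Figure 5 may differ.

```java
/** A logical 2 x 5 grid stored in a scrambled one-dimensional array (illustrative). */
final class ObfuscatedGrid {
    static final int ROWS = 2;
    static final int COLS = 5;
    private static final int SIZE = ROWS * COLS; // 10

    private final int[] d = new int[SIZE];

    /**
     * Maps a logical (row, col) to a physical slot. 7 and 10 are coprime,
     * so i -> (7 * i + 3) % 10 is a permutation of 0..9 and no two cells collide.
     */
    private static int slot(int row, int col) {
        int logical = row * COLS + col;   // two-dimensional index flattened to one dimension
        return (7 * logical + 3) % SIZE;  // index scrambled before every access
    }

    void set(int row, int col, int value) {
        d[slot(row, col)] = value;
    }

    int get(int row, int col) {
        return d[slot(row, col)];
    }

    /** Exposed only so the scrambled layout can be inspected. */
    int[] rawStorage() {
        return d.clone();
    }
}
```

Callers still see an ordinary grid, but in the decompiled code neighbouring cells are no longer neighbours in memory, and the index arithmetic hides which element a given access touches.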

Control obfuscation

Control obfuscation obfuscates the control flow of the program, making it harder to decompile. Changing the control flow usually requires some extra computation and branching, so it has a certain negative impact on performance, and there is sometimes a trade-off between performance and the degree of obfuscation. Control obfuscation techniques are the most complex and tricky. They can be divided into the following categories:

Adding obfuscating control. By adding additional, complex control flow, the original semantics of a program can be hidden. For example, for two statements A and B executed in sequence, we can add a control condition that appears to determine whether B executes. This makes disassembly more difficult. However, none of the added control may actually affect B's execution. Figure 6 shows three ways to add obfuscating control to this example.

Figure 6. Three ways to add obfuscating control
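One common way to add such a condition is an opaque predicate: a test whose outcome is fixed but not obvious to a reader. The sketch below shows a single variant only (not the three from Figure 6), and the class name, the statements A and B, and the decoy branch are hypothetical.

```java
final class OpaquePredicateDemo {
    /** Original: statement A followed by statement B. */
    static int plain(int x) {
        int a = x + 1; // statement A
        int b = a * 2; // statement B
        return b;
    }

    /** Obfuscated: B now appears to depend on a condition that is in fact always true. */
    static int obfuscated(int x) {
        int a = x + 1; // statement A
        int b;
        // x and x + 1 are consecutive ints, so one of them is even and the product
        // is even too, even when int arithmetic overflows. The test never fails.
        if ((x * (x + 1)) % 2 == 0) {
            b = a * 2;      // statement B, always executed
        } else {
            b = a * 31 - x; // decoy branch, never executed
        }
        return b;
    }
}
```

The extra multiplication and branch are the performance cost mentioned earlier, paid on every call in exchange for a control flow graph that no longer mirrors the source.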

Regrouping control flow. Regrouping the control flow is also an important obfuscation method. For example, if a program calls a method, after obfuscation the method's code can be inlined into the caller. Conversely, a piece of code in a program can be turned into a function call. In addition, a loop can be split into multiple loops or converted into a recursive process. This method is the most complex and is studied by a large number of researchers.
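The loop transformations can be illustrated with a hand-written sketch. The three methods below compute the same sum; the class and method names are hypothetical, and a real obfuscator would perform these rewrites on bytecode rather than on source.

```java
final class ControlFlowRegrouping {
    /** Original: one simple loop. */
    static int sumPlain(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** Variant 1: the loop is split into two loops over the two halves. */
    static int sumSplit(int[] values) {
        int half = values.length / 2;
        int total = 0;
        for (int i = 0; i < half; i++) {
            total += values[i];
        }
        for (int i = half; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    /** Variant 2: the loop is converted into recursion. */
    static int sumRecursive(int[] values) {
        return step(values, 0, 0);
    }

    private static int step(int[] values, int index, int accumulated) {
        if (index == values.length) {
            return accumulated;
        }
        return step(values, index + 1, accumulated + values[index]);
    }
}
```

The recursive variant also shows the cost side: each element now adds a stack frame, so very large arrays could overflow the stack where the original loop would not.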

Preventive obfuscation

Such obfuscations usually target specific decompilers and are typically designed to exploit their weaknesses or bugs. For example, some decompilers do not decompile instructions that follow a return, so some obfuscation schemes place code right after a return instruction. The effectiveness of this kind of obfuscation varies from one decompiler to another. A good obfuscation tool usually uses a combination of these obfuscation techniques.

Case analysis

In practice, securing a large Java program often requires a combination of these methods, rather than a single one. This is because each approach has its weaknesses and application context. The combination of these methods makes the protection of Java programs more effective. In addition, we often need to use other related security technologies, such as security authentication, digital signature, PKI, etc.

The example given in this article is a Java application: mock examination software for the SCJP (Sun Certified Java Programmer) exam. The application comes with a large number of mock questions, all of which are encrypted and stored in files. Because the question bank is the core asset of the software, the classes that store and access the question bank are critical. Once these classes are decompiled, the entire question bank is cracked. Now, let's consider how to protect the question bank and the related classes.

In this example, we consider using a combination of protection techniques, including native code and obfuscation. Because the software is primarily distributed on Windows, only one version of the native code needs to be maintained after the conversion. In addition, obfuscation is very effective for Java programs and is well suited to independently distributed applications like this one.

In the specific scheme, we divide the program into two parts: the question bank access module, written in native code, and the other modules, developed in Java. This protects the question bank management module from decompilation to a higher degree. For the modules developed in Java, we still use obfuscation. See Figure 7 for a schematic of this scheme.

Figure 7. Schematic diagram of SCJP protection technology

For the question bank management module, because the program is mainly used under Windows, we use C++ to develop the question bank access module and expose a defined access interface. To protect this interface, we also added an initialization interface that must be called before each use of the question bank access interface. The interfaces fall into two main categories:

1. Initialization interface. Before using the question bank module, the client must first call the initialization interface, providing a random number as an argument. From this random number, the question bank management module and the client each generate the same SessionKey using an agreed algorithm, and the SessionKey is used to encrypt all subsequent input and output data. In this way, only authorized (valid) clients can establish a correct connection and generate the correct SessionKey for accessing the question bank. Illegal clients have difficulty generating the correct SessionKey and therefore cannot obtain the question bank information. If a higher level of security is required, two-way authentication can also be used.

2. Data access interface. After authentication is completed, the client can access the question bank data normally. However, all input and output data are encrypted with the SessionKey, so only the correct client can use the question bank management module. The sequence diagram in Figure 8 shows the interaction between the question bank management module and the other parts.
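The SessionKey step can be sketched on the Java side as follows. The article only says the key is generated "according to a certain algorithm", so everything concrete here is an assumption: the sketch uses HMAC-SHA256 over the client's random number, keyed with a secret embedded in both the client and the native module, and truncates the result to 128 bits. The class and method names are hypothetical, and in the real scheme the module side would be implemented in C++.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class SessionKeyDemo {
    /**
     * Both sides call this with the same embedded secret and the random number
     * the client passed to the initialization interface, so both obtain the
     * same 128-bit SessionKey without ever sending the key itself.
     */
    static byte[] deriveSessionKey(byte[] sharedSecret, long clientRandom) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(sharedSecret, "HmacSHA256"));
            byte[] digest = mac.doFinal(
                    Long.toString(clientRandom).getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(digest, 16); // keep the first 16 bytes as the key
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key derivation failed", e);
        }
    }
}
```

A client that lacks the embedded secret derives a different key from the same random number, so the data it sends cannot be decrypted by the module and the replies it receives stay unreadable. The secret itself then becomes the thing to protect, which is why it sits in native code on one side and obfuscated Java on the other.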