Host introduction:
In recent years the number of mobile apps has exploded, and attacks have shifted from the PC to mobile, with increasingly sophisticated reverse-engineering techniques. Code obfuscation is one of the more effective defenses against reverse engineering, and it addresses the problem that hardened apps are easy to unpack. But can current mobile hardening technology really keep attackers out?
This talk covers how the Security Department of Alibaba Group has developed its application hardening capability, focusing on how Android hardening can support automated deployment and on-device analysis for business risk control, so that security risks are detected sooner, responses are faster, and unnecessary business losses are reduced.
Please welcome Chaos, a security expert at Alibaba.
Speaker's talk:
Thank you all for coming, and welcome to the Mobile Security Forum of the Computing Conference.
The main title of today's talk is "A New Direction for App Hardening". The subtitle covers a basic introduction to Alibaba's Android hardening, some problems we ran into while building the product, and how we adapted it to some complex scenarios. The conference gave me 25 to 30 minutes, and Android hardening as a whole involves many technical points, so to keep this talk focused I will mainly cover the protection of Android Java code, that is, basic protection for the Dex files produced when the code is compiled into an Android installation package.
The talk has three parts. The first introduces Android Java code protection techniques. The second describes the challenges and problems that app hardening faces in complex business scenarios. The third shares some ideas about the future of Android Java code protection, things we may all run into later and that may offer some inspiration.
Let's start with the first part, the technical protection of Android Java code. The host just ran a quick survey, and many of you here are not developers, so I will briefly introduce the background.
Android phones need no introduction. Apple and Android are the two mainstream smartphone operating systems in the industry today. Android apps are developed in Java, and this has one consequence. Because the code runs in a virtual machine, the executable that the virtual machine loads has to retain a lot of semantic information. That creates a problem: compilation produces the Dex format, which Google has fully opened, and it preserves so much semantic information that a malicious person can decompile it and see something close to the original Java code. For Alibaba and for our peers in the hardening business, protecting Android Java code, which means protecting the Dex file, has been the focus ever since hardening services first appeared.
I will briefly walk through how Android Java code and Dex protection has evolved. The industry generally groups it into four generations of protection schemes.
The first generation of hardening appeared somewhere around 2013 to 2014. It protected Android's executable file, the Dex file, by encrypting the whole Dex at packaging time, when the Android installation package is generated. Various encryption algorithms can be used: AES or others. At run time a custom class loader decrypts it, so what actually runs is the complete Dex file compiled by the original developer. The benefit of this kind of hardening is clear at a glance. Someone who gets hold of the file knows neither the key nor the encryption algorithm and cannot see any of its logic. That prevents, to some extent, reverse analysis of the Dex file with open source tools.
The disadvantage is just as obvious, precisely because the scheme is so simple. The runtime uses a custom class loader to decrypt the Dex at load time, so an attacker who intercepts that step gets the decrypted Dex and the shell is gone.
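The weakness can be seen in a minimal Python sketch. This is a toy model rather than real Android code: the SHA-256 based XOR keystream merely stands in for AES, and `pack`, `load_class_bytes`, and the `on_decrypted` callback are hypothetical names. The callback plays the role of an attacker's hook on the loader.

```python
import hashlib

DEX_MAGIC = b"dex\n035\x00"  # magic bytes at the start of a Dex file


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode (a stand-in for AES)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(data: bytes, key: bytes) -> bytes:
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def pack(dex: bytes, key: bytes) -> bytes:
    """Packaging time: the whole Dex is encrypted before it ships."""
    return xor_crypt(dex, key)


def load_class_bytes(packed: bytes, key: bytes, on_decrypted=None) -> bytes:
    """Run time: the custom loader decrypts the whole Dex in one step."""
    dex = xor_crypt(packed, key)
    if on_decrypted is not None:
        on_decrypted(dex)  # an attacker's hook sees the full plaintext here
    return dex


original = DEX_MAGIC + b"...compiled classes..."
key = b"hypothetical-key"
packed = pack(original, key)

dumped = []  # what the attacker's hook collects
load_class_bytes(packed, key, on_decrypted=dumped.append)
```

After the call, `dumped` holds the complete original Dex, however strong the cipher was, because the loader has to produce the plaintext in one piece.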
After a period of development, the second generation of protection schemes emerged: class-level Dex protection. As I said, the Dex format is public, which means everyone can work out where in the file the code of a given class or function sits. The second-generation scheme therefore extracts the core functions to be protected out of the Dex into a separate file and relies on the virtual machine's class loading mechanism. One property of the virtual machine is that when it loads a class it always executes a particular method of that class. At packaging time we plant a call at that known location; through it our repair function is invoked, and it restores all of the business logic. So the main principle of this scheme is to use the virtual machine's class loading mechanism to achieve a degree of protection. One property is worth noting, though: even though the code has been extracted, the instructions that finally run are still the standard Dex instructions that Google supports.
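A rough Python sketch of the extract-and-repair idea follows. It models a Dex file as a plain dictionary of classes and method bodies, which is a simplification, and the names `extract` and `load_class` are hypothetical.

```python
def extract(dex, protected):
    """Packaging time: move protected method bodies into a side file."""
    side_file = {}
    stripped = {cls: dict(methods) for cls, methods in dex.items()}
    for cls, name in protected:
        side_file[(cls, name)] = stripped[cls][name]
        stripped[cls][name] = b""  # empty stub left in the shipped Dex
    return stripped, side_file


def load_class(stripped, side_file, cls):
    """Class load: the planted call runs the repair step for this class."""
    methods = stripped[cls]
    for (owner, name), body in side_file.items():
        if owner == cls:
            methods[name] = body  # standard instructions are back in memory
    return methods


app_dex = {"Pay": {"sign": b"\x12\x34\x56", "log": b"\x0e\x00"}}
shipped, side_file = extract(app_dex, [("Pay", "sign")])
stub_before_load = shipped["Pay"]["sign"]  # what a static analyst sees
loaded = load_class(shipped, side_file, "Pay")
```

A static look at the shipped file finds only an empty stub for `sign`, but once the class is loaded the original body sits in memory unchanged, which is the limitation noted above.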
The third generation is completely different from the first two. Google's Dex instruction set is open, and in both the first and second generations what runs in memory is Google's standard instructions, so in principle there is always a way to recover those instructions and decompile them. The third generation is qualitatively different. Its first step is the same as in the second generation: at packaging time the core functions are extracted from the Dex. But they are then translated into custom instructions by the vendor's own translator, each standard instruction becoming some other instruction, and at run time the vendor's own interpreter executes them. What runs is a self-defined instruction set. For the protected functions, the instructions in memory are definitely not Google's standard instructions, which is very effective against cracking approaches such as copying memory directly.
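The translate-then-interpret idea can be sketched in Python with a made-up four-instruction stack machine. Everything here is an assumption for illustration: the real Dex instruction set is register-based and far larger, and `make_mapping`, `translate`, and `interpret` are hypothetical names.

```python
import random

PUSH, ADD, MUL, RET = range(4)  # the "standard", publicly known opcodes
OPERANDS = {PUSH: 1, ADD: 0, MUL: 0, RET: 0}  # operand count per opcode


def make_mapping(seed):
    """Pick a secret opcode numbering (hypothetical; could differ per build)."""
    custom = list(range(100, 104))
    random.Random(seed).shuffle(custom)
    return dict(zip((PUSH, ADD, MUL, RET), custom))


def translate(code, mapping):
    """Packaging time: rewrite standard opcodes into the custom ones."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        out.append(mapping[op])
        out.extend(code[pc + 1 : pc + 1 + OPERANDS[op]])  # operands unchanged
        pc += 1 + OPERANDS[op]
    return out


def interpret(code, mapping):
    """Run time: the custom interpreter dispatches on the custom opcodes."""
    decode = {custom: std for std, custom in mapping.items()}
    stack, pc = [], 0
    while True:
        op = decode[code[pc]]
        if op == PUSH:
            stack.append(code[pc + 1])
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == RET:
            return stack.pop()


standard = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, RET]  # (2 + 3) * 4
mapping = make_mapping(seed=7)
protected = translate(standard, mapping)
result = interpret(protected, mapping)
```

The protected function still computes the same value, but a memory dump yields only `protected`, which a standard disassembler cannot decode without first recovering the mapping.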
The fourth generation, the one the industry currently recognizes, is the Java2C protection scheme. It is simpler and more direct, and its principle is very clear. As developers know, whether on the original Java virtual machine on the traditional PC or on Google's virtual machine, anything Java code does can also be written in C. Take any Java function: in principle, as long as I don't mind the trouble, I can write it in C using the interfaces that the virtual machine exposes. This scheme tackles the problem at its root. The functions you consider core and in need of protection are translated directly into C code at packaging time and then compiled into an SO file, which is binary code supported directly by the CPU, and that gives better protection.
That is basically my introduction to the four generations of Java code protection on Android today.
Since the first and second generations are relatively simple, I will walk through the overall flow charts of the third and fourth generations.
For Dex protection with a custom interpreter, the first step is no different from ordinary app packaging, and its result is an installable file that the Android system recognizes. The second and third steps are where it differs. In the second step the package goes through a hardening tool chain. Because the package is a file in a known Android format, the tool chain first locates the Dex file, extracts the instructions of the core functions, plants some hook interfaces, and then repackages and signs the APK. On the slide, the blue part is the logic that runs on the user's phone, and the yellow part is the logic that runs before the app ever reaches the user; by the time we get to the blue part the app has already been published. Because hook interfaces were planted in the second step, execution enters the custom interpreter, which interprets the transformed instructions to carry out the logic of the Java code that was protected in the original Dex file. What the interpreter actually executes is this variant instruction set, and the business logic still runs normally. That is the third-generation Dex protection scheme with a custom interpreter.
Now the fourth generation, the Java2C protection method. The first two steps are the same as in the third generation, starting from the Android installation package the developer compiled. The difference is that this scheme is blunt and direct: the core Dex functions that need protection are translated straight into C code. Given a compiled Dex file, a custom translator turns the chosen functions into C code. C tooling is mature, so you can use a variety of compilers, from Google or third parties, to build that C code for whichever architectures the phone supports, and the core instructions are removed from the Dex file entirely. The run-time side is different too. Because the code now lives in an SO that has been added to the APK, the core functions are generally marked native and calls are routed into that SO. What executes from then on is native instructions, so the protected functions are covered and the business logic still runs completely normally.
That basically wraps up the first part, including the newer techniques that are popular now, the third and fourth generations. But I also want to talk about some of their shortcomings.
From that introduction both look attractive. But the third-generation custom interpreter has to hook some system interfaces, and ordinary app developers already face plenty of fragmentation problems; a hardening product that calls a large number of private system APIs will hit even more of them. So the point about the third generation is not that its protection is weak, but that you run into more fragmentation problems. The fourth generation has far fewer, and that is why people are promoting the Java2C scheme: translating Java code into C and compiling it into an SO stays entirely within the virtual machine's development specification, so compatibility problems are minimal. Of course, both schemes share one problem. The protected functions may become larger and slower, but those are details I won't go into today.
The second part covers the challenges that hardening faces in complex business scenarios.
A few years ago people were not particularly optimistic about Android, and now it is the world's largest smartphone operating system. The system itself keeps iterating, including Android O released this year, and we can all see its progress. The business built on Android has become very complex. Take Alibaba's flagship apps such as Mobile Taobao and Alipay: the development process and the range of technologies they use are no less complex than those of some of the more complex traditional PC clients. So when hardening serves these apps, it also faces challenges in complex business scenarios. I will give two brief examples.
The first is the hot patch scenario. If bugs are found after your app has been released to users' phones, the traditional fix is to have users upgrade: download a new version and install it again. But for many complex apps, such as Mobile Taobao, Tmall, or Alipay, the installation package is already more than 70 MB. If there is a small bug, say one page that displays incorrectly, making users download a full upgrade is a poor experience. So the business teams started working on a technique: when a problem is found in one piece of code, change just that piece, without the user having to install the whole package again. Say the original app has three classes, A, B, and C, and I find a bug in class B. I only need to build the fixed class B into a Dex file and push it down as a hot deployment. The class loader's Dex search order is then adjusted so that my hot-deployed file comes first; when class B is needed, the class in the patch is the one that gets loaded, while A and C still come from the original Dex. That repairs the bug in the original app's business logic, or delivers whatever update I wanted. Many Android apps run this scheme today, and Alibaba has released a hot patch solution as well. It used to be possible on Apple's platform too, but Apple no longer allows it across the board. Google may restrict this technique completely in the P release next year, but so far there is no such restriction in the O release. This scheme does conflict with how hardening works, however: after a hot deployment, the hardened Dex file may run into problems. This is one example of a complex scenario we found while developing the hardening service.
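The search-order trick behind hot patching can be sketched in a few lines of Python. `ToyLoader` is a hypothetical stand-in for the app's class loader, and each Dex file is modeled as a dictionary from class name to implementation.

```python
class ToyLoader:
    """Toy class loader: searches an ordered list of Dex files."""

    def __init__(self, dex_files):
        self.dex_files = list(dex_files)

    def install_patch(self, patch_dex):
        """Hot deployment: put the patch Dex at the front of the search order."""
        self.dex_files.insert(0, patch_dex)

    def load_class(self, name):
        for dex in self.dex_files:
            if name in dex:
                return dex[name]  # first match wins
        raise LookupError(name)


app_dex = {"A": "A v1", "B": "B v1 (buggy)", "C": "C v1"}
patch_dex = {"B": "B v2 (fixed)"}

loader = ToyLoader([app_dex])
before = loader.load_class("B")   # the buggy class from the original Dex
loader.install_patch(patch_dex)
after = loader.load_class("B")    # the fixed class from the patch Dex
```

The sketch also hints at the conflict with hardening: a patch only works if the loader can resolve classes by name from plain Dex files, while a hardened app may have encrypted, extracted, or translated the very class being patched.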
The second scenario is more familiar. As the business grows more complex, any app beyond a modest scale needs a plug-in architecture. Mobile Taobao, for instance, integrates many services and is developed by multiple teams in collaboration, so it inevitably adopts what we call a plug-in approach, built up step by step. Traditional hardening then faces a problem. It mainly protects the main Dex file, the Dex files in the root directory of the package, while the plug-ins are buried deep under lib, in hardening's blind spot. A general-purpose hardening scheme finds it hard to protect every Dex file in the package, yet some of those plug-ins are core code. For both of the examples I just mentioned, Alibaba and many other Chinese companies have solved the problem well, so I won't go into details.
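One way a tool chain might avoid that blind spot is to scan the whole package for Dex content instead of looking only at the root directory. The Python sketch below makes that assumption; the package layout and the name `find_dex_entries` are hypothetical, and it relies only on the fact that an APK is a ZIP archive and that Dex files start with the bytes `dex\n`.

```python
import io
import zipfile

DEX_MAGIC = b"dex\n"  # first four bytes of every Dex file


def find_dex_entries(apk_bytes):
    """List every entry in the package whose content starts with the Dex magic."""
    found = []
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for info in apk.infolist():
            with apk.open(info) as entry:
                if entry.read(len(DEX_MAGIC)) == DEX_MAGIC:
                    found.append(info.filename)
    return sorted(found)


def build_sample_apk():
    """Build a tiny in-memory package with a hypothetical layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as apk:
        apk.writestr("classes.dex", b"dex\n035\x00main")
        apk.writestr("lib/plugins/pay/classes.dex", b"dex\n035\x00plugin")
        apk.writestr("lib/armeabi/libnative.so", b"\x7fELFnative")
        apk.writestr("AndroidManifest.xml", b"<manifest/>")
    return buf.getvalue()


entries = find_dex_entries(build_sample_apk())
```

Checking the magic bytes rather than the file extension means a plug-in Dex is still found if it ships under an unexpected name.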
The last part introduces what the industry sees as possible future directions for protecting Android Java code.
With the fourth-generation Java2C scheme, the compiler produces a standard SO file. As developers well know, a standard SO file uses the ELF format, which is also transparent: the format is completely standardized and open, so it too can be reversed. But since you can translate Dex files into C code, you can go a step further and compile with a compiler that supports a VMP shell, producing an SO that is protected by the VMP shell.
Then there is security risk control. Those of us who work on security know that, in principle, if the attacker does not particularly care about cost, anything that runs on the device can be cracked, especially on an open source platform like Google's. The only questions are how long it takes and how much it costs. So in Mobile Taobao and other e-commerce apps, pure on-device protection is increasingly giving way to protection that combines cloud and device. Many collection points are embedded on the device at packaging time, either through a security SDK or through the hardening tool chain itself, to obtain a unique on-device identifier that we call a device fingerprint. A request such as placing an order carries this device fingerprint, and the server uses it for risk control. For example, a device that we believe is controlled by a malicious person can be blacklisted. The app does not crash; instead the user cannot grab a red envelope or cannot complete a purchase, and simply gets some feedback. This kind of protection works quite well.
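The cloud-plus-device flow might look roughly like the Python sketch below. It is only an illustration under stated assumptions: the attributes that feed the fingerprint, the hashing scheme, and the names `device_fingerprint` and `handle_order` are all hypothetical, and a production fingerprint would be far harder to forge.

```python
import hashlib

BLACKLIST = set()  # fingerprints the server has flagged as malicious


def device_fingerprint(attributes):
    """Derive a stable identifier from device attributes (hypothetical inputs)."""
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def handle_order(request):
    """Server side: the app keeps running, but risky devices are refused."""
    if request["fingerprint"] in BLACKLIST:
        return {"ok": False, "message": "This action is unavailable right now."}
    return {"ok": True, "message": "Order placed."}


good = device_fingerprint({"model": "phone-a", "os": "8.0", "install_id": "1111"})
bad = device_fingerprint({"model": "phone-b", "os": "8.0", "install_id": "2222"})
BLACKLIST.add(bad)  # the server decides this device is controlled by an attacker

accepted = handle_order({"item": "red-envelope", "fingerprint": good})
refused = handle_order({"item": "red-envelope", "fingerprint": bad})
```

The decision is made on the server, so even an attacker who fully reverses the app cannot see or change the blacklist; the app only receives the feedback message.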
That's the end of my talk. Thank you for listening.