Hello, everyone, I am Baytwo. This year is a special one for the game industry: anti-addiction rules and privacy compliance are giving countless practitioners headaches. As a programmer who wants to keep his rice bowl, I have to do some research on policy. Anti-addiction is manageable. The channels have launched their own SDKs, and where they haven't, the state provides an API. Compliance is what hurts. Having your build rejected by a channel over compliance issues is painful. Even more painful is when the channel lets it through, the regulator finds violations after launch, and you get to collect your "bonus" (a fine) straight from the finance department. In the spirit of not bothering the young lady in our company's finance department, I have been researching a self-inspection mechanism for compliance since early on.


Surveying the options

Before presenting my scheme, let me walk you through the common approaches. If you have already thought them through, skip straight to the scheme section. As we all know, code inspection falls into two categories: static detection and dynamic detection. But the regulators have made it clear that calling sensitive APIs is not forbidden. You just have to tell the user clearly why you call them, and once the user has agreed to the privacy policy, you can call them as you like. After studying this for a long time, I concluded that finding sensitive calls through static detection is not difficult. What is difficult is tracing the call chain that leads to the sensitive information and then judging from that chain whether the call is compliant. And if the CP (the game developer) has obfuscated the code or wrapped it in a packer, a scan finds nothing at all. So static detection is not impossible, but the technical difficulty is extremely high and the effort isn't worth it. As a lazy programmer, I gave up on it and turned to its neighbor, dynamic detection.

The two main dynamic approaches are Gradle Transform and Xposed. Gradle Transform replaces code at compile time: calls to sensitive APIs are swapped for instrumented code that can be detected. The marker code can print a warning log, upload a tracking event to a backend, or do whatever you like. The pros and cons of this approach are obvious. In the game industry, we often have no access to "compile time" at all. Convincing upstream and downstream departments to plug in a set of plug-ins is as hard as checking everything by hand, and most of the time the ones that need convincing are other companies. A pure Gradle Transform solution does not suit a miserable game publishing company like ours.

Of course, borrowing the idea of Gradle Transform, we can decompile the final package, scan for and replace the sensitive API calls, repackage it, and then run the app for detection. This variant is feasible. Unfortunately, each inspection takes longer. Even after we solve every problem that decompilation and recompilation may throw at us, we still have to prove that a compliance problem found in the repackaged APK also exists in the original package. She's lovely, but she's clearly a rich girl and I don't deserve her. If your company already has mature decompilation and code-scanning frameworks, go ahead and try to win her over.

So how about Xposed? As an excellent Android hook framework, Xposed does solve both problems. It needs no access to compile time, unlike Gradle Transform, and it tests the original APK, unlike the decompilation variant. She is very beautiful, but she retired from the world long ago and no longer talks about marriage. Her successors from the same family, Exposed and Miss TaiChi, are still too young. With them, you either put up with tedious manual testing or fight their stubbornness when trying to build the necessary automation.


Baytwo's scheme

Frida is a cross-platform hook framework. It works by running a daemon process on the target machine. When a process is hooked, Frida modifies that process's memory so that calls to the target function go to a hook function we define instead. Android is built on Linux, and a function call is essentially a jump to a specific address in the process. Frida intercepts these calls through its daemon, fooling the Android system into running our replacement.

Unsurprisingly, this requires root. Frida also offers a way to embed a .so library (Frida Gadget) into the APK so that no root is needed. But rooted phones are easy to buy and repackaging costs time and effort, so we hook directly by running the daemon on a rooted target device.

With the framework chosen, let's pin down the goal. Compliance involves a lot of work. Our initial requirement is that the app must not call any sensitive system API before it shows the privacy policy and the user agrees to it. The solution is to hook all sensitive system APIs through the Frida daemon and inject our tracking code into them, so that any call emits a special log line. The tracking code must also print the call stack. From the stack we produce a detection report for troubleshooting (and for settling the inevitable blame game).
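The rule above boils down to a timestamp comparison. Below is a minimal Python sketch of it, not part of the scheme as described. It assumes each hooked call has already been reduced to an elapsed-time and API-name pair. It also assumes the moment the user tapped "Agree" is known on the same clock, for instance by also tracing the app's own consent callback (that hook is hypothetical). `SensitiveCall` and `find_violations` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SensitiveCall:
    elapsed_ms: int  # when the call happened, on the same clock as the consent time
    api: str         # e.g. "WifiInfo.getMacAddress"


def find_violations(calls: Iterable[SensitiveCall],
                    consent_ms: Optional[int]) -> List[SensitiveCall]:
    """Return the calls made before consent.

    If the user never agreed (consent_ms is None), every sensitive call
    counts as a violation.
    """
    if consent_ms is None:
        return list(calls)
    return [call for call in calls if call.elapsed_ms < consent_ms]
```

With this rule, a run in which the user never agreed treats every sensitive call as a violation. That matches the requirement that nothing sensitive runs before agreement.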

With that in mind, let's get started. Frida consists of a server and a client. The server is the daemon that resides on the phone. The client runs on your computer and asks the server to attach to a specified process and inject our replacement code. Don't worry if this isn't entirely clear yet. Just remember that the daemon must be running before Frida can hook anything.


Install the Frida client

The Frida client injects hook functions, written in JS, through the Frida server. A privacy detection system doesn't need complicated hook logic, though. We only need to check whether the target function is called and print its call stack. Conveniently, Frida provides a Python wrapper on top of JS, plus the frida-trace tool (shipped in frida-tools), which fits our needs exactly. Installation is simple if you have Python:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple frida
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple frida-tools

After installing, run frida --version and check that a version number is printed.


Configure the Frida server

To configure it, we need two things: a rooted phone and the Frida server binary. Please bring your own rooted phone. frida-server can be downloaded here: github.com/frida/frida… Look for frida-server-xx.x.x-android-yy, where xx.x.x is the version number and yy is the CPU architecture. Download the one that matches your phone's architecture, ideally the same version as the frida package you installed with pip. Then decompress it:

unxz frida-server.xz
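Picking the right file is easy to get wrong, so here is a small hypothetical Python helper for it. It assumes the release assets follow the frida-server-<version>-android-<arch>.xz naming described above. It maps the ABI reported by `adb shell getprop ro.product.cpu.abi` to Frida's architecture suffix (the mapping table is my assumption for common devices). It also reads the locally installed frida version through `importlib.metadata`, so the client and server can be matched.

```python
from importlib import metadata
from typing import Optional

# Assumed mapping from Android ABI names to the suffixes used in frida-server releases.
ABI_TO_FRIDA_ARCH = {
    "arm64-v8a": "arm64",
    "armeabi-v7a": "arm",
    "x86_64": "x86_64",
    "x86": "x86",
}


def frida_server_asset(version: str, abi: str) -> str:
    """Return the expected frida-server archive name for a version and device ABI."""
    arch = ABI_TO_FRIDA_ARCH.get(abi.strip())  # getprop output ends with a newline
    if arch is None:
        raise ValueError(f"unsupported ABI: {abi!r}")
    return f"frida-server-{version}-android-{arch}.xz"


def installed_frida_version() -> Optional[str]:
    """Version of the pip-installed frida package, or None if it is not installed."""
    try:
        return metadata.version("frida")
    except metadata.PackageNotFoundError:
        return None
```

For example, with frida 16.1.4 installed and a phone that reports arm64-v8a, the helper points to frida-server-16.1.4-android-arm64.xz.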

Then connect your rooted phone via adb and start the daemon with the following commands:

adb push frida-server /data/local/tmp/
adb shell
su    # switch to root
chmod 755 /data/local/tmp/frida-server
/data/local/tmp/frida-server &

After that, you can exit the adb shell as usual. For a quick smoke test, run frida-ps -U in a terminal on your computer. Normally, it lists all the processes running on your phone.
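If you set up phones often, the same steps can be scripted. Below is a minimal Python sketch with hypothetical function names. It assumes a root solution whose `su` accepts `-c` (Magisk does; others may differ). It launches the server with `nohup` and redirected output so that the `adb shell` call can return instead of waiting on the server. By default it only prints the commands (a dry run).

```python
import shlex
import subprocess
from typing import List

REMOTE_PATH = "/data/local/tmp/frida-server"


def setup_commands(local_binary: str = "frida-server",
                   remote_path: str = REMOTE_PATH) -> List[List[str]]:
    """adb invocations mirroring the manual push / chmod / start steps."""
    return [
        ["adb", "push", local_binary, remote_path],
        # Assumes an su that accepts -c (e.g. Magisk).
        ["adb", "shell", f"su -c 'chmod 755 {remote_path}'"],
        # nohup and redirected output are meant to let the server outlive this adb call.
        ["adb", "shell", f"su -c 'nohup {remote_path} >/dev/null 2>&1 &'"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it too unless dry_run is True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(setup_commands())  # pass dry_run=False to actually run the commands
```

After a real run, frida-ps -U is still the quickest way to confirm the server is up.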

Detecting sensitive APIs

The first step is to check whether sensitive system methods are called at all. This is simple: frida-trace, part of the frida-tools we installed, supports it out of the box. One of the methods for getting the MAC address is android.net.wifi.WifiInfo.getMacAddress, so I simply run:

frida-trace -U -f com.frida.test -j 'android.net.wifi.WifiInfo*!*getMacAddress'

(Don't forget the single quotes around the method pattern. They stop the shell from interpreting * and !.) You will see the com.frida.test app launch on the phone. If you then manually trigger the action that reads the MAC address, the terminal shows:

Instrumenting...                                                        
WifiInfo.getMacAddress: Auto-generated handler at "/Users/kaelluo/Documents/__handlers__/android.net.wifi.WifiInfo/getMacAddress.js"
Started tracing 1 function. Press Ctrl+C to stop.                       
           /* TID 0x4698 */
  9464 ms  WifiInfo.getMacAddress()
  9465 ms  <= "02:00:00:00:00:00"

The last two lines show when the app read the MAC address and the value it got back. Since Android 6.0, apps receive the placeholder 02:00:00:00:00:00, but the call itself is still recorded. Any such output means the call was detected. Monitoring several methods at once is just as easy:

frida-trace -U -f com.frida.test -j 'android.net.wifi.WifiInfo*!*getMacAddress' -j 'another.class*!*method'
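With more than a handful of APIs, it is easier to keep the watch list in code and build the command from it. The sketch below is my own assumption, not a feature of frida-trace. The method list is only an example: these are real Android/Java methods commonly treated as sensitive, but your policy defines the actual list. Passing the arguments to `subprocess` as a list means no shell is involved, so the single quotes are no longer needed.

```python
import shlex
from typing import Iterable, List

# Example watch list in frida-trace's "class!method" form (illustrative, not exhaustive).
SENSITIVE_JAVA_METHODS = [
    "android.net.wifi.WifiInfo!getMacAddress",
    "android.telephony.TelephonyManager!getDeviceId",
    "android.telephony.TelephonyManager!getImei",
    "java.net.NetworkInterface!getHardwareAddress",
    # Fires for every Settings.Secure read, not only android_id; filter these in the report.
    "android.provider.Settings$Secure!getString",
]


def frida_trace_argv(package: str, patterns: Iterable[str]) -> List[str]:
    """Build a frida-trace command that spawns `package` and traces each pattern."""
    argv = ["frida-trace", "-U", "-f", package]
    for pattern in patterns:
        argv += ["-j", pattern]
    return argv


if __name__ == "__main__":
    # Printed with shell quoting so it can also be pasted into a terminal.
    print(shlex.join(frida_trace_argv("com.frida.test", SENSITIVE_JAVA_METHODS)))
```

To run it directly, hand the list to `subprocess.run(frida_trace_argv(...))`.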

At this point, the core logic of a privacy detection system is complete, and you can detect any sensitive API calls you care about.
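To turn this output into something a report can use, the trace text can be parsed into events. Below is a minimal sketch that assumes the line format shown above: `<ms> ms  <call>`, followed by `<= <return value>`. Attaching each return value to the most recent call is a simplification. It can pair values with the wrong calls when several threads interleave.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches lines such as "  9464 ms  WifiInfo.getMacAddress()".
LINE_RE = re.compile(r"^\s*(\d+)\s+ms\s+(.*?)\s*$")


@dataclass
class TraceEvent:
    elapsed_ms: int
    call: str
    returned: Optional[str] = None


def parse_trace(text: str) -> List[TraceEvent]:
    """Extract calls (and their return values, when present) from frida-trace output."""
    events: List[TraceEvent] = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # banner lines and thread markers such as /* TID 0x4698 */
        ms = int(match.group(1))
        body = match.group(2).lstrip("| ")  # drop indentation used for nested calls
        if body.startswith("<="):
            if events:
                events[-1].returned = body[2:].strip()
        else:
            events.append(TraceEvent(ms, body))
    return events
```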


Printing the call stack of sensitive APIs

Wait, something is still missing. It's easy to tell whether the app calls a sensitive API, but without the call stack, how do we track down who made the call? To print stack information, we need to understand a little about how the Frida client works, or rather how the frida-trace tool wraps it. Observant readers may have noticed that after running frida-trace, a __handlers__ folder appears in the directory where it was run. Open it and you'll find an android.net.wifi.WifiInfo directory containing a getMacAddress.js file. That's right: the Frida client generates JS code through the Python API and injects it via the Frida server. The server processes that code and, whenever the target method is called, runs the extra replacement function.

So the job is easy: find where frida-trace generates this JS code and modify it. How? First, locate the frida-trace source. Create any Python project, import frida_tools, jump to its definition, and you'll easily find a tracer.py file. In it you'll quickly spot the _create_stub_native_handler and _create_stub_java_handler methods, and _create_stub_java_handler is obviously the one we want. It returns the handler source as a string, and the onEnter method in that string is:

onEnter(log, args, state) {
  log(`%(display_name)s(${args.map(JSON.stringify).join(', ')})`);
},

Let's add code that prints the stack:

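onEnter(log, args, state) {
  // Create a throwaway Java exception just to capture the current call stack.
  var Log = Java.use('android.util.Log');
  var Exception = Java.use('java.lang.Exception');
  // Frida returns java.lang.String results as plain JS strings.
  var stack = Log.getStackTraceString(Exception.$new());
  // Flatten the stack onto one log line, marking each line break with 'newLine'.
  // Backslashes are doubled because this text sits inside a Python string in tracer.py.
  stack = stack.replace(/(?:\\r\\n|\\r|\\n)/g, 'newLine');
  log(`%(display_name)s(${args.map(JSON.stringify).join(', ')})` + ' policy_stacktrace: ' + stack);
},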

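Now delete the __handlers__ folder generated earlier and run frida-trace again. This is necessary because frida-trace reuses existing handler files and only generates stubs for methods that don't have one yet. The call stack of each sensitive call is then printed in the terminal.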
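Once the stack is on a single line, it is straightforward to split it back into frames and guess who is responsible. Below is a minimal Python sketch. It assumes the `policy_stacktrace:` marker and `newLine` separator from the handler above, plus the standard `at <frame>` lines of a Java stack trace. The `OWNERS` table and its package prefixes are hypothetical placeholders. The idea is simply to skip framework frames and report the first app-side frame.

```python
from typing import Dict, List, Optional, Tuple

MARKER = "policy_stacktrace:"
SEPARATOR = "newLine"

# Hypothetical ownership table: package prefix -> who has to fix the call.
OWNERS = {
    "com.frida.test.": "our game team",
    "com.example.adsdk.": "third-party ad SDK",
}
SYSTEM_PREFIXES = ("java.", "javax.", "android.", "androidx.", "com.android.", "dalvik.")


def split_stack_line(line: str) -> Tuple[str, List[str]]:
    """Split one traced log line into (call, stack frames)."""
    call, _, stack = line.partition(MARKER)
    frames = []
    for part in stack.split(SEPARATOR):
        part = part.strip()
        if part.startswith("at "):  # skips the "java.lang.Exception" header line
            frames.append(part[3:])
    return call.strip(), frames


def blame(frames: List[str], owners: Dict[str, str] = OWNERS) -> Optional[str]:
    """Return the owner of the first non-framework frame, if any."""
    for frame in frames:
        if frame.startswith(SYSTEM_PREFIXES):
            continue
        for prefix, owner in owners.items():
            if frame.startswith(prefix):
                return owner
        return f"unknown ({frame})"
    return None
```

Counting `blame` results per app run gives a first draft of the "who fixes what" section of a report.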


Advantages and Limitations

The advantage is simple: once the system is set up, you can test any Android app without modifying the APK itself. It works whether the app was developed in-house or downloaded from any app market, as long as it runs normally on the phone. And the call stack output gives you plenty of ammunition for the debate over which company, which department, and which person has to fix the problem.

As for limitations, on the compliance side this scheme only detects whether the app calls sensitive APIs before the user agrees to the privacy policy. It does not analyze whether the privacy policy text fails to declare permissions or third-party SDKs (that is another story). This is a limit of the scheme's compliance scope.

The other limitation is inherent to dynamic detection. It can only tell whether a violation occurs during the runs it observes, so some violations will slip through, for example ones triggered only in special scenarios such as a weak network. It's like saying Wu Zuwen is big: fine, but can he stay big? Can he guarantee he won't be small in other circumstances? Something like that. Of course, we can raise the detection rate by running multiple passes under simulated conditions such as a weak network, or by extending the detection time, although it will never reach 100%. But as long as our checks are stricter than the state's, we should be fine. Fellow game developers, and app developers too, let's get through this wave of national compliance checks together.


Conclusion

This article only outlines the general idea and core methods of sensitive API detection. A practical implementation has many more problems to solve. For example, when many methods are traced at the same time, detection starts with a delay of about 200 to 400 ms, and the app may start faster than that. As a result, sensitive calls made directly in Application#onCreate might be missed. You also need to decide when to stop a test run, how to turn the log output into a test report, and how to install the apps under test on the phone. When testing many apps, they also need to be queued. Python's simplicity and its many third-party libraries let you quickly write a program that isn't especially fast but is stable, which handles much of the tedious work around the detection itself. If there is interest, I may consider open-sourcing our current privacy detection system or offering it as an open service when I have time.
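One simple answer to the "when to stop" problem is a fixed time budget per run. The sketch below is a hypothetical helper, not part of frida-tools. It starts a command such as frida-trace, collects its output lines, and terminates the command after a given number of seconds. Note that `terminate()` sends SIGTERM on POSIX, not the Ctrl+C (SIGINT) you would press by hand.

```python
import subprocess
import sys
import threading
from typing import List, Sequence


def run_for(cmd: Sequence[str], seconds: float) -> List[str]:
    """Run cmd, collect its combined output lines, and stop it after `seconds`."""
    proc = subprocess.Popen(list(cmd), stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    # When the budget runs out, terminate the process; its stdout then hits EOF.
    timer = threading.Timer(seconds, proc.terminate)
    timer.start()
    lines: List[str] = []
    try:
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
    finally:
        timer.cancel()  # no-op if the process already finished on its own
        proc.wait()
    return lines


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python run_for.py SECONDS COMMAND [ARGS...]
    for output_line in run_for(sys.argv[2:], float(sys.argv[1])):
        print(output_line)
```

For example, `python run_for.py 60 frida-trace -U -f com.frida.test -j 'android.net.wifi.WifiInfo!getMacAddress'` gives each app a 60-second run. The collected lines can then go to a trace parser for the report.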