Background

One of our company's apps was recently rejected during review because its IO reads and writes were too frequent. However, IO read and write operations are scattered throughout the code, and many third-party frameworks also perform writes, so monitoring and fixing them by hand is very difficult. Is there a simple way to help us locate the problem?

I then looked at the IOCanary monitoring component in Tencent's Matrix. Its principle is to hook IO read/write operations through an ELF hook mechanism and print the call stack, which helps developers locate the problem.

Generally speaking, an APM (Application Performance Monitoring) system is divided into several parts, such as development-stage tools, test-stage tools, and online data collection. IO monitoring is one of the development- and test-stage tools.

IOCanary principle analysis

Before introducing IOCanary, let's first look at some lesser-known "black magic" techniques that make an IO monitoring system possible, and then explain how IOCanary is implemented.

Dynamic Hook

At this point you might expect me to write about AOP or something similar. Sorry, wrong guess: there are many other ways to hook calls without a build-time plugin. AOP works by modifying bytecode, which is a bit too heavyweight for a debugging tool.

A brief introduction to dynamic hooking: using the mechanisms of the ART virtual machine, we can run our own logic before and after a method call, which gives us the ability to monitor code dynamically. Because the hook works at the virtual-machine level, we can monitor not only our own code but also every call made by third-party libraries and even by the framework source.

Xposed is one example, but it depends on a rooted phone. Epic can also achieve dynamic hooking on Android, and I have heard that Tencent's IOCanary builds on the principle of iQIYI's xHook.

From the way ART invokes methods, a natural hooking approach follows: directly replace the entryPoint. By replacing the entryPoint of the ArtMethod object for the original method with the entryPoint of the target method, a call to the original method fetches the target's entry point and jumps straight into the target method's code, which achieves the hook.
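To make the idea concrete, here is a toy C++ model of entryPoint replacement. `FakeArtMethod`, `Invoke`, and `ReplaceEntryPoint` are hypothetical names: a real ArtMethod has a version-specific layout, and real frameworks must also handle issues this sketch ignores, such as compiled code and memory protection. It only shows why swapping one pointer redirects every later call.

```cpp
#include <cassert>
#include <string>

// Signature of the code a (fake) method entry point jumps to.
using Entry = std::string (*)(int);

// Hypothetical, heavily simplified model of an ART method object: it only
// holds the entry point the runtime jumps to when the method is invoked.
struct FakeArtMethod {
    Entry entry_point;
};

// Stand-in for "invoke this method": load entry_point and jump to it.
std::string Invoke(const FakeArtMethod &method, int arg) {
    return method.entry_point(arg);
}

std::string OriginalImpl(int x) { return "original:" + std::to_string(x); }
std::string HookImpl(int x) { return "hooked:" + std::to_string(x); }

// Point the original method at the target's code and return the old entry
// point, so the hook can still call (or later restore) the original.
Entry ReplaceEntryPoint(FakeArtMethod &original, const FakeArtMethod &target) {
    Entry backup = original.entry_point;
    original.entry_point = target.entry_point;
    return backup;
}
```

After `ReplaceEntryPoint(orig, target)`, every `Invoke(orig, ...)` runs `HookImpl`, while the returned backup still reaches `OriginalImpl`.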

The above summarizes Epic. If you are interested, read the author's own article, "Extending Dexposed by One Second: Runtime Method AOP on ART", which explains the principle in detail, although I did not fully understand that part. As an Android novice, I still stick with OOP, because the bar is too high for me.

IOCanary monitoring

Does monitoring IO just require a way to observe file read/write streams? Let's take a brief look at how Tencent Matrix's IOCanary is implemented.

IOCanary uses an ELF hook scheme to collect IO information. It is non-intrusive, so developers can integrate it without touching their code. The scheme mainly hooks four key POSIX file-operation interfaces:

int open(const char *pathname, int flags, mode_t mode);// Return fd on success
ssize_t read(int fd, void *buf, size_t size);
ssize_t write(int fd, const void *buf, size_t size);
int close(int fd);

As shown above, most of the key operation information can be obtained by hooking these interfaces. Let's use open to illustrate the principle. For simplicity, I will only use the Android M code and the most commonly used FileInputStream. The key is to find where POSIX open is called. From top to bottom, the call chain is roughly:

FileInputStream -> IoBridge.open -> Libcore.os.open -> BlockGuardOs.open -> Posix.open
    ↓ JNI: libcore_io_Posix.cpp
static jobject Posix_open(...) {
    ...
    int fd = throwIfMinusOne(env, "open", TEMP_FAILURE_RETRY(open(path.c_str(), flags, mode)));
    ...
}

As you can see, the Android framework's FileInputStream ultimately calls the POSIX open interface in libcore_io_Posix.cpp. NativeCode.mk shows that this file is compiled into LOCAL_MODULE := libjavacore.

So it is enough to hook the open symbol in libjavacore.so. The point of identifying the target .so is to keep the hook's scope as small as possible. write, read, and close are handled the same way. Different Android versions have their own pitfalls that need workarounds; I won't go into details here. It is currently compatible up to Android P.
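xHook works by patching the PLT/GOT entries of one specific ELF library, which is why choosing libjavacore.so limits the blast radius: other libraries keep calling the real open. The toy model below does no ELF parsing; `FakeModule`, `HookSymbol`, `RealOpen`, and `ProxyOpen` are hypothetical names. Each "module" owns its own table of imported symbols, and hooking rewrites an entry in one table only, keeping the old address so the proxy can forward to it.

```cpp
#include <cassert>
#include <map>
#include <string>

using OpenFn = int (*)(const char *, int);

// Hypothetical model of a loaded library: its own table mapping imported
// symbol names to the addresses it actually calls (the role of the GOT).
struct FakeModule {
    std::string name;
    std::map<std::string, OpenFn> got;
};

int RealOpen(const char *, int) { return 3; }  // pretend libc open() returning fd 3

OpenFn g_original_open = nullptr;  // filled in by HookSymbol
int g_open_calls = 0;              // what the proxy records

int ProxyOpen(const char *path, int flags) {
    ++g_open_calls;                       // record the call
    return g_original_open(path, flags);  // then forward to the real function
}

// Redirect `symbol` inside `module` only, saving the previous address in *original.
// Returns 0 on success and -1 if the module does not import the symbol.
int HookSymbol(FakeModule &module, const std::string &symbol, OpenFn proxy, OpenFn *original) {
    auto it = module.got.find(symbol);
    if (it == module.got.end()) return -1;
    if (original != nullptr) *original = it->second;
    it->second = proxy;
    return 0;
}
```

Hooking `open` in a module named libjavacore.so leaves a second module's table untouched, and a non-zero return mirrors how the article's code falls back from `read` to `__read_chk` when a symbol is not found.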

This way we can collect file read/write information such as the file path, fd, and buffer size, as well as the time spent and the number of operations. Based on this information, detection strategies can be defined.
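As an illustration of what "detection strategies" can look like once this information is collected, here is a hypothetical sketch. It is not Matrix's detector code: `IOInfo`, `Detect`, and all thresholds are assumptions chosen for the example. It flags slow main-thread IO and many calls made with a small buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical summary of one file's IO, built from the hooked open/read/write/close calls.
struct IOInfo {
    std::string path;
    bool main_thread = false;   // whether the IO happened on the main thread
    size_t buffer_size = 0;     // buffer size passed to read/write
    int64_t op_count = 0;       // number of read/write calls
    int64_t total_bytes = 0;    // bytes transferred
    int64_t total_cost_us = 0;  // time spent inside the original calls
};

// Hypothetical thresholds; real values would be tuned per app.
constexpr int64_t kMainThreadCostThresholdUs = 16000;  // roughly one 60 Hz frame
constexpr size_t kSmallBufferThreshold = 4096;
constexpr int64_t kSmallBufferMinOps = 20;

// Returns a human-readable description of every rule the record violates.
std::vector<std::string> Detect(const IOInfo &info) {
    std::vector<std::string> issues;
    if (info.main_thread && info.total_cost_us > kMainThreadCostThresholdUs) {
        issues.push_back("main-thread IO too slow: " + info.path);
    }
    if (info.buffer_size < kSmallBufferThreshold && info.op_count >= kSmallBufferMinOps) {
        issues.push_back("small buffer with many calls: " + info.path);
    }
    return issues;
}
```

Keeping each rule as an independent check makes it easy to add or disable strategies without touching the hooks themselves.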

The C++ side is largely built on xHook. The most approachable file to study is io_canary_jni.cc.

namespace iocanary {
    // Hook the three core .so libraries; all Java IO stream operations end up in one of them.
    const static char *TARGET_MODULES[] = {
            "libopenjdkjvm.so", "libjavacore.so", "libopenjdk.so"
    };
    const static size_t TARGET_MODULE_COUNT = sizeof(TARGET_MODULES) / sizeof(char *);

    // Proxy for open(); original_open, original_read, etc. are function pointers declared elsewhere in the file
    int ProxyOpen(const char *pathname, int flags, mode_t mode) { ... }

    extern "C" {
    // Called through JNI to install the hooks dynamically with xHook: open, read, write, close, etc.
    JNIEXPORT jboolean JNICALL
    Java_com_bilibili_apm_io_core_IOCanaryJniBridge_doHook(JNIEnv *env, jclass type) {
        __android_log_print(ANDROID_LOG_INFO, kTag, "doHook");

        for (size_t i = 0; i < TARGET_MODULE_COUNT; ++i) {
            const char *so_name = TARGET_MODULES[i];
            __android_log_print(ANDROID_LOG_INFO, kTag, "try to hook function in %s.", so_name);
            // Open the target .so through xHook
            void *soinfo = xhook_elf_open(so_name);
            if (!soinfo) {
                __android_log_print(ANDROID_LOG_WARN, kTag, "Failure to open %s, try next.", so_name);
                continue;
            }
            // Redirect open/open64 to our proxy functions, saving the originals
            xhook_hook_symbol(soinfo, "open", (void *) ProxyOpen, (void **) &original_open);
            xhook_hook_symbol(soinfo, "open64", (void *) ProxyOpen64, (void **) &original_open64);

            bool is_libjavacore = (strstr(so_name, "libjavacore.so") != nullptr);
            if (is_libjavacore) {
                // Hook read; fall back to the fortified __read_chk if read cannot be hooked
                if (xhook_hook_symbol(soinfo, "read", (void *) ProxyRead,
                                      (void **) &original_read) != 0) {
                    __android_log_print(ANDROID_LOG_WARN, kTag,
                                        "doHook hook read failed, try __read_chk");
                    if (xhook_hook_symbol(soinfo, "__read_chk", (void *) ProxyReadChk,
                                          (void **) &original_read_chk) != 0) {
                        __android_log_print(ANDROID_LOG_WARN, kTag,
                                            "doHook hook failed: __read_chk");
                        xhook_elf_close(soinfo);
                        return JNI_FALSE;
                    }
                }
                // Hook write; fall back to the fortified __write_chk if write cannot be hooked
                if (xhook_hook_symbol(soinfo, "write", (void *) ProxyWrite,
                                      (void **) &original_write) != 0) {
                    __android_log_print(ANDROID_LOG_WARN, kTag,
                                        "doHook hook write failed, try __write_chk");
                    if (xhook_hook_symbol(soinfo, "__write_chk", (void *) ProxyWriteChk,
                                          (void **) &original_write_chk) != 0) {
                        __android_log_print(ANDROID_LOG_WARN, kTag,
                                            "doHook hook failed: __write_chk");
                        xhook_elf_close(soinfo);
                        return JNI_FALSE;
                    }
                }
            }
            // Hook close
            xhook_hook_symbol(soinfo, "close", (void *) ProxyClose, (void **) &original_close);

            xhook_elf_close(soinfo);
        }

        __android_log_print(ANDROID_LOG_INFO, kTag, "doHook done.");
        return JNI_TRUE;
    }
    }  // extern "C"
}  // namespace iocanary
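The body of `ProxyOpen` is elided above. A minimal sketch of what such a proxy typically does is below: forward to the original function first, then record what happened. It is an assumption, not Matrix's code; `NowMicros`, `RealOpen`, `OpenRecord`, and `g_last_open` are hypothetical, and `original_open` points at a wrapper over libc `open()` so the sketch runs without xHook.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Monotonic clock in microseconds (a stand-in for the article's GetTickCountMicros).
static int64_t NowMicros() {
    using namespace std::chrono;
    return duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count();
}

// In the real hook, xHook fills original_open with the address of the real open().
// Here it points at a thin wrapper over libc open() so the sketch is standalone.
static int RealOpen(const char *path, int flags, mode_t mode) { return ::open(path, flags, mode); }
static int (*original_open)(const char *, int, mode_t) = RealOpen;

// Hypothetical record of the most recent open (not thread-safe; real code needs locking).
struct OpenRecord {
    int fd = -1;
    int64_t cost_us = 0;
};
static OpenRecord g_last_open;

int ProxyOpen(const char *pathname, int flags, mode_t mode) {
    int64_t start = NowMicros();
    int fd = original_open(pathname, flags, mode);  // always perform the real call
    g_last_open.fd = fd;
    g_last_open.cost_us = NowMicros() - start;
    // A real proxy would also store pathname and, on success, capture the Java stack.
    return fd;
}
```

Forwarding before recording keeps the app's behavior unchanged even if the bookkeeping is buggy, and failed opens (returning -1) are recorded too.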

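The above is adapted from Tencent Matrix's official documentation, which I only excerpted briefly. Conceptually it resembles the Epic framework introduced at the beginning, since both monitor methods by hooking them dynamically at runtime. The mechanism differs, though: Epic replaces the entry point of Java methods inside ART, whereas xHook redirects native calls by patching the PLT/GOT entries of the target ELF libraries.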

Here is a brief outline of the overall SDK flow:

1. Initialize IOCanaryJniBridge to complete the basic setup.

2. Through JNI, call the native xHook code to hook the open, write, read, and close functions in libopenjdkjvm.so, libjavacore.so, and libopenjdk.so.

  1. When a read/write is invoked, the native code calls back into a Java method through JNI to record it.

  2. When close is triggered, an IO data record is produced.
Secondary development on top of IOCanary

Matrix's IOCanary is only compatible up to Android 9, so we ran into many problems in real use. In addition, because hooking is inherently unsafe and unstable, we recommend not shipping this feature to production; keep it as a debugging capability in debug builds.

In practice, IOCanary only monitors IO reads and writes on the main thread, which is not enough to locate every IO read and write in the project, so we reworked it:

  1. Remove the thread-checking logic.
  2. Capture the IO stack at open instead of at close.
  3. Aggregate all stream writes in the Java layer and compute write sizes in one place.

Removing the main-thread check

    ssize_t ProxyWriteChk(int fd, const void *buf, size_t count, size_t buf_size) {
        // Main-thread check disabled so that writes from every thread are recorded:
        // if (!IsMainThread()) { return original_write_chk(fd, buf, count, buf_size); }

        int64_t start = GetTickCountMicros();
        ssize_t ret = original_write_chk(fd, buf, count, buf_size);

        long write_cost_us = GetTickCountMicros() - start;

        __android_log_print(ANDROID_LOG_DEBUG, kTag,
                            "ProxyWrite fd:%d buf:%p size:%zu ret:%zd cost:%ld", fd, buf, buf_size,
                            ret, write_cost_us);

        iocanary::IOCanary::Get().OnWrite(fd, buf, count, ret, write_cost_us);

        return ret;
    }

In the C++ code of io_canary_jni.cc, we simply comment out the thread-checking logic in the proxy methods, so IO operations from all threads are captured.
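Commenting the check out works, but a runtime switch keeps both behaviors available. The sketch below is an assumption, not Matrix's code: `g_main_thread_only` and `ShouldRecord` are hypothetical, and `IsMainThread` here is our own helper that relies on the Linux (and therefore Android) fact that the main thread's thread id equals the process id.

```cpp
#include <cassert>
#include <atomic>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// On Linux/Android the main thread's tid equals the pid of the process.
static bool IsMainThread() {
    return static_cast<pid_t>(syscall(SYS_gettid)) == getpid();
}

// Hypothetical switch: true restores IOCanary's main-thread-only behavior,
// false records IO from every thread (the behavior described above).
static std::atomic<bool> g_main_thread_only{false};

// Proxies would call this first and skip bookkeeping when it returns false.
static bool ShouldRecord() {
    return !g_main_thread_only.load(std::memory_order_relaxed) || IsMainThread();
}
```

A proxy would then wrap its bookkeeping in `if (ShouldRecord()) { ... }` while still always calling the original function.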

Printing the stack

Matrix IOCanary has an IOCanaryJniBridge class, which is the bridge for JNI calls. It has a second role as well: providing the Java call stack for the hooked IO operation.

It constructs a small holder object that creates a Throwable to capture the call stack of the current IO operation. Because the stack is captured on the calling thread, this construction runs inside the hooked open call, so the recorded stack is that of the code performing the IO.

    private static final class JavaContext {
        private final String stack;
        private String threadName;

        private JavaContext() {
            // Creating a Throwable captures the current thread's call stack
            stack = IOCanaryUtil.getThrowableStack(new Throwable());
            if (null != Thread.currentThread()) {
                threadName = Thread.currentThread().getName();
            }
            HasakiLog.i(TAG, "JavaContext:" + threadName);
        }
    }

Matrix IOCanary reports the JavaContext object, which contains the captured stack, only when the stream is closed. However, in actual testing we found that xHook's close hook is often not triggered on newer Android versions, and I could not figure out why. So we print the stack at construction time instead.

Adjusting the write-size statistics

In practice, the close hook was not triggered on many devices, which made the IO monitoring data inaccurate. We therefore added an extra JNI call to the write hook.


// Method ID of the Java writeFrame method
static jmethodID kMethodIDWriteFrame;

// Look up the static writeFrame(String, String) method of the Java bridge class via JNI
// (done once during initialization, after kJavaBridgeClass has been resolved)
kMethodIDWriteFrame = env->GetStaticMethodID(kJavaBridgeClass, "writeFrame", "(Ljava/lang/String;Ljava/lang/String;)V");

void writeFrame(int frame, long size) {
    JNIEnv *env = NULL;
    // Get the JNIEnv of the current thread; env stays NULL if the thread is not attached to the JVM
    kJvm->GetEnv((void **) &env, JNI_VERSION_1_6);
    if (env == NULL || !kInitSuc) {
        __android_log_print(ANDROID_LOG_ERROR, kTag, "writeFrame env null or kInitSuc:%d", kInitSuc);
    } else {
        // Call the Java method that records the write frame and write size
        __android_log_print(ANDROID_LOG_DEBUG, kTag, "writeFrame size:%ld frame:%d", size, frame);
        char charSize[32];
        char charFrame[32];
        snprintf(charSize, sizeof(charSize), "%ld", size);
        snprintf(charFrame, sizeof(charFrame), "%d", frame);
        jstring str1 = env->NewStringUTF(charFrame);
        jstring str2 = env->NewStringUTF(charSize);
        env->CallStaticVoidMethod(kJavaBridgeClass, kMethodIDWriteFrame, str1, str2);
        // Release the local references; this runs inside a hook and may be called very often
        env->DeleteLocalRef(str1);
        env->DeleteLocalRef(str2);
    }
}

We changed the ProxyWrite method so that all write sizes and counts are summarized in the Java layer. Because we lifted the thread restriction on writes, these calls can arrive from any thread, so we perform the recording on an Executors.newSingleThreadExecutor().
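The article does this aggregation in Java with `Executors.newSingleThreadExecutor()`. To keep the examples in one language, here is a C++ analogue of the same pattern; `SerialExecutor`, `WriteStats`, and `StatsByPath` are hypothetical names. Tasks submitted from any thread run one at a time, in order, on a single worker, so the statistics need no lock of their own.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Minimal analogue of Executors.newSingleThreadExecutor().
class SerialExecutor {
public:
    SerialExecutor() : worker_([this] { Loop(); }) {}
    ~SerialExecutor() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains all queued tasks before exiting
    }
    void Submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void Loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so producers are never blocked by it
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};

// Hypothetical write statistics, touched only from the executor's thread.
struct WriteStats {
    int64_t count = 0;
    int64_t bytes = 0;
};
using StatsByPath = std::map<std::string, WriteStats>;
```

Each hooked write would submit a small task such as "add N bytes to this path", so the hot path only enqueues, and ordering plus mutual exclusion come from the single worker thread.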

Conclusion

As a C++ novice, I only know how to use these hook frameworks; I still don't really understand their principles or how to optimize them. Reworking this monitor at least gave me a deeper understanding of JNI calls, but I still have to sigh: the water is too deep. Please go easy on a seven-year Android veteran like me; things move too fast.