preface
In our last article on Android memory monitoring (1), we discussed LeakCanary, WeChat's Matrix, and Meituan's Probe, which suit different scenarios. In development and test environments we would prefer LeakCanary, because it provides the most complete memory leak detection and the most detailed logs, which makes problems easy to locate. Its drawback is a large impact on performance, so for the online production environment we usually consider Matrix and Probe. In addition to Activity/Fragment leak detection, Matrix also supports duplicate Bitmap detection, and it clips hprof files to greatly improve the upload success rate. The highlight of Probe, by contrast, is that it does not clip the hprof file after it has been written: it hooks the native methods that generate the hprof and writes an already-trimmed hprof directly. This reduces the memory needed to analyze the hprof file and improves the success rate of analysis. Unfortunately, Probe is closed source, so there is no way to use it.
About dumpHprofData
Both Matrix and Probe share a pain point: neither solves the freeze caused by calling Debug.dumpHprofData(). dumpHprofData is used to generate hprof files, and while it runs the app cannot respond to the user. To see why, we can look at the source code.
Debug.dumpHprofData() eventually calls into a native method via JNI:
// art/runtime/native/dalvik_system_VMDebug.cc
static void VMDebug_dumpHprofData(JNIEnv* env, jclass, jstring javaFilename, jint javaFd) {
  // Only one of these may be null.
  // (Argument checks omitted)
  hprof::DumpHeap(filename.c_str(), fd, false);
}

// art/runtime/hprof/hprof.cc
void DumpHeap(const char* filename, int fd, bool direct_to_ddms) {
  // (Argument checks omitted)
  ScopedSuspendAll ssa(__FUNCTION__, true /* long suspend */);
  Hprof hprof(filename, fd, direct_to_ddms);
  // The dump starts here
  hprof.Dump();
}
DumpHeap constructs a ScopedSuspendAll object, which suspends all threads in its constructor and resumes them in its destructor, that is, only after the dump has finished:
// Suspend all threads
ScopedSuspendAll::ScopedSuspendAll(const char* cause, bool long_suspend) {
  Runtime::Current()->GetThreadList()->SuspendAll(cause, long_suspend);
}

// Resume all threads
ScopedSuspendAll::~ScopedSuspendAll() {
  Runtime::Current()->GetThreadList()->ResumeAll();
}
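To make the lifetime argument concrete, here is a minimal sketch of the same RAII pattern. FakeThreadList, ScopedSuspendAllSketch and DumpHeapSketch are hypothetical stand-ins, not ART classes; they only count suspensions so that we can check that threads stay suspended for the whole body of the dump function.

```cpp
#include <cassert>

// Hypothetical stand-in for ART's ThreadList: it only counts suspensions.
struct FakeThreadList {
  int suspend_count = 0;
  void SuspendAll(const char* /*cause*/) { ++suspend_count; }
  void ResumeAll() { --suspend_count; }
  bool AllSuspended() const { return suspend_count > 0; }
};

// RAII guard modeled on ScopedSuspendAll: suspend in the constructor,
// resume in the destructor, so threads stay suspended for the guard's lifetime.
class ScopedSuspendAllSketch {
 public:
  ScopedSuspendAllSketch(FakeThreadList& list, const char* cause) : list_(list) {
    list_.SuspendAll(cause);
  }
  ~ScopedSuspendAllSketch() { list_.ResumeAll(); }
  ScopedSuspendAllSketch(const ScopedSuspendAllSketch&) = delete;
  ScopedSuspendAllSketch& operator=(const ScopedSuspendAllSketch&) = delete;

 private:
  FakeThreadList& list_;
};

// Same shape as DumpHeap: the guard is created first, so the (simulated)
// dump runs with every thread suspended until the function returns.
bool DumpHeapSketch(FakeThreadList& list) {
  ScopedSuspendAllSketch ssa(list, "DumpHeapSketch");
  bool suspended_during_dump = list.AllSuspended();  // hprof.Dump() would run here
  return suspended_during_dump;
}
```

Because the guard is destroyed only when DumpHeapSketch returns, the suspension covers the entire dump, however long it takes.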
This suspension seriously hurts the user experience, because the app freezes for the whole duration of the dump. It can be worked around in clever ways, such as starting a new process to display a loading page, or performing the dump after the app has moved to the background, but none of these really solves the problem.
KOOM
Kuaishou recently open-sourced a memory monitoring library called KOOM, which has several highlights:
- It hooks the native methods that generate the hprof, the approach proposed in Meituan's article "Probe: Online OOM Problem Locating Component for Android"
- It solves the blocking problem of the dumpHprofData method
- It uses Shark, the hprof analysis library from LeakCanary2
Tailoring hprof
KOOM implements PLT hooks with xhook and clips the hprof by hooking two functions that the virtual machine calls, open() and write():
JNIEXPORT void JNICALL
Java_com_kwai_koom_javaoom_dump_StripHprofHeapDumper_initStripDump(JNIEnv *env, jobject jObject) {
  hprofFd = -1;
  hprofName = nullptr;
  isDumpHookSucc = false;

  xhook_enable_debug(0);

  /**
   * Android 7.x: hook write() in libc.so
   * Android 8-9: hook write() in libart.so
   * Android 10:  hook write() in libartbase.so
   * libbase.so is a fallback in case the hooks in the libraries above fail.
   *
   * Android 7-10: hook open() in libart.so;
   * libbase.so and libartbase.so are fallbacks.
   */
  xhook_register("libart.so", "open", (void *)hook_open, nullptr);
  xhook_register("libbase.so", "open", (void *)hook_open, nullptr);
  xhook_register("libartbase.so", "open", (void *)hook_open, nullptr);

  xhook_register("libc.so", "write", (void *)hook_write, nullptr);
  xhook_register("libart.so", "write", (void *)hook_write, nullptr);
  xhook_register("libbase.so", "write", (void *)hook_write, nullptr);
  xhook_register("libartbase.so", "write", (void *)hook_write, nullptr);

  xhook_refresh(0);
  xhook_clear();
}
You can see that the location of the hook varies from Android version to Android version.
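The version mapping in the comment above can be written down as a small lookup. This is a hypothetical helper for illustration only: KOOM itself does not select libraries per version but registers every candidate on every version, presumably because a registration whose library does not make the call simply hooks nothing. API levels 24-25 correspond to Android 7.x, 26-28 to Android 8-9, and 29 to Android 10.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: the library named by the initStripDump() comment as
// the place to hook write() for a given API level.
// Returns nullptr for API levels the comment does not cover.
const char* PrimaryWriteHookLibrary(int api_level) {
  if (api_level >= 24 && api_level <= 25) return "libc.so";       // Android 7.x
  if (api_level >= 26 && api_level <= 28) return "libart.so";     // Android 8-9
  if (api_level == 29) return "libartbase.so";                    // Android 10
  return nullptr;
}

// Per the same comment, open() is hooked in libart.so on Android 7-10.
const char* PrimaryOpenHookLibrary(int api_level) {
  return (api_level >= 24 && api_level <= 29) ? "libart.so" : nullptr;
}
```

Registering all candidates, as KOOM does, trades a few redundant registrations for not having to keep such a table in sync with new Android releases.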
Only the target hprof file should be clipped, so its name is first recorded through the JNI method hprofName():
JNIEXPORT void JNICALL
Java_com_kwai_koom_javaoom_dump_StripHprofHeapDumper_hprofName(JNIEnv *env, jobject jObject, jstring name) {
  // The chars are never released with ReleaseStringUTFChars,
  // so hprofName stays valid while the hooks run.
  hprofName = (char *)env->GetStringUTFChars(name, nullptr);
}
Next, get the FD of the file in the hook_open() method:
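int hook_open(const char *pathname, int flags, ...) {
  // open() only receives a third "mode" argument when it may create the file;
  // read it with va_arg instead of passing the va_list itself to open().
  mode_t mode = 0;
  if (flags & O_CREAT) {
    va_list ap;
    va_start(ap, flags);
    mode = (mode_t)va_arg(ap, int);
    va_end(ap);
  }
  int fd = open(pathname, flags, mode);
  if (hprofName == nullptr) {
    return fd;
  }
  if (pathname != nullptr && strstr(pathname, hprofName)) {
    // Remember the FD of the hprof file being dumped
    hprofFd = fd;
    isDumpHookSucc = true;
  }
  return fd;
}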
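The fd-tracking idea can be tried outside Android. The sketch below is an assumption-laden stand-in: tracked_open, g_hprof_name, g_hprof_fd and g_dump_hook_succ are hypothetical names, and the function is called directly rather than installed through a PLT hook. It shows the correct way to forward open()'s optional mode argument and how the descriptor of the matching file is remembered.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical stand-ins for KOOM's globals.
static const char* g_hprof_name = nullptr;
static int g_hprof_fd = -1;
static bool g_dump_hook_succ = false;

// Same signature as open(); the mode argument only exists with O_CREAT.
int tracked_open(const char* pathname, int flags, ...) {
  mode_t mode = 0;
  if (flags & O_CREAT) {
    va_list ap;
    va_start(ap, flags);
    mode = static_cast<mode_t>(va_arg(ap, int));  // mode_t is promoted to int
    va_end(ap);
  }
  int fd = open(pathname, flags, mode);
  if (g_hprof_name != nullptr && pathname != nullptr && fd >= 0 &&
      std::strstr(pathname, g_hprof_name) != nullptr) {
    g_hprof_fd = fd;  // remember the descriptor of the hprof file
    g_dump_hook_succ = true;
  }
  return fd;
}

// Usage sketch: open a file whose name contains the marker, then an unrelated one.
bool DemoTrackedOpen() {
  char path[] = "/tmp/demo_dump_XXXXXX";
  int tmp = mkstemp(path);  // create a unique file name
  if (tmp < 0) return false;
  close(tmp);
  g_hprof_name = "demo_dump";
  int fd = tracked_open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  int other = tracked_open("/dev/null", O_WRONLY);
  bool ok = fd >= 0 && g_dump_hook_succ && g_hprof_fd == fd && other >= 0;
  if (other >= 0) close(other);
  if (fd >= 0) close(fd);
  unlink(path);
  return ok;
}
```

Opening /dev/null afterwards leaves g_hprof_fd unchanged, which is what lets the write hook tell the hprof file apart from every other descriptor.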
Finally, hook_write() filters out the data to be clipped before it reaches the file. We won't go through that code here; interested readers can look at the source code.
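The real stripping logic depends on the hprof record format and is not reproduced here. The sketch below only illustrates the shape of such a write hook under stated assumptions: filtered_write, g_hprof_fd and the pluggable g_filter are hypothetical, and the decision to return the original count (so the caller does not see a short write) is an assumed design choice, not a quote of KOOM's code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for the fd recorded by hook_open().
static int g_hprof_fd = -1;

// Hypothetical filter: receives one chunk of hprof bytes, returns the bytes to keep.
using HprofFilter = std::function<std::vector<char>(const char*, size_t)>;
static HprofFilter g_filter;

// Same signature as write(). Writes to any other fd pass through untouched.
ssize_t filtered_write(int fd, const void* buf, size_t count) {
  if (fd != g_hprof_fd || !g_filter) {
    return write(fd, buf, count);
  }
  std::vector<char> kept = g_filter(static_cast<const char*>(buf), count);
  size_t off = 0;
  while (off < kept.size()) {
    ssize_t n = write(fd, kept.data() + off, kept.size() - off);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted: retry
      return -1;
    }
    off += static_cast<size_t>(n);
  }
  // Report the original count so the caller does not retry or fail the dump.
  return static_cast<ssize_t>(count);
}

// Usage sketch: keep only the first half of each chunk written to a pipe.
bool DemoFilteredWrite() {
  int fds[2];
  if (pipe(fds) != 0) return false;
  g_hprof_fd = fds[1];
  g_filter = [](const char* data, size_t n) {
    return std::vector<char>(data, data + n / 2);
  };
  const char msg[] = "ABCDEFGH";
  ssize_t reported = filtered_write(fds[1], msg, 8);
  close(fds[1]);
  char out[16] = {0};
  ssize_t got = read(fds[0], out, sizeof(out));
  close(fds[0]);
  return reported == 8 && got == 4 && std::string(out, 4) == "ABCD";
}
```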
Resolve Dump blocking
As mentioned above, the dumpHprofData method suspends all threads in the process and resumes them only after the dump has finished, so even moving the call to a background thread does not help: the main thread is suspended as well.
KOOM solves this problem neatly by calling dumpHprofData in a forked child process. fork() uses copy-on-write: by default the child shares the parent's memory pages, and a page is copied only when one of the two processes writes to it. The child therefore sees a snapshot of the heap taken at the moment of the fork, so when dumpHprofData runs in the child, only the child is blocked and the parent process continues to run normally.
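A minimal sketch of the copy-on-write snapshot, with plain POSIX calls and a hypothetical global g_heap_value standing in for the Java heap: the parent changes the value right after fork(), but the child still reports the value it inherited. It shows the memory snapshot only, not ART's thread handling.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for heap memory shared at fork time.
static volatile int g_heap_value = 0;

// Returns the value the child observed, or -1 on error.
int ValueSeenByChild() {
  g_heap_value = 1;
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    // Child: wait a little so the parent's write below has likely happened,
    // then report the value from the child's own copy of the page.
    usleep(100 * 1000);
    _exit(g_heap_value);  // _exit skips atexit handlers and stdio flushing
  }
  g_heap_value = 2;  // Parent keeps running and mutates its memory
  int status = 0;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
  return WEXITSTATUS(status);
}
```

The child reports 1 regardless of timing, because the parent's write lands on the parent's own copy of the page.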
try {
    int pid = trySuspendVMThenFork();
    if (pid == 0) {
        // Child process: dump the heap snapshot inherited from the parent
        Debug.dumpHprofData(path);
        KLog.i(TAG, "notifyDumped:" + dumpRes);
        //System.exit(0);
        exitProcess();
    } else {
        // Parent process: resume the VM, then wait for the child to finish
        resumeVM();
        dumpRes = waitDumping(pid);
        KLog.i(TAG, "hprof pid:" + pid + " dumped: " + path);
    }
} catch (Exception e) {
    e.printStackTrace();
}
trySuspendVMThenFork is a JNI method that suspends the VM and then forks:
JNIEXPORT jint JNICALL
Java_com_kwai_koom_javaoom_dump_ForkJvmHeapDumper_trySuspendVMThenFork(JNIEnv *env, jobject jObject) {
  if (suspendVM == nullptr) {
    initForkVMSymbols();
  }
  if (suspendVM != nullptr) {
    suspendVM();
  }
  return fork();
}

bool initForkVMSymbols() {
  bool res = false;
  void *libHandle = kwai::linker::DlFcn::dlopen("libart.so", RTLD_NOW);
  if (libHandle == nullptr) {
    return res;
  }
  // art::Dbg::SuspendVM()
  suspendVM = (void (*)())kwai::linker::DlFcn::dlsym(libHandle, "_ZN3art3Dbg9SuspendVMEv");
  if (suspendVM == nullptr) {
    __android_log_print(ANDROID_LOG_ERROR, "KOOM", "suspendVM is null!");
  }
  // art::Dbg::ResumeVM()
  resumeVM = (void (*)())kwai::linker::DlFcn::dlsym(libHandle, "_ZN3art3Dbg8ResumeVMEv");
  if (resumeVM == nullptr) {
    __android_log_print(ANDROID_LOG_ERROR, "KOOM", "resumeVM is null!");
  }
  kwai::linker::DlFcn::dlclose(libHandle);
  return suspendVM != nullptr && resumeVM != nullptr;
}
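Before the fork, initForkVMSymbols() is called once to obtain the suspendVM and resumeVM function pointers, and suspendVM is then called to suspend all threads of the parent process before forking. This order matters because fork() copies only the calling thread into the child: presumably, suspending the VM first means that the thread states copied into the child already mark the other threads as suspended, so the SuspendAll inside dumpHprofData does not wait for threads that no longer exist in the child.
fork() returns 0 in the child process, so when pid == 0 we are in the child, which calls dumpHprofData():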
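The strings passed to dlsym() are Itanium C++ ABI mangled names. The sketch below demangles them with abi::__cxa_demangle (available with GCC and Clang) and repeats the lazy "resolve once, then call through a function pointer" pattern of initForkVMSymbols(). As an assumption for portability it resolves getpid() through the standard dlsym(RTLD_DEFAULT, ...) instead of libart.so, which does not exist off Android; on Android 7+ apps generally cannot dlopen private platform libraries, which is presumably why KOOM uses its own kwai::linker::DlFcn wrapper. Demangle, PidFn, g_getpid_fn and ResolveGetpid are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <dlfcn.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Demangle an Itanium C++ ABI symbol name; falls back to the input on failure.
std::string Demangle(const char* mangled) {
  int status = 0;
  char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || raw == nullptr) return mangled;
  std::string result(raw);
  std::free(raw);  // __cxa_demangle allocates with malloc
  return result;
}

// Lazily resolved function pointer, same pattern as suspendVM/resumeVM.
using PidFn = pid_t (*)();
static PidFn g_getpid_fn = nullptr;

bool ResolveGetpid() {
  if (g_getpid_fn == nullptr) {
    g_getpid_fn = reinterpret_cast<PidFn>(dlsym(RTLD_DEFAULT, "getpid"));
  }
  return g_getpid_fn != nullptr;
}
```

For example, Demangle("_ZN3art3Dbg9SuspendVMEv") yields art::Dbg::SuspendVM(), matching the symbol KOOM looks up.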
if (pid == 0) {
Debug.dumpHprofData(path);
KLog.i(TAG, "notifyDumped:" + dumpRes);
//System.exit(0);
exitProcess();
}
Once dumpHprofData returns, the child process exits immediately so that it does not keep holding resources.
When pid != 0, we are in the parent process, which only needs to resume the suspended threads:
resumeVM();
dumpRes = waitDumping(pid);
KLog.i(TAG, "hprof pid:" + pid + " dumped: " + path);
Meanwhile, the parent waits on a background thread for the child to finish the dump.
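waitDumping() itself is not shown in the article, so the following is only a hypothetical counterpart: WaitForDumpChild blocks in waitpid() (retrying on EINTR) and reports success when the child exited normally with status 0, and WaitForDumpChildAsync runs it on a separate thread via std::async so the calling thread stays free.

```cpp
#include <cassert>
#include <cerrno>
#include <future>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical counterpart of waitDumping(): true if the child exited with status 0.
bool WaitForDumpChild(pid_t pid) {
  int status = 0;
  pid_t r;
  do {
    r = waitpid(pid, &status, 0);
  } while (r == -1 && errno == EINTR);
  return r == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

// Run the blocking wait on another thread.
std::future<bool> WaitForDumpChildAsync(pid_t pid) {
  return std::async(std::launch::async, WaitForDumpChild, pid);
}

// Usage sketch: a child that "dumps" by sleeping briefly, then exits with `code`.
bool DemoWait(int code) {
  pid_t pid = fork();
  if (pid < 0) return false;
  if (pid == 0) {
    usleep(50 * 1000);
    _exit(code);
  }
  std::future<bool> done = WaitForDumpChildAsync(pid);
  // The calling thread could keep doing other work here.
  return done.get();
}
```

Checking WIFEXITED before WEXITSTATUS distinguishes a dump that finished cleanly from a child that crashed or was killed while writing the file.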
Here’s a flowchart from the KOOM team:
In this way, the main process is blocked only for the time it takes to fork the child process, rather than for the whole dumpHprofData call. Here is another benchmark from the KOOM team:
The use of Shark
LeakCanary2 includes a rewritten hprof parsing library called Shark, which replaces the original HAHA library; according to the LeakCanary team, it uses 10 times less memory and is 6 times faster than HAHA. KOOM uses Shark for analysis and adds some optimizations of its own on top of it to reduce memory usage further.
summary
KOOM removes a major obstacle to using memory monitoring tools online. Building on KOOM and combining the strengths of Matrix and Probe, we can develop an online memory monitoring tool of our own.