1. KOOM overview

2. KOOM basic usage

3. KOOM dump trigger timing

4. KOOM high-performance fork dump

5. Collection performance comparison


1. KOOM overview

KOOM is the first open-source solution for diagnosing out-of-memory (OOM) problems in production (online) apps.

KOOM has been deployed across all of Kuaishou's business lines, where the OOM rate has dropped by more than 80%.


OOM problems on the client are harder to deal with than ordinary crashes. An ordinary crash is relatively easy to solve once the data at the moment of the crash, such as the exception type and stack, has been captured. An OOM, by contrast, is usually the result of several factors accumulating over time and depends heavily on what the user did beforehand. Without a mature online memory monitoring system, the only option is to reproduce the problem offline, which is very inefficient and cannot meet the demand.


The mainstream tool, LeakCanary, can detect Activity and Fragment leaks that lead to OOM, but because of its performance cost it cannot be deployed widely in production and is mostly used to locate problems offline. Other solutions, such as ResourceCanary in Tencent Matrix, are also optimizations built on LeakCanary and do not fully solve the performance problems of the monitoring process.


KOOM follows the same line of thinking as the rest of the industry. It is an in-house rework based on LeakCanary that fills in the original shortcomings and builds a one-stop monitoring system.



After memory monitoring completes on the client, KOOM uploads the report to the cloud, either automatically or manually. The report file is optimized down to the KB level, so users do not notice it at runtime and it consumes very little traffic, which makes large-scale deployment practical.


2. KOOM basic usage

Project address:

Github.com/KwaiAppTeam…

Nearly 2,000 stars


The generated JSON report looks like this, for example when CommonUtils holds a reference to an Activity:

{
  "analysisDone": true,
  "classInfos": [
    { "className": "android.app.Activity", "instanceCount": 4, "leakInstanceCount": 3 },
    { "className": "android.app.Fragment", "instanceCount": 4, "leakInstanceCount": 3 },
    { "className": "android.graphics.Bitmap", "instanceCount": 115, "leakInstanceCount": 0 },
    { "className": "libcore.util.NativeAllocationRegistry", "instanceCount": 1513, "leakInstanceCount": 0 },
    { "className": "android.view.Window", "instanceCount": 4, "leakInstanceCount": 0 }
  ],
  "gcPaths": [
    {
      "gcRoot": "Local variable in native code",
      "instanceCount": 1,
      "leakReason": "Activity Leak",
      "path": [
        { "declaredClass": "java.lang.Thread", "reference": "android.os.HandlerThread.contextClassLoader", "referenceType": "INSTANCE_FIELD" },
        { "declaredClass": "java.lang.ClassLoader", "reference": "dalvik.system.PathClassLoader.runtimeInternalObjects", "referenceType": "INSTANCE_FIELD" },
        { "declaredClass": "", "reference": "java.lang.Object[]", "referenceType": "ARRAY_ENTRY" },
        { "declaredClass": "com.kwai.koom.demo.CommonUtils", "reference": "com.kwai.koom.demo.CommonUtils.context", "referenceType": "STATIC_FIELD" },
        { "reference": "com.kwai.koom.demo.LeakActivity", "referenceType": "instance" }
      ],
      "signature": "569fc01daea06b6cc679bd61725affd163d022c3"
    }
  ],
  "runningInfo": {
    "analysisReason": "RIGHT_NOW",
    "appVersion": "1.0",
    "buildModel": "MI 9 Transparent Edition",
    "currentPage": "LeakActivity",
    "dumpReason": "MANUAL_TRIGGER",
    "jvmMax": 512,
    "jvmUsed": 2,
    "koomVersion": 1,
    "manufacture": "Xiaomi",
    "nowTime": "2021-09-07_16-07-34",
    "pss": 32,
    "rss": 123,
    "sdkInt": 29,
    "threadCount": 17,
    "usageSeconds": 40,
    "vss": 5674
  }
}

The report has three main parts: class information, GC reference paths, and basic running information.


2.1 Integration

2.2 Initialization

2.3 Obtaining the Java OOM report

If memory usage is abnormal, KOOM collects and analyzes a memory image and then generates a JSON report file.

Obtaining the report manually

Listening for report generation in real time

Setting the Uploader


2.4 Custom configuration

Use KConfig to set the parameters you need.

The default heapRatio is set to a reasonable value based on the application's maximum memory:

  • Application memory > 512 MB: 80%
  • Application memory > 256 MB: 85%
  • Application memory > 128 MB: 90%
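
The tiers above can be expressed as a small lookup. The following is a minimal sketch, not KOOM's actual code: the class and method names are hypothetical, and the value returned for heaps of 128 MB or less is an assumption, because the list above does not cover that case.

```java
/** Sketch: choose a default heap-ratio threshold from the maximum heap size. */
final class HeapRatioDefaults {
    private static final long MB = 1024L * 1024L;

    private HeapRatioDefaults() {}

    /**
     * @param maxHeapMb maximum heap size of the app in MB
     * @return fraction of the heap that must be in use before a dump is considered
     */
    static float defaultHeapRatio(long maxHeapMb) {
        if (maxHeapMb > 512) return 0.80f;
        if (maxHeapMb > 256) return 0.85f;
        if (maxHeapMb > 128) return 0.90f;
        // Assumption: the article gives no value for heaps of 128 MB or less.
        return 0.90f;
    }

    /** Convenience wrapper that reads the limit of the current VM. */
    static float defaultHeapRatioForThisVm() {
        return defaultHeapRatio(Runtime.getRuntime().maxMemory() / MB);
    }
}
```

Larger heaps get a lower ratio, presumably so that the dump starts while there is still enough headroom left to complete it.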

3. KOOM dump trigger timing

LeakCanary and Matrix both trigger leak detection from the Activity's onDestroy, while KOOM uses threshold detection.


What are the benefits of KOOM switching to threshold detection?

The traditional approach forces two GCs after onDestroy() and checks the reference queue to decide whether the Activity has leaked. Frequent GCs, however, cause noticeable jank. KOOM instead uses a monitoring module designed to trigger without the user noticing: image collection is triggered by memory threshold monitoring, at negligible performance cost.

Because the decision about whether an object has leaked is deferred to the parsing stage, threshold monitoring only needs to sample a few memory indicators periodically on a worker thread, so its performance overhead is negligible.
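
The threshold approach can be sketched as follows. This is an illustrative model rather than KOOM's source: the class name HeapThresholdMonitor, the injected ratio supplier, and the rule of requiring several consecutive over-threshold samples are all assumptions made for the example.

```java
import java.util.function.DoubleSupplier;

/** Sketch: decide when to dump by polling the heap usage ratio. */
final class HeapThresholdMonitor {
    private final DoubleSupplier heapRatio;   // returns used/max in the range [0, 1]
    private final double threshold;           // e.g. 0.85
    private final int requiredHits;           // consecutive samples needed (assumption)
    private int consecutiveHits;

    HeapThresholdMonitor(DoubleSupplier heapRatio, double threshold, int requiredHits) {
        this.heapRatio = heapRatio;
        this.threshold = threshold;
        this.requiredHits = requiredHits;
    }

    /** Call once per polling tick; returns true when a dump should start. */
    boolean isTrigger() {
        if (heapRatio.getAsDouble() > threshold) {
            consecutiveHits++;
        } else {
            consecutiveHits = 0;              // usage dropped back, start over
        }
        if (consecutiveHits >= requiredHits) {
            consecutiveHits = 0;              // avoid re-triggering on the next tick
            return true;
        }
        return false;
    }

    /** Cheap sample of the current VM's heap usage; no GC is forced. */
    static double currentHeapRatio() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }
}
```

In practice such a monitor would be polled from a background thread, for example with a ScheduledExecutorService, passing HeapThresholdMonitor::currentHeapRatio as the supplier. Each tick only reads a few counters, which is why the cost stays small compared with forcing GCs.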

KOOM 1.1.0 source code overall structure


Analysis: the memory image parsing module, which analyzes hprof files

Dump: the memory image collection module, which dumps hprof files

Monitor: the memory monitoring module

Report: the report file module

MonitorThread.java

  • Monitor.isTrigger() decides whether to trigger collection
  • Detection runs by polling

The core isTrigger method

Now that the triggering is done, it’s time to do the dump.

Schematic diagram of trigger timing process

4. KOOM high-performance fork dump

A conventional hprof dump, as done by LeakCanary, uses the Debug.dumpHprofData API provided by the VM. This call “freezes” the entire application process, leaving it unusable for several seconds or even more than 10 seconds.

This is where KOOM differs from LeakCanary and Matrix. Because their dump affects the main process, those tools are mostly limited to basic and offline monitoring. KOOM introduces the concept of a fork dump: the memory image is dumped and analyzed for leaks without affecting the main process, which makes it suitable for online monitoring.


KOOM high-performance dump:

KOOM solves this problem with the Linux copy-on-write (COW) mechanism, forking a child process to dump the memory image. Once the fork succeeds, the parent process immediately resumes the VM and keeps running, and the child process is unaffected by the parent's data changes while it dumps the memory image.

COW mechanism:

Two functions are involved: fork() and exec(). exec() is the collective name for a family of related functions, including execl(), execlp(), execv(), execle(), and so on.

fork() creates a child process that is a copy of the parent;

exec() loads a new program into the process and discards its existing data.

In the traditional model, fork() copies the parent's data directly to the child, and exec() is usually called afterwards. The data segments and stacks of the parent and child are independent.

Copy-on-write

To save the memory and time cost of forking, the child process does not copy the parent's memory at fork time but shares the memory space with the parent. Only when one of the processes writes to a shared page does the system allocate a new page and copy the data for the writer.

This means that the child process keeps the parent's memory image as it was at the moment of the fork, and later memory modifications by the parent do not affect the child.
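
The sharing behaviour described above can be illustrated with a toy model. Java cannot call fork() directly, so the sketch below only simulates copy-on-write pages in user space. It is neither the kernel's implementation nor KOOM's, and the class name CowMemory and its methods are hypothetical. It is not thread-safe.

```java
/** Toy model of copy-on-write memory: pages are shared until someone writes. */
final class CowMemory {
    /** A page of memory plus the number of page tables that point at it. */
    private static final class Page {
        final byte[] data;
        int refCount = 1;
        Page(byte[] data) { this.data = data; }
    }

    private final Page[] table;   // this "process's" page table

    CowMemory(int pageCount, int pageSize) {
        table = new Page[pageCount];
        for (int i = 0; i < pageCount; i++) {
            table[i] = new Page(new byte[pageSize]);
        }
    }

    private CowMemory(Page[] table) { this.table = table; }

    /** Models fork(): copy only the page table, share every page. */
    CowMemory fork() {
        Page[] copy = table.clone();
        for (Page p : copy) p.refCount++;
        return new CowMemory(copy);
    }

    byte read(int page, int offset) {
        return table[page].data[offset];
    }

    /** Models a write fault: a shared page is copied before it is modified. */
    void write(int page, int offset, byte value) {
        Page p = table[page];
        if (p.refCount > 1) {
            p.refCount--;                      // detach from the shared page
            p = new Page(p.data.clone());      // private copy for the writer
            table[page] = p;
        }
        p.data[offset] = value;
    }

    boolean sharesPageWith(CowMemory other, int page) {
        return table[page] == other.table[page];
    }
}
```

If a "parent" writes 7, forks, and then writes 9 to the same location, the "child" still reads 7, and only the written page is duplicated. That snapshot property is what lets a forked child dump a consistent heap image while the parent keeps running.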

Once isTrigger decides that a dump is needed, the onTrigger callback starts the dump process.

doHeapDump enters the dump process.

dump() in KOOM's ForkJvmHeapDumper.java

The copy-on-write fork creates a child process that shares memory space with the parent process. The image data is preserved in the child, so the child's dump does not affect the main process.

KOOM fork dump process diagram


The actual fork process:

Suspending the virtual machine requires calling into a system library, but Google has restricted access to system libraries since Android 7.0. For this reason, Kuaishou developed its own kwai-linker component to work around the restriction.


Then there is the parsing process

Add the result to GCPath

Different detectors have different isLeak strategies


5. Collection performance comparison

Memory image collection requires suspending the VM so that reference relationships do not change while memory data is copied to disk. The pause usually lasts more than 10 seconds, which is unacceptable to users and is one of the reasons LeakCanary is not recommended for online use.

KOOM uses the Linux copy-on-write mechanism to dump from a forked child process, which greatly improves dump efficiency.

Blocking time of a normal dump versus a fork child-process dump, measured on memory images from real online users

The blocking times differ by more than 100 times.

Parsing performance optimization

KOOM does not use HAHA, the parsing engine of LeakCanary 1.0, because HAHA is itself very prone to OOM and parses slowly. Instead, KOOM uses Shark, the parsing engine of LeakCanary 2.0.

Compatibility

Android R or higher is not supported

Only AndroidX is supported, not Android Support Library

Summary

Memory threshold detection defers the decision about whether an object has leaked to the parsing stage, avoiding the frequent forced GCs of the traditional approach.

The COW mechanism is used to fork a child process that performs the dump, which greatly reduces the blocking time.

Appendix

Blog.csdn.net/Kwai_tech/a… Kuaishou technical team on OOM governance

Github.com/KwaiAppTeam… KOOM on GitHub