Today, we’ll take a look at garbage collection in Android.
As you may know, the garbage collection mechanisms of the JVM and Dalvik are not exactly the same. Garbage collection is a skill you need both at work and in interviews. Only with a deep understanding of GC can we reduce how often it occurs at the code level. After all, each GC delays the main thread to some degree and thus affects the user experience.
1 Introduction
This section introduces how “garbage collection” fits into optimizing the startup speed of the Alipay Android client.
Application startup time is an important part of the user experience for mobile apps. Compared with ordinary mobile apps, Alipay is very large, which inevitably affects its startup speed. The conventional optimization methods have already been applied thoroughly in Alipay, so this article attempts to further optimize Alipay’s startup speed at the GC level.
2 Background
Compared with C, Java has some conveniences, for example the developer does not have to think about allocating and reclaiming memory. Memory management is nevertheless an essential part of any process, so as a compromise the designers of Java put object allocation and reclamation into the Java virtual machine. One concept should be made clear here: GC has a cost. Google engineers are aware of the impact of GC on applications, so GC logs are output to Logcat by default. We often see the following types of GC logs in Logcat:
1. GC_EXPLICIT: Dalvik provides an API that lets developers trigger GC actively. Readers can look at the design of Google Maps to see how this API is used.
2. GC_FOR_ALLOC: the GC triggered when an object allocation fails. This GC suspends all Java threads of the application until it ends.
3. GC_CONCURRENT: a GC triggered by the Java virtual machine based on the current state of the heap. It runs in Dalvik’s separate GC thread and, for part of its duration, does not affect the application’s Java threads.
Alipay startup is a typical critical-path scenario in which we would like to see as little GC_CONCURRENT as possible (GC_FOR_ALLOC should also be reduced to a minimum if possible). However, Logcat shows very bad GC behavior: a large number of GC_FOR_ALLOC events and an alarming amount of Java thread blocking on WAIT_FOR_CONCURRENT_GC. As the figure shows, simply adding up the time consumed by these GCs leads to the conclusion that GC has a significant impact on application startup time.
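To put a number on that impact, the GC lines can be pulled out of a Logcat dump and their pause times summed. The sketch below is a minimal illustration, not Alipay’s tooling: the sample lines are made up, and the regular expressions assume the usual Dalvik message shapes ("paused Xms", "paused Xms+Yms", and "WAIT_FOR_CONCURRENT_GC blocked Xms"), which may differ between Android versions.

```python
import re
from collections import defaultdict

# Assumed Dalvik log shapes; adjust the patterns if a device prints differently.
GC_LINE = re.compile(r"\b(GC_[A-Z_]+)\b.*?paused (\d+)ms(?:\+(\d+)ms)?")
WAIT_LINE = re.compile(r"WAIT_FOR_CONCURRENT_GC blocked (\d+)ms")


def summarize_gc(lines):
    """Return {reason: (count, total_ms)} for GC-related log lines."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        wait = WAIT_LINE.search(line)
        if wait:
            entry = stats["WAIT_FOR_CONCURRENT_GC"]
            entry[0] += 1
            entry[1] += int(wait.group(1))
            continue
        gc = GC_LINE.search(line)
        if gc:
            entry = stats[gc.group(1)]
            entry[0] += 1
            # Concurrent GCs report two pauses ("2ms+3ms"); add both.
            entry[1] += int(gc.group(2)) + int(gc.group(3) or 0)
    return {reason: tuple(value) for reason, value in stats.items()}


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    "D/dalvikvm( 1234): GC_FOR_ALLOC freed 512K, 20% free 9000K/11000K, paused 30ms, total 31ms",
    "D/dalvikvm( 1234): GC_CONCURRENT freed 1024K, 25% free 9500K/12000K, paused 2ms+3ms, total 40ms",
    "D/dalvikvm( 1234): WAIT_FOR_CONCURRENT_GC blocked 15ms",
    "D/dalvikvm( 1234): GC_FOR_ALLOC freed 256K, 18% free 9800K/12000K, paused 25ms, total 25ms",
    "I/ActivityManager(  500): Displayed com.example/.MainActivity: +1s200ms",
]

if __name__ == "__main__":
    for reason, (count, total_ms) in sorted(summarize_gc(SAMPLE_LOG).items()):
        print(f"{reason}: {count} events, {total_ms} ms")
```

Grouping by reason keeps the blocking GC_FOR_ALLOC pauses separate from the shorter GC_CONCURRENT pauses, which is the distinction the list above draws.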
3 Design Idea
Alipay is an application running on the Android system. How can it shorten its startup time by influencing Dalvik’s GC behavior? This question can be broken down into two parts:
1. Whether Alipay can influence Dalvik’s behavior
2. How to improve Dalvik and shorten startup time
The answer to the first question is yes. By design, each Android application has its own independent Dalvik instance, and an application can modify the code and data in its own process space after it has started. Alipay therefore influences Dalvik’s behavior by modifying the Dalvik library file libdvm.so in memory.
The difficulty of the second question lies in the cost-benefit ratio: modifying code and data in the process space means working on binaries, which is much harder than working on source code. In other words, any improvement to Dalvik that is even slightly complicated is impractical.
Based on these two points, this article proposes the following approach: suppress GC at startup and allow the heap to grow until either the developer stops GC suppression voluntarily or an OOM stops it. This is a “space for time” strategy, in which higher memory consumption is traded for a shorter startup time. The strategy is feasible under two premises: first, the device manufacturer has not encrypted the Dalvik library file in memory; second, the device manufacturer has not changed Google’s Dalvik source code (or has changed it only slightly). In theory, all devices could be covered through a whitelist, but the implementation and maintenance costs would be very high.
4 Implementation of GC suppression
GC suppression requires being familiar with Dalvik and knowing how to change its GC behavior. The solution is as follows. First, find the change that suppresses GC at the source level, such as changing the direction of a conditional branch A. Second, find the “instruction fingerprint” of conditional branch A in the binary code, together with the binary code used to change the branch, say override_A. Third, after the application has started, scan libdvm.so in memory, locate the modification position by the “instruction fingerprint”, and overwrite it with override_A. Note that defining an “instruction fingerprint” requires some knowledge of the compiler and of the ARM instruction set. GC suppression is mainly implemented in the following four parts:
- Cancel softLimit detection
- Cancel GC thread wake-up
- Cancel the GC routine function
- OOM stops GC suppression
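The scan-and-overwrite procedure described above can be modeled on an ordinary byte buffer. The sketch below is illustrative only: patch_by_fingerprint, the buffer contents, and the rule of patching only when the fingerprint matches exactly once are assumptions made for this example, not details of the Alipay implementation, which works on the mapped libdvm.so.

```python
def find_all(image, fingerprint):
    """Return every offset at which `fingerprint` occurs in `image`."""
    offsets = []
    start = image.find(fingerprint)
    while start != -1:
        offsets.append(start)
        start = image.find(fingerprint, start + 1)
    return offsets


def patch_by_fingerprint(image, fingerprint, override):
    """Overwrite the single occurrence of `fingerprint` with `override`.

    Returns the patched offset, or None when the fingerprint is missing or
    ambiguous. Leaving the image untouched in those cases is this sketch's
    own choice: a skipped patch only means the behavior is not changed.
    """
    if len(override) > len(fingerprint):
        raise ValueError("override must fit inside the fingerprint span")
    matches = find_all(image, fingerprint)
    if len(matches) != 1:
        return None
    offset = matches[0]
    image[offset:offset + len(override)] = override
    return offset


# Hypothetical bytes standing in for a slice of code; not real libdvm.so content.
image = bytearray.fromhex("00bf1046884257e07047")
fingerprint = bytes.fromhex("884257e0")   # stands in for conditional branch A
override_a = bytes.fromhex("8042")        # stands in for override_A

if __name__ == "__main__":
    print(patch_by_fingerprint(image, fingerprint, override_a))
    print(image.hex())
```

Refusing to patch on zero or multiple matches is a conservative design choice: a longer fingerprint lowers the chance of hitting the wrong code, at the price of matching fewer vendor builds.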
- Cancel softLimit detection
The purpose of canceling softLimit detection is to let object allocation proceed as far as possible. The following figure shows the ARM instruction fragment corresponding to the softLimit check, located in the dvmHeapSourceAlloc function; 0xE057 corresponds to the “return NULL” branch. To never enter the “return NULL” branch, we can change the result of the CMP instruction. In the implementation, we take “0x42” as the “instruction fingerprint” and change the instruction to “CMP r0, r0”, which disables the softLimit check.
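As a rough illustration of that patch, the sketch below scans little-endian 16-bit Thumb instructions for a register-to-register CMP and rewrites it as cmp r0, r0. It assumes the 16-bit Thumb encoding, in which CMP between low registers is 0x4280 | (Rm << 3) | Rn, so the high byte is 0x42. The sample buffer, the function names, and the choice to patch the first CMP in the window are assumptions for the example; a real fingerprint would need more context than one byte to be unambiguous.

```python
import struct

CMP_REG_MASK = 0xFFC0   # top ten bits select the 16-bit "CMP Rn, Rm" form
CMP_REG_BASE = 0x4280   # encoding of cmp r0, r0; the high byte is 0x42


def decode_cmp(halfword):
    """Return (Rn, Rm) if `halfword` is a 16-bit Thumb CMP between low registers."""
    if halfword & CMP_REG_MASK != CMP_REG_BASE:
        return None
    return halfword & 0x7, (halfword >> 3) & 0x7


def force_cmp_equal(code, start, end):
    """Rewrite the first CMP in code[start:end] as cmp r0, r0.

    Returns the patched offset, or None if no CMP was found in the window.
    """
    for offset in range(start, end - 1, 2):
        (halfword,) = struct.unpack_from("<H", code, offset)
        if decode_cmp(halfword) is not None:
            struct.pack_into("<H", code, offset, CMP_REG_BASE)
            return offset
    return None


# Hypothetical instruction bytes: mov r0, r2 ; cmp r0, r1 ; b <elsewhere> ; bx lr
code = bytearray(struct.pack("<4H", 0x4610, 0x4288, 0xE057, 0x4770))

if __name__ == "__main__":
    print(force_cmp_equal(code, 0, len(code)))
    print([hex(h) for h in struct.unpack("<4H", code)])
```

Comparing a register with itself always yields "equal", so whichever conditional branch follows sees a fixed result; which direction that forces depends on the branch condition in the actual binary.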
- Cancel GC thread wake-up
The purpose of canceling GC thread wake-up is to prevent the thread jitter caused by waking the GC thread frequently. Below are the corresponding C++ code and ARM instruction fragment, which are also in the dvmHeapSourceAlloc function. In the implementation, we scan the .dynstr, .dynsym, .rel.plt, and .plt sections of libdvm.so to get the address of pthread_cond_signal@plt, then traverse all branch instructions in dvmHeapSourceAlloc and calculate each one’s destination address.
If a branch’s destination address matches pthread_cond_signal@plt, that instruction is deleted.
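The branch walk can be sketched as follows. This is a simplified model, not the Alipay code: it assumes the call site is a 32-bit Thumb BL instruction (a BLX call site would need a different decode), that the function bytes and both addresses are already known, and that "deleting" the instruction means overwriting it with two 16-bit NOPs (mov r8, r8, encoded as 0x46C0). The function names and all addresses are made up for the example.

```python
import struct

THUMB_NOP = 0x46C0  # mov r8, r8, the classic 16-bit Thumb no-op


def decode_bl(hw1, hw2, address):
    """Return the target of a 32-bit Thumb BL at `address`, or None if not a BL."""
    if hw1 & 0xF800 != 0xF000 or hw2 & 0xD000 != 0xD000:
        return None
    s = (hw1 >> 10) & 1
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    i1 = 1 - (j1 ^ s)
    i2 = 1 - (j2 ^ s)
    imm = ((s << 24) | (i1 << 23) | (i2 << 22)
           | ((hw1 & 0x3FF) << 12) | ((hw2 & 0x7FF) << 1))
    if s:
        imm -= 1 << 25          # sign-extend the 25-bit offset
    return address + 4 + imm    # Thumb PC reads as instruction address + 4


def remove_calls_to(code, base, target):
    """NOP out every BL in `code` (loaded at `base`) whose destination is `target`."""
    patched = []
    offset = 0
    while offset + 4 <= len(code):
        hw1, hw2 = struct.unpack_from("<2H", code, offset)
        destination = decode_bl(hw1, hw2, base + offset)
        if destination is None:
            offset += 2         # not a BL: step to the next halfword
            continue
        if destination == target:
            struct.pack_into("<2H", code, offset, THUMB_NOP, THUMB_NOP)
            patched.append(base + offset)
        offset += 4             # a BL occupies two halfwords
    return patched


# Hypothetical function body at 0x1000: push {r4, lr} ; bl 0x2000 ; bl 0x1006 ; pop {r4, pc}
body = bytearray(struct.pack("<6H", 0xB510, 0xF000, 0xFFFD, 0xF7FF, 0xFFFE, 0xBD10))

if __name__ == "__main__":
    print([hex(a) for a in remove_calls_to(body, 0x1000, 0x2000)])
```

Stepping halfword by halfword is a simplification; a production scanner would have to decode instruction lengths properly so that the second half of some other 32-bit instruction is never mistaken for the start of a BL.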
- Cancel the GC routine function
The GC routine function is canceled with hook technology. GC suppression is encapsulated in two native interfaces, doStartSuppressGC and doStopSuppressGC, which are further wrapped as a JNI interface so that developers can call them easily from Java. The typical usage is as follows: the developer sees in the log that Alipay triggers a large number of GCs in a certain scenario and that these GCs hurt the user experience (slow response or animation lag), and then inserts doStartSuppressGC before the scenario and doStopSuppressGC after it.
Taking the cold start of Alipay as an example, doStartSuppressGC was inserted into the attachBaseContext function of the Quinox container, and doStopSuppressGC was inserted at the point where the home page finishes loading.
- OOM stops GC suppression
If GC suppression were used only during Alipay startup, there would be no need to stop it on OOM, because Alipay startup does not consume enough memory to trigger OOM. However, we expect GC suppression to be a base module that can be applied to more scenarios. If the program would trigger OOM before doStopSuppressGC is called, GC suppression needs to be stopped before the OOM occurs. Unlike simply changing the direction of a branch, this requires injecting a new branch before the point where OOM occurs. We implement the code of this new branch ourselves: it calls doStopSuppressGC, removes the injected branch, and finally jumps back to Dalvik to execute the OOM path.
The implementation also uses traditional hook technology. In the hook function for dvmCollectGarbageInternal:
- When the condition is not met, the hook returns directly, which cancels the GC.
- When the condition is met, the hook is canceled and the original dvmCollectGarbageInternal is executed.
The implementation uses an open source binary injection framework: github.com/crmulliner/…
It is important to note that the performance overhead of using the pre_hook and post_hook provided by this framework in hot functions is very high. The design in this article uses pre_hook only once, so there are no performance issues.
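The hook behavior described above can be mimicked in plain Python by swapping a function attribute. This is only a toy model of the control flow: ToyVm, collect_garbage, GcSuppressor, and should_stop are invented stand-ins for Dalvik, dvmCollectGarbageInternal, the suppression module, and the stop condition, and no real binary hooking takes place.

```python
class ToyVm:
    """Invented stand-in for the VM; counts how often GC really runs."""

    def __init__(self):
        self.gc_runs = 0

    def collect_garbage(self):
        self.gc_runs += 1


class GcSuppressor:
    """Replaces vm.collect_garbage with a hook until suppression stops."""

    def __init__(self, vm, should_stop):
        self.vm = vm
        self.should_stop = should_stop   # callable: True when suppression must end
        self.original = None
        self.skipped = 0

    def start(self):
        if self.original is None:
            self.original = self.vm.collect_garbage
            self.vm.collect_garbage = self._hook

    def stop(self):
        if self.original is not None:
            self.vm.collect_garbage = self.original
            self.original = None

    def _hook(self):
        if not self.should_stop():
            self.skipped += 1            # condition not met: return, GC is cancelled
            return
        original = self.original
        self.stop()                      # condition met: cancel the hook ...
        original()                       # ... and run the original routine


if __name__ == "__main__":
    vm = ToyVm()
    pressure = {"high": False}
    suppressor = GcSuppressor(vm, lambda: pressure["high"])
    suppressor.start()
    vm.collect_garbage()                 # skipped
    vm.collect_garbage()                 # skipped
    pressure["high"] = True
    vm.collect_garbage()                 # hook removes itself, real GC runs
    vm.collect_garbage()                 # real GC runs directly
    print(vm.gc_runs, suppressor.skipped)
```

Because the hook removes itself the first time the condition holds, later calls reach the original routine directly and pay no hook overhead.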
The reader may ask: is this “instruction fingerprint” method reliable? My answer is that a missed match does not affect correctness, and a misjudgment (an “instruction fingerprint” that locates the wrong code) is theoretically possible but extremely unlikely. Even if a misjudgment occurs, there is one final layer of protection: the disaster recovery mechanism implemented by the infrastructure team. When a misjudgment makes the program behave abnormally so that it cannot start normally, Alipay is restarted and GC suppression is abandoned in subsequent startups.
5 Effect
The startup time data in the figure above was obtained on an internal Android 4.x test device (entries not marked “release” are debug builds). According to the chart, the startup time of the Alipay client is reduced by 15% to 30%.