This article is a set of study notes on startup optimization, based mainly on Zhang Shaowen's "Android Development Master Class" on Geek Time and the official Android documentation on launch time.
References:
- https://time.geekbang.org/column/article/73651
- https://mp.weixin.qq.com/s/eaArt5Udc4WZ3NoH5RlEkQ
- https://juejin.cn/post/6844903459951476744
- https://developer.android.google.cn/topic/performance/vitals/launch-time
Application startup types
- Cold start
  - Scenario: the app is started for the first time after boot, or started again after its process has been killed
  - Lifecycle: Process.start() -> Application creation -> attachBaseContext -> onCreate -> onStart -> onResume -> Activity lifecycle
  - Startup speed: the slowest of the three startup types, and the biggest obstacle to startup speed
- Warm start
  - Scenario: the app process is still alive, but the Activity needs to be created again
  - Lifecycle: onCreate -> onStart -> onResume -> Activity lifecycle
  - Startup speed: fast
- Hot start
  - Scenario: the app was sent to the background with the Home button and is brought back to the foreground
  - Lifecycle: onResume -> Activity lifecycle
  - Startup speed: fastest
As the summary above shows, cold start is the slowest and most time-consuming of the three: both the system and the app itself have a great deal of work to do. Cold start is therefore the most challenging part of startup optimization, and the part that most needs optimizing.
Cold start process
A cold start is the process that begins with no app process existing on the system and ends with the system having created a process space and run the app inside it. Cold starts usually occur in two situations:
- The app is started for the first time since the device booted
- The system killed the app process and the app is started again
At the beginning of a cold start, the system does three things:
- Loads and launches the app
- Displays a blank starting window for the app immediately after launch
- Creates the app process
As soon as the system has created the app process, the app process is responsible for:
- Creating the Application object
- Creating and starting the main thread (ActivityThread)
- Creating and starting the first Activity
- Inflating views
- Laying out the screen
- Performing the first draw
Once the app process has finished the first draw, the system process swaps out the starting window for the main Activity, and the user can begin interacting with the app.
Looking at the cold start flow, we cannot intervene in system work such as app process creation, but we can intervene in the following:
- The preview window
- Application lifecycle callbacks
- Activity lifecycle callbacks
Analysis and measurement tools for startup optimization
For developers, startup speed is our app's "face": it is immediately visible to every user, and we all want our app to start faster than the competition's.
We need to find a tool or method suitable for startup optimization analysis.
- adb shell am start -W [packageName]/[packageName.AppStartActivity]
The system provides an adb command that reports the launch time; ActivityManagerService reports this time once the Activity's first frame has been drawn. By running the command from a script many times, we can compare the TotalTime across versions. The measurement is less precise than Systrace, but it is good enough for trend comparisons.
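For example, a comparison script only needs to pull TotalTime out of the command's output. In the sketch below, the package name, activity name, and timing values are made up for illustration; on a real device the first (commented) command prints lines of the same shape:

```shell
# On a connected device, this prints the launch timings:
#   adb shell am start -W com.example.app/com.example.app.MainActivity
# Typical output looks like this (values are illustrative):
output='Status: ok
Activity: com.example.app/.MainActivity
ThisTime: 320
TotalTime: 320
WaitTime: 348'

# A cross-version comparison script only needs the TotalTime value:
echo "$output" | awk '/TotalTime/ {print $2}'
```

ThisTime is the launch time of the last Activity alone, TotalTime covers the whole launch, and TotalTime is usually the number worth tracking across versions.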
- Code instrumentation
Code instrumentation can precisely record the execution time of each stage, showing where the time goes so we can optimize accordingly. For example, record timestamps at key points of the startup lifecycle: the start time in the Application's attachBaseContext, and the end time in the first launched Activity's onWindowFocusChanged.
However, there are many steps between the user tapping the app icon and Application creation, such as the process creation of a cold start, whose time this method cannot capture, so we have to live with that inaccuracy.
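Stripped of the Android classes, the attachBaseContext/onWindowFocusChanged timing above can be sketched as a tiny timer. This is a plain-Java sketch; the class and method names are my own, not from any framework:

```java
// Records a start timestamp (e.g. called from Application.attachBaseContext)
// and reports the elapsed time (e.g. from the first Activity's
// onWindowFocusChanged). The monotonic nanoTime clock avoids wall-clock jumps.
public class LaunchTimer {
    private static long startNanos;

    public static void markStart() {
        startNanos = System.nanoTime();
    }

    public static long elapsedMillis() {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }
}
```

In a real app, a report call at the measurement end point would log or upload the elapsed value.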
- Nanoscope
Nanoscope's measurements are very true to reality, but for now it only supports the Nexus 6 and x86 emulators.
- Simpleperf
Simpleperf's flame graphs are not well suited to analyzing the startup process.
- TraceView
TraceView gives us two kinds of data: methods whose single execution is expensive, and methods that are executed many times. However, TraceView's runtime overhead is too high for its numbers to reflect the real situation.
- Systrace
Systrace tracks the time spent in critical system work, such as I/O operations, kernel work queues, CPU load, Surface rendering, GC events, and the health of Android subsystems. However, on its own it does not time application code.
To sum up, each of these approaches has its advantages and disadvantages, and we should be familiar with all of them.
But is there an ideal compromise? There is.
- Systrace + function instrumentation
With Systrace we can see the cost of system work such as GC, System Server, and CPU scheduling; with compile-time instrumentation we can automatically insert trace calls at the entry and exit of specified methods and measure their execution time. Combining the two, we can see the function call flow of the main thread and the other threads. The implementation is simple: use a bytecode framework such as ASM to insert the following two calls at the entry and exit of each method during compilation.
```java
class TraceMethod {
    public static void i(String name) {
        // Trace.beginSection takes the section name to show in Systrace
        Trace.beginSection(name);
    }

    public static void o() {
        Trace.endSection();
    }
}
```
Of course, there are many details to consider, such as how to reduce the performance impact of the instrumentation itself, and which methods need to be excluded. After instrumentation, a method looks like this:
```java
class Test {
    public void test() {
        TraceMethod.i("Test.test");
        // original method body
        TraceMethod.o();
    }
}
```
Only accurate measurement can guide the optimization direction, so this step is very important. An incomplete evaluation, or one done with the wrong method, points us in the wrong direction, and in the end we discover the expected optimization effect was never reached.
Start-up optimization method
Now that we have a full picture of the startup process, and can see exactly how the system, the app process, and the threads behave during this period, we are ready to start the real "work."
The specific optimization methods fall into the following categories: preview window optimization, business review, business optimization, multi-process optimization, thread optimization, GC optimization, system call optimization, and layout optimization.
Preview window optimization
When the user taps the launcher icon, we can use the starting Window to show a simple "preview" page almost immediately. On high-end devices this instant feedback feels great, but on low-to-mid-range devices it steals time from the application itself and makes the total splash time longer. On the other hand, if tapping the icon produces no response at all, the user will feel the phone is slow. A common compromise is therefore to enable the preview window only on Android 6.0 or Android 7.0 and above, so that users with faster phones get the better experience.
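The version gate described above boils down to a single comparison. A minimal sketch follows; the threshold of API level 24 (Android 7.0) is one of the two options the text mentions and is an assumption, to be tuned per device profile:

```java
public class PreviewWindowGate {
    // Android 7.0 corresponds to API level 24; treat devices at or above
    // it as fast enough to benefit from the preview window.
    // This threshold is an assumption following the text above.
    static final int MIN_SDK_FOR_PREVIEW = 24;

    public static boolean shouldShowPreviewWindow(int sdkInt) {
        return sdkInt >= MIN_SDK_FOR_PREVIEW;
    }
}
```

On Android you would pass Build.VERSION.SDK_INT and, when the gate returns false, fall back to a plain themed window.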
To display the preview window, point the starting Activity's windowBackground theme attribute at a simple custom drawable, as follows:
The drawable XML file:
```xml
<layer-list xmlns:android="http://schemas.android.com/apk/res/android"
    android:opacity="opaque">
    <!-- The background color, preferably the same as your normal theme -->
    <item android:drawable="@android:color/white"/>
    <!-- Your product logo: a 144dp color version of your app icon -->
    <item>
        <bitmap
            android:src="@drawable/product_logo_144dp"
            android:gravity="center"/>
    </item>
</layer-list>
```
The Manifest file:
<activity ...
    android:theme="@style/AppTheme.Launcher" />
When the activity starts, the preview window is shown first, giving the user an instant response. When the activity wants to restore its normal theme, call setTheme(R.style.AppTheme) before super.onCreate() and setContentView(), as follows:
```java
public class MyMainActivity extends AppCompatActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        // Make sure this is called before super.onCreate()
        setTheme(R.style.Theme_MyApp);
        super.onCreate(savedInstanceState);
        // ...
    }
}
```
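For completeness, the AppTheme.Launcher style referenced in the manifest needs to point windowBackground at the drawable above. A minimal sketch, where the drawable file name launch_screen is an assumption and the parent theme should be your own:

```xml
<style name="AppTheme.Launcher" parent="AppTheme">
    <item name="android:windowBackground">@drawable/launch_screen</item>
</style>
```

Here @drawable/launch_screen is the layer-list file shown earlier.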
Business review
Instead of putting all initialization work into the Application, we need to review every module that runs during startup: which modules are definitely needed, which can be cut, and which can be loaded lazily. Be careful, though, not to concentrate all the lazy loading in one place, or the user may find the home page visible but unresponsive. In general, startup work is sorted along the following four dimensions:
- Necessary and time-consuming: startup initialization; consider initializing on a thread.
- Necessary but not time-consuming: home page drawing.
- Unnecessary but time-consuming: data reporting, plug-in initialization.
- Unnecessary and not time-consuming: no need to think about it; just remove it and load it when actually needed.
After the work is sorted, implement the loading logic accordingly, using strategies such as staged loading, asynchronous loading, and deferred loading, as shown in the figure below:
In a nutshell, the key to improving startup speed is to do as little as possible during startup.
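Deferred loading can be sketched with a small lazy-initialization holder: the expensive work runs only on first access, after the home page is already up. This is a minimal plain-Java sketch; the class and method names are my own, not from any startup framework:

```java
import java.util.function.Supplier;

// Runs the supplied initializer only once, on first use, so startup
// does not pay for modules that have not been touched yet.
public class Lazy<T> {
    private Supplier<T> initializer;
    private volatile T value;

    public Lazy(Supplier<T> initializer) {
        this.initializer = initializer;
    }

    public T get() {
        if (value == null) {
            synchronized (this) {
                if (value == null) {
                    value = initializer.get();
                    initializer = null; // release the captured references
                }
            }
        }
        return value;
    }
}
```

A module such as a reporting SDK would be wrapped in a Lazy field and initialized the first time it is actually used.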
Business optimization
After the review, what remains are the modules the startup process genuinely needs. At this point we have to bite the bullet and optimize them further. Early on, "catch the big fish and let the small ones go": look first at where the main thread is slowest. The ideal case is an algorithmic win, say a decryption operation that took 1 second now taking 10 milliseconds. The next best thing is moving these tasks to asynchronous preloading threads, but be aware that too much thread preloading makes our logic complicated.
After the business logic is optimized, some architecture and historical baggage may still slow us down. Commonly, many business modules listen to the same events, and the flood of callbacks leads to a lot of work executing at once; some framework initialization is "too heavy", for example plug-in frameworks that set up all kinds of reflection and hooks, costing at least several hundred milliseconds in total. Some historical baggage is very heavy, and touching one part affects everything, so the risk of change is high. Still, I would say that if the time is right, we should be brave enough to pay back these "historical debts".
Multiprocess optimization
Android apps support multiple processes: by adding the android:process attribute to a component declaration in the manifest, the component can run in a separate process at startup. For example, a multi-process app may have a main process, a plug-in process, and a download process, yet the developer can declare only one Application class in the manifest. If components in all three processes are started, the system creates three processes and three Application objects, and lifecycle callbacks such as attachBaseContext and onCreate are called three times.
However, each process needs to initialize different things. To avoid wasting resources, we should distinguish processes inside the Application and let each process initialize only what it needs.
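A minimal sketch of that dispatch logic follows, in plain Java so it can run outside Android. The ":plugin"/":download" suffixes and the way the process name would be obtained are assumptions; on Android the name comes from Application.getProcessName() (API 28+) or from reading /proc/self/cmdline:

```java
public class ProcessDispatch {
    // Given the current process name and the package name, decide which
    // initialization branch to run. Android names child processes
    // "<package>:<suffix>", e.g. "com.example.app:download".
    public static String branchFor(String processName, String packageName) {
        if (processName == null || processName.equals(packageName)) {
            return "main";
        }
        int colon = processName.indexOf(':');
        return colon >= 0 ? processName.substring(colon + 1) : "main";
    }
}
```

In Application.onCreate, a switch on branchFor(...) would then run only that process's initialization.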
Thread optimization
Thread optimization focuses on the following points:
First, make time-consuming tasks asynchronous. Child threads handle the heavy work; the less the main thread does, the sooner it reaches the Activity drawing phase and the sooner the UI appears. For example, do not perform time-consuming operations such as I/O or network requests on the main thread. Note, however, that child threads must not end up blocking the main thread.
Second, manage threads with a thread pool and control the thread count. Too many threads compete with each other for CPU, shrinking the time slices allocated to the main thread and slowing startup. The context-switch counters we met in the jank-optimization lesson can be read from the proc sched file; pay particular attention to nr_involuntary_switches, the number of passive context switches.
```
proc/[pid]/sched:
  nr_voluntary_switches:   voluntary context switches, where the thread could not
                           obtain a resource it needed, most commonly I/O
  nr_involuntary_switches: passive context switches, where the thread was forcibly
                           descheduled by the system, for example when many threads
                           are preempting the CPU
```
Third, avoid lock contention between the main thread and child threads. We once moved a time-consuming task off the main thread and found it made no difference at all. A careful look revealed that the task held a lock internally, and the main thread soon hit another task waiting on that same lock. Systrace can show lock-wait events; we should examine whether those waits can be optimized, and in particular keep the main thread from idling for long stretches.
In particular, many startup frameworks use a pipeline mechanism that orders business initialization by priority, for example mmkernel used inside WeChat, or Alpha, recently open-sourced by Alibaba. These frameworks let each task declare its dependencies, eventually forming a directed acyclic graph, and run independent tasks concurrently on a thread pool to maximize startup speed. If the task dependencies are configured badly, it is easy to end up with the situation below, where the main thread waits for task C to finish and idles for 2950 ms.
Fourth, set child thread priorities. For unimportant tasks, set the thread priority to THREAD_PRIORITY_BACKGROUND, so the thread gets at most about 10% of the time slices and the main thread runs first.
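The dependency-graph idea can be sketched in a few lines of plain Java: each task owns a latch, dependents wait on their dependencies' latches, and independent tasks run concurrently on a pool. This is a minimal sketch, not how mmkernel or Alpha are actually implemented, and it assumes tasks are submitted after the tasks they depend on:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Dependency-aware startup scheduler sketch: every task owns a latch,
// dependents wait on their dependencies' latches, independent tasks
// run concurrently on a fixed pool.
public class StartupScheduler {
    private final Map<String, CountDownLatch> latches = new HashMap<>();
    private final List<Future<?>> futures = new ArrayList<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Tasks must be submitted after the tasks they depend on.
    public synchronized void submit(String name, List<String> deps, Runnable work) {
        CountDownLatch done = new CountDownLatch(1);
        latches.put(name, done);
        List<CountDownLatch> waits = new ArrayList<>();
        for (String dep : deps) {
            waits.add(latches.get(dep));
        }
        futures.add(pool.submit(() -> {
            try {
                for (CountDownLatch w : waits) {
                    w.await(); // block until each dependency has finished
                }
                work.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                done.countDown(); // release tasks waiting on this one
            }
        }));
    }

    public void awaitAll() {
        try {
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        pool.shutdown();
    }
}
```

Real frameworks also detect cycles, honor priorities, and can pin specific tasks to the main thread; those parts are omitted here.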
GC optimization
During startup, reduce the number of GCs as much as possible to avoid long pauses on the main thread. This matters especially on Dalvik; with Systrace we can inspect the GC time of the whole startup process on its own.
Avoid heavy string manipulation during startup, especially serialization and deserialization. Frequently created objects, such as the byte arrays and buffers in network and image libraries, can be reused. If a module has to create objects very frequently, consider moving it to a native implementation.
Java object escape can also cause GC problems and is easy to overlook when writing code. We should keep object lifetimes as short as possible, so that objects can be destroyed on the stack.
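Object reuse can be sketched as a tiny buffer pool. This is a hypothetical minimal sketch; real network and image libraries use more sophisticated pools:

```java
import java.util.ArrayDeque;

// Reuses fixed-size byte[] buffers instead of allocating a fresh one
// per request, cutting allocation churn and hence GC pressure.
public class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;

    public BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    public synchronized byte[] acquire() {
        byte[] b = free.poll();
        return b != null ? b : new byte[bufferSize];
    }

    public synchronized void release(byte[] b) {
        if (b.length == bufferSize) {
            free.push(b); // return the buffer for the next caller
        }
    }
}
```

A production pool would also cap its size and possibly clear buffers on release; both are omitted for brevity.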
System call optimization
Some system APIs are blocking. With small files you may not notice, but when files are large or accessed frequently, the blocking shows. For example, for SharedPreferences.Editor, prefer the asynchronous apply() over the blocking commit().
Through Systrace's System Service view, we can see the System Server's CPU work during startup. During startup we should avoid system calls where possible, such as PackageManagerService operations and Binder calls.
Also avoid starting other application processes too early: System Server and the new processes all compete for CPU. Especially when the system is low on memory, spawning a new process can be the "straw that breaks the camel's back": it may trigger the system's low memory killer, causing the system to kill and then revive (keep-alive) many processes, eating into the foreground process's CPU. For example, one earlier app spawned its download and video-playback processes during startup; after changing this to on-demand, online startup time improved by 3%, and on low-end devices with 1GB of memory or less, total startup time improved by 5% to 8%, a very visible effect.
Layout optimization
The more complex the layout, the longer measure, layout, and draw take. The main points:
- The fewer layers a layout has, the faster it loads.
- The fewer attributes a view has, the faster it parses; remove attributes the view doesn't need.
- Use the ViewStub tag to defer inflating rarely used layouts until they are needed.
- Use the merge tag to reduce the nesting level of the layout.
- Use wrap_content as little as possible: it increases the cost of layout measurement. If the width and height are known and fixed, don't use wrap_content.
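A sketch of the ViewStub usage described above; the layout and id names are made up for illustration:

```xml
<!-- In the host layout: nothing is inflated until inflate() is called -->
<ViewStub
    android:id="@+id/stub_error_panel"
    android:layout="@layout/error_panel"
    android:layout_width="match_parent"
    android:layout_height="wrap_content" />
```

At runtime, `((ViewStub) findViewById(R.id.stub_error_panel)).inflate();` inflates the real layout on first use.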
Advanced startup optimization methods
Is there any way to optimize it further?
Data rearrangement
Suppose during startup we need to read 1KB from a file (say test.io), and our read buffer is accidentally set to 1 byte; that makes 1000 read calls in total. Does the system actually issue 1000 disk I/Os?
In fact, those 1000 reads are only the number of calls we issued, not the number of real disk I/Os. See the Linux file I/O flow below.
When the Linux file system reads a file from disk, it reads in blocks, generally 4KB at a time. So at least 4KB is read from disk per access, and that 4KB is put into the page cache. If the data is already in the page cache on the next read, no real disk I/O occurs; the data comes straight from the page cache, which greatly increases read speed. So in the example above, although we issued 1000 reads, only one disk I/O actually happened; the rest of the data came from the page cache.
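The same "calls issued vs real I/O" distinction exists one layer up, between our read() calls and the calls that reach the underlying stream, and it can be demonstrated in plain Java. A small sketch (the 1000-byte payload mirrors the example above):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Counts how many read() calls actually reach the wrapped stream,
// the way the page cache sits between read syscalls and real disk I/O.
class CountingInputStream extends FilterInputStream {
    int reads = 0;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        reads++;
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        reads++;
        return super.read(b, off, len);
    }
}
```

Reading 1000 bytes byte-by-byte directly issues 1001 underlying calls (including the final end-of-stream read); inserting a BufferedInputStream with a 4096-byte buffer collapses that to 2, just as the page cache collapses many small reads into one disk I/O.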
The classes in the Dex files and the various resource files in the APK are generally small but read very frequently. Using this system mechanism, we can rearrange them in the order they are read, reducing the number of real disk I/Os.
In startup optimization, there are two main aspects of data rearrangement: class rearrangement and resource file rearrangement.
Class rearrangement
Class rearrangement is implemented by adjusting the order of classes in the Dex with ReDex's Interdex plugin.
Don’t understand can read this article: Redex preliminary optimization with Interdex: Andorid cold start
From the principles behind Interdex, three problems must be solved to implement this optimization:
- How do I get the sequence of classes loaded at startup?
Redex's approach is to dump an hprof file at startup and analyze the loaded classes from it, which is rather troublesome. The scheme used here instead hooks the ClassLoader.findClass method and logs the class name whenever the system loads a class; analyzing the log then yields the class-load order at startup.
```java
class GetClassLoader extends PathClassLoader {
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        writeToFile(name, "coldstart_classes.txt");
        return super.findClass(name);
    }
}
```
- How do I put the classes I need into the main dex?
Redex's approach is to parse all the classes out of the dex files and regenerate each dex, starting from the main dex, according to the configured class-load order, so the original dex layout is disrupted. In Mobile QQ, the dex rules are maintained in the build script, so we can modify the dex-splitting logic to put the needed classes into the main dex.
- How do I adjust the order of classes in the main DEX?
Open source is a wonderful thing. The Android toolchain converts .class files into .dex by invoking dx.bat, which actually executes dx.jar from the SDK. By modifying the dx source and replacing the jar, we can run custom dx logic. A brief note on the approach:
This requires some knowledge of the dex file format, which won't be detailed here; there is a very good article for those interested: http://blog.csdn.net/jiangwei0910410003/article/details/50668549
Resource file rearrangement
Facebook has long used "resource heat maps" to rearrange resource files, and Alipay recently detailed the principle and rollout of resource rearrangement in "Optimize Android Startup Performance through Package Rearrangement".
Class loading
Loading a class includes a verify step, which checks every instruction of every method and is therefore time-consuming.
The verify step is described in this article: The evolution of WeChat's Android hot-patch practice
We can hook away the verify step, which saves tens of milliseconds at startup. However, I would say the biggest win is actually on first and overwrite installation. Taking the Dalvik platform as an example, loading a 2MB dex normally takes about 350 milliseconds, but with classVerifyMode set to VERIFY_MODE_NONE it takes only 150 milliseconds, a saving of more than 50%.
```
// Dalvik: Globals.h
gDvm.classVerifyMode = VERIFY_MODE_NONE;

// ART: runtime.cc
verify_ = verifier::VerifyMode::kNone;
```
The ART platform is much more complex, however: the hook has to be compatible with several versions, and since most dex files are already optimized at install time, removing verify on ART only helps dynamically loaded dex files. The dalvik_hack-3.0.0.5.jar in Atlas can remove verification with the code below, but it does not currently support ART.
```java
AndroidRuntime runtime = AndroidRuntime.getInstance();
runtime.init(context);
runtime.setVerificationEnabled(false);
```
This hack greatly speeds up the first launch at the cost of a slight impact on subsequent execution. Compatibility also has to be considered, so it is not recommended on the ART platform.
Black tech
Keep alive
When it comes to black tech, keep-alive is probably the first thing that comes to mind. Keeping the process alive saves the Application creation and initialization time, turning a cold start into a warm or hot start. After targetSdkVersion 26, however, keep-alive has become harder and harder; large companies may be able to seek cooperation with device vendors.
Plug-in and hot fix
Are plug-in frameworks and hot fixes really that good? In fact, most of these frameworks rely on a large number of hooks and private API calls, which leads to two major disadvantages:
- Stability. Although every framework claims 100% device compatibility, vendor customizations, installation failures, dex2oat failures, and other causes still produce occasional code and resource anomalies. Android P's restriction on non-SDK interface calls will only make adaptation harder and more expensive.
- Performance. Each Android runtime version brings many optimizations, and because of the black tech used in plug-ins and hot fixes, those underlying runtime optimizations become unavailable to us. After the Tinker framework loads a patch, application startup is 5% to 10% slower.
In general, we need to be careful with black tech: only when you understand its internal mechanisms well enough should you use it selectively.
Conclusion
That is my summary of what I learned about startup optimization; thank you for reading this far.
Startup optimization is a long-term task: the responsibility is heavy and the road is long.
Developers should be prepared to minimize the performance loss caused by startup during the coding process, and pay attention to the following:
- Try to avoid intensive and heavy work in the main thread during startup, such as I/O operations, deserialization, network operations, and lock waiting.
- Load modules and third-party libraries on demand, using strategies such as staged loading, asynchronous loading, and deferred loading.
- Using thread pools to manage threads avoids creating a large number of threads, causing CPU contention and reducing the main thread time slice.
- During startup, try to avoid creating a large number of objects frequently to reduce the lag effect of GC on startup performance.
- Try to avoid blocking system calls during startup.
As for the advanced optimization methods summarized above, use them with caution: they may have undesirable side effects. Before using them, we need to understand their internal implementation mechanisms well enough to evaluate the risk, and then use them selectively.
Finally, here are some good articles to deepen understanding:
- Preview window: Display the Activity launch window
- Interdex introduction: Redex preliminary optimization with Interdex: Android cold start
- Verify class: The evolution of WeChat's Android hot-patch practice
- Resource file rearrangement: Optimization analysis of the Alipay App build: optimize Android startup performance by rearranging the installation package
- Plug-in and hot fix: Android hotfixes aren't as hard as you think
I'm new to Juejin and still finding my way around. If you liked this article, a thumbs-up would be a great encouragement to this newbie~