Preface

In the previous article, we covered the methods and tools for analyzing jank, along with automated jank detection and its optimization. If you are not familiar with that material, please read “In-depth Exploration of Android Jank Optimization (Part 1)” first. This article is the second part of that exploration. Its main contents are as follows:

  • 1. ANR analysis and practice
  • 2. Detecting single-point jank problems
  • 3. Efficiently making pages open within a second
  • 4. Elegantly monitoring time-consuming blind spots
  • 5. Summary of jank optimization techniques
  • 6. Summary of solutions to common jank problems
  • 7. Common questions about jank optimization

A long enough stall inevitably triggers an ANR in the application, so we start today’s exploration with ANR analysis and practice.

I. ANR analysis and practice

1. ANR introduction and practice

First, let’s review the common types of ANR:

  • 1. KeyDispatchTimeout: a key event is not handled within 5 s.
  • 2. BroadcastTimeout: a broadcast receiver does not finish within 10 s in the foreground or 60 s in the background.
  • 3. ServiceTimeout: a service does not finish within 20 s in the foreground or 200 s in the background.

The exact timeouts are defined in AMS (ActivityManagerService):

// How long we allow a receiver to run before giving up on it.
static final int BROADCAST_FG_TIMEOUT = 10*1000;
static final int BROADCAST_BG_TIMEOUT = 60*1000;

// How long we wait until we timeout on key dispatching.
static final int KEY_DISPATCHING_TIMEOUT = 5*1000;

Next, let’s look at the ANR execution flow.

ANR execution process

  • 1. First, an ANR occurs in our application.
  • 2. Our process then receives the abort message and starts writing out the ANR information, that is, a snapshot of the application at that moment, including the stack traces of all threads, CPU usage, I/O status, and so on.
  • 3. Finally, an ANR dialog pops up asking whether to keep waiting or to quit the application. Note that this dialog does not always appear, because behavior varies by ROM: some phone manufacturers remove it by default to avoid a poor user experience.

Having walked through the ANR execution process, let’s look at how to resolve ANRs and where we can find a point of attack.

As mentioned above, when an ANR occurs, information about the scene is written to a file. Can we use this file to determine whether an ANR has occurred?

Troubleshooting ANRs from the ANR log was covered in the third section, on ANR optimization, of “In-depth Exploration of Android Stability Optimization”, so it is not repeated here.

Online ANR monitoring

We can watch the /data/anr/traces.txt file for changes to determine whether an ANR has occurred. However, on newer Android versions apps no longer have permission to read this file. The only workarounds are Google Play services overseas and Hardcoder in China, and this is obviously not realistic in China. So, is there a better way to achieve it?

That is ANR-WatchDog, which I will introduce in detail.

ANR-WatchDog project address

ANR-WatchDog is a non-invasive ANR monitoring component that can be used to monitor ANRs online. Next, we will use ANR-WatchDog to monitor ANRs.

First, add the following dependencies to our project’s app/build.gradle:

implementation 'com.github.anrwatchdog:anrwatchdog:1.4.0'

Then add the following code to the Application onCreate method to start anr-watchdog:

new ANRWatchDog().start();

As you can see, initialization is very simple, and so is the internal implementation: the entire library has only two classes, ANRWatchDog and ANRError.

Let’s take a look at how ANRWatchDog is implemented.

/**
* A watchdog timer thread that detects when the UI thread has frozen.
*/
public class ANRWatchDog extends Thread {

As you can see, ANRWatchDog extends Thread, so it is itself a thread, and the most important part of a thread is its run method, shown below:

private static final int DEFAULT_ANR_TIMEOUT = 5000;

private volatile long _tick = 0;
private volatile boolean _reported = false;

private final Runnable _ticker = new Runnable() {
    @Override public void run() {
        _tick = 0;
        _reported = false;
    }
};

@Override
public void run() {
    // 1. First, name the thread |ANR-WatchDog|.
    setName("|ANR-WatchDog|");

    // 2. Then read the timeout interval, which defaults to 5000 ms.
    long interval = _timeoutInterval;
    // 3. Then, inside the while loop, post the _ticker Runnable through _uiHandler.
    while (!isInterrupted()) {
        // 3.1 _tick defaults to 0, so needPost is true.
        boolean needPost = _tick == 0;
        _tick += interval;
        if (needPost) {
            _uiHandler.post(_ticker);
        }

        // Next, the thread sleeps for one interval (5000 ms by default).
        try {
            Thread.sleep(interval);
        } catch (InterruptedException e) {
            _interruptionListener.onInterrupted(e);
            return;
        }

        // 4. If the main thread has not run the Runnable (_tick was not reset to 0), an ANR has occurred.
        //    The _reported flag avoids reporting an ANR that has already been handled.
        if (_tick != 0 && !_reported) {
            //noinspection ConstantConditions
            if (!_ignoreDebugger && (Debug.isDebuggerConnected() || Debug.waitingForDebugger())) {
                Log.w("ANRWatchdog", "An ANR was detected but ignored because the debugger is connected (you can prevent this with setIgnoreDebugger(true))");
                _reported = true;
                continue;
            }

            interval = _anrInterceptor.intercept(_tick);
            if (interval > 0) {
                continue;
            }

            final ANRError error;
            if (_namePrefix != null) {
                error = ANRError.New(_tick, _namePrefix, _logThreadsWithoutStackTrace);
            } else {
                // 5. If _namePrefix is null, ANRError.NewMainOnly is used to build the error.
                error = ANRError.NewMainOnly(_tick);
            }
            // 6. Finally, onAppNotResponding is called on the ANRListener. The default listener
            //    simply throws the ANRError, which crashes the application.
            _anrListener.onAppNotResponding(error);
            interval = _timeoutInterval;
            _reported = true;
        }
    }
}

First, at comment 1, the thread is named |ANR-WatchDog|. At comment 2, the timeout interval is read; it defaults to 5000 ms. At comment 3, the _ticker Runnable is posted through _uiHandler inside the while loop. Because _tick is 0 by default, needPost is true. The thread then sleeps for one interval, 5000 ms by default. At comment 4, if the main thread has not run the Runnable, that is, _tick has not been reset to 0, an ANR has occurred; the _reported flag prevents an ANR that has already been handled from being reported again. When an ANR is detected, the code first handles the debugger case and then, as comment 5 shows, builds the error with ANRError.NewMainOnly when _namePrefix is null. The NewMainOnly method of ANRError looks like this:

/**
 * The minimum duration, in ms, for which the main thread has been blocked. May be more.
 */
public final long duration;

static ANRError NewMainOnly(long duration) {
    // 1. Get the main thread and its stack trace.
    final Thread mainThread = Looper.getMainLooper().getThread();
    final StackTraceElement[] mainStackTrace = mainThread.getStackTrace();

    // 2. Return an instance containing the main thread name, the main thread stack trace,
    //    and the minimum duration of the ANR.
    return new ANRError(new $(getThreadTitle(mainThread), mainStackTrace).new _Thread(null), duration);
}

As comment 1 shows, the main thread’s stack trace is captured first, and then an instance is returned containing the main thread’s name, its stack trace, and the minimum duration of the ANR. (At this point we could modify the source to attach more on-site information about the stall, such as CPU usage and scheduling information, memory information, and I/O and network information.)

Next, back at comment 6 in ANRWatchDog’s run method, onAppNotResponding is called on the ANRListener. The default listener simply throws the ANRError, which crashes the application. The corresponding code is as follows:

private static final ANRListener DEFAULT_ANR_LISTENER = new ANRListener() {
    @Override
    public void onAppNotResponding(ANRError error) {
        throw error;
    }
};

Now that we know how ANRWatchDog works, let’s try it out. First, in MainActivity, we make the floating action button’s click handler put the main thread to sleep for 10 seconds, as shown below:

@OnClick({R.id.main_floating_action_btn})
void onClick(View view) {
    switch (view.getId()) {
        case R.id.main_floating_action_btn:
            try {
                // Put the main thread to sleep for 10 seconds.
                Thread.sleep(10000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            jumpToTheTop();
            break;
        default:
            break;
    }
}

We then reinstall and run the project and tap the floating action button. The screen does not respond to clicks or touches for 10 seconds, and the application crashes right after those 10 seconds. Next, we filter Logcat with the keyword fatal to find the fatal error, as shown in the following log:

2020-01-18 09:55:53.459 29924-29969/? E/AndroidRuntime: FATAL EXCEPTION: |ANR-WatchDog|
Process: json.chao.com.wanandroid, PID: 29924
com.github.anrwatchdog.ANRError: Application Not Responding for at least 5000 ms.
Caused by: com.github.anrwatchdog.ANRError$$$_Thread: main (state = TIMED_WAITING)
    at java.lang.Thread.sleep(Native Method)
    at java.lang.Thread.sleep(Thread.java:373)
    at java.lang.Thread.sleep(Thread.java:314)
    // 1
    at json.chao.com.wanandroid.ui.main.activity.MainActivity.onClick(MainActivity.java:170)
    at json.chao.com.wanandroid.ui.main.activity.MainActivity_ViewBinding$1.doClick(MainActivity_ViewBinding.java:45)
    at butterknife.internal.DebouncingOnClickListener.onClick(DebouncingOnClickListener.java:22)
    at android.view.View.performClick(View.java:6311)
    at android.view.View$PerformClick.run(View.java:24833)
    at android.os.Handler.handleCallback(Handler.java:794)
    at android.os.Handler.dispatchMessage(Handler.java:99)
    at android.os.Looper.loop(Looper.java:173)
    at android.app.ActivityThread.main(ActivityThread.java:6653)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821)
Caused by: com.github.anrwatchdog.ANRError$$$_Thread: AndroidFileLogger./storage/emulated/0/Android/data/json.chao.com.wanandroid/log/ (state = RUNNABLE)

As you can see, the thread that crashed is |ANR-WatchDog|. Focus on comment 1: the stall is at line 170 of MainActivity’s onClick method, which is exactly where the main thread was put to sleep.

Next, let’s analyze how ANR-WatchDog works.

2. How ANR-WatchDog works

  • First, we call ANR-WatchDog’s start method, and its thread starts running.
  • The thread then increments a value and posts a message to the main thread’s Handler; when the main thread runs that message, it resets the value.
  • After posting, the watchdog thread sleeps for a while.
  • When it wakes up, it checks whether the value has been reset. If it has, the main thread executed the message and is not stalled; if it has not, the main thread is stalled.
  • Finally, ANR-WatchDog concludes that an ANR has occurred and throws an exception.
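
The steps above can be sketched in plain Java so that they run off-device. This is only a minimal sketch, not the library’s implementation: a single-thread executor stands in for the main thread and its Handler, and the class name TickWatchdog and its checkOnce method are made up for illustration.

```java
import java.util.concurrent.ExecutorService;

/** Plain-Java sketch of the tick-and-reset idea; the executor plays the role of the main thread. */
final class TickWatchdog {
    private final ExecutorService mainThread;
    private final long intervalMs;
    private volatile long tick = 0;

    TickWatchdog(ExecutorService mainThread, long intervalMs) {
        this.mainThread = mainThread;
        this.intervalMs = intervalMs;
    }

    /** Runs one check cycle and returns true if the "main thread" did not reset the tick in time. */
    boolean checkOnce() {
        boolean needPost = tick == 0;   // post only if the previous reset has already run
        tick += intervalMs;             // raise the flag
        if (needPost) {
            // The "main thread" clears the flag once it gets around to this task.
            mainThread.execute(() -> tick = 0);
        }
        try {
            Thread.sleep(intervalMs);   // give the main thread one interval to respond
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return tick != 0;               // still raised: the main thread is blocked
    }
}
```

If the executor is idle, checkOnce returns false; if a long task is queued ahead of the reset, the flag is still raised after the sleep and checkOnce returns true, which is the point at which the real library builds its ANRError.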

Finally, the workflow diagram for ANR-watchdog is shown below:

As described above, once a main-thread stall is detected, an ANRError is thrown and the application crashes. Obviously, we cannot ship this behavior to production. So, is there a way to customize what happens once a stall is detected?

In fact, ANR-WatchDog’s default behavior is itself just an ANRListener, and we can supply our own implementation. Through it we can handle ANR events however we like, for example by compressing the stack information, storing it locally, and uploading it to the APM backend at an appropriate time.
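
As a rough illustration of the “compress, store, upload later” idea, here is a plain-Java sketch of the compression step. StackArchiver and its methods are hypothetical names, not part of ANR-WatchDog; on Android, something like this would presumably be called from a custom listener registered through setANRListener, with storage and upload left to your own APM code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: turns a thread's stack into compressed bytes for later upload. */
final class StackArchiver {

    /** Formats a stack trace as text: the thread name, then one frame per line. */
    static String format(String threadName, StackTraceElement[] frames) {
        StringBuilder sb = new StringBuilder(threadName).append('\n');
        for (StackTraceElement frame : frames) {
            sb.append("\tat ").append(frame).append('\n');
        }
        return sb.toString();
    }

    /** GZIP-compresses the text so that it takes less space on disk and on the wire. */
    static byte[] compress(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reverses compress(); a backend would do the equivalent after receiving the upload. */
    static String decompress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Doing the formatting and compression off the main thread, and deferring the upload, keeps the handler itself from adding to the stall it is reporting.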

3. Summary

ANR-WatchDog is a non-invasive ANR monitoring solution that makes up for our lack of permission to read the traces.txt file on newer Android versions. Note that online we need to use the two approaches together.

Earlier we also covered AndroidPerformanceMonitor. How does it differ from ANR-WatchDog?

AndroidPerformanceMonitor monitors the execution of every message on the main thread: it records a timestamp before and after each message, and from these we can compute how long each message took. Note, however, that a single message usually runs for a very short time and rarely reaches ANR level. ANR-WatchDog, by contrast, does not care how the application executes; it only looks at the outcome. After sleeping for 5 s, it checks whether the main thread has reset the value. If it has, no ANR occurred; otherwise, one did.
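
To make the contrast concrete, the per-message approach can be sketched in plain Java. This is an illustration of the idea only, not AndroidPerformanceMonitor’s code: MessageTimer is a hypothetical class, the clock is injected so the sketch can run without a real Looper, and the threshold value is whatever the caller chooses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical sketch: time every message and remember the ones that exceed a threshold. */
final class MessageTimer {
    private final LongSupplier clockMs;
    private final long thresholdMs;
    private final List<String> slowMessages = new ArrayList<>();

    MessageTimer(LongSupplier clockMs, long thresholdMs) {
        this.clockMs = clockMs;
        this.thresholdMs = thresholdMs;
    }

    /** Takes a timestamp before and after the message and records it if it ran too long. */
    void dispatch(String name, Runnable message) {
        long start = clockMs.getAsLong();
        message.run();
        long cost = clockMs.getAsLong() - start;
        if (cost > thresholdMs) {
            slowMessages.add(name + " took " + cost + " ms");
        }
    }

    List<String> slowMessages() {
        return slowMessages;
    }
}
```

Unlike the watchdog, this sees every message, which is why it suits jank detection; the watchdog only samples the outcome once per interval, which is why it suits ANR-level stalls.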

From how the two libraries work, we can determine their use cases. AndroidPerformanceMonitor is suited to monitoring jank, because each individual message does not take long. ANR-WatchDog is better suited as a supplement for ANR monitoring.

In addition, although ANR-WatchDog solves the problem of not being able to read /data/anr/traces.txt on newer systems, collecting all thread stacks and other information in the Java layer can be time consuming. That makes it a poor fit for a jank scenario, because it may make the user’s stall worse. For applications with demanding performance requirements, the stacks of all threads can be obtained by hooking the native layer, in the following two steps:

  • Use dlsym on libart.so to obtain and call ThreadList::ForEach, which gives access to all native thread objects.
  • Traverse the list of thread objects and call Thread::DumpState on each.

This roughly simulates how the system prints the ANR log. However, because hooking is involved, exceptions or even crashes may occur, so the work should be done in a forked child process. Collecting stack information in a child process also does not block the main process at all.

In this case, you need to specify /proc/[parent process ID] to retrieve the stack information of the main application process.

With Native Hook, we get a jank monitoring system that “losslessly” captures the stacks and details of all Java threads. To reduce the volume of reported data, it is recommended to use this approach only when the main Java thread’s state is WAITING, TIMED_WAITING, or BLOCKED.
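
The state filter in the last sentence is easy to express with java.lang.Thread.State. The sketch below assumes a hypothetical DumpPolicy class; how the main thread’s state is obtained and what the dump itself does are outside its scope.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical policy: only collect the expensive all-thread dump for suspicious states. */
final class DumpPolicy {
    private static final Set<Thread.State> SUSPICIOUS =
            EnumSet.of(Thread.State.WAITING, Thread.State.TIMED_WAITING, Thread.State.BLOCKED);

    /** Returns true if the main thread's state suggests it is stuck waiting on something. */
    static boolean shouldDumpAllThreads(Thread.State mainThreadState) {
        return SUSPICIOUS.contains(mainThreadState);
    }
}
```

A main thread that is RUNNABLE is busy with its own work, so its own stack usually explains the stall; the states above mean it is waiting on another thread, which is when the other threads’ stacks become useful.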

II. Detecting single-point jank problems

In addition to automated jank and ANR monitoring, we also need single-point jank detection, because the two approaches above cannot cover every scenario. Here is a small example:

Suppose there are many messages to execute, but each one takes less than the jank threshold. The automated jank detection cannot detect this, yet the user still feels that the App is janky.
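
A little arithmetic shows why this slips through. The sketch below uses made-up numbers (50 messages of 40 ms each against a 100 ms per-message threshold) and a hypothetical CumulativeJank class; none of these figures come from a real trace.

```java
import java.util.Arrays;

/** Hypothetical illustration: every message is under the threshold, yet the total is large. */
final class CumulativeJank {

    /** What a per-message detector checks: is any single message over the threshold? */
    static boolean anyOverThreshold(long[] costsMs, long thresholdMs) {
        for (long cost : costsMs) {
            if (cost > thresholdMs) {
                return true;
            }
        }
        return false;
    }

    /** What the user experiences: the main thread is busy for the sum of all messages. */
    static long totalMs(long[] costsMs) {
        long total = 0;
        for (long cost : costsMs) {
            total += cost;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] costs = new long[50];
        Arrays.fill(costs, 40);  // 50 back-to-back messages, 40 ms each
        System.out.println("any message over 100 ms? " + anyOverThreshold(costs, 100));
        System.out.println("total busy time in ms: " + totalMs(costs));
    }
}
```

With these numbers no message crosses the threshold, but the main thread is occupied for 50 × 40 = 2000 ms in total, which is why single-point detection is needed alongside threshold-based detection.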

In addition, in order to build a systematic monitoring solution, we need to expose problems as much as possible before we go live.

1. IPC single-point problem detection

Common single-point problems include main-thread IPC, DB operations, and so on. Take main-thread IPC as an example. IPC is actually a time-consuming operation, but in day-to-day development we may not pay enough attention to it and end up making frequent IPC calls on the main thread, even though each call may not reach the jank threshold we have set. So, which metrics should we monitor for IPC problems?

  • 1. IPC call type, such as calls to PackageManager or TelephonyManager.
  • 2. The number of calls and the time each one takes.
  • 3. The IPC call stack (showing which line of code made the call) and the thread it occurred on.
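
The three metrics above map naturally onto a small aggregation structure. The following is a minimal sketch under assumed names (IpcStats, Entry, record); it only shows what one might store, not how the calls are intercepted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical aggregator for the IPC metrics: type, count, time, stack, and thread. */
final class IpcStats {

    /** Aggregated figures for one call type, e.g. "PackageManager". */
    static final class Entry {
        int count;
        long totalMs;
        String lastThread;
        StackTraceElement[] lastStack;
    }

    private final Map<String, Entry> byType = new LinkedHashMap<>();

    /** Records one IPC call of the given type, with its duration, thread, and call stack. */
    synchronized void record(String type, long costMs, Thread thread, StackTraceElement[] stack) {
        Entry entry = byType.computeIfAbsent(type, key -> new Entry());
        entry.count++;
        entry.totalMs += costMs;
        entry.lastThread = thread.getName();
        entry.lastStack = stack;
    }

    synchronized int count(String type) {
        Entry entry = byType.get(type);
        return entry == null ? 0 : entry.count;
    }

    synchronized long totalMs(String type) {
        Entry entry = byType.get(type);
        return entry == null ? 0 : entry.totalMs;
    }

    synchronized String lastThread(String type) {
        Entry entry = byType.get(type);
        return entry == null ? null : entry.lastThread;
    }
}
```

Keeping only the most recent stack per type is a deliberate simplification to bound memory; a real tool might instead keep the slowest call or group by distinct stacks.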

Conventional approach

The conventional approach is to add instrumentation points before and after each IPC call. This is inelegant, though: in day-to-day development we often forget why a given instrumentation point exists, and maintaining them is costly.

Next, let’s talk about IPC problem monitoring techniques.

IPC problem monitoring techniques

Offline, we can monitor with adb commands, as shown below:

// 1. Start monitoring IPC operations.
adb shell am trace-ipc start
// 2. Stop monitoring and dump the collected information to the specified file.
adb shell am trace-ipc stop --dump-file /data/local/tmp/ipc-trace.txt
// 3. Pull the ipc-trace file to the computer for inspection.
adb pull /data/local/tmp/ipc-trace.txt

Those who have read “In-depth Exploration of Android Layout Optimization (Part 1)” may recall two elegant ways to do this: ARTHook and AspectJ. Here we need to monitor IPC operations, so which one should we choose? (For ARTHook, we use Epic.)

To answer that, we need to understand the ideas behind ARTHook and AspectJ. ARTHook can hook system methods: we cannot modify system code, but we can hook one of its methods and add code around its body. AspectJ, by contrast, only works on non-system methods, that is, our App’s own source code or the jar and AAR packages we depend on, because AspectJ works by inserting code into our specific methods. It therefore cannot touch system methods, so we need ARTHook to monitor IPC operations.

Before using ARTHook to monitor IPC operations, let’s first consider what counts as an IPC operation.

For example, using PackageManager to get information about our application, or obtaining the device’s DeviceId or AMS-related information, all involve IPC, and these operations perform IPC in a fixed way: they all end up calling android.os.BinderProxy. Next, let’s look at its transact method, shown below:

public boolean transact(int code, Parcel data, Parcel reply, int flags) throws RemoteException {

The first parameter is an int action code between FIRST_CALL_TRANSACTION and LAST_CALL_TRANSACTION. The second and third parameters are both of type Parcel. The fourth parameter is an int flag: 0 indicates a normal IPC call, and FLAG_ONEWAY indicates a one-way IPC call. We then use ARTHook in the Application’s onCreate method to hook the transact method of the android.os.BinderProxy class, as follows:

try {
        DexposedBridge.findAndHookMethod(Class.forName("android.os.BinderProxy"), "transact",
                int.class, Parcel.class, Parcel.class, int.class, new XC_MethodHook() {
                    @Override
                    protected void beforeHookedMethod(MethodHookParam param) throws Throwable {
                        LogHelper.i( "BinderProxy beforeHookedMethod " + param.thisObject.getClass().getSimpleName()
                                + "\n" + Log.getStackTraceString(new Throwable()));
                        super.beforeHookedMethod(param);
                    }
                });
    } catch (ClassNotFoundException e) {
        e.printStackTrace();
    }
    
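
Inside such a hook, the code and flags arguments of transact could also be classified before logging. The sketch below is plain Java so that it runs off-device: TransactInfo is a hypothetical helper, and its three constants are copied by value from android.os.IBinder rather than referenced, which is an assumption you should check against the SDK you build with.

```java
/** Hypothetical helper that interprets the code and flags arguments of transact. */
final class TransactInfo {
    // Values mirrored from android.os.IBinder so the sketch compiles without the Android SDK.
    static final int FIRST_CALL_TRANSACTION = 0x00000001;
    static final int LAST_CALL_TRANSACTION = 0x00ffffff;
    static final int FLAG_ONEWAY = 0x00000001;

    /** True if the code lies in the range reserved for user-defined transactions. */
    static boolean isUserCode(int code) {
        return code >= FIRST_CALL_TRANSACTION && code <= LAST_CALL_TRANSACTION;
    }

    /** True if the caller does not wait for a reply. */
    static boolean isOneWay(int flags) {
        return (flags & FLAG_ONEWAY) != 0;
    }

    /** Builds a short description suitable for a log line. */
    static String describe(int code, int flags) {
        return "code=" + code
                + (isUserCode(code) ? " (user)" : " (reserved)")
                + (isOneWay(flags) ? ", one-way" : ", normal");
    }
}
```

The distinction matters for jank analysis because a normal call blocks the calling thread until the reply arrives, while a one-way call to another process returns without waiting for a result.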

After the application is reinstalled, the following Log information is displayed:

2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ WanAndroidApp$1.beforeHookedMethod  (WanAndroidApp.java:160)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │    LogHelper.i  (LogHelper.java:37)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: ├┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ [WanAndroidApp.java | 160 | beforeHookedMethod] BinderProxy beforeHookedMethod BinderProxy
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ java.lang.Throwable
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at json.chao.com.wanandroid.app.WanAndroidApp$1.beforeHookedMethod(WanAndroidApp.java:160)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at com.taobao.android.dexposed.DexposedBridge.handleHookedArtMethod(DexposedBridge.java:237)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at me.weishu.epic.art.entry.Entry64.onHookBoolean(Entry64.java:72)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at me.weishu.epic.art.entry.Entry64.referenceBridge(Entry64.java:237)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at me.weishu.epic.art.entry.Entry64.booleanBridge(Entry64.java:86)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.os.ServiceManagerProxy.getService(ServiceManagerNative.java:123)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.os.ServiceManager.getService(ServiceManager.java:56)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.os.ServiceManager.getServiceOrThrow(ServiceManager.java:71)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.UiModeManager.<init>(UiModeManager.java:127)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.SystemServiceRegistry$42.createService(SystemServiceRegistry.java:511)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.SystemServiceRegistry$42.createService(SystemServiceRegistry.java:509)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.SystemServiceRegistry$CachedServiceFetcher.getService(SystemServiceRegistry.java:970)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.SystemServiceRegistry.getSystemService(SystemServiceRegistry.java:920)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.ContextImpl.getSystemService(ContextImpl.java:1677)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.view.ContextThemeWrapper.getSystemService(ContextThemeWrapper.java:171)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.Activity.getSystemService(Activity.java:6003)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.support.v7.app.AppCompatDelegateImplV23.<init>(AppCompatDelegateImplV23.java:33)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.support.v7.app.AppCompatDelegateImplN.<init>(AppCompatDelegateImplN.java:31)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.support.v7.app.AppCompatDelegate.create(AppCompatDelegate.java:198)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.support.v7.app.AppCompatDelegate.create(AppCompatDelegate.java:183)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.support.v7.app.AppCompatActivity.getDelegate(AppCompatActivity.java:519)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.support.v7.app.AppCompatActivity.onCreate(AppCompatActivity.java:70)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at me.yokeyword.fragmentation.SupportActivity.onCreate(SupportActivity.java:38)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at json.chao.com.wanandroid.base.activity.AbstractSimpleActivity.onCreate(AbstractSimpleActivity.java:29)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at json.chao.com.wanandroid.base.activity.BaseActivity.onCreate(BaseActivity.java:37)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.Activity.performCreate(Activity.java:7098)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.Activity.performCreate(Activity.java:7089)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1215)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2770)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2895)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.ActivityThread.-wrap11(Unknown Source:0)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1616)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.os.Handler.dispatchMessage(Handler.java:106)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.os.Looper.loop(Looper.java:173)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at android.app.ActivityThread.main(ActivityThread.java:6653)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at java.lang.reflect.Method.invoke(Native Method)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:547)
2020-01-22 19:52:47.657 10683-10683/json.chao.com.wanandroid I/WanAndroid-LOG: │ 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:821)

As you can see, this prints the full stack for an IPC call in the application. In this case, the call to ServiceManager’s getService method, which is an IPC call, originates from AbstractSimpleActivity’s onCreate method.

In this way we can easily capture every IPC operation in the application, along with its type, duration, number of calls, call stack, and so on. Of course, besides IPC calls, single-point problems such as I/O, DB, and View drawing also need corresponding detection schemes.

2. Building the jank detection solution

To build the jank detection solution, we mainly use ARTHook to round out the offline detection tools, hooking as many relevant operations as possible in order to expose and analyze problems. This makes a systematic jank solution easier to put into practice.

III. How to make pages open within a second?

How fast a page opens matters a great deal for user experience. So how do we make pages open within a second?

In fact, making a page open within a second is a small-scale startup optimization, and it can borrow ideas from startup speed optimization and layout optimization.

1. Making pages open within a second

First, we can use Systrace to observe how the CPU is doing, for example whether it is fully utilized, and apply the elegant asynchronous and delayed initialization schemes we learned in startup optimization. Second, for the page layout, we can use asynchronous inflation, X2C, and other drawing optimizations. Finally, we can preload the page’s data to avoid waiting on network or disk I/O, or put the data-fetching call on the first line of the onCreate method.

So how do we measure how fast a page opens?

Usually we track page open speed through the second-open rate, computed from the time between onCreate and onWindowFocusChanged. In some scenarios, though, onWindowFocusChanged is not an accurate end point for a page open. In that case we can define a dedicated interface for our Activity or Fragment to implement and treat a method on that interface as the end point.
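
The measurement itself can be sketched in plain Java. PageOpenTracker and its method names are hypothetical, the clock is injected so the sketch runs without Android, and the rate is assumed here to mean the share of page opens that finish within a chosen budget such as 1000 ms; adjust the definition to whatever your metrics platform uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical tracker: measures onCreate-to-shown time and the share of opens within budget. */
final class PageOpenTracker {
    private final LongSupplier clockMs;
    private final long budgetMs;
    private final Map<String, Long> startTimes = new HashMap<>();
    private int opens;
    private int opensWithinBudget;

    PageOpenTracker(LongSupplier clockMs, long budgetMs) {
        this.clockMs = clockMs;
        this.budgetMs = budgetMs;
    }

    /** Call at the start of the page's onCreate. */
    void onCreate(String page) {
        startTimes.put(page, clockMs.getAsLong());
    }

    /** Call at the chosen end point; returns the elapsed ms, or -1 if no start was recorded. */
    long onShown(String page) {
        Long start = startTimes.remove(page);
        if (start == null) {
            return -1;  // e.g. a repeated focus change after the first one was counted
        }
        long cost = clockMs.getAsLong() - start;
        opens++;
        if (cost <= budgetMs) {
            opensWithinBudget++;
        }
        return cost;
    }

    /** Share of recorded opens that finished within the budget, or 0 if none were recorded. */
    double withinBudgetRate() {
        return opens == 0 ? 0.0 : (double) opensWithinBudget / opens;
    }
}
```

Removing the start time when the end point fires means later focus changes on the same page are ignored, so regaining focus does not inflate the numbers.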

So, is there a more elegant way to implement the second-open measurement described above?

That is Lancet.

2. Lancet

Lancet is a lightweight Android AOP framework with the following advantages:

  • 1. Fast compilation, with support for incremental compilation.
  • 2. A simple API, and no extra code inserted into the APK. (This is critical for package size optimization.)

Next, a brief look at how to use Lancet. Lancet provides annotations for hooking, as follows:

  • @Proxy: usually used to hook calls to system APIs.
  • @Insert: often used to operate on classes in the App or in libraries.

Now let’s put Lancet into practice.

First, add the following dependency to build.gradle in the project root directory:

dependencies {
    classpath 'me.ele:lancet-plugin:1.0.5'
}

Then, add the following to build.gradle in the app directory:

apply plugin: 'me.ele.lancet'

dependencies {
    compileOnly 'me.…

Next, we can use Lancet. We create a new class dedicated to hook operations, as shown below:

public class ActivityHooker {

    @Proxy("i")
    @TargetClass("android.util.Log")
    public static int i(String tag, String msg) {
        msg = msg + "JsonChao";
        return (int) Origin.call();
    }
}

The method above hooks the i method of android.util.Log and appends the string “JsonChao” to every msg. Note that we need to copy the signature of the i method from android.util.Log so that the method name and parameters match exactly. The @TargetClass and @Proxy annotations on the method specify the fully qualified class name and the method name, respectively. Finally, we call the call method of the Origin class provided by Lancet to invoke the original method and return its result. After that, we rerun the project and the following log output appears:

2020-01-23 13:13:34.124 7277-7277/json.chao.com.wanandroid I/MultiDex: VM with version 2.1.0 has multidex supportJsonChao
2020-01-23 13:13:34.124 7277-7277/json.chao.com.wanandroid I/MultiDex: Installing applicationJsonChao

As you can see, the string was appended to the end of each log message, which shows that the hook succeeded. Now we can use Lancet to measure the project’s page second-open rate, as shown in the following code:

public static ActivityRecord sActivityRecord;

static {
    sActivityRecord = new ActivityRecord();
}

@Insert(value = "onCreate", mayCreateSuper = true)
@TargetClass(value = "android.support.v7.app.AppCompatActivity", scope = Scope.ALL)
protected void onCreate(Bundle savedInstanceState) {
    sActivityRecord.mOnCreateTime = System.currentTimeMillis();
    // Call the original logic from within the current hook method.
    Origin.callVoid();
}

@Insert(value = "onWindowFocusChanged", mayCreateSuper = true)
@TargetClass(value = "android.support.v7.app.AppCompatActivity", scope = Scope.ALL)
public void onWindowFocusChanged(boolean hasFocus) {
    sActivityRecord.mOnWindowsFocusChangedTime = System.currentTimeMillis();
    LogHelper.i(getClass().getCanonicalName() + " onWindowFocusChanged cost "
            + (sActivityRecord.mOnWindowsFocusChangedTime - sActivityRecord.mOnCreateTime));
    Origin.callVoid();
}

Above, we use the @TargetClass and @Insert annotations to hook the onCreate and onWindowFocusChanged methods of android.support.v7.app.AppCompatActivity. Note that the @Insert annotation accepts two parameters; its source code looks like this:

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface Insert {
    String value();

    boolean mayCreateSuper() default false;
}

Setting the second parameter, mayCreateSuper, to true means that if the target class does not override the parent class's method, an override is generated for it. For the @Insert methods in ActivityHooker, this means that if an Activity does not override onCreate or onWindowFocusChanged, an override is created for it automatically. This avoids the hook failing because some Activities do not declare these methods.

Next, notice that @TargetClass also accepts two parameters; its source code is as follows:

@Retention(RetentionPolicy.RUNTIME)
@java.lang.annotation.Target({ElementType.TYPE, ElementType.METHOD})
public @interface TargetClass {
    String value();

    Scope scope() default Scope.SELF;
}

The value specified by the second parameter scope is an enumeration, and the optional values are as follows:

public enum Scope {

    SELF,
    DIRECT,
    ALL,
    LEAF
}

Scope.SELF matches only the class specified by value. Scope.DIRECT matches the direct subclasses of that class. Scope.ALL matches all of its subclasses; since we set value to android.support.v7.app.AppCompatActivity and scope to Scope.ALL, the hook matches every subclass of AppCompatActivity. Finally, Scope.LEAF matches the leaf subclasses of the specified class. Because Java uses single inheritance, the inheritance relationship forms a tree, so LEAF refers to all leaf nodes of the inheritance tree rooted at the specified class.
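
To make the four scopes concrete, here is a minimal plain-Java sketch that models them over a toy inheritance tree. It is not Lancet's implementation: ScopeDemo, match, and the class names in the tree are hypothetical and only illustrate which classes each scope would select under the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ScopeDemo {
    enum Scope { SELF, DIRECT, ALL, LEAF }

    // child -> parent edges of a toy single-inheritance tree (hypothetical class names)
    static final Map<String, String> PARENT = new LinkedHashMap<>();
    static {
        PARENT.put("BaseActivity", "AppCompatActivity");
        PARENT.put("MainActivity", "BaseActivity");
        PARENT.put("SplashActivity", "BaseActivity");
        PARENT.put("AboutActivity", "AppCompatActivity");
    }

    // True if root appears anywhere on the parent chain of cls
    static boolean isDescendant(String cls, String root) {
        for (String p = PARENT.get(cls); p != null; p = PARENT.get(p)) {
            if (p.equals(root)) return true;
        }
        return false;
    }

    // True if some class in the tree extends cls
    static boolean hasChildren(String cls) {
        return PARENT.containsValue(cls);
    }

    // Returns the classes a hook on target would match for the given scope
    static Set<String> match(String target, Scope scope) {
        Set<String> result = new TreeSet<>();
        if (scope == Scope.SELF) {
            result.add(target);
            return result;
        }
        for (String cls : PARENT.keySet()) {
            boolean direct = target.equals(PARENT.get(cls));
            switch (scope) {
                case DIRECT: if (direct) result.add(cls); break;
                case ALL: if (isDescendant(cls, target)) result.add(cls); break;
                case LEAF: if (isDescendant(cls, target) && !hasChildren(cls)) result.add(cls); break;
                default: break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Scope s : Scope.values()) {
            System.out.println(s + " -> " + match("AppCompatActivity", s));
        }
    }
}
```

In this toy tree, BaseActivity is matched by DIRECT and ALL but not by LEAF, because MainActivity and SplashActivity extend it.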

Finally, we set up an ActivityRecord class to record the timestamps of onCreate and onWindowFocusChanged, as shown below:

public class ActivityRecord {

    public boolean isNewCreate;

    public long mOnCreateTime;
    public long mOnWindowsFocusChangedTime;
}

The page open time is sActivityRecord.mOnWindowsFocusChangedTime - sActivityRecord.mOnCreateTime. Running the project produces the following log output:

2020-01-23 14:12:16.406 15098-15098/json.chao.com.wanandroid I/WanAndroid-LOG: │ [null | 57 | json_chao_com_wanandroid_aop_ActivityHooker_onWindowFocusChanged] json.chao.com.wanandroid.ui.main.activity.SplashActivity onWindowFocusChanged cost 257
2020-01-23 14:12:18.930 15098-15098/json.chao.com.wanandroid I/WanAndroid-LOG: │ [null | 57 | json_chao_com_wanandroid_aop_ActivityHooker_onWindowFocusChanged] json.chao.com.wanandroid.ui.main.activity.MainActivity onWindowFocusChanged cost 608

SplashActivity and MainActivity took 257 ms and 608 ms to open, respectively.

Finally, let's look at the monitoring dimensions for page open speed.

3. Monitoring dimensions for instant page opening

Page open monitoring covers the following three dimensions:

  • Overall time
  • Lifecycle method time
  • Lifecycle interval time

First, we monitor the overall time it takes to open a page, that is, the time from onCreate to onWindowFocusChanged. For special pages where we need to know exactly when the page counts as open, we can define a custom interface to mark that point. Second, we monitor the time spent in each lifecycle method, such as onCreate, onStart, and onResume. Finally, we monitor the intervals between lifecycle methods, which are often overlooked. For example, the time between the end of onCreate and the start of onStart is also lost time, so we can check whether it stays within a reasonable range. By monitoring these three dimensions, we can examine every aspect of a page at a fine granularity.
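
The three dimensions can be sketched in plain Java as simple arithmetic over recorded timestamps. PageTimeline and its methods are hypothetical names, and the timestamps in main are made up for illustration; in a real app they would come from hooks such as the ones shown earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PageTimeline {
    // lifecycle method name -> {startMillis, endMillis}, kept in call order
    private final Map<String, long[]> spans = new LinkedHashMap<>();

    void record(String method, long startMillis, long endMillis) {
        spans.put(method, new long[] {startMillis, endMillis});
    }

    // Dimension 1: overall time, from the start of the first method to the start of the last
    long overall(String first, String last) {
        return spans.get(last)[0] - spans.get(first)[0];
    }

    // Dimension 2: time spent inside one lifecycle method
    long cost(String method) {
        long[] s = spans.get(method);
        return s[1] - s[0];
    }

    // Dimension 3: gap between the end of one method and the start of a later one
    long interval(String earlier, String later) {
        return spans.get(later)[0] - spans.get(earlier)[1];
    }

    public static void main(String[] args) {
        PageTimeline t = new PageTimeline();
        // Made-up timestamps in milliseconds, for illustration only
        t.record("onCreate", 0, 120);
        t.record("onStart", 150, 160);
        t.record("onResume", 165, 200);
        t.record("onWindowFocusChanged", 600, 601);
        System.out.println("overall = " + t.overall("onCreate", "onWindowFocusChanged"));
        System.out.println("onCreate cost = " + t.cost("onCreate"));
        System.out.println("onCreate -> onStart gap = " + t.interval("onCreate", "onStart"));
    }
}
```

With these sample values, the 400 ms between the end of onResume and onWindowFocusChanged is exactly the kind of interval that per-method timing alone would not reveal.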

4. Elegant monitoring of time-consuming blind areas

Although we monitor a lot of time intervals in the application, there are some time intervals that we haven’t captured yet, such as the interval between onResume and list display, which can be easily overlooked in our statistics. Here’s an example:

If we post a message during the Activity's lifecycle, that message may well perform a time-consuming operation. Do you know how long it actually takes to execute? If the message takes 1 s, the list is displayed 1 s later. If it takes 200 ms, the automatic lag detection we set up will not catch it, yet the list is still displayed 200 ms later.

In fact, this kind of scenario is very common, and we will practice it in the project.

First, we add a post message to the onCreate of the MainActivity, which simulates a 1000ms delay for a time-consuming operation as follows:

new Handler().post(() -> {
    LogHelper.i("Msg execution");
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
});

Then, in the RecyclerView's Adapter, we log the moment the list is displayed, as shown below:

if (helper.getLayoutPosition() == 1 && !mHasRecorded) {
        mHasRecorded = true;
        helper.getView(R.id.item_search_pager_group).getViewTreeObserver().addOnPreDrawListener(new ViewTreeObserver.OnPreDrawListener() {
            @Override
            public boolean onPreDraw() {
                helper.getView(R.id.item_search_pager_group).getViewTreeObserver().removeOnPreDrawListener(this);
                LogHelper.i("FeedShow");
                return true;
            }
        });
    }

Finally, let’s rerun the project to see the execution time of both. The log information is as follows:

2020-01-23 15:21:55.076 19091-19091/json.chao.com.wanandroid I/WanAndroid-LOG: │ [MainActivity.java | 108 | lambda$initEventAndData$1$MainActivity] Msg execution
2020-01-23 15:21:56.264 19091-19091/json.chao.com.wanandroid I/WanAndroid-LOG: │ [null | 57 | json_chao_com_wanandroid_aop_ActivityHooker_onWindowFocusChanged] json.chao.com.wanandroid.ui.main.activity.MainActivity onWindowFocusChanged cost 1585
2020-01-23 15:21:57.207 19091-19091/json.chao.com.wanandroid I/WanAndroid-LOG: │ ArticleListAdapter$1.onPreDraw (ArticleListAdapter.java:93)
2020-01-23 15:21:57.208 19091-19091/json.chao.com.wanandroid I/WanAndroid-LOG: │ LogHelper.i (LogHelper.java:37)
2020-01-23 15:21:57.208 19091-19091/json.chao.com.wanandroid I/WanAndroid-LOG: ├┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
2020-01-23 15:21:57.208 19091-19091/json.chao.com.wanandroid I/WanAndroid-LOG: │ [ArticleListAdapter.java | 93 | onPreDraw] FeedShow

As the log shows, MainActivity's onWindowFocusChanged was called about 1000 ms later than before (a cost of 1585 ms instead of 608 ms), and the list was displayed about 1000 ms later as well. In other words, the posted message was executed before the page and the list were displayed. Any developer may post to a Handler in some lifecycle method or phase, and so may third-party SDKs, so such messages are quite likely to run before our page or list is displayed. This kind of time-consuming blind area is very common and hard to troubleshoot. Next, let's analyze the difficulties of monitoring time-consuming blind areas.

1. Difficulty in time-consuming blind area monitoring

First, fine-grained monitoring lets us capture some time-consuming blind areas, but it does not tell us what was running during them. Second, blind areas that occur online, in production, cannot be investigated directly.

Here, we first look at how to set up an offline solution for time-consuming blind spot monitoring.

2. Offline scheme for time-consuming blind area monitoring

Here we can use TraceView directly, because it clearly records what each thread does over a given period, which makes it well suited to monitoring a blind area within a specific time range.

Then, let’s look at how to set up an online solution for time-consuming blind spot monitoring.

3. Online scheme for time-consuming blind area monitoring

We know that all work on the main thread is carried out through messages. Recall the AndroidPerformanceMonitor library we studied earlier: could we detect blind areas through mLogging in the same way? mLogging does tell us about the messages on the main thread, but it cannot give us the relevant call stack. The stack it captures comes from the system callback, so it does not know who posted the current message. This scheme is therefore not ideal.

So, can we use AOP to weave into the Handler's send methods, such as sendMessage and sendMessageDelayed? That tells us where each message was sent and gives us the corresponding stack, but it does not give us the execution time of the message. If we want to know which messages were executed between onResume and the list being displayed, AOP on the send methods alone cannot tell us either.

The final online scheme is therefore to use a unified Handler that overrides two methods, sendMessageAtTime and dispatchMessage. Nearly every send or post variant ends up calling sendMessageAtTime (sendMessageAtFrontOfQueue is the exception, since it enqueues the message directly), and every message is ultimately handled through dispatchMessage. We then need a custom Gradle plugin to wire in our Handler automatically: at compile time, it replaces the parent class of every class that extends Handler with our custom Handler. This way, every message sent and handled in the project passes through our two overridden methods. Next, let's put this into practice.

First, I present the custom global Handler class as follows:

public class GlobalHandler extends Handler {

    private long mStartTime = System.currentTimeMillis();

    public GlobalHandler() {
        super(Looper.myLooper(), null);
    }

    public GlobalHandler(Callback callback) {
        super(Looper.myLooper(), callback);
    }

    public GlobalHandler(Looper looper, Callback callback) {
        super(looper, callback);
    }

    public GlobalHandler(Looper looper) {
        super(looper);
    }

    @Override
    public boolean sendMessageAtTime(Message msg, long uptimeMillis) {
        boolean send = super.sendMessageAtTime(msg, uptimeMillis);
        // 1
        if (send) {
            GetDetailHandlerHelper.getMsgDetail().put(msg, Log.getStackTraceString(new Throwable()).replace("java.lang.Throwable", ""));
        }
        return send;
    }

    @Override
    public void dispatchMessage(Message msg) {
        mStartTime = System.currentTimeMillis();
        super.dispatchMessage(msg);

        // Always drop the stored stack, so entries for messages handled
        // off the main thread do not accumulate in the map.
        String trace = GetDetailHandlerHelper.getMsgDetail().remove(msg);
        if (trace != null && Looper.myLooper() == Looper.getMainLooper()) {
            JSONObject jsonObject = new JSONObject();
            try {
                // 2
                jsonObject.put("Msg_Cost", System.currentTimeMillis() - mStartTime);
                jsonObject.put("MsgTrace", msg.getTarget() + " " + trace);

                // 3
                LogHelper.i("MsgDetail " + jsonObject.toString());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

The GlobalHandler above will be a parent of all the handlers in our project. In comment 1, the sendMessageAtTime method determines that if the message is sent successfully, the stack information corresponding to the current Message object will be stored in a ConcurrentHashMap. The code for the GetDetailHandlerHelper class is as follows:

public class GetDetailHandlerHelper {

    private static ConcurrentHashMap<Message, String> sMsgDetail = new ConcurrentHashMap<>();

    public static ConcurrentHashMap<Message, String> getMsgDetail() {
        return sMsgDetail;
    }
}

This way, we know who sent the message. Then, in dispatchMessage, we calculate how long the message took to process and store it in a JSONObject at comment 2. We also fetch the stack for this message from the ConcurrentHashMap in GetDetailHandlerHelper and print both to the log at comment 3. For online monitoring, this information would be saved locally and uploaded at a suitable time. Finally, we can add a check in the method body: set a threshold, for example 20 ms, and report the saved information to the APM backend only when the message cost exceeds it.
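
The threshold check mentioned above can be sketched outside Android with a plain Runnable wrapper. This is only an analogy to GlobalHandler, not part of the project: TrackedTask, REPORTS, and THRESHOLD_MS are hypothetical names, and the list stands in for whatever queue would upload to the APM backend.

```java
import java.util.ArrayList;
import java.util.List;

class TrackedTask implements Runnable {
    static final long THRESHOLD_MS = 20;                    // example threshold from the text
    static final List<String> REPORTS = new ArrayList<>();  // stand-in for an APM upload queue

    private final Runnable delegate;
    private final String sendSite;                          // where the task was created

    TrackedTask(Runnable delegate) {
        this.delegate = delegate;
        // Capture the caller at creation time, like sendMessageAtTime in GlobalHandler.
        // Frame 0 is this constructor, frame 1 is the code that created the task.
        StackTraceElement[] stack = new Throwable().getStackTrace();
        this.sendSite = stack.length > 1 ? stack[1].toString() : "unknown";
    }

    @Override
    public void run() {
        long start = System.nanoTime();
        delegate.run();
        long costMs = (System.nanoTime() - start) / 1_000_000;
        // Report only executions that exceed the threshold
        if (costMs > THRESHOLD_MS) {
            REPORTS.add("cost=" + costMs + "ms sentFrom=" + sendSite);
        }
    }

    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        new TrackedTask(() -> { }).run();        // fast task: not reported
        new TrackedTask(() -> sleep(50)).run();  // slow task: reported
        REPORTS.forEach(System.out::println);
    }
}
```

Filtering at the point of measurement keeps the volume of reported data small, at the cost of losing visibility into many short messages that add up; the threshold is a trade-off to tune per project.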

After using the Gradle plugin to replace all of the handler’s parent classes with our custom GlobalHandler, we can gracefully monitor the time gaps in our application.

To implement the Gradle plugin that replaces the Handler parent class globally, besides an AspectJ-based implementation, an existing project is recommended: DroidAssist.

Then, rerun the project with the key log information shown below:

MsgDetail {"Msg_Cost":1001,"MsgTrace":"Handler (com.json.chao.com.wanandroid.performance.handler.GlobalHandler) {b0d4d48} \n\tat 
com.json.chao.com.wanandroid.performance.handler.GlobalHandler.sendMessageAtTime(GlobalHandler.java:36)\n\tat
json.chao.com.wanandroid.ui.main.activity.MainActivity.initEventAndData$__twin__(MainActivity.java:107)\n\tat"

From this output, we know not only how long the message took to execute, but also, from the stack, where it was sent. In this case, line 107 of MainActivity is the new Handler().post() call. This lets us see which custom messages were executed before the list was displayed, and spot at a glance the messages that do not behave as expected, such as messages that take too long or messages that could be deferred. They can then be adjusted according to the actual project and business requirements.

4. Summary of time-consuming blind area monitoring scheme

Time-consuming blind area monitoring is an indispensable part of lag monitoring and an important guarantee of its comprehensiveness. Note that TraceView is suitable only for offline use, but it can also monitor system messages. The dynamic replacement approach works online, but it monitors only the messages sent by the application itself.

5. Summary of lag optimization skills

1. Practical experience of lag optimization

If your app is stuttering, consider the following ways to optimize:

  • First, for time-consuming operations, consider running them asynchronously or initializing them lazily, which solves most problems. Take care to keep the code clean while doing so.
  • For layout loading, AsyncLayoutInflater or X2C can reduce the main-thread IO and the cost of reflection. Also pay attention to redraw problems.
  • In addition, memory problems can cause the UI to lag. Reducing memory usage lowers the number and duration of GCs, both of which can be checked in the logs.

Then, let's look at tool building for lag optimization.

2. Tool building for lag optimization

Tool building is often overlooked, yet it pays off greatly and is a key part of lag optimization. First, we must understand the system tools and learn to use them. Let's review them here.

  • Systrace makes it easy to see CPU usage, and its overhead is low.
  • TraceView makes it easy to see what each thread does during a given period, but its overhead is relatively high, which can sometimes point optimization in the wrong direction.
  • Also, note that StrictMode is a very powerful tool.

Next, we covered automated tooling and its optimization, introducing two tools: AndroidPerformanceMonitor and ANR-WatchDog. To address the shortcomings of AndroidPerformanceMonitor, we used high-frequency sampling and looked for the stacks that repeat most often. While learning these tools, we need to understand not only how to use them, but also how they are implemented and which scenarios they suit.

We also refined the tooling further. For single-point problems, such as IPC monitoring, we use hooks to find problems as early as possible. For time-consuming blind areas, we replace the Handler online to monitor the execution time and call stack of the messages it handles.

Finally, let's look at the metrics for lag monitoring. We calculate the app's overall lag rate, ANR rate, instant page open rate, interaction time, lifecycle time, and so on. When reporting ANR information, we also report environment and scene information. This makes it easy to compare different versions side by side, and it can be combined with our alerting platform to detect anomalies right away.

6. Summary of solutions to common lag problems

1. How to solve lag caused by CPU resource contention?

Our application should control the CPU consumption of its core features and also minimize the CPU consumption of non-core features.

2. Which inefficient APIs in Android's Java libraries should we be aware of?

For example, List.removeAll internally traverses the list to filter out the elements to remove. If it is called inside an existing loop over a list, the nested traversal wastes CPU resources.
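
One common form of this hidden traversal, shown here as a hedged illustration rather than the exact case the text has in mind: ArrayList.removeAll calls contains on its argument once per element, so passing another List makes every lookup a linear scan. RemoveAllDemo and its method names are hypothetical; wrapping the argument in a HashSet is one way to avoid the nested scan.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

class RemoveAllDemo {
    // ArrayList.removeAll calls toRemove.contains(...) once per element of source.
    // With a List argument, each contains() is itself a linear scan.
    static List<Integer> removeWithList(List<Integer> source, List<Integer> toRemove) {
        List<Integer> copy = new ArrayList<>(source);
        copy.removeAll(toRemove);
        return copy;
    }

    // Wrapping the argument in a HashSet turns each contains() into a hash lookup.
    static List<Integer> removeWithSet(List<Integer> source, List<Integer> toRemove) {
        List<Integer> copy = new ArrayList<>(source);
        copy.removeAll(new HashSet<>(toRemove));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        List<Integer> toRemove = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            source.add(i);
            if (i % 2 == 0) toRemove.add(i);
        }
        long t0 = System.nanoTime();
        List<Integer> a = removeWithList(source, toRemove);
        long t1 = System.nanoTime();
        List<Integer> b = removeWithSet(source, toRemove);
        long t2 = System.nanoTime();
        System.out.println("same result: " + a.equals(b));
        System.out.println("List argument: " + (t1 - t0) / 1_000_000 + " ms, HashSet argument: "
                + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both variants produce the same list; only the amount of work done inside removeAll differs, and the gap grows with the size of the two collections.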

3. How to reduce the CPU consumption of graphics processing?

Here we can use RenderScript for graphics-related operations, moving the work from the CPU to the GPU. For background on RenderScript, see my article "Deep Exploration of Android Layout Optimization (part 2)".

4. How to solve the lag caused by hardware acceleration when rendering long Chinese text?

The only option is to turn off hardware acceleration for the TextView, as shown below:

textView.setLayerType(View.LAYER_TYPE_SOFTWARE, null);

When hardware acceleration is enabled and long Chinese text is rendered, ViewRootImpl.draw() is called first, and the call eventually reaches GLES20Canvas.nDrawDisplayList(), which enters the native layer through JNI. The native layer then calls OpenGLRenderer::drawDisplayList(), which calls the DisplayList's replay method to play back the previously recorded drawing operations.

The DisplayList replay method iterates over every operation saved in the DisplayList. The operation that renders text is named DrawText. When a DrawText operation is reached, OpenGLRenderer::drawText is called to render the text. OpenGLRenderer::drawText eventually calls Font::render(), and one of the key steps in that method is looking up the font cache. Every Chinese character has a different encoding, so caching works poorly for Chinese, whereas English needs only 26 letters in the cache. Before Android 4.1.2 the buffer configured for text was too small, so the problem is serious on those versions. If your application performs reasonably well on other versions, you can simply turn hardware acceleration off for Android 4.0.x, as shown below:

// AndroidManifest
<application
    ...
    android:hardwareAccelerated="@bool/hardware_acceleration">

// In values-v14 and values-v15, set the corresponding bool value
<bool name="hardware_acceleration">false</bool>

In addition, there are some other problems with hardware rendering that need to be noted, as shown below:

  • 1. With software rendering, to redraw all child Views of a parent View you only need to call the parent View's invalidate() method. With hardware acceleration enabled this does not work; you need to traverse the child Views and call invalidate() on each.
  • 2. With software rendering, Bitmap reuse is often used to save memory, but this does not work when hardware acceleration is enabled.
  • 3. A UI with hardware acceleration enabled consumes extra memory while it runs in the foreground. This extra memory may not be released when the UI moves to the background, which mostly happens on Android 4.1.2.
  • 4. Bitmaps whose width or height exceeds 2048 pixels cannot be drawn and appear transparent. The reason is that OpenGL's texture size is limited to 2048 × 2048, so a larger Bitmap needs to be cut into tiles of at most 2048 × 2048 that are then stitched together for display.
  • 5. When the UI has overdraw, screen corruption may occur. Generally speaking, fewer than 5 layers of overdraw will not cause it, but large red areas call for great care.
  • 6. Note that with LAYER_TYPE_SOFTWARE, the View is drawn into a Bitmap as an off-screen cache whether or not hardware acceleration is enabled. The difference is that when hardware acceleration is enabled, the Bitmap is finally rendered through the hardware-accelerated drawDisplayList.
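
For the oversized-Bitmap case in item 4, the tiling step is plain arithmetic. The sketch below only computes the tile rectangles; BitmapTiler and tiles are hypothetical names, and 2048 is the limit quoted above, which may differ on a real device.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BitmapTiler {
    static final int MAX_TEXTURE = 2048; // limit quoted in the text; real devices may differ

    // Returns {left, top, right, bottom} rectangles that together cover a width x height image,
    // each at most MAX_TEXTURE pixels on a side.
    static List<int[]> tiles(int width, int height) {
        List<int[]> out = new ArrayList<>();
        for (int top = 0; top < height; top += MAX_TEXTURE) {
            for (int left = 0; left < width; left += MAX_TEXTURE) {
                int right = Math.min(left + MAX_TEXTURE, width);
                int bottom = Math.min(top + MAX_TEXTURE, height);
                out.add(new int[] {left, top, right, bottom});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A 5000 x 3000 image needs 3 columns and 2 rows of tiles
        for (int[] r : tiles(5000, 3000)) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

On Android, each rectangle could then be decoded separately, for example with BitmapRegionDecoder, and drawn at its own offset.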

7. Common questions about lag optimization

1. How do you do lag optimization?

From the initial stage of the project, through the growth stage, to the mature stage, we handled lag optimization differently at each stage:

  • 1. Locating and fixing problems with system tools
  • 2. Automated lag detection scheme and its optimization
  • 3. Building online monitoring and offline detection tools

My work on lag optimization went through these stages. Early on, when some modules of our project started to lag, I located the problems with system tools. I used Systrace to observe the CPU during the lagging period and, together with the code, refactored the module, making parts of it asynchronous or deferred. This is how problems were solved early in the project.

However, as the project grew, more and more lag problems appeared offline. We also received lag reports from online users, but these were difficult to reproduce. We therefore looked for an automated lag monitoring scheme, based on Android's message handling mechanism. Any code executed on the main thread goes through Looper.loop, which has an mLogging object that is called before and after each message is processed. We use these two call points to implement the automated monitoring scheme. At this stage we also improved online ANR reporting: we monitor the ANR information file, combined with ANR-WatchDog as a supplementary scheme for newer Android versions where the file cannot be read because of permissions.

After finishing this lag detection scheme, we also built online monitoring and offline detection tools, and ended up with a complete, comprehensive, multi-dimensional solution.

2. How do you automate the collection of lag information?

Any code executed on the main thread goes through Looper.loop. This method has an mLogging object that is called before and after each message is processed. If the main thread lags, the time-consuming code must be running inside dispatchMessage. So, before the message is executed, we postDelayed a task on a child thread, with the delay set to our threshold. If the main-thread message completes within the threshold, we cancel the task on the child thread. If it has not completed within the threshold, the task runs and captures the current stack of the main thread, which tells us where the lag occurred.
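
The schedule-then-cancel idea can be sketched in plain Java, without Android. BlockWatcher and dispatch are hypothetical names; on Android the "before" and "after" steps would sit in the Printer passed to Looper.setMessageLogging, and the delayed task would typically be posted to a Handler on a background thread rather than to an executor.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class BlockWatcher {
    // Background thread that plays the role of the child thread in the text
    private final ScheduledExecutorService sampler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "block-sampler");
                t.setDaemon(true);
                return t;
            });
    private final long thresholdMs;
    // Stack captured the last time a message exceeded the threshold, or null
    final AtomicReference<StackTraceElement[]> lastBlockStack = new AtomicReference<>();

    BlockWatcher(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    // Runs one "message" on the calling thread while a delayed task watches it
    void dispatch(Runnable message) {
        Thread watched = Thread.currentThread();
        // Before the message: schedule a stack dump that fires only if the threshold passes
        ScheduledFuture<?> dump = sampler.schedule(
                () -> lastBlockStack.set(watched.getStackTrace()),
                thresholdMs, TimeUnit.MILLISECONDS);
        try {
            message.run();
        } finally {
            // After the message: if it finished in time, the pending dump is cancelled
            dump.cancel(false);
        }
    }

    public static void main(String[] args) {
        BlockWatcher watcher = new BlockWatcher(100);
        watcher.dispatch(() -> { });  // fast message, no dump
        System.out.println("stack captured after fast message: "
                + (watcher.lastBlockStack.get() != null));
        watcher.dispatch(() -> {
            try {
                Thread.sleep(300);    // simulated slow message
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("stack captured after slow message: "
                + (watcher.lastBlockStack.get() != null));
    }
}
```

Because the dump runs at a single instant, it records whatever the watched thread happens to be doing at the threshold, which may not be the code that consumed most of the time.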

In practice, we found that the stack captured by this scheme is not always accurate. The captured stack may simply be wherever the main thread happens to be at that moment, while the code that actually took the time has already finished. We therefore optimized the scheme with high-frequency sampling: within one period we collect the main thread's stack several times. If a lag occurs, we compress this information and report it to the APM backend, then look for the stacks that repeat. These recurring stacks are very likely where the lag occurred, which improves the accuracy of the lag information.
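
Finding the recurring stack among the samples is a simple frequency count. The sketch below assumes each sample has already been reduced to a string; StackAggregator and mostFrequent are hypothetical names, and the sample values are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StackAggregator {
    // Returns the stack that appears most often among the samples, or null if there are none.
    // On a tie, the stack that reached the winning count first is returned.
    static String mostFrequent(List<String> sampledStacks) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String stack : sampledStacks) {
            int c = counts.merge(stack, 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = stack;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up samples taken during one slow message
        List<String> samples = Arrays.asList(
                "MainActivity.loadData", "JsonParser.parse", "JsonParser.parse",
                "JsonParser.parse", "View.measure");
        System.out.println(mostFrequent(samples)); // prints JsonParser.parse
    }
}
```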

3. How does your overall lag solution work?

First, we combined online and offline tools. Offline, the goal is to expose problems as early as possible. Online, we focus on comprehensive monitoring, automation, and sensitivity to anomalies.

Lag also has many subtle causes. For example, a piece of code may take less time than the lag threshold but run too often, or be executed many times by mistake, and the user still perceives a lag. So offline we used AOP to hook common time-consuming code, then collected and analyzed the data over a period of time to learn when this code runs, how many times, and how long it takes. We then check whether this matches our expectations and, if not, fix it offline. Lag monitoring also has many easily overlooked blind areas, such as the intervals between lifecycle methods. For this problem, we used compile-time annotations to replace the parent class of all Handlers in the project and monitored two of its methods, so we know the execution time of main-thread messages and their call stacks.

For online lag, besides regular metrics such as the app lag rate and ANR rate, we also calculate the instant page open rate, lifecycle execution time, and so on. In addition, we save as much information about the current scene as possible at the moment the lag occurs, which gives us a basis for fixing or reproducing it later.

8. Summary

Congratulations! If you have read this far, you will have seen that lag optimization is not easy and requires a systematic body of knowledge to build on. Finally, let's review the nine topics we explored for lag optimization:

  • 1. Lag analysis methods and tools: background, analyzing CPU time with shell commands, lag optimization tools.
  • 2. Automated lag detection scheme and optimization: optimization principle, AndroidPerformanceMonitor in practice, and the lag detection scheme.
  • 3. ANR analysis and practice: ANR execution flow, online ANR monitoring, how ANR-WatchDog works.
  • 4. Single-point problem detection scheme: the IPC single-point problem detection scheme and other single-point problem detection schemes.
  • 5. How to open pages instantly: implementation, Lancet, monitoring dimensions for instant page opening.
  • 6. Elegant monitoring of time-consuming blind areas: monitoring difficulties, and the offline and online monitoring schemes.
  • 7. Summary of lag optimization skills: practical experience and tool building.
  • 8. Summary of solutions to common lag problems
  • 9. Common questions about lag optimization

I believe you have gained a lot by reading this far, but remember: however good a scheme is, you only truly master it by doing it yourself. The only right way to learn is to focus on practice, make full use of your perceptual and cognitive potential, and temper yourself in real projects. In practice, deliberately practicing a few key moves will also yield twice the result with half the effort.


Thank you for reading this article and I hope you can share it with your friends or technical group, it means a lot to me.