This is day 4 of my participation in the November post-update challenge. For details, see: The final post-update challenge of 2021.

Preface

The previous articles in this app jank series covered the principles behind ANR. The idea was to understand the principles first and then implement a solution.

App jank series, part 1: Handler synchronization barrier

App jank series, part 2: Learning the screen refresh mechanism

App jank series, part 3: ANR principles

The monitoring process of ANR is summarized as follows:

A long time can pass between the moment the system detects an ANR and the moment it dumps the trace log, so the trace captured by the system does not always reflect what actually caused the ANR. To better analyze the app's state when an ANR occurs, we need to supplement the information the system collects. Toutiao has published the design of its monitoring scheme, but there is no corresponding open-source framework, so I implemented the code following Toutiao's description.

Implementing the solution:

In "Toutiao ANR Optimization Practice series – Monitoring tools and analysis ideas", Raster mainly monitors the scheduling process of the main thread:

  1. Monitor the time spent dispatching each message and aggregate the messages
  2. Monitor the scheduling ability of the main thread
  3. Get the stack awaiting scheduling

Message aggregation types:

Following the description in Toutiao's scheme, we divide aggregated messages into the following types:

    // Message types; every non-zero value is a distinct bit.
    public static final int MSG_TYPE_NONE = 0x00;
    public static final int MSG_TYPE_INFO = 0x01;
    public static final int MSG_TYPE_WARN = 0x02;
    public static final int MSG_TYPE_ANR = 0x04;
    public static final int MSG_TYPE_JANK = 0x08;
    public static final int MSG_TYPE_GAP = 0x10;
    public static final int MSG_TYPE_ACTIVITY_THREAD_H = 0x20;

Capturing the stack of time-consuming messages

To monitor time spent on the main thread, a Printer is set on the main thread's Looper, and the start time is recorded each time a message begins processing. A watchdog child thread periodically checks whether the main thread's current message has timed out. When a timeout is detected, an ANR message is triggered.
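The timeout check described above can be sketched in plain Java so that it runs off-device. This is only an illustration, not the project's code: `DispatchWatchdog`, its methods, and the 3000 ms timeout are hypothetical. It assumes the Printer forwards every Looper log line to `onLooperLog`; Android's Looper logs lines beginning with `>>>>> Dispatching to` and `<<<<< Finished to` around each message.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: records when the current message started and reports timeouts. */
final class DispatchWatchdog {
    private static final long IDLE = -1L;

    private final long timeoutMs;
    // Written by the monitored thread, read by the watchdog thread.
    private final AtomicLong dispatchStartMs = new AtomicLong(IDLE);

    DispatchWatchdog(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Feed every line the Looper hands to the Printer, together with the current time. */
    void onLooperLog(String line, long nowMs) {
        if (line.startsWith(">>>>> Dispatching to")) {
            dispatchStartMs.set(nowMs);   // a message starts processing
        } else if (line.startsWith("<<<<< Finished to")) {
            dispatchStartMs.set(IDLE);    // the message finished
        }
    }

    /** Called periodically from the watchdog thread. */
    boolean isTimedOut(long nowMs) {
        long start = dispatchStartMs.get();
        return start != IDLE && nowMs - start >= timeoutMs;
    }

    public static void main(String[] args) {
        DispatchWatchdog watchdog = new DispatchWatchdog(3000);
        watchdog.onLooperLog(">>>>> Dispatching to Handler (demo) {1} null: 0", 1_000);
        System.out.println(watchdog.isTimedOut(2_000)); // false: only 1000 ms elapsed
        System.out.println(watchdog.isTimedOut(4_500)); // true: 3500 ms elapsed
    }
}
```

Passing the clock value in as a parameter keeps the check deterministic. On a device, the watchdog thread would supply the current uptime each time it wakes.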

One important point: when the child thread finds that the main thread has not finished a message within the specified time and triggers ANR message logging, we cannot yet know how long the message will take in total (or at least I could not find an API for it). Also, because this monitoring is not triggered by the system, any of the following may happen next:

  1. The time-consuming message ends soon afterwards and does not trigger the next ANR check
  2. The time-consuming message runs for a very long time and triggers the next ANR check
  3. The system's own ANR monitoring is triggered
  4. The app is killed for some other reason

To handle these cases, a message is sent to collect ANR information when an ANR is triggered, and a notification that collection is complete is sent once the time-consuming message finishes. In other words, a single ANR message may be reported once or several times.

Get the main thread stack:

Thread provides a getStackTrace method that returns the current function call stack of a thread.

    public StackTraceElement[] getStackTrace() {
        StackTraceElement ste[] = VMStack.getThreadStackTrace(this);
        return ste != null ? ste : EmptyArray.STACK_TRACE_ELEMENT;
    }
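As an illustration, a watchdog thread can call getStackTrace on the main thread at the moment a timeout is detected. The sketch below formats a thread's frames one per line. `StackSampler` and `dump` are hypothetical names, not part of the project.

```java
/** Hypothetical helper: formats the current call stack of a thread, one frame per line. */
final class StackSampler {
    static String dump(Thread target) {
        StringBuilder sb = new StringBuilder();
        // getStackTrace returns a zero-length array for a thread that has
        // not started or has already terminated.
        for (StackTraceElement frame : target.getStackTrace()) {
            sb.append(frame.getClassName())
              .append('.')
              .append(frame.getMethodName())
              .append(':')
              .append(frame.getLineNumber())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the stack of the calling thread, including this main method.
        System.out.print(dump(Thread.currentThread()));
    }
}
```

On Android, the target would typically be `Looper.getMainLooper().getThread()`.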

Getting pending messages

There are two main ways to get the pending messages from the message queue:

  1. Read each element of the message queue by reflection
  2. Get the messages on the current message queue through Looper#dump()

The advantage of method 1 is that reflection gives us the details of every message on the queue, but reflection is costly.

Method 2 uses an API provided by the system, but what it returns is essentially a string, which is not flexible to process.

In the end, method 2 was chosen. Although the information is not flexible to work with, it is basically enough for our purpose.

Main thread scheduling ability detection

Following the CheckTime principle, we post messages to the main thread in a loop and indirectly gauge the main thread's scheduling ability from the difference between the time a message is actually processed and the time it was scheduled to run.

Each time the message runs, we take the current system time, subtract the time of the previous run, and then subtract the delay we set. The result is how late the thread was scheduled. For example, if the message is scheduled once every 300 ms, the actual interval is sometimes more than 300 ms. The larger the deviation, the less timely the thread scheduling, which in turn indicates that the system's responsiveness has degraded.
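The arithmetic above can be sketched as follows. `ScheduleLagMeter` and `onRun` are hypothetical names, and the 300 ms interval is the example value from the text. The clock value is passed in so the calculation can be checked without a real message loop.

```java
/** Hypothetical helper: measures how late a periodic check message actually runs. */
final class ScheduleLagMeter {
    private final long intervalMs;
    private long lastRunAtMs;

    ScheduleLagMeter(long intervalMs, long startAtMs) {
        this.intervalMs = intervalMs;
        this.lastRunAtMs = startAtMs;
    }

    /**
     * Call when the check message runs. Returns how many milliseconds the
     * actual interval exceeded the configured one (0 if it ran on time).
     */
    long onRun(long nowMs) {
        long actualIntervalMs = nowMs - lastRunAtMs;
        lastRunAtMs = nowMs;
        return Math.max(0L, actualIntervalMs - intervalMs);
    }

    public static void main(String[] args) {
        ScheduleLagMeter meter = new ScheduleLagMeter(300, 0);
        System.out.println(meter.onRun(300));  // 0: ran exactly on schedule
        System.out.println(meter.onRun(650));  // 50: the interval was 350 ms
        System.out.println(meter.onRun(1400)); // 450: the interval was 750 ms
    }
}
```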

Adding system ANR monitoring

From studying the principles of ANR, we know that monitoring only the time spent dispatching each main-thread message cannot cover every ANR scenario in the system.

For example, a large number of time-consuming messages may be waiting in the message queue, so the system's ANR timeout cannot be cancelled in time. In that case, when the ANR happens, our monitoring tool does not react at all.

Therefore, we also need to respond when the system detects an ANR in the app.

This can be done by listening for the SIGQUIT signal.

About jank monitoring

From the earlier articles we know that when the system prints

    Log.i(TAG, "Skipped " + skippedFrames + " frames!  "
            + "The application may be doing too much work on its main thread.");

it means the UI could not be updated in time because the main thread was busy with other time-consuming work. Because of the synchronization barrier, having many messages waiting in the message queue does not by itself cause UI lag (asynchronous messages excepted).

Our jank monitor differs from the system's log. The system measures the interval between a message being sent and being executed, whereas we monitor the entire execution time of doFrame. Based on what we learned about the screen refresh mechanism, doFrame indirectly reflects the time spent in the View's three passes (measure, layout, and draw).
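One possible way to turn a measured doFrame duration into a jank signal is to count how many whole frame intervals it spanned. This is a sketch under stated assumptions, not the project's logic: `FrameJank` and `overrunFrames` are hypothetical names, and the 60 Hz frame interval is an assumed default that a real device may not use.

```java
/** Hypothetical helper: counts the whole frame intervals spanned by one doFrame pass. */
final class FrameJank {
    /** Frame interval at an assumed 60 Hz refresh rate, in nanoseconds. */
    static final long FRAME_INTERVAL_60HZ_NANOS = 16_666_667L;

    static long overrunFrames(long doFrameNanos, long frameIntervalNanos) {
        if (frameIntervalNanos <= 0) {
            throw new IllegalArgumentException("frameIntervalNanos must be positive");
        }
        // Integer division: a pass shorter than one interval counts as 0.
        return Math.max(0L, doFrameNanos / frameIntervalNanos);
    }

    public static void main(String[] args) {
        System.out.println(overrunFrames(10_000_000L, FRAME_INTERVAL_60HZ_NANOS)); // 0
        System.out.println(overrunFrames(20_000_000L, FRAME_INTERVAL_60HZ_NANOS)); // 1
        System.out.println(overrunFrames(51_000_000L, FRAME_INTERVAL_60HZ_NANOS)); // 3
    }
}
```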

As for whether a message is a UI update: reading the source code shows that a UI update message has the handler android.view.Choreographer$FrameHandler and the callback android.view.Choreographer$FrameDisplayEventReceiver.

    /** Returns true if the message is a Choreographer doFrame (UI update) message. */
    public static boolean isBoxMessageDoFrame(BoxMessage message) {
        return message != null
                && "android.view.Choreographer$FrameHandler".equals(message.getHandleName())
                && message.getCallbackName().contains("android.view.Choreographer$FrameDisplayEventReceiver");
    }

Overall code design

The implementation of the code is roughly divided into four parts:

  1. BlockConfig configures the monitoring parameters.
  2. System ANR monitoring (copied, Ctrl+C and Ctrl+V, from iQIYI's xCrash) catches the ANR signal, notifies BlockMonitor, and then forwards the signal to the Signal Catcher thread, restoring the original ANR handling flow.
  3. BlockMonitor is the core of the whole monitor. Internally it hooks the Looper to monitor the dispatch of every message. It also monitors ANR using the watchdog idea, and monitors the scheduling ability of the main thread by posting messages to the main thread.
  4. Sample serves as the outlet for collected log information. Custom log handling can be implemented by configuring the corresponding listeners through BlockConfig.

Project naming

Toutiao describes its tool like this:

Tool introduction:

The tool monitors messages in the main thread scheduling process and aggregates them according to certain policies to minimize the impact of the monitoring tool on application performance and memory jitter. At the same time, the message execution process of the four components is monitored to facilitate the tracking and recording of the scheduling and time consumption of such messages. In addition, statistics are made on the messages currently being scheduled and messages to be scheduled in the message queue, so that the overall scheduling situation of the main thread can be played back when problems occur. In addition, we migrate the CheckTime mechanism of system service to the application side and apply it as the thread CheckTime mechanism, so as to predict the system load and scheduling situation in the past period from the timeliness of thread scheduling when the system information is insufficient.

So the tool can be summed up in one sentence: from point to surface, it plays back the past, present, and future.

Because of how it works and how the aggregated messages look, the tool visually displays time-consuming segments of different lengths in the main thread's scheduling process, much like a raster, hence the name Raster.

Toutiao's framework is not open source. To distinguish this project from Toutiao's solution, and following the tool's summary (from point to surface, playing back the past, present, and future), I am calling it the Moonlight Box.

Recite the incantation and it goes back in time, telling us how the main thread's message queue was scheduled before the ANR occurred.

Code Address:

Github.com/xiaolutang/…

About the official release

At this stage the main features are complete, and some documentation and details still need work. I hope experienced developers can offer some guidance on which features still need to be improved before the official release.

References:

Toutiao ANR Optimization Practice series – Design principles and influencing factors

Toutiao ANR Optimization Practice series – Monitoring tools and analysis ideas
