The industry currently relies on several effective approaches to lag monitoring:

  • A child thread that continuously polls the main thread

  • Looper Printer

  • Choreographer FrameCallback

  • Bytecode instrumentation that records the entry and exit of each function

Method 1

We can start a child thread that polls the main thread continuously. The principle and implementation are simple: the child thread keeps sending a Message to the main thread and periodically checks whether the Message it just sent has been processed. If it has not, the main thread was stuck during that period.

The advantage of this method is that it is simple to implement and can detect every type of lag. The disadvantage is that polling is inelegant and the interval is hard to choose: the shorter the interval, the greater the impact on performance, and the longer the interval, the more lags are missed.

For example, with a polling interval of 3 s, a lag that lasts from 1.5 s to 4.5 s goes undetected. The message sent at 0 s is processed before the lag starts, and the message sent at 3 s is processed once the lag ends at 4.5 s, so both the 0~3 s check and the 3~6 s check find their message handled. A 3 s lag is therefore missed even though the threshold is 3 s. There is no particularly good fix; we can only tune the polling interval to balance overhead against the miss rate.
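The missed-detection case can be checked with a small simulation. This is only a sketch under simplifying assumptions, not the monitoring code itself: the monitor sends one message at the start of each interval and checks it at the end, a message sent while the main thread is blocked is handled as soon as the block ends, and the class and method names are made up for illustration.

```java
/** Hypothetical model of the polling monitor: one message per interval, checked when the interval ends. */
final class PollingSimulation {

    /**
     * Returns true if at least one check finds its message still unprocessed.
     * All times are in milliseconds; the main thread is blocked during [stallStart, stallEnd).
     */
    static boolean detectsStall(long intervalMs, long stallStart, long stallEnd, long totalMs) {
        for (long sendTime = 0; sendTime + intervalMs <= totalMs; sendTime += intervalMs) {
            long checkTime = sendTime + intervalMs;
            // A message sent during the stall waits until the stall ends;
            // otherwise it is assumed to be handled immediately.
            boolean sentDuringStall = sendTime >= stallStart && sendTime < stallEnd;
            long handledTime = sentDuringStall ? stallEnd : sendTime;
            if (handledTime > checkTime) {
                return true; // the message was still pending at check time
            }
        }
        return false;
    }
}
```

In this model, detectsStall(3000, 1500, 4500, 12000) returns false, while halving the interval with detectsStall(1500, 1500, 4500, 12000) returns true, at the cost of polling twice as often.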

Code snippet:

class UiMonitorThread implements Runnable {
    @Override
    public void run() {
        while (isRunning) {
            // Send a message to the main thread every 1.5 s
            uiMonitorHandler.sendEmptyMessage(id);
            try {
                Thread.sleep(1500);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop monitoring
                Thread.currentThread().interrupt();
                return;
            }
            // If two consecutive messages are not processed, a lag is considered to have occurred
            checkMessageHandled();
        }
    }
}

Method 2

We can call the system method setMessageLogging to replace the Printer object of the main thread's Looper, then derive the execution time of each dispatchMessage call from the time difference between the logs the Printer receives.

Looper.getMainLooper().setMessageLogging(str -> {
    // Calculate the interval between two adjacent logs
});

The advantage of this method is that it is easy to implement and does not miss any lag that happens inside dispatchMessage. The disadvantage is that some types of lag cannot be monitored.
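The interval calculation can be sketched in plain Java as follows. The helper name DispatchTimer, the injected clock, and the threshold are assumptions for illustration. The sketch also relies on Looper printing one line starting with ">>>>> Dispatching" before dispatchMessage and one starting with "<<<<< Finished" after it, which is an implementation detail of Looper rather than a documented contract.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: times each dispatched message from the Looper's two log lines. */
final class DispatchTimer {
    private final LongSupplier clockMs;  // on Android this could be SystemClock::uptimeMillis
    private final long thresholdMs;      // assumed lag threshold
    private long dispatchStartMs = -1;
    private long lastDurationMs = -1;
    private int slowCount = 0;

    DispatchTimer(LongSupplier clockMs, long thresholdMs) {
        this.clockMs = clockMs;
        this.thresholdMs = thresholdMs;
    }

    /** Feed every line the Printer receives into this method. */
    void onLog(String line) {
        if (line.startsWith(">>>>> Dispatching")) {
            dispatchStartMs = clockMs.getAsLong();
        } else if (line.startsWith("<<<<< Finished") && dispatchStartMs >= 0) {
            lastDurationMs = clockMs.getAsLong() - dispatchStartMs;
            dispatchStartMs = -1;
            if (lastDurationMs > thresholdMs) {
                slowCount++; // a real monitor would capture the main thread stack here
            }
        }
    }

    long getLastDurationMs() {
        return lastDurationMs;
    }

    int getSlowCount() {
        return slowCount;
    }
}
```

On Android, an instance could be wired in with Looper.getMainLooper().setMessageLogging(timer::onLog). Keeping the clock injectable makes the logic testable off-device.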

As the source of android.os.Looper#loop() shows, monitoring dispatchMessage alone does not cover every lag. The comment on the me.mQueue.next() call makes it clear: might block. Inside MessageQueue#next(), the time can be spent in the nativePollOnce method and in idler.queueIdle().

The nativePollOnce method matters for two reasons: the main thread blocks here when it is idle, and the View's touch events are handled here. So if your application contains many custom views or handles many onTouch events, leaving this method unmonitored is not acceptable.

What is more, native messages are also processed inside nativePollOnce, so the lags they cause cannot be monitored either.

The queueIdle() method is called when the main thread is idle. A time-consuming operation there can also cause a lag, and it too goes unmonitored.

Another scenario that causes lag is what is often called the synchronization barrier (a name that leaves most people puzzled the first time they hear it). Messages are synchronous by default. When we call invalidate to refresh the UI, the call ends up in the scheduleTraversals method of ViewRootImpl, which inserts a synchronization barrier into the main thread's Looper through postSyncBarrier. While the barrier is in place, all synchronous messages in the Looper are skipped, so that the asynchronous messages that render the UI are processed first.

The scheduleTraversals and unscheduleTraversals methods are paired, but they are not thread-safe. If invalidate is called from a thread other than the main thread, scheduleTraversals may run several times, while unscheduleTraversals removes only the last mTraversalBarrier. The earlier barriers stay in the queue, so the synchronous messages of the main thread's Looper are never processed, which is effectively a deadlock.
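The effect of a leaked barrier can be reproduced with a toy queue. This is a simplified model written for illustration, not the MessageQueue source: BarrierQueueModel and its methods are hypothetical, message timestamps are ignored, and a barrier is represented by a message without a name, mirroring target == null.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of how a message queue treats synchronization barriers. */
final class BarrierQueueModel {
    private static final class Msg {
        final String name;   // null marks a barrier, like target == null
        final boolean async;
        final int token;

        Msg(String name, boolean async, int token) {
            this.name = name;
            this.async = async;
            this.token = token;
        }
    }

    private final List<Msg> queue = new ArrayList<>();
    private int nextToken = 1;

    int postSyncBarrier() {
        int token = nextToken++;
        queue.add(new Msg(null, false, token));
        return token;
    }

    void removeSyncBarrier(int token) {
        queue.removeIf(m -> m.name == null && m.token == token);
    }

    void enqueue(String name, boolean async) {
        queue.add(new Msg(name, async, 0));
    }

    /** Returns the name of the next runnable message, or null if nothing can run. */
    String next() {
        boolean blocked = !queue.isEmpty() && queue.get(0).name == null;
        for (int i = 0; i < queue.size(); i++) {
            Msg m = queue.get(i);
            if (m.name == null) continue;       // barriers themselves are never dispatched
            if (blocked && !m.async) continue;  // synchronous messages wait behind the barrier
            queue.remove(i);
            return m.name;
        }
        return null;
    }
}
```

Posting two barriers and removing only the second leaves the first at the head of the queue: asynchronous messages still run, but next() keeps returning null for synchronous ones until the first token is removed as well.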

Despite these problems, this is a mainstream monitoring scheme, and some of its defects already have solutions.

  • Monitoring onTouchEvent inside nativePollOnce

We can use an ELF hook to intercept the recvfrom and sendto functions in libinput.so and replace them with our own. By the time sendto is called, the onTouch event has been consumed.

The input system will be covered in a future article.

  • Monitoring IdleHandler#queueIdle

The mIdleHandlers ArrayList holds every registered IdleHandler, so we can use reflection to replace it with our own MyArrayList and override its add method. That lets us intercept every IdleHandler as it is added.

Once add hands us the IdleHandler, we can wrap it and measure how long its queueIdle method takes.

// Nested classes; they need android.os.MessageQueue and java.util.ArrayList
static class MyArrayList extends ArrayList<MessageQueue.IdleHandler> {
    @Override
    public boolean add(MessageQueue.IdleHandler handler) {
        // Add only the wrapper, so each handler is registered exactly once
        return super.add(new MyIdleHandler(handler));
    }
}

static class MyIdleHandler implements MessageQueue.IdleHandler {
    private final MessageQueue.IdleHandler idleHandler;

    MyIdleHandler(MessageQueue.IdleHandler idleHandler) {
        this.idleHandler = idleHandler;
    }

    @Override
    public boolean queueIdle() {
        // Monitor idleHandler.queueIdle(), for example by timing this call
        return this.idleHandler.queueIdle();
    }
}

  • Monitoring synchronization barrier leaks

If mMessages.target == null and mMessages.when lies far in the past, the synchronization barrier may have leaked. We can then actively send one synchronous message and one asynchronous message to the main thread's Looper. If the asynchronous message is processed but the synchronous one is not, a leak has almost certainly occurred.

We can then call removeSyncBarrier(token) through reflection, where token is mMessages.arg1.
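The decision rule described above can be written down as a small pure function. This is a sketch only: BarrierLeakCheck, its verdict names, and the age threshold are hypothetical, and the inputs are assumed to come from reading mMessages through reflection and from posting the two probe messages.

```java
/** Hypothetical decision logic for the barrier-leak check. */
final class BarrierLeakCheck {
    enum Verdict { NO_BARRIER, BARRIER_RECENT, INCONCLUSIVE, BARRIER_LEAKED }

    /**
     * @param headTargetIsNull  whether the head message has no target, i.e. it is a barrier
     * @param headWhenMs        the "when" timestamp of the head message
     * @param nowMs             current time on the same clock as headWhenMs
     * @param maxBarrierAgeMs   assumed threshold for how long a barrier may legitimately stay queued
     * @param syncProbeHandled  whether the synchronous probe message was processed
     * @param asyncProbeHandled whether the asynchronous probe message was processed
     */
    static Verdict evaluate(boolean headTargetIsNull, long headWhenMs, long nowMs, long maxBarrierAgeMs,
                            boolean syncProbeHandled, boolean asyncProbeHandled) {
        if (!headTargetIsNull) {
            return Verdict.NO_BARRIER;
        }
        if (nowMs - headWhenMs <= maxBarrierAgeMs) {
            return Verdict.BARRIER_RECENT;
        }
        // Only the asynchronous probe getting through points at a barrier that was never removed.
        if (asyncProbeHandled && !syncProbeHandled) {
            return Verdict.BARRIER_LEAKED;
        }
        return Verdict.INCONCLUSIVE;
    }
}
```

Only the BARRIER_LEAKED verdict would justify removing the barrier; the other outcomes mean either that no barrier is pending or that the evidence is not strong enough.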

Method 3

Android introduced Choreographer in version 4.1 to work with the VSync mechanism (see my previous blog post about VSync) and schedule interface drawing in a unified way. We can register a FrameCallback with the Choreographer class, and it is invoked each time a frame is rendered. The callback arrives through doFrame(long frameTimeNanos): every rendered frame calls doFrame once, so if the interval between two doFrame calls is greater than 16.6 ms, a lag has occurred. The number of callbacks within 1 s is the actual frame rate.

Choreographer.getInstance().postFrameCallback(new Choreographer.FrameCallback() {
    @Override
    public void doFrame(long frameTimeNanos) {
        // Measure the interval between adjacent frames here to detect lag and compute the frame rate
        // Register again, because a callback runs only once per registration
        Choreographer.getInstance().postFrameCallback(this);
    }
});

The advantage of this method is that it is simple to use and supports both lag monitoring and frame rate calculation. The disadvantage is that another child thread is needed to capture the stack information, which consumes some system resources.
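The interval and frame rate bookkeeping can be sketched in plain Java. FrameStats is a hypothetical helper, and the 60 Hz refresh rate behind the 16.6 ms budget is an assumption; devices with other refresh rates need a different frame interval.

```java
/** Hypothetical helper that turns doFrame timestamps into dropped-frame counts and a frame rate. */
final class FrameStats {
    private static final long FRAME_INTERVAL_NANOS = 16_666_667L; // one frame at an assumed 60 Hz

    private boolean hasFrame = false;
    private long firstFrameNanos;
    private long lastFrameNanos;
    private int frames = 0;
    private int droppedFrames = 0;

    /** Call with the frameTimeNanos argument of every doFrame callback. */
    void onFrame(long frameTimeNanos) {
        if (!hasFrame) {
            hasFrame = true;
            firstFrameNanos = frameTimeNanos;
        } else {
            long interval = frameTimeNanos - lastFrameNanos;
            // Round to whole frame intervals so small timing jitter is not counted as a drop.
            long intervalsElapsed = Math.round((double) interval / FRAME_INTERVAL_NANOS);
            if (intervalsElapsed > 1) {
                droppedFrames += (int) (intervalsElapsed - 1);
            }
        }
        lastFrameNanos = frameTimeNanos;
        frames++;
    }

    int getDroppedFrames() {
        return droppedFrames;
    }

    /** Average frames per second between the first and the last recorded frame. */
    double averageFps() {
        if (frames < 2) {
            return 0.0;
        }
        return (frames - 1) * 1_000_000_000.0 / (lastFrameNanos - firstFrameNanos);
    }
}
```

Inside doFrame, a single call such as stats.onFrame(frameTimeNanos) before re-registering the callback would be enough to feed it.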

Method 4

During the Android build (see this blog), we can use the plugin's Transform mechanism to post-process the compiled class files before they are converted to dex. The output of each Transform serves as the input of the next, which lets us modify the bytecode along the way. ASM is the recommended tool. The specific instrumentation technique is not covered here and will be introduced later.

The purpose of instrumentation is to record the entry and exit of each function, including the action, the method name, and a timestamp, so that we can calculate the time spent and restore the call stack.
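What the inserted calls might feed into can be sketched as follows. MethodTrace and its methods are hypothetical names, timestamps are passed in explicitly to keep the sketch deterministic, and a real implementation would need per-thread state and a far more compact record format.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical recorder that instrumented code would call at every method entry and exit. */
final class MethodTrace {
    /** One completed call: method name, nesting depth, and duration in milliseconds. */
    static final class Call {
        final String method;
        final int depth;
        final long durationMs;

        Call(String method, int depth, long durationMs) {
            this.method = method;
            this.depth = depth;
            this.durationMs = durationMs;
        }
    }

    private static final class Frame {
        final String method;
        final long startMs;

        Frame(String method, long startMs) {
            this.method = method;
            this.startMs = startMs;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    private final List<Call> completed = new ArrayList<>();

    /** Inserted at the start of each instrumented method. */
    void enter(String method, long timeMs) {
        stack.push(new Frame(method, timeMs));
    }

    /** Inserted before each return of an instrumented method. */
    void exit(String method, long timeMs) {
        Frame top = stack.pop();
        if (!top.method.equals(method)) {
            throw new IllegalStateException("Unbalanced exit for " + method);
        }
        // The remaining stack size is the nesting depth of the call that just finished.
        completed.add(new Call(method, stack.size(), timeMs - top.startMs));
    }

    List<Call> completedCalls() {
        return completed;
    }
}
```

Because every exit is matched against the most recent entry, the completed list carries both the cost of each call and its depth, which is enough to rebuild the call tree after a lag.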

The advantage of this approach is that it can recover the call stack at the time of the lag together with all the other information needed, which is hard to do with the other methods. The downside is the engineering cost: data volume, computation, and I/O bottlenecks all have to be taken into account. Of course, all of this is only talk so far; there is always a gap between technical research and actual implementation.

As for coverage, we can instrument selectively to avoid the CPU overhead of instrumenting every method:

  1. Exclude unnecessary third-party libraries and system libraries.

  2. Filter out very simple functions.

  3. Filter out code generated automatically by the compiler.
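The three rules can be combined into one predicate, sketched below. The class name, the excluded package prefixes, and the instruction-count threshold are assumptions for illustration; only the two access-flag values come from the JVM class file format.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical filter deciding whether a method is worth instrumenting. */
final class InstrumentationFilter {
    // Assumed exclusion list in internal (slash-separated) class-name form; tune it per project.
    private static final List<String> EXCLUDED_PREFIXES =
            Arrays.asList("android/", "androidx/", "java/", "kotlin/");
    private static final int MIN_INSTRUCTIONS = 5;   // assumed cutoff for "very simple" methods
    private static final int ACC_BRIDGE = 0x0040;    // JVM flag: bridge method generated by the compiler
    private static final int ACC_SYNTHETIC = 0x1000; // JVM flag: member not present in the source code

    static boolean shouldInstrument(String internalClassName, int methodAccessFlags, int instructionCount) {
        // Rule 1: skip system libraries and third-party libraries on the exclusion list.
        for (String prefix : EXCLUDED_PREFIXES) {
            if (internalClassName.startsWith(prefix)) {
                return false;
            }
        }
        // Rule 3: skip compiler-generated methods.
        if ((methodAccessFlags & (ACC_BRIDGE | ACC_SYNTHETIC)) != 0) {
            return false;
        }
        // Rule 2: skip trivial methods such as simple getters and setters.
        return instructionCount >= MIN_INSTRUCTIONS;
    }
}
```

In an ASM-based Transform, a check like this would typically run before a method is visited, so that skipped methods are copied through unchanged.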

That concludes this part.

The second part introduces the principles of ANR monitoring.