Common ANR timeouts
- Input (e.g. click events): 5 seconds;
- ContentProvider: 10 seconds;
- Broadcast: 10 seconds in the foreground, 60 seconds in the background;
- Service: 20 seconds in the foreground, 200 seconds in the background.
How ANR detection works
- Planting and defusing bombs
When a service is started, the scheduleCreateService method is called to create it:
app.thread.scheduleCreateService(r, r.serviceInfo,
mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
app.repProcState);
During creation, a delayed message is sent through the handler. This plants the bomb:
mAm.mHandler.sendMessageDelayed(msg, proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
If the operation completes within the delay, the handler removes the delayed message, which defuses the bomb:
mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
This design, using delayed handler messages to plant and defuse a bomb, is worth borrowing in day-to-day development. A watchdog, for example, is implemented on the same principle.
If the bomb is not defused within the specified time, the system's appNotResponding method runs. It collects the current stack information, prints it to the log, and writes it to the trace file:
final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation){
}
Except for input events, all ANR types use this plant-and-defuse approach. Input instead checks for an ANR before handling the next input event.
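The plant-and-defuse idea can be sketched outside the framework as follows. This is a minimal illustration, not the system's implementation: it swaps Android's Handler for a plain `ScheduledExecutorService` so that it runs on any JVM, and the class name `TimeoutBomb` and its methods are hypothetical.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: one instance is one planted bomb. */
class TimeoutBomb {
    // Flipped exactly once, either by the timeout task or by defuse().
    private final AtomicBoolean settled = new AtomicBoolean(false);
    private final ScheduledFuture<?> future;

    /** Plant the bomb: plays the role of sendMessageDelayed(msg, timeout). */
    TimeoutBomb(ScheduledExecutorService scheduler, long timeoutMillis, Runnable onTimeout) {
        this.future = scheduler.schedule(() -> {
            // Report only if the work did not finish (defuse) first.
            if (settled.compareAndSet(false, true)) {
                onTimeout.run();
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /**
     * Defuse the bomb: plays the role of removeMessages(...).
     * Returns true if the work finished before the timeout fired.
     */
    boolean defuse() {
        boolean inTime = settled.compareAndSet(false, true);
        future.cancel(false); // drop the pending task; harmless if it already ran
        return inTime;
    }
}
```

A caller would create a `TimeoutBomb` just before starting the guarded work and call `defuse()` when the work completes. The `AtomicBoolean` ensures that the timeout callback and the defuse path can never both win.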
- Core flow code: the Looper.loop() function
public static void loop() {
    for (;;) {
        Message msg = queue.next();
        ...
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " + msg.callback + ": " + msg.what);
        }
        ...
        msg.target.dispatchMessage(msg);
        ...
        if (logging != null) {
            logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
        }
        ...
    }
}
In this code path, ANRs converge on two points that monitoring can target: either Message msg = queue.next() takes too long, or msg.target.dispatchMessage(msg), the execution of a specific message, takes too long. Only these two points need to be monitored.
Monitoring approaches
- BlockCanary and watchdog solutions
These rely on the logging.println("<<<<< Finished to " + msg.target + " " + msg.callback) call in Looper.loop() to calculate how long a message took, and print the stack to locate the stall. However, the stack captured at this point is not accurate: the slow function has already returned, so the stack no longer reflects the scene of the stall. In practice, you need to dump the stack at a high frequency to get accurate information. But frequent stack dumps themselves cause lag and lead to large hprof file transfers, which can backfire.
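The timing logic behind this approach can be sketched as below. On Android, an object with this `println(String)` shape can be installed through `Looper.getMainLooper().setMessageLogging(printer)`; here the class is kept free of Android types so that it runs on a plain JVM. The name `DispatchMonitor`, the threshold parameter, and the injected clock are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical monitor fed with the lines Looper.loop() logs around each message. */
class DispatchMonitor {
    private static final String START = ">>>>> Dispatching to";
    private static final String END = "<<<<< Finished to";

    private final long thresholdMillis;
    private final LongSupplier clockMillis; // injected so the logic is testable
    private final List<String> slowMessages = new ArrayList<>();
    private long startMillis = -1;
    private String current;

    DispatchMonitor(long thresholdMillis, LongSupplier clockMillis) {
        this.thresholdMillis = thresholdMillis;
        this.clockMillis = clockMillis;
    }

    /** Same shape as android.util.Printer#println(String). */
    void println(String line) {
        if (line.startsWith(START)) {
            // A message is about to be dispatched: remember when and which.
            startMillis = clockMillis.getAsLong();
            current = line.substring(START.length()).trim();
        } else if (line.startsWith(END) && startMillis >= 0) {
            // The message has finished: compare its duration with the threshold.
            long elapsed = clockMillis.getAsLong() - startMillis;
            if (elapsed >= thresholdMillis) {
                slowMessages.add(current + " took " + elapsed + " ms");
            }
            startMillis = -1;
        }
    }

    List<String> slowMessages() {
        return slowMessages;
    }
}
```

Because the slow message is only recognized once the "Finished" line arrives, this sketch shares the limitation described above: it knows which message was slow, but not what the thread was doing while it was stuck.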
- Bytecode instrumentation
Using a Gradle plugin and ASM, a call is inserted at the beginning and at the end of each method, and a threshold is defined. If a method exceeds the threshold, the stack is printed. The stack is accurate at this point, but the approach makes the method count explode, so the instrumentation scope and strategy must be defined carefully.
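What the inserted calls amount to at run time can be sketched like this. The class `MethodTimer` and its `enter`/`exit` methods are hypothetical names for the runtime that instrumented methods would call into; the ASM transformation itself is not shown, and the sketch assumes single-threaded use (for example, main thread only).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical runtime that instrumented methods call at entry and exit. */
class MethodTimer {
    private final long thresholdMillis;
    private final LongSupplier clockMillis; // injected so the logic is testable
    private final Deque<Long> startTimes = new ArrayDeque<>(); // one entry per active call
    private final List<String> reports = new ArrayList<>();

    MethodTimer(long thresholdMillis, LongSupplier clockMillis) {
        this.thresholdMillis = thresholdMillis;
        this.clockMillis = clockMillis;
    }

    /** Inserted at the beginning of a method. */
    void enter() {
        startTimes.push(clockMillis.getAsLong());
    }

    /** Inserted at the end of a method (ideally in a finally block). */
    void exit(String methodName) {
        long elapsed = clockMillis.getAsLong() - startTimes.pop();
        if (elapsed >= thresholdMillis) {
            // We are still inside the slow method, so the stack is accurate here.
            StringBuilder report = new StringBuilder(methodName + " took " + elapsed + " ms");
            for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
                report.append("\n    at ").append(frame);
            }
            reports.add(report.toString());
        }
    }

    List<String> reports() {
        return reports;
    }
}

// After instrumentation, a method body would conceptually look like:
//   timer.enter();
//   try { /* original body */ } finally { timer.exit("loadFeed"); }
```

Every instrumented method gains two extra calls, which is why limiting instrumentation to selected packages or to methods above a certain size matters.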
- Production monitoring scheme
Neither bytecode instrumentation nor BlockCanary or a watchdog should be shipped in full in an official production build. In practice, the two can be combined.
For example, debug and staged-rollout (grayscale) builds can carry bytecode instrumentation to monitor time cost and ANRs, while the monitoring based on logging.println("<<<<< Finished to " + msg.target + " " + msg.callback) ships to production behind switches and policies, so that it can be turned on and off flexibly according to the problem at hand. This solves problems without hurting performance.
- BlockCanary and watchdog improvements
Open source libraries are generally built for broad applicability, but we need to adapt them to the characteristics and needs of our own project, or study their source code and design and write our own.
When monitoring stalls through logging.println("<<<<< Finished to " + msg.target + " " + msg.callback), add your own business-specific logic, together with a switch and a dynamic degradation policy.
A watchdog built on the plant-and-defuse idea checks for a stall every 5 seconds, which can easily introduce an error of up to 5 seconds. We can change the interval to 1 second and treat it as a stall only if 5 consecutive checks see no change. This reduces the error without affecting performance.
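The "1-second interval, 5 consecutive misses" rule can be sketched as a small piece of pure logic. The class name `StallDetector` and the idea of a heartbeat counter are assumptions for illustration: on Android the heartbeat could be a counter incremented by a Runnable that the watchdog thread posts to the main thread on each check.

```java
/** Hypothetical detector: N polls in a row without heartbeat progress count as a stall. */
class StallDetector {
    private final int requiredMisses;
    private long lastSeen = -1; // heartbeat value observed at the previous poll
    private int misses;         // consecutive polls with no progress

    StallDetector(int requiredMisses) {
        this.requiredMisses = requiredMisses;
    }

    /**
     * Call once per poll interval (e.g. every second) with the current
     * heartbeat value. Returns true when a stall is detected.
     */
    boolean onPoll(long heartbeat) {
        if (heartbeat != lastSeen) {
            // The monitored thread made progress: reset the streak.
            lastSeen = heartbeat;
            misses = 0;
            return false;
        }
        misses++;
        return misses >= requiredMisses;
    }
}
```

With `new StallDetector(5)` polled once per second, a stall is reported after 5 unchanged polls in a row, so the detection error shrinks from up to 5 seconds to about 1 second.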