Summary of ANR

1) First, ANR (Application Not Responding) means that an application has stopped responding. The Android system requires certain events to be handled within a fixed time window. If an event gets no effective response, or the response takes too long, an ANR results. ANR detection is built on the message processing mechanism: Android implements a sophisticated mechanism in the system layer to discover ANRs, and its core principles are message scheduling and timeout handling.

2) Second, the main body of the ANR mechanism is implemented in the system layer. All ANR-related messages are scheduled by the system process (system_server) and then sent to the application process, which does the actual handling. Meanwhile, the system process sets different timeout limits to track how each message is handled. If the application fails to handle a message in time, the timeout fires: the system collects state such as CPU/IO usage and process call stacks, and reports to the user that a process is not responding (the ANR dialog).

3) Third, an ANR is essentially a performance problem. The ANR mechanism constrains the application's main thread: it requires the main thread to finish the most common operations (starting services, processing broadcasts, handling input) within a limited time. If handling times out, the main thread is considered unable to respond to other operations. Time-consuming work on the main thread, such as CPU-intensive computation, heavy IO, and complex layouts, reduces the responsiveness of the application.

What scenarios contribute to ANR?

1. When an ANR occurs, AppNotRespondingDialog.show() is called to pop up a dialog that notifies the user. The call relationships that lead to this dialog are shown in the figure below:


2. AppErrors.appNotResponding() is the only entry point that eventually pops up the ANR dialog. Only the scenarios that call this method produce an ANR prompt. Put another way, no matter how time-consuming a task on the main thread is, as long as this method is never invoked there will be no ANR prompt and no ANR-related logs or reports. The call relationships show which scenarios lead to an ANR. There are four:

(1) Service Timeout: the Service cannot finish within a specified period of time.

(2) BroadcastQueue Timeout: the BroadcastReceiver cannot finish processing the broadcast within a specified period of time.

(3) ContentProvider Timeout: the ContentProvider's execution times out.

(4) InputDispatching Timeout: a key or touch event is not responded to within a specified period of time.

ANR mechanism

The ANR mechanism can be divided into two parts:

ANR monitoring mechanism: Android has a monitoring mechanism for each ANR type (Broadcast, Service, InputEvent).

ANR reporting mechanism: after an ANR is detected, the ANR dialog must be displayed and logs (process call stacks, CPU usage, etc.) must be output.

The code of the ANR mechanism also spans several layers of Android:

App layer: the processing logic of the application's main thread.

Framework layer: the core of the ANR mechanism, including AMS, BroadcastQueue, ActiveServices, InputManagerService, InputMonitor, InputChannel, ProcessCpuTracker, etc.

Native layer: InputDispatcher.cpp.

The Provider timeout mechanism is rarely encountered, so it is not analyzed for now. Two points about Broadcast are worth making:

First: whether a broadcast is normal or ordered, the receivers' onReceive() calls are ultimately executed serially. This can be verified with a demo.

Second: by adding logs to a demo and to the framework, it has been verified that normal broadcasts are also covered by ANR monitoring. The article "ANR mechanism and problem analysis" holds that only serial broadcasts have ANR monitoring. A follow-up will explain the Broadcast sending and receiving process in detail and will also cover the Broadcast ANR monitoring mechanism. This article discusses the ANR monitoring mechanism mainly through two examples: Service handling timeout and input event dispatch timeout.

Service timeout monitoring mechanism

A Service runs on the application's main thread. If a Service in a foreground process takes more than 20 seconds to execute, an ANR is raised; for a Service in a background process the limit is 200 seconds.

When a Service ANR occurs, first check whether the Service lifecycle methods (onCreate(), onStartCommand(), etc.) perform time-consuming work such as heavy computation or IO. If no problem can be found in the application code, check the system state at the time, such as CPU usage and the state of system services, to determine whether the process that hit the ANR was affected by abnormal system behavior.

How is a Service timeout detected? Android does it with delayed messages, which are handled by the AMS message queue (the ActivityManager thread of system_server). AMS holds the context information for Service execution, so it makes sense to put timeout detection in AMS. Let's start with two questions: What is the Service startup process? How is a Service timeout monitored?

Both questions serve to explain the Service monitoring mechanism. Once the Service startup process is understood, the timeout monitoring mechanism is easier to analyze by following that process.

1. The Service startup process is as follows:

(1) ActiveServices.realStartServiceLocked() creates the Service object through app.thread.scheduleCreateService() and calls Service.onCreate(). It then calls sendServiceArgsLocked() to invoke other Service methods such as onStartCommand(). Both steps are inter-process communication; for how the application and AMS communicate, see material on communication between application processes and the system process.

(2) The above lists only the key steps of the Service startup process; the specific work of each method still needs to be checked in the source. If you are interested, refer to Android Development Art Exploration and other related materials.

2. Service timeout monitoring mechanism

The Service timeout monitoring mechanism can be seen in the Service startup process.

(1) Main work of ActiveServices.realStartServiceLocked()

    private final void realStartServiceLocked(ServiceRecord r,
            ProcessRecord app, boolean execInFg) throws RemoteException {
        ...
        // Start ANR monitoring before the Service is started.
        bumpServiceExecutingLocked(r, execInFg, "create");
        ...
        // Ask the app process to create the Service through scheduleCreateService(),
        // which eventually calls Service.onCreate().
        app.thread.scheduleCreateService(r, r.serviceInfo, ...);
        ...
        // Binding: this method calls app.thread.scheduleBindService().
        requestServiceBindingsLocked(r, execInFg);
        ...
        // Invoke other Service methods such as onStartCommand(); this is also IPC.
        sendServiceArgsLocked(r, execInFg, true);
    }

(2) bumpServiceExecutingLocked() calls scheduleServiceTimeoutLocked()

    void scheduleServiceTimeoutLocked(ProcessRecord proc) {
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        Message msg = mAm.mHandler.obtainMessage(
                ActivityManagerService.SERVICE_TIMEOUT_MSG);
        msg.obj = proc;
        // serviceDoneExecutingLocked() removes the SERVICE_TIMEOUT_MSG message.
        // If the message has still not been removed when the timeout expires,
        // ActiveServices.serviceTimeout() is executed.
        mAm.mHandler.sendMessageDelayed(msg,
                proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        // Service running in a foreground process: SERVICE_TIMEOUT = 20s.
        // Service running in a background process: SERVICE_BACKGROUND_TIMEOUT = 200s.
    }

(3) If serviceDoneExecutingLocked() does not remove the message within the specified time, ActiveServices.serviceTimeout() is called

    void serviceTimeout(ProcessRecord proc) {
        ...
        final long maxTime = now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ...
        // Find the Service that timed out
        for (int i = proc.executingServices.size() - 1; i >= 0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
            ...
        }
        ...
        // Check whether the process whose Service timed out is still in the
        // list of running processes. If not, ignore the ANR.
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            anrMessage = "executing service " + timeout.shortName;
        }
        ...
        if (anrMessage != null) {
            // Report the ANR through AppErrors.appNotResponding()
            appNotResponding(proc, null, null, false, anrMessage);
        }
    }
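
The scan in serviceTimeout() can be tried outside the framework. The following is a minimal sketch in plain Java, not the framework code: ServiceRecordSketch and ServiceTimeoutSketch are hypothetical stand-ins for ServiceRecord and ActiveServices, and times are plain millisecond values passed in by the caller.

```java
import java.util.List;

/** Simplified, hypothetical stand-in for the framework's ServiceRecord. */
final class ServiceRecordSketch {
    final String shortName;
    final long executingStart; // time (ms) at which the current operation started

    ServiceRecordSketch(String shortName, long executingStart) {
        this.shortName = shortName;
        this.executingStart = executingStart;
    }
}

final class ServiceTimeoutSketch {
    static final long SERVICE_TIMEOUT = 20_000;             // foreground: 20s
    static final long SERVICE_BACKGROUND_TIMEOUT = 200_000; // background: 200s

    /**
     * Scans the executing services from the end of the list and returns the ANR
     * message for the first one that started before (now - timeout), or null
     * if no service has timed out.
     */
    static String findAnrMessage(List<ServiceRecordSketch> executing,
                                 long now, boolean foreground) {
        long maxTime = now - (foreground ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        for (int i = executing.size() - 1; i >= 0; i--) {
            ServiceRecordSketch sr = executing.get(i);
            if (sr.executingStart < maxTime) {
                return "executing service " + sr.shortName;
            }
        }
        return null;
    }
}
```

With the same list of services, the foreground limit can flag a service that the much longer background limit still tolerates, which is why the flag is part of the calculation of maxTime.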

(4) The overall process of Service onCreate timeout monitoring is shown below

Timeout monitoring starts before the onCreate() lifecycle callback begins. If onCreate() does not finish within the specified time (for example because it runs a time-consuming task), ActiveServices.serviceTimeout() is called to report an ANR. If onCreate() finishes in time, ActiveServices.serviceDoneExecutingLocked() removes the SERVICE_TIMEOUT_MSG message and no ANR occurs for Service.onCreate(). In short, the Service is scheduled by AMS, which uses Handler and Looper to post a timeout message that is handled on the AMS thread; the entire timeout mechanism is implemented in the Java layer. This is the overall flow of Service timeout monitoring.
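
The arm-then-disarm pattern described above can be sketched without Android. The class below is a toy delayed-message queue driven by a virtual clock; it is an assumption made for illustration and not the android.os.Handler API. sendMessageDelayed() plays the role of scheduleServiceTimeoutLocked(), removeMessages() plays the role of serviceDoneExecutingLocked(), and a message that is still pending at its deadline is recorded as an ANR report.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy delayed-message queue on a virtual clock; a stand-in for Handler, not the real API. */
final class TimeoutQueueSketch {
    private static final class Pending {
        final String what;
        final Object obj;
        final long when; // virtual time (ms) at which the message fires

        Pending(String what, Object obj, long when) {
            this.what = what;
            this.obj = obj;
            this.when = when;
        }
    }

    private final List<Pending> pending = new ArrayList<>();
    /** Collected reports; the real system would call appNotResponding() instead. */
    final List<String> anrReports = new ArrayList<>();
    private long now = 0;

    /** Arm the timeout before the work starts. */
    void sendMessageDelayed(String what, Object obj, long delayMs) {
        pending.add(new Pending(what, obj, now + delayMs));
    }

    /** The work finished in time, so disarm the timeout. */
    void removeMessages(String what, Object obj) {
        pending.removeIf(p -> p.what.equals(what) && p.obj.equals(obj));
    }

    /** Advance the virtual clock; any message still pending at its deadline fires. */
    void advanceTo(long timeMs) {
        now = Math.max(now, timeMs);
        Iterator<Pending> it = pending.iterator();
        while (it.hasNext()) {
            Pending p = it.next();
            if (p.when <= now) {
                it.remove();
                anrReports.add("ANR: " + p.obj + " (" + p.what + ")");
            }
        }
    }
}
```

A virtual clock is used instead of real threads so that the behavior is deterministic: a process whose message is removed before the deadline never shows up in anrReports, while one whose message is left in the queue does.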



Input event timeout monitoring

An application can receive input events (key presses, touches, trackball movement, etc.). If handling of an event is not completed within 5 seconds, an ANR is raised.

This raises two questions: What steps does an input event go through before it reaches the application's UI? How is an input event handling timeout detected?

1. Introduction to the Android input system

The following figure shows the overall process and the participants of the Android input system.

In simple terms, the kernel writes the raw event to the device node. InputReader, in its thread loop, continuously pulls raw input events from EventHub, processes them, and puts the processed events into InputDispatcher's dispatch queue. InputDispatcher, in its own thread loop, takes events from the dispatch queue, finds the appropriate window, and writes the event to that window's event-receiving pipe. The Looper of the window's event-receiving thread pulls the event out of the pipe and hands it to the window's event handler for a response. The key processes are: reading and processing of raw input events; dispatch of input events; and sending, receiving, and acknowledging input events. Input event dispatch is the process in which InputDispatcher continuously takes events from the dispatch queue and looks for a suitable window to send them to. Input event sending is the process in which InputDispatcher sends events to the window through Connection objects.
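
The reader, dispatch queue, and window pipe described above can be pictured with a small single-threaded model. This is only a sketch under simplifying assumptions: InputPipelineSketch and WindowSketch are hypothetical names, events are plain strings, and the two thread loops are replaced by explicit method calls.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class InputPipelineSketch {
    /** Hypothetical stand-in for a window and its event-receiving pipe. */
    static final class WindowSketch {
        final String name;
        final Deque<String> channel = new ArrayDeque<>();

        WindowSketch(String name) {
            this.name = name;
        }
    }

    private final Deque<String> dispatchQueue = new ArrayDeque<>();
    WindowSketch focusedWindow; // null until a window has focus

    /** Reader side: process a raw event and enqueue it for dispatch. */
    void read(String rawEvent) {
        dispatchQueue.addLast("processed:" + rawEvent);
    }

    /**
     * Dispatcher side: one loop iteration. Takes the oldest event and writes it
     * to the focused window's pipe. Returns false if nothing was dispatched.
     */
    boolean dispatchOnce() {
        if (dispatchQueue.isEmpty() || focusedWindow == null) {
            return false;
        }
        focusedWindow.channel.addLast(dispatchQueue.pollFirst());
        return true;
    }
}
```

Note that an event stays in the dispatch queue while there is no target to deliver it to; the timeout monitoring discussed later measures how long such waiting lasts.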

Inter-process communication between InputDispatcher and windows is done mainly through InputChannel. Once InputDispatcher and a window are connected through an InputChannel, events can be sent, received, and acknowledged. The main process of sending and receiving input events is shown in the figure:

Once an input event has been injected into the dispatch queue, the dispatch thread is woken up; one iteration of the dispatch thread loop is carried out by InputDispatcher::dispatchOnce(). After InputDispatcher writes the event to the InputChannel as an InputMessage, the Looper on the window side is woken up and executes NativeInputReceiver::handleEvent() to start receiving the input event. From InputEventReceiver onward, the event is delivered to the user interface. This is only the general flow of input events; for more detail, refer to the related materials. With this overview of the input system in place, let's examine the timeout monitoring mechanism for input events.

2. Input event timeout monitoring

The overall process of key event timeout monitoring is shown in the following figure


(1) InputDispatcher::dispatchOnceInnerLocked() chooses a handling method based on the event type: InputDispatcher::dispatchKeyLocked() or InputDispatcher::dispatchMotionLocked(). We use key event timeout monitoring as the example.

(2) findFocusedWindowTargetsLocked() calls checkWindowReadyForMoreInputLocked(), which checks whether the window is able to receive new input events. Several scenarios may prevent the event from being dispatched:

Scenario 1: The window is paused and cannot handle input events: "Waiting because the [targetType] window is paused."

Scenario 2: The window is not registered with InputDispatcher: "Waiting because the [targetType] window's input channel is not registered with the input dispatcher. The window may be in the process of being removed."

Scenario 3: The window's connection to InputDispatcher has been broken: "Waiting because the [targetType] window's input connection is [status]. The window may be in the process of being removed."

Scenario 4: The InputChannel is saturated and cannot take new events: "Waiting because the [targetType] window's input channel is full. Outbound queue length: %d. Wait queue length: %d."

Scenario 5: For KeyEvent input events, the window has not finished processing the input events delivered to it earlier: "Waiting to send key event because the [targetType] window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: %d. Wait queue length: %d."

Scenario 6: Input events of the TouchEvent type can normally be dispatched to the current window immediately, since touch events occur in the window that is currently visible to the user. There is one exception: if the current application has too many input events waiting to be handled and is therefore heading toward an ANR, the TouchEvent has to queue up for dispatch: "Waiting to send non-key event because the %s window has not finished processing certain input events that were delivered to it over %0.1fms ago. Wait queue length: %d. Wait queue head age: %0.1fms."

These scenarios are the ANR reasons that we often see printed in logs.
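
The six scenarios amount to an ordered series of checks that either let the event through or return a reason for waiting. The sketch below mirrors that order in plain Java. It is not the native implementation: WindowState and its fields are hypothetical, the reason strings are shortened, and the age threshold for touch events is passed in as a parameter because the article does not state its value.

```java
final class WindowReadySketch {
    /** Hypothetical snapshot of a window's connection state. */
    static final class WindowState {
        boolean paused;
        boolean channelRegistered = true;
        boolean connectionNormal = true;
        boolean channelFull;
        int outboundQueueLength;
        int waitQueueLength;
        long waitQueueHeadAgeMs;
    }

    /**
     * Returns null if the window can take the event, otherwise a short reason.
     * streamAheadTimeoutMs is an assumed parameter: how old the head of the
     * wait queue may be before touch events start to queue up.
     */
    static String checkReady(WindowState w, boolean isKeyEvent, long streamAheadTimeoutMs) {
        if (w.paused) {
            return "window is paused";
        }
        if (!w.channelRegistered) {
            return "input channel is not registered";
        }
        if (!w.connectionNormal) {
            return "input connection is not in a normal state";
        }
        if (w.channelFull) {
            return "input channel is full";
        }
        if (isKeyEvent) {
            // Key events wait until everything delivered earlier has been handled.
            if (w.outboundQueueLength > 0 || w.waitQueueLength > 0) {
                return "window has not finished processing previous input events";
            }
        } else if (w.waitQueueLength > 0 && w.waitQueueHeadAgeMs >= streamAheadTimeoutMs) {
            // Touch events only wait when the oldest unhandled event is too old.
            return "window has not finished processing events delivered too long ago";
        }
        return null;
    }
}
```

The asymmetry between the last two branches reflects scenarios 5 and 6: a key event is held back by any unfinished event, while a touch event is held back only when the backlog has aged past the threshold.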

(3) The 5s limit on event dispatch is defined in InputDispatcher.cpp. If an event has not been dispatched within 5s, InputDispatcher::handleTargetsNotReadyLocked() calls InputDispatcher::onANRLocked() to notify the user that the application is not responding.

    // Default dispatch timeout is 5s
    const nsecs_t DEFAULT_INPUT_DISPATCHING_TIMEOUT = 5000 * 1000000LL;

    int32_t InputDispatcher::handleTargetsNotReadyLocked(nsecs_t currentTime,
            const EventEntry* entry,
            const sp<InputApplicationHandle>& applicationHandle,
            const sp<InputWindowHandle>& windowHandle,
            nsecs_t* nextWakeupTime, const char* reason) {
        // 1. There is neither a focused window nor a focused application
        if (applicationHandle == NULL && windowHandle == NULL) {
            ...
        } else {
            // 2. There is a focused window or a focused application
            if (mInputTargetWaitCause != INPUT_TARGET_WAIT_CAUSE_APPLICATION_NOT_READY) {
                // Get the wait timeout value
                if (windowHandle != NULL) {
                    // A focused window exists; DEFAULT_INPUT_DISPATCHING_TIMEOUT is 5s
                    timeout = windowHandle->getDispatchingTimeout(DEFAULT_INPUT_DISPATCHING_TIMEOUT);
                } else if (applicationHandle != NULL) {
                    // A focused application exists
                    timeout = applicationHandle->getDispatchingTimeout(DEFAULT_INPUT_DISPATCHING_TIMEOUT);
                } else {
                    // The default dispatch timeout is 5s
                    timeout = DEFAULT_INPUT_DISPATCHING_TIMEOUT;
                }
            }
        }
        // If the current time has reached the wait deadline (5s by default), handle the ANR.
        // currentTime is the current system time; mInputTargetWaitTimeoutTime is a member variable.
        if (currentTime >= mInputTargetWaitTimeoutTime) {
            // Call onANRLocked() to handle the ANR
            onANRLocked(currentTime, applicationHandle, windowHandle,
                    entry->eventTime, mInputTargetWaitStartTime, reason);
            // The event still needs to wait for processing
            return INPUT_EVENT_INJECTION_PENDING;
        }
    }

(4) When the application's main thread is stuck, clicking other components of the application gets no response either, because event dispatch is serial: the next event is not handled until the previous one has been completed.

(5) If Activity.onCreate() performs a time-consuming operation, no ANR occurs no matter what the user does, because input event monitoring has not been set up yet: the InputChannel has not been established, InputDispatcher does not send events to the application window, and the ANR monitoring is not in place, so no ANR is reported at this point.

(6) Input events are scheduled by InputDispatcher. Events waiting to be handled enter a queue, a wait-timeout check is applied to them, and this timeout mechanism is implemented in the native layer. That is the ANR monitoring mechanism for input events; for the specific logic, refer to the relevant source code.
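
The deadline check in step (3) can be sketched in a few lines of plain Java. This is an illustration under assumptions, not a port of InputDispatcher: InputWaitSketch and its method names are hypothetical, time is passed in as milliseconds, and raising an ANR is reduced to incrementing a counter.

```java
final class InputWaitSketch {
    static final long DEFAULT_TIMEOUT_MS = 5_000; // 5s, as stated in the article

    private boolean waiting;
    private long waitTimeoutTime; // deadline (ms) of the current wait
    int anrCount;

    /**
     * Called each time dispatch finds that the target is not ready.
     * The first call of a wait records the deadline; later calls compare the
     * current time against it. Returns true if the deadline has been reached.
     */
    boolean onTargetNotReady(long currentTimeMs, long timeoutMs) {
        if (!waiting) {
            waiting = true;
            waitTimeoutTime = currentTimeMs + timeoutMs;
        }
        if (currentTimeMs >= waitTimeoutTime) {
            anrCount++;
            return true;
        }
        return false;
    }

    /** Called when the target becomes ready again; the next wait gets a new deadline. */
    void resetWait() {
        waiting = false;
    }
}
```

The deadline is fixed when the wait begins and is not pushed back by later retries, so repeated dispatch attempts against a stuck window eventually cross it.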

ANR reporting mechanism

No matter which type of ANR occurs, AppErrors.appNotResponding() is eventually called: all roads lead to Rome. This method reports to the user or developer that an ANR has occurred. The final result is a dialog telling the user that an application is not responding, plus a batch of ANR-related logs for developers to troubleshoot with.

    final void appNotResponding(ProcessRecord app, ActivityRecord activity,
            ActivityRecord parent, boolean aboveSystem, final String annotation) {
        ...
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            // 1. Update the CPU usage. This is the first CPU sample for the ANR;
            //    the sampled data is stored in mProcessStats.
            mService.updateCpuStatsNow();
        }
        ...
        // Record the ANR in the event log
        EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
                app.processName, app.info.flags, annotation);

        // Output the ANR to the main log
        StringBuilder info = new StringBuilder();
        info.setLength(0);
        info.append("ANR in ").append(app.processName);
        if (activity != null && activity.shortComponentName != null) {
            info.append("(").append(activity.shortComponentName).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(app.pid).append("\n");
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        if (parent != null && parent != activity) {
            info.append("Parent: ").append(parent.shortComponentName).append("\n");
        }

        // 3. Print the call stacks; the work is done by dumpStackTraces()
        File tracesFile = ActivityManagerService.dumpStackTraces(
                true, firstPids,
                (isSilentANR) ? null : processCpuTracker,
                (isSilentANR) ? null : lastPids,
                nativePids);

        String cpuInfo = null;
        // MONITOR_CPU_USAGE defaults to true
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            // 4. Update the CPU usage. This is the second CPU sample for the ANR.
            //    The two samples correspond to CPU usage before and after the ANR.
            mService.updateCpuStatsNow();
            synchronized (mService.mProcessCpuTracker) {
                // CPU usage of each process for a period before the ANR
                cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
            }
            // CPU load
            info.append(processCpuTracker.printCurrentLoad());
            info.append(cpuInfo);
        }
        // CPU usage of each process for a period after the ANR
        info.append(processCpuTracker.printCurrentState(anrTime));

        Slog.e(TAG, info.toString());
        if (tracesFile == null) {
            // There is no trace file, so dump (only) the alleged culprit:
            // send signal 3 (SIGNAL_QUIT) to the process to dump its stacks.
            Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
        }
        // Also write the ANR information to DropBox
        mService.addErrorToDropBox("anr", app, app.processName, activity, parent,
                annotation, cpuInfo, tracesFile, null);

        // Bring up the infamous App Not Responding dialog
        // 5. Show the ANR dialog: send the SHOW_NOT_RESPONDING_UI_MSG message;
        //    the AMS handler processes it and shows the ANR dialog.
        Message msg = Message.obtain();
        HashMap<String, Object> map = new HashMap<String, Object>();
        msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
        msg.obj = map;
        msg.arg1 = aboveSystem ? 1 : 0;
        map.put("app", app);
        if (activity != null) {
            map.put("activity", activity);
        }
        mService.mUiHandler.sendMessage(msg);
    }
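
To see what the header written to the main log looks like, the string-building part of appNotResponding() can be reproduced in isolation. The sketch below is a reduced version under assumptions: AnrLogSketch is a hypothetical class, the activity and parent lines are left out, and the process name, PID, and reason are passed in as plain values.

```java
final class AnrLogSketch {
    /**
     * Builds the first lines of the main-log entry for an ANR:
     * "ANR in <process>", "PID: <pid>", and optionally "Reason: <annotation>".
     */
    static String buildHeader(String processName, int pid, String annotation) {
        StringBuilder info = new StringBuilder();
        info.append("ANR in ").append(processName).append("\n");
        info.append("PID: ").append(pid).append("\n");
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        return info.toString();
    }
}
```

For a Service timeout, the annotation is the "executing service ..." message assembled in serviceTimeout(), so the reason line names the component that did not finish in time.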

Besides the main logic, several kinds of logs are produced when an ANR occurs:

Event log: search for the keyword "am_anr" to find the application in which the ANR occurred.

Main log: search for "ANR in" to find the ANR information; the surrounding log context includes CPU usage.

Dropbox: search for "anr" to find the ANR information.
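
Searching captured logs for these keywords is easy to script. The helper below is a minimal sketch: AnrLogScanSketch is a hypothetical class, and the log lines are assumed to have already been read into a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

final class AnrLogScanSketch {
    /** Returns the lines that contain the keyword (case-sensitive), in original order. */
    static List<String> grep(List<String> lines, String keyword) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(keyword)) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

Running it once with "am_anr" over the event log and once with "ANR in" over the main log gives the two starting points described above.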

With that, ANR reporting is complete. Analyzing an ANR then usually starts from the CPU usage in the main log and the function call stacks in traces, so updateCpuStatsNow() and dumpStackTraces() are the key methods in ANR reporting. For details on analyzing ANR problems, refer to related documents.

Conclusion

1. ANR monitoring mechanism: we first outlined the workflow of Services and input events, then analyzed from the source code how Android discovers each kind of ANR, using the Service and InputEvent monitoring mechanisms as examples. Timeout detection is embedded in Service startup and in input event dispatch to discover ANRs.

2. ANR reporting mechanism: we analyzed how Android outputs ANR logs. When an ANR is discovered, the two most important kinds of log output are CPU usage and the function call stacks of processes. These two kinds of log are the main tools for solving ANR problems.

3. The core principles of ANR monitoring are message scheduling and timeout handling.

4. ANR reports and ANR dialogs appear only for the scenarios that ANR monitoring covers.

References

ANR mechanism and problem analysis

Understanding the trigger principle of Android ANR

In-depth Understanding of Android, Volume 3 (Android input system)

Android Development Art Exploration

Android source code

Finally, thanks to the authors of the articles referenced in this article.

The original link: https://www.jianshu.com/p/ad1a84b6ec69
