Preface: this article explains the ANR mechanism from the perspective of the source code.
Contents
- Type 1: Component scheduling (using Service startup as an example)
- Type 2: Touch event dispatch
- Conclusion
Type 1: Component scheduling (using Service startup as an example)
If you are familiar with the component startup process, you know that starting a Service ends up in the realStartServiceLocked() method of ActiveServices. Service is one of the four Android components, so let's look in detail at how a Service causes an ANR.
// Based on API 28
// com.android.server.am.ActiveServices.java
private final void realStartServiceLocked(ServiceRecord r, ProcessRecord app,
        boolean execInFg) throws RemoteException {
    ...
    // 1. Send the delayed message (SERVICE_TIMEOUT_MSG)
    bumpServiceExecutingLocked(r, execInFg, "create");
    // 2. Create the Service object and invoke onCreate()
    app.thread.scheduleCreateService(r, r.serviceInfo,
            mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
            app.repProcState);
    ...
}
Next, look at bumpServiceExecutingLocked():
// com.android.server.am.ActiveServices.java
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
    ...
    // 3. Schedule the timeout through the AMS handler
    scheduleServiceTimeoutLocked(r.app);
    ...
}

void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    long now = SystemClock.uptimeMillis();
    Message msg = mAm.mHandler.obtainMessage(ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    mAm.mHandler.sendMessageAtTime(msg,
            proc.execServicesFg ? (now + SERVICE_TIMEOUT) : (now + SERVICE_BACKGROUND_TIMEOUT));
}

static final int SERVICE_TIMEOUT = 20 * 1000;
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
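A delayed message is sent through mAm.mHandler, where mAm is the ActivityManagerService. The delay depends on the current process state: SERVICE_TIMEOUT (20 seconds) if the process is executing services in the foreground, SERVICE_BACKGROUND_TIMEOUT (200 seconds) otherwise.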
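The deadline computation in scheduleServiceTimeoutLocked can be sketched in plain Java, without any Android classes. This is only an illustration: the class name ServiceTimeoutSketch and the method deadlineMillis are hypothetical, and only the two constants are taken from the source above.

```java
// Sketch only: ServiceTimeoutSketch and deadlineMillis are hypothetical names.
final class ServiceTimeoutSketch {
    // Constants copied from ActiveServices (API 28), as quoted above.
    static final int SERVICE_TIMEOUT = 20 * 1000;
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

    /** Returns the absolute time (ms) at which SERVICE_TIMEOUT_MSG would be delivered. */
    static long deadlineMillis(long nowMillis, boolean execServicesFg) {
        return nowMillis + (execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
    }

    public static void main(String[] args) {
        long now = 1_000L;
        System.out.println(deadlineMillis(now, true));   // prints 21000
        System.out.println(deadlineMillis(now, false));  // prints 201000
    }
}
```

The point is that the timeout is fixed at the moment the message is scheduled; nothing is re-evaluated later if the process changes state.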
// com.android.server.am.ActivityManagerService.java
final class MainHandler extends Handler {
    public MainHandler(Looper looper) {
        super(looper, null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        ...
        case SERVICE_TIMEOUT_MSG: {
            // 4. Call into ActiveServices
            mServices.serviceTimeout((ProcessRecord) msg.obj);
        } break;
        ...
    }
}
Back to ActiveServices:
// com.android.server.am.ActiveServices.java
void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;
    ...
    // 5. Build anrMessage
    if (anrMessage != null) {
        mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
    }
}
AppErrors is the class in which AMS collects and aggregates error information. ANRs caused by other components also end up here, so it is explained in detail. There are several important questions to keep in mind: 1. What information does the system provide when an ANR occurs? 2. Where is it stored? 3. Can this information help us resolve the ANR? The next article will answer these questions.
//com.android.server.am.AppErrors.java
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
ActivityRecord parent, boolean aboveSystem, final String annotation) {
// 6. Holds the PIDs of recently run processes
ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
if (mService.mController != null) {
try {
// 0 == continue, -1 = kill process immediately
int res = mService.mController.appEarlyNotResponding(
app.processName, app.pid, annotation);
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
}
} catch (RemoteException e) {
mService.mController = null;
Watchdog.getInstance().setActivityController(null);
}
}
// 7. Record the time at which the ANR occurred
long anrTime = SystemClock.uptimeMillis();
if (ActivityManagerService.MONITOR_CPU_USAGE) {
// 8. Update the CPU stats for the first time
mService.updateCpuStatsNow();
}
...
// In case we come through here for the same app before completing
// this one, mark as anring now so we will bail out.
app.notResponding = true;
// Log the ANR to the event log.
EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
app.processName, app.info.flags, annotation);
// Dump thread traces as quickly as we can, starting with "interesting" processes.
firstPids.add(app.pid);
// 9. Don't dump other PIDs if it's a background ANR
isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
if (!isSilentANR) {
int parentPid = app.pid;
if (parent != null && parent.app != null && parent.app.pid > 0) {
parentPid = parent.app.pid;
}
if (parentPid != app.pid) firstPids.add(parentPid);
if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
// 10. Add the PIDs of recently used processes to firstPids and lastPids
for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
ProcessRecord r = mService.mLruProcesses.get(i);
if (r != null && r.thread != null) {
int pid = r.pid;
if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
if (r.persistent) {
firstPids.add(pid);
if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
} else if (r.treatLikeActivity) {
firstPids.add(pid);
if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
} else {
lastPids.put(pid, Boolean.TRUE);
if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
}
}
}
}
}
}
// 11. Log the ANR to the main log (the system log).
StringBuilder info = new StringBuilder();
info.setLength(0);
info.append("ANR in ").append(app.processName);
if (activity != null && activity.shortComponentName != null) {
info.append(" (").append(activity.shortComponentName).append(")");
}
info.append("\n");
info.append("PID: ").append(app.pid).append("\n");
if (annotation != null) {
info.append("Reason: ").append(annotation).append("\n");
}
if (parent != null && parent != activity) {
info.append("Parent: ").append(parent.shortComponentName).append("\n");
}
ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
// don't dump native PIDs for background ANRs unless it is the process of interest
// 12. Together with note 9, this shows that a background ANR does not include
// information about other processes or other system service processes
String[] nativeProcs = null;
if (isSilentANR) {
for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
if (NATIVE_STACKS_OF_INTEREST[i].equals(app.processName)) {
nativeProcs = new String[] { app.processName };
break;
}
}
} else {
nativeProcs = NATIVE_STACKS_OF_INTEREST;
}
// 13. These processes are described in detail below
int[] pids = nativeProcs == null ? null :
Process.getPidsForCommands(nativeProcs);
ArrayList<Integer> nativePids = null;
if (pids != null) {
nativePids = new ArrayList<Integer>(pids.length);
for (int i : pids) {
nativePids.add(i);
}
}
// For background ANRs, don't pass the ProcessCpuTracker to
// avoid spending 1/2 second collecting stats to rank lastPids.
// 14. Call AMS's dumpStackTraces to write the ANR log to the trace file. It is worth
// studying what is recorded here, because it shows what to look at when an ANR occurs.
File tracesFile = ActivityManagerService.dumpStackTraces(
true, firstPids,
(isSilentANR) ? null : processCpuTracker,
(isSilentANR) ? null : lastPids,
nativePids);
String cpuInfo = null;
if (ActivityManagerService.MONITOR_CPU_USAGE) {
// Update the CPU stats again
mService.updateCpuStatsNow();
synchronized (mService.mProcessCpuTracker) {
cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
}
info.append(processCpuTracker.printCurrentLoad());
// Record the first CPU info
info.append(cpuInfo);
}
// Record the second CPU info
info.append(processCpuTracker.printCurrentState(anrTime));
// Write the ANR info to the system log
Slog.e(TAG, info.toString());
if (tracesFile == null) {
// There is no trace file, so dump (only) the alleged culprit's threads to the log
// If no trace file was generated, send SIGNAL_QUIT
Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
}
StatsLog.write(StatsLog.ANR_OCCURRED, app.uid, app.processName,
activity == null ? "unknown": activity.shortComponentName, annotation,
(app.info != null) ? (app.info.isInstantApp()
? StatsLog.ANROCCURRED__IS_INSTANT_APP__TRUE
: StatsLog.ANROCCURRED__IS_INSTANT_APP__FALSE)
: StatsLog.ANROCCURRED__IS_INSTANT_APP__UNAVAILABLE,
app != null ? (app.isInterestingToUserLocked()
? StatsLog.ANROCCURRED__FOREGROUND_STATE__FOREGROUND
: StatsLog.ANROCCURRED__FOREGROUND_STATE__BACKGROUND)
: StatsLog.ANROCCURRED__FOREGROUND_STATE__UNKNOWN);
mService.addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
cpuInfo, tracesFile, null);
if (mService.mController != null) {
try {
// 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
int res = mService.mController.appNotResponding(
app.processName, app.pid, info.toString());
if (res != 0) {
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
} else {
synchronized (mService) {
mService.mServices.scheduleServiceTimeoutLocked(app);
}
}
return;
}
} catch (RemoteException e) {
mService.mController = null;
Watchdog.getInstance().setActivityController(null);
}
}
synchronized (mService) {
mService.mBatteryStatsService.noteProcessAnr(app.processName, app.uid);
if (isSilentANR) {
app.kill("bg anr", true);
return;
}
// Set the app's notResponding state, and look up the errorReportReceiver
// 15. Tell the system to show the "application not responding" dialog
makeAppNotRespondingLocked(app,
activity != null ? activity.shortComponentName : null,
annotation != null ? "ANR " + annotation : "ANR",
info.toString());
// Bring up the infamous App Not Responding dialog
Message msg = Message.obtain();
msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
msg.obj = new AppNotRespondingDialog.Data(app, activity, aboveSystem);
mService.mUiHandler.sendMessage(msg);
}
}
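Steps 9 and 10 decide which processes get their stacks dumped first. A simplified, self-contained sketch of that selection is shown below. It is not the framework code: Proc is a hypothetical stand-in for the few ProcessRecord fields used, the parent PID handling is left out, and lastPids is a plain list rather than a SparseArray.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a simplified model of steps 9 and 10 in appNotResponding.
final class AnrPidSelectionSketch {
    /** Hypothetical stand-in for the ProcessRecord fields read in step 10. */
    static final class Proc {
        final int pid;
        final boolean persistent;
        final boolean treatLikeActivity;

        Proc(int pid, boolean persistent, boolean treatLikeActivity) {
            this.pid = pid;
            this.persistent = persistent;
            this.treatLikeActivity = treatLikeActivity;
        }
    }

    final List<Integer> firstPids = new ArrayList<>();
    final List<Integer> lastPids = new ArrayList<>();

    private AnrPidSelectionSketch() {}

    /** lru is ordered like mLruProcesses: the most recently used process is last. */
    static AnrPidSelectionSketch select(int anrPid, int systemPid, boolean silentAnr,
            List<Proc> lru) {
        AnrPidSelectionSketch out = new AnrPidSelectionSketch();
        out.firstPids.add(anrPid);           // the ANR process is always dumped first
        if (silentAnr) {
            return out;                      // background ANR: no other processes
        }
        if (systemPid != anrPid) {
            out.firstPids.add(systemPid);    // then the system process (MY_PID)
        }
        for (int i = lru.size() - 1; i >= 0; i--) {
            Proc r = lru.get(i);
            if (r.pid <= 0 || r.pid == anrPid || r.pid == systemPid) {
                continue;
            }
            if (r.persistent || r.treatLikeActivity) {
                out.firstPids.add(r.pid);    // "interesting" processes
            } else {
                out.lastPids.add(r.pid);     // everything else is dumped later
            }
        }
        return out;
    }
}
```

With an ANR in PID 100, a system PID of 1, and an LRU list of PIDs 10 (ordinary), 20 (persistent) and 30 (treatLikeActivity), this yields firstPids = [100, 1, 30, 20] and lastPids = [10]. For a silent (background) ANR only [100] is selected.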
Note 13 refers to collecting information about native processes, that is, the main system services:
```
// Which native processes to dump into dropbox's stack traces
public static final String[] NATIVE_STACKS_OF_INTEREST = new String[] {
    "/system/bin/audioserver",
    "/system/bin/cameraserver",
    "/system/bin/drmserver",
    "/system/bin/mediadrmserver",
    "/system/bin/mediaserver",
    "/system/bin/sdcard",
    "/system/bin/surfaceflinger",
    "media.extractor", // system/bin/mediaextractor
    "media.metrics", // system/bin/mediametrics
    "media.codec", // vendor/bin/hw/[email protected]
    "com.android.bluetooth", // Bluetooth service
    "statsd", // Stats daemon
};
```
This is where the message that shows the ANR dialog is handled:
```
// com.android.server.am.ActivityManagerService.java
final class UiHandler extends Handler {
    public UiHandler() {
        super(com.android.server.UiThread.get().getLooper(), null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            ...
            case SHOW_NOT_RESPONDING_UI_MSG: {
                mAppErrors.handleShowAnrUi(msg);
                ensureBootCompleted();
            } break;
            ...
        }
    }
}

void handleShowAnrUi(Message msg) {
    Dialog dialogToShow = null;
    synchronized (mService) {
        AppNotRespondingDialog.Data data = (AppNotRespondingDialog.Data) msg.obj;
        final ProcessRecord proc = data.proc;
        ...
    }
    // If we've created a crash dialog, show it without the lock held
    if (dialogToShow != null) {
        dialogToShow.show();
    }
}
```
As the method names suggest, the steps above plant the ANR bomb and detonate it. So when is the bomb defused? In fact, the bomb is defused after each Service lifecycle method finishes its call, as follows:
```
// com.android.server.am.ActiveServices.java
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
        boolean finishing) {
    if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "<<< DONE EXECUTING " + r
            + ": nesting=" + r.executeNesting
            + ", inDestroying=" + inDestroying + ", app=" + r.app);
    else if (DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
            "<<< DONE EXECUTING " + r.shortName);
    // executeNesting is incremented on each Service lifecycle call; decrement it here
    r.executeNesting--;
    if (r.executeNesting <= 0) {
        if (r.app != null) {
            ...
            // Remove the timeout (ANR) message
            mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
            ...
        }
    }
}
```
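The "plant, then defuse" pattern is not specific to Android. A minimal sketch of the same idea in plain Java follows, using a ScheduledExecutorService in place of the AMS Handler. All names here (TimeoutBombSketch, plant, defuse, run) are hypothetical and only illustrate the pattern; the framework itself uses sendMessageAtTime and removeMessages as shown above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: models the delayed-message timeout with a scheduled task.
final class TimeoutBombSketch {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean anrReported = new AtomicBoolean(false);
    private ScheduledFuture<?> pending;

    /** Plant the bomb: schedule the timeout callback before the work starts. */
    void plant(long timeoutMillis) {
        pending = scheduler.schedule(
                () -> anrReported.set(true), timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Defuse the bomb: cancel the pending callback once the work is done. */
    void defuse() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    /** Simulates work taking workMillis under a timeout; returns true if the timeout fired. */
    static boolean run(long timeoutMillis, long workMillis) {
        TimeoutBombSketch bomb = new TimeoutBombSketch();
        bomb.plant(timeoutMillis);
        try {
            Thread.sleep(workMillis);        // stands in for the lifecycle callback
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        bomb.defuse();
        bomb.scheduler.shutdownNow();
        return bomb.anrReported.get();
    }

    public static void main(String[] args) {
        System.out.println("fast work, timeout fired: " + run(200, 0));
        System.out.println("slow work, timeout fired: " + run(50, 400));
    }
}
```

If the work finishes before the deadline, the callback is cancelled and never runs; if the work overruns, the callback has already fired by the time defuse() is called, which corresponds to the ANR being reported.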
The ANR mechanisms of BroadcastReceiver and ContentProvider are similar to that of Service. Activity is not covered by this mechanism: executing an infinite loop in an Activity's onCreate method does not by itself cause an ANR, it just blocks the main thread. For BroadcastReceiver, an ANR is raised if onReceive takes more than 10 seconds for a foreground broadcast or more than 60 seconds for a background broadcast. For ContentProvider, an ANR is raised if publishing does not complete within 10 seconds.
Type 2: Touch event dispatch
As long as the user produces no input, the UI will not actually raise an ANR. But if the user taps the app while it happens to be stuck on a bottleneck, a normal user will keep poking at the screen, which inevitably produces input events.
When the underlying InputDispatcher dispatches an InputEvent to the current InputChannel, it records the dispatch time and checks how long it has been waiting. On timeout it notifies the upper-layer InputManagerService, which reports to AMS, so the ANR is caught. This path differs from the component path, so let's take a look at it.
InputDispatcher dispatches KeyEvent and MotionEvent with dispatchKeyLocked and dispatchMotionLocked respectively:
// frameworks/native/services/inputflinger/InputDispatcher.cpp
bool InputDispatcher::dispatchKeyLocked(nsecs_t currentTime, KeyEntry* entry,
DropReason* dropReason, nsecs_t* nextWakeupTime) {
...
// Identify targets.
Vector<InputTarget> inputTargets;
int32_t injectionResult = findFocusedWindowTargetsLocked(currentTime,
entry, inputTargets, nextWakeupTime);
if (injectionResult == INPUT_EVENT_INJECTION_PENDING) {
return false;
}
setInjectionResultLocked(entry, injectionResult);
if (injectionResult != INPUT_EVENT_INJECTION_SUCCEEDED) {
return true;
}
addMonitoringTargetsLocked(inputTargets);
// Dispatch the key.
dispatchEventLocked(currentTime, entry, inputTargets);
return true;
}
bool InputDispatcher::dispatchMotionLocked(
nsecs_t currentTime, MotionEntry* entry, DropReason* dropReason, nsecs_t* nextWakeupTime) {
// Preprocessing.
...
int32_t injectionResult;
if (isPointerEvent) {
// Pointer event. (eg. touchscreen)
injectionResult = findTouchedWindowTargetsLocked(currentTime,
entry, inputTargets, nextWakeupTime, &conflictingPointerActions);
} else {
// Non touch event. (eg. trackball)
injectionResult = findFocusedWindowTargetsLocked(currentTime,
entry, inputTargets, nextWakeupTime);
}
if (injectionResult == INPUT_EVENT_INJECTION_PENDING) {
return false;
}
...
dispatchEventLocked(currentTime, entry, inputTargets);
return true;
}
Taking KeyEvent as an example, dispatch goes through findFocusedWindowTargetsLocked, which checks whether the current window is ready for more input:
// frameworks/native/services/inputflinger/InputDispatcher.cpp
int32_t InputDispatcher::findFocusedWindowTargetsLocked(nsecs_t currentTime,
        const EventEntry* entry, Vector<InputTarget>& inputTargets, nsecs_t* nextWakeupTime) {
    int32_t injectionResult;
    String8 reason;
    ...
    // Check whether the window is ready for more input.
    reason = checkWindowReadyForMoreInputLocked(currentTime,
            mFocusedWindowHandle, entry, "focused");
    if (!reason.isEmpty()) {
        injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                mFocusedApplicationHandle, mFocusedWindowHandle, nextWakeupTime, reason.string());
        goto Unresponsive;
    }
    ...
Failed:
Unresponsive:
    nsecs_t timeSpentWaitingForApplication = getTimeSpentWaitingForApplicationLocked(currentTime);
    updateDispatchStatisticsLocked(currentTime, entry,
            injectionResult, timeSpentWaitingForApplication);
    return injectionResult;
}
Then handleTargetsNotReadyLocked checks whether the wait for the previous input event has timed out:
// frameworks/native/services/inputflinger/InputDispatcher.cpp
int32_t InputDispatcher::handleTargetsNotReadyLocked(nsecs_t currentTime,
        const EventEntry* entry,
        const sp<InputApplicationHandle>& applicationHandle,
        const sp<InputWindowHandle>& windowHandle,
        nsecs_t* nextWakeupTime, const char* reason) {
    ...
    if (currentTime >= mInputTargetWaitTimeoutTime) {
        onANRLocked(currentTime, applicationHandle, windowHandle,
                entry->eventTime, mInputTargetWaitStartTime, reason);
        // Force poll loop to wake up immediately on next iteration once we get the
        // ANR response back from the policy.
        *nextWakeupTime = LONG_LONG_MIN;
        return INPUT_EVENT_INJECTION_PENDING;
    } else {
        // Force poll loop to wake up when timeout is due.
        if (mInputTargetWaitTimeoutTime < *nextWakeupTime) {
            *nextWakeupTime = mInputTargetWaitTimeoutTime;
        }
        return INPUT_EVENT_INJECTION_PENDING;
    }
}
On timeout, onANRLocked posts a command, and the ANR is reported to InputManager on the next iteration of the Looper. The timeout is computed as follows:
// Default input dispatching timeout if there is no focused application or paused window
// from which to determine an appropriate dispatching timeout.
const nsecs_t DEFAULT_INPUT_DISPATCHING_TIMEOUT = 5000 * 1000000LL; // 5 sec

// frameworks/native/services/inputflinger/InputDispatcher.cpp
if (windowHandle != NULL) {
    timeout = windowHandle->getDispatchingTimeout(DEFAULT_INPUT_DISPATCHING_TIMEOUT);
} else if (applicationHandle != NULL) {
    timeout = applicationHandle->getDispatchingTimeout(
            DEFAULT_INPUT_DISPATCHING_TIMEOUT);
} else {
    timeout = DEFAULT_INPUT_DISPATCHING_TIMEOUT;
}
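The timeout selection and the deadline check can be modeled together in a few lines of Java. This is a sketch under simplifying assumptions, not the InputDispatcher code: InputWaitSketch, chooseTimeoutNs, check and the Result enum are hypothetical names, a null Long stands for a missing window or application handle, and the real function returns INPUT_EVENT_INJECTION_PENDING in both branches rather than two distinct results.

```java
// Sketch only: models the timeout choice and the check in handleTargetsNotReadyLocked.
final class InputWaitSketch {
    // 5 seconds in nanoseconds, as in DEFAULT_INPUT_DISPATCHING_TIMEOUT.
    static final long DEFAULT_INPUT_DISPATCHING_TIMEOUT_NS = 5000L * 1_000_000L;

    enum Result { ANR, KEEP_WAITING }

    /** Next time the poll loop should wake up; starts at "never". */
    long nextWakeupTimeNs = Long.MAX_VALUE;

    /** Window timeout if there is a window, else the application's, else the default. */
    static long chooseTimeoutNs(Long windowTimeoutNs, Long appTimeoutNs) {
        if (windowTimeoutNs != null) {
            return windowTimeoutNs;
        }
        if (appTimeoutNs != null) {
            return appTimeoutNs;
        }
        return DEFAULT_INPUT_DISPATCHING_TIMEOUT_NS;
    }

    /** Compares the current time with the wait deadline and updates the wakeup time. */
    Result check(long currentTimeNs, long waitTimeoutTimeNs) {
        if (currentTimeNs >= waitTimeoutTimeNs) {
            nextWakeupTimeNs = Long.MIN_VALUE;   // wake up immediately on the next iteration
            return Result.ANR;
        }
        if (waitTimeoutTimeNs < nextWakeupTimeNs) {
            nextWakeupTimeNs = waitTimeoutTimeNs; // wake up when the timeout is due
        }
        return Result.KEEP_WAITING;
    }
}
```

Note that the dispatcher does not run a separate timer for this check: the deadline is only evaluated when another event is about to be dispatched, which matches the observation that no input means no input ANR.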
onANRLocked is called and enqueues a command in the command queue:
// frameworks/native/services/inputflinger/InputDispatcher.cpp
void InputDispatcher::onANRLocked(
nsecs_t currentTime, const sp<InputApplicationHandle>& applicationHandle,
const sp<InputWindowHandle>& windowHandle,
nsecs_t eventTime, nsecs_t waitStartTime, const char* reason) {
...
CommandEntry* commandEntry = postCommandLocked(
& InputDispatcher::doNotifyANRLockedInterruptible);
commandEntry->inputApplicationHandle = applicationHandle;
commandEntry->inputWindowHandle = windowHandle;
commandEntry->reason = reason;
}
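onANRLocked does not notify anyone directly; it only queues a command that is run later. A minimal Java sketch of this deferred-command pattern is shown below. The names CommandQueueSketch, onAnr and runCommands are hypothetical, and the list of notified reasons merely stands in for the call out to the policy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch only: a deferred-command queue in the spirit of postCommandLocked.
final class CommandQueueSketch {
    private final Queue<Runnable> commandQueue = new ArrayDeque<>();
    final List<String> notified = new ArrayList<>();

    /** Analogue of onANRLocked: records what to do, but does not call out yet. */
    void onAnr(String reason) {
        commandQueue.add(() -> notified.add(reason));
    }

    /** Analogue of the dispatcher loop draining its commands; returns true if any ran. */
    boolean runCommands() {
        boolean ran = false;
        Runnable command;
        while ((command = commandQueue.poll()) != null) {
            command.run();
            ran = true;
        }
        return ran;
    }
}
```

Deferring the call this way lets the code that detects the timeout finish its own bookkeeping first, and the potentially slow notification happens at a well-defined point in the loop.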
When the queued command (doNotifyANRLockedInterruptible) runs, it calls the notifyANR function of mPolicy, where mPolicy is the NativeInputManager. This is the JNI layer, from which the call goes up into the Java layer, as follows:
// frameworks/base/services/core/jni/com_android_server_input_InputManagerService.cpp
nsecs_t NativeInputManager::notifyANR(const sp<InputApplicationHandle>& inputApplicationHandle,
        const sp<InputWindowHandle>& inputWindowHandle, const String8& reason) {
    ATRACE_CALL();
    JNIEnv* env = jniEnv();
    jobject inputApplicationHandleObj =
            getInputApplicationHandleObjLocalRef(env, inputApplicationHandle);
    jobject inputWindowHandleObj =
            getInputWindowHandleObjLocalRef(env, inputWindowHandle);
    jstring reasonObj = env->NewStringUTF(reason.string());
    jlong newTimeout = env->CallLongMethod(mServiceObj,
            gServiceClassInfo.notifyANR, inputApplicationHandleObj, inputWindowHandleObj,
            reasonObj);
    ...
    return newTimeout;
}
The notifyANR method in the Java layer InputManagerService is as follows:
// com.android.server.input.InputManagerService.java
// Native callback.
private long notifyANR(InputApplicationHandle inputApplicationHandle,
InputWindowHandle inputWindowHandle, String reason) {
return mWindowManagerCallbacks.notifyANR(
inputApplicationHandle, inputWindowHandle, reason);
}
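The callback is forwarded through the window manager and eventually reaches inputDispatchingTimedOut in ActivityManagerService: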
// frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
public boolean inputDispatchingTimedOut(final ProcessRecord proc,
final ActivityRecord activity, final ActivityRecord parent,
final boolean aboveSystem, String reason) {
...
final String annotation;
if (reason == null) {
annotation = "Input dispatching timed out";
} else {
annotation = "Input dispatching timed out (" + reason + ")";
}
if (proc != null) {
...
mHandler.post(new Runnable() {
@Override
public void run() {
mAppErrors.appNotResponding(proc, activity, parent, aboveSystem, annotation);
}
});
}
return true;
}
This again ends up in the appNotResponding function of AppErrors, which collects the same information and shows the ANR dialog.
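Putting the two snippets together, the text that ends up in the log for an input ANR can be reproduced with a small helper. This is a sketch: AnrInfoSketch and its method names are hypothetical, the string formats are copied from inputDispatchingTimedOut and from step 11 of appNotResponding, and the "Parent:" line and the CPU sections are left out.

```java
// Sketch only: rebuilds the annotation and the first lines of the ANR log entry.
final class AnrInfoSketch {
    /** Same format as the annotation built in inputDispatchingTimedOut. */
    static String annotationFor(String reason) {
        return reason == null
                ? "Input dispatching timed out"
                : "Input dispatching timed out (" + reason + ")";
    }

    /** Same format as the header built in step 11 of appNotResponding. */
    static String header(String processName, String shortComponentName, int pid,
            String annotation) {
        StringBuilder info = new StringBuilder();
        info.append("ANR in ").append(processName);
        if (shortComponentName != null) {
            info.append(" (").append(shortComponentName).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(pid).append("\n");
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        return info.toString();
    }

    public static void main(String[] args) {
        // The process name, component and reason below are made-up example values.
        System.out.print(header("com.example", "com.example/.Main", 1234,
                annotationFor("window not focused")));
    }
}
```

Knowing this layout helps when searching logs: the "ANR in" line identifies the process, and the "Reason:" line distinguishes an input timeout from a Service or broadcast timeout.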
Conclusion
- Component scheduling has several timeouts:
  - SERVICE_TIMEOUT: foreground Service timeout, 20 seconds
  - SERVICE_BACKGROUND_TIMEOUT: background Service timeout, 10 * SERVICE_TIMEOUT = 200 seconds
  - BROADCAST_FG_TIMEOUT: foreground broadcast timeout, 10 seconds
  - BROADCAST_BG_TIMEOUT: background broadcast timeout, 60 seconds
- Event dispatch:
  - KEY_DISPATCHING_TIMEOUT or DEFAULT_INPUT_DISPATCHING_TIMEOUT: input event timeout, 5 seconds
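For quick reference, the timeouts listed above can be collected in one small Java enum. This is only a summary sketch: AnrTimeoutSketch and isExceededBy are hypothetical names, the constant names and values are the ones given in this article for API 28, and the framework does not define them in one place like this.

```java
// Sketch only: the timeouts from the list above, gathered in one enum (milliseconds).
enum AnrTimeoutSketch {
    SERVICE_TIMEOUT(20 * 1000L),
    SERVICE_BACKGROUND_TIMEOUT(10 * 20 * 1000L),
    BROADCAST_FG_TIMEOUT(10 * 1000L),
    BROADCAST_BG_TIMEOUT(60 * 1000L),
    DEFAULT_INPUT_DISPATCHING_TIMEOUT(5 * 1000L);

    final long millis;

    AnrTimeoutSketch(long millis) {
        this.millis = millis;
    }

    /** True if the elapsed time has reached this timeout. */
    boolean isExceededBy(long elapsedMillis) {
        return elapsedMillis >= millis;
    }

    public static void main(String[] args) {
        for (AnrTimeoutSketch t : values()) {
            System.out.println(t + " = " + t.millis + " ms");
        }
    }
}
```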