As your project gets more complex, or as your user base gets to an order of magnitude, and your code accidentally goes wrong, you’ll get all kinds of bugs, including ANR. This paper will explain in detail the types of ANR, the causes of occurrence, detailed analysis of ANR cases, and classic cases.

define

ANR(Application Not Responding) No response from the Application. If your application is blocked in the UI thread for too long, ANR will occur, and usually ANR will occur, and a prompt will pop up to let the user know that the application is being blocked and whether to continue to wait or close.

ANR type

ANR occurs in the following types: 1:KeyDispatchTimeout (common) Input Events are not processed within 5 seconds. ANR occurs. Logcat Log Keyword: Input Event Dispatching Out

2:BroadcastTimeout Foreground Broadcast: ANR occurs when the onReceiver does not complete processing within 10 seconds. Background Broadcast: ANR occurs when onReceiver does not complete processing within 60 seconds. Logcat Log keyword: Timeout of Broadcast BroadcastRecord

3:ServiceTimeout Front-end Service: onCreate, onStart, and onBind The ANR occurs if the processing is not complete within 20 seconds. Background Service: onCreate, onStart, and onBind Are not processed within 200 seconds. ANR logcat log Keyword: Timeout Executing Service

4: ContentProviderTimeout The ContentProvider does not complete the ANR process within 10S. Logcat Log keyword: timeout Publishing Content providers

Why ANR appears

1. The main thread frequently performs time-consuming I/O operations, such as reading and writing database operations. 2. 3: Main thread is blocked by Binder opposite end; 4: ANR is displayed in the WatchDog of the System Server. 6. System resources (pipes, CPUS, I/O) are exhausted.

ANR case analysis process

We will walk through the ANR step by step, as we gain a better understanding of how to find, analyze, and solve problems.

Check events_log

View events_log in the mobilelog folder and search for am_anr in the log to find the time when ANR occurs, process PID, and ANR type.

Such as log:

07-20 15:36:36.472 1000 1520 1597 I am_anr: [0, 0148 com. XXXX. Moblie, 952680005, Input dispatching timed out (AppWindowToken {da8f666 token = token {5501 f51 ActivityRecord{15c5c78 u0 com.xxxx.moblie/.ui.MainActivity t3862}}}, Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes  starting up.)]Copy the code

A KeyDispatchTimeout ANR (process number 1480) was generated when com.xxxx.moblie was applied at 07-20 15:36:36.472. ANR Time: 07-20 15:36:36.472 Process PID: 1480 Process name: com.xxxx.moblie ANR type: KeyDispatchTimeout

We already know that the ANR of KeyDispatchTimeout occurs because the input event is not processed within 5 seconds. So what does the program do in the first 5 seconds of this time period (15:36:30 ~15:36:31)? This is easy because we already know pid, so let’s search the log where PID = 1480. These logs represent the running path of the process. Key logs are as follows:

07-20 15:36:29.749 10102 1480 1737 D Moblie-Application: [Thread:17329] receive an intent from server, RECEIVE_MESSAGE 07-20 15:36:30.136 10102 1480 1737 D Moblie-Application: Receiving an empty message, drop 07-20 15:36:35.791 10102 1480 1766 I Adreno: QUALCOMM build: 9c9b012, I92eb381bc9 07-20 15:36:35.791 10102 1480 1766 I Adreno: Build Date: 12/31/17 07-20 15:36:35.791 10102 1480 1766 I Adreno: OpenGL ES Shader Compiler Version EV031.22.00.01 07-20 15:36:35.791 10102 1480 1766 I Adreno: Local Branch: 07-20 15:36:35.791 10102 1480 1766 I Adreno: Remote Branch: Refs/tags/AU_LINUX_ANDROID_LA. UM. 6.4 R1.08.00.00.309.049 07-20 15:36:35. 791 10102 1480 1766 I Adreno: Remote Branch: Reconstruct Branch (s) : Bread crumbs (碎 屑) (Reconstruct Branch) NOTHING 07-20 15:36:35.826 10102 1480 1766 I VNDKSupport: SPhal namespace is not configuredforThis process. The Loading/vendor/lib64 / hw/gralloc msm8998. So the from the current namespace home. 07-20 15:36:36. 682, 10102 1480 1480 W ViewRootImpl[MainActivity]: Cancelling event due to no window focus: KeyEvent { action=ACTION_UP, keyCode=KEYCODE_PERIOD, scanCode=0, metaState=0, flags=0x28, repeatCount=0, eventTime=16099429, downTime=16099429, deviceId=-1,source=0x101 }
Copy the code

From the above we can see that the program received an action message at time 07-20 15:36:29.749.

07-20 15:36:29.749 10102 1480 1737 D Moblie-Application: [Thread:17329] Receive an intent from the server, action= com.ttt.push.receive_message.Copy the code

Moblie received a push message (com.ttt.push.receive_message) which blocked the application. Action =com.ttt.push.RECEIVE_MESSAGE is blocked at time 07-20 15:36:29.749 and ANR of KeyDispatchTimeout occurs 5 seconds later.

Although we know how it started, we haven’t found out the specific reason yet. Is it possible that the CPU was very tight at that time and various apps occupied resources again? Let’s look at the CPU information. Search keyword Keyword: ANR IN

07-20 15:36:58.711  1000  1520  1597 E ActivityManager: ANR inCom. XXXX. Moblie (com. XXXX. Moblie /. UI. MainActivity) (process name) 07-20 15:36:58. 711, 1000, 1520, 1597 E ActivityManager: PID: 1480 (process PID) 07-20 15:36:58.711 1000 1520 1597 E ActivityManager: Reason: Input dispatching timed out (AppWindowToken{da8f666 token=Token{5501f51 ActivityRecord{15c5c78 u0 com.xxxx.moblie/.ui.MainActivity t3862}}}, Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes Starting up.) 07-20 15:36:58.711 1000 1520 1597 E ActivityManager: Load: 0.0/0.0/0.0 (Load indicates 1 minute,5 minutes,15 minutes of CPU Load) 07-20 15:36:58.711 1000 1520 1597 E ActivityManager: CPU usage from 20ms to 20286ms later (2018-07-20 15:36:36.170 to 2018-07-20 15:36:56.436): 07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 42% 6774/pressure: 41% User + 1.4% kernel/faults: 168 minor 07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 34% 142/ kSWAPd0: 711 1000 1520 1597 E ActivityManager: 31% 1520/system_server: 07-20 15:36:58.711 1000 1520 1597 E Faults: 13%, 29901 / com. Ss. Android. Article. News: 7.7% user + 6% kernel/faults: Minor 2446 major 07-20 15:36:58. 56007 711 1000 1520 1597 E ActivityManager: 13% 32638 / com. Android. Quicksearchbox: 07-20 15:36:58.711 1000 1520 1597 E Faults: 1480/com.xxxx.moblie: 5.0% user + 6.0% kernel/faults: 76401 Minor 2422 Major 07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 8.0% 21000/kworker/u16:12: 711 1000 1520 1597 E ActivityManager: 0.8% 724/ MTD: 1561 Minor 07-20 15:36:58.711 1000 1520 1597 E faults: 711 1000 1520 1597 E ActivityManager: /kworker/ u16:8:0% user + 8% kernel 07-20 15:36:58.711 1000 1520 1597 E ActivityManager: Kernel 07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 7.1%30656 /kworker/ u16:14:0% user + 7.1%kernel 07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 7.1% 9998/kworker/ U16:4:0% user + 7.1% kernelCopy the code

I’ve marked the relevant meaning in the log. Com.xxxx. moblie takes up 11% of the CPU, which is not much. Today’s mobile phones are basically multi-core CPUS. If your CPU is 4 cores, the upper limit is 400%, and so on.

Since it’s not the CPU load, what is it? At this point, it’s time to look at our final kill – testamp.txt.

Traces.txt log analysis

TXT file :/data/anr/ tetrace.txt file :/data/anr/ tetrace.txt file :/data/anr/ tetrace.txt The traces file is used to retrieve thread names, stack information, thread status, binder calls, etc. Trace. TXT trace: Cmd line:com.xxxx.moblie

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x73bcc7d0 self=0x7f20814c00
  | sysTid=20176 nice=-10 cgrp=default sched=0/0 handle=0x7f251349b0
  | state=R schedstat=( 0 0 0 ) utm=12 stm=3 core=5 HZ=100
  | stack=0x7fdb75e000-0x7fdb760000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held) // Java stack call information, you can view the call relationship, Positioning to a specific location at TTT. Push. InterceptorProxy. AddMiuiApplication ats (InterceptorProxy. Java: 77) ttt.push.InterceptorProxy.create(InterceptorProxy.java:59) at android.app.Activity.onCreate(Activity.java:1041) at miui.app.Activity.onCreate(SourceFile:47) at com.xxxx.moblie.ui.b.onCreate(SourceFile:172) at com.xxxx.moblie.ui.MainActivity.onCreate(SourceFile:68) at android.app.Activity.performCreate(Activity.java:7050) at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1214) at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2807) at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2929) at android.app.ActivityThread.-wrap11(ActivityThread.java:-1) at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1618)
  at android.os.Handler.dispatchMessage(Handler.java:105)
  at android.os.Looper.loop(Looper.java:171)
  at android.app.ActivityThread.main(ActivityThread.java:6699)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:246)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:783)
Copy the code

I’ll parse some of the testamp.txt fields in detail to see what it can tell us. Main: This is the main Thread, and if it is a Thread, it will be named “thread-x”, where X indicates the Thread ID, incrementing it. Tid: tid is not the id of a thread but the unique id of a thread. Group: indicates the name of the thread group. SCount: indicates the number of times that the thread is suspended. SysTid: indicates the thread id (the thread id of the main thread is the same as the process ID) nice: indicates the scheduling priority of the thread. Sched: indicates the scheduling policy and priority of the thread. CGRP: indicates the scheduling group. Schedstat: read from /proc/[pid]/task/[tid]/schedstat. The three values represent the time a thread executes on the CPU, the wait time of a thread, and the length of the slice of time a thread executes. Utm: indicates the time used in user mode (unit: jiffies). STM: indicates the scheduling time in kernel mode. Core: indicates the serial number of the CPU core that last executes this thread.

The Java stack information is what we care most about, and it helps us pinpoint where. Traces from above, we can judge the TTT. Push. InterceptorProxy. AddMiuiApplicationInterceptorProxy. Java: 77 led to the com. XXXX. Moblie ANR happened. At this point you can look at the source code, find the problem, and fix it.

To summarize the analysis process, first we search for am_ANR, find the time at which ANR occurred, the process PID, and the type of ANR, and then search for the PID, looking for the first 5 seconds or so of the log. Filter ANR IN to view CPU information, then view tetrace.txt, find the Java stack information to locate the code location, and finally view the source code, analysis and solve the problem. This process can basically figure out the context of ANR.

ANR case review

The main thread is locked by another thread, resulting in a deadlock

waiting on <0x1cd570> (a android.os.MessageQueue)

DALVIK THREADS:
"main" prio=5 tid=3 TIMED_WAIT
  | group="main" sCount=1 dsCount=0 s=0 obj=0x400143a8
  | sysTid=691 nice=0 sched=0/0 handle=-1091117924
  at java.lang.Object.wait(Native Method)
  - waiting on <0x1cd570> (a android.os.MessageQueue)
  at java.lang.Object.wait(Object.java:195)
  at android.os.MessageQueue.next(MessageQueue.java:144)
  at android.os.Looper.loop(Looper.java:110)
  at android.app.ActivityThread.main(ActivityThread.java:3742)
  at java.lang.reflect.Method.invokeNative(Native Method)
  at java.lang.reflect.Method.invoke(Method.java:515)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:739)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:497)
  at dalvik.system.NativeStart.main(Native Method)

"Binder Thread #3" prio=5 tid=15 NATIVE
  | group="main" sCount=1 dsCount=0 s=0 obj=0x434e7758
  | sysTid=734 nice=0 sched=0/0 handle=1733632
  at dalvik.system.NativeStart.run(Native Method)

"Binder Thread #2" prio=5 tid=13 NATIVE
  | group="main" sCount=1 dsCount=0 s=0 obj=0x1cd570
  | sysTid=696 nice=0 sched=0/0 handle=1369840
  at dalvik.system.NativeStart.run(Native Method)

"Binder Thread #1" prio=5 tid=11 NATIVE
  | group="main" sCount=1 dsCount=0 s=0 obj=0x433aca10
  | sysTid=695 nice=0 sched=0/0 handle=1367448
  at dalvik.system.NativeStart.run(Native Method)

----- end 691 -----
Copy the code

The main thread does time-consuming operations, such as database reads and writes.

"main" prio=5 tid=1 Native
held mutexes=
kernel: (couldn't read /proc/self/task/11003/stack)
native: #00 pc 000492a4 /system/lib/libc.so (nanosleep+12)
native: #01 pc 0002dc21 /system/lib/libc.so (usleep+52)
native: #02 pc 00009cab /system/lib/libsqlite.so (???)
native: #03 pc 00011119 /system/lib/libsqlite.so (???)
native: #04 pc 00016455 /system/lib/libsqlite.so (???)
native: #16 pc 0000fa29 /system/lib/libsqlite.so (???)
native: #17 pc 0000fad7 /system/lib/libsqlite.so (sqlite3_prepare16_v2+14)
native: #18 pc 0007f671 /system/lib/libandroid_runtime.so (???)
native: #19 pc 002b4721 /system/framework/arm/boot-framework.oat (Java_android_database_sqlite_SQLiteConnection_nativePrepareStatement__JLjava_lang_String_2+116)
at android.database.sqlite.SQLiteConnection.setWalModeFromConfiguration(SQLiteConnection.java:294)
at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:215)
at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:193)
at android.database.sqlite.SQLiteConnectionPool.openConnectionLocked(SQLiteConnectionPool.java:463)
at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:185)
at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:177)
at android.database.sqlite.SQLiteDatabase.openInner(SQLiteDatabase.java:808)
locked <0x0db193bf> (a java.lang.Object)
at android.database.sqlite.SQLiteDatabase.open(SQLiteDatabase.java:793)
at android.database.sqlite.SQLiteDatabase.openDatabase(SQLiteDatabase.java:696)
at android.app.ContextImpl.openOrCreateDatabase(ContextImpl.java:690)
at android.content.ContextWrapper.openOrCreateDatabase(ContextWrapper.java:299)
at android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked(SQLiteOpenHelper.java:223)
at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:163)
locked <0x045a4a8c> (a com.xxxx.video.common.data.DataBaseHelper)
at com.xxxx.video.common.data.DataBaseORM.<init>(DataBaseORM.java:46)
at com.xxxx.video.common.data.DataBaseORM.getInstance(DataBaseORM.java:53)
locked <0x017095d5> (a java.lang.Class<com.xxxx.video.common.data.DataBaseORM>)
Copy the code

3. Large amount of binder data

07-21 04:43:21.573 1000 1488 12756 E Binder: Unreasonably large Binder Reply Buffer: on android.content.pm.BaseParceledListSliceThe $1@770C74f Calling 1 size 388568 (data: 1, 32, 7274595) 07-21 04:43:21.573 1000 1488 12756 E Binder: Android.util.log$TerribleFailure: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSliceThe $1@770C74f Calling 1 size 388568 (data: 1, 32, 7274595) 07-21 04:43:21.607 1000 1488 2951 E Binder: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSliceThe $1@770C74f Calling 1 size 211848 (data: 1, 23, 7274595) 07-21 04:43:21.607 1000 1488 2951 E Binder: Android.util.log$TerribleFailure: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSliceThe $1@770C74f Calling 1 size 211848 (data: 1, 23, 7274595) 07-21 04:43:21.662 1000 1488 6258 E Binder: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSliceThe $1@770c74f calling 1 size 259300 (data: 1, 33, 7274595)
Copy the code

4. Communication failure with Binder

07-21 06:04:35. 580 < 6 > [32837.690321] binder: 1698:2362 Transaction Failed 29189/-3, SIZE 100-0 line 3042 07-21 06:04:35.594 <6>[32837.704042] Binder: 1765:4071 Transaction Failed 29189/-3, SIZE 76-0 line 3042 07-21 06:04:35.899 <6>[32838.009132] Binder: 1765:4067 Transaction Failed 29189/-3, SIZE 224-8 line 3042 07-21 06:04:36.018 <6>[32838.128903] Binder: 1765:2397 transaction failed 29189/-22, size 348-0 line 2916Copy the code

recommended

Latest source code summary: weekly share the new open source code, renderings, more intuitive.