An overview,

Solving ANR has always been an important skill for Android developers to master, and it usually starts with three aspects.

Development stage: check the time and lag of each method through the tool, and modify one by one.

Online phase: This phase relies on monitoring tools such as Matrix to detect ANR and report it.

Analysis: If an ANR occurs online and you get a log, this is where this article comes in — ANR log analysis tips.

2. ANR generation mechanism

A popular interview answer on the Internet

ANR — The application does not respond. 5 seconds for Activity, 10 seconds for BroadCastReceiver, and 20 seconds for Service.

In order to further analyze and locate ANR, we need to know more knowledge. Generally speaking, ANR can be divided into four types according to its generation mechanism:

2.1 Input Event Timeout (5s)

InputEvent Timeout

A.I nputDispatcher sends key events to a Focused Window of the process that is not present, is suspended, is occupied, is unregistered, is abnormal, or is not processed within 5 seconds. ANR B.INputDispatcher sends a MotionEvent event with the exception of when there is more than one input waitQueue for the Touched Window0.InputDispatcher will pause the event and wait for 5s. If the window 'finish' event is still not received, ANR C will be triggered. ANR will not be fired until the next event arrives and a timeout event is foundCopy the code

2.2 Broadcast Timeout (foreground 15s, background 60s)

BroadcastReceiver Timeout

A. Statically registered and ordered broadcasts are ANR. Dynamically registered and ordered broadcasts are not ANR b. If the process does not exist, it is created. The creation time is counted as the timeout period c. The ANR dialog box will be displayed only when the process has the Activity displayed in the foreground. Otherwise, the current process D will be directly killed. ANR e is generated when the onReceive execution exceeds the threshold (15s in the foreground and 60s in the background). How to send radio reception: Intent. AddFlags (Intent. FLAG_RECEIVER_FOREGROUND)Copy the code

2.3 Service Timeout (front-end 20s, back-end 200s)

Service Timeout

Each of the following methods in a.service triggers ANR: onCreate(),onStartCommand(), onStart(), onBind(), onRebind(), onTaskRemoved(), onUnbind(), onDestroy(). b. The timeout duration of the foreground Service is 20 seconds, and that of the background Service is 200 seconds. How to distinguish between foreground and background execution ———— The current APP is in user mode, and the Service is foreground execution. D. User mode: Foreground activity, foreground broadcast, and foreground service are implementedCopy the code

2.4 ContentProvider type

A. CoronTentProvider does not ANR b when creating a release timeout. Use ContentProviderclient to visit ContentProverder can choose on their own will trigger ANR, timeout your client. SetDetectNotResponding (PROVIDER_ANR_TIMEOUT);Copy the code

Ps: Does an ANR occur when the Activity lifecycle times out? — Not according to tests.

override fun onCreate(savedInstanceState: Bundle?) {
       Thread.sleep(60000)
       super.onCreate(savedInstanceState)
       setContentView(R.layout.activity_main)
   }
Copy the code

3. Causes of ANR

Many developers believe that it is time-consuming operations that cause ANR and that it is all a problem at the application layer. In fact, most OF the ANR in an online environment is due to system reasons.

3.1 Application Layer Causes ANR (Time-consuming Operation)

A. Function blocking: such as infinite loop, main thread IO, processing big data b. Lock error: main thread waiting for the child thread lock c. Tight memory: The memory allocated to an application has a maximum limit. If the memory is tight for a long time, frequent memory swapping will occur and some application operations will time outCopy the code

3.2 ANR caused by the System

A. CPU preemption: Generally speaking, you are playing a game in the foreground, which may cause your background broadcast to be preempted. System services, such as obtaining system contacts, cannot respond in a timely manner. System services are all Binder mechanisms. Service capabilities are limited. Other applications occupy a large amount of memoryCopy the code

Analyze logs

When ANR occurs, the system generates an ANR log file (the name of the file in the /data/ ANR directory of the mobile phone may vary from manufacturer to manufacturer, but it is mostly called trace file in the industry), which contains the following important information.

4.1 CPU load

Load: 2.62 / 2.55 / 2.25
CPU usage from 0ms to 1987ms later (2020-03-10 08:31:55.169 to 2020-03-10 08:32:17.156): 41% 2080/system_server: 28% user + 12% kernel / faults: 76445 minor 180 major 26% 9378/com.xiaomi.store: 20% user + 6.8% kernel/Faults: 68408 minor 68 major........ Omit line N..... 66% TOTAL: 20% user + 15% kernel + 28% iowait + 0.7% IRq + 0.7% softirqCopy the code

As shown above:

  • Line 1: Average number of active processes that are using and waiting to use the CPU in 1, 5, and 15 minutes

  • Line 2: indicates that the load information is captured 0~1987ms after ANR occurs. It also indicates the time point of ANR: 2020-03-10 08:31:55.169

  • Middle: details of the CPU used by each process

  • Last line: total CPU usage of each process.

Noun explanation:

A. user: user mode; kernel: kernel mode B. Faults: page shortage; minor; Major: Major; data needs to be taken from disk C. ioWait: RATIO of I/O usage D. IRq: hard interrupts; SoftirQ: soft interruptsCopy the code

Note:

  • The high ioWAIT ratio means that there is a high probability that the ANR is caused by the TIME spent on THE I/O. Check to see if the process has more faults major.

  • The load of a single-process CPU is not limited to 100%, but to several hundred percent for several cores, such as 400% for four cores.

4.2 Memory Information

Total number of allocations 476778Allocated Tokens Allocated Objects Total Bytes Allocated memory Total memory allocated by the process Total Bytes Freed 52MB Allocated memory Total memory allocated by the process 777KB Memory Free memory Until GC 777KBGC Free Memory Until OOME 383MBOOM Total memory Total Memory (used + Available) Max Memory 384MB Maximum memory that a process can apply forCopy the code

**Free memory until the value of OOME ** is very small. The application may be using too much memory.

In addition, in addition to trace files, ordinary eventlogs also contain memory information (not necessarily printed).

04-02 22:00:08.195  1531  1544 I am_meminfo: [350937088.41086976.492830720.427937792.291887104]
Copy the code

The above four values refer to:

  • Cached

  • Free,

  • Zram,

  • Kernel,Native

Cached+Free memory represents the amount of memory currently available on the phone, and if the value is small, it means that memory is in short supply. The lower memory threshold is as follows: 4 gb the lower memory threshold is 350MB, and the higher memory threshold is 450MB

Ps: If you print onTrimMemory in the log before and after ANR time point, it can also be used as a reference to judge memory tension

4.3 Stack Messages

The stack information is the most important piece of information, showing the current state of all threads in the process where ANR occurred.

suspend all histogram:  Sum: 2.834s 99% C.I. 5.738us-7145.919us Avg: 607.155us Max: 41543us
DALVIK THREADS (248):
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x74b17080 self=0x7bb7a14c00
  | sysTid=2080 nice=-2 cgrp=default sched=0/0 handle=0x7c3e82b548
  | state=S schedstat=( 757205342094 583547320723 2145008 ) utm=52002 stm=23718 core=5 HZ=100
  | stack=0x7fdc995000-0x7fdc997000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0xb0/0xbc
  kernel: SyS_epoll_wait+0x288/0x364
  kernel: SyS_epoll_pwait+0xb0/0x124
  kernel: cpu_switch_to+0x38c/0x2258
  native: #00 pc 000000000007cd8c  /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 0000000000014d48  /system/lib64/libutils.so (android::Looper::pollInner(int) +148)
  native: #02 pc 0000000000014c18  /system/lib64/libutils.so (android::Looper::pollOnce(int.int*, int*, void+ * *)60)
  native: #03 pc 0000000000127474  /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long.int) +44)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:330)
  at android.os.Looper.loop(Looper.java:169)
  at com.android.server.SystemServer.run(SystemServer.java:508)
  at com.android.server.SystemServer.main(SystemServer.java:340)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:536)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:856)... Omit line N....."OkHttp ConnectionPool" daemon prio=5 tid=251 TimedWaiting
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x13daea90 self=0x7bad32b400
  | sysTid=29998 nice=0 cgrp=default sched=0/0 handle=0x7b7d2614f0
  | state=S schedstat=( 951407 137448 11 ) utm=0 stm=0 core=3 HZ=100
  | stack=0x7b7d15e000-0x7b7d160000 stackSize=1041KB
  | held mutexes=
  at java.lang.Object.wait(Native method)
  - waiting on <0x05e5732e> (a com.android.okhttp.ConnectionPool)
  at com.android.okhttp.ConnectionPool$1.run(ConnectionPool.java:103)
  - locked <0x05e5732e> (a com.android.okhttp.ConnectionPool)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
  at java.lang.Thread.run(Thread.java:764)
Copy the code

As shown in the above log, this article captures two thread information. One is the main thread, and its state is native.

The other is OkHttp ConnectionPool, whose state is TimeWaiting. As you know, the textbook says there are five thread states: new, ready, execute, blocked, and dead. There are six Thread states in Java, all of which are defined in java.lang.thread.state

** What is the native state of the main thread? ** The states in the trace file are the states defined in the CPP code. Here is a table of the corresponding states.

Thus, the native state of the main function is that the JNI function is executing. Stack information is the first important information to analyze ANR. In general:

The main thread is in the BLOCK, WAITING, TIMEWAITING state.

If the main thread is normal, check the CPU load and memory environment.

V. Typical case analysis

5.1 The main thread is not jammed, and the stack is in normal state

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x74b38080 self=0x7ad9014c00
  | sysTid=23081 nice=0 cgrp=default sched=0/0 handle=0x7b5fdc5548
  | state=S schedstat=( 284838633 166738594 505 ) utm=21 stm=7 core=1 HZ=100
  | stack=0x7fc95da000-0x7fc95dc000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0xb0/0xbc
  kernel: SyS_epoll_wait+0x288/0x364
  kernel: SyS_epoll_pwait+0xb0/0x124
  kernel: cpu_switch_to+0x38c/0x2258
  native: #00 pc 000000000007cd8c  /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 0000000000014d48  /system/lib64/libutils.so (android::Looper::pollInner(int) +148)
  native: #02 pc 0000000000014c18  /system/lib64/libutils.so (android::Looper::pollOnce(int.int*, int*, void+ * *)60)
  native: #03 pc 00000000001275f4  /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long.int) +44)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:330)
  at android.os.Looper.loop(Looper.java:169)
  at android.app.ActivityThread.main(ActivityThread.java:7073)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:536)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:876)
Copy the code

The above main thread stack is a normal free stack, indicating that the main thread is waiting for new messages.

If the main thread in the ANR log is in such a state, there may be two reasons:

This ANR is caused by other factors such as CPU preemption or memory stress

When this ANR log was captured, the main thread was restored to normal

When you encounter this free stack, you can analyze the CPU/memory situation as described in Section 3. Second, you can focus on whether there is too much time between the fetching of the log and the occurrence of ANR to make the stack useless.

5.2 Time-consuming Operations on the Main Thread

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 flags=0 obj=0x72deb848 self=0x7748c10800
  | sysTid=8968 nice=-10 cgrp=default sched=0/0 handle=0x77cfa75ed0
  | state=R schedstat=( 24783612979 48520902 756 ) utm=2473 stm=5 core=5 HZ=100
  | stack=0x7fce68b000-0x7fce68d000 stackSize=8192KB
  | held mutexes= "mutator lock"(shared held)
  at com.example.test.MainActivity$onCreate$2.onClick(MainActivity.kt:20) -- key line!! at android.view.View.performClick(View.java:7187)
  at android.view.View.performClickInternal(View.java:7164)
  at android.view.View.access$3500(View.java:813)
  at android.view.View$PerformClick.run(View.java:27640)
  at android.os.Handler.handleCallback(Handler.java:883)
  at android.os.Handler.dispatchMessage(Handler.java:100)
  at android.os.Looper.loop(Looper.java:230)
  at android.app.ActivityThread.main(ActivityThread.java:7725)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:526)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1034)
Copy the code

The above log indicates that the main thread is in the execution state, and the stack information indicates that it is not idle. ANR occurred because of a time-consuming operation performed in a click listener.

5.3 Main thread is blocked by a lock

"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x72deb848 self=0x7748c10800
  | sysTid=22838 nice=-10 cgrp=default sched=0/0 handle=0x77cfa75ed0
  | state=S schedstat=( 390366023 28399376 279 ) utm=34 stm=5 core=1 HZ=100
  | stack=0x7fce68b000-0x7fce68d000 stackSize=8192KB
  | held mutexes=
  at com.example.test.MainActivity$onCreate$1.onClick(MainActivity.kt:15)
  - waiting to lock <0x01aed1da> (a java.lang.Object) held by thread 3—————— key line!! at android.view.View.performClick(View.java:7187)
  at android.view.View.performClickInternal(View.java:7164)
  at android.view.View.access$3500(View.java:813)
  at android.view.View$PerformClick.run(View.java:27640)
  at android.os.Handler.handleCallback(Handler.java:883)
  at android.os.Handler.dispatchMessage(Handler.java:100)
  at android.os.Looper.loop(Looper.java:230)
  at android.app.ActivityThread.main(ActivityThread.java:7725)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:526)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1034)... Omit line N....."WQW TEST" prio=5 tid=3 TimeWating
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x12c44230 self=0x772f0ec000
  | sysTid=22938 nice=0 cgrp=default sched=0/0 handle=0x77391fbd50
  | state=S schedstat=( 274896 0 1 ) utm=0 stm=0 core=1 HZ=100
  | stack=0x77390f9000-0x77390fb000 stackSize=1039KB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x043831a6> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:440)
  - locked <0x043831a6> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:356)
  at com.example.test.MainActivity$onCreate$2$thread$1.run(MainActivity.kt:22)
  - locked <0x01aed1da> (a java.lang.object)———————————————————— at java.lang.Thread.run(Thread.java:919)
Copy the code

This is a typical example of the main thread being blocked by a lock;

waiting to lock <0x01aed1da> (a java.lang.Object) held by thread 3
Copy the code

The waiting lock is < 0x01Aed1da >, which is held by thread 3. A further search for “tid=3” finds thread 3 and finds that it is TimeWating.

So the reason for ANR is found: thread 3 holds a lock and has not released it for a long time, and the main thread is waiting for the lock to time out. A common lock-anR scenario in an online environment is SharePreference write.

5.4 THE CPU Is Preempted

CPU usage from 0ms to 10625ms later (2020-03- 0914:38:31.633 to 2020-03- 0914:38:42.257): 543%, 2045 / com. Alibaba. Android. Rimet: 54% user + 89% kernel/faults: 4608 minor major - 1 - a key line!!!!!! 674 / [email protected]: 99% to 81% user + 18% kernel/faults: 403 Minor 24% 32589/com.wang.test: 22% user + 1.4% kernel/faults: 7432 minor 1 major........ Omit line N.....Copy the code

In the above log, the second line is the nailed process, which occupies up to 543% of the CPU, and occupies most of the CPU resources, thus causing ANR.

5.5 ANR due to Memory Shortage

If there is a log, the CPU and stack are fine (not posted), and ANR still occurs, considering memory tightness.

From the first line of information on the CPU, CPU usage from 0ms to 21752ms later (2020-10-31 22:38:58.468 to 2020-10-31 22:39:20.220)

Then go to the system log and search for AM_meminfo. This one is not found. After searching onTrimMemory again, many records were found.

10-31 22:37:19.749 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:37:33.458 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:38:00.153 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:38:58.731 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:39:02.816 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
Copy the code

It can be seen that memory is in a tight state before and after ANR occurs. The level is 80. Check the Android API documentation.

   /**
    * Level for {@link #onTrimMemory(int)}: the process is nearing the end
    * of the background LRU list, and if more memory isn't found soon it will
    * be killed.
    */
   static final int TRIM_MEMORY_COMPLETE = 80;
Copy the code

The level of 80 is very serious. The application is going to be killed immediately. The application that is killed can be seen from the name of the desktop, and even the desktop is going to be killed.

Generally speaking, memory shortage will lead to ANR of multiple applications, so if you find multiple applications ANR together in the log, you can preliminarily determine that this ANR is not related to your application.

5.6 ANR Is Caused by System Service Timeout

System services timeout typically contains BinderProxy. TransactNative keywords, please see the following logs:

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x727851e8 self=0x78d7060e00
  | sysTid=4894 nice=0 cgrp=default sched=0/0 handle=0x795cc1e9a8
  | state=S schedstat=( 8292806752 1621087524 7167 ) utm=707 stm=122 core=5 HZ=100
  | stack=0x7febb64000-0x7febb66000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0x90/0xc4
  kernel: binder_thread_read+0xbd8/0x144c
  kernel: binder_ioctl_write_read.constprop. 58+0x20c/0x348
  kernel: binder_ioctl+0x5d4/0x88c
  kernel: do_vfs_ioctl+0xb8/0xb1c
  kernel: SyS_ioctl+0x84/0x98
  kernel: cpu_switch_to+0x34c/0x22c0
  native: #00 pc 000000000007a2ac  /system/lib64/libc.so (__ioctl+4)
  native: #01 pc 00000000000276ec  /system/lib64/libc.so (ioctl+132)
  native: #02 pc 00000000000557d4  /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+252)
  native: #03 pc 0000000000056494  /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*) +60)
  native: #04 pc 00000000000562d0  /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int) +216)
  native: #05 pc 000000000004ce1c  /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int) +72)
  native: #06 pc 00000000001281c8  /system/lib64/libandroid_runtime.so (???)
  native: #07 pc 0000000000947ed4  /system/framework/arm64/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196.) at the android OS. BinderProxy. TransactNative (Native method) -- -- -- -- -- -- -- -- the key line!!!!!! at android.os.BinderProxy.transact(Binder.java:804)
  at android.net.IConnectivityManager$Stub$Proxy.getActiveNetworkInfo(IConnectivityManager.java:1204) - Key line! at android.net.ConnectivityManager.getActiveNetworkInfo(ConnectivityManager.java:800)
  at com.xiaomi.NetworkUtils.getNetworkInfo(NetworkUtils.java:2)
  at com.xiaomi.frameworkbase.utils.NetworkUtils.getNetWorkType(NetworkUtils.java:1)
  at com.xiaomi.frameworkbase.utils.NetworkUtils.isWifiConnected(NetworkUtils.java:1)
Copy the code

From the stack you can see that getting network information happens ANR: getActiveNetworkInfo.

As mentioned above, all system services are Binder mechanism (16 threads). Service capability is also limited. It is possible that system services do not respond for a long time, resulting in ANR. If other applications occupy all Binder threads, the current application can only wait.

Can be further search: blockUntilThreadAvailable key word:

at android.os.Binder.blockUntilThreadAvailable(Native method)
Copy the code

If there is a stack that finds a thread that contains this word, you can look at the stack further to determine what system service was called. This type of ANR is also a system environment problem, and if it occurs frequently on a certain type of machine, the application layer can consider circumventing strategies.

Six, the concluding

This article summarizes the skills from the author’s work in a large number of ANR log analysis experience, if there are errors, please leave a message, communication promotes progress!

Author: Vivo Internet Client team – Wang Qinwei