
First, collect the dump traces and send the dump signal to the corresponding processes

1. When a system message that has a timeout mechanism (such as Service creation) is judged to have timed out, the AMS interface in the system service is called to collect the ANR information and archive it (data/anr/trace, data/system/dropbox).

2. On entering AMS, AppErrors first filters out cases that should not be reported (for example, the process has already been killed by the system, or the system is shutting down). If none of these conditions applies, an ANR is recorded for the current process.

3. Next, the system determines whether the process with the ANR is perceptible to the user, and then starts collecting the processes related to it as well as certain core system service processes (such as SurfaceFlinger and system_server, which interact with applications). If these system service processes are blocked, the IPC calls of the application process are blocked too. The system then collects the other core system processes separately, because these native service processes are created directly by the init process and are not managed by system_server or Zygote.

firstPids queue: the first entry is the ANR process, the second is system_server, and the rest are all persistent processes.

Native queue: the mediaserver, sdcard, and SurfaceFlinger processes under /system/bin/.

lastPids queue: all processes in mLruProcesses that do not belong to firstPids.
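The way the firstPids and lastPids queues are filled can be sketched as follows. This is a toy Python model, not AMS code: the `Proc` record, its `persistent` field, and the `build_pid_lists` function are hypothetical names introduced only for illustration, and the pids are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """Hypothetical stand-in for an entry of mLruProcesses."""
    pid: int
    name: str
    persistent: bool = False


def build_pid_lists(anr_pid, system_pid, lru_processes):
    """Split processes into firstPids and lastPids as described above."""
    first_pids = [anr_pid]
    if system_pid != anr_pid:
        first_pids.append(system_pid)
    last_pids = []
    for proc in lru_processes:
        if proc.pid in first_pids:
            continue  # already queued (the ANR process or system_server)
        if proc.persistent:
            first_pids.append(proc.pid)
        else:
            last_pids.append(proc.pid)
    return first_pids, last_pids


lru = [
    Proc(100, "system_server", persistent=True),
    Proc(200, "com.example.app"),
    Proc(300, "com.android.phone", persistent=True),
    Proc(400, "com.example.other"),
]
first, last = build_pid_lists(anr_pid=200, system_pid=100, lru_processes=lru)
print(first)  # [200, 100, 300]
print(last)   # [400]
```

The native queue is left out of the sketch because, as described above, it is a fixed set of processes rather than something derived from mLruProcesses.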

4. After this first round of collection, the system gathers more detailed information from each process locally, such as virtual machine information, Java thread states, and stacks. To show the ANR dialog, a SHOW_NOT_RESPONDING_MSG message is sent to the UI thread.

5. When the UI thread receives the message, it calls the stack-dump function:

The most important point: SIGNAL_QUIT is sent to the target process (the Signal Catcher thread inside that process detects it and collects the information later). At the same time, the traces of the top 5 processes are written out at a fixed interval of 200 ms; note that collecting this data is time-consuming.
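The "signal, then wait 200 ms" pattern can be sketched like this. It is a minimal Python model under stated assumptions, not the AMS implementation: `dump_top_processes` and its parameters are hypothetical, the signal sender and the sleep function are injected so that nothing is actually signalled, and a Unix platform is assumed for `signal.SIGQUIT`.

```python
import signal
import time


def dump_top_processes(pids, send_signal, sleep=time.sleep,
                       limit=5, interval_s=0.2):
    """Ask up to `limit` processes to dump their stacks, pausing between requests."""
    signalled = []
    for pid in pids[:limit]:
        send_signal(pid, signal.SIGQUIT)  # the target's signal catcher does the dump
        signalled.append(pid)
        sleep(interval_s)                 # fixed interval between requests
    return signalled


sent = []    # records (pid, signal) pairs instead of signalling real processes
waits = []   # records the requested pauses instead of sleeping
result = dump_top_processes(
    [11, 12, 13, 14, 15, 16, 17],
    send_signal=lambda pid, sig: sent.append((pid, sig)),
    sleep=waits.append,
)
print(result)  # [11, 12, 13, 14, 15]
```

On a real Unix system `os.kill` could be passed as `send_signal`; the stub is used here so the example has no side effects.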

== Summary == :

Output the am_anr information to the EventLog (read this log first when analyzing ANR problems).

Collect the important processes, both Java and native. Output the ANR reason and CPU usage to the main log. Save the collected information to a file under dropbox. Send the SIGNAL_QUIT signal to the collected processes.

The final step, sending signals to the collected processes, is analyzed next.

== Before Android 5.0, SuspendAll was used to suspend the threads for the dump; since 5.0, checkpoints are used to dump the information ==

When an ANR occurs, the system_server process runs the stack-dump function, which sends SIGQUIT to the corresponding processes.

For security reasons, processes are isolated from each other, and even the system process cannot directly read information from another process. The request is therefore delivered to the target process through inter-process communication (here, a signal). On receiving it, the target process helps by dumping its own information and handing it to the system process. The flow on Android P:

1. When a process receives SIGQUIT, the WaitForSignal function of the SignalCatcher thread returns and HandleSigQuit() is called.

2. HandleSigQuit():

3. The DumpForSigQuit() function:
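The wait-then-dump shape of steps 1 to 3 can be imitated in a few lines of Python. This is only an analogy, not ART code: `signal_catcher`, `handle_sigquit`, and `run_demo` are hypothetical names, a Unix platform is assumed, and the demo sends a thread-directed SIGQUIT with `signal.pthread_kill` (rather than a process-wide signal) so that no other thread can receive it.

```python
import signal
import sys
import threading
import traceback


def handle_sigquit():
    """Return {thread id: formatted stack} for all live Python threads."""
    return {tid: "".join(traceback.format_stack(frame))
            for tid, frame in sys._current_frames().items()}


def signal_catcher(ready, reports):
    """Wait for SIGQUIT, then record a stack summary for every thread."""
    # Block SIGQUIT in this thread so it stays pending instead of
    # triggering the default action.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGQUIT})
    ready.set()
    signum = signal.sigwait({signal.SIGQUIT})   # plays the role of WaitForSignal
    if signum == signal.SIGQUIT:
        reports.append(handle_sigquit())        # plays the role of HandleSigQuit


def run_demo():
    ready, reports = threading.Event(), []
    catcher = threading.Thread(target=signal_catcher, args=(ready, reports),
                               name="Signal Catcher")
    catcher.start()
    ready.wait()  # make sure the signal is blocked before sending it
    signal.pthread_kill(catcher.ident, signal.SIGQUIT)
    catcher.join(timeout=5)
    return reports


reports = run_demo()
print(len(reports), "dump(s) collected")
```

The dedicated thread is the point of the design: signal handlers can do very little safely, so the signal is kept blocked and a normal thread picks it up synchronously and does the heavy work.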

DumpForSigQuit determines what information is read. The remaining question is when it is read, and how to make sure the needed data is available (for example GC information and the number of currently allocated objects, which are typically printed while all threads are suspended). Let's begin by analyzing the suspend process:

SuspendAll is implemented in thread_list.cc and is used to suspend all other threads in the current process (typically during GC, DumpForSigQuit, and so on). SuspendAll calls ModifySuspendCount(self, +1, false).

Since the delta value passed in is +1, AtomicSetFlag() is used to set the kSuspendRequest flag bit atomically, indicating that the thread has a pending suspend request. When is this flag bit checked? The CheckSuspend function is executed when a thread switches state (for example, when a thread running Java code transitions to the Native state).

The FullSuspendCheck function is executed when the kSuspendRequest flag is detected. In this flow, the kSuspendRequest flag bit is what allows the thread's stack to be dumped.

After TransitionFromRunnableToSuspended() is called, the thread enters the kSuspended state. When TransitionFromSuspendedToRunnable() is then called to switch from the Suspended state back to Runnable, the thread blocks on a condition variable. It stays blocked until the thread that called SuspendAll later calls ResumeAll(); otherwise these threads remain blocked forever.
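The suspend-count and condition-variable interplay can be modeled in a few lines. This is a toy Python sketch under simplifying assumptions, not ART code: a single shared counter stands in for the per-thread suspend count and the kSuspendRequest flag, and `ToyThreadList`, `check_suspend`, and `on_park` are hypothetical names.

```python
import threading


class ToyThreadList:
    def __init__(self):
        self._cond = threading.Condition()
        self.suspend_count = 0  # > 0 plays the role of kSuspendRequest being set

    def suspend_all(self):
        with self._cond:
            self.suspend_count += 1   # models ModifySuspendCount(self, +1, ...)

    def resume_all(self):
        with self._cond:
            self.suspend_count -= 1
            self._cond.notify_all()   # wake threads parked in check_suspend

    def check_suspend(self, on_park=None):
        """Called by a worker at a state transition; blocks while suspended."""
        with self._cond:
            if self.suspend_count > 0 and on_park is not None:
                on_park()
            while self.suspend_count > 0:
                self._cond.wait()


def demo():
    threads = ToyThreadList()
    parked = threading.Event()
    log = []

    def worker():
        log.append("before")
        threads.check_suspend(on_park=parked.set)
        log.append("after")

    threads.suspend_all()
    t = threading.Thread(target=worker)
    t.start()
    parked.wait(timeout=5)
    snapshot = list(log)      # the worker is parked, so only "before" is present
    threads.resume_all()
    t.join(timeout=5)
    return snapshot, log


snapshot, final_log = demo()
print(snapshot)   # ['before']
print(final_log)  # ['before', 'after']
```

As in the description above, the worker cannot make progress past the check until the suspending thread calls `resume_all`.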

4. That covers the SuspendAll flow. The stack dump itself, however, is driven by the kCheckpointRequest flag rather than the kSuspendRequest flag. Next, look at the Dump function of ThreadList, which is called from ThreadList's DumpForSigQuit while the Signal Catcher thread handles SIGQUIT.

This function creates a DumpCheckpoint object named checkpoint and passes it to RunCheckpoint, which returns the number of threads currently in the Runnable state. It then calls WaitForThreadsToRunThroughCheckpoint() to wait until those Runnable threads have all executed the Run function of DumpCheckpoint. If the wait times out, an error is reported.
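The "wait until N threads have run the checkpoint, or time out" step can be illustrated with a small barrier. This is a toy Python sketch, not the ART implementation: `CheckpointBarrier`, `pass_`, and `wait_for` are hypothetical names, and the timeouts are arbitrary.

```python
import threading


class CheckpointBarrier:
    def __init__(self):
        self._cond = threading.Condition()
        self._passed = 0

    def pass_(self):
        """Called by a thread once it has run the checkpoint function."""
        with self._cond:
            self._passed += 1
            self._cond.notify_all()

    def wait_for(self, expected, timeout_s):
        """Return True if `expected` threads passed before the timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._passed >= expected,
                                       timeout=timeout_s)


barrier = CheckpointBarrier()
workers = [threading.Thread(target=barrier.pass_) for _ in range(3)]
for w in workers:
    w.start()

all_done = barrier.wait_for(expected=3, timeout_s=5.0)
# A fourth thread never arrives, so this wait runs into its timeout.
timed_out = not barrier.wait_for(expected=4, timeout_s=0.05)
for w in workers:
    w.join()
print(all_done, timed_out)  # True True
```

The count passed to `wait_for` corresponds to the number of Runnable threads returned by RunCheckpoint in the description above.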

To analyze the RunCheckpoint function, look at its first part:

RequestCheckpoint returns true for Runnable threads and false for all other threads. For threads in these non-Runnable states, the kSuspendRequest flag bit is set, just as in SuspendAll, so that they are checked for suspension when they change state. At the same time, RunCheckpoint records these threads in the vector suspend_count_modified_threads, and the Signal Catcher thread later dumps their stacks on their behalf. Now look at the RequestCheckpoint function:

The last line sets the kCheckpointRequest flag bit; RunCheckpointFunction is then executed when the thread changes its running state. Next, look at the Run function of the checkpoint.

(Thread::RequestCheckpoint sets the flag bit and stores its argument, which is the checkpoint object that Dump passed to RunCheckpoint, that is, the DumpCheckpoint.) So what gets executed is the Run function of DumpCheckpoint:

The Java stack, native stack, and kernel stack are printed here. To recap the first part: for Runnable threads, RunCheckpoint calls their RequestCheckpoint function, and threads that are not Runnable are added to a vector. Now examine the second part of the RunCheckpoint function:

Threads that are not Runnable may never get to call the Run function themselves, so the Signal Catcher thread dumps them on their behalf. The Run function of DumpCheckpoint does the same work for them as it does for a Runnable thread. Finally, the suspend count is restored so that these threads can keep running when they next switch state.
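The two halves of RunCheckpoint can be summarized in a single-threaded toy model. This Python sketch is an assumption-laden illustration, not ART code: `ToyThread`, `poll_safepoint`, and the thread names are hypothetical, and real concurrency and locking are left out entirely.

```python
class ToyThread:
    def __init__(self, name, runnable):
        self.name = name
        self.runnable = runnable
        self.checkpoint = None    # a pending function models kCheckpointRequest
        self.suspend_count = 0

    def request_checkpoint(self, fn):
        """Accept the request only if the thread is Runnable."""
        if not self.runnable:
            return False
        self.checkpoint = fn
        return True

    def poll_safepoint(self):
        """What a Runnable thread does the next time it checks its flags."""
        if self.checkpoint is not None:
            fn, self.checkpoint = self.checkpoint, None
            fn(self)


def run_checkpoint(threads, fn):
    """Return how many threads will run `fn` themselves."""
    count = 0
    suspended = []
    # First part: Runnable threads get a checkpoint request,
    # the others get their suspend count raised.
    for t in threads:
        if t.request_checkpoint(fn):
            count += 1
        else:
            t.suspend_count += 1
            suspended.append(t)
    # Second part: the requester runs the checkpoint for the
    # non-Runnable threads, then restores their suspend count.
    for t in suspended:
        fn(t)
        t.suspend_count -= 1
    return count


threads = [ToyThread("main", True), ToyThread("Binder:1", False),
           ToyThread("RenderThread", True)]
dumped = []
count = run_checkpoint(threads, lambda t: dumped.append(t.name))
after_request = list(dumped)   # only the non-Runnable thread so far
for t in threads:
    t.poll_safepoint()         # Runnable threads dump themselves
print(count, after_request, dumped)
```

The returned count is what the caller would then wait on, since only the Runnable threads still have to run the checkpoint themselves.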

== Summary == :

1. After receiving the signal, the SignalCatcher thread dumps information about the current VM (memory status, objects, loaded classes, GC, and so on).

2. Next, a checkpoint flag is set on each thread and each thread is asked to suspend. A thread checks this flag when it changes state during execution, and suspends itself if it finds a suspend request. Once all threads are suspended, the SignalCatcher thread walks through them, dumps each thread's stack and thread data, and then wakes the threads up. If some thread cannot be suspended and the wait times out, the dump fails and an exception is thrown.

== General process (before Android 5.0) == :

==checkPoint: == Code compiled by ART periodically polls the runtime at safepoints to check whether some specific code needs to be executed. You can think of these polling points as safepoints. A safepoint can be used to suspend a Java thread, or to implement the checkpoint mechanism. For example, when thread A, executing Java code, reaches a safepoint, it runs the CheckSuspend function. If a checkpoint request is found on the current thread, the thread's checkpoint function is executed at that point; if a suspend request is found, a suspend check is performed and the thread suspends.

So the ART checkpoint can be seen as one mechanism implemented on top of safepoints.
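The dispatch performed at a safepoint can be pictured with two flag bits. This is a toy Python sketch, not ART code: the bit values, `check_suspend`, and the two callbacks are hypothetical, and the suspend callback is assumed to return only after the thread has been resumed.

```python
SUSPEND_REQUEST = 1 << 0      # stands in for kSuspendRequest
CHECKPOINT_REQUEST = 1 << 1   # stands in for kCheckpointRequest


def check_suspend(flags, run_checkpoint, full_suspend_check):
    """Poll once at a safepoint and return the flags that remain set."""
    if flags & CHECKPOINT_REQUEST:
        run_checkpoint()              # e.g. dump this thread's stack
        flags &= ~CHECKPOINT_REQUEST
    if flags & SUSPEND_REQUEST:
        full_suspend_check()          # in this model, returns after resume
        flags &= ~SUSPEND_REQUEST
    return flags


calls = []
remaining = check_suspend(
    SUSPEND_REQUEST | CHECKPOINT_REQUEST,
    run_checkpoint=lambda: calls.append("checkpoint"),
    full_suspend_check=lambda: calls.append("suspend"),
)
print(calls, remaining)  # ['checkpoint', 'suspend'] 0
```

With no flag set, the poll does nothing, which is what keeps the check cheap on the normal execution path.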

==Safepoint explanation (source link): ==

Author: RednaxelaFX. Link: www.zhihu.com/question/48… Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, cite the source.

From a compiler and interpreter perspective, there are two types of ART SafePoint:

Active safepoint: the compiled or interpreted code actively checks for a safepoint request and jumps to the appropriate handler if it finds that it needs to enter a safepoint. The ART interpreter places active safepoints at loop back edges (specifically at the source, before the jump) and at method returns (return or throw). The ART Optimizing Compiler inserts active safepoints at loop back edges (again at the source, before the jump) and at method entry.

Passive safepoint: every non-inlined call site is a passive safepoint. There is no active check here, just a normal method call. It counts as a safepoint because control is handed to the callee once execution reaches the call site, and the callee may enter a safepoint. Stack frames may need to be traversed at a safepoint, so the caller must be at a safepoint as well.

The idea behind safepoint placement is that, once the runtime requests a safepoint, the program should reach the nearest safepoint in a timely manner and hand control to the runtime. What counts as "timely"? It is enough that the time to get there is bounded; there is no strict real-time requirement. Assume further that any code that only moves forward (straight-line code and forward conditional branches both count) finishes in bounded time, so it can be ignored. Code that can run for a long time is then either a loop or a method call, so inserting safepoints in those two places is enough to guarantee timeliness. Whether the safepoint goes at method entry or method exit, or at the source or the target of the loop back edge, is an implementation detail; picking one side is enough.
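The placement argument can be made concrete with a loop that polls at its back edge. This is a toy Python sketch, not ART code: `ToyRuntime`, `run_loop`, and the `request_at` parameter are hypothetical, and the request that would normally come from another thread is simulated inside the loop.

```python
class ToyRuntime:
    def __init__(self):
        self.safepoint_requested = False
        self.handled_at = []

    def poll(self, iteration):
        """Active safepoint: check the request flag and run the handler if set."""
        if self.safepoint_requested:
            self.safepoint_requested = False
            self.handled_at.append(iteration)


def run_loop(runtime, iterations, request_at):
    total = 0
    for i in range(iterations):
        if i == request_at:
            # Stands in for a request arriving from another thread.
            runtime.safepoint_requested = True
        total += i           # straight-line body: bounded time, no poll needed
        runtime.poll(i)      # poll at the loop back edge
    return total


runtime = ToyRuntime()
total = run_loop(runtime, iterations=10, request_at=3)
print(total, runtime.handled_at)  # 45 [3]
```

Because the poll sits on the back edge, a request is noticed within at most one loop body, however many iterations the loop runs.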