How jank happens
Time-consuming operations on the main thread cause jank (stutter), and jank that exceeds a threshold triggers an ANR. When the application starts, the process forked from Zygote invokes ActivityThread's main method via reflection, which starts the message loop. ActivityThread (API 29):
public static void main(String[] args) {
    Looper.prepareMainLooper();
    ...
    Looper.loop();
    throw new RuntimeException("Main thread loop unexpectedly exited");
}
Looper’s loop method:
// Run the message queue in this thread. Be sure to call quit() to end the loop.
public static void loop() {
    ...
    for (;;) {
        // 1
        Message msg = queue.next(); // might block
        ...
        // This must be in a local variable, in case a UI event sets the logger
        // Callback before message processing
        final Printer logging = me.mLogging;
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " +
                    msg.callback + ": " + msg.what);
        }
        ...
        // 3. The message starts being processed
        msg.target.dispatchMessage(msg);
        ...
        // 4. Callback after message processing has finished
        if (logging != null) {
            logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
        }
    }
}
Because of this endless for loop, the main thread stays alive. To run a task on the main thread, post it to the message queue through a Handler; the loop takes the message out and hands it to the message's target (a Handler) for processing.
Jank can originate in two places:
- Comment 1: queue.next() blocks
- Comment 3: dispatchMessage takes too long
MessageQueue.next: where the time can go (API 29)
@UnsupportedAppUsage
Message next() {
    ...
    for (;;) {
        // 1. Blocks if nextPollTimeoutMillis is not 0
        nativePollOnce(ptr, nextPollTimeoutMillis);
        ...
        // 2. Check whether the first message is a sync barrier message
        if (msg != null && msg.target == null) {
            // 3. A sync barrier was encountered: skip ahead to the next asynchronous message
            // Stalled by a barrier. Find the next asynchronous message in the queue.
            do {
                prevMsg = msg;
                msg = msg.next;
            } while (msg != null && !msg.isAsynchronous());
        }
        // 4. Normal message handling: check whether the message is delayed
        if (msg != null) {
            if (now < msg.when) {
                // Next message is not ready. Set a timeout to wake up when it is ready.
                nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
            } else {
                // Got a message.
                mBlocked = false;
                if (prevMsg != null) {
                    prevMsg.next = msg.next;
                } else {
                    mMessages = msg.next;
                }
                msg.next = null;
                if (DEBUG) Log.v(TAG, "Returning message: " + msg);
                msg.markInUse();
                return msg;
            }
        } else {
            // 5. No message is available: on the next iteration, nativePollOnce at
            // comment 1 is called with -1 and blocks until the queue is woken.
            // No more messages.
            nextPollTimeoutMillis = -1;
        }
    }
}
- MessageQueue is a linked list. next() first checks whether the head of the queue (the first message) is a sync barrier message, that is, a message whose target is null. A barrier holds back synchronous messages so that only asynchronous messages are processed.
- If a sync barrier is encountered, the synchronous messages in the MessageQueue are skipped and only the asynchronous messages are processed. If there are no asynchronous messages, comment 5 sets nextPollTimeoutMillis to -1, and nativePollOnce at comment 1 blocks on the next iteration.
- If a message is obtained, the handling is the same whether it is synchronous or asynchronous. Comment 4 checks whether it is delayed. If so, nextPollTimeoutMillis is assigned and the next call to nativePollOnce at comment 1 blocks for that long. If it is not delayed, the message is returned so that it can be dispatched to its Handler.
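The barrier logic described above can be sketched in plain Java. This is a simplified model, not the framework's MessageQueue: the names MiniQueue, Msg, hasTarget and removeBarrier are hypothetical, message timestamps are ignored, and returning null stands in for the point where the real queue would block in nativePollOnce.

```java
final class MiniQueue {
    /** A queue entry; hasTarget == false models a sync barrier (target == null). */
    static final class Msg {
        final String name;
        final boolean hasTarget;
        final boolean asynchronous;
        Msg next;

        Msg(String name, boolean hasTarget, boolean asynchronous) {
            this.name = name;
            this.hasTarget = hasTarget;
            this.asynchronous = asynchronous;
        }
    }

    private Msg head;

    /** Appends a message at the tail (the real queue orders by time instead). */
    void enqueue(Msg m) {
        if (head == null) {
            head = m;
            return;
        }
        Msg p = head;
        while (p.next != null) {
            p = p.next;
        }
        p.next = m;
    }

    /** Returns the next runnable message, or null where the real queue would block. */
    Msg next() {
        Msg prev = null;
        Msg msg = head;
        if (msg != null && !msg.hasTarget) {
            // Stalled by a barrier: look for the next asynchronous message.
            do {
                prev = msg;
                msg = msg.next;
            } while (msg != null && !msg.asynchronous);
        }
        if (msg == null) {
            return null;
        }
        // Unlink the chosen message from the list.
        if (prev != null) {
            prev.next = msg.next;
        } else {
            head = msg.next;
        }
        msg.next = null;
        return msg;
    }

    /** Removes the first barrier, letting synchronous messages flow again. */
    void removeBarrier() {
        Msg prev = null;
        Msg p = head;
        while (p != null && p.hasTarget) {
            prev = p;
            p = p.next;
        }
        if (p == null) {
            return;
        }
        if (prev != null) {
            prev.next = p.next;
        } else {
            head = p.next;
        }
    }
}
```

With a barrier at the head followed by a synchronous, an asynchronous and another synchronous message, next() returns the asynchronous one first and then reports nothing runnable until removeBarrier() is called.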
The next method keeps taking messages from the MessageQueue: when a message is available it is returned for processing, and when none is available nativePollOnce blocks. Underneath, this relies on Linux's epoll mechanism, a form of I/O multiplexing.
Linux I/O multiplexing schemes include select, poll and epoll. epoll generally performs best and scales to the largest number of concurrent descriptors.
- select: a system call to which the caller passes an array of file descriptors. The operating system iterates over them to determine which can be read or written and tells the caller which ones to process.
- poll: the main difference from select is that the limit of 1024 file descriptors that select can monitor is removed.
- epoll: improves on three weaknesses of select:
  1. The kernel keeps the set of file descriptors, so the user does not pass the whole set in again on each call and only tells the kernel what has changed.
  2. The kernel no longer finds ready file descriptors by polling; it is woken asynchronously by I/O events.
  3. The kernel returns only the file descriptors that have I/O events, so the user does not need to traverse the entire set.
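To make the "block until a descriptor is ready" idea concrete, here is a small plain-Java sketch using java.nio's Selector, which on Linux JDKs is typically backed by epoll. It is only an analogy for how nativePollOnce waits, not Android's native code; the class name MultiplexDemo and the method readyChannels are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class MultiplexDemo {
    /**
     * Registers the read end of a pipe with a selector, optionally writes one
     * byte to the write end, and returns how many channels became ready within
     * timeoutMillis. timeoutMillis must be positive (0 would wait indefinitely).
     */
    static int readyChannels(boolean writeFirst, long timeoutMillis) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                // Selectable channels must be non-blocking before registration.
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);
                if (writeFirst) {
                    // Writing to the sink is the "wake up" event.
                    pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
                }
                // Blocks until a registered channel is ready or the timeout expires.
                return selector.select(timeoutMillis);
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling readyChannels(false, 50) waits out the timeout because nothing was written, much as the message loop sleeps when the queue is empty, while readyChannels(true, 1000) returns as soon as the written byte makes the pipe readable.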
Synchronous barrier messages
An Android app cannot post a sync barrier directly, because the method is not part of the public SDK. MessageQueue (API 29):
@TestApi
public int postSyncBarrier() {
    return postSyncBarrier(SystemClock.uptimeMillis());
}

private int postSyncBarrier(long when) {
    ...
}
The system uses sync barriers for high-priority operations. For example, when a View needs to be drawn, ViewRootImpl's scheduleTraversals inserts a sync barrier, and the barrier is removed again once the traversal runs or is cancelled. ViewRootImpl (API 29):
@UnsupportedAppUsage
void scheduleTraversals() {
    if (!mTraversalScheduled) {
        mTraversalScheduled = true;
        mTraversalBarrier = mHandler.getLooper().getQueue().postSyncBarrier();
        mChoreographer.postCallback(
                Choreographer.CALLBACK_TRAVERSAL, mTraversalRunnable, null);
        if (!mUnbufferedInputDispatch) {
            scheduleConsumeBatchedInput();
        }
        notifyRendererOfFramePending();
        pokeDrawLockIfNeeded();
    }
}

void unscheduleTraversals() {
    if (mTraversalScheduled) {
        mTraversalScheduled = false;
        mHandler.getLooper().getQueue().removeSyncBarrier(mTraversalBarrier);
        mChoreographer.removeCallbacks(
                Choreographer.CALLBACK_TRAVERSAL, mTraversalRunnable, null);
    }
}
To keep View drawing from being delayed by other tasks on the main thread, a sync barrier is inserted into the MessageQueue before drawing, and a listener for the Vsync signal is then registered. Choreographer$FrameDisplayEventReceiver receives the Vsync callback:
private final class FrameDisplayEventReceiver extends DisplayEventReceiver
        implements Runnable {
    @Override
    public void onVsync(long timestampNanos, long physicalDisplayId, int frame) {
        ...
        Message msg = Message.obtain(mHandler, this);
        // 1. Send an asynchronous message
        msg.setAsynchronous(true);
        mHandler.sendMessageAtTime(msg, timestampNanos / TimeUtils.NANOS_PER_MS);
    }

    @Override
    public void run() {
        // 2. doFrame is executed first
        doFrame(mTimestampNanos, mFrame);
    }
}
After the Vsync callback is received, comment 1 posts an asynchronous message to the main thread's MessageQueue, so that the doFrame call at comment 2 runs ahead of the synchronous messages held back by the barrier.
doFrame is where View drawing really starts: it leads to ViewRootImpl's doTraversal, which calls performTraversals, which in turn calls the View's onMeasure, onLayout and onDraw.
Although apps cannot post sync barriers, they are allowed to use asynchronous messages.
Asynchronous messages: the SDK does restrict apps here as well. In the Message class, the flags field that marks a message as asynchronous is annotated with @UnsupportedAppUsage:
@UnsupportedAppUsage
/*package*/ int flags;
Use asynchronous messages with caution. Used improperly, they can make the main thread appear to hang.
Handler#dispatchMessage
/**
 * Handle system messages here.
 */
public void dispatchMessage(@NonNull Message msg) {
    if (msg.callback != null) {
        handleCallback(msg);
    } else {
        if (mCallback != null) {
            if (mCallback.handleMessage(msg)) {
                return;
            }
        }
        handleMessage(msg);
    }
}
- msg.callback: a Runnable posted with Handler#post(Runnable r)
- mCallback: a Callback passed to the Handler constructor
- handleMessage: overridden in a Handler subclass
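The priority among these three paths can be illustrated with a small plain-Java model. MiniHandler below is a hypothetical stand-in that only mirrors the branch order of dispatchMessage; it uses a String in place of Message and is not the framework class.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniHandler {
    /** Models Handler.Callback: return true to consume the message. */
    interface Callback {
        boolean handleMessage(String what);
    }

    private final Callback callback;              // models Handler.mCallback
    final List<String> log = new ArrayList<>();   // records handleMessage calls

    MiniHandler(Callback callback) {
        this.callback = callback;
    }

    /** Models the overridable Handler#handleMessage. */
    void handleMessage(String what) {
        log.add("handleMessage:" + what);
    }

    /** Same branch order as dispatchMessage: Runnable, then Callback, then handleMessage. */
    void dispatchMessage(String what, Runnable messageCallback) {
        if (messageCallback != null) {
            messageCallback.run();
        } else {
            if (callback != null && callback.handleMessage(what)) {
                return;
            }
            handleMessage(what);
        }
    }
}
```

A message carrying a Runnable never reaches the Callback or handleMessage, and a Callback that returns true prevents handleMessage from running.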
Application jank is usually caused by the Handler taking too long to process a message. Causes include the method's own cost, algorithmic inefficiency, CPU contention, insufficient memory and IPC timeouts.
Jank monitoring
Jank monitoring scheme one: Looper#loop
// Run the message queue in this thread. Be sure to call quit() to end the loop.
public static void loop() {
    ...
    for (;;) {
        // 1
        Message msg = queue.next(); // might block
        ...
        // This must be in a local variable, in case a UI event sets the logger
        // Callback before message processing
        final Printer logging = me.mLogging;
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " +
                    msg.callback + ": " + msg.what);
        }
        ...
        // 3. The message starts being processed
        msg.target.dispatchMessage(msg);
        ...
        // 4. Callback after message processing has finished
        if (logging != null) {
            logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
        }
    }
}
Calling Looper.getMainLooper().setMessageLogging(printer) provides a callback before and after each message is dispatched, from which the processing time can be computed. However, by the time a slow message is detected, dispatchMessage has already returned, so a stack captured at that point no longer contains the code that caused the jank.
A workable approach is to sample the main thread's stack periodically and store the samples in a map, with the timestamp as key and the stack as value. When jank is detected, the stacks sampled during the slow interval are retrieved. This scheme is suitable for offline (non-production) use because of two costs:
- logging.println involves string concatenation and is called very frequently, so many objects are created and memory churn results.
- Frequently capturing the main thread's stack from a background thread affects performance, because capturing the stack suspends the main thread.
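The timing half of scheme one can be sketched as follows. This is a plain-Java model under stated assumptions: the class DispatchTimer and its methods are hypothetical, the clock is injected so the logic can be checked without a device, and on Android the println method would belong to an android.util.Printer passed to setMessageLogging. Stack sampling is left out.

```java
import java.util.function.LongSupplier;

final class DispatchTimer {
    private final LongSupplier clockMillis;   // injected clock (assumption, eases testing)
    private final long thresholdMillis;       // a dispatch at or above this counts as slow
    private long startMillis = -1;
    private long lastDurationMillis = -1;
    private int slowCount = 0;

    DispatchTimer(LongSupplier clockMillis, long thresholdMillis) {
        this.clockMillis = clockMillis;
        this.thresholdMillis = thresholdMillis;
    }

    /** Receives the lines that Looper#loop prints around each dispatch. */
    void println(String line) {
        if (line.startsWith(">>>>> Dispatching")) {
            startMillis = clockMillis.getAsLong();
        } else if (line.startsWith("<<<<< Finished") && startMillis >= 0) {
            lastDurationMillis = clockMillis.getAsLong() - startMillis;
            if (lastDurationMillis >= thresholdMillis) {
                slowCount++;   // a real monitor would look up sampled stacks here
            }
            startMillis = -1;
        }
    }

    long lastDurationMillis() {
        return lastDurationMillis;
    }

    int slowCount() {
        return slowCount;
    }
}
```

Matching on the line prefixes couples the monitor to the exact log format of Looper#loop, which is one reason this scheme is fragile as well as costly.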
Jank monitoring scheme two
Online (production) jank monitoring requires bytecode instrumentation.
With a Gradle plugin plus ASM, a line of code is inserted at the start and end of every method at compile time. WeChat's Matrix, for example, uses this scheme for jank monitoring. Points to watch:
- Avoid method-count explosion: assign each method a unique ID and pass it as a parameter
- Filter out simple functions: add a blocklist to avoid collecting statistics for functions that do not need them
WeChat Matrix has been heavily optimized: package size grows by 1% to 2%, the frame rate drops by less than 2 frames, and it is used only in gray (staged-rollout) builds.
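As an illustration of what the inserted enter and exit calls might do at runtime, here is a minimal sketch. MethodTracer, enter, exit and totalFor are hypothetical names, not Matrix's API; the sketch assumes a single thread and balanced calls, and takes an injected clock so the arithmetic can be checked.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

final class MethodTracer {
    private final LongSupplier clockNanos;                       // injected clock (assumption)
    private final Deque<long[]> stack = new ArrayDeque<>();      // entries: {methodId, startNanos}
    private final Map<Integer, Long> totals = new HashMap<>();   // methodId -> inclusive time

    MethodTracer(LongSupplier clockNanos) {
        this.clockNanos = clockNanos;
    }

    /** Call inserted at the start of an instrumented method. */
    void enter(int methodId) {
        stack.push(new long[] {methodId, clockNanos.getAsLong()});
    }

    /** Call inserted at the end of an instrumented method. */
    void exit(int methodId) {
        long[] top = stack.pop();
        if (top[0] != methodId) {
            throw new IllegalStateException("unbalanced enter/exit for method " + methodId);
        }
        totals.merge(methodId, clockNanos.getAsLong() - top[1], Long::sum);
    }

    /** Total inclusive time recorded for a method ID, or 0 if never seen. */
    long totalFor(int methodId) {
        return totals.getOrDefault(methodId, 0L);
    }
}
```

Passing an integer ID rather than a method name keeps each inserted call small, which is the point of the unique-ID bullet above.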
ANR principle
- Service Timeout: a foreground Service has not finished executing within 20 seconds, or a background Service within 200 seconds
- BroadcastQueue Timeout: a broadcast has not finished within 10 seconds in the foreground or 60 seconds in the background
- ContentProvider Timeout: publishing the provider takes longer than 10 seconds
- InputDispatching Timeout: dispatching an input event (key presses and touch events) takes longer than 5 seconds
ActivityManagerService (API 29):
// How long we allow a receiver to run before giving up on it.
static final int BROADCAST_FG_TIMEOUT = 10*1000;
static final int BROADCAST_BG_TIMEOUT = 60*1000;
ANR trigger process
Planting the bomb
Service start call chain: Context.startService -> AMS.startService -> ActiveServices.startService -> ActiveServices.realStartServiceLocked
private final void realStartServiceLocked(ServiceRecord r,
        ProcessRecord app, boolean execInFg) throws RemoteException {
    ...
    // 1. Send the delayed message (SERVICE_TIMEOUT_MSG)
    bumpServiceExecutingLocked(r, execInFg, "create");
    ...
    try {
        // 2. Notify the app process to create the service
        app.thread.scheduleCreateService(r, r.serviceInfo,
                mAm.compatibilityInfoForPackage(r.serviceInfo.applicationInfo),
                app.getReportedProcState());
        ...
    } finally {
        ...
    }
}
Internally, comment 1 calls scheduleServiceTimeoutLocked:
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
if (proc.executingServices.size() == 0 || proc.thread == null) {
return;
}
Message msg = mAm.mHandler.obtainMessage(
ActivityManagerService.SERVICE_TIMEOUT_MSG);
msg.obj = proc;
// Send the delay message. The delay is 20s for the foreground service and 200s for the background service
mAm.mHandler.sendMessageDelayed(msg,
proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}
Comment 1 posts a delayed message to the handler before the app is told to create the service. If that message has not been removed within 20 seconds (for a foreground service), ActiveServices#serviceTimeout is invoked.
Defusing the bomb
When a Service is started, AMS manages it and then tells the application to run the Service lifecycle. ActivityThread's handleCreateService method is called:
@UnsupportedAppUsage
private void handleCreateService(CreateServiceData data) {
    ...
    try {
        ...
        Application app = packageInfo.makeApplication(false, mInstrumentation);
        service.attach(context, this, data.info.name, data.token, app,
                ActivityManager.getService());
        // 1. Call the Service's onCreate
        service.onCreate();
        mServices.put(data.token, service);
        try {
            // 2. Tell AMS that the Service has finished executing
            ActivityManager.getService().serviceDoneExecuting(
                    data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
        } catch (RemoteException e) {
            throw e.rethrowFromSystemServer();
        }
    } catch (Exception e) {
        ...
    }
}
Comment 1 invokes the Service's onCreate. Comment 2 then calls AMS's serviceDoneExecuting, which eventually reaches ActiveServices.serviceDoneExecutingLocked:
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
boolean finishing) {
// Remove delay messages
mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
}
After onCreate returns, the delayed message is removed and the bomb is defused.
Detonating the bomb
If the Service's onCreate takes longer than the timeout (20 seconds for a foreground service), the bomb detonates and ActiveServices#serviceTimeout is called (API 29):
void serviceTimeout(ProcessRecord proc) {
    ...
    if (anrMessage != null) {
        proc.appNotResponding(null, null, null, null, false, anrMessage);
    }
}
Every kind of ANR ends up calling ProcessRecord's appNotResponding method (API 29):
void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
        String parentShortComponentName, WindowProcessController parentProcess,
        boolean aboveSystem, String annotation) {
    ...
    // 1. Write the event log
    // Log the ANR to the event log.
    EventLog.writeEvent(EventLogTags.AM_ANR, userId, pid, processName, info.flags,
            annotation);
    ...
    // 2. Collect the required information (ANR reason, CPU, etc.) in a StringBuilder
    // Log the ANR to the main log.
    StringBuilder info = new StringBuilder();
    info.setLength(0);
    info.append("ANR in ").append(processName);
    if (activityShortComponentName != null) {
        info.append(" (").append(activityShortComponentName).append(")");
    }
    info.append("\n");
    info.append("PID: ").append(pid).append("\n");
    if (annotation != null) {
        info.append("Reason: ").append(annotation).append("\n");
    }
    if (parentShortComponentName != null
            && parentShortComponentName.equals(activityShortComponentName)) {
        info.append("Parent: ").append(parentShortComponentName).append("\n");
    }
    ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
    // 3. Dump stack information, including Java and native stacks, to a file
    // For background ANRs, don't pass the ProcessCpuTracker to
    // avoid spending 1/2 second collecting stats to rank lastPids.
    File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
            (isSilentAnr()) ? null : processCpuTracker, (isSilentAnr()) ? null : lastPids,
            nativePids);
    String cpuInfo = null;
    ...
    // 4. Output the ANR log
    Slog.e(TAG, info.toString());
    if (tracesFile == null) {
        // 5. If no traces file was produced, send SIGNAL_QUIT
        // There is no trace file, so dump (only) the alleged culprit's threads to the log
        Process.sendSignal(pid, Process.SIGNAL_QUIT);
    }
    // 6. Write to DropBox
    mService.addErrorToDropBox("anr", this, processName, activityShortComponentName,
            parentShortComponentName, parentPr, annotation, cpuInfo, tracesFile, null);
    ...
    synchronized (mService) {
        // 7. Background ANR: kill the process directly
        if (isSilentAnr() && !isDebugging()) {
            kill("bg anr", true);
            return;
        }
        // 8. Error reporting
        // Set the app's notResponding state, and look up the errorReportReceiver
        makeAppNotRespondingLocked(activityShortComponentName,
                annotation != null ? "ANR " + annotation : "ANR", info.toString());
        // 9. Show the ANR dialog; this leads to the handleShowAnrUi method
        Message msg = Message.obtain();
        msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
        msg.obj = new AppNotRespondingDialog.Data(this, aInfo, aboveSystem);
        mService.mUiHandler.sendMessage(msg);
    }
}
- Write the event log
- Write the main log
- Generate the traces file
- Output the ANR to logcat (visible in the console)
- If no traces file was obtained, send SIGNAL_QUIT so that the process dumps its thread stacks
- Write to DropBox
- For a background ANR, kill the process directly
- Report the error
- Show the ANR dialog, which calls the AppErrors#handleShowAnrUi method
ANR trigger process, from planting to defusing the bomb: when a Service is started, a delayed message (20 seconds for a foreground service) is posted to the Handler before onCreate is called. After the Service's onCreate finishes, the delayed message is removed. If onCreate takes longer than the delay, the delayed message is processed, the ANR is triggered, CPU and stack information is collected, and the ANR dialog is shown.
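The plant, defuse and detonate pattern can be reproduced in a few lines of plain Java. This is a sketch of the idea only: TimeoutBomb and runWithBomb are hypothetical names, and a ScheduledExecutorService stands in for the delayed Handler message that AMS uses.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class TimeoutBomb {
    /**
     * Runs work with a timeout "bomb" planted beforehand.
     * Returns true if the bomb went off before work finished.
     */
    static boolean runWithBomb(Runnable work, long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean detonated = new AtomicBoolean(false);
        try {
            // Plant: schedule the timeout action (like sendMessageDelayed).
            ScheduledFuture<?> bomb = timer.schedule(
                    () -> detonated.set(true), timeoutMillis, TimeUnit.MILLISECONDS);
            // The monitored work (like the Service's onCreate).
            work.run();
            // Defuse: cancel the pending action (like removeMessages).
            bomb.cancel(false);
            return detonated.get();
        } finally {
            timer.shutdownNow();
        }
    }
}
```

Work that returns quickly cancels the scheduled action before it runs, while work that outlasts the timeout finds the flag already set, which corresponds to serviceTimeout having been invoked.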
The ANR traces are written to the /data/anr/traces.txt file, but on newer systems root permission is needed to read this directory.
ANRWatchDog github.com/SalomonBrys…
An open-source library that detects ANRs automatically.
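The general watchdog idea behind such libraries is to post a small probe task to the monitored thread and check whether it runs in time. The sketch below models that idea in plain Java and is not the library's code: MiniWatchdog and isBlocked are hypothetical names, and a single-thread ExecutorService stands in for the main thread's Looper.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class MiniWatchdog {
    /**
     * Posts a probe task to the target thread and waits for it to run.
     * Returns true if the probe did not run within timeoutMillis, meaning
     * the target thread is busy or blocked.
     */
    static boolean isBlocked(ExecutorService target, long timeoutMillis) {
        CountDownLatch probeRan = new CountDownLatch(1);
        // The probe only runs once the target thread reaches it in its queue.
        target.execute(probeRan::countDown);
        try {
            return !probeRan.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("watchdog interrupted while waiting", e);
        }
    }
}
```

On Android the probe would be posted with a Handler bound to the main Looper, and a positive result would be followed by capturing the main thread's stack trace for the report.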