What is lag ("caton")
Lag occurs when the interface stops responding or rendering stutters while the application is in use. Interface rendering and event handling both run on the main thread, so lag can be attributed to the main thread being blocked.
During development, the main thread may be blocked for the following reasons:
- The main thread is doing a lot of I/O: large amounts of data are written directly on the main thread because it is convenient to code;
- The main thread is doing a lot of computation: the code is poorly written and performs complex calculations on the main thread;
- Heavy UI drawing: the interface is too complex, and drawing it takes a long time;
- The main thread is waiting: the main thread needs to acquire lock A, but a child thread currently holds lock A, so the main thread has to wait for the child thread to finish its task.
How to solve the lag problem
Capture the main thread's stack at the moment the lag occurs. The stack tells you which function or line of code the main thread is stuck in, and whether it is waiting for a lock, doing I/O, or performing complex calculations. With the stack in hand, you can fix the problem at its source.
How the analysis works
The main thread has a Runloop. A Runloop is an event loop model that lets a thread receive messages, process events, and go to sleep when idle instead of exiting. Before and after it handles an event, the Runloop notifies its registered observers.
Add an observer for the beginning and the end of each Runloop iteration to learn when the main thread starts and finishes its work. When the main thread stays in one state longer than a given threshold, it is considered stalled and is marked as such.
The timeout threshold for the main thread's Runloop is 2 seconds, and the child thread's check period is 1 second. Every second, the child thread checks the running status of the main thread; if the main thread's Runloop has been running for more than 2 seconds, the main thread is considered stalled and a snapshot of the current threads is taken.
High CPU usage may also cause an application to stall. Therefore, when the child thread checks the status of the main thread and detects that CPU usage is too high, it captures a snapshot of the current threads and saves it to a file. The WeChat app considers CPU usage too high when single-core usage exceeds 80%.
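The decision the checking thread makes on each pass can be sketched as below. This is a minimal illustration in plain C (which also compiles inside an Objective-C file), not the actual implementation: the function name, the `dump_reason` type, and the idea of passing timestamps and CPU usage in as parameters are all assumptions made for the example. Only the 2-second and 80% values come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds quoted in the text above. */
#define RUNLOOP_TIMEOUT_SEC 2.0   /* main-thread Runloop timeout */
#define CPU_LIMIT_PERCENT   80.0  /* single-core CPU usage considered too high */

/* Hypothetical result type: why a snapshot should be taken, if at all. */
typedef enum {
    DUMP_NONE = 0,         /* nothing to report on this pass */
    DUMP_MAIN_THREAD_LAG,  /* current Runloop iteration exceeded the timeout */
    DUMP_CPU_TOO_HIGH      /* CPU usage exceeded the limit */
} dump_reason;

/* Hypothetical helper, meant to be called by the child thread once per
 * check period (1 second in the text).
 *   iteration_start_sec: when the main Runloop began its current iteration
 *   now_sec:             current time on the same clock
 *   is_running:          false while the Runloop is asleep waiting for events
 *   cpu_percent:         current single-core CPU usage                      */
dump_reason check_main_thread(double iteration_start_sec, double now_sec,
                              bool is_running, double cpu_percent)
{
    assert(now_sec >= iteration_start_sec);

    if (is_running && (now_sec - iteration_start_sec) > RUNLOOP_TIMEOUT_SEC) {
        return DUMP_MAIN_THREAD_LAG;
    }
    if (cpu_percent > CPU_LIMIT_PERCENT) {
        return DUMP_CPU_TOO_HIGH;
    }
    return DUMP_NONE;
}
```

In a real monitor, the start time and running flag would be written by the Runloop observer on the main thread and read by the child thread, so they would need atomic access or a lock.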
Three ways to monitor app lag
- FPS
- Runloop
- Pinging the main thread from a child thread
Detecting app lag with FPS
Normally the screen refreshes at 60 Hz, and a refresh signal is emitted on each refresh. CADisplayLink lets us register a callback that is synchronized with this refresh signal, so the FPS value can be derived from the screen refresh mechanism.
The refresh frequency of a display is fixed, for example 60 frames per second (60 FPS) on iOS, which is 16.7 ms per frame. As the figure above shows, there is a fixed interval (16.7 ms) between two VSync signals. During this interval, the CPU calculates the layout, decodes images, creates views, and draws text on the main thread, then hands the content to the GPU, which transforms, composites, and renders it and places the result in the frame buffer. If the CPU and GPU cannot produce a frame buffer within 16.7 ms, the frame is dropped and the display keeps showing the previous frame, which the user sees as a stutter. So whether it is the CPU or the GPU that takes too long, the result is the same: no frame buffer is ready within 16.7 ms.
For the main thread to render at close to 60 FPS, no single computation task on the UI thread may take longer than 1/60 s (about 16 ms); otherwise the interface stalls.
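The arithmetic behind the 16.7 ms figure can be sketched in a few lines of plain C. Both function names are made up for illustration, and the model is deliberately simplified: it assumes a frame's work starts exactly on a refresh signal.

```c
#include <assert.h>

/* Time available to produce one frame, in milliseconds. */
double frame_budget_ms(double refresh_hz)
{
    assert(refresh_hz > 0.0);
    return 1000.0 / refresh_hz;
}

/* Number of refresh signals that pass with no new frame to show when
 * producing one frame takes work_ms. A frame that fits in the budget
 * misses none.                                                        */
int missed_vsyncs(double work_ms, double refresh_hz)
{
    double budget = frame_budget_ms(refresh_hz);
    int intervals = (int)(work_ms / budget);   /* whole budgets consumed */

    if (work_ms > intervals * budget) {
        intervals++;                           /* round a partial budget up */
    }
    return intervals > 0 ? intervals - 1 : 0;
}
```

For example, at 60 Hz a task that takes 40 ms spans three refresh intervals, so the screen shows the previous frame for two extra refreshes.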
We can measure FPS with CADisplayLink, a timer whose interval is expressed in frames:

    - (void)setupDisplayLink {
        // Create the CADisplayLink and add it to the current run loop in NSRunLoopCommonModes
        _displayLink = [CADisplayLink displayLinkWithTarget:self selector:@selector(linkTicks:)];
        [_displayLink addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSRunLoopCommonModes];
    }

    - (void)linkTicks:(CADisplayLink *)link {
        // Count this callback
        _scheduleTimes++;

        // Remember when the current measurement window started
        if (_timestamp == 0) {
            _timestamp = link.timestamp;
        }

        CFTimeInterval timePassed = link.timestamp - _timestamp;
        if (timePassed >= 1.f) {
            // fps = callbacks received / seconds elapsed
            CGFloat fps = _scheduleTimes / timePassed;
            printf("fps:%.1f, timePassed:%f\n", fps, timePassed);

            // Reset for the next window
            _timestamp = link.timestamp;
            _scheduleTimes = 0;
        }
    }
We can also monitor FPS with the Core Animation instrument in Xcode's Instruments.
In the FPS indicator above, what happens if you add the CADisplayLink to the Runloop of a child thread?
The answer is that the FPS always reads 60, no matter how busy the main thread is or how heavily the GPU is loaded, because a CADisplayLink-based FPS indicator only measures the Runloop it is attached to.
How to detect lag with Runloop
The Runloop invokes the callback that is synchronized with the screen refresh, so if one loop iteration takes longer than 16.67 ms, the FPS naturally falls below 60. Each iteration passes through several stages, and you can monitor how long the Runloop stays in each one:

    - (void)startRunloopMonitorFreeze {
        CFRunLoopObserverRef observer = CFRunLoopObserverCreateWithHandler(CFAllocatorGetDefault(), kCFRunLoopAllActivities, YES, 0, ^(CFRunLoopObserverRef observer, CFRunLoopActivity activity) {
            // Note: when the app is idle, the Runloop stays in the beforeWaiting state,
            // so the judgment is not accurate
        });
        CFRunLoopAddObserver(CFRunLoopGetMain(), observer, kCFRunLoopCommonModes);
        // The run loop retains the observer, so release our reference
        CFRelease(observer);
    }

With this detection method, when the app is idle the Runloop stays in the beforeWaiting state, so a judgment based only on how long a stage lasts is not accurate.
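One way to deal with the idle case is to record the stage each time the observer fires and to ignore long stays in the waiting stage. The sketch below shows that idea in plain C. It is an assumption for illustration, not the approach the text prescribes: the `loop_stage` enum is a hypothetical stand-in for `CFRunLoopActivity`, and the struct and function names are invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the stages reported by a Runloop observer. */
typedef enum {
    STAGE_ENTRY,
    STAGE_BEFORE_TIMERS,
    STAGE_BEFORE_SOURCES,
    STAGE_BEFORE_WAITING,   /* about to sleep: the app is idle */
    STAGE_AFTER_WAITING,
    STAGE_EXIT
} loop_stage;

typedef struct {
    loop_stage stage;       /* most recent stage reported by the observer */
    double entered_sec;     /* when that stage was entered */
} stage_monitor;

/* Called from the observer callback on every stage change. */
void stage_monitor_record(stage_monitor *m, loop_stage stage, double now_sec)
{
    m->stage = stage;
    m->entered_sec = now_sec;
}

/* Called from the checking thread. A long stay in BEFORE_WAITING only means
 * that there is nothing to do, so it is never reported as a stall.         */
bool stage_monitor_is_stalled(const stage_monitor *m, double now_sec,
                              double threshold_sec)
{
    assert(threshold_sec > 0.0);

    if (m->stage == STAGE_BEFORE_WAITING) {
        return false;
    }
    return (now_sec - m->entered_sec) > threshold_sec;
}
```

Because the observer runs on the main thread and the check runs on another thread, a real implementation would need to protect `stage_monitor` with atomics or a lock.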
Detecting main-thread lag from a child thread
Annealing algorithm
To reduce the performance cost of detection, an annealing algorithm is added to the detection thread:
- Each time the child thread finds the main thread stuck, it captures the main thread's stack and keeps it in memory (instead of taking a thread snapshot and saving it to a file right away).
- It then compares this stack with the main thread stack captured at the previous lag:
  - If the stacks differ, a snapshot of the current threads is taken and written to a file;
  - If the stacks are the same, the snapshot is skipped and the check interval grows along the Fibonacci sequence until no lag is detected or the main thread's lag stack changes.
This avoids writing multiple files for the same lag, and avoids writing thread snapshot files continuously while the detection thread keeps finding the main thread stuck.
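The back-off described above can be sketched as a small state machine in plain C. The details are assumptions made for the example: each stack is reduced to a hypothetical 64-bit signature, the interval is a base period multiplied by the current Fibonacci number, and the interval is not capped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     has_last;           /* a lag stack has been recorded */
    uint64_t last_signature;     /* hypothetical hash of the last lag stack */
    unsigned fib_prev, fib_cur;  /* consecutive Fibonacci numbers */
    unsigned base_interval_sec;  /* normal check period */
    unsigned interval_sec;       /* current check period */
} anneal_state;

/* Return to the normal check period and forget the last lag stack. */
void anneal_reset(anneal_state *s)
{
    s->has_last = false;
    s->last_signature = 0;
    s->fib_prev = 0;
    s->fib_cur = 1;
    s->interval_sec = s->base_interval_sec;
}

void anneal_init(anneal_state *s, unsigned base_interval_sec)
{
    assert(base_interval_sec > 0);
    s->base_interval_sec = base_interval_sec;
    anneal_reset(s);
}

/* Called after each check. Returns true when a snapshot file should be
 * written, and updates interval_sec for the next check.               */
bool anneal_on_check(anneal_state *s, bool lagging, uint64_t signature)
{
    if (!lagging) {                       /* no lag: back to normal */
        anneal_reset(s);
        return false;
    }
    if (s->has_last && s->last_signature == signature) {
        /* Same lag as last time: skip the file and wait longer. */
        unsigned next = s->fib_prev + s->fib_cur;
        s->fib_prev = s->fib_cur;
        s->fib_cur = next;
        s->interval_sec = s->base_interval_sec * s->fib_cur;
        return false;
    }
    /* A different lag stack: report it and restart the sequence. */
    anneal_reset(s);
    s->has_last = true;
    s->last_signature = signature;
    return true;
}
```

With a base period of 1 second, repeated detections of the same stack produce check intervals of 1, 1, 2, 3, 5, ... seconds, and only the first detection writes a file.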
How to extract the time-consuming stack
When the child thread detects that the main thread's Runloop has timed out, it takes a snapshot of the current threads and saves it as the lag report file. However, the main thread stack captured at that moment is not necessarily the most time-consuming one, and may not be the main cause of the timeout.
For example, the main thread executes funcA(), funcB(), funcC(), and funcD() in turn, and the child thread takes a snapshot when it detects that the threshold has been exceeded. At that moment the main thread is running funcD(), but funcB() is the real reason the main thread timed out. The lag monitor solves this problem by extracting the main thread's most time-consuming stack.
The lag monitor periodically captures the main thread stack and stores it in a circular queue in memory. As shown in the figure below, a stack is captured every interval t and saved in a circular queue that holds at most 3 stacks. A cursor always points to the most recent stack.
WeChat's strategy is to capture the main thread stack every 50 milliseconds and keep the last 20 stacks. This increases CPU usage by 3%, and the memory cost is negligible.
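The circular queue with its cursor can be sketched in plain C as follows. For brevity, each sampled stack is represented only by the address of its top function, which is an assumption of this example; a real monitor would store every frame of each stack. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 20   /* the text keeps the last 20 main thread stacks */

typedef struct {
    uintptr_t top_frames[RING_CAPACITY];  /* one entry per sampled stack */
    size_t count;                         /* valid entries, at most RING_CAPACITY */
    size_t cursor;                        /* index of the most recent entry */
} stack_ring;

void stack_ring_init(stack_ring *r)
{
    r->count = 0;
    r->cursor = RING_CAPACITY - 1;        /* so the first push lands at index 0 */
}

/* Store a new sample, overwriting the oldest one once the ring is full. */
void stack_ring_push(stack_ring *r, uintptr_t top_frame)
{
    r->cursor = (r->cursor + 1) % RING_CAPACITY;
    r->top_frames[r->cursor] = top_frame;
    if (r->count < RING_CAPACITY) {
        r->count++;
    }
}

/* Read a sample by age: 0 is the most recent, count - 1 the oldest. */
uintptr_t stack_ring_get(const stack_ring *r, size_t age)
{
    assert(age < r->count);
    return r->top_frames[(r->cursor + RING_CAPACITY - age) % RING_CAPACITY];
}
```

A sampling thread would call `stack_ring_push` every 50 ms. After 25 pushes, the ring still holds 20 entries and the 5 oldest samples have been overwritten.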
When a lag is detected on the main thread, the monitor looks back through the stacks stored in the circular queue to find the most recent, most time-consuming stack.
As shown in the figure below, when the lag is detected, the last 20 main thread stacks are held in the in-memory circular queue, and the most recent, most time-consuming stack has to be found among them. The lag monitor identifies it using the following rules:
- Stacks are identified by their top function: if the top functions are the same, the entire stacks are considered the same;
- Because stacks are sampled at a fixed interval, the number of times a stack repeats approximates the time spent in it: the more repetitions, the more time consumed;
- Several stacks may repeat the same number of times; in that case, the most recent one is taken as the most time-consuming stack.
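The three rules above can be sketched as a single pass over the sampled stacks in plain C. This is an illustration under stated assumptions, not the monitor's actual code: each sample is reduced to the address of its top function, "repetition" is taken to mean consecutive identical samples, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* samples:     top function of each sampled stack, oldest first
 * n:           number of samples (at least 1)
 * out_repeats: if not NULL, receives how many times the chosen stack repeated
 * Returns the index of the last sample in the longest run of identical
 * stacks. When several runs have the same length, the most recent wins.    */
size_t pick_costly_stack(const uintptr_t *samples, size_t n, size_t *out_repeats)
{
    assert(samples != NULL && n > 0);

    size_t best_end = 0, best_len = 0, run_len = 0;

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && samples[i] == samples[i - 1]) {
            run_len++;                 /* same top function: run continues */
        } else {
            run_len = 1;               /* a new run starts here */
        }
        if (run_len >= best_len) {     /* ">=" prefers the more recent run */
            best_len = run_len;
            best_end = i;
        }
    }
    if (out_repeats != NULL) {
        *out_repeats = best_len;
    }
    return best_end;
}
```

Applied to the earlier funcA(), funcB(), funcC(), funcD() example, if funcB() appears in four consecutive samples, it is selected even though funcD() is the stack currently running. With a 50 ms sampling interval, four repetitions correspond to roughly 200 ms spent in funcB().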