Detecting main-thread stalls in production

Hello everyone, this is my first post on the Juejin platform, so please point out any mistakes. Lately I have seen a lot of discussion about how to detect main-thread stalls while an app is running in production, and I have been looking into it myself. Some time ago I read a blogger's article that covered part of this problem; it said this is the solution Meituan uses, though I can't confirm the details. I then found that the implementations circulating online are all very similar. If you haven't looked into this problem yet, let me walk through this common production detection scheme:

Detection scheme using Runloop

I won’t say much about what a runloop is, because there are many articles about it on the web; the one I recommend most is on the YYKit author’s blog. What I want to point out here are the runloop states:

    typedef CF_OPTIONS(CFOptionFlags, CFRunLoopActivity) {
        kCFRunLoopEntry         = (1UL << 0),
        kCFRunLoopBeforeTimers  = (1UL << 1), // About to process timers
        kCFRunLoopBeforeSources = (1UL << 2), // About to process sources
        kCFRunLoopBeforeWaiting = (1UL << 5),
        kCFRunLoopAfterWaiting  = (1UL << 6),
        kCFRunLoopExit          = (1UL << 7), // About to exit the loop
    };
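When logging the runloop state, a readable name is easier to scan than a raw bit value. A minimal sketch, assuming a hypothetical helper named RunLoopActivityName that is not part of the original post:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns the constant name
// for a single CFRunLoopActivity flag, for use in log messages.
static NSString *RunLoopActivityName(CFRunLoopActivity activity) {
    switch (activity) {
        case kCFRunLoopEntry:         return @"kCFRunLoopEntry";
        case kCFRunLoopBeforeTimers:  return @"kCFRunLoopBeforeTimers";
        case kCFRunLoopBeforeSources: return @"kCFRunLoopBeforeSources";
        case kCFRunLoopBeforeWaiting: return @"kCFRunLoopBeforeWaiting";
        case kCFRunLoopAfterWaiting:  return @"kCFRunLoopAfterWaiting";
        case kCFRunLoopExit:          return @"kCFRunLoopExit";
        default:
            // Combined masks or values this helper does not know about
            return [NSString stringWithFormat:@"unknown (%lu)", (unsigned long)activity];
    }
}
```

It could be used as NSLog(@"%@", RunLoopActivityName(activity)) inside an observer callback.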

kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting are the two states used to judge whether the runloop is spending too long processing events and therefore lagging. Here is the code:

    static void runLoopObserverCallBack(CFRunLoopObserverRef observer, CFRunLoopActivity activity, void *info)
    {
        PingConfig *object = (__bridge PingConfig *)info;
        // Record the current runloop state
        object->activity = activity;
        // Signal the watcher thread that the state has changed
        dispatch_semaphore_t semaphore = object->semaphore;
        dispatch_semaphore_signal(semaphore);
    }

This is the callback function that listens for runloop state changes.

    - (void)registerObserver
    {
        PingConfig *config = [PingConfig new];
        // Create the semaphore and store it in the config
        dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
        config->semaphore = semaphore;
        CFRunLoopObserverContext context = {0, (__bridge void *)config, NULL, NULL};
        CFRunLoopObserverRef observer = CFRunLoopObserverCreate(kCFAllocatorDefault,
                                                                kCFRunLoopAllActivities,
                                                                YES,
                                                                0,
                                                                &runLoopObserverCallBack,
                                                                &context);
        CFRunLoopAddObserver(CFRunLoopGetMain(), observer, kCFRunLoopCommonModes);

        __block uint8_t timeoutCount = 0;

        // Watch the main runloop from a background thread
        dispatch_async(dispatch_get_global_queue(0, 0), ^{
            while (YES) {
                // A non-zero result means the 50 ms wait timed out
                long st = dispatch_semaphore_wait(semaphore, dispatch_time(DISPATCH_TIME_NOW, 50 * NSEC_PER_MSEC));
                if (st != 0) {
                    // NSLog(@"In loop --%ld", config->activity);
                    if (config->activity == kCFRunLoopBeforeSources || config->activity == kCFRunLoopAfterWaiting) {
                        if (++timeoutCount < 5) {
                            continue;
                        } else {
                            NSLog(@"Stuck.");
                        }
                    }
                }
                timeoutCount = 0;
            }
        });
    }

Let me walk through this code:

PingConfig is just a custom class I wrote to hold the runloop state and the semaphore. Its structure is as follows:
```
@interface PingConfig : NSObject
{
 @public
 CFRunLoopActivity activity;
 dispatch_semaphore_t semaphore;
}
@end
```
Well, that’s all there is to it.
When the app starts, I call the registerObserver method. It first creates a PingConfig instance to record information, then creates a semaphore and stores it in that instance (purely for convenience).
Next I create an observer on the main thread’s runloop; its callback fires whenever the main runloop changes state.
Then I start a background thread that runs a while loop. At the top of each iteration it waits on the semaphore with a 50 ms timeout, blocking until a signal arrives. The wait returns zero if the semaphore was signalled in time and a non-zero value if it timed out.
If the runloop state switches normally, the callback is called; there we record the current state in the PingConfig instance and signal the semaphore. The wait returns 0, the if statement is skipped, and timeoutCount is reset to 0.
If the main thread stalls, the loop waits on the semaphore again, but the callback never fires, so the wait times out and returns a non-zero value. Inside the if statement we then check whether the state is kCFRunLoopBeforeSources or kCFRunLoopAfterWaiting, and if it is, we increment timeoutCount.
Five consecutive timeouts with no state change (about 250 ms in total) mean the runloop is busy with some heavy event and can neither update its state nor go to sleep. The semaphore wait in the while loop keeps timing out, and after the fifth time we conclude that the main thread has stalled. This is the point at which to upload the stack information; the sample code only logs.
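The timeout check in the loop depends on the return value of dispatch_semaphore_wait: zero when the semaphore was signalled in time, non-zero when the wait timed out. That convention can be checked in isolation. A minimal sketch, assuming two hypothetical helpers (WaitMilliseconds and DemoSemaphoreTimeout) that are not part of the original post:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: wait on a semaphore for at most `ms` milliseconds.
// Returns 0 if the semaphore was signalled in time, non-zero on timeout.
static long WaitMilliseconds(dispatch_semaphore_t semaphore, int64_t ms) {
    dispatch_time_t deadline = dispatch_time(DISPATCH_TIME_NOW, ms * (int64_t)NSEC_PER_MSEC);
    return dispatch_semaphore_wait(semaphore, deadline);
}

// Hypothetical demo: one wait that times out, one that succeeds.
static void DemoSemaphoreTimeout(void) {
    dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);

    // Nobody has signalled yet, so this wait gives up after 50 ms.
    long first = WaitMilliseconds(semaphore, 50);

    // A signal is already pending, so this wait returns immediately.
    dispatch_semaphore_signal(semaphore);
    long second = WaitMilliseconds(semaphore, 50);

    NSLog(@"first timed out: %d, second timed out: %d", first != 0, second != 0);
}
```

In the detection loop, the first case corresponds to a main thread that did not change runloop state within 50 ms, and the second to a healthy runloop whose observer callback signalled in time.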

In testing, this scheme does detect main-thread stalls, and I have to admire the people who came up with it. However, one test showed that it cannot detect a stall that happens when the main thread is blocked before the new screen has been fully displayed. For example, I put the following code in controller B:

    dispatch_semaphore_t t = dispatch_semaphore_create(0);
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(3.0 * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
        NSLog(@"--");
        dispatch_semaphore_signal(t);
    });
    dispatch_semaphore_wait(t, DISPATCH_TIME_FOREVER);

This code blocks the main thread permanently: the block that would signal the semaphore is queued on the main queue, but the main thread is already waiting on that semaphore, so the block never runs. If we put this code in controller B’s viewDidLoad method (viewWillAppear: behaves the same), then pushing to controller B leaves the app completely frozen on the previous screen. The scheme above does not detect it, and CPU and memory usage look normal.

After the runloop finishes processing source0 or source1 (the screen transition runs its methods in this phase; whether it is source0 or source1 does not matter here), it immediately enters the kCFRunLoopBeforeWaiting state. The blocked thread then leaves the runloop stuck in that state, unable to switch, so the condition in the detection code never matches. But an idle app and a blocked app both leave the runloop in kCFRunLoopBeforeWaiting, so we can’t fix this by simply adding that state to the check: we could not tell whether the runloop is truly idle or blocked. I also couldn’t find a property that exposes a thread’s blocked state; if you know of one, you can use it for this check. In the meantime, here is my detection scheme that works without such a property:

My detection scheme

Code first:

    dispatch_queue_t serialQueue = dispatch_queue_create("serial", DISPATCH_QUEUE_SERIAL);
    self.timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, serialQueue);
    // Fire every 0.25 seconds
    dispatch_source_set_timer(self.timer, DISPATCH_TIME_NOW, 0.25 * NSEC_PER_SEC, 0);

    __block int8_t chokeCount = 0;
    dispatch_semaphore_t t2 = dispatch_semaphore_create(0);
    dispatch_source_set_event_handler(self.timer, ^{
        if (config->activity == kCFRunLoopBeforeWaiting) {
            // YES while the last block sent to the main queue has been executed
            static BOOL ex = YES;
            if (ex == NO) {
                chokeCount++;
                if (chokeCount > 40) {
                    // Still blocked after more than 40 checks (about 10 s): stop checking
                    NSLog(@"Almost stuck.");
                    dispatch_suspend(self.timer);
                    return;
                }
                NSLog(@"Stuck.");
                return;
            }
            // Ping the main queue; the block runs only if the main thread is not blocked
            dispatch_async(dispatch_get_main_queue(), ^{
                ex = YES;
                dispatch_semaphore_signal(t2);
            });
            // dispatch_semaphore_wait returns a long; storing it in a BOOL could truncate it
            long su = dispatch_semaphore_wait(t2, dispatch_time(DISPATCH_TIME_NOW, 50 * NSEC_PER_MSEC));
            if (su != 0) {
                ex = NO;
            }
        }
    });
    dispatch_resume(self.timer);

Here is how my scheme works:

Create a serial queue and a timer that fires on it. I set the timer interval to 0.25 seconds; the timer is used to measure how long the main queue has been stuck.
Outside the timer handler I also create a semaphore for synchronization; how it is used becomes clear below. Inside the timer callback I declare a static variable that records whether the block sent to the main queue has been executed.
We first check whether the current runloop state is kCFRunLoopBeforeWaiting, since this scheme exists to cover the gap in the previous one. If no stall has been recorded so far, we dispatch a block to the main queue to see whether it gets executed. If it does, the main thread is not blocked; if the main thread is blocked, the block will certainly not run.
If the semaphore wait exceeds 50 ms and the block sent to the main thread still has not run, something is blocking it: the wait returns a non-zero value and we set ex to NO, so that the next timer callback reports the stall.
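The same ping idea can also be written without a semaphore. The following is only a sketch of a variant, not the scheme described above: it swaps the semaphore for an atomic flag, pings on every tick regardless of runloop state, and therefore reports a stall only when a ping is still unanswered one full timer interval (250 ms) later rather than after 50 ms. The class name MainThreadPinger and all of its members are hypothetical.

```objc
#import <Foundation/Foundation.h>
#import <stdatomic.h>

// Hypothetical helper class (not from the original post).
@interface MainThreadPinger : NSObject
- (void)start;
- (void)stop;
@end

@implementation MainThreadPinger {
    dispatch_source_t _timer;
    atomic_bool _pingPending; // true while a ping is waiting to run on the main queue
}

- (void)start {
    if (_timer) { return; }
    dispatch_queue_t queue = dispatch_queue_create("pinger", DISPATCH_QUEUE_SERIAL);
    _timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, queue);
    dispatch_source_set_timer(_timer, DISPATCH_TIME_NOW,
                              250 * NSEC_PER_MSEC, 10 * NSEC_PER_MSEC);

    __weak typeof(self) weakSelf = self;   // avoid a retain cycle with the timer
    __block NSUInteger missed = 0;         // only touched on the serial queue
    dispatch_source_set_event_handler(_timer, ^{
        __strong typeof(weakSelf) strongSelf = weakSelf;
        if (!strongSelf) { return; }

        if (atomic_load(&strongSelf->_pingPending)) {
            // The previous ping has not run yet: the main thread is busy or blocked.
            missed++;
            NSLog(@"Main thread has not answered for %lu checks", (unsigned long)missed);
            return;
        }

        // The previous ping was answered: reset the counter and send a new one.
        missed = 0;
        atomic_store(&strongSelf->_pingPending, true);
        dispatch_async(dispatch_get_main_queue(), ^{
            atomic_store(&strongSelf->_pingPending, false);
        });
    });
    dispatch_resume(_timer);
}

- (void)stop {
    if (_timer) {
        dispatch_source_cancel(_timer);
        _timer = nil;
    }
}

@end
```

Because the flag is cleared only when the pinged block actually runs, a late reply cannot leave a stale signal behind, and the missed counter resets as soon as the main thread answers again. The trade-off is the coarser threshold mentioned above.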

The sample code here is only a demonstration; the details can be optimized on top of the same principle. In my project it currently detects the app freeze caused by the blocking case described above. If you find a better detection scheme, please let me know, thank you!
