Preface

I recently read ByteDance’s new article on systematically managing iOS stability issues. It mentions that when the app freezes, deadlock detection can be used to determine whether the freeze is caused by a deadlock.

It sounds as simple as opening a refrigerator and putting an elephant in it.

In the spirit of learning, I implemented a simple version and recorded the problems I ran into along the way. My knowledge is limited, so treat the code as a reference only.

The project address

Stutter detection

First, let’s review the mainstream stutter detection schemes.

  1. Pinging the main thread: a child thread periodically posts a task to the main thread that sets a bool flag. If the flag posted last time still hasn't been set, the main thread can be judged to be stuck. For an implementation, see KSCrashMonitor_Deadlock.m in KSCrash.

  2. Monitoring the main thread's RunLoop: for implementations, see SMLagMonitor.m in GCDFetchFeed, Matrix, and so on.
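The ping scheme in item 1 can be modeled as a tiny state machine. The sketch below is a single-threaded simulation, not KSCrash's implementation: `ping_state`, `watchdog_tick` and `main_thread_service` are hypothetical names, and the step that actually posts a task to the main thread (for example via a main-queue dispatch) is left as a comment because it is platform specific. In a real monitor, `watchdog_tick` would run on a background thread once per interval and the flag would need atomic access.

```c
#include <stdbool.h>

/* Hypothetical watchdog state shared by the watchdog and the main thread.
   A real implementation would access these fields atomically. */
typedef struct {
    bool pong;          /* set by the main thread when it runs the posted ping */
    int missed_rounds;  /* consecutive intervals without an answer */
} ping_state;

/* Runs on the main thread when the posted ping task finally executes. */
static void main_thread_service(ping_state *s) {
    s->pong = true;
}

/* Runs on the watchdog thread once per interval. Returns true when the ping
   posted in the previous interval is still unanswered, i.e. the main thread
   looks stuck. Otherwise it re-arms the flag and posts a new ping. */
static bool watchdog_tick(ping_state *s) {
    if (!s->pong) {
        s->missed_rounds++;
        return true;
    }
    s->missed_rounds = 0;
    s->pong = false;
    /* post_to_main_thread(main_thread_service, s);  (platform specific, hypothetical) */
    return false;
}
```

Reporting only after several consecutive missed rounds (`missed_rounds`) is one way to avoid flagging brief hiccups; the threshold is a tuning choice.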

The goal

Suppose a stutter has already been detected; now I want to check whether it is caused by a deadlock.

Let’s do it step by step.

Some of these low-level thread APIs are rarely used in everyday work, so I relied on how several excellent open-source projects use them. I list those references alongside the code for anyone who wants to dig deeper.

  1. Get the thread list
    // Get the list of threads
    // See KSMachineContext.c ksmc_suspendEnvironment
    // See SMCPUMonitor.m updateCPU
    thread_act_array_t threads = NULL;
    mach_msg_type_number_t numThreads = 0;
    kern_return_t kr;
    const task_t thisTask = mach_task_self();
    if ((kr = task_threads(thisTask, &threads, &numThreads)) != KERN_SUCCESS) {
        NSLog(@"task_threads: %s", mach_error_string(kr));
        return;
    }
  2. Get thread information
// Returns the current thread's port without leaking a send right
// See https://blog.csdn.net/killer1989/article/details/106674973
uintptr_t thread_self(void) {
    thread_t port = mach_thread_self();
    mach_port_deallocate(mach_task_self(), port);
    return port;
}

    // Get thread information
    // See KSThread.c ksthread_getQueueName
    // See SMCallStack.m smStackOfThread
    const thread_t thisThread = (thread_t)thread_self();
    for (mach_msg_type_number_t i = 0; i < numThreads; i++) {
        if (threads[i] == thisThread) {
            continue;   // skip the detecting thread itself
        }
        // Basic thread information: name, CPU usage, run state, flags
        thread_extended_info_data_t threadInfoData;
        mach_msg_type_number_t threadInfoCount = THREAD_EXTENDED_INFO_COUNT;
        if (thread_info((thread_act_t)threads[i], THREAD_EXTENDED_INFO, (thread_info_t)&threadInfoData, &threadInfoCount) == KERN_SUCCESS) {
            integer_t cpu_usage = threadInfoData.pth_cpu_usage;
            integer_t run_state = threadInfoData.pth_run_state;
            integer_t flags = threadInfoData.pth_flags;
            char *pth_name = threadInfoData.pth_name;
            // ... the checks from the following steps go here
        }
    }

    // task_threads() returns a send right per thread plus an out-of-line array; release both when done
    for (mach_msg_type_number_t i = 0; i < numThreads; i++) {
        mach_port_deallocate(thisTask, threads[i]);
    }
    vm_deallocate(thisTask, (vm_address_t)threads, numThreads * sizeof(thread_t));

From here, we can read each thread's CPU usage, run state, flags, and name. Note that thread_info can also return other kinds of thread information: pass a different flavor constant (like THREAD_EXTENDED_INFO), its matching count (like THREAD_EXTENDED_INFO_COUNT), and the corresponding structure. We’ll use this later to get the thread ID. I chose thread_extended_info_data_t here because it includes the thread name, so no separate API call is needed to fetch it.

  3. Infinite-loop check

Following the ByteDance article, let’s start with a simple heuristic for spotting an infinite loop.

// If the main thread has high CPU usage and is in the running state, suspect a CPU-intensive task such as an infinite loop.
// pth_run_state holds a single TH_STATE_* value, so compare with == rather than &. See [SMCPUMonitor updateCPU].
if (run_state == TH_STATE_RUNNING && cpu_usage > 800) {
    NSLog(@"Suspected infinite loop: %@", threadDesc);
}

  4. Suspected deadlock

// There are two main approaches. The first: if the main thread's CPU usage is 0, it is in the waiting state, and it has been swapped out, suspect a deadlock.
if (run_state == TH_STATE_WAITING && (flags & TH_FLAGS_SWAPPED) && cpu_usage == 0) {
    // Suspected deadlock
}
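One detail worth checking in the two heuristics above: `pth_run_state` holds a single `TH_STATE_*` value rather than a bit mask. In `<mach/thread_info.h>`, `TH_STATE_RUNNING` is 1 and `TH_STATE_WAITING` is 3, so a test like `run_state & TH_STATE_RUNNING` is also true for a waiting thread; comparing with `==` avoids that. Also, `pth_cpu_usage` is scaled by `TH_USAGE_SCALE` (1000), so the threshold 800 means 80% of one core. The sketch below restates both heuristics as a pure C function so it can be unit-tested off-device. The constants are mirrored from the Mach header, and `thread_verdict` and `classify_thread` are hypothetical names.

```c
#include <stdint.h>

/* Values mirrored from <mach/thread_info.h> so the sketch builds anywhere. */
#ifndef TH_STATE_RUNNING
#define TH_STATE_RUNNING 1
#endif
#ifndef TH_STATE_WAITING
#define TH_STATE_WAITING 3
#endif
#ifndef TH_FLAGS_SWAPPED
#define TH_FLAGS_SWAPPED 0x1
#endif
#ifndef TH_USAGE_SCALE
#define TH_USAGE_SCALE 1000
#endif

typedef enum {
    VERDICT_OK,
    VERDICT_SUSPECT_LOOP,      /* running and burning CPU */
    VERDICT_SUSPECT_DEADLOCK,  /* waiting, swapped out, no CPU */
} thread_verdict;

static thread_verdict classify_thread(int32_t run_state, int32_t flags, int32_t cpu_usage) {
    /* run_state is one TH_STATE_* value, so use == instead of a bit test */
    if (run_state == TH_STATE_RUNNING && cpu_usage > TH_USAGE_SCALE * 8 / 10)
        return VERDICT_SUSPECT_LOOP;
    if (run_state == TH_STATE_WAITING && (flags & TH_FLAGS_SWAPPED) && cpu_usage == 0)
        return VERDICT_SUSPECT_DEADLOCK;
    return VERDICT_OK;
}
```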

  5. Get each thread's current instruction

// The second approach: when the app is stuck, take the state of all threads, keep those in the waiting state,
// read each one's current PC address (the function it is executing), and symbolize it to see whether it is a lock-wait function.
// See SMCallStack.m smStackOfThread
// See KSStackCursor
_STRUCT_MCONTEXT machineContext;
// Fill in the register state (__ss) of machineContext via thread_get_state
mach_msg_type_number_t state_count = smThreadStateCountByCPU();
kern_return_t kr = thread_get_state(threads[i], smThreadStateByCPU(), (thread_state_t)&machineContext.__ss, &state_count);
if (kr != KERN_SUCCESS) {
    NSLog(@"Fail get thread: %u", threads[i]);
    continue;
}
// Get the current instruction address from the instruction pointer
const uintptr_t instructionAddress = smMachInstructionPointerByCPU(&machineContext);
Dl_info info;
if (dladdr((void *)instructionAddress, &info) == 0 || info.dli_sname == NULL) {
    continue;   // the address could not be symbolized
}
NSLog(@"Current instruction: %s %s", info.dli_sname, info.dli_fname);

For reading the current thread's PC register, walking the stack frames, and symbolizing instruction addresses, see KSCrash and SMCallStack. Both take the same approach: SMCallStack.m has more comments and is easier to follow, while KSCrash walks back frame by frame with a cursor. One question I had: why do both implement their own symbolization instead of using the system's dladdr? Is it because of App Store review?

  6. Check the instruction
// __psynch_mutexwait lives in /usr/lib/system/libsystem_kernel.dylib
if (info.dli_sname != NULL && strcmp(info.dli_sname, "__psynch_mutexwait") == 0) {
    // This thread is waiting for a pthread mutex
}

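Since each kind of lock blocks in a different wait function, the single `strcmp` generalizes to a lookup table. A minimal sketch: `__psynch_mutexwait` is the case verified above, while the other names are copied verbatim from the TODO list at the end of this article and have not been verified here. `lock_kind`, `lock_wait_table` and `lock_kind_for_symbol` are hypothetical names. Since `dladdr` can leave `dli_sname` NULL when no symbol matches, the lookup guards against that.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    LOCK_KIND_NONE = 0,
    LOCK_KIND_PTHREAD_MUTEX,
    LOCK_KIND_RW_READ,
    LOCK_KIND_RW_WRITE,
    LOCK_KIND_UNFAIR,
    LOCK_KIND_GCD,
} lock_kind;

/* Symbol names as reported in Dl_info.dli_sname (assumed, see lead-in). */
static const struct { const char *symbol; lock_kind kind; } lock_wait_table[] = {
    { "__psynch_mutexwait", LOCK_KIND_PTHREAD_MUTEX },
    { "__psynch_rw_rdlock", LOCK_KIND_RW_READ },
    { "__psynch_rw_wrlock", LOCK_KIND_RW_WRITE },
    { "__ulock_wait",       LOCK_KIND_UNFAIR },
    { "_kevent_id",         LOCK_KIND_GCD },
};

/* Maps a symbolized PC to the kind of lock the thread is waiting on. */
static lock_kind lock_kind_for_symbol(const char *sname) {
    if (sname == NULL)
        return LOCK_KIND_NONE;  /* dladdr found no symbol */
    for (size_t i = 0; i < sizeof lock_wait_table / sizeof lock_wait_table[0]; i++) {
        if (strcmp(sname, lock_wait_table[i].symbol) == 0)
            return lock_wait_table[i].kind;
    }
    return LOCK_KIND_NONE;
}
```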

Each lock-wait function takes arguments that describe the lock currently being waited on.

How do we know what arguments __psynch_mutexwait takes? The function appears to be defined in libsystem_kernel.dylib, which doesn't look like an open-source library.

I searched Apple's other open-source libraries and found its declaration there. (List of Apple open-source libraries)

// extern uint32_t __psynch_mutexwait(pthread_mutex_t *mutex, uint32_t mgen, uint32_t ugen, uint64_t tid, uint32_t flags);

So the mutex is the first argument. Under the C calling convention on arm64, the first integer or pointer argument is passed in register x0 (r0 on 32-bit ARM, rdi on x86_64). We can read it from the thread state the same way we read the PC register above.

// Read the first-argument register from the captured thread state
uintptr_t firstParamRegister(mcontext_t const machineContext) {
#if defined(__arm64__)
    return machineContext->__ss.__x[0];
#elif defined(__arm__)
    return machineContext->__ss.__r[0];   // the 32-bit ARM thread state names it __r, not __x
#elif defined(__x86_64__)
    return machineContext->__ss.__rdi;
#else
    return 0;   // unsupported architecture
#endif
}

uintptr_t firstParam = firstParamRegister(&machineContext);

The register holds a pointer to the mutex being waited on, so we can read it and cast it to the corresponding structure.

The pthread_mutex_t definition looks like this:

#ifndef _PTHREAD_MUTEX_T
#define _PTHREAD_MUTEX_T
#include <sys/_pthread/_pthread_types.h> /* __darwin_pthread_mutex_t */
typedef __darwin_pthread_mutex_t pthread_mutex_t;
#endif /*_PTHREAD_MUTEX_T */

Digging through the libpthread source, types_internal.h shows that pthread_mutex_t is backed by struct pthread_mutex_s. So let's copy the definition of pthread_mutex_s.

typedef os_unfair_lock _pthread_lock;

struct pthread_mutex_options_s {
    uint32_t protocol:2, type:2, pshared:2, policy:3, hold:2, misalign:1,
             notify:1, mutex:1, ulock:1, unused:1, lock_count:16;
};

typedef struct _pthread_mutex_ulock_s {
    uint32_t uval;
} *_pthread_mutex_ulock_t;

struct pthread_mutex_s {
    long sig;
    _pthread_lock lock;
    union {
        uint32_t value;
        struct pthread_mutex_options_s options;
    } mtxopts;
    int16_t prioceiling;
    int16_t priority;
#if defined(__LP64__)
    uint32_t _pad;
#endif
    union {
        struct {
            uint32_t m_tid[2]; // thread id of thread that has mutex locked
            uint32_t m_seq[2]; // mutex sequence id
            uint32_t m_mis[2]; // for misaligned locks m_tid/m_seq will span into here
        } psynch;
        struct _pthread_mutex_ulock_s ulock;
    };
#if defined(__LP64__)
    uint32_t _reserved[4];
#else
    uint32_t _reserved[1];
#endif
};

// Force-cast the first argument to the mutex structure
struct pthread_mutex_s *mutex = (struct pthread_mutex_s *)firstParam;
uint32_t *tid = mutex->psynch.m_tid;
uint64_t hold_lock_thread_id = *tid;
NSLog(@"Lock is held by thread: %llu", hold_lock_thread_id);

Looking at the structure definition, the union contains exactly what I want: which thread is holding the lock.

uint32_t m_tid[2]; // thread id of thread that has mutex locked

Is this value really the thread ID?



I wrote two threads, A and B: A holds lock A and B holds lock B, and each then requests the other's lock. The thread IDs can be seen in the figure.
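The A/B scenario in the figure can be reproduced with plain POSIX threads. The sketch below is portable C, not the author's test code: so that it terminates, each thread uses `pthread_mutex_trylock` for its second lock and records the `EBUSY` it gets back, whereas a real deadlock would call `pthread_mutex_lock` there and block forever in the lock-wait function. The `rendezvous` helper is a hypothetical stand-in for `pthread_barrier_t`, which macOS does not provide; it makes sure both threads hold their first lock before either tries the second.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* Two-party rendezvous; counters are never reset, so run the demo once. */
typedef struct { pthread_mutex_t mu; pthread_cond_t cv; int arrived; } rendezvous_t;
static rendezvous_t phase_hold = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 };
static rendezvous_t phase_done = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 };

static void rendezvous(rendezvous_t *r) {
    pthread_mutex_lock(&r->mu);
    r->arrived++;
    pthread_cond_broadcast(&r->cv);
    while (r->arrived < 2)
        pthread_cond_wait(&r->cv, &r->mu);
    pthread_mutex_unlock(&r->mu);
}

typedef struct { pthread_mutex_t *first, *second; int second_result; } worker_arg;

static void *worker(void *p) {
    worker_arg *arg = p;
    pthread_mutex_lock(arg->first);       /* A holds lock A, B holds lock B */
    rendezvous(&phase_hold);              /* both first locks are now held */
    arg->second_result = pthread_mutex_trylock(arg->second);  /* peer owns it */
    if (arg->second_result == 0)
        pthread_mutex_unlock(arg->second);
    rendezvous(&phase_done);              /* keep holding until the peer has tried */
    pthread_mutex_unlock(arg->first);
    return NULL;
}

/* Returns 1 if both second-lock attempts were refused, i.e. blocking
   pthread_mutex_lock calls would have deadlocked. */
static int run_ab_demo(void) {
    worker_arg a = { &lock_a, &lock_b, -1 };
    worker_arg b = { &lock_b, &lock_a, -1 };
    pthread_t ta, tb;
    pthread_create(&ta, NULL, worker, &a);
    pthread_create(&tb, NULL, worker, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return a.second_result == EBUSY && b.second_result == EBUSY;
}
```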

So how do we get a thread's ID? Call thread_info again, this time with the THREAD_IDENTIFIER_INFO flavor and its structure.

struct thread_identifier_info {
    uint64_t thread_id;       /* system-wide unique 64-bit thread id */
    uint64_t thread_handle;   /* handle to be used by libproc */
    uint64_t dispatch_qaddr;  /* libdispatch queue address */
};

thread_identifier_info_data_t threadIDData;
mach_msg_type_number_t threadIDDataCount = THREAD_IDENTIFIER_INFO_COUNT;
if (thread_info((thread_act_t)threads[i], THREAD_IDENTIFIER_INFO, (thread_info_t)&threadIDData, &threadIDDataCount) == KERN_SUCCESS) {
    uint64_t thread_id = threadIDData.thread_id;
}

This structure gives us the thread ID and the dispatch queue address. When I fetched thread information earlier, many thread names were empty and I couldn't even tell which thread was the main thread (other APIs can do that), so the queue name is useful as supplementary information here.

char queueName[128];
bool getQueueNameSuccess = ksthread_getQueueName((thread_t)threads[i], queueName, sizeof(queueName));

To get the thread's queue name, I used KSCrash's ksthread_getQueueName directly: partly because I couldn't cast the pointer to the queue structure under ARC, and partly because that code already performs careful address-safety checks, so I used it off the shelf instead of writing my own. Many thanks to the authors of that excellent code.

At this point, we know:

  • Who is waiting for a lock
  • Which thread holds the lock being waited for
  7. Determine whether a deadlock has formed
NSMutableDictionary<NSNumber *, NSString *> *threadDescDic = [NSMutableDictionary dictionary];
NSMutableDictionary<NSNumber *, NSMutableArray<NSNumber *> *> *threadWaitDic = [NSMutableDictionary dictionary];

// Inside the per-thread loop: describe the thread
NSString *threadDesc = [NSString stringWithFormat:@"[%llu %s %s] [run_state: %d] [flags: %d] [cpu_usage: %d]", thread_id, pth_name, getQueueNameSuccess ? queueName : "", run_state, flags, cpu_usage];
threadDescDic[@(thread_id)] = threadDesc;

// Save: thread_id is waiting for a lock held by hold_lock_thread_id
NSMutableArray *array = threadWaitDic[@(hold_lock_thread_id)];
if (!array) {
    array = [NSMutableArray array];
}
[array addObject:@(thread_id)];
threadWaitDic[@(hold_lock_thread_id)] = array;

Here I use two dictionaries: one maps each thread ID to its description, and the other maps each lock holder's thread ID to the IDs of the threads waiting on it.

/// Determine whether a deadlock exists
/// @param threadDescDic Thread descriptions, keyed by thread ID
/// @param threadWaitDic Wait information: lock holder's thread ID -> IDs of threads waiting on it
+ (void)checkIfIsCircleWithThreadDescDic:(NSMutableDictionary<NSNumber *,NSString *> *)threadDescDic threadWaitDic:(NSMutableDictionary<NSNumber *,NSMutableArray<NSNumber *> *> *)threadWaitDic {
    __block BOOL hasCircle = NO;
    NSMutableDictionary<NSNumber *,NSNumber *> *visited = [NSMutableDictionary dictionary];
    NSMutableArray *path = [NSMutableArray array];
    [threadWaitDic enumerateKeysAndObjectsUsingBlock:^(NSNumber * _Nonnull hold_lock_thread_id, NSMutableArray<NSNumber *> * _Nonnull waitArray, BOOL * _Nonnull stop) {
        [self checkThreadID:hold_lock_thread_id withThreadDescDic:threadDescDic threadWaitDic:threadWaitDic visited:visited path:path hasCircle:&hasCircle];
        if (hasCircle) {
            *stop = YES;
        }
    }];
    if (hasCircle) {
        NSLog(@"Deadlock found as follows:");
        for (NSNumber *threadID in path) {
            NSLog(@"%@", threadDescDic[threadID]);
        }
    } else {
        NSLog(@"No deadlocks found");
    }
}

+ (void)checkThreadID:(NSNumber *)threadID withThreadDescDic:(NSMutableDictionary<NSNumber *,NSString *> *)threadDescDic threadWaitDic:(NSMutableDictionary<NSNumber *,NSMutableArray<NSNumber *> *> *)threadWaitDic visited:(NSMutableDictionary<NSNumber *,NSNumber *> *)visited path:(NSMutableArray *)path hasCircle:(BOOL *)hasCircle {
    if (*hasCircle) {
        return;
    }
    if (visited[threadID]) {
        // threadID is already on the current DFS path: trim the path in place so it holds only the cycle
        *hasCircle = YES;
        NSUInteger index = [path indexOfObject:threadID];
        [path removeObjectsInRange:NSMakeRange(0, index)];
        return;
    }

    visited[threadID] = @1;
    [path addObject:threadID];
    for (NSNumber *next in threadWaitDic[threadID]) {
        [self checkThreadID:next withThreadDescDic:threadDescDic threadWaitDic:threadWaitDic visited:visited path:path hasCircle:hasCircle];
        if (*hasCircle) {
            return;   // keep the cycle in path for the caller to print
        }
    }
    // Backtrack: threadID is no longer on the current path
    [visited removeObjectForKey:threadID];
    [path removeLastObject];
}

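Deadlock detection thus reduces to finding a cycle in a directed graph: an edge runs from each lock holder to every thread waiting on it, and a cycle means every thread in it is waiting, directly or indirectly, on itself.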
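The same check can be written without Foundation. This is a minimal C sketch, not the article's implementation. It assumes each blocked thread waits on exactly one lock, so every waiter has a single outgoing edge "waiter → owner" (the reverse of the holder → waiters dictionary above). Following those edges from each waiter either reaches a thread that is not blocked or revisits a thread already on the walk, and the revisited part is the deadlock cycle. `wait_edge`, `owner_of`, `find_deadlock`, the `MAX_CHAIN` bound and the use of 0 as "not waiting" are all hypothetical choices.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CHAIN 64  /* hypothetical bound on the length of a wait chain */

typedef struct {
    uint64_t waiter;  /* thread blocked in a lock-wait function */
    uint64_t owner;   /* thread holding the lock it waits for */
} wait_edge;

/* Returns the thread `tid` waits for, or 0 if it is not blocked.
   Assumes 0 is never a valid thread id. */
static uint64_t owner_of(const wait_edge *edges, size_t n, uint64_t tid) {
    for (size_t i = 0; i < n; i++)
        if (edges[i].waiter == tid)
            return edges[i].owner;
    return 0;
}

/* Walks waiter -> owner edges from every waiter. When a walk revisits a thread,
   the thread ids from that point on form a cycle: they are copied into `cycle`
   and the cycle length is returned. Returns 0 if no cycle is found. */
static size_t find_deadlock(const wait_edge *edges, size_t n, uint64_t cycle[MAX_CHAIN]) {
    for (size_t start = 0; start < n; start++) {
        uint64_t chain[MAX_CHAIN];
        size_t len = 0;
        uint64_t tid = edges[start].waiter;
        while (tid != 0 && len < MAX_CHAIN) {
            for (size_t k = 0; k < len; k++) {
                if (chain[k] == tid) {
                    size_t cycle_len = len - k;
                    for (size_t j = 0; j < cycle_len; j++)
                        cycle[j] = chain[k + j];
                    return cycle_len;
                }
            }
            chain[len++] = tid;
            tid = owner_of(edges, n, tid);
        }
    }
    return 0;
}
```

In the article's setup, `waiter` would come from `THREAD_IDENTIFIER_INFO` and `owner` from the mutex's `m_tid`. Locks with several owners (such as a reader-writer lock held for reading) break the one-edge-per-waiter assumption and would need the general DFS above instead.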


// Other lock types (TODO)
// __psynch_rw_rdlock   read-write lock (read)
// __psynch_rw_wrlock   read-write lock (write)
// __ulock_wait         UnfairLock (os_unfair_lock)
// _kevent_id           GCD

Due to limited time, I didn't implement detection for the other lock types. I expect it to be much the same 😅, but it requires digging into the libdispatch source and so on.

I found the whole implementation process interesting.

Finally, thanks to Apple and to the experts who open-sourced their code.