Background:

In some performance-monitoring tools, when the App's main thread is detected to be stuck, a child thread can capture the method call stacks of all threads at that moment (to preserve the scene of the lag), and the stack information can be uploaded to our server at an appropriate time (for example, on WiFi with a good connection). The server filters and analyzes the stack information and passes the results back to the client-side developers for optimization. This lets us find problems in the production environment promptly and improve the user experience.

It also helps us keep improving the quality and execution efficiency of our code.

(A virtuous development cycle.)


So how do we capture the method call stack when the App gets stuck, and what does the stack information look like? This article uses a concrete demo to show how the stack capture works.

Before that, I’d like to thank my idol bestswifter for his blog, “the stuff about getting call stacks for any thread”, which was so inspiring and helpful.

Now for today's topics:

  1. What is the call stack?
  2. How do we capture a thread's current call stack?
  3. How do we symbolicate the addresses?
  4. Some special call stacks
  5. (Supplement) How do we detect App lag?

What is the call stack?

Call stack:

In computer science, a call stack is a stack that stores information about the running subroutines. (Wikipedia)

While a program runs, one function commonly calls another. For example, on some thread, func A is called, and func B is called while func A is executing.

So what does the machine have to do under the hood?

  1. Transfer control: suspend func A and start executing func B; when func B finishes, return to func A and continue executing it.
  2. Transfer data: func A must be able to pass arguments to func B, and func B must be able to hand its return value, if any, back to func A.
  3. Allocate and free memory: when func B starts executing, memory is allocated for the local variables it needs; when func B finishes, that memory is released.

For example, I declare two functions: foo and bar. At the same time, the function bar is called in function foo.

- (void)foo {
    [self bar];
}

- (void)bar {
    NSLog(@"QiShare");
}

On the simulator (x86_64), this compiles to the following assembly:

QiStackFrameLogger`-[ViewController foo]:
    0x105a1f0d0 <+0>:  pushq  %rbp
    0x105a1f0d1 <+1>:  movq   %rsp, %rbp
    0x105a1f0d4 <+4>:  subq   $0x10, %rsp
    0x105a1f0d8 <+8>:  movq   %rdi, -0x8(%rbp)
    0x105a1f0dc <+12>: movq   %rsi, -0x10(%rbp)
    0x105a1f0e0 <+16>: movq   -0x8(%rbp), %rax
    0x105a1f0e4 <+20>: movq   0x64a5(%rip), %rsi        ; "bar"
    0x105a1f0eb <+27>: movq   %rax, %rdi
    0x105a1f0ee <+30>: callq  *0x3f1c(%rip)             ; (void *)0x00007fff50ad3400: objc_msgSend
->  0x105a1f0f4 <+36>: addq   $0x10, %rsp
    0x105a1f0f8 <+40>: popq   %rbp
    0x105a1f0f9 <+41>: retq   
QiStackFrameLogger`-[ViewController bar]:
    0x105a1f100 <+0>:  pushq  %rbp
    0x105a1f101 <+1>:  movq   %rsp, %rbp
    0x105a1f104 <+4>:  subq   $0x10, %rsp
    0x105a1f108 <+8>:  leaq   0x3f61(%rip), %rax        ; @"QiShare"
    0x105a1f10f <+15>: movq   %rdi, -0x8(%rbp)
    0x105a1f113 <+19>: movq   %rsi, -0x10(%rbp)
->  0x105a1f117 <+23>: movq   %rax, %rdi
    0x105a1f11a <+26>: movb   $0x0, %al
    0x105a1f11c <+28>: callq  0x105a20cd4               ; symbol stub for: NSLog
    0x105a1f121 <+33>: jmp    0x105a1f121               ; <+33> at ViewController.m:24:5

On my real device (arm64), it compiles to the following assembly:

QiStackFrameLogger`-[ViewController foo]:
    0x10443833c <+0>:  sub    sp, sp, #0x20 ; =0x20
    0x104438340 <+4>:  stp    x29, x30, [sp, #0x10]
    0x104438344 <+8>:  add    x29, sp, #0x10 ; =0x10
    0x104438348 <+12>: adrp   x8, 9
    0x10443834c <+16>: add    x8, x8, #0x5a8 ; =0x5a8
    0x104438350 <+20>: str    x0, [sp, #0x8]
    0x104438354 <+24>: str    x1, [sp]
    0x104438358 <+28>: ldr    x9, [sp, #0x8]
    0x10443835c <+32>: ldr    x1, [x8]
    0x104438360 <+36>: mov    x0, x9
    0x104438364 <+40>: bl     0x10443a0ac               ; symbol stub for: objc_msgSend
->  0x104438368 <+44>: ldp    x29, x30, [sp, #0x10]
    0x10443836c <+48>: add    sp, sp, #0x20 ; =0x20
    0x104438370 <+52>: ret    
QiStackFrameLogger`-[ViewController bar]:
    0x104438374 <+0>:  sub    sp, sp, #0x20 ; =0x20
    0x104438378 <+4>:  stp    x29, x30, [sp, #0x10]
    0x10443837c <+8>:  add    x29, sp, #0x10 ; =0x10
    0x104438380 <+12>: str    x0, [sp, #0x8]
    0x104438384 <+16>: str    x1, [sp]
->  0x104438388 <+20>: adrp   x0, 4
    0x10443838c <+24>: add    x0, x0, #0x58 ; =0x58
    0x104438390 <+28>: bl     0x104439fe0               ; symbol stub for: NSLog
    0x104438394 <+32>: b      0x104438394               ; <+32> at ViewController.m:24:5

When translated to a more intuitive diagram, it looks like this:

Today the vast majority of iOS devices are based on the arm64 architecture (iPhone 5s and all later devices). From ARM's official documentation we learn:

Register   Name                                       Role
sp         Stack pointer                              Holds the address of the top of the current function's stack.
x30        Link register                              Holds the function's return address.
x29        Frame pointer register                     Points to the current frame record, through which the caller's frame is found.
x19~x28    Callee-saved registers                     Must be preserved by the called function.
x18        Platform register                          Reserved for the platform; used by the operating system itself.
x17, x16   Intra-procedure-call temporary registers   Temporary registers.
x9~x15     Temporary registers                        Temporary registers, used to hold local variables.
x8         Indirect result location register         Holds the address of an indirectly returned result, used when the return value is too large for registers.
x0~x7      Parameter/result registers                 Hold arguments and return values.

Of these, the stack pointer (sp) and the frame pointer (fp) matter most. sp holds the address of the top of the current function's stack; fp points to the current function's frame record, which stores the caller's fp together with the return address.


How do we capture a thread's current call stack?

As we have just seen, fp leads us to the frame of the calling function. By following fp from one frame to the next, we can collect the addresses of every call on the current stack. (This is backtracking.)

Talk is cheap, show me the code.

  • Step 1:

    First, we declare a structure that stores the chained stack frame information (the caller's frame pointer plus the return address).
// Stack frame structure
typedef struct QiStackFrameEntry {
    const struct QiStackFrameEntry *const previouts; //!< Address of the previous stack frame
    const uintptr_t return_address;                  //!< Return address of the current stack frame
} QiStackFrameEntry;

Yeah, it’s a linked list.
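To see the linked-list shape in isolation, here is a minimal, portable sketch that walks a chain of frame records. It is not the author's implementation: `DemoFrame` and `demo_walk_frames` are hypothetical names, and the sketch dereferences the records directly, whereas code that inspects another thread's stack has to copy each record defensively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a frame record: {caller's record, return address}. */
typedef struct DemoFrame {
    const struct DemoFrame *previous; /* frame record of the caller, NULL at the outermost frame */
    uintptr_t return_address;         /* where this function returns to */
} DemoFrame;

/* Follows the `previous` links starting at `fp` and stores each return
 * address in `out`. Stops at a NULL link, at a zero return address, or
 * once `capacity` entries have been written. Returns the entry count. */
size_t demo_walk_frames(const DemoFrame *fp, uintptr_t *out, size_t capacity) {
    assert(out != NULL || capacity == 0);
    size_t count = 0;
    while (fp != NULL && count < capacity) {
        if (fp->return_address == 0) {
            break;
        }
        out[count++] = fp->return_address;
        fp = fp->previous;
    }
    return count;
}
```

Given three records chained as bar -> foo -> main, calling `demo_walk_frames(&bar_frame, out, 8)` fills `out` with the three return addresses, innermost first.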

  • Step 2:

    Extract the machine context from the thread.
_STRUCT_MCONTEXT machineContext; // Declare a context first
// Fill the context from the thread
if (![self qi_fillThreadStateFrom:thread intoMachineContext:&machineContext]) {
    return [NSString stringWithFormat:@"Fail to get machineContext from thread: %u\n", thread];
}

Concrete implementation:

/*!
 @brief Extracts the machine context from a thread.
 @param thread The thread to inspect
 @param machineContext The machine context to fill in
 @return YES if the thread state was read successfully
 */
+ (BOOL)qi_fillThreadStateFrom:(thread_t)thread intoMachineContext:(_STRUCT_MCONTEXT *)machineContext {
    mach_msg_type_number_t state_count = Qi_THREAD_STATE_COUNT;
    kern_return_t kr = thread_get_state(thread, Qi_THREAD_STATE, (thread_state_t)&machineContext->__ss, &state_count);
    return kr == KERN_SUCCESS;
}
  • Step 3:

    Read the stack frame pointer from machineContext.

    Follow fp back through the frames, saving every address in the backtraceBuffer array.

    Stop (break) once the outermost frame is reached and there is no caller left.
uintptr_t backtraceBuffer[50];
int i = 0;
NSMutableString *resultString = [[NSMutableString alloc] initWithFormat:@"Backtrace of Thread %u:\n", thread];

const uintptr_t instructionAddress = qi_mach_instructionAddress(&machineContext);
backtraceBuffer[i++] = instructionAddress;

uintptr_t linkRegister = qi_mach_linkRegister(&machineContext);
if (linkRegister) {
    backtraceBuffer[i++] = linkRegister;
}

if (instructionAddress == 0) {
    return @"Fail to get instructionAddress.";
}

QiStackFrameEntry frame = {0};
const uintptr_t framePointer = qi_mach_framePointer(&machineContext);
if (framePointer == 0 || qi_mach_copyMem((void *)framePointer, &frame, sizeof(frame)) != KERN_SUCCESS) {
    return @"Fail to get frame pointer";
}
// frame now holds the first frame record; walk the chain from here
for (; i < 50; i++) {
    backtraceBuffer[i] = frame.return_address; // Save the current address
    if (backtraceBuffer[i] == 0 || frame.previouts == 0 || qi_mach_copyMem(frame.previouts, &frame, sizeof(frame)) != KERN_SUCCESS) {
        break; // Reached the outermost frame
    }
}

At this point, backtraceBuffer holds the addresses collected by walking fp through the current thread's call stack.

But backtraceBuffer is still just a pile of addresses. We don't know which method each one belongs to, do we?

That is what the next operation, symbolication, is for: matching each address to its symbol name (function or method name).


How do we symbolicate the addresses?

By walking back through the frame pointers (fp), we obtained the addresses of every call on the thread. How do we match each address to its symbol (function or method name)?

This is the job of symbolication: given an address, find its symbol.
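The core idea, "pick the symbol whose start address is closest below the given address", can be sketched without any Mach-O machinery. The table and the names `DemoSymbol` and `demo_nearest_symbol` below are hypothetical and only illustrate the lookup rule; the real symbol table comes from the loaded image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry: start address and name of a function. */
typedef struct DemoSymbol {
    uintptr_t start;
    const char *name;
} DemoSymbol;

/* Returns the symbol with the greatest start address that is <= `address`,
 * or NULL if every symbol starts after `address`. The table may be unsorted. */
const DemoSymbol *demo_nearest_symbol(const DemoSymbol *table, size_t count, uintptr_t address) {
    assert(table != NULL || count == 0);
    const DemoSymbol *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (table[i].start <= address && (best == NULL || table[i].start > best->start)) {
            best = &table[i];
        }
    }
    return best;
}
```

A linear scan is used because a symbol table is not guaranteed to be sorted by address; with a sorted table a binary search would do the same job faster.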

  • Preparation:

    This time we don't need to declare anything ourselves: the system already provides the dl_info structure,

    which exists specifically to hold the information about a symbol.
/*
 * Structure filled in by dladdr().
 */
typedef struct dl_info {
        const char      *dli_fname;     /* Pathname of shared object */
        void            *dli_fbase;     /* Base address of shared object */
        const char      *dli_sname;     /* Name of nearest symbol */
        void            *dli_saddr;     /* Address of nearest symbol */
} Dl_info;
  • Step 1:

    Declare a dl_info[] array of the same size as the backtraceBuffer array to hold the symbol information.
int backtraceLength = i;
Dl_info symbolicated[backtraceLength];
qi_symbolicate(backtraceBuffer, symbolicated, backtraceLength, 0); //!< Symbolicate
  • Step 2:

    Use the address to find the image that contains the symbol.

    The following function returns the index of the matching image.
// Find the index of the image that contains the address
uint32_t qi_getImageIndexContainingAddress(const uintptr_t address) {
    const uint32_t imageCount = _dyld_image_count();
    const struct mach_header *header = 0;
    for (uint32_t i = 0; i < imageCount; i++) {
        header = _dyld_get_image_header(i);
        if (header != NULL) {
            // Look for a segment command whose address range contains the address
            uintptr_t addressWSlide = address - (uintptr_t)_dyld_get_image_vmaddr_slide(i); //!< Remove the ASLR slide
            uintptr_t cmdPointer = qi_firstCmdAfterHeader(header);
            if (cmdPointer == 0) {
                continue;
            }
            for (uint32_t iCmd = 0; iCmd < header->ncmds; iCmd++) {
                const struct load_command *loadCmd = (struct load_command*)cmdPointer;
                if (loadCmd->cmd == LC_SEGMENT) {
                    const struct segment_command *segCmd = (struct segment_command*)cmdPointer;
                    if (addressWSlide >= segCmd->vmaddr && addressWSlide < segCmd->vmaddr + segCmd->vmsize) {
                        return i;
                    }
                } else if (loadCmd->cmd == LC_SEGMENT_64) {
                    const struct segment_command_64 *segCmd = (struct segment_command_64*)cmdPointer;
                    if (addressWSlide >= segCmd->vmaddr && addressWSlide < segCmd->vmaddr + segCmd->vmsize) {
                        return i;
                    }
                }
                cmdPointer += loadCmd->cmdsize;
            }
        }
    }
    return UINT_MAX; // Not found: return UINT_MAX
}
  • Step 3:

    We now have the index of the image that contains the address.

    With a few system calls and some arithmetic we can obtain the header, the virtual memory address, and the ASLR slide (a security measure against attacks, present on iOS 5 / Android 4 and later).

    And, most importantly, segmentBase (obtained as baseAddress + ASLR slide).
const struct mach_header *header = _dyld_get_image_header(index); // Header of the image at index
const uintptr_t imageVMAddrSlide = (uintptr_t)_dyld_get_image_vmaddr_slide(index); // Virtual memory slide of the image
const uintptr_t addressWithSlide = address - imageVMAddrSlide; // Address with the ASLR slide removed
const uintptr_t segmentBase = qi_getSegmentBaseAddressOfImageIndex(index) + imageVMAddrSlide; // segmentBase is derived from index + ASLR slide
if (segmentBase == 0) {
    return false;
}

info->dli_fname = _dyld_get_image_name(index);
info->dli_fbase = (void *)header;
  • Step 4:

    Search the symbol table for the matching symbol and store the result in the dl_info array.
// Find the symbol in the symbol table that is closest to the address
const Qi_NLIST* bestMatch = NULL;
uintptr_t bestDistace = ULONG_MAX;
uintptr_t cmdPointer = qi_firstCmdAfterHeader(header);
if (cmdPointer == 0) {
    return false;
}
for (uint32_t iCmd = 0; iCmd < header->ncmds; iCmd++) {
    const struct load_command* loadCmd = (struct load_command*)cmdPointer;
    if (loadCmd->cmd == LC_SYMTAB) {
        const struct symtab_command *symtabCmd = (struct symtab_command*)cmdPointer;
        const Qi_NLIST* symbolTable = (Qi_NLIST*)(segmentBase + symtabCmd->symoff);
        const uintptr_t stringTable = segmentBase + symtabCmd->stroff;
        /*
         * struct symtab_command {
         *     uint32_t cmd;      // LC_SYMTAB
         *     uint32_t cmdsize;  // sizeof(struct symtab_command)
         *     uint32_t symoff;   // symbol table offset
         *     uint32_t nsyms;    // number of symbol table entries
         *     uint32_t stroff;   // string table offset
         *     uint32_t strsize;  // string table size in bytes
         * };
         */
        for (uint32_t iSym = 0; iSym < symtabCmd->nsyms; iSym++) {
            // If n_value is 0, the symbol refers to an external object.
            if (symbolTable[iSym].n_value != 0) {
                uintptr_t symbolBase = symbolTable[iSym].n_value;
                uintptr_t currentDistance = addressWithSlide - symbolBase;
                if ((addressWithSlide >= symbolBase) && (currentDistance <= bestDistace)) {
                    bestMatch = symbolTable + iSym;
                    bestDistace = currentDistance;
                }
            }
        }
        if (bestMatch != NULL) {
            info->dli_saddr = (void*)(bestMatch->n_value + imageVMAddrSlide);
            info->dli_sname = (char*)((intptr_t)stringTable + (intptr_t)bestMatch->n_un.n_strx);
            if (*info->dli_sname == '_') {
                info->dli_sname++;
            }
            // This happens if all symbols have been stripped.
            if (info->dli_saddr == info->dli_fbase && bestMatch->n_type == 3) {
                info->dli_sname = NULL;
            }
            break;
        }
    }
    cmdPointer += loadCmd->cmdsize;
}
  • Step 5:

    Traverse the backtraceBuffer array and store the symbol information for each address in the dl_info array.
// Symbolication: convert backtraceBuffer (array of addresses) into symbolsBuffer (array of symbols).
void qi_symbolicate(const uintptr_t* const backtraceBuffer, Dl_info* const symbolsBuffer, const int numEntries, const int skippedEntries) {
    int i = 0;
    if (!skippedEntries && i < numEntries) {
        qi_dladdr(backtraceBuffer[i], &symbolsBuffer[i]);
        i++;
    }
    for (; i < numEntries; i++) {
        qi_dladdr(CALL_INSTRUCTION_FROM_RETURN_ADDRESS(backtraceBuffer[i]), &symbolsBuffer[i]); //!< Walk back through the stack frames and find the matching symbol name.
    }
}
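Why are the later entries passed through CALL_INSTRUCTION_FROM_RETURN_ADDRESS instead of being looked up directly? A return address points at the instruction after the call, and if the call was the last instruction of its function, that address already belongs to the next function. The article does not show the macro's definition, so the sketch below assumes the common approach of stepping back one byte, and uses a hypothetical two-function layout to show the effect.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: function A occupies [0x1000, 0x1010), function B starts at 0x1010. */
static const uintptr_t DEMO_A_START = 0x1000;
static const uintptr_t DEMO_B_START = 0x1010;

/* Assumed adjustment: any address inside the call instruction is good
 * enough for a symbol lookup, so one byte back from the return address suffices. */
uintptr_t demo_call_address(uintptr_t return_address) {
    assert(return_address != 0);
    return return_address - 1;
}

/* Returns 'A' or 'B' depending on which function contains the address, '?' otherwise. */
char demo_owner(uintptr_t address) {
    if (address >= DEMO_B_START) return 'B';
    if (address >= DEMO_A_START) return 'A';
    return '?';
}
```

If A ends with a 4-byte call at 0x100c, the return address is 0x1010: `demo_owner(0x1010)` attributes it to B, while `demo_owner(demo_call_address(0x1010))` correctly attributes it to A. The first entry of the backtrace is the current instruction address, not a return address, which is why it is looked up unchanged.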
  • Summary: the complete symbolication code is as follows:
#pragma mark - Symbolicate
// Symbolication: convert backtraceBuffer (array of addresses) into symbolsBuffer (array of symbols).
void qi_symbolicate(const uintptr_t* const backtraceBuffer, Dl_info* const symbolsBuffer, const int numEntries, const int skippedEntries) {
    int i = 0;
    if (!skippedEntries && i < numEntries) {
        qi_dladdr(backtraceBuffer[i], &symbolsBuffer[i]);
        i++;
    }
    for (; i < numEntries; i++) {
        qi_dladdr(CALL_INSTRUCTION_FROM_RETURN_ADDRESS(backtraceBuffer[i]), &symbolsBuffer[i]); //!< Walk back through the stack frames and find the matching symbol name.
    }
}

// Get the information of the function that contains the address
bool qi_dladdr(const uintptr_t address, Dl_info* const info) {
    info->dli_fname = NULL;
    info->dli_fbase = NULL;
    info->dli_saddr = NULL;
    info->dli_sname = NULL;

    const uint32_t index = qi_getImageIndexContainingAddress(address); // Find the index of the image that contains the address
    if (index == UINT_MAX) {
        return false; // Not found: UINT_MAX was returned
    }
    /*
     Header
     ------------------
     Load commands
     Segment command 1 -------------|
     Segment command 2              |
     ------------------             |
     Data                           |
     Section 1 data |segment 1 <----|
     Section 2 data |          <----|
     Section 3 data |          <----|
     Section 4 data |segment 2
     Section 5 data |
     ...            |
     Section n data |
     */
    /*----------Mach Header---------*/
    const struct mach_header *header = _dyld_get_image_header(index); // Header of the image at index
    const uintptr_t imageVMAddrSlide = (uintptr_t)_dyld_get_image_vmaddr_slide(index); // Virtual memory slide of the image
    const uintptr_t addressWithSlide = address - imageVMAddrSlide; // Address with the ASLR slide removed
    const uintptr_t segmentBase = qi_getSegmentBaseAddressOfImageIndex(index) + imageVMAddrSlide; // segmentBase is derived from index + ASLR slide
    if (segmentBase == 0) {
        return false;
    }

    info->dli_fname = _dyld_get_image_name(index);
    info->dli_fbase = (void *)header;

    // Find the symbol in the symbol table that is closest to the address
    const Qi_NLIST* bestMatch = NULL;
    uintptr_t bestDistace = ULONG_MAX;
    uintptr_t cmdPointer = qi_firstCmdAfterHeader(header);
    if (cmdPointer == 0) {
        return false;
    }
    for (uint32_t iCmd = 0; iCmd < header->ncmds; iCmd++) {
        const struct load_command* loadCmd = (struct load_command*)cmdPointer;
        if (loadCmd->cmd == LC_SYMTAB) {
            const struct symtab_command *symtabCmd = (struct symtab_command*)cmdPointer;
            const Qi_NLIST* symbolTable = (Qi_NLIST*)(segmentBase + symtabCmd->symoff);
            const uintptr_t stringTable = segmentBase + symtabCmd->stroff;
            /*
             * struct symtab_command {
             *     uint32_t cmd;      // LC_SYMTAB
             *     uint32_t cmdsize;  // sizeof(struct symtab_command)
             *     uint32_t symoff;   // symbol table offset
             *     uint32_t nsyms;    // number of symbol table entries
             *     uint32_t stroff;   // string table offset
             *     uint32_t strsize;  // string table size in bytes
             * };
             */
            for (uint32_t iSym = 0; iSym < symtabCmd->nsyms; iSym++) {
                // If n_value is 0, the symbol refers to an external object.
                if (symbolTable[iSym].n_value != 0) {
                    uintptr_t symbolBase = symbolTable[iSym].n_value;
                    uintptr_t currentDistance = addressWithSlide - symbolBase;
                    if ((addressWithSlide >= symbolBase) && (currentDistance <= bestDistace)) {
                        bestMatch = symbolTable + iSym;
                        bestDistace = currentDistance;
                    }
                }
            }
            if (bestMatch != NULL) {
                info->dli_saddr = (void*)(bestMatch->n_value + imageVMAddrSlide);
                info->dli_sname = (char*)((intptr_t)stringTable + (intptr_t)bestMatch->n_un.n_strx);
                if (*info->dli_sname == '_') {
                    info->dli_sname++;
                }
                // This happens if all symbols have been stripped.
                if (info->dli_saddr == info->dli_fbase && bestMatch->n_type == 3) {
                    info->dli_sname = NULL;
                }
                break;
            }
        }
        cmdPointer += loadCmd->cmdsize;
    }
    return true;
}

Some special call stacks

Our capture scheme may look airtight, but in Release builds there are some special call stacks that we cannot capture, because the compiler has optimized them away.

1. Tail call optimization

The essence of tail-call optimization is stack frame reuse: when the last thing a function does is call another function, the callee reuses the caller's stack frame instead of pushing a new one. As a result, the stack we capture only shows the innermost frame, and the intermediate callers are lost.

PS: I wrote a blog post about tail-call optimization when I was an intern.

For reference: iOS objc_msgSend tail-call optimization in detail

2. Function inlining

This one is easier to understand: an inline function is expanded at compile time, so its body is copied directly into the caller, which saves the overhead of a function call. In addition, some compilers automatically inline functions whose logic is simple.

A function that the compiler has inlined therefore never appears in the call stack.
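As a small illustration of both cases, consider the sketch below. The function names are hypothetical, and whether a given compiler actually applies either optimization depends on its version and flags, so treat the comments as what may happen under optimization rather than a guarantee.

```c
/* Small enough that an optimizing compiler will usually inline it,
 * in which case it never gets a stack frame of its own. */
static inline int demo_square(int x) {
    return x * x;
}

/* An ordinary callee. If demo_square is inlined here, a backtrace taken
 * during the multiplication shows demo_leaf, not demo_square. */
int demo_leaf(int x) {
    return demo_square(x) + 1;
}

/* The call to demo_leaf is the last thing demo_outer does (a tail call).
 * With optimization on, the compiler may turn "call, then return" into a
 * plain jump, so demo_outer's frame is already gone while demo_leaf runs. */
int demo_outer(int x) {
    return demo_leaf(x + 1);
}
```

The results are the same with or without the optimizations (`demo_outer(2)` evaluates to 10 either way); only the frames visible to a stack capture differ.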


Supplement: how do we detect App lag?

Please refer to my previous blog post: iOS Performance Monitoring (2): Main Thread Lag Monitoring.

The App lag that we perceive is caused by the main thread being blocked, which delays UI updates and drops frames. (Normally the iPhone's screen refreshes at 60 FPS, that is, 60 times per second.)

Therefore, the better monitoring scheme at present is to use the runloop mechanism to monitor the App's state.

The scheme is as follows:

  • Step 1: Start a child thread and run its runloop so that the thread stays alive for the lifetime of the App.

  • Step 2: Create a RunloopObserver and add it to the commonModes of the main runloop. The child thread's runloop starts monitoring at the same time.

  • Step 3: Whenever the state of the main thread's runloop changes, the RunloopObserver is notified, and a GCD semaphore is signaled to keep the two threads in sync. Meanwhile, the child thread's runloop keeps monitoring.

  • Step 4: If the main thread's runloop stays in the BeforeSources or AfterWaiting state for a long time, the main thread is stuck.

  • Step 5: Once a lag is detected, capture the stacks to preserve the scene. The call stack information is saved locally and reported to the server at an appropriate time.

Q1: Why the CommonModes of the main thread? The main thread's runloop has the modes DefaultMode, UITrackingMode, UIInitializationMode, GSEventReceiveMode, and CommonModes. CommonModes is the set made up of DefaultMode and UITrackingMode, and the runloop normally switches between those two modes.

Q2: Why BeforeSources and AfterWaiting? This comes down to the order in which the runloop does its work. After BeforeSources, the runloop mainly handles Source0 events (responding to UIEvent); if it stays in this state for too long, the App cannot respond to touch events. After AfterWaiting, the thread has just woken up from sleep and is about to handle timer events; if it stays in this state without getting to them, that also indicates the App is stuck.

PS: For more details on the monitoring process, see my previous blog post.

For reference: iOS Performance Monitoring (2): Main Thread Lag Monitoring.

Source:

GitHub address: QiStackFrameLogger


References and acknowledgments:

  1. “Getting the Call Stack for Any Thread”, by bestswifter
  2. “iOS Developer’s Class”, by Mr. Daming
  5. What is Virtual Memory?
  6. Arm64 Official Documentation

