This article is published by the Cloud + community

Performance issues such as lag or deadlocks are inevitable in iOS development, and a call stack is a great help in tracking them down. So how do you capture a function's call stack in real time inside a running application? Drawing on a number of blog posts, this article explains how to get a call stack using Mach threads. It first covers the basic concepts of stack frames, then walks through a small assembly demo to show how the call chain is recovered.

First, stack frames and related concepts

Let's start by introducing the concept of a stack frame.

Each newly created thread in an application has its own dedicated stack space, which it can use freely for as long as the thread lives. A thread makes tens of thousands of function calls, and all of those functions share the thread's stack, so values are constantly pushed and popped as functions run. When a function returns, or when we backtrace, how do we locate the return address precisely? And how do we find the register contents saved by a callee? This is where the concept of the stack frame comes in: the stack space used by each function call is one stack frame, and all the stack frames together make up the thread's complete stack.

Here are a few more concepts:

The FP, SP, LR, and PC registers.

A register is a small, very fast storage location inside the CPU, used to hold data that is currently being worked on. A 32-bit ARM processor running the ARMv7 instruction set has 16 registers, R0 through R15, each 32 bits wide. The calling convention gives some of them special roles, for example:

  • R0-R3: used to pass arguments to a function;
  • R4-R11: used to hold a function's local variables;
  • R11: commonly used as the frame pointer FP (frame pointer register), the stack frame base register. It points to the bottom of the current function's stack frame and provides a way to trace back through the calling functions. (On iOS, the ARMv7 ABI uses R7 as the frame pointer instead.)
  • R12: the intra-procedure-call scratch register. It is special in that a function call may change it;
  • R13: the stack pointer SP. The stack is a very important concept in computer science, and this register holds a pointer to the top of the stack;
  • R14: the link register LR. It holds the return address, that is, the address in the caller where execution continues when the current function returns;
  • R15: the program counter PC. It tracks the instruction currently being executed and advances automatically after each instruction.

Different instruction sets may have different numbers of registers, and PC, LR, SP, and FP may be mapped to different registers. From here on we ignore register numbers such as R11 and simply refer to FP, SP, and LR.

The older frames, the caller's frame, and the current frame all have exactly the same structure, because each frame belongs to one function call, and a frame is created, grows, and is destroyed along with that call. Two of the registers mentioned above do the bookkeeping: the frame pointer FP always points to the bottom of the current frame, and the stack pointer SP always points to the top of the current frame. Together they locate every slot in the current frame. The compiler has to adjust these two registers carefully according to the rules of the instruction set, because mistakes break argument passing and function return.

These registers follow certain rules, for example:

  • FP points to the bottom of the current stack frame, and the value stored at that address is the FP of the previous stack frame, that is, the frame of the function that called the current one.
  • The return address (the saved LR) sits at the top of the previous stack frame, the caller's frame. Because stack frames are stored contiguously, it occupies the slot immediately above the bottom of the current frame, at the next higher address after the one FP points to. Following this pattern frame by frame yields the order of all the calls. Note that the bottom of the stack is at a high address and the stack grows downward.

It follows that the caller's SP and FP can be recovered from the current stack frame, and repeating this step frame by frame recovers the entire call stack. In the code below, the value read at each step, *(fp + 1), is the return address. It lies inside the calling function, so we can map it back to the corresponding function name through the symbol table.

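// fp is treated as a pointer to pointer-sized slots (uintptr_t *)
while (fp) {
  pc = *(fp + 1);          // return address, one slot above the saved FP
  fp = (uintptr_t *)*fp;   // the caller's saved FP
}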
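To make the loop above concrete, here is a minimal sketch in portable C that walks a hand-built chain of frame records instead of a real thread stack. The names walk_frames and demo_walk, the array layout, and the return addresses are all made up for illustration. The only thing taken from the article is the rule that fp[0] holds the caller's FP and fp[1] holds the return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: walk a chain of frame records laid out the way the
 * article describes. fp points at a two-slot record:
 *   fp[0] = caller's saved FP, fp[1] = return address (saved LR).
 * Stores up to max return addresses in out and returns how many were found. */
size_t walk_frames(const uintptr_t *fp, uintptr_t *out, size_t max) {
    size_t n = 0;
    assert(out != NULL || max == 0);
    while (fp != NULL && n < max) {
        out[n++] = fp[1];                 /* pc = *(fp + 1) */
        fp = (const uintptr_t *)fp[0];    /* fp = *fp       */
    }
    return n;
}

/* Build a fake three-frame stack in an array and walk it.
 * Higher indexes stand for higher addresses, so callers sit above callees. */
size_t demo_walk(uintptr_t *out, size_t max) {
    uintptr_t stack[6];
    stack[4] = 0;                         /* outermost frame: no caller FP   */
    stack[5] = 0x1000;                    /* made-up return address          */
    stack[2] = (uintptr_t)&stack[4];      /* middle frame links to outermost */
    stack[3] = 0x2000;
    stack[0] = (uintptr_t)&stack[2];      /* innermost frame links to middle */
    stack[1] = 0x3000;
    return walk_frames(&stack[0], out, max);
}
```

The walk starts at the innermost record and ends when it reads a saved FP of zero, so the return addresses come out innermost first. The max parameter is a small safety net that the bare loop does not have.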

Second, reading the assembly

If you want to know why this is the case, looking at a function call from the assembly point of view makes it clearer why the slot FP points to always holds the FP of the previous stack frame, and why the slot just above it always holds the return address (LR).

I wrote the following demo program. Since I was experimenting on a Mac, I compiled the executable directly with Clang and then disassembled it with Hopper. Alternatively, you can pass Clang the -S option to emit the assembly code directly.

The demo source code

#import <Foundation/Foundation.h>

int func(int a);

int main(void)
{
    int a = 1;
    func(a);
    return 0;
}

int func(int a)
{
    int b = 2;
    return a + b;
}

Assembly language

        ; ================ B E G I N N I N G   O F   P R O C E D U R E ================

        ; Variables:
        ;    var_4: -4
        ;    var_8: -8
        ;    var_C: -12


                     _main:
0000000100000f70         push       rbp
0000000100000f71         mov        rbp, rsp
0000000100000f74         sub        rsp, 0x10
0000000100000f78         mov        dword [rbp+var_4], 0x0
0000000100000f7f         mov        dword [rbp+var_8], 0x1
0000000100000f86         mov        edi, dword [rbp+var_8]                      ; argument #1 for method _func
0000000100000f89         call       _func
0000000100000f8e         xor        edi, edi
0000000100000f90         mov        dword [rbp+var_C], eax
0000000100000f93         mov        eax, edi
0000000100000f95         add        rsp, 0x10
0000000100000f99         pop        rbp
0000000100000f9a         ret
                        ; endp
0000000100000f9b         nop        dword [rax+rax]


        ; ================ B E G I N N I N G   O F   P R O C E D U R E ================

        ; Variables:
        ;    var_4: -4
        ;    var_8: -8


                     _func:
0000000100000fa0         push       rbp                                         ; CODE XREF=_main+25
0000000100000fa1         mov        rbp, rsp
0000000100000fa4         mov        dword [rbp+var_4], edi
0000000100000fa7         mov        dword [rbp+var_8], 0x2
0000000100000fae         mov        edi, dword [rbp+var_4]
0000000100000fb1         add        edi, dword [rbp+var_8]
0000000100000fb4         mov        eax, edi
0000000100000fb6         pop        rbp
0000000100000fb7         ret

Note that because the executable was compiled on a Mac, the instruction set is x86-64, so the register names differ from the FP, SP, LR, and PC used above. Their meanings are basically the same, although x86-64 has no link register: the call instruction pushes the return address onto the stack instead. The correspondence is as follows:

  • fp -> rbp
  • sp -> rsp
  • pc -> rip

The call _func instruction in main is roughly equivalent to the following two instructions:

push %rip    // pseudo-instruction: save the address of the next instruction (0x100000f8e in the listing above) so execution can continue there when the function returns
jmp _func    // jump to func

So when main calls func, it pushes the address of the next instruction onto the stack. At that point main's stack frame is complete, and execution jumps to func's code. In other words, the address of the instruction following the call, which is what RIP holds at that moment and which plays the role of the LR described above, ends up at the top of main's stack frame.

As the func code shows, it begins with push rbp to save the frame pointer. At this point rbp still holds the frame pointer of the previous stack frame, that is, the bottom address of the caller's frame, so this step saves the bottom address of the previous frame.

The next instruction, mov rbp, rsp, copies the stack-top address in rsp into rbp. rbp now holds the stack-top address, which is also the start of the current stack frame, namely FP. The top of the stack is exactly the slot where the previous frame pointer was just pushed, so rbp points to the bottom of the current stack frame, and the value stored there is the bottom address of the previous stack frame.

This explains why the address FP points to holds the FP of the previous stack frame, and why the slot just above FP, at the next higher address, holds LR, the return address.
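The sequence just described (call pushes the return address, then the prologue runs push rbp and mov rbp, rsp) can be replayed on a toy machine to check the resulting layout. This is only a sketch under simplifying assumptions: the Machine struct, the slot indexes, and the function names are invented for illustration, and rsp and rbp are array indexes rather than real addresses. The return address 0x100000f8e is the one from the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine: a downward-growing stack of 64-bit slots plus rsp/rbp,
 * both kept as slot indexes. All names here are illustrative only. */
typedef struct {
    uint64_t stack[16];
    int rsp;            /* index of the current top slot */
    int rbp;            /* index of the current frame's bottom slot */
} Machine;

static void push(Machine *m, uint64_t value) {
    assert(m->rsp > 0);
    m->stack[--m->rsp] = value;   /* the stack grows toward lower indexes */
}

/* call: push the return address, then jump (the jump is implied here) */
void sim_call(Machine *m, uint64_t return_address) {
    push(m, return_address);
}

/* prologue: push rbp; mov rbp, rsp */
void sim_prologue(Machine *m) {
    push(m, (uint64_t)m->rbp);    /* save the caller's frame pointer */
    m->rbp = m->rsp;              /* the new frame starts at the saved slot */
}

/* Replay "main calls func" and return the resulting machine state. */
Machine sim_main_calls_func(void) {
    Machine m = { {0}, 16, 12 };  /* pretend main's frame bottom is slot 12 */
    m.rsp = 10;                   /* main has reserved room for its locals */
    sim_call(&m, 0x100000f8e);    /* address after the call in the listing */
    sim_prologue(&m);             /* first two instructions of func */
    return m;
}
```

After the replay, the slot rbp points to holds main's old rbp, and the slot one position higher holds the return address, which is exactly the pair that the backtrace loop reads as *fp and *(fp + 1).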

Another important point is the order in which values enter and leave the stack. ARM uses a descending stack, in which addresses decrease as the stack grows. The register list of a push is stored from right to left, and the register list of a pop is loaded from left to right. This applies to push/pop as well as to LDMFD/STMFD and similar instructions.

Third, steps to get the call stack

The fp, lr, and sp described above appear as fields of the thread-state structures defined in the Mach kernel headers, and we can read their values through the corresponding API. Here are the 64-bit and 32-bit ARM definitions:

_STRUCT_ARM_THREAD_STATE64
{
	__uint64_t    __x[29];	/* General purpose registers x0-x28 */
	__uint64_t    __fp;		/* Frame pointer x29 */
	__uint64_t    __lr;		/* Link register x30 */
	__uint64_t    __sp;		/* Stack pointer x31 */
	__uint64_t    __pc;		/* Program counter */
	__uint32_t    __cpsr;	/* Current program status register */
	__uint32_t    __pad;    /* Same size for 32-bit or 64-bit clients */
};
_STRUCT_ARM_THREAD_STATE
{
	__uint32_t	r[13];	/* General purpose register r0-r12 */
	__uint32_t	sp;		/* Stack pointer r13 */
	__uint32_t	lr;		/* Link register r14 */
	__uint32_t	pc;		/* Program counter r15 */
	__uint32_t	cpsr;		/* Current program status register */
};

So we only need to read FP and LR, follow the chain to find each caller's address in turn, and finally symbolicate those addresses to restore the call stack.

To summarize, the following steps are required to obtain the call stack:

1. Suspend the thread

thread_suspend(main_thread);

2. Get the thread's current state (its register context) with thread_get_state

#include <mach/mach.h>

_STRUCT_MCONTEXT ctx;   // declared once, shared by both architectures

#if defined(__x86_64__)
    mach_msg_type_number_t count = x86_THREAD_STATE64_COUNT;
    kern_return_t kr = thread_get_state(main_thread, x86_THREAD_STATE64, (thread_state_t)&ctx.__ss, &count);
#elif defined(__arm64__)
    mach_msg_type_number_t count = ARM_THREAD_STATE64_COUNT;
    kern_return_t kr = thread_get_state(main_thread, ARM_THREAD_STATE64, (thread_state_t)&ctx.__ss, &count);
#endif
    // kr != KERN_SUCCESS means the register state could not be read

3. Get the current frame pointer FP

#if defined(__x86_64__)
    uint64_t pc = ctx.__ss.__rip;
    uint64_t sp = ctx.__ss.__rsp;
    uint64_t fp = ctx.__ss.__rbp;
#elif defined(__arm64__)
    uint64_t pc = ctx.__ss.__pc;
    uint64_t sp = ctx.__ss.__sp;
    uint64_t fp = ctx.__ss.__fp;
#endif

4. Walk the FP chain frame by frame, recording each LR (return address) in turn

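// fp is treated as a pointer to pointer-sized slots (uintptr_t *)
while (fp) {
  pc = *(fp + 1);          // return address, one slot above the saved FP
  fp = (uintptr_t *)*fp;   // the caller's saved FP
}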

In this step we apply the method above to walk the return addresses along the call chain. The code is as follows:

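void *t_fp[2];   // t_fp[0]: caller's saved FP, t_fp[1]: return address (LR)

vm_size_t len = sizeof(t_fp);
kern_return_t kr = vm_read_overwrite(mach_task_self(), (vm_address_t)fp, len, (vm_address_t)t_fp, &len);

while (kr == KERN_SUCCESS) {
    pc = (uint64_t)(uintptr_t)t_fp[1];   // LR is always in the slot just above the saved FP
    // Record the value of pc for each frame
    printf("%p\n", (void *)(uintptr_t)pc);

    fp = (uint64_t)(uintptr_t)t_fp[0];   // move up to the caller's frame
    if (fp == 0) break;                  // reached the outermost frame
    len = sizeof(t_fp);
    kr = vm_read_overwrite(mach_task_self(), (vm_address_t)fp, len, (vm_address_t)t_fp, &len);
}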
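One likely reason for reading each frame record through vm_read_overwrite instead of dereferencing fp directly is that a corrupted frame pointer then produces an error code instead of a crash. The sketch below shows that idea in portable C with the memory reader passed in as a callback. The names read_fn, safe_backtrace, FakeMemory, and fake_read are hypothetical stand-ins, and the check that the chain must move toward higher addresses is an extra safeguard assumed here, not something taken from the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reader callback: copy len bytes from addr into dst, return 0 on success.
 * On iOS this role is played by vm_read_overwrite; here it is a stand-in. */
typedef int (*read_fn)(void *ctx, uintptr_t addr, void *dst, size_t len);

/* Walk the FP chain through the reader. Stops on a null FP, a failed read,
 * a chain that does not move toward higher addresses, or when max is hit. */
size_t safe_backtrace(read_fn read, void *ctx, uintptr_t fp,
                      uintptr_t *out, size_t max) {
    size_t n = 0;
    assert(read != NULL && out != NULL);
    while (fp != 0 && n < max) {
        uintptr_t record[2];              /* [0] caller FP, [1] return address */
        if (read(ctx, fp, record, sizeof record) != 0) break;
        out[n++] = record[1];
        if (record[0] != 0 && record[0] <= fp) break;   /* corrupt or looping chain */
        fp = record[0];
    }
    return n;
}

/* Stand-in reader over a fake address space: a small array that pretends
 * to start at address base. Reads outside it fail instead of crashing. */
typedef struct { uintptr_t base; uintptr_t words[8]; } FakeMemory;

int fake_read(void *ctx, uintptr_t addr, void *dst, size_t len) {
    const FakeMemory *m = ctx;
    if (len > sizeof m->words || addr < m->base ||
        addr - m->base > sizeof m->words - len) {
        return -1;                        /* address is not mapped */
    }
    memcpy(dst, (const char *)m->words + (addr - m->base), len);
    return 0;
}
```

With a real thread, the callback would wrap vm_read_overwrite and return nonzero whenever the call does not return KERN_SUCCESS. The walker itself stays the same.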

The code above prints the return addresses on the call stack one frame at a time, starting from the innermost function and moving outward through its callers. Each address is that of the instruction following a call site, and we still need to recover the corresponding symbol name from it.

5. Resume the thread with thread_resume

thread_resume(main_thread);

6. Restore the symbols from the symbol table

This step resolves each address collected from the call chain to its symbol. It mainly follows the reference "Get the function call stack at run time" and relies on the basics of how dyld links Mach-O files, which will be summarized in a follow-up article.

enumerateSegment(header, [&](struct load_command *command) {
    if (command->cmd == LC_SYMTAB) {
        struct symtab_command *symCmd = (struct symtab_command *)command;

        // Locate __LINKEDIT to turn file offsets into memory addresses
        uint64_t baseaddr = 0;
        enumerateSegment(header, [&](struct load_command *command) {
            if (command->cmd == LC_SEGMENT_64) {
                struct segment_command_64 *segCmd = (struct segment_command_64 *)command;
                if (strcmp(segCmd->segname, SEG_LINKEDIT) == 0) {
                    baseaddr = segCmd->vmaddr - segCmd->fileoff;
                    return true;
                }
            }
            return false;
        });

        if (baseaddr == 0) return false;

        nlist_64 *nlist = (nlist_64 *)(baseaddr + slide + symCmd->symoff);
        uint64_t strTable = baseaddr + slide + symCmd->stroff;

        // Pick the symbol that starts closest to, but not after, pcSlide
        uint64_t offset = UINT64_MAX;
        int best = -1;
        for (uint32_t k = 0; k < symCmd->nsyms; k++) {
            nlist_64 &sym = nlist[k];
            // Skip undefined symbols and symbols that start after pcSlide,
            // which would otherwise wrap around in the unsigned subtraction
            if (sym.n_value == 0 || sym.n_value > pcSlide) continue;
            uint64_t d = pcSlide - sym.n_value;
            if (offset >= d) { offset = d; best = (int)k; }
        }

        if (best >= 0) {
            nlist_64 &sym = nlist[best];
            std::cout << "SYMBOL: " << (char *)(strTable + sym.n_un.n_strx) << std::endl;
        }

        return true;
    }
    return false;
});
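The core of the lookup above is a nearest-preceding-symbol search: among all symbols that start at or before the address, pick the one that starts closest to it. Here is a minimal, self-contained sketch of just that search in C. The Symbol struct is a simplified stand-in for nlist_64, and nearest_symbol, demo_symbols, and demo_lookup are hypothetical names. The two addresses are the starts of _main and _func from the disassembly listing earlier in the article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an nlist_64 entry: start address and name. */
typedef struct {
    uint64_t address;
    const char *name;
} Symbol;

/* Return the symbol with the greatest address that is <= pc,
 * or NULL when pc lies below every symbol. The table may be unsorted. */
const Symbol *nearest_symbol(const Symbol *table, size_t count, uint64_t pc) {
    const Symbol *best = NULL;
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].address > pc) continue;          /* starts after pc */
        if (best == NULL || table[i].address > best->address) {
            best = &table[i];
        }
    }
    return best;
}

/* Start addresses of the two functions in the demo disassembly. */
static const Symbol demo_symbols[] = {
    { 0x100000fa0, "_func" },
    { 0x100000f70, "_main" },
};

/* Resolve pc against the demo table, or return "?" when nothing matches. */
const char *demo_lookup(uint64_t pc) {
    const Symbol *s = nearest_symbol(demo_symbols, 2, pc);
    return s ? s->name : "?";
}
```

For example, the return address 0x100000f8e that main pushes before jumping to func resolves to _main, because _main is the last symbol that starts at or before it. A linear scan keeps the sketch short. With a table sorted by address, a binary search would do the same job faster.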

References

Function call stack space and FP register

Function call stack

Also talk about stacks and stack frames

Get the function call stack at run time

In-depth analysis of Mac OS X & iOS learning Notes

This article is published by the Tencent Cloud + community with the author's authorization.
