This is the third article in the Mach-O series.

Before reading the fishhook source code, it helps to have a basic understanding of the following:

  • The Mach-O file format: see "Interesting Exploration of Mach-O: File Format Analysis"
  • The Mach-O dynamic linking process: see "Interesting Exploration of Mach-O: Loading Process"
  • Operating system and compilation fundamentals: see "In-depth Analysis of Mac OS X & iOS Operating Systems" and "Programmer's Self-Cultivation"

This article walks through the source in the order in which the functions are called.

What fishhook can do

A diagram from Alibaba's analysis shows clearly the role that fishhook plays.


fishhook operates on functions that come from dynamically linked libraries, replacing the implementation that a given symbol resolves to.

For a C function in a dynamic library, the first call resolves the address of the function's implementation and stores it in a section called __la_symbol_ptr. Every later call simply reads the address from __la_symbol_ptr, skipping the tedious lookup. (For details, see the dynamic linking article linked above.)

So the meaning of the diagram is clear:

When the program runs, the address of a dynamically linked C function dynamic(…) is recorded in the __la_symbol_ptr section of the __DATA segment. At first, the program knows only the symbol name of dynamic, not the address of its implementation. On the first call, the program goes through stub_helper in the __TEXT segment to obtain the binding information, and dyld_stub_binder writes the resolved address into __la_symbol_ptr. From then on, calls to dynamic find the implementation directly through __la_symbol_ptr. If we want to replace the implementation of dynamic, all we have to do is change the pointer in __la_symbol_ptr, and that is exactly what fishhook does.
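The lazy-binding mechanism can be modelled in a few lines of portable C. This is only a sketch under stated assumptions: it uses no Mach-O headers, and every name in it (real_dynamic, lazy_stub, la_symbol_ptr, rebind_slot, and so on) is hypothetical. A single function pointer plays the role of one __la_symbol_ptr slot, and lazy_stub stands in for the combination of stub_helper and dyld_stub_binder.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*dynamic_fn)(int);

/* Hypothetical stand-in for a function exported by a dynamic library. */
int real_dynamic(int x) { return x + 1; }

/* Counts how often the slow resolution path runs. */
int bind_count = 0;

static int lazy_stub(int x);

/* Models one __la_symbol_ptr slot; before the first call it points at the stub. */
dynamic_fn la_symbol_ptr = lazy_stub;

/* Models stub_helper plus dyld_stub_binder: resolve, patch the slot, forward the call. */
static int lazy_stub(int x) {
    bind_count++;
    la_symbol_ptr = real_dynamic;
    return real_dynamic(x);
}

/* Call sites always go through the slot, never straight to the function. */
int call_dynamic(int x) {
    assert(la_symbol_ptr != NULL);
    return la_symbol_ptr(x);
}

/* A replacement implementation such as a hook might supply. */
int hooked_dynamic(int x) { return x * 100; }

/* Overwrite the slot and return what it held before. */
dynamic_fn rebind_slot(dynamic_fn replacement) {
    dynamic_fn previous = la_symbol_ptr;
    la_symbol_ptr = replacement;
    return previous;
}
```

The first call_dynamic() takes the slow path once and patches the slot; later calls go straight to real_dynamic. Calling rebind_slot(hooked_dynamic) afterwards redirects every call site at once, because all of them read the same slot, and the returned pointer still reaches the original implementation.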

How fishhook is implemented

fishhook's official documentation shows how to use it:

#import <Foundation/Foundation.h>
#import <fcntl.h>
#import <stdarg.h>
#import <stdio.h>
#import "fishhook.h"

static int (*original_open)(const char *, int, ...);

int new_open(const char *path, int oflag, ...) {
    va_list ap = {0};
    mode_t mode = 0;

    if ((oflag & O_CREAT) != 0) {
        // mode only applies to O_CREAT
        va_start(ap, oflag);
        mode = va_arg(ap, int);
        va_end(ap);
        printf("Calling real open('%s', %d, %d)\n", path, oflag, mode);
        return original_open(path, oflag, mode);
    } else {
        printf("Calling real open('%s', %d)\n", path, oflag);
        return original_open(path, oflag, mode);
    }
}

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        struct rebinding open_rebinding = { "open", new_open, (void *)&original_open };
        rebind_symbols((struct rebinding[1]){open_rebinding}, 1);
        __unused int fd = open(argv[0], O_RDONLY);
    }
    return 0;
}

Let's start with rebind_symbols. On its first call it uses _dyld_register_func_for_add_image to register a callback that runs for every image already loaded and for every image loaded later. On subsequent calls it simply walks the images that are already loaded.

int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
  int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
  if (retval < 0) {
    return retval;
  }
  // If this was the first call, register callback for image additions (which is also invoked for
  // existing images, otherwise, just run on existing images
  if (!_rebindings_head->next) {
    _dyld_register_func_for_add_image(_rebind_symbols_for_image);
  } else {
    uint32_t c = _dyld_image_count();
    for (uint32_t i = 0; i < c; i++) {
      _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
    }
  }
  return retval;
}

The code for prepend_rebindings is as follows:

struct rebindings_entry {
  struct rebinding *rebindings;
  size_t rebindings_nel;
  struct rebindings_entry *next;
};

static struct rebindings_entry *_rebindings_head;

static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                              struct rebinding rebindings[],
                              size_t nel) {
  struct rebindings_entry *new_entry = malloc(sizeof(struct rebindings_entry));
  if (!new_entry) {
    return -1;
  }
  new_entry->rebindings = malloc(sizeof(struct rebinding) * nel);
  if (!new_entry->rebindings) {
    free(new_entry);
    return -1;
  }
  // Copy the rebindings and insert the new entry at the head of the linked list
  memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
  new_entry->rebindings_nel = nel;
  new_entry->next = *rebindings_head;
  *rebindings_head = new_entry;
  return 0;
}

Basic data structures

Dl_info

/*
 * Structure filled in by dladdr().
 */
typedef struct dl_info {
        const char      *dli_fname;     /* Pathname of shared object */
        void            *dli_fbase;     /* Base address of shared object */
        const char      *dli_sname;     /* Name of nearest symbol */
        void            *dli_saddr;     /* Address of nearest symbol */
} Dl_info;

dladdr() fills this structure with whatever information it can find about the address we pass in.

  • dli_fname: the path name of the image, for example
/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator.sdk/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation
  • dli_fbase: the base address of the shared object (such as CoreFoundation above)
  • dli_saddr: the address of the nearest symbol
  • dli_sname: the name of the nearest symbol, i.e. the function shown in the fourth column of the backtrace below
Thread 0:
0     libsystem_kernel.dylib          0x11135810a __semwait_signal + 94474
1     libsystem_c.dylib               0x1110dab0b sleep + 518923
2     QYPerformanceMonitor            0x10dda4f1b -[ViewController tableView:cellForRowAtIndexPath:] + 7963
3     UIKit                           0x10ed4d4f4 -[UITableView _createPreparedCellForGlobalRow:withIndexPath:willDisplay:] + 1586420

LC_SYMTAB

struct symtab_command {
    uint32_t    cmd;        /* LC_SYMTAB */
    uint32_t    cmdsize;    /* sizeof(struct symtab_command) */
    uint32_t    symoff;        /* symbol table offset */
    uint32_t    nsyms;        /* number of symbol table entries */
    uint32_t    stroff;        /* string table offset */
    uint32_t    strsize;    /* string table size in bytes */
};

LC_SYMTAB gives the file offset of the symbol table (symoff), the number of entries in it (nsyms), and the file offset and size of the string table (stroff, strsize). The symbol table of the Mach-O file is located through symoff and holds nsyms entries. The name of each symbol is stored in the string table located through stroff.
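How a symbol table entry leads to a name can be sketched without any Mach-O headers. The types and data below are hypothetical stand-ins: mock_nlist keeps only the n_strx field, and demo_symtab and demo_strtab play the role of the tables that symoff and stroff point to. The bounds checks against nsyms and strsize are an addition of this sketch, not something taken from fishhook.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct nlist: only the string table index is modelled. */
struct mock_nlist {
    uint32_t n_strx; /* byte offset of the name inside the string table */
};

/* Return the name of symbol `index`, or NULL if the index or offset is out of range. */
const char *mock_symbol_name(const struct mock_nlist *symtab, uint32_t nsyms,
                             const char *strtab, uint32_t strsize,
                             uint32_t index) {
    assert(symtab != NULL && strtab != NULL);
    if (index >= nsyms) {
        return NULL;
    }
    uint32_t strx = symtab[index].n_strx;
    if (strx >= strsize) {
        return NULL;
    }
    return strtab + strx;
}

/* Demo string table: NUL-separated names, "_open" at offset 1, "_close" at 7, "_printf" at 14. */
const char demo_strtab[] = "\0_open\0_close\0_printf";
const struct mock_nlist demo_symtab[] = { {1}, {7}, {14} };
#define DEMO_NSYMS 3u

/* Convenience wrapper over the demo tables. */
const char *demo_name(uint32_t index) {
    return mock_symbol_name(demo_symtab, DEMO_NSYMS,
                            demo_strtab, (uint32_t)sizeof demo_strtab, index);
}

/* Linear search by name; returns the symbol table index or -1. */
int demo_find(const char *name) {
    for (uint32_t i = 0; i < DEMO_NSYMS; i++) {
        const char *candidate = demo_name(i);
        if (candidate != NULL && strcmp(candidate, name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With these tables, demo_name(1) yields "_close" and demo_find("_printf") yields index 2. In a real image the same lookup is symtab[i].n_un.n_strx added to the start of the string table.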

LC_DYSYMTAB

This structure is more complex; see <mach-o/loader.h> for the full definition. Among other things, it records the file offset of the indirect symbol table and the number of entries in it.

struct dysymtab_command {
    uint32_t cmd;    /* LC_DYSYMTAB */
    uint32_t cmdsize;    /* sizeof(struct dysymtab_command) */
    uint32_t indirectsymoff; /* file offset to the indirect symbol table */
    uint32_t nindirectsyms;  /* number of indirect symbol table entries */
    .......

_rebind_symbols_for_image

_rebind_symbols_for_image delegates to rebind_symbols_for_image, whose key code is as follows:

static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
  Dl_info info;
  if (dladdr(header, &info) == 0) {
    return;
  }

  segment_command_t *cur_seg_cmd;                 // segment_command_64 on 64-bit
  segment_command_t *linkedit_segment = NULL;     // __LINKEDIT
  struct symtab_command* symtab_cmd = NULL;       // LC_SYMTAB
  struct dysymtab_command* dysymtab_cmd = NULL;   // LC_DYSYMTAB

  // The load commands start right after the Mach-O header
  uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
        // Found the __LINKEDIT segment
        linkedit_segment = cur_seg_cmd;
      }
    } else if (cur_seg_cmd->cmd == LC_SYMTAB) {
      symtab_cmd = (struct symtab_command*)cur_seg_cmd;
    } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
      dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
    }
  }

Why walk the load commands? Because the __LINKEDIT segment command and the LC_SYMTAB and LC_DYSYMTAB commands each provide information we need.

The __LINKEDIT segment contains the raw data used by the dynamic linker, such as symbols, strings, and relocation table entries.

Before reading the code below, let’s look at a formula

linkedit_base = __LINKEDIT.vmaddr - __LINKEDIT.fileoff + slide

The formula contains a slide term, so what is the slide? To answer that, we need to look at ASLR.

ASLR (address space layout randomization) loads the executable at a random address in memory. The randomization is an offset, not a scramble: the kernel shifts the Mach-O segments by the same random amount. The slide is that offset.

That is, the base address equals the virtual address of the __LINKEDIT segment minus its file offset, plus the slide introduced by ASLR. Adding a file offset such as symoff or stroff to this base gives the in-memory address of the corresponding table.

  // Base address used to turn file offsets into in-memory addresses
  uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
  // Symbol table
  nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
  // String table
  char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);
  // Indirect symbol table (array of uint32_t indices into the symbol table)
  uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);

The entries of the symbol table are all nlist_t structures. nlist_t carries a fair amount of information, so let's look at its definition:

/*
 * This is the symbol table entry structure for 32-bit architectures.
 */
struct nlist {
    union {
        uint32_t n_strx;   /* index into the string table */
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    int16_t n_desc;        /* see <mach-o/stab.h> */
    uint32_t n_value;    /* value of this symbol (or stab offset) */
};

Then the load commands are walked a second time, this time looking for the __DATA and __DATA_CONST segments, and the __nl_symbol_ptr and __la_symbol_ptr sections inside them are rebound:

  cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
          strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
        continue;
      }
      // Inside __DATA and __DATA_CONST, rebind the __nl_symbol_ptr and __la_symbol_ptr sections
      for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
        section_t *sect =
          (section_t *)(cur + sizeof(segment_command_t)) + j;
        if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
          // sect: section, symtab: symbol table, strtab: string table,
          // indirect_symtab: indirect symbol table
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
        if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
      }
    }
  }
}

perform_rebinding_with_section

The reserved1 field in the section headers of __nl_symbol_ptr and __la_symbol_ptr gives the index in the indirect symbol table at which that section's entries start. The fishhook documentation describes it as follows:

For the two relevant sections, the section headers (struct sections from <mach-o/loader.h>) provide an offset (in the reserved1 field) into what is known as the indirect symbol table. The indirect symbol table, which is located in the __LINKEDIT segment of the binary, is just an array of indexes into the symbol table (also in __LINKEDIT) whose order is identical to that of the pointers in the non-lazy and lazy symbol sections.

So, given struct section nl_symbol_ptr, the corresponding index in the symbol table of the first address in that section is indirect_symbol_table[nl_symbol_ptr->reserved1]. The symbol table itself is an array of struct nlists (see <mach-o/nlist.h>), and each nlist contains an index into the string table in __LINKEDIT, which is where the actual symbol names are stored. So, for each pointer in __nl_symbol_ptr and __la_symbol_ptr, we are able to find the corresponding symbol and then the corresponding string to compare against the requested symbol names, and if there is a match, we replace the pointer in the section with the replacement.
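The chain from a pointer slot to its name (section, then indirect symbol table, then symbol table, then string table) can be traced with small mock tables. Everything below is a hypothetical illustration in portable C: toy_section keeps only reserved1 and a slot count, toy_nlist keeps only n_strx, and the table contents are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct toy_section { uint32_t reserved1; uint32_t count; }; /* first indirect index, number of slots */
struct toy_nlist   { uint32_t n_strx; };                    /* offset of the name in the string table */

/* Name of the symbol that slot `i` of `sect` is bound to. */
const char *toy_pointer_name(const struct toy_section *sect, uint32_t i,
                             const uint32_t *indirect_symtab,
                             const struct toy_nlist *symtab,
                             const char *strtab) {
    assert(sect != NULL && i < sect->count);
    uint32_t symtab_index = indirect_symtab[sect->reserved1 + i]; /* slot -> symbol index */
    return strtab + symtab[symtab_index].n_strx;                  /* symbol -> name */
}

/* Invented tables: "_open" at offset 1, "_close" at 7, "_printf" at 14. */
const char toy_strtab[] = "\0_open\0_close\0_printf";
const struct toy_nlist toy_symtab[] = { {1}, {7}, {14} };
/* Entry 0 belongs to the non-lazy section, entries 1 and 2 to the lazy one. */
const uint32_t toy_indirect_symtab[] = { 2, 0, 1 };
const struct toy_section toy_nl_section = { 0, 1 };
const struct toy_section toy_la_section = { 1, 2 };

const char *toy_non_lazy_name(uint32_t i) {
    return toy_pointer_name(&toy_nl_section, i, toy_indirect_symtab, toy_symtab, toy_strtab);
}

const char *toy_lazy_name(uint32_t i) {
    return toy_pointer_name(&toy_la_section, i, toy_indirect_symtab, toy_symtab, toy_strtab);
}

/* Slot index within `sect` whose symbol, minus the leading underscore, equals `name`; -1 if none. */
int toy_find_slot(const struct toy_section *sect, const char *name) {
    for (uint32_t i = 0; i < sect->count; i++) {
        const char *symbol = toy_pointer_name(sect, i, toy_indirect_symtab, toy_symtab, toy_strtab);
        if (strlen(symbol) > 1 && strcmp(symbol + 1, name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Here the lazy section starts at indirect index 1, so its slot 0 maps to symbol 0 ("_open") and its slot 1 to symbol 1 ("_close"), while the single non-lazy slot maps to "_printf". toy_find_slot(&toy_la_section, "close") therefore returns 1.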

With that description in mind, the following code is easy to follow:

// sect: section, symtab: symbol table, strtab: string table, indirect_symtab: indirect symbol table
static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab) {
  // reserved1 of the __nl_symbol_ptr / __la_symbol_ptr section is the index of
  // the section's first entry in the indirect symbol table
  uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
  // The section's pointer array as it sits in memory (slide applied)
  void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
  for (uint i = 0; i < section->size / sizeof(void *); i++) {
    // Index of this pointer's symbol in the symbol table
    uint32_t symtab_index = indirect_symbol_indices[i];
    if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
        symtab_index == (INDIRECT_SYMBOL_LOCAL   | INDIRECT_SYMBOL_ABS)) {
      continue;
    }
    // Offset of the symbol's name in the string table
    uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
    // The symbol name itself
    char *symbol_name = strtab + strtab_offset;

The diagram in the official documentation illustrates the code above quite intuitively.

At this point we have found the symbol's name in the string table.

How to replace the implementation

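Walk the list of rebindings and compare each requested name with the symbol name; when they match, replace the implementation. Note that &symbol_name[1] skips the leading underscore that the compiler prepends to C symbol names. The code is clear enough to quote directly: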

    struct rebindings_entry *cur = rebindings;
    while (cur) {
        for (uint j = 0; j < cur->rebindings_nel; j++) {
            if (strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
                if (cur->rebindings[j].replaced != NULL &&
                    indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
                    // Hand the original implementation back to the caller
                    *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
                }
                // Point the symbol pointer at the replacement
                indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
                goto symbol_loop;
            }
        }
        cur = cur->next;
    }
  symbol_loop:;
  }
}
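The replacement step can be exercised on a mock pointer table. This is a minimal sketch assuming a flat array of slots with a parallel array of symbol names; toy_rebinding, toy_rebind and the demo functions are hypothetical, and the only part that mirrors the code above is the inner logic: skip the leading underscore, save the original unless the slot already holds the replacement, then overwrite the slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(int);

struct toy_rebinding {
    const char *name;  /* requested name, without the leading underscore */
    fn_t replacement;  /* new implementation */
    fn_t *replaced;    /* optional: receives the original implementation */
};

/* Patch every slot whose symbol matches a rebinding; returns the number of slots patched. */
size_t toy_rebind(const char *const *slot_names, fn_t *slots, size_t nslots,
                  const struct toy_rebinding *rebindings, size_t nrebindings) {
    assert(slot_names != NULL && slots != NULL && rebindings != NULL);
    size_t patched = 0;
    for (size_t i = 0; i < nslots; i++) {
        const char *symbol_name = slot_names[i];
        if (strlen(symbol_name) <= 1) {
            continue;
        }
        for (size_t j = 0; j < nrebindings; j++) {
            if (strcmp(&symbol_name[1], rebindings[j].name) != 0) {
                continue;
            }
            /* Do not record the replacement as its own original on a repeated rebind. */
            if (rebindings[j].replaced != NULL && slots[i] != rebindings[j].replacement) {
                *rebindings[j].replaced = slots[i];
            }
            slots[i] = rebindings[j].replacement;
            patched++;
            break;
        }
    }
    return patched;
}

/* Demo data: two "imported" functions and a hook that wraps the first one. */
int toy_double(int x) { return 2 * x; }
int toy_negate(int x) { return -x; }

fn_t original_double = NULL;
int toy_double_hook(int x) { return original_double(x) + 1; }

const char *const demo_names[] = { "_toy_double", "_toy_negate" };
fn_t demo_slots[] = { toy_double, toy_negate };

size_t demo_install(void) {
    struct toy_rebinding r = { "toy_double", toy_double_hook, &original_double };
    return toy_rebind(demo_names, demo_slots, 2, &r, 1);
}
```

After demo_install(), demo_slots[0](5) goes through the hook and returns 11, while original_double still reaches toy_double. Calling demo_install() a second time leaves original_double untouched, because the slot already holds the replacement; without that check the hook would end up calling itself.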

References

  • Dynamically modifying C function implementations
  • MRH: fishhook source analysis
  • fishhook
  • In-depth Analysis of Mac OS X & iOS Operating Systems
  • Programmer's Self-Cultivation
  • A Tour of the Compilation System