It has been a while since my last article. Not many people read them, but they make good self-entertainment. Some time ago I went out and took a bit of a beating, so I started working hard and studying some of the well-known open source libraries I had used before. Without further ado: fishhook is a library created by Facebook that can rebind externally linked C functions (functions we did not write ourselves, which usually live in dynamic libraries loaded by dyld at app launch). The entire file is just over 200 lines of code.

Usage

If you are using a third-party framework that keeps printing useless information, and there is no good alternative to replace it, you can use fishhook to hook the corresponding printing function. For example:

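#include <stdio.h>
#include "fishhook.h"

// Filled in by fishhook with the address of the original printf.
static int (*orig_printf)(const char *format, ...);

int main(int argc, const char * argv[]) {
	printf("abcd");
	return 0;
}

static int my_printf(const char *s, ...) {
	// you can write whatever code you want to replace
	// for example
	return orig_printf("dcba");
}

// Runs before main, so the hook is in place when printf is first called.
__attribute__((constructor)) void injected_function(){
	rebind_symbols((struct rebinding[1]){{"printf", my_printf, (void *)&orig_printf}}, 1);
}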

The function is int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel). The first argument is an array of structs, and the second is the number of elements in that array. The struct is defined as follows:

struct rebinding {
  const char *name; 
  void *replacement;
  void **replaced;
};

name is the name of the function you want to hook, replacement is the function pointer to install in its place, and replaced is a pointer to a function pointer that you pass in (if the function is successfully replaced, the address of the original function is stored there).

Of course, fishhook is often used in reverse engineering. Note that fishhook can only replace functions in externally linked dynamic libraries, not C functions we write ourselves.

Mach-O

Before reading fishhook, I think it is necessary to know something about Mach-O. Mach-O is the executable file format for iOS and macOS. Building an iOS project with Command+B generates a .app bundle in the Products directory. The file inside the bundle with the same name as the app is the Mach-O file, which contains the app's classes, methods, and the constants determined at compile time. A Mach-O file consists of three parts:

  • Header: contains basic Mach-O information, such as 32-bit or 64-bit and the number of load commands
  • Load commands: these follow the header immediately and determine the memory layout when the Mach-O is loaded
  • Data: contains the actual data, which is divided into multiple segments. Each segment is divided into multiple sections, which hold specific content such as code and data

    Let's actually look at a Mach-O file with a tool: create an iOS project, leave it unchanged, build it, and open the result in MachOView.

    Because this article focuses on fishhook, our main concern is the __la_symbol_ptr section in the __DATA segment. This section is the lazy symbol pointer table. For a function we write ourselves, the address is fixed at compile time and written into the Mach-O, whereas the address of a system function such as printf is not known at compile time. We also need to pay attention to the Symbol Table, the Dynamic Symbol Table, and the __LINKEDIT segment.
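To make the difference between the two kinds of calls concrete, here is a minimal portable sketch. It does not touch a real Mach-O: lib_answer_slot is a hypothetical stand-in for one __la_symbol_ptr entry, and all function names are made up for illustration. It shows why swapping a pointer slot redirects calls to an external function but leaves calls to our own function untouched.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*answer_fn)(void);

/* Stands in for a function that lives in an external dynamic library. */
static int lib_answer(void) { return 1; }

/* Stands in for a function compiled into our own binary. */
static int own_answer(void) { return 1; }

/* Hypothetical model of one lazy symbol pointer entry: a writable slot
 * that every call to the external function goes through. */
static answer_fn lib_answer_slot = lib_answer;

/* External calls are indirect: the target is read from the slot at call time. */
static int call_external(void) { return lib_answer_slot(); }

/* Calls to our own code are direct: the target is fixed at build time. */
static int call_own(void) { return own_answer(); }

/* A replacement we want external callers to reach instead. */
static int hooked_answer(void) { return 2; }

/* Overwrite the slot and hand back the previous target, as a hook would. */
static answer_fn install_hook(answer_fn replacement) {
    assert(replacement != NULL);
    answer_fn previous = lib_answer_slot;
    lib_answer_slot = replacement;
    return previous;
}
```

After install_hook(hooked_answer), call_external() reaches the replacement while call_own() still reaches own_answer, and the returned pointer plays the role that replaced plays in struct rebinding.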

There is much more to Mach-O than I want to describe here. If you already know something about Mach-O, you can go on to read the fishhook source code. If you want to know more about Mach-O, you can read this blog

Read the source code

Let's get started and read the fishhook source code, beginning with int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel):

int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
  int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
  if (retval < 0) {
    return retval;
  }
  // If this was the first call, register callback for image additions (which is also invoked for
  // existing images), otherwise, just run on existing images
  if (!_rebindings_head->next) {
    _dyld_register_func_for_add_image(_rebind_symbols_for_image);
  } else {
    uint32_t c = _dyld_image_count();
    for (uint32_t i = 0; i < c; i++) {
      _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
    }
  }
  return retval;
}

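rebind_symbols first calls prepend_rebindings. The first argument is the address of static struct rebindings_entry *_rebindings_head, the head of a private linked list. The second is the array of structs we passed in, and the third is the length of that array.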

static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                              struct rebinding rebindings[],
                              size_t nel) {
  struct rebindings_entry *new_entry = malloc(sizeof(struct rebindings_entry));
  if (!new_entry) {
    return -1;
  }
  new_entry->rebindings = malloc(sizeof(struct rebinding) * nel);
  if (!new_entry->rebindings) {
    free(new_entry);
    return -1;
  }
  memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
  new_entry->rebindings_nel = nel;
  new_entry->next = *rebindings_head;
  *rebindings_head = new_entry;
  return 0;
}

The code here is relatively simple:

  • Allocate a struct rebindings_entry
  • Allocate the array inside the struct
  • Copy the values of the struct array we passed in into the newly allocated array
  • Place the new struct at the head of the linked list

Back in rebind_symbols, the next pointer of the list head is used to determine whether this is the first call. If it is, _dyld_register_func_for_add_image is called with the _rebind_symbols_for_image function pointer.
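The list handling and the first-call check can be sketched in portable C as follows. The types hook and hook_entry and both function names are simplified, hypothetical stand-ins for fishhook's private structures, not its actual definitions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct rebinding. */
struct hook {
    const char *name;
};

/* Hypothetical stand-in for struct rebindings_entry: one node per call. */
struct hook_entry {
    struct hook *hooks;
    size_t count;
    struct hook_entry *next;
};

/* Copy the caller's array into a new node and push it on the front of the list.
 * Returns 0 on success and -1 if an allocation fails. */
static int prepend_hooks(struct hook_entry **head,
                         const struct hook hooks[], size_t count) {
    assert(head != NULL);
    struct hook_entry *entry = malloc(sizeof *entry);
    if (!entry) {
        return -1;
    }
    entry->hooks = malloc(sizeof *entry->hooks * count);
    if (!entry->hooks) {
        free(entry);
        return -1;
    }
    memcpy(entry->hooks, hooks, sizeof *entry->hooks * count);
    entry->count = count;
    entry->next = *head;   /* the old head becomes the second node */
    *head = entry;
    return 0;
}

/* A list with exactly one node means this was the first registration. */
static int is_first_registration(const struct hook_entry *head) {
    return head != NULL && head->next == NULL;
}
```

Because new nodes go on the front, the most recent registration is the first one seen when the list is walked later, and a head whose next is NULL can only occur right after the very first call.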

_dyld_register_func_for_add_image registers a custom callback and also invokes it for every dynamic library or executable that is already loaded.

The callback for each image is _rebind_symbols_for_image, which is just a wrapper around rebind_symbols_for_image. That implementation is long, so we split it into two parts. The first part:

static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
  Dl_info info;
  if (dladdr(header, &info) == 0) {
    return;
  }

  segment_command_t *cur_seg_cmd;
  segment_command_t *linkedit_segment = NULL;
  struct symtab_command* symtab_cmd = NULL;
  struct dysymtab_command* dysymtab_cmd = NULL;

  uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
        linkedit_segment = cur_seg_cmd;
      }
    } else if (cur_seg_cmd->cmd == LC_SYMTAB) {
      symtab_cmd = (struct symtab_command*)cur_seg_cmd;
    } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
      dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
    }
  }
  ...
}

This code retrieves the load command structures for the Symbol Table, the Dynamic Symbol Table, and the __LINKEDIT segment:

  • Because the load commands immediately follow the mach_header, uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t); gives the location of the first load command
  • We then iterate over the load commands, advancing by cmdsize each time, and pick out the structures we need based on the value of cmd
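The walk over variable-size commands can be sketched without any Mach-O headers. In this sketch mini_header, mini_cmd, the CMD_ constants, and find_cmd are hypothetical miniatures; the real structures in <mach-o/loader.h> carry more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a Mach-O header: only the command count. */
struct mini_header {
    uint32_t ncmds;
};

/* Hypothetical miniature of a load command: every command starts with
 * its type and its own total size in bytes. */
struct mini_cmd {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Illustrative command types for this sketch. */
enum { CMD_SEGMENT = 1, CMD_SYMTAB = 2, CMD_DYSYMTAB = 11 };

/* Return the first command of the wanted type, or NULL if there is none. */
static const struct mini_cmd *find_cmd(const struct mini_header *header,
                                       uint32_t wanted) {
    assert(header != NULL);
    /* The commands start right after the header. */
    uintptr_t cur = (uintptr_t)header + sizeof(struct mini_header);
    for (uint32_t i = 0; i < header->ncmds; i++) {
        const struct mini_cmd *c = (const struct mini_cmd *)cur;
        if (c->cmd == wanted) {
            return c;
        }
        cur += c->cmdsize;   /* each command records its own size */
    }
    return NULL;
}
```

The key point is that commands are not a fixed-size array: the only way to reach command i+1 is to add the cmdsize of command i, which is exactly what the for loop in rebind_symbols_for_image does.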

Reading on, the second part:

static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
    ...
  // Find base symbol/string table addresses
  uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
  nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
  char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);

  // Get indirect symbol table (array of uint32_t indices into symbol table)
  uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);

  cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
          strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
        continue;
      }
      for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
        section_t *sect =
          (section_t *)(cur + sizeof(segment_command_t)) + j;
        if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
        if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
      }
    }
  }
}

The base address is computed as slide + __LINKEDIT->vmaddr - __LINKEDIT->fileoff. Here vmaddr is the virtual memory address of __LINKEDIT and fileoff is the offset of __LINKEDIT within the Mach-O file. So what is slide? It is the random offset introduced by ASLR (address space layout randomization), which shifts where the image is loaded in memory.

After obtaining the base address, the symbol table, string table, and indirect symbol table are located by adding their file offsets to it. The load commands are then iterated again to find the __DATA and __DATA_CONST segments, and the __nl_symbol_ptr and __la_symbol_ptr sections in them are rebound by calling perform_rebinding_with_section:

static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab) {
  uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
  void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
  for (uint i = 0; i < section->size / sizeof(void *); i++) {
    uint32_t symtab_index = indirect_symbol_indices[i];
    if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
        symtab_index == (INDIRECT_SYMBOL_LOCAL   | INDIRECT_SYMBOL_ABS)) {
      continue;
    }
    uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
    char *symbol_name = strtab + strtab_offset;
    bool symbol_name_longer_than_1 = symbol_name[0] && symbol_name[1];
    struct rebindings_entry *cur = rebindings;
    while (cur) {
      for (uint j = 0; j < cur->rebindings_nel; j++) {
        if (symbol_name_longer_than_1 &&
            strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
          if (cur->rebindings[j].replaced != NULL &&
              indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
            *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
          }
          indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
          goto symbol_loop;
        }
      }
      cur = cur->next;
    }
  symbol_loop:;
  }
}

This function looks a bit long, but the logic is easy to follow. First, the section's entries in the indirect symbol table are located by adding the section's reserved1 value to the address of the indirect symbol table (reserved1 is the section's starting index into that table). The for loop then obtains the symbol name for each pointer in the section. For each name, the private linked list of struct rebindings_entry is walked, and the name of every rebinding in each node's array is compared with the current symbol name. On a match, the pointer currently stored in the symbol pointer table is saved into the replaced pointer we passed in, and the table entry is overwritten with our replacement function pointer. That completes the swap. The official fishhook diagram illustrates this lookup chain.
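The lookup chain can also be sketched in portable C with tiny in-memory tables. Everything here is a hypothetical miniature: mini_sym stands in for nlist, mini_rebinding for struct rebinding, and a single flat array of rebindings replaces the linked list. The sketch also omits the INDIRECT_SYMBOL_ABS and INDIRECT_SYMBOL_LOCAL checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature of a symbol table entry. */
struct mini_sym {
    uint32_t strx;   /* offset of the symbol name in the string table */
};

/* Hypothetical miniature of struct rebinding. */
struct mini_rebinding {
    const char *name;
    void *replacement;
    void **replaced;
};

/* For each pointer slot: indirect index -> symbol -> name -> compare -> swap.
 * Returns the number of slots that were patched. */
static int rebind_slots(void **slots, size_t nslots,
                        const uint32_t *indirect_indices,
                        const struct mini_sym *symtab, const char *strtab,
                        const struct mini_rebinding *rebindings,
                        size_t nrebindings) {
    assert(slots && indirect_indices && symtab && strtab && rebindings);
    int patched = 0;
    for (size_t i = 0; i < nslots; i++) {
        const char *symbol = strtab + symtab[indirect_indices[i]].strx;
        if (!symbol[0] || !symbol[1]) {
            continue;   /* nothing after the one-character prefix */
        }
        for (size_t j = 0; j < nrebindings; j++) {
            /* Skip the leading underscore that C symbol names carry. */
            if (strcmp(symbol + 1, rebindings[j].name) != 0) {
                continue;
            }
            if (rebindings[j].replaced &&
                slots[i] != rebindings[j].replacement) {
                *rebindings[j].replaced = slots[i];   /* save the original */
            }
            slots[i] = rebindings[j].replacement;     /* install the hook */
            patched++;
            break;
        }
    }
    return patched;
}
```

The slot index i is the link between the two worlds: the same i selects both the pointer to overwrite and the indirect table entry that tells us which symbol that pointer belongs to.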