  • Self-Cultivation of an iOS Programmer (1): Compiling and Linking
  • What’s in Mach-O
  • Self-Cultivation of an iOS Programmer (3): Static Linking of Mach-O Files
  • Self-Cultivation of an iOS Programmer (4): Executable File Loading
  • Self-Cultivation of an iOS Programmer (5): Dynamic Linking of Mach-O Files
  • Self-Cultivation of an iOS Programmer (6): Dynamic Linking Applied: The fishhook Principle
  • Self-Cultivation of an iOS Programmer (7): Static Linking Applied: The Principle of Static Library Instrumentation
  • Self-Cultivation of an iOS Programmer (8): Memory

Preface

There is a classic interview question: why can fishhook only hook C functions in dynamic libraries? Or, conversely, why can’t fishhook hook OC methods or our own C functions?

First, take a look at fishhook’s own description of how it works:

dyld binds lazy and non-lazy symbols by updating pointers in particular sections of the __DATA segment of a Mach-O binary. fishhook re-binds these symbols by determining the locations to update for each of the symbol names passed to rebind_symbols and then writing out the corresponding replacements.

In other words, fishhook hooks by rebinding the lazy and non-lazy symbol pointers in the __DATA segment of a Mach-O image. It locates the pointer for each requested symbol and overwrites it with the address of a custom replacement function, so calls to the original function land in the replacement.

Let’s answer the two questions above:

  1. Why can’t fishhook hook OC methods? Every OC method call is ultimately compiled into a C function call. Recall testPrintMethod from the dynamic linking example: the call compiles to a call to objc_msgSend, whose pointer is stored in the Lazy Symbol Pointers table of the __DATA segment. Through the PLT-style lazy binding mechanism examined in the previous dynamic linking article, that pointer initially leads to the dyld_stub_binder function, which is reached through the GOT. Hooking objc_msgSend itself is therefore feasible: the __DATA segment is readable and writable, so the objc_msgSend pointer in the loaded Mach-O image can be modified, and this is the core principle by which fishhook hooks C functions from dynamic libraries. An individual OC method, however, has no entry of its own in these tables, so there is nothing for fishhook to rebind.
  2. Why can’t fishhook hook our own C functions? Recall the first case in the introduction to PIC in the dynamic linking article: for a call inside the same module, the jump target is already fixed by PC-relative addressing when the Mach-O file is generated. The instruction is stored in the __TEXT segment, which is readable and executable but not writable, so it cannot be modified.
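The difference between the two cases can be mimicked in portable C. This is only a sketch under assumptions: symbol_pointer is a hypothetical stand-in for one entry of the writable symbol pointer table, and rebind imitates the save-then-overwrite step; none of these names come from fishhook or dyld.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fn_ptr)(int);

int original_impl(int x) { return x + 1; }
int replacement_impl(int x) { return x + 100; }

/* Hypothetical stand-in for one slot of the writable symbol pointer table. */
fn_ptr symbol_pointer = original_impl;

/* Call through the pointer slot, like a call to an imported library function. */
int call_imported(int x) { return symbol_pointer(x); }

/* Direct call: the target is fixed in the code itself when the binary is built. */
int call_internal(int x) { return original_impl(x); }

/* Imitates a rebind: save the old pointer for the caller, then overwrite the slot. */
void rebind(fn_ptr replacement, fn_ptr *replaced) {
    if (replaced != NULL) {
        *replaced = symbol_pointer;
    }
    symbol_pointer = replacement;
}
```

After rebind(replacement_impl, &saved), call_imported reaches the replacement while call_internal still reaches original_impl, because the direct call never consults the pointer slot.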

fishhook source code analysis

The fishhook code is very small: the whole of fishhook.c is only 257 lines. Building on the Mach-O file structure and dynamic linking covered earlier, the following walks through the fishhook source. Before analyzing the entry function rebind_symbols, it helps to understand fishhook’s internal data structure. fishhook maintains a singly linked list internally, and each node has the following structure:

struct rebindings_entry {
    struct rebinding *rebindings;   // Array of rebinding structures
    size_t rebindings_nel;          // Number of elements in the array
    struct rebindings_entry *next;  // Next node in the list
};

The rebinding structures stored in the rebindings array are the ones we define when using fishhook. Each has the following structure:

struct rebinding {
    const char *name;    // Name of the function to hook
    void *replacement;   // Pointer to the new function
    void **replaced;     // Address of the variable that receives the original function pointer
};

Next, look at how prepend_rebindings builds this data structure:

static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                              struct rebinding rebindings[],
                              size_t nel) {
    // Allocate a new list node
    struct rebindings_entry *new_entry = (struct rebindings_entry *) malloc(sizeof(struct rebindings_entry));
    if (!new_entry) {
        return -1;
    }
    new_entry->rebindings = (struct rebinding *) malloc(sizeof(struct rebinding) * nel);
    if (!new_entry->rebindings) {
        free(new_entry);
        return -1;
    }
    // Copy the caller's rebindings into new_entry->rebindings
    memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
    new_entry->rebindings_nel = nel;
    // Insert the new node at the head of the list
    new_entry->next = *rebindings_head;
    *rebindings_head = new_entry;
    return 0;
}
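The effect of prepending can be checked with a small self-contained sketch. The structures and the prepend logic are repeated from the article so the block compiles on its own; the helper list_length is a hypothetical addition for illustration and is not part of fishhook.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct rebinding {
    const char *name;
    void *replacement;
    void **replaced;
};

struct rebindings_entry {
    struct rebinding *rebindings;
    size_t rebindings_nel;
    struct rebindings_entry *next;
};

/* Same logic as prepend_rebindings, repeated here so the sketch is standalone. */
int prepend(struct rebindings_entry **head, struct rebinding rebindings[], size_t nel) {
    struct rebindings_entry *entry = malloc(sizeof(*entry));
    if (!entry) {
        return -1;
    }
    entry->rebindings = malloc(sizeof(struct rebinding) * nel);
    if (!entry->rebindings) {
        free(entry);
        return -1;
    }
    /* The node owns a private copy of the caller's array. */
    memcpy(entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
    entry->rebindings_nel = nel;
    entry->next = *head;
    *head = entry;
    return 0;
}

/* Hypothetical helper: count the nodes in the list. */
size_t list_length(const struct rebindings_entry *head) {
    size_t n = 0;
    for (; head != NULL; head = head->next) {
        n++;
    }
    return n;
}
```

After one call the head has no next node; after a second call the newest batch sits at the head and the first batch follows it. That property is what lets a caller detect the first registration by checking whether the head's next pointer is NULL.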

With the internal data structure understood, return to the entry function rebind_symbols:

int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
    // Insert the new rebindings at the head of the list
    int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
    if (retval < 0) {
        return retval;
    }
    // If this was the first call, register callback for image additions (which is also invoked for
    // existing images), otherwise, just run on existing images.
    // If the head node has no next node, the list only contains the node just inserted,
    // which means this is the first call.
    // _dyld_register_func_for_add_image: the callback is invoked whenever an image is loaded,
    // and also once for every image that is already loaded.
    if (!_rebindings_head->next) {
        _dyld_register_func_for_add_image(_rebind_symbols_for_image);
    } else {
        uint32_t c = _dyld_image_count();
        for (uint32_t i = 0; i < c; i++) {
            _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
        }
    }
    return retval;
}

_rebind_symbols_for_image does nothing itself; it passes straight through to rebind_symbols_for_image:
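Why the first call only registers a callback, while later calls loop over the images themselves, can be seen in a toy model. The sketch below is a mock under stated assumptions: mock_register_add_image and mock_load_image are hypothetical functions that imitate the behaviour described in the comment above (the callback fires for already loaded images and for later loads). They are not the dyld API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IMAGES 8

typedef void (*image_callback)(int image_id);

static int loaded_images[MAX_IMAGES];
static size_t loaded_count = 0;
static image_callback registered = NULL;
static int rebind_calls = 0;

/* Mock registration: run the callback for every image already loaded,
   and remember it for images loaded later. */
void mock_register_add_image(image_callback cb) {
    registered = cb;
    for (size_t i = 0; i < loaded_count; i++) {
        cb(loaded_images[i]);
    }
}

/* Mock image load: record the image and notify the registered callback, if any. */
void mock_load_image(int image_id) {
    if (loaded_count < MAX_IMAGES) {
        loaded_images[loaded_count++] = image_id;
        if (registered != NULL) {
            registered(image_id);
        }
    }
}

/* Stand-in for the per-image rebinding work: it only counts invocations. */
void count_rebind(int image_id) {
    (void)image_id;
    rebind_calls++;
}

int get_rebind_calls(void) { return rebind_calls; }
```

With two images loaded before registration, registering count_rebind triggers it twice at once, and a third load triggers it again. A second batch of rebindings gets no such replay from the existing registration, which is why the else branch walks the loaded images by hand.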

static void _rebind_symbols_for_image(const struct mach_header *header,
                                      intptr_t slide) {
    rebind_symbols_for_image(_rebindings_head, header, slide);
}

rebind_symbols_for_image locates the __nl_symbol_ptr and __la_symbol_ptr sections in the Mach-O image, along with the symbol table, the dynamic symbol table, and the string table:

static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
    Dl_info info;
    if (dladdr(header, &info) == 0) {
        return;
    }

    segment_command_t *cur_seg_cmd;
    segment_command_t *linkedit_segment = NULL;
    struct symtab_command* symtab_cmd = NULL;
    struct dysymtab_command* dysymtab_cmd = NULL;

    // Address of the header + size of the header: skip the Mach-O header to reach the load commands.
    uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
    // Iterate over the load commands
    for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
        // The current load command
        cur_seg_cmd = (segment_command_t *)cur;
        // The command type is LC_SEGMENT (or LC_SEGMENT_64)
        if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
            // Remember the __LINKEDIT segment. fishhook later computes
            // slide + __LINKEDIT vmaddr - __LINKEDIT fileoff as the base to which file offsets are added.
            // What is slide? Because of ASLR, a Mach-O file is loaded at a random address,
            // so its address differs every time it is loaded into memory.
            if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
                linkedit_segment = cur_seg_cmd;
            }
        } else if (cur_seg_cmd->cmd == LC_SYMTAB) {
            symtab_cmd = (struct symtab_command*)cur_seg_cmd;
        } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
            dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
        }
    }

    // Bail out if anything required is missing
    if (!symtab_cmd || !dysymtab_cmd || !linkedit_segment ||
        !dysymtab_cmd->nindirectsyms) {
        return;
    }

    // Find base symbol/string table addresses
    uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
    // Find the symbol table.
    nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
    // Remember the structure of the symbol table load command?
    //   struct symtab_command {
    //       uint32_t cmd;      // LC_SYMTAB
    //       uint32_t cmdsize;  // sizeof(struct symtab_command)
    //       uint32_t symoff;   // symbol table offset
    //       uint32_t nsyms;    // number of symbol table entries
    //       uint32_t stroff;   // string table offset
    //       uint32_t strsize;  // string table size in bytes
    //   };
    // stroff is the offset of the string table. Find the string table.
    char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);

    // Get indirect symbol table (array of uint32_t indices into symbol table)
    uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);

    cur = (uintptr_t)header + sizeof(mach_header_t);
    // Iterate over the load commands again
    for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
        cur_seg_cmd = (segment_command_t *)cur;
        if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
            // Only the __DATA and __DATA_CONST segments are of interest.
            if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
                strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
                continue;
            }
            // cur_seg_cmd->nsects: the number of sections in the segment
            for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
                section_t *sect = (section_t *)(cur + sizeof(segment_command_t)) + j;
                // Found __la_symbol_ptr: lazy bindings
                if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
                    perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
                }
                // Found __nl_symbol_ptr: non-lazy bindings
                if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
                    perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
                }
            }
        }
    }
}
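The linkedit_base arithmetic is easier to follow with concrete numbers. The sketch below uses made-up values (a slide of 0x4000, a __LINKEDIT segment with vmaddr 0x100008000 and fileoff 0x8000) and a hypothetical mock_linkedit struct; it assumes nothing beyond the formula in the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two __LINKEDIT fields the formula needs. */
struct mock_linkedit {
    uint64_t vmaddr;   /* address of the segment before the slide is applied */
    uint64_t fileoff;  /* offset of the segment within the file */
};

/* slide + vmaddr - fileoff: the value that turns a file offset into a runtime address. */
uint64_t linkedit_base(int64_t slide, struct mock_linkedit seg) {
    return (uint64_t)slide + seg.vmaddr - seg.fileoff;
}

/* Offsets such as symoff, stroff and indirectsymoff are file offsets,
   so adding them to the base yields the in-memory table address. */
uint64_t table_address(uint64_t base, uint32_t file_offset) {
    return base + file_offset;
}
```

With these values the base is 0x100004000, and a table at file offset 0x8100 ends up at 0x10000C100. That is the same as slide + vmaddr + (0x8100 - fileoff): the table sits 0x100 bytes into the slid __LINKEDIT segment.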

Once __nl_symbol_ptr and __la_symbol_ptr are located, perform_rebinding_with_section rebinds the matching functions in them:

static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab) {
    const bool isDataConst = strcmp(section->segname, SEG_DATA_CONST) == 0;
    // Remember reserved1? In the __nl_symbol_ptr and __la_symbol_ptr sections,
    // reserved1 is the section's starting index in the indirect symbol table.
    uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
    // Find where the function pointers of __nl_symbol_ptr and __la_symbol_ptr are stored.
    void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
    vm_prot_t oldProtection = VM_PROT_READ;
    if (isDataConst) {
        // __DATA_CONST is read-only, so make it writable before patching
        oldProtection = get_protection(rebindings);
        mprotect(indirect_symbol_bindings, section->size, PROT_READ | PROT_WRITE);
    }
    // Iterate over the symbol pointers in the section.
    for (uint i = 0; i < section->size / sizeof(void *); i++) {
        // Index of this symbol in the symbol table.
        uint32_t symtab_index = indirect_symbol_indices[i];
        if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
            symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
            continue;
        }
        // symtab[symtab_index] is the symbol table entry; n_strx is its offset in the string table.
        uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
        // Find the name of the symbol.
        char *symbol_name = strtab + strtab_offset;
        // Check that the name has at least two characters, because symbols are prefixed with an underscore "_".
        bool symbol_name_longer_than_1 = symbol_name[0] && symbol_name[1];
        // fishhook keeps a singly linked list of all registered hooks.
        struct rebindings_entry *cur = rebindings;
        // Iterate over all nodes.
        while (cur) {
            for (uint j = 0; j < cur->rebindings_nel; j++) {
                // Check whether the name from the symbol table (without the leading "_")
                // matches the name passed in to hook.
                if (symbol_name_longer_than_1 &&
                    strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
                    if (cur->rebindings[j].replaced != NULL &&
                        indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
                        // indirect_symbol_bindings[i] is the function pointer stored in
                        // __nl_symbol_ptr or __la_symbol_ptr. Save the original address
                        // into the variable supplied by the caller.
                        *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
                    }
                    // Overwrite the pointer with the replacement. Calling the original function
                    // then becomes calling the replacement function we specified.
                    indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
                    goto symbol_loop;
                }
            }
            cur = cur->next;
        }
    symbol_loop:;
    }
    if (isDataConst) {
        // Restore the original protection of the section
        int protection = 0;
        if (oldProtection & VM_PROT_READ) {
            protection |= PROT_READ;
        }
        if (oldProtection & VM_PROT_WRITE) {
            protection |= PROT_WRITE;
        }
        if (oldProtection & VM_PROT_EXECUTE) {
            protection |= PROT_EXEC;
        }
        mprotect(indirect_symbol_bindings, section->size, protection);
    }
}
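The chain of lookups (pointer slot, indirect symbol table, symbol table, string table) can be traced on tiny mock tables. Everything below is an assumption made for illustration: mock_nlist keeps only the string offset, the three tables are invented, and the handling of INDIRECT_SYMBOL_ABS and INDIRECT_SYMBOL_LOCAL entries is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for nlist: only the offset into the string table. */
struct mock_nlist {
    uint32_t n_strx;
};

/* Invented string table: "_printf" at offset 1, "_NSLog" at 9, "_open" at 16. */
const char mock_strtab[] = "\0_printf\0_NSLog\0_open";
/* Invented symbol table: entry 0 = _printf, 1 = _NSLog, 2 = _open. */
const struct mock_nlist mock_symtab[] = {{1}, {9}, {16}};
/* Invented indirect symbol table shared by all pointer sections. */
const uint32_t mock_indirect_symtab[] = {2, 0, 1};

/* Return the slot in the section whose symbol matches name, or -1 if none does.
   reserved1 is the section's first index in the indirect symbol table. */
int find_slot(const uint32_t *indirect_symtab, uint32_t reserved1, size_t slot_count,
              const struct mock_nlist *symtab, const char *strtab, const char *name) {
    const uint32_t *indices = indirect_symtab + reserved1;
    for (size_t i = 0; i < slot_count; i++) {
        const char *symbol_name = strtab + symtab[indices[i]].n_strx;
        /* Skip the leading underscore before comparing. */
        if (symbol_name[0] && symbol_name[1] && strcmp(&symbol_name[1], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Overwrite one pointer slot and hand back what it held before. */
void *rebind_slot(void **bindings, int slot, void *replacement) {
    void *original = bindings[slot];
    bindings[slot] = replacement;
    return original;
}
```

For a section with reserved1 equal to 1 and two slots, slot 0 resolves to _printf and slot 1 to _NSLog. Looking up "open" returns -1 because no slot in this section refers to it, even though the symbol exists in the symbol table.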

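That completes the walk through the fishhook source: it iterates over the load commands, locates the symbol pointer sections, and overwrites each matching pointer with the replacement function.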

fishhook process analysis

Create a test project named Testfishhook and use fishhook to hook NSLog:

@implementation ViewController

// Holds the address of the original NSLog
static void (*sys_nslog)(NSString *format, ...);

void newNslog(NSString *format, ...) {
    format = [format stringByAppendingString:@" I'm hooked!\n"];
    // Call the original NSLog (the variadic arguments are not forwarded in this demo)
    sys_nslog(format);
}

- (void)viewDidLoad {
    [super viewDidLoad];
    NSLog(@"I am NSLog");
    // Do any additional setup after loading the view.
    struct rebinding nslog;
    nslog.name = "NSLog";                  // The name of the function to be replaced
    nslog.replacement = newNslog;          // New function pointer
    nslog.replaced = (void *)&sys_nslog;   // Receives the original function pointer
    struct rebinding rebs[1] = {nslog};
    rebind_symbols(rebs, 1);
    NSLog(@"I am NSLog");
}

@end

Press Command+B in Xcode to build, then take the Mach-O file from under Products:

Open it in MachOView: the NSLog symbol in Lazy Symbol Pointers is at offset 0x5000 in the file.

Use the image list command in LLDB to view all loaded images. The first one in the list is the Mach-O file of Testfishhook, loaded at address 0x000000010d59d000. Because of ASLR, a Mach-O file is loaded at a random address, so this address differs every time the file is loaded into memory. The NSLog symbol pointer in Lazy Symbol Pointers is therefore at 0x000000010d59d000 + 0x5000. The command x 0x000000010d59d000+0x5000 shows what is stored at that address, and disassembling with dis -s shows that it leads to the NSLog function.

Then set a breakpoint at the second NSLog call: the disassembly shows that the NSLog jump now goes to newNslog.

Conclusion

fishhook’s code is short and concise. Exploring how it works gives a more complete understanding of the dynamic linking process and the internal structure of Mach-O files, and this classic library can then be used for a lot of black magic.