fishhook is an open source library from Facebook that rebinds symbols imported from external dynamic libraries in Mach-O binaries. It is important to understand why only symbols that come from dynamic libraries can be hooked this way. For the underlying principles, read the book Self-cultivation of the Programmer, which explains what static libraries and dynamic libraries are. This article focuses on the implementation of the library: how the whole process works and how to read the code that implements it.
Usage
As a test, hook the NSLog function from the Foundation framework and append some custom text to what it prints.
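    #import <Foundation/Foundation.h>
    #import "fishhook.h"

    // Holds the address of the original NSLog once the rebinding is done.
    static void (*sys_NSLog)(NSString *format, ...);

    static void hook_nslog(NSString *format, ...) {
        // Modify the printed content.
        NSString *newFormat = [format stringByAppendingString:@" haha"];
        // Expand the variadic arguments here, because they cannot be
        // forwarded as-is to another variadic function.
        va_list args;
        va_start(args, format);
        NSString *message = [[NSString alloc] initWithFormat:newFormat arguments:args];
        va_end(args);
        // Call the original function through the saved pointer.
        sys_NSLog(@"%@", message);
    }

    int main(int argc, const char * argv[]) {
        @autoreleasepool {
            NSLog(@"Hook before !");

            // Describe the rebinding of NSLog.
            struct rebinding rebindSymbol;
            rebindSymbol.name = "NSLog";
            rebindSymbol.replacement = (void *)hook_nslog;
            // Keep the address of the original function in sys_NSLog.
            rebindSymbol.replaced = (void **)&sys_NSLog;

            struct rebinding rebs[] = {rebindSymbol};
            rebind_symbols(rebs, 1);

            NSLog(@"Hook after !");
        }
        return 0;
    }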
Printed result:
Hook before !
Hook after ! haha
NSLog is now hooked and prints the custom text. Note that the replacement function has to call the saved pointer (sys_NSLog here); otherwise the original function is never called.
Analysis of the implementation principle
dyld binds so-called lazy and non-lazy symbols by updating pointers in particular sections of the __DATA segment of a Mach-O binary. fishhook takes the names of the symbols to replace through the rebind_symbols function, locates the pointer that belongs to each name, and overwrites it with the replacement. That is what rebinding a symbol means.
In a Mach-O file, the __DATA segment may contain two sections that matter for dynamic symbol binding: __nl_symbol_ptr and __la_symbol_ptr. __nl_symbol_ptr is an array of non-lazily bound pointers, which are filled in when the image is loaded. __la_symbol_ptr is an array of pointers to imported functions; each entry is usually filled in by dyld_stub_binder the first time the symbol is called.
To find the symbol name that belongs to a particular slot in one of these sections, several layers of indirection have to be followed. For both sections, the reserved1 field of the section header (struct section, defined in <mach-o/loader.h>) gives the position of the section's first entry in the indirect symbol table. The indirect symbol table is located in the __LINKEDIT segment. It is an array of indexes into the symbol table, in the same order as the pointers in the lazy and non-lazy sections. So for a struct section nl_symbol_ptr, the symbol table index of the first pointer in the section is indirect_symbol_table[nl_symbol_ptr->reserved1].
The symbol table is an array of struct nlist, and each nlist holds an offset into the string table. The string table is also located in __LINKEDIT and stores the symbol names. So for every pointer we can find its symbol, then the symbol's name, compare that name with the name of the symbol to hook, and on a match replace the function pointer.
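The chain of lookups described above can be sketched in portable C with mock tables. This is only an illustration: mock_section, mock_nlist, the table contents and the helper names are invented for this example and merely mirror the fields reserved1 and n_strx of the real structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct nlist: only the string table offset. */
typedef struct { uint32_t n_strx; } mock_nlist;

/* Simplified stand-in for a section header: only the fields used here. */
typedef struct {
    uint32_t reserved1; /* index of the first slot in the indirect symbol table */
    uint32_t count;     /* number of pointer slots in the section */
} mock_section;

/* String table: names at offsets 1, 8 and 16 (made-up content). */
const char mock_strtab[] = "\0_NSLog\0_printf\0_malloc";
/* Symbol table: entry i stores the offset of its name in the string table. */
const mock_nlist mock_symtab[] = { {1}, {8}, {16} };
/* Indirect symbol table: one symbol table index per pointer slot. */
const uint32_t mock_indirect[] = { 2, 0, 1, 2 };
/* The non-lazy section owns indirect entry 0, the lazy one entries 1..3. */
const mock_section mock_non_lazy = { 0, 1 };
const mock_section mock_lazy = { 1, 3 };

/* Follow slot -> indirect symbol table -> symbol table -> string table. */
const char *name_for_slot(const mock_section *sect, uint32_t slot) {
    assert(slot < sect->count);
    uint32_t symtab_index = mock_indirect[sect->reserved1 + slot];
    return mock_strtab + mock_symtab[symtab_index].n_strx;
}

/* Compare a slot's name with a C-level name, skipping the leading underscore. */
int slot_is_symbol(const mock_section *sect, uint32_t slot, const char *name) {
    const char *symbol = name_for_slot(sect, slot);
    return symbol[0] == '_' && strcmp(symbol + 1, name) == 0;
}
```

With these tables, name_for_slot(&mock_lazy, 0) yields "_NSLog", so slot_is_symbol(&mock_lazy, 0, "NSLog") is true. In a real image the three tables come from __LINKEDIT rather than from constants.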
The key point of the whole process is that what we replace lives in the data area; we cannot modify the code area. The symbols imported from dynamic libraries are divided into lazy and non-lazy symbols. Non-lazy symbols must be bound while the program is loading: binding means that dyld finds the function address for the symbol and writes it into the non-lazy pointer section. Lazy symbols are bound on first use. The first time such a function is called, the call reads the function's address from the lazy pointer section. Initially this address points to a fixed piece of code in the __stub_helper section, which in turn jumps to dyld_stub_binder. dyld_stub_binder then looks up the address of the external symbol and writes it into the corresponding slot of the data area, so later calls go directly to the function.
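Lazy binding can be imitated in plain C with one function pointer that stands for a slot of the lazy pointer section. This is a simplified model, not dyld's code: real_add, stub_helper, mock_stub_binder and the other names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*add_fn)(int, int);

/* The "real" function that lives in a dynamic library. */
int real_add(int a, int b) { return a + b; }

/* Counts how often the binder ran, to show that it runs only once. */
int binder_calls = 0;

int stub_helper(int a, int b);

/* One slot of the lazy pointer section. Before the first call it points
 * at the helper code instead of the real function. */
add_fn la_symbol_ptr_add = stub_helper;

/* Stands in for dyld_stub_binder: resolve the symbol and patch the slot. */
add_fn mock_stub_binder(void) {
    binder_calls++;
    la_symbol_ptr_add = real_add;
    return real_add;
}

/* Stands in for the code in __stub_helper: bind, then forward the call. */
int stub_helper(int a, int b) {
    add_fn target = mock_stub_binder();
    return target(a, b);
}

/* Every call site goes through the slot and never calls real_add directly. */
int call_add(int a, int b) {
    assert(la_symbol_ptr_add != NULL);
    return la_symbol_ptr_add(a, b);
}
```

The first call_add goes through stub_helper and patches the slot; every later call reads the patched slot and reaches real_add directly. Because every call site reads the slot, overwriting that one pointer is enough to redirect all calls, which is what makes this kind of hook possible.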
The implementation process
The whole implementation amounts to parsing a file format. If you have ever parsed MP4, FLV or similar files, the process will feel familiar.
Step 1: Find the image of the current executable
    #include <mach-o/dyld.h>
    #include <mach-o/loader.h>

    uint32_t count = _dyld_image_count();
    int executeIndex = -1;
    for (uint32_t i = 0; i < count; i++) {
        const struct mach_header *machHeader = _dyld_get_image_header(i);
        // The main program is the image whose file type is MH_EXECUTE.
        if (machHeader->filetype == MH_EXECUTE) {
            executeIndex = (int)i;
            break;
        }
    }
The function _dyld_image_count returns the number of images loaded by the current program. The loop then searches them for the index of the main program image.
Step 2: Find the symbol table command, the dynamic symbol table command and the __LINKEDIT segment command
    #ifdef __LP64__
    typedef struct mach_header_64 mach_header_t;
    typedef struct segment_command_64 segment_command_t;
    typedef struct section_64 section_t;
    typedef struct nlist_64 nlist_t;
    #define LC_SEGMENT_ARCH_DEPENDENT LC_SEGMENT_64
    #else
    typedef struct mach_header mach_header_t;
    typedef struct segment_command segment_command_t;
    typedef struct section section_t;
    typedef struct nlist nlist_t;
    #define LC_SEGMENT_ARCH_DEPENDENT LC_SEGMENT
    #endif

    const struct mach_header *machHeader = _dyld_get_image_header(executeIndex);
    // The load commands start right after the Mach-O header.
    uintptr_t cur = (uintptr_t)machHeader + sizeof(mach_header_t);
    // The symbol table command
    struct symtab_command *symCommand = NULL;
    // The dynamic symbol table command
    struct dysymtab_command *dysymCommand = NULL;
    // The __LINKEDIT segment command
    segment_command_t *linked_cmd = NULL;
    for (uint32_t i = 0; i < machHeader->ncmds; i++) {
        struct load_command *command = (struct load_command *)cur;
        if (command->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
            segment_command_t *segmentCmd = (segment_command_t *)command;
            // Segment commands are told apart by their segment name.
            if (strcmp(segmentCmd->segname, SEG_LINKEDIT) == 0) {
                linked_cmd = segmentCmd;
            }
        } else if (command->cmd == LC_SYMTAB) {
            symCommand = (struct symtab_command *)command;
        } else if (command->cmd == LC_DYSYMTAB) {
            dysymCommand = (struct dysymtab_command *)command;
        }
        // Each command records its own size, which leads to the next one.
        cur += command->cmdsize;
    }
The code above finds the symbol table command, the dynamic symbol table command and the __LINKEDIT segment command.
Step 3: Get the section headers for the lazy and non-lazy symbol pointer arrays
In the binary examined here, the function pointer array for non-lazy symbols is in the __got section of the data area, and the function pointer array for lazy symbols is in the __la_symbol_ptr section. The two section headers belong to the segment commands named SEG_DATA and SEG_DATA_CONST: the __got section header is in the SEG_DATA_CONST segment command, and the __la_symbol_ptr section header is in the SEG_DATA segment command.
    // Declared before the load command loop of step 2.
    section_t *lazySection = NULL;
    section_t *nonLazySection = NULL;

    // Inside the LC_SEGMENT_ARCH_DEPENDENT branch of that loop:
    if (strcmp(segmentCmd->segname, SEG_DATA) == 0 ||
        strcmp(segmentCmd->segname, SEG_DATA_CONST) == 0) {
        // The section headers follow the segment command directly.
        section_t *sections = (section_t *)((uintptr_t)segmentCmd + sizeof(segment_command_t));
        for (uint32_t j = 0; j < segmentCmd->nsects; j++) {
            section_t mSection = sections[j];
            if ((mSection.flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
                // Section header of the lazy symbol pointers
                lazySection = &sections[j];
                NSLog(@"section name %s", lazySection->sectname);
            }
            if ((mSection.flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
                // Section header of the non-lazy symbol pointers
                nonLazySection = &sections[j];
                NSLog(@"section name %s", nonLazySection->sectname);
            }
        }
    }
SEG_DATA and SEG_DATA_CONST are matched in the same way as SEG_LINKEDIT: all of them are LC_SEGMENT_ARCH_DEPENDENT commands and are told apart by their segment name. Within them, the section type stored in the flags field identifies the section headers of the lazy and non-lazy symbol pointers. The reserved1 field of such a section header is the index in the indirect symbol table of the first entry of the section's pointer array. The addr field is the address of the pointer array as recorded in the file; the ASLR slide has to be added to it to obtain the actual virtual memory address.
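The test on the flags field keeps only the low byte, which holds the section type, and ignores the attribute bits above it. A small portable sketch, assuming the constant values below match those in <mach-o/loader.h> (they are redefined here with a MOCK_ prefix so the example compiles without the Apple headers; check them against your SDK):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to equal SECTION_TYPE, S_NON_LAZY_SYMBOL_POINTERS and
 * S_LAZY_SYMBOL_POINTERS from <mach-o/loader.h>. */
#define MOCK_SECTION_TYPE               0x000000ffu
#define MOCK_S_NON_LAZY_SYMBOL_POINTERS 0x6u
#define MOCK_S_LAZY_SYMBOL_POINTERS     0x7u

typedef enum { SLOT_KIND_OTHER, SLOT_KIND_NON_LAZY, SLOT_KIND_LAZY } slot_kind;

/* Classify a section by the type stored in the low byte of its flags. */
slot_kind classify_section(uint32_t flags) {
    switch (flags & MOCK_SECTION_TYPE) {
    case MOCK_S_NON_LAZY_SYMBOL_POINTERS:
        return SLOT_KIND_NON_LAZY;
    case MOCK_S_LAZY_SYMBOL_POINTERS:
        return SLOT_KIND_LAZY;
    default:
        return SLOT_KIND_OTHER;
    }
}

/* Number of pointer slots in a section that is `size` bytes long. */
uint64_t slot_count(uint64_t size, uint64_t pointer_size) {
    assert(pointer_size != 0);
    return size / pointer_size;
}
```

Matching on the section type rather than on the section name is what lets the same code handle a non-lazy pointer section regardless of whether it is called __got or __nl_symbol_ptr.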
Step 4: Understand ASLR and calculate the in-memory addresses of the symbol table, the indirect symbol table and the string table
- ASLR: Put simply, every launch of the app gets a random address offset. Modern computers use virtual memory, so without further measures a program would be loaded at the same fixed address every time, which is a security problem. To avoid that, a random offset is added to the load address each time the application starts. This is the so-called ASLR.
- Calculate the base address: compute the base address without ASLR from the vmaddr and fileoff fields of the __LINKEDIT segment, then add the ASLR slide to get the actual base address.
- Calculate the address of the symbol table: base + symoff gives the first address of the symbol table, an array of nlist_t structures.
- Calculate the address of the indirect symbol table: base + indirectsymoff gives the first address of the indirect symbol table.
- Calculate the address of the string table: base + stroff gives the first address of the string table.
    // The random ASLR offset of the image
    intptr_t slide = _dyld_get_image_vmaddr_slide(executeIndex);
    // Base address to which file offsets in __LINKEDIT are added
    uintptr_t linked_base_address = linked_cmd->vmaddr - linked_cmd->fileoff + slide;
    // The symbol table
    nlist_t *symbolList = (nlist_t *)(linked_base_address + symCommand->symoff);
    // The indirect symbol table
    uint32_t *dysmList = (uint32_t *)(linked_base_address + dysymCommand->indirectsymoff);
    // The string table
    char *strList = (char *)(linked_base_address + symCommand->stroff);
That is the whole calculation. symCommand, dysymCommand and linked_cmd were obtained in step 2 above.
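The arithmetic can be checked with numbers. symoff, stroff and indirectsymoff are file offsets, while the __LINKEDIT segment whose file offset is fileoff is mapped at vmaddr + slide, so a file offset f ends up at vmaddr - fileoff + slide + f. The sketch below uses a hypothetical mock_layout struct and made-up values, not values read from a real binary.

```c
#include <assert.h>
#include <stdint.h>

/* Only the fields that take part in the calculation (made-up struct). */
typedef struct {
    uint64_t linkedit_vmaddr;  /* vmaddr of the __LINKEDIT segment */
    uint64_t linkedit_fileoff; /* fileoff of the __LINKEDIT segment */
    uint64_t slide;            /* ASLR slide of the image */
    uint32_t symoff;           /* file offset of the symbol table */
    uint32_t stroff;           /* file offset of the string table */
    uint32_t indirectsymoff;   /* file offset of the indirect symbol table */
} mock_layout;

/* Made-up example values. */
const mock_layout example_layout = {
    0x100008000ULL, 0x8000ULL, 0x4000ULL, 0x8100u, 0x8400u, 0x8300u
};

/* Base address to which file offsets inside __LINKEDIT are added. */
uint64_t linkedit_base(const mock_layout *l) {
    assert(l->linkedit_vmaddr >= l->linkedit_fileoff);
    return l->linkedit_vmaddr - l->linkedit_fileoff + l->slide;
}

/* In-memory address of a table stored at the given file offset. */
uint64_t table_address(const mock_layout *l, uint32_t file_offset) {
    return linkedit_base(l) + file_offset;
}
```

For the example values the base is 0x100008000 - 0x8000 + 0x4000 = 0x100004000, so the symbol table is at 0x10000C100, the indirect symbol table at 0x10000C300 and the string table at 0x10000C400.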
Step 5: Iterate through the __got and __la_symbol_ptr sections
The final step is to walk the __got section of the data area (the array of function addresses for non-lazy symbols) and the __la_symbol_ptr section (the array of function addresses for lazy symbols), compare the name of each entry's symbol with the name to hook, and replace the pointer where the names match.
    // Walk the non-lazy symbol pointers in __got.
    // Note: __DATA_CONST may be mapped read-only, in which case the page
    // protection has to be changed before writing.
    int gotSymbolNum = (int)(nonLazySection->size / sizeof(void *));
    void **gotSymbolValue = (void **)((uintptr_t)slide + nonLazySection->addr);
    for (int i = 0; i < gotSymbolNum; i++) {
        // Index of this slot in the indirect symbol table
        uint32_t dysm_index = nonLazySection->reserved1 + i;
        // Index of the symbol in the symbol table
        uint32_t symtab_index = dysmList[dysm_index];
        if (symtab_index == INDIRECT_SYMBOL_ABS ||
            symtab_index == INDIRECT_SYMBOL_LOCAL ||
            symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
            continue;
        }
        // Find the corresponding symbol and its name.
        nlist_t findSymbol = symbolList[symtab_index];
        char *mSymbolName = strList + findSymbol.n_un.n_strx;
        bool symbol_name_longer_than_1 = mSymbolName[0] && mSymbolName[1];
        if (symbol_name_longer_than_1) {
            // &mSymbolName[1] skips the leading underscore of the symbol name.
            if (strcmp(&mSymbolName[1], symbolName) == 0) {
                NSLog(@"Find symbolName : %s", symbolName);
                // Replace the function pointer.
                gotSymbolValue[i] = replaceFunc;
            }
        }
    }
    // Walk the lazy symbol pointers in __la_symbol_ptr.
    int lazySymbolNum = (int)(lazySection->size / sizeof(void *));
    void **laSymbolValue = (void **)((uintptr_t)slide + lazySection->addr);
    for (int i = 0; i < lazySymbolNum; i++) {
        // Index of this slot in the indirect symbol table
        uint32_t dysm_index = lazySection->reserved1 + i;
        // Index of the symbol in the symbol table
        uint32_t symtab_index = dysmList[dysm_index];
        if (symtab_index == INDIRECT_SYMBOL_ABS ||
            symtab_index == INDIRECT_SYMBOL_LOCAL ||
            symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
            continue;
        }
        // Offset of the symbol name in the string table
        uint32_t str_offset = symbolList[symtab_index].n_un.n_strx;
        char *symbolStr = strList + str_offset;
        bool symbol_name_longer_than_1 = symbolStr[0] && symbolStr[1];
        if (symbol_name_longer_than_1) {
            if (strcmp(&symbolStr[1], symbolName) == 0) {
                NSLog(@"Find symbol : %s", symbolName);
                // Replace the function pointer.
                laSymbolValue[i] = replaceFunc;
            }
        }
    }
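The two loops share one pattern, which can be tried outside of a Mach-O image with mock data. The sketch below is an illustration under assumptions: rebind_slots and the demo_ tables are invented names, the MOCK_ constants are assumed to equal INDIRECT_SYMBOL_LOCAL and INDIRECT_SYMBOL_ABS from <mach-o/loader.h>, and addresses of plain int objects stand in for function addresses. Unlike the loops above, it also saves the old pointer, which is what the replaced field of struct rebinding is for.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to equal INDIRECT_SYMBOL_LOCAL and INDIRECT_SYMBOL_ABS. */
#define MOCK_INDIRECT_SYMBOL_LOCAL 0x80000000u
#define MOCK_INDIRECT_SYMBOL_ABS   0x40000000u

/* Dummy objects whose addresses stand in for function addresses. */
int target_a, target_b, target_hook;

/* Mock data: slot 0 is _foo, slot 1 is a local entry, slot 2 is _bar. */
void *demo_slots[3] = { &target_a, &target_b, &target_a };
const uint32_t demo_indirect[3] = { 0, MOCK_INDIRECT_SYMBOL_LOCAL, 1 };
const uint32_t demo_strx[2] = { 1, 6 };
const char demo_strtab[] = "\0_foo\0_bar";
void *demo_saved = NULL;

/* Patch every slot whose symbol is "_" followed by name.
 * indirect holds the indirect symbol table entries of this section,
 * strx the string table offset of each symbol table entry.
 * Returns the number of slots patched. */
int rebind_slots(void **slots, uint32_t count, const uint32_t *indirect,
                 const uint32_t *strx, const char *strtab,
                 const char *name, void *replacement, void **replaced) {
    assert(slots && indirect && strx && strtab && name);
    int patched = 0;
    for (uint32_t i = 0; i < count; i++) {
        uint32_t symtab_index = indirect[i];
        /* Local and absolute entries have no symbol table entry. */
        if (symtab_index == MOCK_INDIRECT_SYMBOL_ABS ||
            symtab_index == MOCK_INDIRECT_SYMBOL_LOCAL ||
            symtab_index == (MOCK_INDIRECT_SYMBOL_LOCAL | MOCK_INDIRECT_SYMBOL_ABS)) {
            continue;
        }
        const char *symbol = strtab + strx[symtab_index];
        /* The name needs the underscore plus at least one more character. */
        if (symbol[0] == '\0' || symbol[1] == '\0') continue;
        if (strcmp(symbol + 1, name) != 0) continue;
        /* Save the old pointer so the hook can still call the original. */
        if (replaced != NULL && slots[i] != replacement) *replaced = slots[i];
        slots[i] = replacement;
        patched++;
    }
    return patched;
}
```

Calling rebind_slots(demo_slots, 3, demo_indirect, demo_strx, demo_strtab, "bar", &target_hook, &demo_saved) patches only slot 2 and stores its previous value in demo_saved; the local entry in slot 1 is skipped without touching the symbol table.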
Conclusion
This article has mainly analyzed the process of locating symbols. For the details of the actual implementation, compare it against the fishhook source code.
References
Self-cultivation of the Programmer
Mac OSX & iOS
Explore the Mach-O file
Self-cultivation for iOS programmers