This is the third article in the Mach-O series
Before reading the fishhook source code, it helps to have a basic understanding of the following:
- The Mach-O file format: see "Interesting exploration of Mach-O: file format analysis"
- The Mach-O dynamic linking process: see "Interesting exploration of Mach-O: loading process"
- Operating system and compilation principles: see "In-depth analysis of Mac OS X & iOS operating systems" and "Programmer self-cultivation"
This article walks through the source in the order in which the functions are called.
What fishhook can do
A diagram from Alibaba's analysis makes the role fishhook plays clear.
fishhook works on dynamically linked libraries: it changes the implementation that a dynamically bound function resolves to.
For a C function in a dynamic library, the first call resolves the function's implementation address and stores it in a slot called __la_symbol_ptr. From the second call onward, the program simply reads the address from __la_symbol_ptr and skips the lookup. (For details, see the article on the Mach-O dynamic linking process linked above.)
So the meaning of the diagram is clear:
When the program runs, the address of a dynamically linked C function dynamic(…) is recorded in __la_symbol_ptr in the __DATA segment. At first the program knows only the symbol name of dynamic, not its implementation address. On the first call, the program goes through stub_helper in the __TEXT segment to fetch the binding information, and dyld_stub_binder updates the implementation address stored in __la_symbol_ptr. From then on, the implementation of dynamic is found directly through __la_symbol_ptr. To replace the implementation of dynamic, we only need to change the pointer in __la_symbol_ptr, which is exactly what fishhook does.
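The lazy-binding idea described above can be imitated in portable C with a single function pointer. This is only a sketch: every name in it (la_symbol_ptr as a plain variable, lazy_stub, call_dynamic, rebind_slot) is hypothetical and stands in for machinery that dyld and the linker provide on a real system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: none of these names come from dyld or fishhook. */
typedef int (*dynamic_fn)(int);

int real_dynamic(int x) { return x + 1; }      /* the "library" implementation */
int hooked_dynamic(int x) { return x * 100; }  /* our replacement */

int binder_calls = 0;
int lazy_stub(int x);

/* Plays the role of one __la_symbol_ptr slot: it starts out pointing at a stub. */
dynamic_fn la_symbol_ptr = lazy_stub;

/* Plays the role of stub_helper + dyld_stub_binder: resolve once, patch the slot. */
int lazy_stub(int x) {
    binder_calls++;
    la_symbol_ptr = real_dynamic;   /* later calls skip the binder */
    return la_symbol_ptr(x);
}

/* Every call site goes through the slot, never straight to the implementation. */
int call_dynamic(int x) {
    assert(la_symbol_ptr != NULL);
    return la_symbol_ptr(x);
}

/* What fishhook does in spirit: overwrite the slot and hand back the old value. */
dynamic_fn rebind_slot(dynamic_fn replacement) {
    dynamic_fn previous = la_symbol_ptr;
    la_symbol_ptr = replacement;
    return previous;
}
```

Calling call_dynamic twice runs the binder only once; after rebind_slot(hooked_dynamic), the same call site reaches the replacement, while the returned pointer still reaches the original.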
How fishhook is implemented
fishhook's official documentation shows how to use it:
static int (*original_open)(const char *, int, ...);

int new_open(const char *path, int oflag, ...) {
  va_list ap = {0};
  mode_t mode = 0;
  if ((oflag & O_CREAT) != 0) {
    // mode only applies to O_CREAT
    va_start(ap, oflag);
    mode = va_arg(ap, int);
    va_end(ap);
    printf("Calling real open('%s', %d, %d)\n", path, oflag, mode);
    return original_open(path, oflag, mode);
  } else {
    printf("Calling real open('%s', %d)\n", path, oflag);
    return original_open(path, oflag, mode);
  }
}

int main(int argc, const char * argv[]) {
  @autoreleasepool {
    struct rebinding open_rebinding = { "open", new_open, (void *)&original_open };
    rebind_symbols((struct rebinding[1]){open_rebinding}, 1);
    __unused int fd = open(argv[0], O_RDONLY);
  }
  return 0;
}
Let's start with rebind_symbols, which uses _dyld_register_func_for_add_image to register a callback that runs whenever an image (such as a dynamic library) is loaded.
int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
  int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
  if (retval < 0) {
    return retval;
  }
  // If this was the first call, register callback for image additions (which is also invoked for
  // existing images, otherwise, just run on existing images
  if (!_rebindings_head->next) {
    _dyld_register_func_for_add_image(_rebind_symbols_for_image);
  } else {
    uint32_t c = _dyld_image_count();
    for (uint32_t i = 0; i < c; i++) {
      _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
    }
  }
  return retval;
}
The code for prepend_rebindings is as follows
struct rebindings_entry {
  struct rebinding *rebindings;
  size_t rebindings_nel;
  struct rebindings_entry *next;
};

static struct rebindings_entry *_rebindings_head;

static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                              struct rebinding rebindings[],
                              size_t nel) {
  struct rebindings_entry *new_entry = malloc(sizeof(struct rebindings_entry));
  if (!new_entry) {
    return -1;
  }
  new_entry->rebindings = malloc(sizeof(struct rebinding) * nel);
  if (!new_entry->rebindings) {
    free(new_entry);
    return -1;
  }
  // copy the rebindings and insert the new entry at the head of the linked list
  memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
  new_entry->rebindings_nel = nel;
  new_entry->next = *rebindings_head;
  *rebindings_head = new_entry;
  return 0;
}
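One consequence of prepending is worth spelling out: because new entries go to the head of the list and the matching loop shown later stops at the first hit, the most recent registration for a given symbol is the one that gets applied. The sketch below shows this with a deliberately simplified list; struct demo_entry, demo_prepend and demo_first_match are hypothetical names, not part of fishhook.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for fishhook's rebindings_entry list. */
struct demo_entry {
    const char *name;          /* symbol to rebind */
    int tag;                   /* identifies which registration this was */
    struct demo_entry *next;
};

/* Prepend a new entry; returns 0 on success, -1 if allocation fails. */
int demo_prepend(struct demo_entry **head, const char *name, int tag) {
    assert(head != NULL && name != NULL);
    struct demo_entry *e = malloc(sizeof *e);
    if (!e) {
        return -1;
    }
    e->name = name;
    e->tag = tag;
    e->next = *head;           /* the old head becomes the second element */
    *head = e;
    return 0;
}

/* Walk from the head and return the tag of the first match, or -1. */
int demo_first_match(const struct demo_entry *head, const char *name) {
    for (; head != NULL; head = head->next) {
        if (strcmp(head->name, name) == 0) {
            return head->tag;
        }
    }
    return -1;
}
```

Registering "open" with tag 1, then "close" with tag 2, then "open" again with tag 3 makes a lookup of "open" return 3. The sketch never frees its entries, which is acceptable for a short demo only.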
Basic data structures
Dl_info
/*
* Structure filled in by dladdr().
*/
typedef struct dl_info {
const char *dli_fname; /* Pathname of shared object */
void *dli_fbase; /* Base address of shared object */
const char *dli_sname; /* Name of nearest symbol */
void *dli_saddr; /* Address of nearest symbol */
} Dl_info;
dladdr() fills this structure with whatever information it can find about the address passed to it.
dli_fname:
Path name, for example
/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator.sdk/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation
dli_fbase:
The base address of the shared object (such as CoreFoundation above)
dli_saddr:
The address of the nearest symbol
dli_sname:
The name of the nearest symbol, which corresponds to the function information in the fourth column below
Thread 0:
0 libsystem_kernel.dylib 0x11135810a __semwait_signal + 94474
1 libsystem_c.dylib 0x1110dab0b sleep + 518923
2 QYPerformanceMonitor 0x10dda4f1b -[ViewController tableView:cellForRowAtIndexPath:] + 7963
3 UIKit 0x10ed4d4f4 -[UITableView _createPreparedCellForGlobalRow:withIndexPath:willDisplay:] + 1586420
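As a rough illustration of how dladdr fills Dl_info, the sketch below wraps the call in a hypothetical helper, image_path_for, that returns dli_fname for an arbitrary address. The _GNU_SOURCE define is an assumption about the build environment: glibc only declares dladdr when it is set, while on macOS it is not needed.

```c
#define _GNU_SOURCE   /* assumption: needed on glibc for dladdr; harmless on macOS */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: path of the image that contains addr, or NULL. */
const char *image_path_for(const void *addr) {
    assert(addr != NULL);
    Dl_info info;
    if (dladdr(addr, &info) == 0) {
        return NULL;            /* the address is not inside any loaded image */
    }
    return info.dli_fname;      /* the other fields are filled in the same call */
}
```

fishhook uses the same zero return value to bail out early when the header it was handed does not belong to a known image. Older glibc versions may also need the program to be linked with -ldl.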
LC_SYMTAB
struct symtab_command {
uint32_t cmd; /* LC_SYMTAB */
uint32_t cmdsize; /* sizeof(struct symtab_command) */
uint32_t symoff; /* symbol table offset */
uint32_t nsyms; /* number of symbol table entries */
uint32_t stroff; /* string table offset */
uint32_t strsize; /* string table size in bytes */
};
This command gives the file offset of the symbol table and its number of entries, along with the offset and size of the string table. The symbol table inside the Mach-O file is located through the symoff field of the LC_SYMTAB load command, the symbol names live in the string table at stroff, and there are nsyms symbol entries in total.
LC_DYSYMTAB
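This structure is more complex; see loader.h for the full definition. Among other things, it records the file offset of the indirect symbol table and its number of entries. The excerpt below is abridged and shows only the fields used here.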
struct dysymtab_command {
uint32_t cmd; /* LC_DYSYMTAB */
uint32_t cmdsize; /* sizeof(struct dysymtab_command) */
uint32_t indirectsymoff; /* file offset to the indirect symbol table */
uint32_t nindirectsyms; /* number of indirect symbol table entries */
.......
_rebind_symbols_for_image
_rebind_symbols_for_image simply forwards to rebind_symbols_for_image with the global _rebindings_head. The key code of the latter is as follows
static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
  Dl_info info;
  if (dladdr(header, &info) == 0) {
    return;
  }

  // segment_command_64 on 64-bit targets
  segment_command_t *cur_seg_cmd;
  segment_command_t *linkedit_segment = NULL;
  // LC_SYMTAB
  struct symtab_command* symtab_cmd = NULL;
  // LC_DYSYMTAB
  struct dysymtab_command* dysymtab_cmd = NULL;

  uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
        // traverse to find __LINKEDIT
        linkedit_segment = cur_seg_cmd;
      }
    } else if (cur_seg_cmd->cmd == LC_SYMTAB) {
      symtab_cmd = (struct symtab_command*)cur_seg_cmd;
    } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
      dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
    }
  }
Why walk the load commands? Because __LINKEDIT, LC_DYSYMTAB and LC_SYMTAB all provide important information.
The __LINKEDIT segment contains the raw data used by the dynamic linker, such as symbols, strings, relocation table entries, and so on.
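The walk over load commands relies on one property: every command starts with a cmd and a cmdsize field, so the next command is always cmdsize bytes further on. The sketch below imitates that walk over a hand-built byte buffer so it can run on any platform. The DEMO_ constants carry the values from <mach-o/loader.h>, while demo_load_command, demo_find_command and demo_fill are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values copied from <mach-o/loader.h>; redefined so the sketch builds anywhere. */
enum { DEMO_LC_SYMTAB = 0x2, DEMO_LC_DYSYMTAB = 0xb, DEMO_LC_SEGMENT_64 = 0x19 };

/* Every load command starts with these two fields. */
struct demo_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Return the byte offset of the first command of type `wanted`, or -1. */
long demo_find_command(const uint8_t *cmds, uint32_t ncmds, uint32_t wanted) {
    size_t offset = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct demo_load_command lc;
        memcpy(&lc, cmds + offset, sizeof lc);    /* avoid unaligned reads */
        if (lc.cmd == wanted) {
            return (long)offset;
        }
        assert(lc.cmdsize >= sizeof lc);          /* guard against a stuck loop */
        offset += lc.cmdsize;                     /* commands have variable size */
    }
    return -1;
}

/* Fill buf (at least 176 bytes) with three fake commands; returns their count. */
uint32_t demo_fill(uint8_t *buf) {
    struct demo_load_command lcs[3] = {
        { DEMO_LC_SEGMENT_64, 72 }, { DEMO_LC_SYMTAB, 24 }, { DEMO_LC_DYSYMTAB, 80 }
    };
    size_t offset = 0;
    for (int i = 0; i < 3; i++) {
        memcpy(buf + offset, &lcs[i], sizeof lcs[i]);
        offset += lcs[i].cmdsize;
    }
    return 3;
}
```

With the buffer filled by demo_fill, the three commands are found at offsets 0, 72 and 96, and a command type that is not present yields -1.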
Before reading the code below, let's look at a formula
base address = __LINKEDIT.vmaddr - __LINKEDIT.fileoff + slide
The formula involves slide. To see what slide is, let's look at ASLR
ASLR (address space layout randomization) loads the executable at a randomized address. The randomization is a shift rather than a shuffle: the kernel moves the Mach-O segments by a random amount, and slide is that offset introduced by ASLR
That is, the base address equals the virtual address of __LINKEDIT minus its file offset, plus the offset introduced by ASLR. Adding a file offset such as symoff to this base address gives the runtime address of the corresponding table
  uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
  nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
  char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);
  uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);
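A small worked example makes the arithmetic concrete. The numbers used in the usage note are invented, and demo_linkedit_base and demo_runtime_address are hypothetical helper names; only the formula itself comes from the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the formula: slide + vmaddr - fileoff. */
uint64_t demo_linkedit_base(uint64_t slide, uint64_t vmaddr, uint64_t fileoff) {
    assert(vmaddr >= fileoff);   /* holds for the made-up numbers used here */
    return slide + vmaddr - fileoff;
}

/* A file offset inside __LINKEDIT becomes a runtime address by adding the base. */
uint64_t demo_runtime_address(uint64_t base, uint32_t file_offset) {
    return base + file_offset;
}
```

Take slide = 0x4000, vmaddr = 0x100008000 and fileoff = 0x8000: the base is 0x100004000. A symbol table at file offset 0x8100 then lives at 0x10000C100, which is the same as the slid __LINKEDIT address (0x10000C000) plus the table's distance from the start of the segment (0x100).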
The symbol table entries are all nlist_t structures. There is a lot to know about nlist_t, so let's take a look at its definition
/*
 * This is the symbol table entry structure for 32-bit architectures.
 */
struct nlist {
  union {
    uint32_t n_strx; /* index into the string table */
  } n_un;
  uint8_t n_type; /* type flag, see below */
  uint8_t n_sect; /* section number or NO_SECT */
  int16_t n_desc; /* see <mach-o/stab.h> */
  uint32_t n_value; /* value of this symbol (or stab offset) */
};
Then iterate over the load commands again, looking for the __DATA and __DATA_CONST segments, and rebind the __nl_symbol_ptr and __la_symbol_ptr sections inside them
  cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
          strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
        continue;
      }
      // Found __DATA or __DATA_CONST: rebind __nl_symbol_ptr and __la_symbol_ptr
      for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
        section_t *sect =
          (section_t *)(cur + sizeof(segment_command_t)) + j;
        if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
          // sect is the section, symtab the symbol table, strtab the string table,
          // indirect_symtab the indirect symbol table
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
        if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
      }
    }
  }
}
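The section test above masks flags with SECTION_TYPE because the low byte of flags holds the section type while the upper bits hold attributes. A minimal sketch of that check follows; the DEMO_ constants repeat the values from <mach-o/loader.h> so that the code builds on any platform, and demo_is_symbol_pointer_section is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from <mach-o/loader.h>; prefixed DEMO_ to avoid clashing with it. */
#define DEMO_SECTION_TYPE               0x000000ffu
#define DEMO_S_NON_LAZY_SYMBOL_POINTERS 0x6u
#define DEMO_S_LAZY_SYMBOL_POINTERS     0x7u

/* The type values must fit inside the mask, otherwise the test could never match. */
static_assert(DEMO_S_LAZY_SYMBOL_POINTERS <= DEMO_SECTION_TYPE, "type fits in mask");

/* True if the section holds lazy or non-lazy symbol pointers. */
bool demo_is_symbol_pointer_section(uint32_t flags) {
    uint32_t type = flags & DEMO_SECTION_TYPE;   /* strip the attribute bits */
    return type == DEMO_S_LAZY_SYMBOL_POINTERS ||
           type == DEMO_S_NON_LAZY_SYMBOL_POINTERS;
}
```

Flags of 0x7 or 0x6 pass the check, and so does a value with extra attribute bits such as 0x10000007. A stub section with flags like 0x80000408 (type 0x8) does not.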
perform_rebinding_with_section
The reserved1 field of the __nl_symbol_ptr and __la_symbol_ptr sections gives the starting index of that section's entries in the indirect symbol table. The fishhook documentation describes it as follows:
For the two relevant sections, the section headers (struct sections from <mach-o/loader.h>) provide an offset (in the reserved1 field) into what is known as the indirect symbol table. The indirect symbol table, which is located in the __LINKEDIT segment of the binary, is just an array of indexes into the symbol table (also in __LINKEDIT) whose order is identical to that of the pointers in the non-lazy and lazy symbol sections.
So, given struct section nl_symbol_ptr, the corresponding index in the symbol table of the first address in that section is indirect_symbol_table[nl_symbol_ptr->reserved1]. The symbol table itself is an array of struct nlists (see <mach-o/nlist.h>), and each nlist contains an index into the string table in __LINKEDIT, which is where the actual symbol names are stored. So, for each pointer in __nl_symbol_ptr and __la_symbol_ptr, we are able to find the corresponding symbol and then the corresponding string to compare against the requested symbol names, and if there is a match, we replace the pointer in the section with the replacement.
With the English description above in mind, the following code is easy to follow
// section is the section, symtab the symbol table, strtab the string table,
// indirect_symtab the indirect symbol table
static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab) {
  // The reserved1 field of the __nl_symbol_ptr and __la_symbol_ptr sections gives
  // the starting index into the indirect symbol table
  uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
  void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
  for (uint i = 0; i < section->size / sizeof(void *); i++) {
    uint32_t symtab_index = indirect_symbol_indices[i];
    if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
        symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
      continue;
    }
    uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
    char *symbol_name = strtab + strtab_offset;
The official diagram represents the code above very intuitively.
At this point we have found the symbol's name in the string table.
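The chain from a pointer slot to its name (reserved1, then the indirect symbol table, then the symbol table, then the string table) can be traced on a miniature, hand-written set of tables. Everything below is hypothetical: the table contents are invented, and struct demo_nlist keeps only the n_strx field of the real nlist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature of the three __LINKEDIT tables; not the real structs. */
struct demo_nlist { uint32_t n_strx; };   /* only the field used here */

/* "_close" starts at offset 1, "_open" at 8, "_printf" at 14. */
static const char demo_strtab[] = "\0_close\0_open\0_printf";
static const struct demo_nlist demo_symtab[] = { {1}, {8}, {14} };
/* Indirect table: two entries belonging to another section, then three of ours. */
static const uint32_t demo_indirect_symtab[] = { 0, 2, 1, 2, 0 };

/* Name of the symbol behind pointer slot i of a section with the given reserved1. */
const char *demo_symbol_for_slot(uint32_t reserved1, size_t i) {
    const uint32_t *indices = demo_indirect_symtab + reserved1;
    uint32_t symtab_index = indices[i];
    assert(symtab_index < sizeof demo_symtab / sizeof demo_symtab[0]);
    return demo_strtab + demo_symtab[symtab_index].n_strx;
}

/* Convenience check: does slot i resolve to the given (underscore-prefixed) name? */
int demo_slot_is(uint32_t reserved1, size_t i, const char *name) {
    return strcmp(demo_symbol_for_slot(reserved1, i), name) == 0;
}
```

For a section whose reserved1 is 2, slots 0, 1 and 2 resolve to _open, _printf and _close. The sketch does no bounds checking on i, which a real implementation gets from section->size.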
How to replace the implementation
Traverse the rebindings list and compare symbol names; when a name matches, replace the implementation. The code here is fairly clear, so it is shown directly
    struct rebindings_entry *cur = rebindings;
    while (cur) {
      for (uint j = 0; j < cur->rebindings_nel; j++) {
        if (strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
          if (cur->rebindings[j].replaced != NULL &&
              indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
            *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
          }
          indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
          goto symbol_loop;
        }
      }
      cur = cur->next;
    }
  symbol_loop:;
  }
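Two details of the replacement step are easy to miss: the comparison starts at &symbol_name[1] to skip the leading underscore of the symbol name, and the old pointer is saved only when the slot does not already hold the replacement. The sketch below isolates both for a single slot. struct demo_rebinding and demo_rebind_slot are hypothetical names, and the guard against an empty symbol name is an addition of this sketch, not something taken from the excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of struct rebinding. */
struct demo_rebinding {
    const char *name;     /* C-level name, without the leading underscore */
    void *replacement;
    void **replaced;      /* optional out-parameter for the old pointer */
};

/* Swap one slot if its symbol name matches; returns 1 if it was rebound. */
int demo_rebind_slot(void **slot, const char *symbol_name,
                     const struct demo_rebinding *rb) {
    assert(slot != NULL && symbol_name != NULL && rb != NULL);
    if (symbol_name[0] == '\0') {
        return 0;                         /* added guard: nothing to skip */
    }
    if (strcmp(&symbol_name[1], rb->name) != 0) {
        return 0;                         /* compare past the leading '_' */
    }
    if (rb->replaced != NULL && *slot != rb->replacement) {
        *rb->replaced = *slot;            /* save the old pointer, unless the slot
                                             already holds the replacement */
    }
    *slot = rb->replacement;
    return 1;
}
```

Rebinding "_open" with a rebinding named "open" swaps the slot and stores the original in replaced. Running the same rebinding a second time leaves the saved original untouched, because the slot already equals the replacement.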
References
- Dynamically modifying C function implementations
- MRH fishhook source analysis
- fishhook
- In-depth analysis of Mac OS X & iOS operating systems
- Programmer self-cultivation
- Compilation system roaming