It has been a while since my last article. Not many people read them, but writing is good entertainment for me. Some time ago I went out and took a bit of a beating, so I started working harder and began studying some of the well-known open source libraries I had used often. Without further ado: fishhook is a library created by Facebook that can rebind C functions with external linkage (functions you did not write yourself, which usually live in dynamic libraries loaded by dyld at app launch). The entire file is just over 200 lines of code.
Usage
If you use a third-party framework that keeps printing useless output and there is no good alternative to replace it, you can use fishhook to hook the printing function it calls. For example:
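#include <stdio.h>
#include "fishhook.h"

// Receives the address of the original printf once the rebinding succeeds.
static int (*orig_printf)(const char *format, ...);

static int my_printf(const char *format, ...) {
  // You can write whatever replacement code you want here, for example:
  return orig_printf("dcba");
}

// Runs before main, so the hook is in place for the first printf call.
__attribute__((constructor)) static void injected_function(void) {
  rebind_symbols((struct rebinding[1]){{"printf", my_printf, (void *)&orig_printf}}, 1);
}

int main(int argc, const char *argv[]) {
  printf("abcd");
  return 0;
}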
The entry point is int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel). The first argument is an array of the struct below, and the second is the number of elements in that array.
struct rebinding {
  const char *name;
  void *replacement;
  void **replaced;
};
name is the name of the function you want to hook, replacement is a pointer to the function that takes its place, and replaced is a pointer to a function pointer that you pass in (if the function is successfully replaced, the address of the original function is stored there).
Of course, in many cases fishhook is used for reverse engineering. Note that it can only replace functions that come from externally linked dynamic libraries, not C functions we wrote ourselves.
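Why only external functions? A call into another dynamic library goes through a pointer slot that dyld fills in, while a call to a function in your own binary is normally bound directly, so there is no slot to overwrite. Here is a minimal, portable sketch of that idea. It is not fishhook's code: the slots table, struct slot, struct rebinding_sketch, and rebind_slot are hypothetical stand-ins, and only the name/replacement/replaced triple mirrors struct rebinding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(int);

/* Stand-ins for functions that would live in another dynamic library. */
static int lib_double(int x) { return 2 * x; }
static int lib_negate(int x) { return -x; }

/* Hypothetical model of a symbol pointer section: one named slot per import. */
struct slot {
  const char *name;
  fn_t target;
};

static struct slot slots[] = {
  {"lib_double", lib_double},
  {"lib_negate", lib_negate},
};

/* Same three fields as struct rebinding, typed for this sketch. */
struct rebinding_sketch {
  const char *name;
  fn_t replacement;
  fn_t *replaced;
};

/* Overwrite the matching slot; return 0 on success, -1 if no slot matches. */
int rebind_slot(struct rebinding_sketch r) {
  assert(r.name != NULL && r.replacement != NULL);
  for (size_t i = 0; i < sizeof(slots) / sizeof(slots[0]); i++) {
    if (strcmp(slots[i].name, r.name) == 0) {
      if (r.replaced != NULL) {
        *r.replaced = slots[i].target; /* hand back the original */
      }
      slots[i].target = r.replacement;
      return 0;
    }
  }
  return -1;
}

/* An "external" call: always goes through the slot. */
int call_double(int x) { return slots[0].target(x); }

/* A "local" call: bound directly, so rebinding cannot reach it. */
int call_double_direct(int x) { return lib_double(x); }

/* Filled in by rebind_slot with the original target. */
fn_t orig_double;

/* Replacement that adds one to whatever the original returns. */
int my_double(int x) { return orig_double(x) + 1; }
```

After rebind_slot is called with {"lib_double", my_double, &orig_double}, call_double picks up the replacement while call_double_direct keeps calling the original, which is the same limitation described above.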
Mach-O
When I started reading fishhook, I felt it was necessary to know something about Mach-O first. Mach-O is the executable file format on iOS and macOS. When you build an iOS project with Command+B, a .app bundle is generated in the Products directory. The file inside that bundle with the same name as the app is the Mach-O file, which contains the app's classes, methods, and the constants that are determined at compile time. A Mach-O file has three main parts:
- Header: contains basic information about the Mach-O file, such as whether it is 32-bit or 64-bit, the number of load commands, and so on
- Load commands: these immediately follow the header and describe how the file is laid out in memory when it is loaded
- Data: the actual content, divided into segments. Each segment is divided into sections, which hold the concrete code and data
Let's look at a real Mach-O file with a tool: create an iOS project, change nothing, build it, and open the result in MachOView.
Because the focus of this article is fishhook, our main concern is the __la_symbol_ptr section in the __DATA segment. This section holds the lazy symbol pointers: for a function we write ourselves, the address is determined at compile time and written into the Mach-O file, whereas for a system function such as printf the address is not known at compile time. The other things we need to pay attention to are the Symbol Table, the Dynamic Symbol Table, and the __LINKEDIT segment.
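To make the lazy-binding idea concrete, here is a minimal sketch in portable C, assuming a single imported function. It is not how dyld is implemented: lazy_slot, binder, and real_square are hypothetical names. The slot starts out pointing at a binder, which resolves the real address on the first call and patches the slot so that later calls go straight to the target.

```c
#include <assert.h>

typedef int (*fn_t)(int);

/* Stand-in for a function exported by a dynamic library. */
static int real_square(int x) { return x * x; }

static int binder(int x);

/* Hypothetical lazy symbol pointer: initially aimed at the binder. */
static fn_t lazy_slot = binder;

/* Counts how many times the symbol had to be resolved. */
int resolve_count = 0;

/* The first call lands here: resolve, patch the slot, then forward the call. */
static int binder(int x) {
  resolve_count++;
  lazy_slot = real_square;
  return lazy_slot(x);
}

/* Every call to the import goes through the slot. */
int call_square(int x) {
  assert(lazy_slot != 0);
  return lazy_slot(x);
}
```

Because the slot is ordinary writable data, a tool can overwrite it with a different function pointer, and that is the mechanism the rest of this article walks through.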
There is so much in Mach-O that I won't describe all of it here. If you already know something about Mach-O, you can go straight to the fishhook source code. If you want to know more about Mach-O, you can read this blog
Read the source code
Let's get started and read the fishhook source code, beginning with int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel):
int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
  int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
  if (retval < 0) {
    return retval;
  }
  // If this was the first call, register callback for image additions (which is also invoked for
  // existing images); otherwise, just run on existing images
  if (!_rebindings_head->next) {
    _dyld_register_func_for_add_image(_rebind_symbols_for_image);
  } else {
    uint32_t c = _dyld_image_count();
    for (uint32_t i = 0; i < c; i++) {
      _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
    }
  }
  return retval;
}
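prepend_rebindings takes three arguments. The first is the address of the private list head, static struct rebindings_entry *_rebindings_head. The second is the array of struct rebinding that we passed in. The third is the length of that array.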
static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                              struct rebinding rebindings[],
                              size_t nel) {
  struct rebindings_entry *new_entry = malloc(sizeof(struct rebindings_entry));
  if (!new_entry) {
    return -1;
  }
  new_entry->rebindings = malloc(sizeof(struct rebinding) * nel);
  if (!new_entry->rebindings) {
    free(new_entry);
    return -1;
  }
  memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
  new_entry->rebindings_nel = nel;
  new_entry->next = *rebindings_head;
  *rebindings_head = new_entry;
  return 0;
}
The code here is relatively simple:
- Allocate a new struct rebindings_entry
- Allocate the rebindings array inside that struct
- Copy the struct array we passed in into the newly allocated array
- Insert the new entry at the head of the linked list
Back in rebind_symbols, the next pointer of the list head is used to determine whether this is the first call. If it is, _dyld_register_func_for_add_image is called with the function pointer _rebind_symbols_for_image.
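The list handling can be sketched on its own. This is a simplified stand-in, assuming the payload is a single int rather than an array of struct rebinding; entry, prepend, is_first_call, and free_list are hypothetical names. It shows why checking head->next right after the prepend tells you whether the list was empty before the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct entry {
  int payload;
  struct entry *next;
};

/* Push a new entry on the front of the list; return 0, or -1 if malloc fails. */
int prepend(struct entry **head, int payload) {
  assert(head != NULL);
  struct entry *e = malloc(sizeof(*e));
  if (e == NULL) {
    return -1;
  }
  e->payload = payload;
  e->next = *head; /* the old head (possibly NULL) becomes the successor */
  *head = e;
  return 0;
}

/* After a prepend, a head with no successor means the list was empty before. */
bool is_first_call(const struct entry *head) {
  return head != NULL && head->next == NULL;
}

/* Release every entry in the list. */
void free_list(struct entry *head) {
  while (head != NULL) {
    struct entry *next = head->next;
    free(head);
    head = next;
  }
}
```

Prepending keeps insertion constant-time, and it also means the most recently registered entries are visited first when the list is walked from the head.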
_dyld_register_func_for_add_image registers a custom callback, and also invokes that callback for every dynamic library or executable that has already been loaded.
The callback _rebind_symbols_for_image therefore runs once per image. It is just a wrapper around rebind_symbols_for_image, whose implementation is long enough that we will read it in two parts. The first part:
static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
  Dl_info info;
  if (dladdr(header, &info) == 0) {
    return;
  }
  segment_command_t *cur_seg_cmd;
  segment_command_t *linkedit_segment = NULL;
  struct symtab_command* symtab_cmd = NULL;
  struct dysymtab_command* dysymtab_cmd = NULL;
  uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
        linkedit_segment = cur_seg_cmd;
      }
    } else if (cur_seg_cmd->cmd == LC_SYMTAB) {
      symtab_cmd = (struct symtab_command*)cur_seg_cmd;
    } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
      dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
    }
  }
  ...
}
This code retrieves the load command structures for the Symbol Table, the Dynamic Symbol Table, and the __LINKEDIT segment:
- Because the load commands immediately follow the mach_header, uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t); gives the location of the first load command
- We then iterate over the load commands and pick out the matching structures based on the value of cmd
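The walk over the load commands can be sketched without the Mach-O headers. The structs here are simplified, hypothetical stand-ins (fake_header, fake_cmd), not the real mach_header or load_command layouts, and the command type values are invented. The point is only the pattern: start right after the header and advance by each command's cmdsize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified header: only the command count matters here. */
struct fake_header {
  uint32_t ncmds;
};

/* Every command starts with its type and its total size in bytes. */
struct fake_cmd {
  uint32_t cmd;
  uint32_t cmdsize;
};

/* Return the byte offset of the first command of type `wanted`, or -1. */
long find_cmd(const uint8_t *image, uint32_t wanted) {
  assert(image != NULL);
  struct fake_header hdr;
  memcpy(&hdr, image, sizeof(hdr));
  size_t cur = sizeof(hdr); /* commands follow the header directly */
  for (uint32_t i = 0; i < hdr.ncmds; i++) {
    struct fake_cmd c;
    memcpy(&c, image + cur, sizeof(c));
    if (c.cmd == wanted) {
      return (long)cur;
    }
    cur += c.cmdsize; /* commands vary in size, so step by cmdsize */
  }
  return -1;
}

/* Build a tiny image: a header, then commands of 8, 16, and 8 bytes. */
void build_image(uint8_t *buf, size_t len) {
  assert(buf != NULL && len >= 36);
  memset(buf, 0, len);
  struct fake_header hdr = {3};
  struct fake_cmd cmds[3] = {{10, 8}, {20, 16}, {30, 8}};
  size_t off = 0;
  memcpy(buf + off, &hdr, sizeof(hdr));
  off += sizeof(hdr);
  for (int i = 0; i < 3; i++) {
    memcpy(buf + off, &cmds[i], sizeof(cmds[i]));
    off += cmds[i].cmdsize;
  }
}
```

Stepping by cmdsize rather than by sizeof a fixed struct is what lets one loop skip over command types it does not care about.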
Now the second part:
static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
  ...
  // Find base symbol/string table addresses
  uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
  nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
  char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);
  // Get indirect symbol table (array of uint32_t indices into symbol table)
  uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);
  cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
          strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
        continue;
      }
      for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
        section_t *sect =
          (section_t *)(cur + sizeof(segment_command_t)) + j;
        if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
        if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
      }
    }
  }
}
The base address is computed as linkedit_base = slide + __LINKEDIT->vmaddr - __LINKEDIT->fileoff. Here vmaddr is the virtual memory address of the __LINKEDIT segment (before the slide is applied) and fileoff is the offset of __LINKEDIT within the Mach-O file. So what is slide? It is the offset introduced by ASLR. And what is ASLR? ASLR stands for address space layout randomization: the image is loaded at a randomized address, so every address recorded in the file has to be shifted by the slide at runtime. After obtaining the base address, the symbol table, string table, and indirect symbol table are located by adding their file offsets (symoff, stroff, indirectsymoff) to it. The load commands are then iterated again to find the __DATA and __DATA_CONST segments, and the __nl_symbol_ptr and __la_symbol_ptr sections inside them are rebound. Next, perform_rebinding_with_section is called for each of those sections.
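Here is a worked example of that arithmetic with made-up numbers (the slide, vmaddr, fileoff, and symoff values are illustrative assumptions, not taken from a real binary, and the function names are hypothetical). File offsets such as symoff are relative to the start of the file, so subtracting fileoff from the slid vmaddr gives a base that those file offsets can be added to directly.

```c
#include <assert.h>
#include <stdint.h>

/* Base address such that base + file_offset is the in-memory address. */
uint64_t linkedit_base(uint64_t slide, uint64_t vmaddr, uint64_t fileoff) {
  return slide + vmaddr - fileoff;
}

/* In-memory address of a table stored at `table_fileoff` in the file. */
uint64_t table_address(uint64_t base, uint64_t table_fileoff) {
  return base + table_fileoff;
}

/* Worked example with made-up values. */
void check_example(void) {
  /* slide = 0x4000, __LINKEDIT vmaddr = 0x100008000, fileoff = 0x8000 */
  uint64_t base = linkedit_base(0x4000, 0x100008000ULL, 0x8000);
  assert(base == 0x100004000ULL);
  /* A symbol table at file offset 0x8100 sits 0x100 bytes into __LINKEDIT. */
  assert(table_address(base, 0x8100) == 0x10000C100ULL);
}
```

The same result falls out of reasoning segment-relative: 0x8100 is 0x100 bytes past the start of __LINKEDIT in the file, and the segment's slid address is 0x10000C000, giving 0x10000C100. The source of perform_rebinding_with_section comes next.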
static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab) {
  uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
  void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
  for (uint i = 0; i < section->size / sizeof(void *); i++) {
    uint32_t symtab_index = indirect_symbol_indices[i];
    if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
        symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
      continue;
    }
    uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
    char *symbol_name = strtab + strtab_offset;
    bool symbol_name_longer_than_1 = symbol_name[0] && symbol_name[1];
    struct rebindings_entry *cur = rebindings;
    while (cur) {
      for (uint j = 0; j < cur->rebindings_nel; j++) {
        if (symbol_name_longer_than_1 &&
            strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
          if (cur->rebindings[j].replaced != NULL &&
              indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
            *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
          }
          indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
          goto symbol_loop;
        }
      }
      cur = cur->next;
    }
  symbol_loop:;
  }
}
This function looks a bit long, but the logic is easy to follow. First, the position of this section's entries in the indirect symbol table is found by adding the section's reserved1 value (its starting index) to the address of the indirect symbol table. A for loop then walks every pointer slot in the section and looks up the symbol name that belongs to each slot. For each name, the private linked list of struct rebindings_entry is traversed, and every name in each entry's rebindings array is compared with the current symbol name (skipping the symbol's leading underscore). On a match, the pointer currently stored in the slot is written through the replaced pointer we passed in, and the slot is overwritten with our replacement function pointer. That completes the swap. The official picture from the fishhook project illustrates this process.
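The lookup chain is easier to see with toy tables. The sketch here is a portable simulation, not fishhook's code: the table contents are invented, and sym, rebind_section, and the layout of the toy tables are hypothetical. It follows the same steps: indirect symbol index, symbol table entry, string table offset, name comparison after skipping the leading underscore, then the pointer swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy symbol table entry: just the offset of the name in the string table. */
struct sym {
  uint32_t strx;
};

/* String table: "" at offset 0, "_puts" at 1, "_printf" at 7, "_exit" at 15. */
static const char strtab[] = "\0_puts\0_printf\0_exit";

static const struct sym symtab[] = {{1}, {7}, {15}};

/* Indirect symbol table: each entry is an index into symtab. */
static const uint32_t indirect_symtab[] = {0, 2, 1, 0};
#define INDIRECT_COUNT (sizeof(indirect_symtab) / sizeof(indirect_symtab[0]))

/*
 * Rebind `name` in a section whose entries in the indirect symbol table
 * start at index `reserved1`. Returns the number of slots rewritten.
 */
int rebind_section(void **slots, size_t nslots, uint32_t reserved1,
                   const char *name, void *replacement, void **replaced) {
  assert(slots != NULL && name != NULL);
  assert(reserved1 + nslots <= INDIRECT_COUNT);
  const uint32_t *indices = indirect_symtab + reserved1;
  int rebound = 0;
  for (size_t i = 0; i < nslots; i++) {
    /* slot i -> symbol table index -> string table offset -> name */
    const char *symbol_name = strtab + symtab[indices[i]].strx;
    /* Compare from the second character to skip the leading underscore. */
    if (symbol_name[0] && symbol_name[1] &&
        strcmp(&symbol_name[1], name) == 0) {
      if (replaced != NULL && slots[i] != replacement) {
        *replaced = slots[i]; /* save the original pointer */
      }
      slots[i] = replacement;
      rebound++;
    }
  }
  return rebound;
}
```

With reserved1 set to 1 and a two-slot section, slot 0 maps to _exit and slot 1 maps to _printf, so rebinding "printf" rewrites only slot 1 and hands back the pointer that used to be there.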