Preface
Recently, on a whim, I wanted to find out whether I could intercept every network request made by WKWebView, so I went to read the WebKit source code and found that it is implemented almost entirely in C++. That made me wonder whether I could hook C++ functions in a private library. And so began a learning journey.
Search
All research starts with a search; if someone has already worked through the problem, you save a lot of time. From Google to Stack Overflow to GitHub, I searched for keywords related to hooking and C++, but found almost nothing, and nobody could give a clear answer to the question: can C++ functions be hooked on iOS?
Exploration
Approach
When the search turned up nothing useful, I was a bit lost because I did not know where to start (I knew little about the Mach-O file format). I already knew that fishhook can hook C functions, so I wondered whether it could also hook C++ functions in a private library (which shows how little I understood of how fishhook works). That attempt failed. Later, with the help of a colleague who does reverse engineering, I learned that the HookZz library can hook C/C++ functions. I have not yet studied how HookZz works internally; it is used as follows:
#include <stdio.h>

extern "C" {
    extern int ZzReplace(void *function_address, void *replace_call, void **origin_call);
}

// Filled in by ZzReplace with a pointer to the original fread
size_t (*origin_fread)(void *ptr, size_t size, size_t nitems, FILE *stream);

size_t fake_fread(void *ptr, size_t size, size_t nitems, FILE *stream) {
    // Do what you want.
    return origin_fread(ptr, size, nitems, stream);
}

void hook_fread() {
    ZzReplace((void *)fread, (void *)fake_fread, (void **)&origin_fread);
}
The first argument is the address of the function to be hooked. The second is the address of the replacement function. The third is a pointer to a function pointer, which receives the address of the original function so that the replacement can still call it. Since we write the second and third arguments ourselves, the only open problem is finding the address of the function to hook. Once that address is known, HookZz can hook it.
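The three-argument contract described above (target, replacement, out-pointer that receives the original) can be sketched without any hooking library. The helper replace_slot and every other name below are hypothetical. It swaps a function-pointer slot rather than patching machine code, so it only illustrates the calling convention and is not how HookZz works internally.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a hooking primitive: it swaps a function-pointer
// slot, so only calls made through that slot are redirected.
template <typename Fn>
void replace_slot(Fn *slot, Fn replacement, Fn *original_out) {
    *original_out = *slot;  // hand the old implementation back to the caller
    *slot = replacement;    // route future calls to the replacement
}

using LengthFn = std::size_t (*)(const char *);

std::size_t real_length(const char *text) { return std::strlen(text); }

LengthFn length_slot = real_length;  // call sites go through this pointer
LengthFn original_length = nullptr;  // filled in by replace_slot

std::size_t fake_length(const char *text) {
    // Extra work would go here; then forward to the original implementation.
    return original_length(text) + 100;
}

void install_hook() {
    replace_slot(&length_slot, fake_length, &original_length);
}
```

After install_hook() runs, a call through length_slot reaches fake_length first, which still forwards to the saved original. This is the same role origin_fread plays in the HookZz example.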
Finding the address of the function to hook
So how do you find the address of a function? This requires understanding Mach-O, the file format of iOS dynamic libraries (dylibs). On iOS, all dylibs are in Mach-O format. A Mach-O file has a symbol table that stores the symbols in the code along with their addresses. Function names are symbols too, so they can be looked up directly in the symbol table. Let's open a dylib in the MachOView tool and take a look.
- Get the WebKit dylib. For convenience, we take the macOS WebKit library directly, in the directory
/System/Library/Frameworks
as shown in the following picture:
- Open the WebKit binary inside the WebKit framework with MachOView and scroll to the bottom to see the Symbol Table, as shown below:
The first red box on the right of the figure above contains the symbols of C++ functions, which look different from the C++ function declarations we are used to. C++ entities are more complex than C functions (namespaces, classes, overloads), so the compiler mangles their names to keep every entity's symbol unique. Mangled names can be demangled with the c++filt tool (the online "GCC and MSVC C++ Demangler" site supports this too, but it would not open when I tried), as shown in the figure below:
As you can see, demangling the symbol __ZNK7WebCore30MediaDevicesEnumerationRequest23userMediaDocumentOriginEv yields the function name WebCore::MediaDevicesEnumerationRequest::userMediaDocumentOrigin() const.
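To make the uniqueness point concrete, here is a minimal sketch that demangles names programmatically with abi::__cxa_demangle, the same routine the code later in this post relies on. The helper name demangle is my own. The inputs are in the Itanium form with a single leading underscore (_Z...), which is what c++filt produces output for, and the sketch assumes a toolchain that ships <cxxabi.h> (GCC or Clang).

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI name; return the input unchanged on failure.
std::string demangle(const std::string &mangled) {
    int status = 0;
    char *buffer = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || buffer == nullptr) {
        return mangled;
    }
    std::string readable(buffer);
    std::free(buffer);  // the result is allocated with malloc and owned by the caller
    return readable;
}
```

For example, demangle("_Z3addii") and demangle("_Z3adddd") give add(int, int) and add(double, double): two overloads that share a source-level name but get distinct symbols because the parameter types are encoded in the mangled form.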
Code implementation
Now that we know where a function's address comes from, it's time to read the symbol table in code, which requires some knowledge of the Mach-O file format.
- Obtain the image address of the WebKit dylib:
- (void *)findDyldImageWithName:(NSString *)targetName {
    uint32_t count = _dyld_image_count();
    for (uint32_t i = 0; i < count; i++) {
        const char *name = _dyld_get_image_name(i);
        // strstr returns NULL when the image path does not contain targetName
        if (name != NULL && strstr(name, [targetName cStringUsingEncoding:NSUTF8StringEncoding]) != NULL) {
            return (void *)_dyld_get_image_header(i);
        }
    }
    return NULL;
}
- Find the load commands for the __TEXT segment, the __LINKEDIT segment, and the symbol table:
// Traverse all load commands in the image
void _enumerate_segment(const mach_header *header, std::function<bool(struct load_command *)> func) {
    if (header == nullptr) return;
    // We only consider 64-bit images here. The first load command starts right after the header
    struct load_command *command = (struct load_command *)((struct mach_header_64 *)header + 1);
    for (uint32_t i = 0; i < header->ncmds; i++) {
        if (func(command)) {
            return;
        }
        command = (struct load_command *)((uintptr_t)command + command->cmdsize);
    }
}

void _log_dyld_all_symbol(char *dyld_name) {
    const struct mach_header *header = NULL;
    uint64_t slide = 0;
    uint32_t count = _dyld_image_count();
    // Get the WebKit image header and slide
    for (uint32_t i = 0; i < count; i++) {
        const char *name = _dyld_get_image_name(i);
        if (strstr(name, dyld_name) != NULL) {
            header = _dyld_get_image_header(i);
            slide = _dyld_get_image_vmaddr_slide(i);
            break;
        }
    }
    if (header == NULL) return; // image not found
    segment_command_64 *seg_linkedit = NULL;
    segment_command_64 *seg_text = NULL;
    struct symtab_command *symtab_command = NULL;
    // Iterate over the load commands to find the __LINKEDIT segment, the __TEXT segment, and the symbol table command
    _enumerate_segment(header, [&](struct load_command *command) {
        if (command->cmd == LC_SEGMENT_64) {
            struct segment_command_64 *segCmd = (struct segment_command_64 *)command;
            if (0 == strcmp(segCmd->segname, SEG_LINKEDIT))
                seg_linkedit = segCmd;
            else if (0 == strcmp(segCmd->segname, SEG_TEXT))
                seg_text = segCmd;
        } else if (command->cmd == LC_SYMTAB) {
            symtab_command = (struct symtab_command *)command;
        }
        return false; // keep iterating over every load command
    });
    // ...
}
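The walk over load commands can be tried on any platform with a synthetic buffer. The sketch below is an illustration under stated assumptions: MiniHeader64 and MiniLoadCommand are local, simplified mirrors of mach_header_64 and load_command from <mach-o/loader.h>, and make_fake_image is a made-up helper that lays out a header followed by three commands. Only the stepping logic (start right after the header, advance by cmdsize) is the point.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Simplified local mirrors of structures from <mach-o/loader.h>,
// declared here so the sketch builds without Apple's headers.
struct MiniHeader64 {  // same field order as mach_header_64
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};
struct MiniLoadCommand {  // same layout as load_command
    uint32_t cmd;
    uint32_t cmdsize;
};
constexpr uint32_t kLcSymtab = 0x2;      // value of LC_SYMTAB
constexpr uint32_t kLcSegment64 = 0x19;  // value of LC_SEGMENT_64

// Visit every load command; stop early when the callback returns true.
void walk_load_commands(const uint8_t *image,
                        const std::function<bool(const MiniLoadCommand &)> &visit) {
    MiniHeader64 header;
    std::memcpy(&header, image, sizeof(header));
    const uint8_t *cursor = image + sizeof(header);  // commands follow the header
    for (uint32_t i = 0; i < header.ncmds; ++i) {
        MiniLoadCommand command;
        std::memcpy(&command, cursor, sizeof(command));
        if (visit(command)) return;
        cursor += command.cmdsize;  // cmdsize covers the whole command
    }
}

// Build a fake image: a header plus three commands of different sizes.
std::vector<uint8_t> make_fake_image() {
    const MiniLoadCommand commands[] = {
        {kLcSegment64, 72}, {kLcSymtab, 24}, {kLcSegment64, 72}};
    MiniHeader64 header{};
    header.magic = 0xfeedfacf;  // MH_MAGIC_64
    header.ncmds = 3;
    header.sizeofcmds = 72 + 24 + 72;
    std::vector<uint8_t> image(sizeof(header) + header.sizeofcmds, 0);
    std::memcpy(image.data(), &header, sizeof(header));
    std::size_t offset = sizeof(header);
    for (const auto &command : commands) {
        std::memcpy(image.data() + offset, &command, sizeof(command));
        offset += command.cmdsize;
    }
    return image;
}
```

Returning true from the callback ends the walk early. The code above always returns false because it needs to see all three commands of interest.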
- Calculate the positions of the symbol table and the string table:
// Delta that turns a file offset inside __LINKEDIT into an offset from the image header
uintptr_t linkedit_addr = (uintptr_t)seg_linkedit->vmaddr - (uintptr_t)seg_text->vmaddr - (uintptr_t)seg_linkedit->fileoff;
// Get the first address of the symbol table
struct nlist_64 *nlist = (struct nlist_64 *)((uintptr_t)header + (uintptr_t)symtab_command->symoff + linkedit_addr);
// Get the first address of the string table
intptr_t string_table = (intptr_t)header + ((uintptr_t)symtab_command->stroff + (uintptr_t)linkedit_addr);
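The address arithmetic is easier to check with concrete numbers. The following is a small worked sketch, not part of the original code: SegmentInfo, linkedit_delta and table_address are illustrative names, and the calculation assumes, as the code above does, that the image header sits at the start of the __TEXT segment.

```cpp
#include <cstdint>

// The two fields of a segment_command_64 that the calculation needs
// (struct and function names here are illustrative).
struct SegmentInfo {
    uint64_t vmaddr;   // address the segment was linked at
    uint64_t fileoff;  // offset of the segment in the file
};

// Delta to add to a __LINKEDIT file offset to get an offset from the image header.
uint64_t linkedit_delta(const SegmentInfo &text, const SegmentInfo &linkedit) {
    return linkedit.vmaddr - text.vmaddr - linkedit.fileoff;
}

// In-memory address of a table stored in __LINKEDIT at file offset table_fileoff.
uint64_t table_address(uint64_t header_address, uint64_t table_fileoff,
                       const SegmentInfo &text, const SegmentInfo &linkedit) {
    return header_address + table_fileoff + linkedit_delta(text, linkedit);
}
```

With __TEXT linked at 0x100000000, __LINKEDIT linked at 0x10000C000 with file offset 0x8000, and the header loaded at 0x104000000, the delta is 0x4000. A symbol table at file offset 0x8100 therefore lives at 0x10400C100, which is 0x100 bytes into the loaded __LINKEDIT segment, just as it is 0x100 bytes into that segment in the file.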
- Traverse the symbol table:
// Iterate over and print all symbols
for (uint32_t i = 0; i < symtab_command->nsyms; i++) {
    char *symbol_name = (char *)(string_table + nlist->n_un.n_strx);
    char *demangle_symbol = _demangle_symbol(symbol_name);
    printf("symbol name: %s\n", demangle_symbol);
    nlist++; // advance to the next nlist_64 entry
}
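Since the goal is an address to pass to the hooking call, the same traversal can look up a single name instead of printing everything. Below is a portable sketch under stated assumptions: MiniNlist64 is a local mirror of nlist_64 from <mach-o/nlist.h>, and make_fake_tables builds a toy symbol table and string table with made-up addresses. On a real image the value found would still need the image's slide added.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Local mirror of nlist_64 from <mach-o/nlist.h> (16 bytes per entry).
struct MiniNlist64 {
    uint32_t n_strx;   // offset of the name inside the string table
    uint8_t  n_type;
    uint8_t  n_sect;
    uint16_t n_desc;
    uint64_t n_value;  // address of the symbol, 0 when undefined
};

// Return n_value of the first entry named `wanted`, or 0 if nothing matches.
uint64_t find_symbol_value(const MiniNlist64 *symbols, uint32_t count,
                           const char *strings, const std::string &wanted) {
    for (uint32_t i = 0; i < count; ++i) {
        const char *name = strings + symbols[i].n_strx;
        if (wanted == name) return symbols[i].n_value;
    }
    return 0;
}

struct FakeTables {
    std::vector<MiniNlist64> symbols;
    std::vector<char> strings;
};

// Build a toy symbol table and string table; names and addresses are made up.
FakeTables make_fake_tables() {
    FakeTables tables;
    tables.strings.push_back('\0');  // offset 0 is reserved for the empty name
    auto add = [&tables](const char *name, uint64_t value) {
        MiniNlist64 entry{};
        entry.n_strx = static_cast<uint32_t>(tables.strings.size());
        entry.n_value = value;
        tables.strings.insert(tables.strings.end(), name, name + std::strlen(name) + 1);
        tables.symbols.push_back(entry);
    };
    add("_fread", 0x1000);
    add("__ZNK7WebCore30MediaDevicesEnumerationRequest23userMediaDocumentOriginEv", 0x2000);
    add("_undefined_import", 0);
    return tables;
}
```

Note that a result of 0 is ambiguous in this sketch: it is returned both when the name is missing and when the entry exists but carries no address.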
- Demangle C++ symbols:
char *_demangle_symbol(char *mangle_symbol) {
    size_t str_len = strlen(mangle_symbol);
    if (str_len < 3) {
        return mangle_symbol;
    }
    if (PLATFORM_IOS) {
        // On iOS the symbol starts with "__Z"; drop one underscore before demangling
        if (strstr(mangle_symbol, "__Z") == mangle_symbol) {
            char *new_mangle_symbol = mangle_symbol + 1;
            int status;
            char *demangle_symbol = abi::__cxa_demangle(new_mangle_symbol, nullptr, 0, &status);
            return status == 0 ? demangle_symbol : mangle_symbol;
        }
    } else {
        int status;
        char *demangle_symbol = abi::__cxa_demangle(mangle_symbol, nullptr, 0, &status);
        return status == 0 ? demangle_symbol : mangle_symbol;
    }
    return mangle_symbol;
}
On iOS, demangling the symbol directly returns status = 4, meaning the format does not match. Testing showed that on iOS simply changing the leading __Z to _Z makes the demangle succeed. I don't know the exact reason, though it is probably related to the extra leading underscore that Mach-O symbol names carry.
Just when I thought I was close to success, reality poured cold water on me. The earlier tests had all been run in the simulator, where the symbols and addresses of all functions in the WebKit image could be printed, as shown in the figure below:
However, when I ran the code on a real device, I was confused. Most of the symbol names could not be resolved, only some of the addresses were resolved, and some symbols had an address of 0x0, as shown in the figure below:
My analysis is that for real devices the following optimizations have probably been applied (purely my own guess):
- Symbols for functions internal to the dylib can be stripped, because symbol names exist for humans to read; a binary address is enough for the machine. Stripping also reduces the dylib's size in memory.
- For functions the dylib exposes, the symbol table still provides the symbol and its offset within the dylib. These functions must be callable from outside, so their symbols cannot be stripped.
- Functions from third-party libraries referenced by the dylib are not stripped either, but because they are external symbols, they must be rebound at load time to obtain their real addresses.
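The three cases in this guess correspond roughly to how a Mach-O symbol table entry describes itself in its n_type byte. As a hedged sketch (it classifies entries and does not prove what the iOS build actually does), the masks below repeat the values of N_STAB, N_TYPE, N_EXT, N_UNDF and N_SECT from <mach-o/nlist.h>. SymbolKind and classify are my own names.

```cpp
#include <cstdint>

// Bit masks from <mach-o/nlist.h>, repeated here so the sketch builds anywhere.
constexpr uint8_t kNStab = 0xe0;  // N_STAB: debugging entry
constexpr uint8_t kNType = 0x0e;  // N_TYPE: mask for the type bits
constexpr uint8_t kNExt  = 0x01;  // N_EXT: external symbol
constexpr uint8_t kNUndf = 0x00;  // N_UNDF: undefined, lives in another image
constexpr uint8_t kNSect = 0x0e;  // N_SECT: defined in a section of this image

enum class SymbolKind { Debug, Imported, Exported, Local, Other };

// Classify a symbol table entry by its n_type byte.
SymbolKind classify(uint8_t n_type) {
    if (n_type & kNStab) return SymbolKind::Debug;
    const uint8_t type = n_type & kNType;
    const bool external = (n_type & kNExt) != 0;
    if (type == kNUndf && external) return SymbolKind::Imported;  // resolved at load time
    if (type == kNSect && external) return SymbolKind::Exported;  // callable from outside
    if (type == kNSect) return SymbolKind::Local;                 // internal, can be stripped
    return SymbolKind::Other;
}
```

Printing the kind next to each name while walking the table would show which of the three groups an unresolved entry falls into.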
Conclusion
After this research, my conclusion is that there may be no general way to hook private C++ functions on real devices. If it is only for debugging, we can use MachOView or Hopper on the Mac to get the offset of the private function within the corresponding dylib and then hook it directly by offset. Beyond that, I cannot think of a way (at least for now) to hook a dylib's internal private functions.
Hooking public functions in private libraries should be possible. You can modify the fishhook source so that, during external symbol matching, it demangles the symbols read from the dylib's symbol table before comparing them. The only difference between C and C++ here is whether the names stored in the symbol table are mangled, so once that difference is removed, hooking C++ is equivalent to hooking C.
PS: With the same code, the internal functions cannot be resolved on a real iOS device, but they can be on the Mac or in the iOS simulator. Along the way, to check whether the dylib built for iOS differed from the one built for the Mac, I also pulled the iOS shared cache dyld_shared_cache_arm64 from a jailbroken phone. After extracting the WebKit library from the shared cache, I found it was not that different from the Mac one.
Modified on 14 October 2019
HookZz cannot perform inline hooks on non-jailbroken devices, so it cannot be used there to hook C++ functions; I tried hooking mach_msg instead.
The attempt to hook the system's mach_msg with fishhook, in order to take over all inter-process communication, also failed. The reason is that fishhook can hook only some of the mach_msg calls; it cannot hook the ones made inside WebKit. For the specific reason, see the discussion on iOSer about whether fishhook cannot hook all mach_msg calls.