Preface

This article first covers the principles of hooking on iOS, including the native Method Swizzle and well-known third-party hook frameworks such as fishhook, Cydia Substrate, and inline hooks. It then focuses on the underlying workings of fishhook.

1. Hook Overview

In iOS reverse engineering, a Hook is a technique for changing a program's flow of execution. A hook makes someone else's program execute your own code. The technique is used all the time in reverse engineering, and only by understanding how it works can you protect effectively against malicious code.

For example, the WeChat plug-in from some time ago that grabbed red envelopes automatically 👇

1.1 Several ways of Hook

Hook technology in iOS can be roughly divided into five categories: Method Swizzle, fishhook, Cydia Substrate, libffi, and inline hook.

1. Method Swizzle uses the Objective-C Runtime to dynamically change the mapping between a SEL (method selector) and its IMP (method implementation), which changes what an OC method call actually executes 👇 (mainly used for OC methods)

The relationship between SEL and IMP is like a book's table of contents: SEL is the chapter title, IMP is the page number, and each title maps to exactly one page 👇

Method swapping can be implemented in three main ways 👇

  1. method_exchangeImplementations 👉 inside a category (e.g. a category on Student) you can swap the two methods directly; outside a category, pair it with class_addMethod so the hook can jump back to the original method.
  2. class_replaceMethod 👉 directly replaces the original method.
  3. method_setImplementation 👉 reassigns the original method's implementation, by pairing getImp with setImp.
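The swap itself can be sketched in plain C. This is only a model of the idea, not the Objective-C runtime: `method_table`, `dispatch`, and `exchange_implementations` are hypothetical names, and a real SEL/IMP lookup is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an IMP; not the real Objective-C runtime type. */
typedef const char *(*imp_t)(void);

struct method {
    const char *sel;  /* the "chapter title" */
    imp_t imp;        /* the "page number": where the code lives */
};

static const char *original_impl(void) { return "original"; }
static const char *hook_impl(void)     { return "hook"; }

static struct method method_table[] = {
    { "doWork",      original_impl },
    { "hook_doWork", hook_impl },
};

/* Look up a selector by name; returns NULL if it is not in the table. */
static struct method *find_method(const char *sel) {
    for (size_t i = 0; i < sizeof method_table / sizeof method_table[0]; i++) {
        if (strcmp(method_table[i].sel, sel) == 0) return &method_table[i];
    }
    return NULL;
}

/* Swap the implementations behind two selectors. */
static void exchange_implementations(const char *a, const char *b) {
    struct method *ma = find_method(a);
    struct method *mb = find_method(b);
    assert(ma != NULL && mb != NULL);
    imp_t tmp = ma->imp;
    ma->imp = mb->imp;
    mb->imp = tmp;
}

/* "Send a message": call whatever implementation the selector maps to now. */
static const char *dispatch(const char *sel) {
    struct method *m = find_method(sel);
    assert(m != NULL);
    return m->imp();
}
```

After `exchange_implementations("doWork", "hook_doWork")`, a call to `dispatch("doWork")` runs the hook, and the hook can still reach the original code through `dispatch("hook_doWork")`.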

For concrete usage, see my previous article 👉 (11- Code injection) (⚠️ note: scroll to the end 😁).

2. fishhook is a tool from Facebook that dynamically modifies the symbol bindings of a linked Mach-O file. It takes advantage of how Mach-O files are loaded: by modifying the pointers in the lazy and non-lazy symbol pointer tables, it hooks C functions (system C functions).

Fishhook official link

The process 👉 dyld binds symbols by updating pointers in the __la_symbol_ptr and __nl_symbol_ptr sections of the __DATA segment of the Mach-O binary, and fishhook's rebind_symbols rewrites those pointers to repoint a symbol at another function. The underlying process is analyzed in detail later.

3. Cydia Substrate, formerly known as MobileSubstrate, is mainly used to hook OC methods, C functions, and function addresses. It is not designed only for iOS; it works on Android as well.

Cydia Substrate official link

Cydia Substrate structure

Cydia Substrate is divided into three main parts: MobileHooker, MobileLoader, and safe mode.

  1. MobileHooker

It defines a series of macros and functions that call into the Objective-C runtime and fishhook underneath to replace functions of the system or of the target application. There are two main functions:

  • MSHookMessageEx: mainly for OC methods

    void MSHookMessageEx(Class class, SEL selector, IMP replacement, IMP *result)
  • MSHookFunction (inline hook): mainly for C and C++ functions. The %hook of Logos syntax wraps this function.

    void MSHookFunction(void *function, void *replacement, void **p_original)
  2. MobileLoader

MobileLoader loads third-party dylibs into running applications. When it starts, it automatically loads the third-party dynamic libraries found in a specified directory; those libraries are the tweaks we write.

  3. Safe mode

A cracking tweak is essentially a dylib living parasitically inside someone else's process. If it causes an error in a system process, the whole process may crash and bring iOS down with it. Cydia Substrate therefore introduced safe mode, in which all third-party dylibs based on Cydia Substrate are disabled so that the error can be found and repaired.

4. libffi dynamically calls C functions. ffi_closure_alloc from libffi is used to construct a "stingerIMP" whose parameters match the original method, and this stingerIMP replaces the original method's function pointer. In addition, the call templates cif and blockCif are generated for calling the original method and the block. Inside the stingerIMP, void st_ffi_function(ffi_cif *cif, void *ret, void **args, void *userdata), the original method implementation and the aspect block are called from the cif, mainly through ffi_call.

The AOP libraries Stinger and BlockHook are built on libffi.

5. An inline hook hijacks a running process by inserting a jump instruction into a function. It can be roughly divided into three steps 👇

  1. Copy the first N bytes of the original function into the first N bytes of the hook function;
  2. Then overwrite the first N bytes of the original function with a jump instruction to the hook function;
  3. Fill the last few bytes of the hook function with a jump instruction back to the original function + N;

The general process is shown as 👇

MSHookFunction in the Cydia Substrate framework is built on the inline hook principle.

Dobby (formerly HookZz) is a cross-platform inline hook framework that is used much like fishhook. Dobby uses mmap to map the entire Mach-O file into user memory space, writes to it, and saves the result locally. So instead of operating on the original Mach-O, Dobby regenerates it and replaces it.

Dobby inserts __zDATA and __zTEXT segments into Mach-O.

  • __zDATA 👉 records the hook information (number of hooks, address of each hooked method), the information for each hooked method (function address, jump instruction address, address of the interface used to write the hook function), and the interface (pointer) of each hook.
  • __zTEXT 👉 records the jump instruction for each hooked function.

Dobby GitHub link

2. fishHook

2.1 Use of Fishhook

First, let's see how fishhook is used 👉 starting, of course, with the .h header file 👇

/*
 * A structure representing a particular intended rebinding from a symbol
 * name to its replacement
 */
struct rebinding {
  const char *name;    // Name of the function to hook (C string)
  void *replacement;   // Address of the new function
  void **replaced;     // Pointer to where the original function address is stored
};

/*
 * For each rebinding in rebindings, rebinds references to external, indirect
 * symbols with the specified name to instead point at replacement for each
 * image in the calling process as well as for all future images that are loaded
 * by the process. If rebind_functions is called more than once, the symbols to
 * rebind are added to the existing list of rebindings, and if a given symbol
 * is rebound more than once, the later rebinding will take precedence.
 */
FISHHOOK_VISIBILITY
int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel);

/*
 * Rebinds as above, but only in the specified image. The header should point
 * to the mach-o header, the slide should be the slide offset. Others as above.
 */
FISHHOOK_VISIBILITY
int rebind_symbols_image(void *header,
                         intptr_t slide,
                         struct rebinding rebindings[],
                         size_t rebindings_nel);

Very simple, only a structure rebinding and two functions are provided.

rebinding
struct rebinding {
  const char *name;    // Name of the function to hook (C string)
  void *replacement;   // Address of the new function
  void **replaced;     // Pointer to where the original function address is stored
};
  • name 👉 the name of the function to hook, as a C string.
  • replacement 👉 the address of the new function (a function pointer, i.e. the function's name).
  • replaced 👉 a pointer to where the original function's address is stored (a pointer to a pointer).
Two functions
FISHHOOK_VISIBILITY
int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel);

FISHHOOK_VISIBILITY
int rebind_symbols_image(void *header,
                         intptr_t slide,
                         struct rebinding rebindings[],
                         size_t rebindings_nel);

  • header 👉 the Mach-O header of the image
  • slide 👉 the ASLR slide offset
  • rebindings[]👉 An array of rebinding structures (multiple functions can be exchanged simultaneously)
  • rebindings_nel👉 rebindings array length

Examples

Example 1: HOOK NSLog

Now let's use fishhook to hook the system NSLog function 👇

// Pointer that will receive the address of the original NSLog
static void (*sys_NSLog)(NSString *format, ...);

// The replacement function
void LG_NSLog(NSString *format, ...) {
    format = [format stringByAppendingFormat:@" Hook up!! "];
    // Call the original NSLog
    sys_NSLog(format);
}

- (void)hook_NSLog {
    struct rebinding rebindNSLog;
    rebindNSLog.name = "NSLog";
    rebindNSLog.replacement = LG_NSLog;
    rebindNSLog.replaced = (void *)&sys_NSLog;
    struct rebinding rebinds[] = {rebindNSLog};
    rebind_symbols(rebinds, 1);
}

Call code 👇

-(void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event
{
    NSLog(@"hello");
}

Run 👇

NSLog is now hooked, and calls to it land in LG_NSLog. Once the hook code has run, sys_NSLog holds the original address of the system NSLog, and the NSLog symbol points to LG_NSLog.

Example 2: HOOK custom C functions

Next, let's hook a custom C function 👇

void func(const char *str) {
    NSLog(@"%s", str);
}

// Pointer that will receive the address of the original function
static void (*original_func)(const char *str);

// The new function
void LG_func(const char *str) {
    NSLog(@"Hook func");
    original_func(str);
}

- (void)hook_func {
    struct rebinding rebindFunc;
    rebindFunc.name = "func";
    rebindFunc.replacement = LG_func;
    rebindFunc.replaced = (void *)&original_func;
    struct rebinding rebinds[] = {rebindFunc};
    rebind_symbols(rebinds, 1);
}

Call code 👇

-(void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event
{
    [self hook_func];
    func("hello");
}

Run 👇

This time func is not hooked. Hence 👇

fishhook cannot hook custom functions, but it can hook system functions.

2.2 fishhook principle

fishhook can hook C functions, yet we know that C functions are static: the address of the implementation is fixed at compile time. This is why a C function that has only a declaration and no definition produces an error when it is called. So how can fishhook change which code a C function call runs? Does it change the address of the function's implementation, the way Method Swizzle does? With these questions in mind, let's move on.

The key 👉 what is the difference between system functions and local functions?

2.2.1 Symbols & Symbol binding & symbol table & rebinding symbols

At compile time, our App does not know the real address of the NSLog implementation 👉 NSLog lives in the Foundation framework, and at runtime its implementation sits in the shared cache. Only the system's dyld knows the real address.

When LLVM generates a Mach-O file, the file is divided into a Text segment (read-only) and a Data segment (readable and writable). Leaving room in the code for the system function's address and filling it in after launch clearly would not work: you cannot know how much room to leave, and it would waste space.

A workable solution 👉 put a placeholder (8 bytes) in the Data segment and compile the call as a bl to that placeholder. At run time (when dyld loads the application), the placeholder in the Data segment is overwritten with the real NSLog address while the bl instruction in the code stays unchanged, so calling NSLog executes the real implementation. This technique is called PIC (Position Independent Code). (The actual implementation is, of course, not this simple.)

  • The placeholder is called a symbol
  • dyld overwriting the symbol in the data segment is called symbol binding
  • Symbols collected one after another form a list, called the symbol table

So an external C function is located through its symbol, and that gives us the chance to hook external C functions dynamically. OC Method Swizzle modifies the mapping between SEL and IMP; for symbols we can likewise modify the address a symbol maps to. This is called rebinding the symbol, and it is how fishhook hooks.
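The placeholder, binding, and rebinding can be modelled in a few lines of portable C. This is a sketch under simplifying assumptions, not how dyld is implemented: `log_slot` plays the role of the 8-byte placeholder, and `real_log`, `hooked_log`, `bind_symbol`, and `rebind_symbol` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*log_fn)(const char *msg);

/* Stand-in for the real library function that lives outside our binary. */
static int real_log(const char *msg) { (void)msg; return 1; }

/* Our replacement. */
static int hooked_log(const char *msg) { (void)msg; return 2; }

/* The placeholder in the data segment: the "symbol". */
static log_fn log_slot = NULL;

/* What the compiled call site does: always call through the slot. */
static int call_log(const char *msg) {
    assert(log_slot != NULL);  /* the symbol must be bound before use */
    return log_slot(msg);
}

/* Symbol binding: the loader writes the real address into the slot. */
static void bind_symbol(void) {
    log_slot = real_log;
}

/* Rebinding: overwrite the slot and hand back the previous address. */
static log_fn rebind_symbol(log_fn replacement) {
    log_fn previous = log_slot;
    log_slot = replacement;
    return previous;
}
```

The call site never changes; only the pointer stored in the slot does. A function compiled into the same binary is called directly rather than through such a slot, which is why the custom func in Example 2 could not be hooked.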

2.2.2 Example Verification

First, call NSLog before and after hooking NSLog 👇

NSLog(@"Before Hook");
[self hook_NSLog];
NSLog(@"After Hook");

Then build, and look at the lazy and non-lazy symbol tables of the Mach-O 👇

NSLog appears in the lazy table, which means NSLog is a lazily bound symbol 👉 it is bound only when it is first called.

In the Mach-O you can see that _NSLog's Data value is 0x1000064EC and its offset is 0x8010.

Address before binding

Next, set a breakpoint on NSLog(@"Before Hook"); and debug with LLDB as follows 👇

The image list command shows that the program's start address is 0x0000000100624000, so the ASLR value is 0x624000. Next, turn on assembly debugging 👇

Then step into NSLog 👇

Finally, we see that the address of NSLog in memory is 0x00000001043464EC.

Back in the Mach-O: NSLog's Data value 0x1000064EC + the ASLR value 0x4340000 = 0x00000001043464EC (in this run the image was loaded at 0x0000000104340000, so the slide is 0x4340000 rather than the 0x624000 of the earlier image list output). From this we can conclude 👇

The Data value recorded for NSLog in the Mach-O does not include the ASLR slide (the virtual address offset).
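The arithmetic can be checked with a small C helper. The preferred base address 0x100000000 is an assumption (it is what the image list calculation above implies); the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Slide = where the image was actually loaded - where the file asked to be loaded. */
static uint64_t aslr_slide(uint64_t load_address, uint64_t preferred_base) {
    assert(load_address >= preferred_base);
    return load_address - preferred_base;
}

/* Runtime address = address recorded in the Mach-O file + ASLR slide. */
static uint64_t runtime_address(uint64_t file_address, uint64_t slide) {
    return file_address + slide;
}
```

With a load address of 0x104340000, `aslr_slide` yields 0x4340000, and `runtime_address(0x1000064EC, 0x4340000)` yields 0x1043464EC, the address observed in LLDB.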

Address after binding

Continue to the breakpoint on the NSLog call that runs after the rebinding 👇

Program start address 0x0000000104340000 + NSLog's offset 0x8010 gives the address of NSLog's entry in the lazy symbol table. Reading the 8 bytes stored there with LLDB's x command gives the address 0x0104345650, and dis -s on that address shows the corresponding assembly. It turns out to be LG_NSLog. It can be seen that 👇

The bound address in the lazy symbol table has changed.

2.3 Symbol binding process

Next, let’s analyze the process of changing the address of the binding in the lazy loading symbol table above, that is, the process of symbol binding.

  • In iOS, function names, variable names, and method names all produce entries in the symbol table after compilation
  • There are two types of symbols 👉 internal symbols & external symbols

2.3.1 Internal symbol: internal function, method name

Such as viewDidLoad. Internal symbols are subdivided into 👇

  1. Local symbols 👉 for internal use only
  2. Global symbols 👉 also available externally
Example

Create a new project symbolTest and define a global function 👇

void test() {}

A local function 👇

static void test1() {
    NSLog(@"test1");
}

⚠️ Note: when the App is released to the store, symbols are stripped, and the stripped symbols are the local symbols.

We can view all symbols in the Mach-O with objdump 👇

objdump --macho -t XXX (the name of your Mach-O file)

Use MachOView to check out 👇

The symbol table (Symbols) contains all symbols 👉 local symbols, global symbols, and indirect symbols.

2.3.2 External Symbols (Indirect symbol table)

When the Mach-O file calls an external function such as NSLog, LLVM does not know at compile time the address of that function, which lies outside the Mach-O file.

There is a special symbol table for this, called Indirect Symbols. Every external symbol that is used, such as NSStringFromClass, gets an entry there at compile time 👇

2.3.3 Symbol binding process

Let’s get back to the point and look at the symbol binding process. First, there is the following code 👇

- (void)viewDidLoad {
    [super viewDidLoad];
    // Do any additional setup after loading the view.
    NSLog(@"external function, first call");
    NSLog(@"external function, second call");
}

Break at the first NSLog and look at the assembly 👇

Both NSLog calls go to the same address, 0x102C06524. The image list output gives the program's start address as 0x0000000102C00000, so 0x102C06524 - 0x102C00000 = 0x6524 (the screenshots that follow come from a run loaded at 0x00000001049E8000, where the same calculation is 0x1049EE524 - 0x00000001049E8000 = 0x6524). Where is 0x6524? 👇

0x6524 lies in the Symbol Stubs section of the Mach-O. This is the stub of NSLog (every external symbol has a stub), and its value is 1F2003D510D7005800021FD6. The value is code: the bytes decode to nop, ldr x16 (a load from NSLog's lazy symbol pointer), and br x16 👇

Look at the value stored at 0x1049EE524, the address from the first assembly line 👇

It is exactly the value of NSLog's stub!

Continue stepping through the NSLog code 👇

As the figure shows, after stepping into NSLog's stub, reading register x16 tells us 👇

Executing the code in the Symbol Stubs stub looks up the code address stored in Lazy Symbols and jumps to it.

At this point we have arrived at address 00000001000065CC, and 0x65CC lies in __stub_helper 👇

The b 0x1000065b4 in the green box is where the symbol binding procedure starts. Let's continue stepping through the assembly 👇

The first assembly line corresponds to adr x17, #12204 in the Mach-O, because 0x1049EE5B4 - 0x00000001049E8000 (program start address) = 0x65B4.

Continue stepping 👇

And the code that reaches dyld_stub_binder in the Mach-O is 👇

It matches the assembly above line for line! What is executed here is, in fact, dyld_stub_binder. From the above we can conclude 👇

The initial values in the lazy symbol table point to code that performs symbol binding.

dyld_stub_binder is also an external symbol, so the next question is 👉 how is dyld_stub_binder itself found?

Continue with the assembly code and go to the line 0x1049EE5C8: br x16 👇

Reading register x16 with LLDB gives the address 0x0000000181041474, which is the implementation address of dyld_stub_binder. So how is this value obtained? Look at the Mach-O again 👇

This symbol is in the non-lazy symbol table (bound as soon as the program runs) 👇

To sum up, the symbol binding process on the first call 👇

  1. As soon as the program runs, dyld binds the value of dyld_stub_binder in the Non-Lazy Symbol Pointers table.
  2. When NSLog is called, its stub in Symbol Stubs is found first and the stub's code is executed; that code jumps to the address stored in the lazy symbol table.
  3. The initial value in the lazy symbol table points to code inside the binary itself, and this code looks up the binder function's address in the non-lazy table.
  4. dyld_stub_binder is then executed to perform the symbol binding.
  5. Finally, NSLog is called.

The second time NSLog is executed, the stub jumps straight to the real address, because the lazy symbol table already holds it.

Summary

The entire flow of symbol binding is shown below 👇

  1. A call to an external function executes the code in its stub (__TEXT,__stubs)
  2. The stub code goes to the lazy symbol table (__DATA,__la_symbol_ptr) to find the address to execute
    • Bound 👉 the address of the bound function is called directly
    • Unbound 👉 go to __TEXT,__stub_helper and have the binder function dyld_stub_binder perform the binding.
    • By default the lazy symbol table holds the address of the code that looks up the binder
  3. That code in __TEXT,__stub_helper runs the binding code (the binder function).
  4. dyld_stub_binder itself is in the non-lazy symbol table (__DATA,__got) and is bound as soon as the program runs.
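The flow above can be imitated with function pointers in portable C. This is a toy model, not dyld: `stub`, `stub_helper`, and `stub_binder` stand in for __stubs, __stub_helper, and dyld_stub_binder, and `bind_count` exists only to show that binding happens once.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

static int bind_count = 0;  /* how many times the binder has run */

/* Stand-in for the real external function (e.g. NSLog in a shared library). */
static int real_function(int x) { return x + 1; }

static int stub_helper(int x);  /* defined below */

/* Lazy symbol pointer: initially points at the helper, not the real function. */
static func_t lazy_pointer = stub_helper;

/* Stand-in for dyld_stub_binder: resolve the symbol and patch the lazy pointer. */
static func_t stub_binder(void) {
    bind_count++;
    lazy_pointer = real_function;
    return lazy_pointer;
}

/* Stand-in for the __stub_helper code: bind, then jump to the resolved target. */
static int stub_helper(int x) {
    func_t target = stub_binder();
    return target(x);
}

/* Stand-in for the __stubs entry: always jump through the lazy pointer. */
static int stub(int x) {
    assert(lazy_pointer != NULL);
    return lazy_pointer(x);
}
```

The first `stub(1)` call goes through the helper and the binder; every later call jumps directly to `real_function`, and `bind_count` stays at 1.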

2.4 Finding strings by symbols

When we use fishhook, we hook NSLog by setting rebindNSLog.name = "NSLog";. So how does fishhook find the NSLog function symbol from the string "NSLog"?

From the symbol binding process analyzed above, we know that during binding the binding code for NSLog is found through Lazy Symbols 👇

The entry at address 0x00008008 is NSLog, and it is the first entry in Lazy Symbols. In Indirect Symbols you can see that the entries of the indirect symbol table appear in the same order as in Lazy Symbols 👇

In other words, to find a symbol in Lazy Symbols you only need its index in Indirect Symbols. The next step is to follow that index. In the indirect symbol table above, the Data value for NSLog is 000000BD (hexadecimal), which is 189 in decimal. This 189 is the index of NSLog in the overall symbol table (Symbols) 👇

The Data stored there is 000000D4 (hexadecimal); this is the offset of NSLog in the String Table 👇

Adding the offset to the start address of the String Table gives 0xD334, and there we find _NSLog (start address + offset).

⚠️ Note: . is the separator, and function names are prefixed with _

Going from Lazy Symbols -> Indirect Symbols -> Symbols -> String Table, we have found the string from the symbol. This is how fishhook finds symbols: it walks through all the symbols and compares each name with the strings in the array of functions to hook.

The fishhook GitHub page has a diagram that illustrates this relationship 👇

The figure shows how the string close is found starting from the symbol 👇

  1. The index of close in the Lazy Symbol Pointer Table is 1061
  2. In the Indirect Symbol Table, entry 1061 holds 0x00003fd7 (decimal 16343)
  3. In the Symbol Table, entry 16343 holds the String Table offset 70026
  4. In the String Table, start address + offset (70026) leads to the string close

Conversely, the process of finding a symbol from the string 👇

  1. Find the string in the String Table and compute its offset
  2. Use the offset to find the entry's index in Symbols
  3. Use that index to find the corresponding entry in Indirect Symbols, which also gives that entry's own index
  4. Use that index to find the symbol at the same index in Lazy Symbols.
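Both directions of the lookup can be sketched over toy tables in C. The tables here are tiny hand-built arrays with hypothetical contents, not a real Mach-O layout (the per-section start offset into the indirect symbol table is left out, and a real symbol-table entry carries more than a string offset).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified symbol-table entry: only the string-table offset is modelled. */
struct sym { uint32_t strx; };

/* String table: NUL-terminated names packed back to back.
 * Offsets: "_NSLog" = 1, "_close" = 8, "_printf" = 15. */
static const char string_table[] = "\0_NSLog\0_close\0_printf";

/* Symbol table: entries 0, 1, 2 are _printf, _NSLog, _close. */
static const struct sym symbol_table[] = { {15}, {1}, {8} };

/* Indirect symbol table: entry i is an index into symbol_table, in the
 * same order as the lazy pointer slots (_NSLog, _close, _printf). */
static const uint32_t indirect_symbols[] = { 1, 2, 0 };

/* Lazy slot -> indirect symbol -> symbol table -> string table. */
static const char *name_for_lazy_slot(uint32_t slot) {
    assert(slot < sizeof indirect_symbols / sizeof indirect_symbols[0]);
    uint32_t symtab_index = indirect_symbols[slot];
    uint32_t strtab_offset = symbol_table[symtab_index].strx;
    return string_table + strtab_offset;
}

/* Reverse direction: scan the slots for a name that matches once the
 * leading '_' is skipped. Returns the slot index, or -1 if not found. */
static int lazy_slot_for_name(const char *name) {
    uint32_t n = sizeof indirect_symbols / sizeof indirect_symbols[0];
    for (uint32_t i = 0; i < n; i++) {
        const char *symbol = name_for_lazy_slot(i);
        if (symbol[0] != '\0' && strcmp(symbol + 1, name) == 0) return (int)i;
    }
    return -1;
}
```

Note that the reverse direction in this sketch is a linear scan that resolves each slot's name and compares it, which mirrors the "walk all symbols and compare" description above rather than a direct string-to-index lookup.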

2.5 Stripping symbols & restoring symbols

Symbols are stored in the Mach-O file and add to the package size, and the apps we analyze have already had their symbols stripped.

2.5.1 Stripping symbols

  • For an App, all symbols are stripped (except indirect symbols)
  • For a dynamic library, global symbols are kept (they are called from outside)
Strip settings

Symbol stripping is configured in Build Settings 👇

Strip Style explained 👇

  • All Symbols 👉 strips all symbols (except indirect symbols)
  • Non-Global Symbols 👉 strips every symbol except global symbols
  • Debugging Symbols 👉 strips debug symbols

⚠️ Note: Deployment Postprocessing 👉 when set to YES, symbols are stripped at build time; otherwise they are stripped when the package is archived.

All Symbols

Set Deployment Postprocessing to YES and Strip Style to All Symbols, then compile and open the package location 👇

You will find one extra .bcsymbolmap file, which belongs to bitcode. Then look at the Symbols table in the Mach-O file 👇

In the figure above, the Value field of NSLog stores the address 0000000000000000. Value is the function's implementation address (IMP), so a breakpoint set in the code does not stop 👇

Execution runs straight through. To break on NSLog, you have to set a symbolic breakpoint 👇

Then run 👇

The bt command shows the call stack 👇

frame #0: 0x0000000182762ba8 Foundation`NSLog
frame #1: 0x0000000104e51fc4 symbolTest`___lldb_unnamed_symbol2$$symbolTest + 72

test1, a custom function, now shows up as an unnamed symbol. In this state it is hard to analyze the code.

For an OC method call, read x0 and x1 directly to get self and _cmd, for example 👇

Next, we can set an address breakpoint: use the image list command and the ASLR value to compute the offset 👇

After that, you can set a breakpoint at ASLR + offset and reach the method's IMP address. This is dynamic debugging.

2.5.2 Recovering symbols

Debugging with address breakpoints is cumbersome because every address has to be computed; restoring the symbols would make things much more convenient.

We know that after stripping all symbols in the example above, only indirect symbols are left in the Symbol Table. But although the symbols are gone from the Symbol Table, the class list and method list still exist.

That gives us the opportunity to restore the Symbol Table.

Restore command

Symbols can be restored with the restore-symbol tool (OC symbols only, thanks to the Runtime mechanism) 👇

./restore-symbol <original Mach-O file> -o <restored file>

View machO after recovery 👇

At this point you can re-sign the app and debug it dynamically.

Restore-symbol tool link

restore-symbol

2.6 Fishhook source code parsing

Finally, the focus of this article: the fishhook source code. Without further ado, straight to the source.

2.6.1 rebind_symbols

int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
  // prepend_rebindings adds the whole rebindings array to the head of the
  // _rebindings_head list. fishhook stores the arguments of every
  // rebind_symbols call in a linked list; each call inserts a node at the head.
  int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
  if (retval < 0) {
    return retval;
  }
  // _rebindings_head->next is NULL only on the first call.
  if (!_rebindings_head->next) {
    // First call: register _rebind_symbols_for_image as dyld's add-image
    // callback. dyld invokes it immediately for images that are already
    // loaded, and again for every image loaded later.
    _dyld_register_func_for_add_image(_rebind_symbols_for_image);
  } else {
    // Later calls: walk the images that are already loaded.
    uint32_t c = _dyld_image_count();  // number of images
    for (uint32_t i = 0; i < c; i++) {
      // Pass each image's header and ASLR slide.
      _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
    }
  }
  return retval;
}
  1. First, prepend_rebindings builds a linked list holding all the functions to hook.
  2. Whether _rebindings_head->next is NULL tells whether this is the first call: the first call goes through the system callback, and later calls fetch the image list themselves and iterate over it.
  3. Either way, everything ends up in the _rebind_symbols_for_image function.
rebindings_entry linked list

Where _rebindings_head is a pointer to the linked list rebindings_entry structure 👇

struct rebindings_entry {
  struct rebinding *rebindings;   // Array of rebindings from one rebind_symbols call
  size_t rebindings_nel;          // Number of elements in the array
  struct rebindings_entry *next;  // Next node in the list
};

static struct rebindings_entry *_rebindings_head;
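A head insertion into such a list might look like the following sketch. It is modelled on the description of prepend_rebindings above, with the struct definitions repeated so the block stands alone; treat the details (copying the array, returning -1 on allocation failure) as assumptions rather than a quotation of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct rebinding {
    const char *name;
    void *replacement;
    void **replaced;
};

struct rebindings_entry {
    struct rebinding *rebindings;
    size_t rebindings_nel;
    struct rebindings_entry *next;
};

/* Copy the caller's array into a new node and make that node the new head.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int prepend_rebindings(struct rebindings_entry **head,
                              struct rebinding rebindings[],
                              size_t nel) {
    assert(head != NULL);
    struct rebindings_entry *entry = malloc(sizeof *entry);
    if (entry == NULL) {
        return -1;
    }
    entry->rebindings = malloc(sizeof(struct rebinding) * nel);
    if (entry->rebindings == NULL) {
        free(entry);
        return -1;
    }
    memcpy(entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
    entry->rebindings_nel = nel;
    entry->next = *head;  /* the old head becomes the second node */
    *head = entry;
    return 0;
}
```

Because new nodes go in at the head, the most recent call is visited first when the list is walked, which is consistent with the header comment that a later rebinding of the same symbol takes precedence. It also explains the first-call test: after the very first insertion, the head's next pointer is still NULL.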
rebind_symbols_image
int rebind_symbols_image(void *header,
                         intptr_t slide,
                         struct rebinding rebindings[],
                         size_t rebindings_nel) {
    struct rebindings_entry *rebindings_head = NULL;
    int retval = prepend_rebindings(&rebindings_head, rebindings, rebindings_nel);
    rebind_symbols_for_image(rebindings_head, (const struct mach_header *) header, slide);
    if (rebindings_head) {
      free(rebindings_head->rebindings);
    }
    free(rebindings_head);
    return retval;
}

The rebind_symbols_image flow is much simpler than rebind_symbols. Because the image header is specified, there is no need to iterate over all images, so it calls rebind_symbols_for_image directly.

2.6.2 _rebind_symbols_for_image

// Parameters: the image's Mach-O header and its ASLR slide
static void _rebind_symbols_for_image(const struct mach_header *header,
                                      intptr_t slide) {
  // _rebindings_head is the head of the list of rebindings to apply
  rebind_symbols_for_image(_rebindings_head, header, slide);
}

It calls rebind_symbols_for_image directly, passing the head of the list.

2.6.3 rebind_symbols_for_image

// Every path ends up in this function. Three parameters: the list of
// rebindings, the image's Mach-O header, and the ASLR slide.
static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
  /* dladdr() determines whether the specified address lies in one of the
     loaded modules (executable or shared library) that make up the process's
     address space. An address is considered to be in a module if it lies
     between the module's base address and its highest mapped virtual address
     (inclusive). If so, the module's dynamic symbol table is searched for the
     symbol closest to the address: the one whose value is equal to, or
     closest to but less than, the address. */
  /* Returns 0 if the address is not in any loaded module, and leaves Dl_info
     unmodified. Otherwise returns non-zero with the Dl_info fields set. If no
     symbol with a value less than or equal to the address is found, the
     dli_sname, dli_saddr, and dli_size fields are set to 0, dli_bind is set
     to STB_LOCAL, and dli_type is set to STT_NOTYPE. */
  // typedef struct dl_info {
  //   const char *dli_fname;  // path of the image
  //   void       *dli_fbase;  // base address of the image
  //   const char *dli_sname;  // name of the nearest symbol
  //   void       *dli_saddr;  // address of the nearest symbol
  // } Dl_info;
  Dl_info info;
  if (dladdr(header, &info) == 0) {
    return;
  }

  // Variables used to locate the tables inside the Mach-O
  segment_command_t *cur_seg_cmd;
  segment_command_t *linkedit_segment = NULL;
  struct symtab_command* symtab_cmd = NULL;
  struct dysymtab_command* dysymtab_cmd = NULL;

  // Skip the header and walk the load commands
  uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
        linkedit_segment = cur_seg_cmd;
      }
    } else if (cur_seg_cmd->cmd == LC_SYMTAB) {
      symtab_cmd = (struct symtab_command*)cur_seg_cmd;
    } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
      dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
    }
  }

  // Return if any of the required commands is missing
  if (!symtab_cmd || !dysymtab_cmd || !linkedit_segment ||
      !dysymtab_cmd->nindirectsyms) {
    return;
  }

  // Find base symbol/string table addresses
  // Base address = __LINKEDIT.VM_Address - __LINKEDIT.File_Offset + slide
  uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
  // Symbol table address = base address + symbol table offset
  nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
  // String table address = base address + string table offset
  char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);

  // Get indirect symbol table (array of uint32_t indices into symbol table)
  uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);

  cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      // Only the __DATA and __DATA_CONST segments are of interest
      if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
          strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
        continue;
      }
      for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
        section_t *sect =
          (section_t *)(cur + sizeof(segment_command_t)) + j;
        // Lazy symbol table
        if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
        // Non-lazy symbol table
        if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
      }
    }
  }
}

The core steps are 👇

  • Using the linkedit base and the offsets, find the address of the symbol table, the address of the string table, and the address of the indirect symbol table.
  • Walk the load commands and the data segments to find the lazy symbol table and the non-lazy symbol table.
  • When a table is found, call perform_rebinding_with_section to hook, i.e. to replace the function symbol.
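The address arithmetic in the first step can be isolated in two small C helpers. The function names and the sample numbers in the note below are illustrative, not values taken from a real binary.

```c
#include <assert.h>
#include <stdint.h>

/* __LINKEDIT is mapped at (vmaddr + slide), and that address corresponds to
 * file offset fileoff. Subtracting fileoff gives the address that file
 * offset 0 would have in memory. */
static uint64_t linkedit_base(uint64_t slide, uint64_t vmaddr, uint64_t fileoff) {
    assert(slide + vmaddr >= fileoff);
    return slide + vmaddr - fileoff;
}

/* Any table whose load command records a file offset (symoff, stroff,
 * indirectsymoff) can then be reached by adding that offset to the base. */
static uint64_t table_address(uint64_t base, uint32_t file_offset) {
    return base + file_offset;
}
```

For example, with a slide of 0x4340000, a __LINKEDIT vmaddr of 0x10000C000, and a fileoff of 0xC000, the base is 0x104340000, and a symbol table recorded at file offset 0xD000 is found at 0x10434D000.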

2.6.4 perform_rebinding_with_section

// rebindings: list of hooks; section: a lazy or non-lazy symbol pointer section;
// slide: ASLR offset; symtab: symbol table; strtab: string table;
// indirect_symtab: indirect symbol table
static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab) {
  // The reserved1 field of the __nl_symbol_ptr and __la_symbol_ptr sections
  // is the section's starting index in the indirect symbol table.
  uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
  // slide + section->addr is the array of function pointers for the section's
  // symbols, i.e. the contents of __nl_symbol_ptr or __la_symbol_ptr.
  void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
  // Walk every pointer in the section
  for (uint i = 0; i < section->size / sizeof(void *); i++) {
    // Index of this symbol in the symbol table
    uint32_t symtab_index = indirect_symbol_indices[i];
    if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
        symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
      continue;
    }
    // Offset of the symbol's name in the string table
    uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
    // Name = start of the string table + offset
    char *symbol_name = strtab + strtab_offset;
    // The name must have at least two characters (the leading "_" plus one more)
    bool symbol_name_longer_than_1 = symbol_name[0] && symbol_name[1];
    // Walk the linked list of hooks
    struct rebindings_entry *cur = rebindings;
    while (cur) {
      for (uint j = 0; j < cur->rebindings_nel; j++) {
        // Compare from symbol_name[1] to skip the leading "_"
        if (symbol_name_longer_than_1 &&
            strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
          // Save the original address if the caller asked for it and the
          // slot does not already hold the replacement
          if (cur->rebindings[j].replaced != NULL &&
              indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
            *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
          }
          // Overwrite the entry in the lazy/non-lazy table with the replacement
          indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
          // Done with this entry; move on to the next pointer in the section
          goto symbol_loop;
        }
      }
      // Move to the next node of hooks
      cur = cur->next;
    }
  symbol_loop:;
  }
}

Core steps 👇

  1. First, use the section's reserved1 field to locate where the lazy or non-lazy symbols start in the indirect symbol table; from there, every pointer in the section has a matching index.

  2. Treat the Data values of the lazy/non-lazy symbol table as the indirect_symbol_bindings array.

  3. Walk the lazy/non-lazy symbol table 👇
  • Read indirect_symbol_indices to get the symbol's entry in the Indirect Symbol Table and store it in symtab_index.
  • Use symtab_index as a subscript into the symbol table and read the string table offset, strtab_offset.
  • Use strtab_offset to get the address of the symbol's name, symbol_name.
  • Walk the rebindings linked list (i.e. the custom hook data)
  • Check that &symbol_name[1] and rebindings[j].name are the same function name, and that the symbol name is longer than 1 character.
  • If they match 👉 save the original address through the custom function pointer (skipped if replaced is NULL or the slot already holds the replacement), then overwrite the entry in indirect_symbol_bindings with replacement. The hook is now in place.
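The loop can be exercised end to end on toy data. In this sketch the tables are small hand-built arrays, the addresses of three static ints stand in for function addresses, and `rebind_slots` is a hypothetical name; it follows the steps listed above but is not the library code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rebinding {
    const char *name;
    void *replacement;
    void **replaced;
};

/* Toy stand-ins for the Mach-O tables (not the real on-disk layout). */
static const char strtab[] = "\0_puts\0_close";       /* _puts = 1, _close = 7 */
static const uint32_t symtab_strx[] = { 7, 1 };        /* symbol 0 = _close, 1 = _puts */
static const uint32_t indirect_indices[] = { 1, 0 };   /* slot 0 = _puts, slot 1 = _close */

/* The addresses of these objects play the role of function addresses. */
static int real_puts, real_close, my_close;
static void *bindings[] = { &real_puts, &real_close };

/* For each pointer slot, resolve its name and apply the first matching hook. */
static void rebind_slots(void **slots, size_t nslots,
                         struct rebinding *rebindings, size_t nel) {
    assert(slots != NULL && rebindings != NULL);
    for (size_t i = 0; i < nslots; i++) {
        const char *symbol_name = strtab + symtab_strx[indirect_indices[i]];
        /* Need the leading '_' plus at least one more character. */
        if (!(symbol_name[0] && symbol_name[1])) continue;
        for (size_t j = 0; j < nel; j++) {
            if (strcmp(&symbol_name[1], rebindings[j].name) != 0) continue;
            /* Save the original unless the slot already holds the replacement. */
            if (rebindings[j].replaced != NULL &&
                slots[i] != rebindings[j].replacement) {
                *rebindings[j].replaced = slots[i];
            }
            slots[i] = rebindings[j].replacement;
            break;
        }
    }
}
```

The guard on the save step matters when the same hook is applied twice: on the second pass the slot already holds the replacement, so the saved original is not overwritten with the hook's own address.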

Conclusion