An iOS programmer's Self-cultivation (5) Mach-O file dynamic linking

Self-cultivation of an iOS programmer (1) Compiling and linking
Self-cultivation of an iOS programmer (2) What's in Mach-O
Self-cultivation of an iOS programmer (3) Mach-O file static linking
Self-cultivation of an iOS programmer (4) Executable file loading
Self-cultivation of an iOS programmer (5) Mach-O file dynamic linking
Self-cultivation of an iOS programmer (6) Dynamic linking applied: the fishhook principle
Self-cultivation of an iOS programmer (7) Static linking applied: the principle of static library instrumentation
Self-cultivation of an iOS programmer (8) Memory

Why dynamic linking

Why do we need dynamic linking when we already have static linking?

Consider the statically linked case. Two programs, Program1 and Program2, share an external module lib.o, so each output executable contains its own copy of lib.o. When Program1 and Program2 run at the same time, there are two copies of lib.o on disk and two in memory. With a large number of shared object files like lib.o, a great deal of space is wasted. Updating is also a problem. Suppose lib.o is provided by a third-party vendor: whenever the vendor updates lib.o, Program1 has to be relinked against the new lib.o and released to users again, so any minor change to lib.o forces users to download the entire program again.

The simplest way to solve both the wasted space and the update problem is to keep the program's modules separate from one another, as independent files, instead of statically linking them together. In short, the object files that make up the application are not linked ahead of time; linking is deferred until run time. This is the basic idea of dynamic linking.

Simple dynamic linking example

// ViewController.m
#import "ViewController.h"
#import "TestDyld.h"

@interface ViewController ()
@end

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    // Do any additional setup after loading the view.
    [TestDyld testPrint];
    swap(10, 20);
    NSString *aString = bString;
    NSString *enterBG = UIApplicationDidEnterBackgroundNotification;
}

@end

// TestDyld.m
#import "TestDyld.h"

NSString *bString = @"test";

@implementation TestDyld

+ (void)testPrint {
    NSLog(@" test ");
}

void swap(int a, int b) {
    a+b;
}

@end

The sources, including ViewController.m and TestDyld.m, are compiled and linked with the xcrun command into a Mach-O file named TestDyld. Opening that file in MachOView lets us explore how dynamic linking works.

As shown in the figure above, the LC_LOAD_DYLIB load command is specifically used to load dynamic libraries that the main module of the program depends on at dynamic linking time. The parameters to the LC_LOAD_DYLIB command describe the basic dylib information:

struct dylib {
    union lc_str name;               // path name of the dylib
    uint32_t timestamp;              // build timestamp of the dylib
    uint32_t current_version;        // current version of the dylib
    uint32_t compatibility_version;  // compatibility version of the dylib
};
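As a rough illustration of how these fields are read, the sketch below pulls the library path and a version number out of the raw bytes of one LC_LOAD_DYLIB command. It is only a sketch under assumptions: the struct is a local, flattened mirror of dylib_command rather than the real header from <mach-o/loader.h>, the helper names are made up for this example, and the bytes are assumed to be in host byte order. The name field is an offset from the start of the load command, and versions are packed as major (16 bits), minor (8 bits), patch (8 bits).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Numeric value of LC_LOAD_DYLIB in <mach-o/loader.h>. */
#define LC_LOAD_DYLIB_VALUE 0xcu

/* Local, flattened mirror of dylib_command (illustrative, not the real header). */
struct dylib_command_sketch {
    uint32_t cmd;                    /* LC_LOAD_DYLIB */
    uint32_t cmdsize;                /* size of the command, path string included */
    uint32_t name_offset;            /* lc_str: offset of the path from the command start */
    uint32_t timestamp;
    uint32_t current_version;
    uint32_t compatibility_version;
};

/* Returns the dylib path stored inside the command, or NULL if the bytes
   do not look like a well-formed LC_LOAD_DYLIB command. */
static const char *dylib_path(const uint8_t *cmd_bytes) {
    assert(cmd_bytes != NULL);
    struct dylib_command_sketch dc;
    memcpy(&dc, cmd_bytes, sizeof dc);
    if (dc.cmd != LC_LOAD_DYLIB_VALUE) return NULL;
    if (dc.name_offset < sizeof dc || dc.name_offset >= dc.cmdsize) return NULL;
    return (const char *)cmd_bytes + dc.name_offset;
}

/* Splits a packed version number into major.minor.patch. */
static void unpack_version(uint32_t v, unsigned *major, unsigned *minor, unsigned *patch) {
    *major = v >> 16;
    *minor = (v >> 8) & 0xffu;
    *patch = v & 0xffu;
}
```

With these helpers, a packed version of 0x00010203 unpacks to 1.2.3.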

During dynamic linking, the operating system first loads the dynamic linker, dyld, as named by the LC_LOAD_DYLINKER load command. dyld then loads the other dylibs the program depends on, such as Foundation and UIKit in the figure, and binds and relocates their symbols.

Position-independent code (PIC)

When dyld loads dynamic libraries such as Foundation or UIKit, where in the process's virtual address space do they end up?

Almost all applications on iOS use Foundation and UIKit. From the static linking part we know that a program's instructions and data may contain references to absolute addresses, so a load address has to be decided for each dynamic library. If every library had a fixed load address, different libraries could end up claiming the same address and conflict. The first idea for getting rid of fixed load addresses is to relocate the dynamic library again at load time. Load-time relocation does not fit here, though. Once a dynamic library is mapped into virtual address space, its instructions are meant to be shared between processes, unlike a static library, of which every program has its own copy. Load-time relocation has to patch the target addresses inside the library's instructions, and patched instructions can no longer be shared by multiple processes, which throws away the big memory-saving advantage of dynamic linking.

What we want is for the shared instructions to stay unchanged no matter where the library is loaded. To get that, the parts of the instructions that would need patching are separated out and placed with the data. The instruction part then stays identical everywhere, while the data part can have a private copy in each process. This is the technique known as position-independent code.

There are four types of address references in modules:

Function calls and jumps inside modules.
Data access within a module, such as global and static variables defined within the module.
Function calls and jumps outside the module.
Data access outside a module, such as global variables defined in other modules.

Module internal instruction jump

Because relative positions within the same module are fixed, jumps and function calls inside a module can use relative addresses. On x86 the jump uses the call instruction with opcode E8, the near relative-displacement call mentioned in the static linking part, and it needs no relocation. Addressing is somewhat different on ARM64, where the jump uses the bl instruction. For short jumps the top byte of the encoded instruction is 0x94 when the target is at a higher address (positive offset) and 0x97 when it is at a lower address (negative offset). The offset stored in the instruction is (target address - instruction address) / 4. We can follow how swap is addressed in MachOView: looking it up in the symbol table shows that the target address of swap is 0x10007EC4.
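The offset formula can be tried out with a small sketch of the bl encoding. The function names are made up for this example, and the addresses in the usage note are hypothetical rather than taken from the TestDyld binary. The encoding itself is the standard one: the top six bits are 100101 and the low 26 bits hold the signed word offset.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes "bl target" for an instruction located at address pc. */
static uint32_t encode_bl(uint64_t pc, uint64_t target) {
    int64_t delta = (int64_t)target - (int64_t)pc;
    assert((delta & 3) == 0);                     /* instructions are 4-byte aligned */
    int64_t words = delta / 4;                    /* (target - pc) / 4 */
    assert(words >= -(1 << 25) && words < (1 << 25));
    return 0x94000000u | ((uint32_t)words & 0x03FFFFFFu);
}

/* Recovers the branch target from an encoded bl at address pc. */
static uint64_t decode_bl_target(uint64_t pc, uint32_t insn) {
    assert((insn & 0xFC000000u) == 0x94000000u);  /* opcode bits 100101 */
    int64_t imm26 = (int64_t)(insn & 0x03FFFFFFu);
    if (imm26 & 0x02000000) imm26 -= 0x04000000;  /* sign-extend from 26 bits */
    return pc + (uint64_t)(imm26 * 4);
}
```

For instance, a call from 0x1000 to 0x1010 encodes as 0x94000004, while a call from 0x1010 back to 0x1000 encodes as 0x97FFFFFC, which is where the 94 and 97 leading bytes come from.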

Module internal data access

Data access inside a module can also avoid absolute addresses, but it is a little more involved than the bl jump within a module. ARM64 uses adrp + add to obtain the address of the data. The following assembly instructions access the bString string:

Before looking at the adrp instruction, we should first understand the adr instruction.

adr instruction:

A short-range address-forming instruction. adr computes an address from an offset relative to the PC and writes it to a register. The signed 21-bit offset is added to the PC and the result is written to a general-purpose register, which can form the address of any byte within +/- 1MB of the instruction.

adrp instruction:

A long-range address-forming instruction that works in units of pages. It sign-extends the 21-bit offset (immhi:immlo), shifts it left by 12 bits, clears the low 12 bits of the PC value, adds the two, and writes the result to the destination register (x8 in this example). The result is the base address of the 4KB-aligned memory region that contains bString (that is, the address of bString must fall inside that 4KB region). This can address a range of +/- 4GB (a 33-bit signed offset).

In plain English, adrp computes the PC's page plus the immediate to find the 4KB page where bString lives, and the add instruction that follows adds bString's offset within that page to get its address.

adrp + add addresses bString as follows:

0xB0000008 = 1011 0000 0000 0000 0000 0000 0000 1000
immlo = 01
immhi = 0000 0000 0000 0000 000
imm = immhi:immlo = 0x1
imm << 12 = 0x1000
PC = 0x10007EAC
PC with low 12 bits cleared = 0x10007000
0x10007000 + 0x1000 = 0x10008000 (base address of the page containing bString)

The add instruction then applies the offset within the page:
target address = 0x10008000 + 0x2A0 = 0x100082A0

0x100082A0 lies in the data segment, which is where bString is stored.
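The same calculation can be checked mechanically. The sketch below decodes an adrp instruction word following the field layout described above (immlo in bits 30..29, immhi in bits 23..5, destination register in bits 4..0); the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Page base computed by an ARM64 adrp instruction located at address pc. */
static uint64_t adrp_page(uint64_t pc, uint32_t insn) {
    /* adrp has bit 31 set and bits 28..24 equal to 10000. */
    assert((insn & 0x9F000000u) == 0x90000000u);
    uint32_t immlo = (insn >> 29) & 0x3u;         /* bits 30..29 */
    uint32_t immhi = (insn >> 5) & 0x7FFFFu;      /* bits 23..5  */
    int64_t imm = (int64_t)((immhi << 2) | immlo);
    if (imm & 0x100000) imm -= 0x200000;          /* sign-extend from 21 bits */
    /* Clear the low 12 bits of pc, then add imm pages of 4KB each. */
    return (pc & ~(uint64_t)0xFFF) + (uint64_t)(imm * 4096);
}

/* Destination register number encoded in the instruction. */
static unsigned adrp_rd(uint32_t insn) {
    return insn & 0x1Fu;
}
```

Feeding it the values from the walkthrough, adrp_page(0x10007EAC, 0xB0000008) gives 0x10008000 with destination register 8 (x8), and adding the 0x2A0 immediate of the add instruction gives 0x100082A0.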

Data access between modules

Data access between modules is slightly more troublesome than data access within a module, because the target address is not known until load time. In the example above, UIApplicationDidEnterBackgroundNotification is defined in UIKit, and its address is only determined at load time. As mentioned earlier, the basic idea behind position-independent code is to move the address-dependent parts into the data segment. Mach-O does this with a Global Offset Table (GOT): when code needs to reference a global symbol from another module, it goes through the GOT indirectly.

When such a variable is accessed, the program looks up its target address in the corresponding GOT entry. As shown below:

The instructions that access UIApplicationDidEnterBackgroundNotification compute the address 0x10008000, using the same kind of addressing as for data inside the module. This address is in the GOT, at the place marked by the red box. The value stored there is 0x0 in the file; the real address of the symbol is filled in at load time. The offset of the GOT relative to the current instruction is known at compile time, and so is the variable or constant that each GOT entry corresponds to; the first entry, for example, corresponds to UIApplicationDidEnterBackgroundNotification. The GOT itself lives in the data segment, so it can be modified when the module is loaded, and each process has its own independent copy. This is what makes data access between modules position-independent.
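The indirection can be modeled in a few lines of plain C. This is a toy model and not dyld's implementation: got, bind_got_entry, and external_constant are names invented for the example, with external_constant standing in for a constant exported by another image.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for a global defined in another image (for example a UIKit constant). */
static const char *const external_constant = "DidEnterBackground";

/* One pointer-sized slot per imported symbol; zero until the loader binds it. */
static const void *got[1] = { NULL };

/* What the dynamic linker does at load time: write the real address into the slot. */
static void bind_got_entry(size_t index, const void *real_address) {
    assert(index < sizeof got / sizeof got[0]);
    got[index] = real_address;
}

/* What the compiled code does: load the slot, then load through it.
   The code only ever refers to the slot, whose position is fixed. */
static const char *read_external_constant(void) {
    assert(got[0] != NULL);
    const char *const *address = (const char *const *)got[0];
    return *address;
}
```

The code that reads the constant never changes; only the slot in writable data does, which is why the instructions can stay shared between processes.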

Call and jump between modules

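Calls and jumps between modules work the same way as data access between modules, except that the corresponding GOT entry stores the address of the target function.

In summary, PIC handles the four kinds of reference as follows:

Calls and jumps within a module: PC-relative jumps (bl on ARM64).
Data access within a module: PC-relative addressing (adrp + add on ARM64).
Calls and jumps between modules: indirect jumps through the GOT.
Data access between modules: indirect access through the GOT.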

Lazy binding (PLT)

Background

Dynamic linking trades some performance for its flexibility compared with static linking. The main reason is that accesses to global and static data and calls between modules must first locate the GOT and then address indirectly through it, which slows the program down. Another reason is that the dynamic linker has to find and relocate symbols across all the dynamic libraries at launch, which inevitably slows program startup.

Implementation of lazy binding

The basic idea is to bind a function (symbol lookup plus relocation) only when it is first used. The usual way to call a function in an external module is an indirect jump through its GOT entry. On x86, the PLT adds another layer of indirection to this process in order to implement lazy binding. For example, if a dynamic library contains a function bar(), its entry in the PLT is called bar@plt. ELF executables under Linux implement it as follows:

bar@plt:
jmp *(bar@GOT)
push n
push moduleID
jmp _dl_runtime_resolve

bar@GOT is the GOT entry that corresponds to bar(). If the entry already holds the address of bar(), that is, the symbol is already bound, the first instruction jumps straight to bar(). To make binding lazy, however, the linker does not fill in the address of bar() at startup; it initializes the entry with the address of the following push n instruction, so on the first call execution simply continues there. push n pushes the number n, the index of bar's entry in the relocation table, onto the stack. The module ID is pushed next, and then control jumps to _dl_runtime_resolve. In essence, this works out which function in which module is being called and hands that to _dl_runtime_resolve (dyld_stub_binder on iOS), which binds the symbol, writes the real address into bar@GOT, and jumps to bar(), so later calls go there directly.

Analysis of Mach-O dynamic linking process

Another important load command during dynamic linking is LC_DYSYMTAB, which describes the dynamic symbol table. This table can be seen as a subset of the symbol table: it covers only the symbols involved in dynamic linking. It is essentially an array of indexes: each entry holds an index (starting from 0) that points to the corresponding symbol in the symbol table.

Besides the LC_DYSYMTAB load command, two sections in Mach-O matter here: __got and __stubs. The former mainly holds references to global variables and constants in other modules, while the latter holds references to functions. References to constants or variables across modules are relatively rare, since too many of them would couple the modules together, whereas function calls across modules are very frequent. Binding is therefore split into two kinds, non-lazy and lazy: non-lazy symbols are relocated and bound during dynamic linking at load time, and lazy symbols are bound the first time they are used.

__got

When the image is loaded, dyld relocates the symbol behind each entry in the __got section and fills in its real address. So how does dyld know which symbol in the symbol table each GOT entry refers to? Each segment is described by an LC_SEGMENT load command (LC_SEGMENT_64 on 64-bit), and the sections the segment contains are described by section_64 structures that follow it:

struct section_64 { /* for 64-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	...
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
	uint32_t	reserved3;	/* reserved */
};

For the __got and __stubs sections, reserved1 gives the section's starting index in the dynamic (indirect) symbol table, so the n-th entry of the section uses index reserved1 + n. The value found there is in turn an index into the global symbol table, where the symbol is looked up. In pseudo-code:

__got[0]->symbol = symbolTable[indirectSymbolTable[__got.sectionHeader.reserved1]]
// -> __got.sectionHeader.reserved1 == 2
// -> indirectSymbolTable[2] == 2
// -> symbolTable[2] = Symbol(_kHelloPrefix)
// -> __got[0]->symbol = Symbol(_kHelloPrefix)

Similarly:

__got[1]->symbol = symbolTable[indirectSymbolTable[__got.sectionHeader.reserved1 + 1]]
// -> __got.sectionHeader.reserved1 + 1 == 3
// -> indirectSymbolTable[3] == 4
// -> symbolTable[4] = Symbol(dyld_stub_binder)
// -> __got[1]->symbol = Symbol(dyld_stub_binder)
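The two-step lookup in the pseudo-code can also be written as a small runnable sketch. The tables here are toy data: only the entries quoted in the pseudo-code (reserved1 == 2, indirect entries 2 and 3, symbols 2 and 4) come from the example, and every name prefixed with _hypothetical is a filler invented to pad the arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy symbol table; indexes 2 and 4 match the pseudo-code, the rest is filler. */
static const char *const symbol_table[] = {
    "_hypothetical_a", "_hypothetical_b", "_kHelloPrefix",
    "_hypothetical_c", "dyld_stub_binder",
};

/* Toy indirect symbol table: each entry is an index into symbol_table. */
static const uint32_t indirect_symbol_table[] = { 0, 1, 2, 4 };

/* reserved1 of the __got section header in the example. */
static const uint32_t got_reserved1 = 2;

/* Name of the symbol behind the n-th entry of a section whose
   header carries the given reserved1 value. */
static const char *symbol_for_entry(uint32_t reserved1, uint32_t n) {
    size_t indirect_count = sizeof indirect_symbol_table / sizeof indirect_symbol_table[0];
    size_t symbol_count = sizeof symbol_table / sizeof symbol_table[0];
    assert((size_t)reserved1 + n < indirect_count);
    uint32_t symbol_index = indirect_symbol_table[reserved1 + n];
    assert(symbol_index < symbol_count);
    return symbol_table[symbol_index];
}
```

Here symbol_for_entry(got_reserved1, 0) yields _kHelloPrefix and symbol_for_entry(got_reserved1, 1) yields dyld_stub_binder, matching the two traces above.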

Now let's look in MachOView at how the UIKit symbol UIApplicationDidEnterBackgroundNotification is relocated:

__stubs, __la_symbol_ptr, __stub_helper

In Mach-O, a call to an external function ends up pointing into the __stubs section. Below is the instruction in the code section that calls NSLog; its target, 0x100007F08, is in the __stubs section:

Disassembling the __stubs section with otool gives:

0x100008010 is in the __la_symbol_ptr section. Not just NSLog: every reference to an external function ends up going through the __la_symbol_ptr section. Let's see what is inside __la_symbol_ptr.

Every entry in __la_symbol_ptr initially points to a piece of assembly in __stub_helper, which eventually reaches the br instruction on line 6. That instruction jumps through address 0x100008008, the entry in section (__DATA, __got) that holds the address of the dyld_stub_binder function. dyld_stub_binder is the function that looks up the address of an external function. Because lazy binding is triggered through dyld_stub_binder, dyld_stub_binder itself has to be bound ahead of time, which is why it sits in the same non-lazy section as constants and global variables. Finally, the real address of NSLog is written back into its __la_symbol_ptr entry. The whole process can be summarized as follows:

The first time NSLog is called:

The __la_symbol_ptr entry for NSLog points into __stub_helper.
The code in __stub_helper eventually calls dyld_stub_binder, after a few jumps.
dyld_stub_binder finds the real address of the NSLog symbol by calling a function inside dyld.
dyld_stub_binder writes that address into the __la_symbol_ptr entry for NSLog.
dyld_stub_binder jumps to the real address of the NSLog symbol.

On later calls to NSLog, the stub jumps through the __la_symbol_ptr entry directly to the real address of the symbol.
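The steps above can be modeled with an ordinary function pointer. This is only a toy model of the control flow, not how dyld is implemented: la_symbol_ptr, stub_helper, stub_binder, and real_function are names invented for the example, and real_function stands in for an external function such as NSLog.

```c
#include <assert.h>

typedef int (*func_ptr)(int);

/* Counts how many times the symbol had to be resolved. */
static int resolve_count = 0;

/* Stands in for a function exported by another image, such as NSLog. */
static int real_function(int x) { return x * 2; }

static int stub_helper(int x);

/* Models the __la_symbol_ptr entry: initially points at the helper. */
static func_ptr la_symbol_ptr = stub_helper;

/* Models dyld_stub_binder: look the symbol up, patch the pointer, return it. */
static func_ptr stub_binder(void) {
    resolve_count++;
    la_symbol_ptr = real_function;
    assert(la_symbol_ptr != stub_helper);
    return la_symbol_ptr;
}

/* Models the __stub_helper code: bind first, then jump to the real target. */
static int stub_helper(int x) { return stub_binder()(x); }

/* Models the __stubs entry that every call site branches to. */
static int call_through_stub(int x) { return la_symbol_ptr(x); }
```

The first call_through_stub call goes through the helper and the binder; every later call reaches real_function directly, so resolve_count stays at 1.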

