0x0 References

Self-cultivation of the Programmer

Mach-O and dynamic linking | not bad blog

iOS Programmer's Self-cultivation: Mach-O file dynamic linking (4)

0x1 Background

Recently, some interns asked me about the Mach-O sections __stubs, __stub_helper, __la_symbol_ptr and __got.

I have studied how these sections work many times, but I still could not describe them systematically to someone else (embarrassing).

They are not easy to explain clearly, so I spent the weekend organizing what I know.

To make sure I can explain it the next time someone asks, I decided to write it down in an article.

PS: The interns had already read the first three chapters of Self-cultivation of the Programmer, and readers should be familiar with those chapters as well.

0x2 Basic Concepts

Linking is divided into static linking and dynamic linking. Symbols that the code references from relocatable object files are resolved at static link time; symbols referenced from dynamic libraries are resolved during dynamic linking, a process called binding. Binding is further divided into lazy binding and non-lazy binding.

All the sections mentioned above take part in binding during dynamic linking, which is also the part that is hardest to understand.

0x3 Demo

To explain this, you have to start with a Demo.

Make a dynamic library

Start by writing a say.c file:

// say.c
#include <stdio.h>
char *kHelloPrefix = "Hello";
void say(char *prefix, char *name) {
    printf("%s, %s\n", prefix, name);
}

This file defines a global string pointer kHelloPrefix and a function say.

Compile it into a dynamic library with clang:

$ clang -shared say.c -o libsay.dylib

-shared tells clang to produce a dynamic library, which can be dynamically linked into an executable. Other references also pass the -fPIC option when compiling dynamic libraries. In my tests clang enables it by default when building with -shared, so I did not add it.

When that is done, use the file command to check the file type:

$ file libsay.dylib
> libsay.dylib: Mach-O 64-bit dynamically linked shared library x86_64

The kHelloPrefix and say symbols defined in libsay.dylib are also recorded in the Export Info of the Mach-O file. Symbols in Export Info are global and can be used by other modules.

Make a relocatable intermediate file

Write another main.c file:

//main.c
void say(char *prefix, char *name);
extern char *kHelloPrefix;
int main(void) {
    say(kHelloPrefix, "Jack");
    return 0;
}

The say and kHelloPrefix symbols are used in main.c.

Compile it into a relocatable object file using Clang:

$ clang -c main.c -o main.o
# For -c, see man clang: it generates only an object file and does not link to produce an executable
$ file main.o
> main.o: Mach-O 64-bit object x86_64

In the relocatable object file main.o, the say and kHelloPrefix symbols are recorded as relocations, and the linker relocates them at link time.
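To give a feel for what relocating a symbol means, here is a minimal sketch, assuming an x86-64 call instruction (opcode 0xE8 followed by a 32-bit displacement). Once the target address is known, the displacement is patched so that it is relative to the next instruction. The helper name apply_call_reloc and the addresses in the note below are hypothetical; a real linker works from the relocation entries stored in main.o and handles many other relocation kinds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of any real linker API.
 * Patches the rel32 operand of a 5-byte x86-64 `call` instruction.
 *   code      - the instruction bytes (code[0] must be the 0xE8 opcode)
 *   site_addr - virtual address where the instruction will live
 *   target    - virtual address the call should reach
 * Returns the displacement that was written. */
int32_t apply_call_reloc(uint8_t code[5], uint64_t site_addr, uint64_t target)
{
    assert(code[0] == 0xE8);

    /* The displacement is measured from the end of the call instruction. */
    int64_t diff = (int64_t)target - (int64_t)(site_addr + 5);
    assert(diff >= INT32_MIN && diff <= INT32_MAX);
    int32_t disp = (int32_t)diff;

    /* The operand is stored in little-endian byte order. */
    uint32_t bits = (uint32_t)disp;
    code[1] = (uint8_t)(bits & 0xff);
    code[2] = (uint8_t)((bits >> 8) & 0xff);
    code[3] = (uint8_t)((bits >> 16) & 0xff);
    code[4] = (uint8_t)((bits >> 24) & 0xff);
    return disp;
}
```

For example, a call at the made-up address 0x1000 that must reach 0x1010 gets the displacement 0xb, because the next instruction starts at 0x1005.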

Make an executable and execute it

Use clang to link main.o and libsay.dylib to an executable:

$ clang main.o -o main -L . -l say
$ file main
> main: Mach-O 64-bit executable x86_64

-L . adds the current directory to the library search path, so the linker looks for libraries there. -l say tells it to link against libsay.dylib.

Since libsay.dylib was specified when linking, the linker finds kHelloPrefix and say in libsay.dylib and marks them in the executable main, but does not calculate their address, as shown in the following figure:

When the executable file main is executed, main and libsay.dylib will be loaded, these symbols will be dynamically bound by the dynamic linker dyld, and the program will execute correctly:

$ ./main # execute program
> Hello, Jack # correct output

0x4 Binding

Logically, symbols fall into two categories: data and functions.

The two categories are bound differently: data symbols use non-lazy binding, and function symbols use lazy binding.

Non-lazy binding means that the binding takes place immediately during dynamic linking, resolving the real address of the symbol.

Lazy binding means binding symbols only when they are used.

Non-Lazy Binding

The addresses of dyld_stub_binder and of the data symbols the program uses from dynamic libraries are resolved during dynamic linking. These addresses are stored in __DATA __got: each entry is initially 0 and is then overwritten with the real address. This is called non-lazy binding.

The procedure for accessing data type symbols in the dynamic library is described below.

Use otool to view the assembly code of __TEXT __text in the main executable:

1:  $ otool -tv main
2:  main:
3:  (__TEXT,__text) section
4:  _main:
5:  0000000100003f60        pushq        %rbp
6:  0000000100003f61        movq        %rsp, %rbp
7:  0000000100003f64        subq        $0x10, %rsp
8:  0000000100003f68        movq        0x91(%rip), %rax
9:  0000000100003f6f        movl        $0x0, -0x4(%rbp)
10: 0000000100003f76        movq        (%rax), %rdi
11: 0000000100003f79        leaq        0x2e(%rip), %rsi
12: 0000000100003f80        callq        0x100003f8e
13: 0000000100003f85        xorl        %eax, %eax
14: 0000000100003f87        addq        $0x10, %rsp
15: 0000000100003f8b        popq        %rbp
16: 0000000100003f8c        retq

Line 8 is the instruction to access the kHelloPrefix symbol.

movq 0x91(%rip), %rax loads the value stored at address 0x91(%rip) into the rax register.

While this instruction executes, the rip register holds the address of the next instruction, 0x100003f6f.

So 0x91(%rip) refers to address 0x91 + 0x100003f6f = 0x100004000.
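The address arithmetic above can be checked with a small helper. This is only a sketch: rip_relative is a hypothetical name, and the 7-byte instruction length is read off the listing (the movq starts at 0x100003f68 and the next instruction starts at 0x100003f6f).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: effective address of a RIP-relative operand disp(%rip).
 *   insn_addr - address of the instruction that contains the operand
 *   insn_len  - length of that instruction in bytes
 *   disp      - signed displacement encoded in the instruction */
uint64_t rip_relative(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    /* An x86-64 instruction is between 1 and 15 bytes long. */
    assert(insn_len >= 1 && insn_len <= 15);

    /* While the instruction executes, %rip holds the address of the next one. */
    uint64_t next_insn = insn_addr + insn_len;
    return next_insn + (uint64_t)(int64_t)disp;
}
```

With the values from line 8, rip_relative(0x100003f68, 7, 0x91) evaluates to 0x100004000.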

0x100004000 is the address of the first element in __DATA __got, which stores the address of kHelloPrefix.

Thus, when the program accesses the data type symbol in the dynamic library, it actually looks for the address in __DATA __got.

dyld_stub_binder, as described later, is special and must be resolved in advance during dynamic linking.
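The two loads on lines 8 and 10 can be modeled in plain C. The sketch below is a simulation, not dyld's code: sim_got, bind_non_lazy and load_hello_prefix are made-up names, and lib_kHelloPrefix stands in for the variable that really lives in the dynamic library. The real __got in this demo also holds the dyld_stub_binder entry, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kHelloPrefix variable defined in the dynamic library. */
const char *lib_kHelloPrefix = "Hello";

/* Simulated __DATA,__got: one pointer-sized slot, zero until it is bound. */
void *sim_got[1] = { NULL };

/* Simulated non-lazy binding: at load time the slot gets its real address. */
void bind_non_lazy(void)
{
    sim_got[0] = (void *)&lib_kHelloPrefix;
}

/* What main does to read kHelloPrefix. */
const char *load_hello_prefix(void)
{
    /* Binding must already have happened before the program runs. */
    assert(sim_got[0] != NULL);

    /* movq 0x91(%rip), %rax : fetch the variable's address from the GOT slot */
    const char **addr = (const char **)sim_got[0];
    /* movq (%rax), %rdi     : read the variable itself through that address */
    return *addr;
}
```

The point of the extra indirection is that the code in __text never needs to be patched: only the slot in __DATA changes when the library is loaded at a different address.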

Lazy Binding

Function symbols from dynamic libraries are not bound during dynamic linking. Programs use far more function symbols than data symbols from dynamic libraries, so resolving all of them during dynamic linking would slow down program startup, and many of the resolved symbols might never be used while the program runs. To avoid wasting startup time, a function symbol is resolved the first time it is used, which is called lazy binding.

The following details the process by which a program accesses function symbols in a dynamic library for the first time.

  1. Use otool to view the assembly code of __TEXT __text in the main executable:
1:  $ otool -tv main
2:  main:
3:  (__TEXT,__text) section
4:  _main:
5:  0000000100003f60        pushq        %rbp
6:  0000000100003f61        movq        %rsp, %rbp
7:  0000000100003f64        subq        $0x10, %rsp
8:  0000000100003f68        movq        0x91(%rip), %rax
9:  0000000100003f6f        movl        $0x0, -0x4(%rbp)
10: 0000000100003f76        movq        (%rax), %rdi
11: 0000000100003f79        leaq        0x2e(%rip), %rsi
12: 0000000100003f80        callq        0x100003f8e
13: 0000000100003f85        xorl        %eax, %eax
14: 0000000100003f87        addq        $0x10, %rsp
15: 0000000100003f8b        popq        %rbp
16: 0000000100003f8c        retq

Line 12 calls the say function; the call target is address 0x100003f8e.

0x100003f8e is located in __TEXT __stubs: references in __text to function symbols from dynamic libraries point into __stubs.

  2. Use otool to view the assembly code of __TEXT __stubs:
$ otool -v main -s __TEXT __stubs
main:
Contents of (__TEXT,__stubs) section
0000000100003f8e        jmpq        *0x406c(%rip)

In this example there is only one instruction, jmpq *0x406c(%rip).

jmpq computes the address 0x406c(%rip), reads the value stored there (the *), and jumps to that value.

The rip register holds the address of the next instruction, 0x100003f94 = 0x100003f8e + 0x6 (this instruction is 6 bytes long).

So 0x406c(%rip) refers to address 0x406c + 0x100003f94 = 0x100008000.

  3. Address 0x100008000 is located in __DATA __la_symbol_ptr, and the value stored there is 0x100003fa4.
  4. 0x100003fa4 is the address of an instruction in __TEXT __stub_helper.
  5. Use otool to view the assembly code of __TEXT __stub_helper:
1: $ otool -v main -s __TEXT __stub_helper
2: main:
3: Contents of (__TEXT,__stub_helper) section
4: 0000000100003f94        leaq        0x406d(%rip), %r11
5: 0000000100003f9b        pushq        %r11
6: 0000000100003f9d        jmpq        *0x65(%rip)
7: 0000000100003fa3        nop
8: 0000000100003fa4        pushq        $0x0
9: 0000000100003fa9        jmp        0x100003f94

0x100003fa4 is on line 8. When execution reaches line 9 it jumps back to line 4, and when it reaches line 6 it jumps to the address stored at 0x100004008 = 0x65 + 0x100003fa3.

  6. 0x100004008 is the second element in __DATA __got, which stores the address of the dyld_stub_binder function.

dyld_stub_binder is then called (its address was already resolved during dynamic linking). It looks up the address of say, writes that address into the __DATA __la_symbol_ptr entry from step 3 in place of 0x100003fa4, and then calls say.

Above, I have divided the program's first access to a function symbol in a dynamic library into six steps. On later calls to say, the __DATA __la_symbol_ptr entry from step 3 already holds the address of say, so the stub jumps to it directly.
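The six steps can be mimicked with a function pointer that is patched on first use. This is a simulation under stated assumptions rather than dyld's implementation: every name below (stub_say, la_symbol_ptr_say, stub_helper_say, sim_stub_binder, real_say) is invented for illustration, and the real binder looks the symbol up at run time instead of hard-coding it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef void (*say_fn)(const char *prefix, const char *name);

int binder_calls = 0; /* how many times the simulated binder ran */
int say_calls = 0;    /* how many times the real function ran */

/* Stand-in for the say function that lives in the dynamic library. */
void real_say(const char *prefix, const char *name)
{
    say_calls++;
    printf("%s, %s\n", prefix, name);
}

void stub_helper_say(const char *prefix, const char *name);

/* Simulated __DATA,__la_symbol_ptr slot: initially points at the helper. */
say_fn la_symbol_ptr_say = stub_helper_say;

/* Simulated dyld_stub_binder: find the function and patch the slot. */
say_fn sim_stub_binder(void)
{
    binder_calls++;
    la_symbol_ptr_say = real_say;
    return real_say;
}

/* Simulated __stub_helper entry: ask the binder, then call the target. */
void stub_helper_say(const char *prefix, const char *name)
{
    say_fn target = sim_stub_binder();
    assert(target != NULL);
    target(prefix, name);
}

/* Simulated __stubs entry: jump to whatever the slot currently holds. */
void stub_say(const char *prefix, const char *name)
{
    la_symbol_ptr_say(prefix, name);
}
```

Calling stub_say("Hello", "Jack") twice runs the binder only once: the first call goes through stub_helper_say, and the second jumps straight to real_say.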

0x5 Summary

The data symbols the program references from dynamic libraries, along with the function symbol dyld_stub_binder, are bound during dynamic linking; this is non-lazy binding.

The function symbols the program references from dynamic libraries use lazy binding: they are not bound until the first call.

On the first call, execution goes from __text to __stubs, which reads the address of a __stub_helper instruction from __la_symbol_ptr and jumps to it. __stub_helper then jumps through __got to dyld_stub_binder, which looks up the function's address, writes it into __la_symbol_ptr, and calls the function.

The second time the function is called, it is called from __text to __stubs, and from __stubs it finds the address of the function stored in __la_symbol_ptr and makes the call.

__stubs can be thought of as a table in which each entry is a small piece of jmp code called a "symbol stub", which finds the function symbol in the dynamic library and jumps to it.

Because function symbols from dynamic libraries are bound lazily, the first jmp from __stubs lands on instructions in __stub_helper, which run dyld_stub_binder to look up the function's address and patch __la_symbol_ptr. __stub_helper can therefore be understood as an auxiliary section of __stubs.

Here’s another chart to conclude:

I think the interns get it.