Contents

  • Self-Cultivation for iOS Programmers: Introduction (0)
  • Self-Cultivation for iOS Programmers: Compile and Link Process (1)
  • Self-Cultivation for iOS Programmers: Mach-O File Structure Analysis (2)
  • Self-Cultivation for iOS Programmers: Mach-O Static Linking (3)
  • Self-Cultivation for iOS Programmers: Mach-O Dynamic Linking (4)
  • Self-Cultivation for iOS Programmers: The fishhook Principle (5)

This article uses the following example for its analysis:

// a.c
extern int global_var;
void func(int a);
int main() {
    int a = 100;
    func(a + global_var);
    return 0;
}

=========================

// b.c
int global_var = 1;
void func(int a) {
    global_var = a;
}

=========================

// Generate a.o and b.o
xcrun -sdk iphoneos clang -c a.c b.c -target arm64-apple-ios12.2

// Link a.o and b.o into the executable ab
xcrun -sdk iphoneos clang a.o b.o -o ab -target arm64-apple-ios12.2

Note that the generated a.o and b.o object files target arm64. The two object files are statically linked to produce the executable ab. (Because the target is arm64, libSystem.B.dylib, a system dynamic library, also takes part in the link, but this article ignores dynamic linking and discusses only static linking. If the target were x86, no dynamic library would be involved. The next article is devoted to dynamic linking.)

Two concepts are introduced here: modules and symbols.

  1. Module: a source file can be thought of as a module, for example modules a and b above. A real program cannot keep all of its code in a single source file, so it is split into modules, typically one class per source file. Modularity makes code easier to reuse and maintain, and it shortens compile time: a module that has not changed does not need to be recompiled, because the cached result of the previous compilation is used directly.
  2. Symbol: put simply, a function name or a variable name. The example above has three symbols: global_var, main, and func.

Space and address allocation

Merging of similar segments

Static linking takes multiple object files as input and produces one output file (usually an executable). In this process, sections of the same kind from the different object files are merged. For example, the a.o and b.o object files above are merged into the executable ab: the code sections of a.o and b.o are merged into the code section of ab, and in the same way the data sections of the two object files are merged into the data section of ab, and so on.

Two-step linking

Step 1: Space and address allocation

Scan all the input object files and obtain the length, attributes, and position of each of their sections. Collect all the symbol definitions and symbol references (that is, definitions of and references to functions and variables) from the symbol tables of the input object files (explained in detail below) and put them into a global symbol table. In this step the linker obtains the lengths of all sections of the input object files, merges them, calculates the length and position of each merged section in the output file, and establishes the mapping between input and output.
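The allocation step can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the `InputSection` type, the `layout_merged_section` function, and any sizes or base addresses passed to it are hypothetical, and alignment padding (which a real linker must respect) is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of one section of an input object file. */
typedef struct {
    const char *name;     /* e.g. "__text" or "__data" */
    uint64_t    size;     /* size in bytes */
    uint64_t    out_addr; /* filled in here: address in the output file */
} InputSection;

/* Place every input section called `name` back to back, starting at
 * `base`, and record where each one lands (the input-to-output mapping).
 * Returns the total size of the merged output section.
 * Assumption: no alignment padding between the pieces. */
uint64_t layout_merged_section(InputSection *secs, size_t n,
                               const char *name, uint64_t base)
{
    assert(secs != NULL && name != NULL);
    uint64_t cursor = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(secs[i].name, name) != 0)
            continue;                      /* different kind of section */
        secs[i].out_addr = base + cursor;  /* mapping for later relocation */
        cursor += secs[i].size;
    }
    return cursor;
}
```

Calling it once per section kind (code, data, and so on) yields both the merged lengths and the per-input addresses that the second step needs.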

Step 2: Symbol resolution and relocation

Using the information collected in the first step, read the section data and relocation information from the input files, then perform symbol resolution and relocation, adjusting the addresses in the code, and so on.
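To make "adjusting the addresses in the code" concrete, here is a minimal sketch of one kind of adjustment on arm64: rewriting the 26-bit offset of a BL (branch with link) instruction once the final addresses of the call site and the callee are known. The function name `patch_bl` is hypothetical, and a real linker handles several other relocation kinds as well.

```c
#include <assert.h>
#include <stdint.h>

/* Patch the branch offset of an arm64 BL instruction.
 *   insn   : the original 32-bit instruction word
 *   pc     : final address of the instruction itself
 *   target : final address of the function being called
 * BL stores (target - pc) / 4 as a signed 26-bit field in bits 0..25. */
uint32_t patch_bl(uint32_t insn, uint64_t pc, uint64_t target)
{
    int64_t delta = (int64_t)(target - pc);
    assert((delta & 3) == 0);                              /* 4-byte aligned */
    assert(delta >= -(1LL << 27) && delta < (1LL << 27));  /* +/-128 MB reach */
    uint32_t imm26 = (uint32_t)((uint64_t)delta >> 2) & 0x03FFFFFFu;
    return (insn & 0xFC000000u) | imm26;  /* keep opcode, replace offset */
}
```

In an object file the offset field of such a call is typically left as a placeholder; only after step 1 has fixed every address can the linker compute the real value.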

Relocation

Module a uses the symbols global_var and func.

In the a.o object file:

In the ab executable after linking:

Relocation table

How does the linker know which instructions in module a need to be adjusted, and how to adjust them? a.o contains a relocation table that stores this information. In the section_64 header of each section, reloff (the file offset of that section's relocation entries) and nreloc (the number of relocation entries) tell the linker which sections of module a contain instructions that need adjusting.

struct section_64 { /* for 64-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint64_t	addr;		/* memory address of this section */
	uint64_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
	uint32_t	reserved3;	/* reserved */
};

A relocation table can be thought of as an array whose elements are relocation_info structures.

struct relocation_info {
   int32_t	r_address;	/* offset in the section to what is being
				   relocated */
   uint32_t     r_symbolnum:24,	/* symbol index if r_extern == 1 or section
				   ordinal if r_extern == 0 */
		r_pcrel:1, 	/* was relocated pc relative already */
		r_length:2,	/* 0=byte, 1=word, 2=long, 3=quad */
		r_extern:1,	/* does not include value of sym referenced */
		r_type:4;	/* if not 0, machine specific relocation type */
};

Every field is commented. r_address and r_length are enough to tell us which bytes to relocate, and r_symbolnum (when r_extern is 1, that is, for an external symbol) is an index into the symbol table. The other fields can be ignored here.

Load command: symbol table

struct symtab_command {
	uint32_t	cmd;		/* LC_SYMTAB */
	uint32_t	cmdsize;	/* sizeof(struct symtab_command) */
	uint32_t	symoff;		/* symbol table offset */
	uint32_t	nsyms;		/* number of symbol table entries */
	uint32_t	stroff;		/* string table offset */
	uint32_t	strsize;	/* string table size in bytes */
};

As mentioned in the previous article, the first two fields of every load command are cmd and cmdsize. In the symbol table load command, symoff and nsyms tell the linker the position (file offset) of the symbol table and the number of entries in it; stroff and strsize give the position and size of the string table.

The symbol table is also an array, whose elements are nlist_64 structures.

struct nlist_64 {
    union {
        uint32_t  n_strx;  /* index into the string table */
    } n_un;
    uint8_t  n_type;       /* type flag, see below */
    uint8_t  n_sect;       /* section number or NO_SECT */
    uint16_t n_desc;       /* see <mach-o/stab.h> */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};

n_un exists for historical reasons and can be ignored; n_strx is an index into the string table, where the symbol's name string is found; n_sect is the number of the section the symbol belongs to; n_value is the symbol's address value. If you are interested, see the <mach-o/nlist.h> header file.
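As a rough illustration of how n_strx ties a symbol table entry to its name, the sketch below looks names up in an in-memory string table. The `Sym64` type is a local mirror of nlist_64 (with the union flattened) so that the code compiles without <mach-o/nlist.h>; `symbol_name` and `find_symbol` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of nlist_64, with the n_un union flattened to n_strx. */
typedef struct {
    uint32_t n_strx;   /* index into the string table */
    uint8_t  n_type;
    uint8_t  n_sect;
    uint16_t n_desc;
    uint64_t n_value;
} Sym64;

/* Return the name of `sym`, or NULL if n_strx lies outside the table.
 * `strtab` and `strsize` correspond to stroff/strsize once loaded. */
const char *symbol_name(const Sym64 *sym, const char *strtab, uint32_t strsize)
{
    assert(sym != NULL && strtab != NULL);
    if (sym->n_strx >= strsize)
        return NULL;
    return strtab + sym->n_strx;
}

/* Linear search over `nsyms` entries; returns the index or -1. */
long find_symbol(const Sym64 *syms, uint32_t nsyms,
                 const char *strtab, uint32_t strsize, const char *name)
{
    for (uint32_t i = 0; i < nsyms; i++) {
        const char *n = symbol_name(&syms[i], strtab, strsize);
        if (n != NULL && strcmp(n, name) == 0)
            return (long)i;
    }
    return -1;
}
```

Note that C symbols carry a leading underscore in Mach-O symbol tables, so func from the example is stored as _func.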

Symbol resolution

From an ordinary programmer's point of view, why link at all? Because one module (module a) may reference symbols defined in another (module b), all modules (object files) must be linked together. During relocation, the linker looks up each referenced symbol in the global symbol table, which is built from the symbol tables of all the input object files, and relocates using the symbol it finds. Two errors are common:

  1. "ld: duplicate symbols": the same symbol is defined in more than one object file, so it appears more than once in the global symbol table.
  2. "Undefined symbols": a symbol that needs to be relocated is not found in the global symbol table (the symbol is referenced but not defined).
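Both errors fall out of a simple check over the global symbol table, sketched below. This is a toy model, not how ld is implemented: the `SymEntry` and `ResolveStatus` types and the `resolve_all` function are hypothetical, and the table is a flat array of every definition and reference collected from the inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry collected from an input object file (hypothetical model). */
typedef struct {
    const char *name;
    int         defined;  /* 1 = definition, 0 = reference only */
} SymEntry;

typedef enum { RESOLVE_OK, RESOLVE_DUPLICATE, RESOLVE_UNDEFINED } ResolveStatus;

/* Count how many entries define `name`. */
static size_t count_definitions(const SymEntry *all, size_t n, const char *name)
{
    size_t defs = 0;
    for (size_t i = 0; i < n; i++)
        if (all[i].defined && strcmp(all[i].name, name) == 0)
            defs++;
    return defs;
}

/* Every name must have exactly one definition. On failure, *culprit is
 * set to the first offending symbol name. */
ResolveStatus resolve_all(const SymEntry *all, size_t n, const char **culprit)
{
    assert(all != NULL && culprit != NULL);
    *culprit = NULL;
    for (size_t i = 0; i < n; i++) {
        size_t defs = count_definitions(all, n, all[i].name);
        if (defs > 1) { *culprit = all[i].name; return RESOLVE_DUPLICATE; }
        if (defs == 0) { *culprit = all[i].name; return RESOLVE_UNDEFINED; }
    }
    return RESOLVE_OK;
}
```

For the example in this article, a.o contributes a definition of main plus references to func and global_var, and b.o contributes the two missing definitions, so resolution succeeds.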

Static library linking

A static library can be viewed simply as a collection of object files, that is, multiple object files packaged into a single file.

Static library linking: our own modules are linked with modules from the static library (one or more of its object files) into an executable. It is the same concept as static linking, except that the input consists of one or more object files taken from the static library together with our own object files.

A static library typically contains many object files, and an object file may contain only a single function. This is because the linker pulls in the contents of a static library one object file at a time. If all the functions were placed in a single object file, a program that uses only one of them would still have all the unused functions linked into its executable.
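The per-object-file granularity can be illustrated with the sketch below, which selects the library members that define at least one symbol the program still needs. It is a deliberately simplified model: `ArchiveMember`, `MAX_DEFS`, and `pick_members` are hypothetical, and a real linker would also go on to resolve any new undefined symbols introduced by the members it pulls in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEFS 4  /* hypothetical cap, enough for this illustration */

/* One object file inside a static library (hypothetical model). */
typedef struct {
    const char *file;               /* member name, e.g. "add.o" */
    const char *defines[MAX_DEFS];  /* defined symbols, unused slots NULL */
} ArchiveMember;

static int member_defines(const ArchiveMember *m, const char *sym)
{
    for (size_t i = 0; i < MAX_DEFS && m->defines[i] != NULL; i++)
        if (strcmp(m->defines[i], sym) == 0)
            return 1;
    return 0;
}

/* Set picked[m] to 1 for every member that defines at least one of the
 * undefined symbols, 0 otherwise. Returns how many members were picked.
 * A picked member is linked in whole, including functions nobody calls. */
size_t pick_members(const ArchiveMember *lib, size_t nmembers,
                    const char *const *undefined, size_t nundef, int *picked)
{
    assert(lib != NULL && undefined != NULL && picked != NULL);
    size_t count = 0;
    for (size_t m = 0; m < nmembers; m++) {
        picked[m] = 0;
        for (size_t u = 0; u < nundef && !picked[m]; u++)
            picked[m] = member_defines(&lib[m], undefined[u]);
        count += (size_t)picked[m];
    }
    return count;
}
```

With one function per member, only the members actually needed end up in the executable; with everything in one member, needing a single symbol selects the whole file.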

References

  1. Programmer's Self-Cultivation: Linking, Loading, and Libraries