
Preface

This is the second post in the iOS Gods Trail series. It covers how the compiler toolchain turns source code into an executable and how the linker binds symbols to addresses. If you want to understand what really happens when an iOS App launches, I believe these posts will deepen your understanding. Likes and follows are welcome; I will keep sharing more practical material for everyone to analyze, reference, and comment on.

The study of App startup and the underlying frameworks is split into five posts, in the following order:

  • App system kernel loading

  • LLVM + Clang + compiler + linker (this post)

  • App startup optimization ideas [Advanced Road 3]

  • Binary reordering

  • How to monitor App startup? [Advanced Road 5]

Background and Issues

Programmers who have worked on several projects often wonder why some projects compile quickly while others are slow, and why, once built, some Apps launch quickly while others are slow. Once you understand what the linker does at compile time and at launch time, you can answer these questions at the root. So what exactly does the linker do at compile time and at launch time?

1. Preparation

Compilation and linking rest on the following basic concepts:

1.1 Compiled and interpreted languages

Programming languages are divided into compiled languages and interpreted languages, and their execution processes are also different.

Compiled languages

Compiled languages use a compiler to translate source code into machine code ahead of time. The machine code then runs directly on the CPU, which makes execution more efficient and faster. C, C++, Objective-C and similar languages are processed by a compiler to produce executable files.

Interpreted languages

Interpreted languages use an interpreter. The program is translated at run time, so it runs slower than a compiled program.

1.2 Compilers and interpreters

Executing compiler-generated machine code is efficient, but the debugging cycle is long.

Executing through an interpreter makes code easy to write and debug, but execution is less efficient.

Compiler

A program that converts one programming language (the source language) into another programming language (the target language) is called a compiler.

Interpreter

An interpreter processes the code at run time: it takes a piece of code, translates it into intermediate code (known as bytecode), and executes that code statement by statement. Because the code is parsed at run time, it is naturally less efficient than running a compiled executable directly. In return, you can modify the code and see the effect immediately without recompiling and restarting, similar to a hot update, which shortens the development cycle and the feature update cycle of the whole program.
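
To make the "execute the bytecode statement by statement" idea concrete, here is a minimal sketch of a stack-based bytecode loop in C. The opcodes (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) and the function run_bytecode are made up for illustration and do not correspond to any real interpreter's format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (not a real bytecode format). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 64

/* Fetch, decode, and execute one instruction at a time. Returns the value on
   top of the stack when OP_HALT is reached (0 if the stack is empty). */
int run_bytecode(const int *code)
{
    int stack[STACK_MAX];
    size_t sp = 0;  /* number of values currently on the stack */
    size_t pc = 0;  /* index of the next instruction */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];  /* the operand follows the opcode */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
        default:
            return sp > 0 ? stack[sp - 1] : 0;
        }
    }
}

/* Bytecode that a front end might emit for the expression (2 + 3) * 4. */
const int demo_program[] = {
    OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT
};
```

Calling run_bytecode(demo_program) evaluates the expression one instruction at a time. The per-instruction dispatch in the switch is the overhead that a compiled executable does not pay.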

2. LLVM

2.1 Introduction to LLVM

Translated from the official description:

The LLVM project is a collection of modular, reusable compiler and toolchain technologies. Despite its name, LLVM has little to do with traditional virtual machines. The name "LLVM" itself is not an acronym; it is the full name of the project.

The Association for Computing Machinery (ACM) presented its 2012 Software System Award to LLVM. Previous recipients of this award include Java, Apache, Mosaic, the World Wide Web, Smalltalk, UNIX, Eclipse, and more.

Chris Lattner is the creator of Swift and LLVM. I hope his portrait rights are not infringed; the photo is used out of respect and not for commercial purposes.

2.2 Architecture

2.2.1 Traditional compiler architecture

  • Frontend

The main tasks are: lexical analysis, syntax analysis, semantic analysis, generation of intermediate code

  • Optimizer

The main task is: intermediate code optimization

  • Backend

The main task is: generate machine code

2.2.2 LLVM architecture

Explanation:

  • Different front ends and back ends share one uniform intermediate code, LLVM IR
  • If you need to support a new programming language, you only need to implement a new front end
  • If you need to support a new hardware device, you only need to implement a new back end
  • The optimization phase is generic: it works on the uniform LLVM IR and does not need to change to support a new programming language or a new hardware device
  • In contrast, the front end and back end of GCC are not cleanly separated; they are coupled together. That makes it particularly difficult for GCC to support a new language or a new target platform
  • LLVM is now used as common infrastructure to implement a variety of statically compiled and runtime-compiled languages (the GCC family, Java, .NET, Python, Ruby, Scheme, Haskell, D, etc.)

3. Clang

3.1 Introduction to Clang

Translated from the official description:

The Clang project provides a language front end and tooling infrastructure for languages in the C language family (C, C++, Objective-C/C++, OpenCL, CUDA, and RenderScript) for the LLVM project. Both a GCC-compatible compiler driver (clang) and an MSVC-compatible compiler driver (clang-cl.exe) are provided. You can get the source code and build it today.

In short: Clang is the C/C++/Objective-C compiler front end of the LLVM architecture.

Compared with GCC, Clang has the following advantages:

  • Fast compilation: Clang compiles significantly faster than GCC on some platforms

  • Small footprint: The AST generated by Clang takes up about one-fifth of the memory of GCC

  • The design is clear and simple, easy to understand, and easy to expand and enhance

  • Modular design: Clang uses a library-based modular design for easy IDE integration and reuse for other purposes

3.2 Differences between Clang and LLVM

LLVM in the broad sense: the entire LLVM architecture

LLVM in the narrow sense: the LLVM back end (code optimization, object code generation, etc.)

4. Compilation

4.1 Compilation Process

A compiler is divided into a front end and a back end. The Objective-C/C/C++ compiler uses Clang as the front end and LLVM as the back end.

  • The front end is responsible for lexical analysis, syntax analysis, and generation of intermediate code;
  • The back end takes the intermediate code as input, performs architecture-independent optimization, and then generates different machine code for different architectures

The following is a flow chart of the compilation process:

Here are the details of the flow chart:

  1. Preprocessing: Clang preprocesses the code, for example expanding macros, removing comments, and resolving conditional compilation.
  2. Lexical analysis: the lexical analyzer reads the byte stream of the source file and organizes it into a sequence of lexemes. For each lexeme it emits a token as output and uses Loc to record the token's position.
  3. Syntax analysis: this step parses the token stream produced by lexical analysis into an abstract syntax tree (AST). Again, every node records its location in the source code.
  4. Static analysis: once the source code has been converted into an abstract syntax tree, the compiler can analyze the tree statically. Static analysis improves code quality by checking for errors, such as variables that are defined but never used. This phase also performs type checking: for example, if you assign a property a value that does not match its type, the compiler warns that it may be used incorrectly. Finally, the AST is lowered to IR, a language that is closer to machine code but still platform-independent; machine code for several different platforms can be generated from the same IR.
  5. Intermediate code generation and optimization: LLVM compiles and optimizes the code at this stage, for example global variable optimization, loop optimization, and tail recursion optimization. The output is an LLVM IR file, XX.ll;
  6. Linking: the linker combines the compiled .o files with library files (.dylib, .a, .tbd) to produce a Mach-O file, which is the executable. At this point the compilation process is complete and the Mach-O executable has been generated.

4.2 In practice

The demo code is as follows:

#import <Foundation/Foundation.h>

#define aa 10
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        
        NSObject *obj = [[NSObject alloc] init];
        id __weak obj1 = obj;
        NSLog(@"------%@--%d--",[obj1 class],aa);
        
    }
    return 0;
}

4.2.1 Preprocessor

The project directory is displayed

Preprocessing commands:

xcrun clang -E main.m

The end of the generated output is shown below. During preprocessing, comments are removed, conditional compilation is resolved, and macros are expanded (note that aa has become 10):

int main(int argc, const char * argv[]) {
    @autoreleasepool {

        NSObject *obj = [[NSObject alloc] init];
        id __attribute__((objc_ownership(weak))) obj1 = obj;
        NSLog(@"------%@--%d--",[obj1 class],10);

    }
    return 0;
}
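
To see macro expansion and conditional compilation in isolation, here is a small plain C sketch that can be passed through clang -E in the same way. The names AA, SQUARE, DEMO_VERBOSE, and GREETING are invented for this example and are not part of the demo project.

```c
#include <assert.h>

#define AA 10                  /* object-like macro, like aa in the demo */
#define SQUARE(x) ((x) * (x))  /* function-like macro, added for illustration */

/* Conditional compilation: DEMO_VERBOSE is a hypothetical flag that can be
   set with -DDEMO_VERBOSE. Only one branch survives preprocessing. */
#ifdef DEMO_VERBOSE
#define GREETING "verbose build"
#else
#define GREETING "quiet build"
#endif

/* After preprocessing, the return statement reads: ((side) * (side)) + 10 */
int area_plus_aa(int side)
{
    assert(side >= 0);
    return SQUARE(side) + AA;
}

/* After preprocessing, exactly one of the two string literals remains. */
const char *greeting(void)
{
    return GREETING;
}
```

In the preprocessed output the macro names no longer appear; the compiler proper only ever sees the expanded text, which is why the 10 in the NSLog call above carries no trace of aa.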

4.2.2 Lexical Analysis

Main task: the lexical analyzer reads the byte stream of the source file and organizes it into a sequence of lexemes. For each lexeme it emits a token as output and uses Loc to record the token's position.

Use the following command:

xcrun clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m

Output:

annot_module_include '#import <Foundation/Foundation.h> #define aa 10 int main(int argc, const char * argv[]) { @autoreleasep' Loc=<main.m:8:1>
int 'int' [StartOfLine] Loc=<main.m:11:1>
identifier 'main' [LeadingSpace] Loc=<main.m:11:5>
l_paren '(' Loc=<main.m:11:9>
int 'int' Loc=<main.m:11:10>
identifier 'argc' [LeadingSpace] Loc=<main.m:11:14>
comma ',' Loc=<main.m:11:18>
const 'const' [LeadingSpace] Loc=<main.m:11:20>
char 'char' [LeadingSpace] Loc=<main.m:11:26>
star '*' [LeadingSpace] Loc=<main.m:11:31>
identifier 'argv' [LeadingSpace] Loc=<main.m:11:33>
l_square '[' Loc=<main.m:11:37>
r_square ']' Loc=<main.m:11:38>
r_paren ')' Loc=<main.m:11:39>
l_brace '{' [LeadingSpace] Loc=<main.m:11:41>
at '@' [StartOfLine] [LeadingSpace] Loc=<main.m:12:5>
identifier 'autoreleasepool' Loc=<main.m:12:6>
l_brace '{' [LeadingSpace] Loc=<main.m:12:22>
identifier 'NSObject' [StartOfLine] [LeadingSpace] Loc=<main.m:14:9>
star '*' [LeadingSpace] Loc=<main.m:14:18>
identifier 'obj' Loc=<main.m:14:19>
equal '=' [LeadingSpace] Loc=<main.m:14:23>
l_square '[' [LeadingSpace] Loc=<main.m:14:25>
l_square '[' Loc=<main.m:14:26>
identifier 'NSObject' Loc=<main.m:14:27>
identifier 'alloc' [LeadingSpace] Loc=<main.m:14:36>
r_square ']' Loc=<main.m:14:41>
identifier 'init' [LeadingSpace] Loc=<main.m:14:43>
r_square ']' Loc=<main.m:14:47>
semi ';' Loc=<main.m:14:48>
identifier 'id' [StartOfLine] [LeadingSpace] Loc=<main.m:15:9>
__attribute '__attribute__' [LeadingSpace] Loc=<main.m:15:12 <Spelling=<built-in>:329:16>>
l_paren '(' Loc=<main.m:15:12 <Spelling=<built-in>:329:29>>
l_paren '(' Loc=<main.m:15:12 <Spelling=<built-in>:329:30>>
identifier 'objc_ownership' Loc=<main.m:15:12 <Spelling=<built-in>:329:31>>
l_paren '(' Loc=<main.m:15:12 <Spelling=<built-in>:329:45>>
identifier 'weak' Loc=<main.m:15:12 <Spelling=<built-in>:329:46>>
r_paren ')' Loc=<main.m:15:12 <Spelling=<built-in>:329:50>>
r_paren ')' Loc=<main.m:15:12 <Spelling=<built-in>:329:51>>
r_paren ')' Loc=<main.m:15:12 <Spelling=<built-in>:329:52>>
identifier 'obj1' [LeadingSpace] Loc=<main.m:15:19>
equal '=' [LeadingSpace] Loc=<main.m:15:24>
identifier 'obj' [LeadingSpace] Loc=<main.m:15:26>
semi ';' Loc=<main.m:15:29>
identifier 'NSLog' [StartOfLine] [LeadingSpace] Loc=<main.m:16:9>
l_paren '(' Loc=<main.m:16:14>
at '@' Loc=<main.m:16:15>
string_literal '"------%@--%d--"' Loc=<main.m:16:16>
comma ',' Loc=<main.m:16:32>
l_square '[' Loc=<main.m:16:33>
identifier 'obj1' Loc=<main.m:16:34>
identifier 'class' [LeadingSpace] Loc=<main.m:16:39>
r_square ']' Loc=<main.m:16:44>
comma ',' Loc=<main.m:16:45>
numeric_constant '10' Loc=<main.m:16:46 <Spelling=main.m:10:12>>
r_paren ')' Loc=<main.m:16:48>
semi ';' Loc=<main.m:16:49>
r_brace '}' [StartOfLine] [LeadingSpace] Loc=<main.m:18:5>
return 'return' [StartOfLine] [LeadingSpace] Loc=<main.m:19:5>
numeric_constant '0' [LeadingSpace] Loc=<main.m:19:12>
semi ';' Loc=<main.m:19:13>
r_brace '}' [StartOfLine] Loc=<main.m:20:1>
eof '' Loc=<main.m:20:2>

4.3 Intermediate code generation and optimization

Main task: LLVM compiles and optimizes the code at this stage, for example global variable optimization, loop optimization, and tail recursion optimization. The output is an LLVM IR file, XX.ll.

Use the following command:

clang -O3 -S -emit-llvm main.m -o main.ll

A new file, main.ll, appears in the directory.

View file contents:

; ModuleID = 'main.m'
source_filename = "main.m"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"

%struct._class_t = type { %struct._class_t*, %struct._class_t*, %struct._objc_cache*, i8* (i8*, i8*)**, %struct._class_ro_t* }
%struct._objc_cache = type opaque
%struct._class_ro_t = type { i32, i32, i32, i8*, i8*, %struct.__method_list_t*, %struct._objc_protocol_list*, %struct._ivar_list_t*, i8*, %struct._prop_list_t* }
%struct.__method_list_t = type { i32, i32, [0 x %struct._objc_method] }
%struct._objc_method = type { i8*, i8*, i8* }
%struct._objc_protocol_list = type { i64, [0 x %struct._protocol_t*] }
%struct._protocol_t = type { i8*, i8*, %struct._objc_protocol_list*, %struct.__method_list_t*, %struct.__method_list_t*, %struct.__method_list_t*, %struct.__method_list_t*, %struct._prop_list_t*, i32, i32, i8**, i8*, %struct._prop_list_t* }
%struct._ivar_list_t = type { i32, i32, [0 x %struct._ivar_t] }
%struct._ivar_t = type { i64*, i8*, i8*, i32, i32 }
%struct._prop_list_t = type { i32, i32, [0 x %struct._prop_t] }
%struct._prop_t = type { i8*, i8* }
%struct.__NSConstantString_tag = type { i32*, i32, i8*, i64 }

@"OBJC_CLASS_$_NSObject" = external global %struct._class_t
@"OBJC_CLASSLIST_REFERENCES_$_" = internal global %struct._class_t* @"OBJC_CLASS_$_NSObject", section "__DATA,__objc_classrefs,regular,no_dead_strip", align 8
@__CFConstantStringClassReference = external global [0 x i32]
@.str = private unnamed_addr constant [15 x i8] c"------%@--%d--\00", section "__TEXT,__cstring,cstring_literals", align 1
@_unnamed_cfstring_ = private global %struct.__NSConstantString_tag { i32* getelementptr inbounds ([0 x i32], [0 x i32]* @__CFConstantStringClassReference, i32 0, i32 0), i32 1992, i8* getelementptr inbounds ([15 x i8], [15 x i8]* @.str, i32 0, i32 0), i64 14 }, section "__DATA,__cfstring", align 8 #0
@llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (%struct._class_t** @"OBJC_CLASSLIST_REFERENCES_$_" to i8*)], section "llvm.metadata"

; Function Attrs: ssp uwtable
define i32 @main(i32 %0, i8** nocapture readnone %1) local_unnamed_addr #1 {
  %3 = tail call i8* @llvm.objc.autoreleasePoolPush() #2
  %4 = load i8*, i8** bitcast (%struct._class_t** @"OBJC_CLASSLIST_REFERENCES_$_" to i8**), align 8
  %5 = tail call i8* @objc_alloc_init(i8* %4)
  %6 = tail call i8* @objc_opt_class(i8* %5)
  notail call void (i8*, ...) @NSLog(i8* bitcast (%struct.__NSConstantString_tag* @_unnamed_cfstring_ to i8*), i8* %6, i32 10)
  tail call void @llvm.objc.autoreleasePoolPop(i8* %3)
  ret i32 0
}

; Function Attrs: nounwind
declare i8* @llvm.objc.autoreleasePoolPush() #2

declare i8* @objc_alloc_init(i8*) local_unnamed_addr

declare void @NSLog(i8*, ...) local_unnamed_addr #3

declare i8* @objc_opt_class(i8*) local_unnamed_addr

; Function Attrs: nounwind
declare void @llvm.objc.autoreleasePoolPop(i8*) #2

attributes #0 = { "objc_arc_inert" }
attributes #1 = { ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }
attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7}
!llvm.ident = !{!8}

!0 = !{i32 2, !"SDK Version", [3 x i32] [i32 10, i32 15, i32 6]}
!1 = !{i32 1, !"Objective-C Version", i32 2}
!2 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!3 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
!4 = !{i32 4, !"Objective-C Garbage Collection", i32 0}
!5 = !{i32 1, !"Objective-C Class Properties", i32 64}
!6 = !{i32 1, !"wchar_size", i32 4}
!7 = !{i32 7, !"PIC Level", i32 2}
!8 = !{!"Apple clang version 12.0.0 (clang-1200.0.32.2)"}

4.4 Generating Assembly

Use the following command:

xcrun clang -S -o - main.m | open -f

View file contents:

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 10, 15    sdk_version 10, 15, 6
    .globl  _main                   ## -- Begin function main
    .p2align    4, 0x90
_main:                                  ## @main
    .cfi_startproc
## %bb.0:
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    subq    $32, %rsp
    movl    $0, -4(%rbp)
    movl    %edi, -8(%rbp)
    movq    %rsi, -16(%rbp)
    callq   _objc_autoreleasePoolPush
    movq    _OBJC_CLASSLIST_REFERENCES_$_(%rip), %rcx
    movq    %rcx, %rdi
    movq    %rax, -32(%rbp)         ## 8-byte Spill
    callq   _objc_alloc_init
    movq    %rax, -24(%rbp)
    movq    -24(%rbp), %rax
    movq    %rax, %rdi
    callq   _objc_opt_class
    leaq    L__unnamed_cfstring_(%rip), %rcx
    movq    %rcx, %rdi
    movq    %rax, %rsi
    movl    $10, %edx
    movb    $0, %al
    callq   _NSLog
    movq    -32(%rbp), %rdi         ## 8-byte Reload
    callq   _objc_autoreleasePoolPop
    xorl    %eax, %eax
    addq    $32, %rsp
    popq    %rbp
    retq
    .cfi_endproc
                                        ## -- End function
    .section    __DATA,__objc_classrefs,regular,no_dead_strip
    .p2align    3                   ## @"OBJC_CLASSLIST_REFERENCES_$_"
_OBJC_CLASSLIST_REFERENCES_$_:
    .quad   _OBJC_CLASS_$_NSObject

    .section    __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
    .asciz  "------%@--%d--"

    .section    __DATA,__cfstring
    .p2align    3                   ## @_unnamed_cfstring_
L__unnamed_cfstring_:
    .quad   ___CFConstantStringClassReference
    .long   1992                    ## 0x7c8
    .space  4
    .quad   L_.str
    .quad   14                      ## 0xe

    .section    __DATA,__objc_imageinfo,regular,no_dead_strip
L_OBJC_IMAGE_INFO:
    .long   0
    .long   64

.subsections_via_symbols

The assembler takes assembly code as input, converts assembly code into machine code, and finally outputs object files.

To produce the object file, run:

xcrun clang -fmodules -c main.m -o main.o

A new file, main.o, appears in the directory.

Contents of main.o file:

4.5 Linking

The linker combines the compiled .o files with library files (.dylib, .a, .tbd) to generate a Mach-O file.

Use the following command:

xcrun clang main.o -o main

An executable binary Mach-O file is generated

These are the steps Xcode goes through when it builds an App. On to the next part.

5. Linker

The main function of the linker is to bind symbols to addresses.

5.1 Compile-time linker tasks

At compile time, the linker binds symbols such as variables and functions to addresses.

The contents of the Mach-O file are mainly code and data: code is the definition of a function; Data is the definition of a global variable, including its initial value. Instances of both code and data need to be associated by symbols.

At this point you may ask: why does the linker need to bind symbols to addresses at all? What goes wrong without the binding?

Without symbol-to-address binding, you would have to spell out the memory address that each instruction operates on when writing the code, so that the machine knows where to look. Such code is hard to read and hard to maintain: because the code is tied to memory addresses too early, any later change is a nightmare for the developer. The first answer to this problem was assembly language, which delays the binding by letting you write symbolic names. As programming languages evolved, high-level languages solved the early-binding problem just as well while sparing us the pain of writing assembly, and the job of binding symbols to addresses was handed to the linker.

You might also wonder why the linker merges the multiple Mach-O object files of a project into one.

Variables and functions in different files of a project depend on each other, so the symbols and addresses of the multiple Mach-O object files generated for the project have to be bound together by the linker. Without this binding step, a Mach-O file generated from a single source file would not run correctly: if it calls a function implemented in another file, the address of that function cannot be found at run time and execution cannot continue. While linking multiple object files, the linker builds a symbol table that records all defined and undefined symbols. If the same symbol is defined more than once, linking fails with a "duplicate symbols" error from ld. If a referenced symbol is not found in any other object file, linking fails with an "Undefined symbols" error.
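
The two error cases can be sketched with a toy symbol table in C. This is only a model of the idea, not how ld is implemented: the types Symbol and ObjectFile and the function resolve_symbols are hypothetical, and dynamic libraries (which would satisfy some undefined symbols in a real link) are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    int defined;  /* 1 = this object defines the symbol, 0 = it only references it */
} Symbol;

typedef struct {
    const char *file;
    const Symbol *syms;
    size_t nsyms;
} ObjectFile;

typedef enum { LINK_OK, LINK_DUPLICATE, LINK_UNDEFINED } LinkStatus;

/* Count how many of the object files define `name`. */
static size_t count_definitions(const ObjectFile *objs, size_t nobjs,
                                const char *name)
{
    size_t n = 0;
    for (size_t i = 0; i < nobjs; i++)
        for (size_t j = 0; j < objs[i].nsyms; j++)
            if (objs[i].syms[j].defined &&
                strcmp(objs[i].syms[j].name, name) == 0)
                n++;
    return n;
}

/* Every symbol must have exactly one definition across all object files.
   Returns the first problem found and stores the offending name in *culprit. */
LinkStatus resolve_symbols(const ObjectFile *objs, size_t nobjs,
                           const char **culprit)
{
    assert(culprit != NULL);
    *culprit = NULL;
    for (size_t i = 0; i < nobjs; i++) {
        for (size_t j = 0; j < objs[i].nsyms; j++) {
            const char *name = objs[i].syms[j].name;
            size_t defs = count_definitions(objs, nobjs, name);
            if (defs > 1) { *culprit = name; return LINK_DUPLICATE; }
            if (defs == 0) { *culprit = name; return LINK_UNDEFINED; }
        }
    }
    return LINK_OK;
}
```

Given a main.o that references _helper and a util.o that defines it, resolve_symbols reports LINK_OK; drop util.o and the same reference becomes LINK_UNDEFINED, and link two files that both define _helper and the result is LINK_DUPLICATE. A real linker uses a hash table rather than this quadratic scan, but the rule it enforces is the same.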

What are the main things the linker does with your code?

  • Look through the other files in the project for variables that an object file references but does not define.

  • Scan the different files in the project, collect all symbol definitions and references, and place them in the global symbol table.

  • Compute the length and position of each section after merging, merge sections of the same type, and establish the bindings.

  • Relocate the addresses of variables across the different files in the project.

While sorting out the call relationships between symbols, the linker also works out which functions are never called. How does that work?

When the linker walks the function calls, it starts from the main function, follows each reference, and marks every function it reaches as live. When the walk is finished, the functions that were not marked live are unused. With the Dead Code Stripping switch turned on, the linker then removes this unused code automatically. This switch is on by default.
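
The mark-live walk described above is a plain graph traversal. Here is a minimal sketch in C, assuming a made-up call graph representation (the Func struct, MAX_FUNCS, and find_dead_code are illustrative names, not linker internals).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FUNCS 16

typedef struct {
    const char *name;
    size_t callees[MAX_FUNCS];  /* indices of the functions this one references */
    size_t ncallees;
} Func;

/* Depth-first walk: mark `idx` live, then everything it references. */
static void mark_live(const Func *funcs, size_t idx, int *live)
{
    if (live[idx]) return;  /* already visited; this also stops on call cycles */
    live[idx] = 1;
    for (size_t i = 0; i < funcs[idx].ncallees; i++)
        mark_live(funcs, funcs[idx].callees[i], live);
}

/* Marks every function reachable from `entry` in live[] and returns how many
   functions stayed unmarked, i.e. how many could be stripped. */
size_t find_dead_code(const Func *funcs, size_t nfuncs, size_t entry, int *live)
{
    assert(nfuncs <= MAX_FUNCS && entry < nfuncs);
    for (size_t i = 0; i < nfuncs; i++) live[i] = 0;
    mark_live(funcs, entry, live);

    size_t dead = 0;
    for (size_t i = 0; i < nfuncs; i++)
        if (!live[i]) dead++;
    return dead;
}
```

Note that a function that calls live code is not live itself unless something reachable from the entry point references it; only the direction from main outward counts.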

5.2 Dynamic library linking with dyld: another use of the linker

In real iOS development you will find that many features already exist and are shared not only by your App but by other Apps as well, such as the GUI frameworks, I/O, and networking. Linking these shared libraries into your Mach-O file is also the linker's job.

Shared libraries come in two kinds: static libraries and dynamic libraries. Static libraries are linked at compile time: they are linked into your Mach-O file, and if a library is updated you have to compile again. Dynamic libraries are linked at run time and are loaded dynamically by dyld.

The Mach-O file is the product of compilation, whereas a dynamic library is linked at run time and takes no part in compiling and linking the Mach-O file, so the Mach-O file does not contain the symbol definitions of the dynamic library. In other words, those symbols show up as "undefined", but their names and the path of the corresponding library are recorded. When dynamic libraries are imported at run time through dlopen and dlsym, the library is first located through the recorded path, and then the recorded symbol name is used to find the address to bind.

dlopen loads a shared library into the address space of the running process. The loaded library may itself have undefined symbols, which triggers the loading of more shared libraries. dlopen also lets you choose between resolving all references immediately and resolving them later. dlopen opens the dynamic library and returns a handle to it. dlsym takes the handle returned by dlopen together with a function symbol and returns the address of the function, which can then be called.
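
Here is a minimal C sketch of the dlopen and dlsym pair. To stay self-contained it does not load a separate .dylib: it passes NULL to dlopen, which returns a handle for the images already loaded into the process, and looks up the C library's strlen. The helper names lookup_symbol and dynamic_strlen are invented for this example.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

typedef size_t (*strlen_fn)(const char *);

/* Look up `symbol` at run time in the images already loaded into this
   process. Returns NULL if the handle or the symbol cannot be found. */
void *lookup_symbol(const char *symbol)
{
    assert(symbol != NULL);
    /* A NULL path asks for a handle to the main program and its dependencies.
       RTLD_NOW resolves all undefined references immediately; RTLD_LAZY
       would defer that work until first use. */
    void *handle = dlopen(NULL, RTLD_NOW);
    if (handle == NULL)
        return NULL;
    void *addr = dlsym(handle, symbol);
    dlclose(handle);  /* drops our reference; the C library stays loaded */
    return addr;
}

/* Call strlen through the address obtained from dlsym. */
size_t dynamic_strlen(const char *s)
{
    strlen_fn fn = (strlen_fn)lookup_symbol("strlen");
    return fn ? fn(s) : 0;
}
```

With a real library you would pass its path to dlopen instead of NULL and keep the handle open for as long as the returned addresses are in use. On some older Linux systems the program must also be linked with -ldl; on macOS no extra flag is needed.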

At the start of loading, addresses are first corrected by an offset: iOS uses ASLR to shift the load address as a defense against attacks. dyld then determines the addresses for the non-lazy pointers and binds the symbols to them, loads all classes, and finally runs the +load methods and the functions marked with Clang's constructor attribute.

5.3 dyld linking in practice

The demo code is as follows:

// Person.h
#import <Foundation/Foundation.h>

@interface Person : NSObject
- (void)eat;
@end

// Person.m
#import "Person.h"

@implementation Person
- (void)eat {
    NSLog(@"eat apple");
}
@end

// main.m
#import <Foundation/Foundation.h>
#import "Person.h"

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        Person *person = [[Person alloc] init];
        [person eat];
    }
    return 0;
}

5.3.1 Compiling Multiple Files

Use the following command:

xcrun clang -c Person.m
xcrun clang -c main.m  

Two new files, main.o and Person.o, appear in the directory.

5.3.2 Link the compiled files to generate the a.out executable

Use the following command:

xcrun clang main.o Person.o -Wl,`xcrun --show-sdk-path`/System/Library/Frameworks/Foundation.framework/Foundation

The generated file is as follows:

Use the following command to list the symbols in a.out:

xcrun nm -nm a.out

The contents of a.out are shown below:

                 (undefined) external _NSLog (from Foundation)
                 (undefined) external _OBJC_CLASS_$_NSObject (from libobjc)
                 (undefined) external _OBJC_METACLASS_$_NSObject (from libobjc)
                 (undefined) external ___CFConstantStringClassReference (from CoreFoundation)
                 (undefined) external __objc_empty_cache (from libobjc)
                 (undefined) external _objc_alloc_init (from libobjc)
                 (undefined) external _objc_autoreleasePoolPop (from libobjc)
                 (undefined) external _objc_autoreleasePoolPush (from libobjc)
                 (undefined) external _objc_msgSend (from libobjc)
                 (undefined) external dyld_stub_binder (from libSystem)
0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100003ec0 (__TEXT,__text) external _main
0000000100003f20 (__TEXT,__text) non-external -[Person eat]
0000000100008020 (__DATA,__objc_const) non-external __OBJC_METACLASS_RO_$_Person
0000000100008068 (__DATA,__objc_const) non-external __OBJC_$_INSTANCE_METHODS_Person
0000000100008088 (__DATA,__objc_const) non-external __OBJC_CLASS_RO_$_Person
00000001000080e0 (__DATA,__objc_data) external _OBJC_METACLASS_$_Person
0000000100008108 (__DATA,__objc_data) external _OBJC_CLASS_$_Person
0000000100008130 (__DATA,__data) non-external __dyld_private

Symbols marked undefined are not defined in these object files. When the objects are linked against dynamic libraries such as the Foundation framework, the linker tries to resolve every undefined symbol and records the library it comes from, as the "(from ...)" notes in the listing show.

A dylib is a dynamically linked library. It is not compiled into the executable at build time; it is linked when the program runs. Its size therefore does not count toward the package, and the library can be updated without updating the executable.

For details about the order in which the dynamic linker does its work, check out Dyld Linking.

In short, dyld does the following:

  • First, when the Mach-O file is executed, it loads the dynamic libraries that the file needs according to its undefined symbols; the system uses a shared cache to deal with the recursive dependencies of loading.

  • After loading, each undefined symbol is bound to the corresponding address in the dynamic library.

  • Finally, the +load methods are processed; static terminators are run after main returns.
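
The "before main" and "after main" hooks mentioned above can be observed from plain C with the constructor and destructor attributes, which both Clang and GCC support. This is a small sketch; the names setup, teardown, and setup_has_run are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

static int setup_done = 0;

/* Runs when the image is loaded, before main() is entered. */
__attribute__((constructor))
static void setup(void)
{
    assert(setup_done == 0);  /* an initializer runs once per image load */
    setup_done = 1;
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor))
static void teardown(void)
{
    fputs("teardown: running after main\n", stderr);
}

/* Lets code called from main() check that the initializer already ran. */
int setup_has_run(void)
{
    return setup_done;
}
```

By the time main starts, setup_has_run() already returns 1, which is why work placed in +load methods and constructor functions adds directly to an App's launch time.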

5.4 Dynamic linker application – speed up compilation and debugging

Native iOS code is compiled and debugged by building and restarting the App over and over. The more code a project has, the longer each build takes. We can speed up builds by precompiling part of the code into binaries and integrating them into the project so that not everything is rebuilt each time, but even then we still have to restart the App and go through the debugging steps again after every build. Here is a tool that speeds up this compile-and-debug cycle.

Injection for Xcode

John Holdsworth developed a tool called Injection that dynamically executes Swift or Objective-C code in a running program, which speeds up debugging because the program does not have to be restarted.

To use it, clone the code and build InjectionPluginLite/InjectionPlugin.xcodeproj. To remove it, run the following command in the terminal:

rm -rf ~/Library/Application\ Support/Developer/Shared/Xcode/Plug-ins/InjectionPlugin.xcplugin

Once the plugin is built, we can compile the project. Then add a new method to the class being injected:

- (void)injected
{
    NSLog(@"I've been injected: %@", self);
}

Injection listens for changes to the source files. When a file changes, the Injection Server runs rebuildClass to recompile it and package it into a dynamic library, that is, a .dylib file. After compiling and packaging the dynamic library, the server notifies the running App over a socket by calling writeString. The code for writeString is as follows:

- (BOOL)writeString:(NSString *)string {
    const char *utf8 = string.UTF8String;
    uint32_t length = (uint32_t)strlen(utf8);
    if (write(clientSocket, &length, sizeof length) != sizeof length ||
        write(clientSocket, utf8, length) != length)
        return FALSE;
    return TRUE;
}

The Server sends and listens for socket messages in the background; this is implemented in the runInBackground method of injectionServer.mm. The Client likewise sends and listens for socket messages in the background, implemented in the runInBackground method of injectionClient.mm.
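
writeString frames each message as a 4-byte length followed by the UTF-8 bytes. As a sketch of how a receiver can consume that framing, here is a plain C version of both sides. The functions write_string, read_all, and read_string are written for this example and are not taken from the Injection source; like the original, the length is sent in host byte order, which is fine when both ends run on the same machine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read exactly `n` bytes, retrying on short reads. Returns 0 on success. */
static int read_all(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t got = read(fd, p, n);
        if (got <= 0) return -1;  /* end of stream or error */
        p += got;
        n -= (size_t)got;
    }
    return 0;
}

/* Sender side, mirroring writeString: a 4-byte length, then the bytes. */
int write_string(int fd, const char *utf8)
{
    assert(utf8 != NULL);
    uint32_t length = (uint32_t)strlen(utf8);
    if (write(fd, &length, sizeof length) != (ssize_t)sizeof length) return -1;
    if (write(fd, utf8, length) != (ssize_t)length) return -1;
    return 0;
}

/* Receiver side: read the length prefix, then exactly that many bytes.
   Returns a malloc'd NUL-terminated string, or NULL on failure. */
char *read_string(int fd)
{
    uint32_t length;
    if (read_all(fd, &length, sizeof length) != 0) return NULL;
    char *s = malloc((size_t)length + 1);
    if (s == NULL) return NULL;
    if (read_all(fd, s, length) != 0) { free(s); return NULL; }
    s[length] = '\0';
    return s;
}
```

The length prefix is what lets the receiver know where one message ends and the next begins, since a stream socket delivers bytes without message boundaries. The same functions work on a pipe, which makes them easy to try out without a network connection.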

After receiving the message, the Client calls inject(tmpfile: String) to replace the class dynamically at run time. The full implementation of inject(tmpfile: String) can be found in the project source.

The inject(tmpfile: String) method mainly replaces old classes with new ones at run time. Its tmpfile parameter is the file path of the dynamic library. How does the dynamic library get loaded into the running executable? That happens at the start of inject(tmpfile: String), as follows:

let newClasses = try SwiftEval.instance.loadAndInject(tmpfile: tmpfile)

The implementation of SwiftEval.instance.loadAndInject(tmpfile: tmpfile) is as follows:

@objc func loadAndInject(tmpfile: String, oldClass: AnyClass? = nil) throws -> [AnyClass] {

    print("Loading .dylib - Ignore any duplicate class warning...")
    // load patched .dylib into process with new version of class
    guard let dl = dlopen("\(tmpfile).dylib", RTLD_NOW) else {
        throw evalError("dlopen() error: \(String(cString: dlerror()))")
    }
    print("Loaded .dylib - Ignore any duplicate class warning...")

    if oldClass != nil {
        // find patched version of class using symbol for existing
        var info = Dl_info()
        guard dladdr(unsafeBitCast(oldClass, to: UnsafeRawPointer.self), &info) != 0 else {
            throw evalError("Could not locate class symbol")
        }

        debug(String(cString: info.dli_sname))
        guard let newSymbol = dlsym(dl, info.dli_sname) else {
            throw evalError("Could not locate newly loaded class symbol")
        }

        return [unsafeBitCast(newSymbol, to: AnyClass.self)]
    }
    else {
        // grep out symbols for classes being injected from object file
        try injectGenerics(tmpfile: tmpfile, handle: dl)

        guard shell(command: """
            \(xcodeDev)/Toolchains/XcodeDefault.xctoolchain/usr/bin/nm \(tmpfile).o | grep -E ' S _OBJC_CLASS_\\$_| _(_T0|\\$S).*CN$' | awk '{print $3}' >\(tmpfile).classes
            """) else {
            throw evalError("Could not list class symbols")
        }
        guard var symbols = (try? String(contentsOfFile: "\(tmpfile).classes"))?.components(separatedBy: "\n") else {
            throw evalError("Could not load class symbol list")
        }
        symbols.removeLast()

        return Set(symbols.flatMap { dlsym(dl, String($0.dropFirst())) }).map { unsafeBitCast($0, to: AnyClass.self) }
    }
}

In the code above, you can see the call to the dynamic library loading function dlopen:

guard let dl = dlopen("\(tmpfile).dylib", RTLD_NOW) else {
    throw evalError("dlopen() error: \(String(cString: dlerror()))")
}

As the code above shows, dlopen loads the tmpfile dynamic library into the running App and returns the handle dl. Next, dlsym looks up the address of a symbol in the tmpfile dynamic library, after which the class replacement can be performed. The corresponding dlsym call is as follows:

guard let newSymbol = dlsym(dl, info.dli_sname) else {
    throw evalError("Could not locate newly loaded class symbol")
}

Once the class’s methods have been replaced, we can begin to redraw the interface. There is no need to recompile and restart the App during the whole process, so the purpose of using dynamic library method for fast debugging is achieved.

In summary, Injection works as follows: the server watches the source files, recompiles a changed file into a .dylib, and notifies the running App over a socket; the client then loads the library with dlopen, looks up the new class with dlsym, and swaps it in at run time.

Conclusion

This post has covered the basics of compilers and linkers and some of their application scenarios in detail. Only by continually studying the underlying layers and building a solid foundation can we use this knowledge to improve development efficiency and give users more stable, better-performing Apps.

This is the second post in the App startup series. I hope it helps you. Thank you for your likes and follows; let's make progress together and encourage each other!