In iOS development, apps are compiled directly into machine code that runs on the CPU, rather than into bytecode executed by an interpreter. Getting from source code to a running app involves several steps: compilation, linking, and startup. In iOS, the compilation stage is divided into a front end and a back end. The front end is Clang, whose development was led by Apple as part of the LLVM project, and the back end is LLVM.

Compilation

The compilation process mainly includes:

  • Preprocessing
  • Lexical analysis
  • Syntax analysis
  • Static analysis
  • Intermediate code generation
  • Assembly generation
  • Linking to generate the executable

Preprocessing

In the preprocessing phase, Clang first preprocesses our code: it expands macros in place, removes comments, and handles preprocessor directives such as #import and #if.

Lexical analysis

At this stage, the lexical analyzer (lexer) reads the preprocessed character stream, groups the characters into meaningful lexemes, produces a token for each lexeme, and records its position in the source before handing over to the next step. The main purpose of this process is to lay the groundwork for building the syntax tree in the next step.
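
To make the idea concrete, here is a minimal sketch of a lexer in Python. It is not how Clang is implemented; the token kinds (NUMBER, IDENT, PUNCT) and the tiny C-like subset they cover are assumptions chosen for illustration. It only shows characters being grouped into tokens that remember their line and column.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    line: int
    column: int


# Token patterns for a tiny C-like subset (hypothetical, far smaller than a real lexer).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[=+\-*/();{}]"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list:
    """Group characters into tokens and record a 1-based line and column for each."""
    tokens = []
    line, line_start, pos = 1, 0, 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at line {line}")
        kind = match.lastgroup
        if kind == "NEWLINE":
            line += 1
            line_start = match.end()
        elif kind != "SKIP":
            tokens.append(Token(kind, match.group(), line, match.start() - line_start + 1))
        pos = match.end()
    return tokens


for token in tokenize("int x = 42;"):
    print(token)
```

Note that this toy lexer treats the keyword int as an ordinary identifier; a real lexer would classify keywords separately.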

Syntax analysis

In this step, the tokens produced by lexical analysis are assembled into an abstract syntax tree (AST). Each node in the AST also records its location in the source code. The AST is much easier to traverse than the raw source text, and this step mainly prepares for the static analysis that follows.
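
As an analogy only, the sketch below uses Python's built-in ast module to show what "a tree whose nodes remember their source location" looks like. Clang's AST has different node types and is built for C, Objective-C and C++; the expression and the describe helper here are made up for illustration.

```python
import ast

source = "total = price * 2 + tax"
tree = ast.parse(source)


def describe(node, depth=0):
    """Return one line per AST node, with its source position when the node has one."""
    where = f" @ line {node.lineno}, col {node.col_offset}" if hasattr(node, "lineno") else ""
    lines = ["  " * depth + type(node).__name__ + where]
    for child in ast.iter_child_nodes(node):
        lines.extend(describe(child, depth + 1))
    return lines


print("\n".join(describe(tree)))
```

The printed outline shows the assignment at the top, the addition below it, and the multiplication nested inside the addition, which is exactly the structure later passes walk over.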

Static analysis and intermediate code generation

Once the source code has been converted into an abstract syntax tree, the compiler can traverse the tree to perform static analysis. Common problems such as type errors, syntax errors and undefined methods are found and reported during static analysis, but static analysis can do much more than that. After static analysis, the compiler generates IR (intermediate representation). IR is the intermediate product of the whole compiler toolchain; it is a form close to machine code but platform-independent, so machine code for multiple platforms can be generated from the same IR. IR is the dividing line between the Clang front end and the LLVM back end in iOS builds. Clang’s task ends when the IR is generated, and LLVM starts working when the IR is handed over to it.
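
The sketch below shows, under heavy simplification, how a tree can be lowered into a flat, platform-independent list of instructions. It reuses Python's ast module as the tree and emits a made-up three-address form; the instruction names (add, mul, ret) and the %t temporaries only loosely resemble LLVM IR and are not real LLVM syntax.

```python
import ast


def lower_to_ir(expression: str) -> list:
    """Lower an arithmetic expression to a list of three-address instructions."""
    ops = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}
    instructions = []

    def emit(node) -> str:
        # Leaves become operands; each binary operation gets a fresh temporary.
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            left, right = emit(node.left), emit(node.right)
            temp = f"%t{len(instructions)}"
            instructions.append(f"{temp} = {ops[type(node.op)]} {left}, {right}")
            return temp
        raise TypeError(f"unsupported node: {type(node).__name__}")

    result = emit(ast.parse(expression, mode="eval").body)
    instructions.append(f"ret {result}")
    return instructions


print("\n".join(lower_to_ir("a * 2 + b")))
```

Nothing in the output mentions registers or a specific CPU, which is the property that lets one IR feed several back ends.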

Assembly generation

After receiving the IR, LLVM can optimize it according to the chosen optimization strategy, for example tail-call optimization, loop optimization and global variable optimization. After optimization, LLVM calls the assembly generator to convert the IR into assembly code, which is then assembled into a .o object file. The resulting .o file is a binary file. Once the binaries are generated, we can optimize the build products further by means of binary reordering, with the aim of reducing the size of the build products and improving startup speed.

Linking

Once the source code has been compiled into .o files, linking begins. Linking is basically a packaging process: it takes all the compiled .o files and links them together with files such as .dylib, .a and .tbd to create a Mach-O file. At this point the build is complete and the Mach-O executable has been generated. Before linking, symbols are not yet bound to memory addresses, especially symbols defined in other modules. In the linking stage the linker does this work: it binds every symbol except those that come from dynamic libraries, and combines the object files into one executable.
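
A toy model may help here. The sketch below is not how ld works internally; the object file names, symbol names, sizes and the base address are all invented. It only demonstrates the rule described above: symbols defined in some object file get an address at link time, while symbols exported by a dynamic library are left for runtime.

```python
def link(objects, dylib_exports, base=0x1000):
    """Assign addresses to defined symbols and resolve references.

    objects: {object name: {"defines": {symbol: size}, "uses": [symbols]}}
    dylib_exports: symbols provided by dynamic libraries (left for runtime binding)
    """
    # Pass 1: lay out every defined symbol one after another.
    addresses, cursor = {}, base
    for name, obj in objects.items():
        for symbol, size in obj["defines"].items():
            if symbol in addresses:
                raise ValueError(f"duplicate symbol {symbol} in {name}")
            addresses[symbol] = cursor
            cursor += size

    # Pass 2: bind each reference, or defer it if a dynamic library provides it.
    bound, deferred = {}, set()
    for name, obj in objects.items():
        for symbol in obj["uses"]:
            if symbol in addresses:
                bound[symbol] = addresses[symbol]
            elif symbol in dylib_exports:
                deferred.add(symbol)
            else:
                raise NameError(f"undefined symbol {symbol} referenced from {name}")
    return bound, deferred


objects = {
    "main.o": {"defines": {"_main": 0x40}, "uses": ["_helper", "_NSLog"]},
    "helper.o": {"defines": {"_helper": 0x20}, "uses": []},
}
bound, deferred = link(objects, dylib_exports={"_NSLog"})
print({name: hex(address) for name, address in bound.items()}, deferred)
```

A reference that is found neither in an object file nor in a dynamic library raises an error, which mirrors the familiar "undefined symbol" failure at link time.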

Mach-O file structure

  • Header
    • The Header contains general information about the binary: byte order, architecture type, number of load commands, and so on. It lets you quickly check whether the file is 32-bit or 64-bit, which processor it targets, and what type of file it is
  • Load Commands
    • A table with many entries, including the locations of the segments, the symbol table, the dynamic symbol table, and so on. This section follows the Header and determines how memory is laid out when the Mach-O file is loaded
  • Data
    • Data is usually the largest part of the object file. It contains the segment-specific data, such as static C strings, Objective-C methods with or without parameters, and C functions with or without parameters. When an executable runs, the virtual memory system maps its segments into the process’s address space.
    • Segment __PAGEZERO specifies how much of the low end of the process’s address space is inaccessible
    • Segment __TEXT contains the executable binary code
    • Segment __DATA contains data that can be modified
    • Segment __LINKEDIT contains metadata for methods and variables, the code signature, and more.
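
As a small illustration of the Header, the sketch below decodes the fields of a 64-bit Mach-O header (struct mach_header_64 from <mach-o/loader.h>) using Python's struct module. To stay runnable anywhere it parses a synthetic header built in memory; the values 20 and 2000 for the load command count and size are invented, and the parse_header helper is hypothetical.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF     # magic number of a 64-bit Mach-O file
CPU_TYPE_ARM64 = 0x0100000C  # CPU_TYPE_ARM | CPU_ARCH_ABI64
MH_EXECUTE = 0x2             # file type: executable

# Little-endian layout of mach_header_64:
# magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved
HEADER_64 = struct.Struct("<IiiIIIII")


def parse_header(data: bytes) -> dict:
    """Decode the first 32 bytes of a thin, little-endian 64-bit Mach-O file."""
    magic, cputype, _subtype, filetype, ncmds, sizeofcmds, _flags, _reserved = (
        HEADER_64.unpack_from(data)
    )
    if magic != MH_MAGIC_64:
        raise ValueError(f"not a little-endian 64-bit Mach-O file: magic={magic:#x}")
    return {
        "cpu": "arm64" if cputype == CPU_TYPE_ARM64 else hex(cputype),
        "is_executable": filetype == MH_EXECUTE,
        "load_command_count": ncmds,
        "load_commands_size": sizeofcmds,
    }


# A synthetic header (made-up counts) standing in for a real binary.
sample = HEADER_64.pack(MH_MAGIC_64, CPU_TYPE_ARM64, 0, MH_EXECUTE, 20, 2000, 0, 0)
info = parse_header(sample)
print(info)
```

To try it on a real file, pass the first 32 bytes of the binary instead of sample. Universal (fat) binaries start with a different magic number and would be rejected by this sketch.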

Static linking

Linking is mainly divided into static linking and dynamic linking. What happens at build time is static linking, which is the process described above. This stage links the object files and libraries (or modules in Swift) generated earlier into an executable Mach-O file.


Running

Loading

Going from an executable file to a running program basically takes two stages: loading and dynamic library linking. Because static libraries are linked before the executable is generated, all of our compiled code and the static libraries are already part of the executable at load time, whereas dynamic libraries require the dynamic linking described below.

An executable, or program, is a static concept, while a process is a dynamic concept. Once a program is running, its process has its own independent address space, whose size is determined by the hardware (the CPU’s bit width). Of course, the process only believes it owns the whole address space; in reality it shares the computer’s physical memory with other processes (virtualization).

Loading is the process of mapping the executable file on disk into virtual memory.

The loading process, also known as the process creation process, generally has the following steps.

  • Create a separate virtual address space
  • Read the executable’s header to establish the mapping between the virtual address space and the executable (binding relative addresses in the executable to addresses in the virtual address space)
  • Set the CPU’s instruction pointer (program counter) to the entry address of the executable and hand control to the CPU to start running
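
The three steps above can be sketched as a toy model. Everything here is an assumption made for illustration: the segment addresses and sizes, the entry offset, and the slide parameter (a load-time offset of the kind address space layout randomization applies). A real loader works with page tables and file mappings, not Python lists.

```python
def load(segments, entry_offset, slide):
    """Return (memory map, initial program counter) for a toy executable.

    segments: list of (name, vmaddr, vmsize, protection) as recorded in the file
    entry_offset: offset of the entry point from the start of __TEXT
    slide: offset chosen by the loader at launch (an assumption of this sketch)
    """
    address_space = []                        # step 1: a fresh, empty address space
    for name, vmaddr, vmsize, protection in segments:
        start = vmaddr + slide                # step 2: map each segment to its final address
        address_space.append((name, start, start + vmsize, protection))
    text_start = next(start for name, start, _, _ in address_space if name == "__TEXT")
    program_counter = text_start + entry_offset   # step 3: point the CPU at the entry
    return address_space, program_counter


segments = [
    ("__TEXT", 0x100000000, 0x8000, "r-x"),
    ("__DATA", 0x100008000, 0x4000, "rw-"),
]
address_space, program_counter = load(segments, entry_offset=0x1F00, slide=0x4000)
for name, start, end, protection in address_space:
    print(f"{name:8} {start:#x}-{end:#x} {protection}")
print(f"entry at {program_counter:#x}")
```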

Dynamic linking

With static linking, static libraries are linked into the Mach-O file itself. If a library needs to be updated, the app has to be rebuilt, so the library cannot be updated or loaded dynamically. Dynamic linking uses dyld to load dynamic libraries at runtime, which makes dynamic loading and updating possible. In addition, other processes and frameworks that link against the same dynamic library can share a single copy, which saves memory.

Some commonly used frameworks in iOS, such as UIKit and Foundation, are dynamically linked. To save memory, the system places these libraries in the dyld shared cache.

In Mach-O files, symbols that belong to dynamic libraries are marked as undefined, but their names and the paths of their libraries are recorded. At runtime, dyld loads the dynamic library found at the recorded path, looks up the address for the recorded name, and binds the symbol to that address. The same two operations are available to our own code as dlopen and dlsym.

dlopen maps a dynamic library into the virtual address space of the process. The loaded library may itself have undefined symbols, that is, it may depend on other dynamic libraries, which triggers more libraries to be loaded; dlopen can choose whether these undefined symbols are resolved immediately or lazily (RTLD_NOW versus RTLD_LAZY).

dlopen opens the dynamic library and returns a handle. dlsym takes the handle returned by dlopen together with a symbol name, and returns the address of that function so it can be called.
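
Here is a minimal sketch of the open-then-look-up pattern, written with Python's ctypes module, which wraps dlopen and dlsym on Unix-like systems. It assumes macOS or Linux, and it looks up the C function strlen in the current process rather than in a separately loaded library; in C or Objective-C you would call dlopen and dlsym directly.

```python
import ctypes

# ctypes.CDLL(None) calls dlopen with a NULL path, which returns a handle to the
# current process and the libraries already loaded into it.
process = ctypes.CDLL(None)

# Attribute access on the handle performs the dlsym lookup for the name "strlen".
strlen = process.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

# The symbol is now bound to an address and can be called like any function.
length = strlen(b"dyld")
print(length)
```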

Dynamic linking solves the problems of static linking, namely that it uses too much memory and that any library change requires recompiling and repackaging, but it also introduces new problems.

  • A more complex structure: dynamic linking delays relocation until runtime.
  • Security issues, which are also the basis for our ability to do PLT hooking
  • Performance issues
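
The trade-offs in this list can be illustrated with a toy model of lazily bound symbol pointers. This is not dyld's actual data structure; the LazySymbolTable class and the _greet symbol are hypothetical. It shows that the first call pays for the lookup (the performance cost), and that anyone who can overwrite the pointer redirects every later call (the basis of hooking).

```python
class LazySymbolTable:
    """Toy model of lazily bound symbol pointers (hypothetical, not dyld's structure)."""

    def __init__(self, resolver):
        self._resolver = resolver  # looks up a symbol's implementation by name
        self._pointers = {}        # symbol name -> bound implementation
        self.resolve_count = 0

    def call(self, symbol, *args):
        if symbol not in self._pointers:  # first call: bind the symbol now
            self._pointers[symbol] = self._resolver(symbol)
            self.resolve_count += 1
        return self._pointers[symbol](*args)

    def rebind(self, symbol, replacement):
        """Overwrite a pointer, which is what a hook does; returns the original."""
        original = self._pointers.get(symbol) or self._resolver(symbol)
        self._pointers[symbol] = replacement
        return original


library = {"_greet": lambda name: f"hello {name}"}
table = LazySymbolTable(library.__getitem__)

first = table.call("_greet", "ios")    # resolves, then calls
second = table.call("_greet", "ios")   # already bound, no second lookup
original = table.rebind("_greet", lambda name: "[hooked] " + original(name))
hooked = table.call("_greet", "ios")
print(first, second, hooked, table.resolve_count)
```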

When it comes to dynamic library linking, we have to mention dyld in iOS.

dyld – Dynamic Link Editor

dyld is the dynamic linker developed by Apple and an important part of Apple’s operating systems. It is responsible for linking the dynamic libraries of Mach-O files and for launching programs. Its code is open source.

  • Startup process

Start the project and set a symbolic breakpoint at _objc_init. The breakpoint is hit before the main function executes. Once in LLDB, the bt command shows the call stack leading up to _objc_init.

As the call stack shows, dyld is started first. dyldbootstrap::start is used to bootstrap dyld. Since the dynamic linker is itself a shared object, it also needs to be relocated. To avoid a circular relocation problem, the dynamic linker needs some special properties compared with other shared objects. First, it cannot depend on any other shared object; second, it must be able to perform its own relocation. Startup code written under such restrictions is called a bootstrap.

Because dyld is complex, I will not go into detail here and will leave that for another article. The general startup process is as follows:

  • dyld starts and initializes the program binary
  • The ImageLoader reads the images, which contain our classes, methods, and other symbols
  • Because the Runtime registers callbacks with dyld, dyld notifies the Runtime whenever an image is loaded into memory
  • The Runtime takes over: it calls map_images to parse and process the image, then calls call_load_methods from load_images to go through all the loaded classes, calling each class’s +load method in inheritance order and then the +load methods of its categories
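
The ordering rule in the last step can be sketched as follows. This is a simplified model, not the Objective-C runtime's code: the class and category names are invented, and it assumes every listed class and category implements +load (the real runtime only calls the ones that do).

```python
def load_order(superclass_of, categories):
    """Order in which +load would be called for one image.

    superclass_of: {class name: superclass name or None}, in image order
    categories: list of (class name, category name), in image order
    """
    ordered, seen = [], set()

    def schedule(cls):
        # Classes outside this image (or already scheduled) are skipped.
        if cls is None or cls in seen or cls not in superclass_of:
            return
        schedule(superclass_of[cls])  # a superclass always loads before its subclass
        seen.add(cls)
        ordered.append(f"+[{cls} load]")

    for cls in superclass_of:
        schedule(cls)
    # Category +load methods run only after every class +load has run.
    ordered.extend(f"+[{cls}({category}) load]" for cls, category in categories)
    return ordered


order = load_order(
    {"DetailViewController": "BaseViewController", "BaseViewController": None},
    [("BaseViewController", "Tracking")],
)
print("\n".join(order))
```

Even though DetailViewController appears first in the image, its superclass is loaded before it, and the category comes last.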

So the general workflow of a dynamic linker (described here in ELF terms) is:

  1. Bootstrap the dynamic linker (the path of the dynamic linker is recorded in the executable’s .interp section) ->
  2. Load the shared objects (their symbols are merged into a global symbol table at this step) ->
  3. Relocate (traverse the relocation tables of the executable and of each shared object, fixing up the entries in the GOT/PLT that need relocation) ->
  4. Initialize (execute the code in each shared object’s .init section; the executable’s own .init code is run by the program’s initialization code) ->
  5. Return control to the entry point of the program

Final thoughts

While writing this article, I systematically studied how an app gets from compilation to running. For the compilation phase, the compiler-theory background behind static and dynamic linking is very important, and a book on compilers is worth reading. In the runtime phase, dyld does a lot of work before the main function is executed, and its implementation is very complicated. I will write a separate note focusing on dyld after studying it carefully.