


Preface

Today we take a brief look at LLVM, the compiler framework behind Apple's toolchain, and at how to use it. Once the LLVM compilation process is clear, we will write a Clang plug-in to experiment with. Without further ado, let's get started.

LLVM

What is LLVM

LLVM is a compiler framework written in C++. It is designed to optimize programs written in any programming language at compile time, link time, run time, and idle time. It is open to developers and compatible with existing scripts. The LLVM project was started in 2000 by Chris Lattner at the University of Illinois at Urbana-Champaign (UIUC). Lattner joined Apple in 2005 and worked on applying LLVM throughout Apple's development systems. Apple is also a major sponsor of the LLVM project.

Today LLVM is used in Apple's iOS development tools and in Xilinx Vivado, and by Facebook, Google, and other major companies.

  • Let’s look at the difference between interpreted and compiled languages:
    1. Interpreted languages: the code is read and executed directly;
    2. Compiled languages: the code is first compiled into a binary file that the CPU can execute, and only then is it run.

Traditional compiler design

graph LR
12((sourceCode)) --> 1[Frontend] --> 2[Optimizer] --> 3[Backend] --> 13((MachineCode))

Compiler Frontend

The task of the compiler front end is to parse the source code. It performs lexical analysis, syntax analysis, and semantic analysis, checks the source code for errors, and then builds the Abstract Syntax Tree (AST). LLVM’s front end also generates intermediate representation (IR) code.

Optimizer

The optimizer is responsible for the various optimizations that improve the code's run time, such as eliminating redundant calculations.
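To make "eliminating redundant calculations" concrete, here is a minimal Python sketch of constant folding on a toy expression tree. The tree format (nested tuples) and the `fold` function are assumptions made up for this illustration; LLVM's real optimizer works on IR and is far more elaborate.

```python
# Toy expression tree: an int, a variable name (str), or (op, left, right).
# This representation is invented for illustration; it is not LLVM IR.

def fold(node):
    """Return an equivalent tree with constant subexpressions precomputed."""
    if not isinstance(node, tuple):
        return node                      # number or variable: nothing to do
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right          # both operands known: compute now
        if op == "*":
            return left * right
    # Algebraic identities that remove a redundant operation.
    if op == "*" and right == 1:
        return left
    if op == "+" and right == 0:
        return left
    return (op, left, right)


# (x * (2 + 3)) + (4 * 0)
tree = ("+", ("*", "x", ("+", 2, 3)), ("*", 4, 0))
print(fold(tree))   # ('*', 'x', 5)
```

The two additions and one multiplication on constants are done once, ahead of time, so only `x * 5` would remain to be executed at run time.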

Backend/CodeGenerator

The back end maps the code onto the target instruction set: it generates machine language and performs machine-specific code optimizations.

The iOS compiler architecture

Objective-C, C, and C++ use Clang as the compiler front end, Swift uses the Swift front end, and both use LLVM as the back end.

graph LR
1(Clang) --> 2[LLVM Optimizer] --> 3[LLVM Code Generator]
1-1(Swift) --> 2

The design of the LLVM

LLVM's design pays off most when a compiler needs to support multiple source languages or multiple hardware architectures. Other compilers, such as GCC, have been very successful, but because they were designed as monolithic applications, it is hard to reuse their parts. The most important aspect of LLVM's design is its common code representation (IR), the form in which code is represented inside the compiler. Because of it, a separate front end can be written for any programming language and a separate back end for any hardware architecture.

graph LR
1(( C )) --> 2(Clang C/C++/ObjC Frontend) --> 3[LLVM Optimizer] --> 4[LLVM X86 Backend] --> 5((X86))

1-1((Fortran)) --> 2-1(llvm-gcc Frontend) --> 3
1-2((Haskell)) --> 2-2(GHC Frontend) --> 3

3 --> 4-1[LLVM PowerPC Backend] --> 5-1((PowerPC))
3 --> 4-2[LLVM ARM Backend] --> 5-2((ARM))
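The diagram above can be sketched in a few lines of Python. Everything here is hypothetical: the two toy "languages" (infix and postfix sums), the stack-based "IR", and the two back ends that emit x86-flavoured and ARM-flavoured pseudo-assembly. The only point is that front ends and back ends meet at a shared representation, so each new language or target is written once.

```python
# Hypothetical mini-compiler; every name here is invented for illustration.
# The shared "IR" is a list of stack operations: ("push", n) or ("add",).

def infix_frontend(src):
    """Front end 1: sums written as '1 + 2 + 3'."""
    ir = []
    for i, term in enumerate(src.split("+")):
        ir.append(("push", int(term)))
        if i > 0:
            ir.append(("add",))
    return ir

def postfix_frontend(src):
    """Front end 2: sums written as '1 2 + 3 +'."""
    return [("add",) if tok == "+" else ("push", int(tok)) for tok in src.split()]

def x86_like_backend(ir):
    """Back end 1: illustrative x86-flavoured pseudo-assembly."""
    lines = []
    for ins in ir:
        if ins[0] == "push":
            lines.append(f"push {ins[1]}")
        else:
            lines += ["pop rbx", "pop rax", "add rax, rbx", "push rax"]
    return lines

def arm_like_backend(ir):
    """Back end 2: illustrative ARM-flavoured pseudo-assembly."""
    lines = []
    for ins in ir:
        if ins[0] == "push":
            lines += [f"mov r0, #{ins[1]}", "push {r0}"]
        else:
            lines += ["pop {r0, r1}", "add r0, r0, r1", "push {r0}"]
    return lines

FRONTENDS = {"infix": infix_frontend, "postfix": postfix_frontend}
BACKENDS = {"x86-like": x86_like_backend, "arm-like": arm_like_backend}

def compile_source(src, language, target):
    """Any front end can be paired with any back end through the IR."""
    return BACKENDS[target](FRONTENDS[language](src))

for language, src in [("infix", "1 + 2"), ("postfix", "1 2 +")]:
    for target in BACKENDS:
        print(language, "->", target, compile_source(src, language, target))
```

Two front ends and two back ends give four working combinations from four pieces of code. Without the shared IR, each language/target pair would need its own compiler.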

Clang

Clang is a subproject of the LLVM project. It is a lightweight compiler based on the LLVM architecture, created as an alternative to GCC to provide faster compilation. It compiles C, C++, and Objective-C, and within the overall LLVM architecture it is the compiler front end. For developers, there are many benefits to exploring Clang.

The compilation process

The compilation phases of a source file can be printed with the following command:

clang -ccc-print-phases main.m

Preprocessing phase

Run the following command to print the preprocessed source, with headers expanded and macros replaced:

clang -E main.m

Compilation phase

Lexical analysis

After preprocessing is complete, lexical analysis is performed: the code is split into tokens such as brackets, equals signs, and strings.

clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
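To get a feel for what "splitting into tokens" means, here is a minimal regex-based tokenizer in Python for a tiny C-like subset. The token categories and the `tokenize` function are assumptions for this sketch; they do not match Clang's real token kinds (for example, Clang treats `int` as a keyword, while this sketch treats it as a plain identifier).

```python
import re

# Token categories for a tiny C-like subset. The names are illustrative
# and do not correspond to Clang's actual token kinds.
TOKEN_SPEC = [
    ("string",     r'"[^"\n]*"'),
    ("number",     r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("punctuator", r"[()\[\]{};,=+\-*/]"),
    ("skip",       r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Split source text into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

for kind, text in tokenize('int a = 10; printf("hi");'):
    print(f"{kind:<10} {text!r}")
```

A real lexer also records the file, line, and column of every token so that later phases can report precise error locations.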

Syntax analysis

Lexical analysis is followed by syntax analysis, whose task is to verify that the syntax is correct. Building on the token stream, it groups tokens into grammatical phrases such as “program”, “statement”, and “expression”, and then assembles all the nodes into an Abstract Syntax Tree (AST). In this way the parser determines whether the source program is structurally correct.

clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
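As a rough illustration of how tokens become a tree, here is a small recursive-descent parser in Python for integer expressions with `+`, `*`, and parentheses. The grammar, the nested-tuple AST, and the `parse` function are assumptions chosen for this sketch; Clang's parser and AST node types are much richer.

```python
import re

def parse(source):
    """Parse sums and products of integers into a nested-tuple AST."""
    # A deliberately tiny lexer: numbers, '+', '*', and parentheses only.
    tokens = re.findall(r"\d+|[+*()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expression():            # expression := term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():                  # term := factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():                # factor := NUMBER | '(' expression ')'
        tok = advance()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            node = expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("1 + 2 * 3"))     # ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))   # ('*', ('+', 1, 2), 3)
```

The shape of the tree encodes operator precedence and grouping, and a structurally invalid input such as `1 + * 2` is rejected with a `SyntaxError`, which is the "is the program structurally correct" check described above.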

Generate intermediate code IR (Intermediate Representation)

clang -S -fobjc-arc -emit-llvm main.m

Optimizing the IR

LLVM's optimization levels are -O0, -O1, -O2, -O3, and -Os (the first character is the uppercase letter O).

clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll

Generating assembly code

We generate assembly code from the final .bc or .ll file:

clang -S -fobjc-arc main.bc -o main.s

clang -S -fobjc-arc main.ll -o main.s

Optimization can also be applied when generating assembly code:

clang -Os -S -fobjc-arc main.m -o main.s

Generate object file (assembler)

To generate the object file, the assembler takes the assembly code as input, converts it into machine code, and outputs an object file.

clang -fmodules -c main.s -o main.o

Using the nm command, check the symbols in main.o:

$ xcrun nm -nm main.o
                   (undefined) external _printf
0000000000000000 (__TEXT,__text) external _test
000000000000000a (__TEXT,__text) external _main

_printf is an undefined external symbol.

“undefined” means that the _printf symbol is not defined in the current file, so it has to be resolved later, at link time.

“external” means that the symbol is accessible from outside this file.
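If you want to inspect such a listing programmatically, the following Python sketch splits the three lines shown above into defined and undefined symbols. The regular expression and the `parse_nm` helper are assumptions based only on this listing; real `nm -m` output has more variants (such as a trailing "(from libSystem)"), which this sketch does not handle.

```python
import re

# The object-file listing from above, copied as sample input.
NM_OUTPUT = """\
                 (undefined) external _printf
0000000000000000 (__TEXT,__text) external _test
000000000000000a (__TEXT,__text) external _main
"""

# Optional hex address, then "(section)", then attributes and the symbol name.
LINE_RE = re.compile(r"^\s*(?P<addr>[0-9a-f]+)?\s*\((?P<section>[^)]*)\)\s+(?P<rest>.*)$")

def parse_nm(text):
    """Return one dict per symbol line; lines that do not match are skipped."""
    symbols = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        *attributes, name = match.group("rest").split()
        address = match.group("addr")
        symbols.append({
            "name": name,
            "address": int(address, 16) if address else None,
            "defined": match.group("section") != "undefined",
            "external": "external" in attributes,
        })
    return symbols

symbols = parse_nm(NM_OUTPUT)
print("undefined:", [s["name"] for s in symbols if not s["defined"]])
print("defined:  ", [s["name"] for s in symbols if s["defined"]])
```

For this listing, `_printf` is the only undefined symbol, which is exactly what the linker has to resolve in the next step.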

Generate an executable (link)

The linker combines the resulting .o files with libraries (.dylib, .a) to produce a Mach-O executable.

clang main.o -o main

Check the symbols after linking:

$ xcrun nm -nm main
        (undefined) external _printf (from libSystem)
        (undefined) external dyld_stub_binder (from libSystem)
0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100000f6d (__TEXT,__text) external _test
0000000100000f77 (__TEXT,__text) external _main
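The core idea of this step, resolving every undefined symbol against the other object files and the libraries, can be sketched in Python as follows. The data layout and the `link` function are hypothetical and only model symbol resolution; a real linker also lays out sections, applies relocations, and sets up dynamic binding.

```python
def link(objects, libraries):
    """Resolve every undefined symbol against the objects, then the libraries.

    objects:   {file name: {"defined": set of names, "undefined": set of names}}
    libraries: {library name: set of exported names}
    Returns {symbol: where it was found}; raises ValueError on a problem.
    """
    resolved = {}
    for file_name, obj in objects.items():
        for symbol in obj["defined"]:
            if symbol in resolved:
                raise ValueError(f"duplicate symbol {symbol}")
            resolved[symbol] = file_name

    for obj in objects.values():
        for symbol in obj["undefined"]:
            if symbol in resolved:
                continue                      # defined by another object file
            for library_name, exports in libraries.items():
                if symbol in exports:
                    resolved[symbol] = library_name
                    break
            else:
                raise ValueError(f"undefined symbol {symbol}")
    return resolved


# Modeled on the listings above: main.o defines _test and _main, needs _printf.
objects = {"main.o": {"defined": {"_test", "_main"}, "undefined": {"_printf"}}}
libraries = {"libSystem": {"_printf"}}

print(link(objects, libraries)["_printf"])   # libSystem
```

This mirrors the listing: after linking, `_printf` is still marked undefined in the executable itself, but it is now known to come from libSystem and is bound when the program is loaded.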