The concept of LLVM
Our development tools are all more or less related to LLVM, so what is LLVM, and what does it do? First we need to understand two concepts: interpreted languages and compiled languages. An interpreted language executes code as soon as it is read; Python is an example. A compiled language must first be translated into binary that the CPU can read before it can be executed. LLVM is a framework for building compilers, written in C++. Its main purpose is to optimize programs written in any language at compile time, link time, run time, and idle time, while staying open to developers and compatible with existing scripts.
Compiler design
The image above shows a traditional compiler design, which separates the front end from the back end.
Compiler Frontend: Parses the source code. It performs lexical analysis, syntax analysis, semantic analysis, and source code error checking, and builds the abstract syntax tree (AST). LLVM's front end also generates intermediate code (IR).
Optimizer: Responsible for the various optimizations that improve the code's run-time performance, such as eliminating redundant calculations.
Backend/CodeGenerator: Converts the optimized code into machine code by mapping it to the target instruction set.
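As a rough illustration of the "redundant calculations" an optimizer targets, the C sketch below shows the same function before and after the shared subexpression is factored out by hand. The function names are invented for this example, and a real optimizer such as LLVM's works on IR rather than on source text, so this only mirrors the idea.

```c
/* Hypothetical example: the product (a * b) is written, and naively
   computed, twice. */
int sum_of_products_naive(int a, int b, int c)
{
    return (a * b) + c * (a * b);
}

/* Roughly what an optimizer achieves: the shared subexpression is
   computed once and reused. The result is identical. */
int sum_of_products_optimized(int a, int b, int c)
{
    int ab = a * b;   /* computed a single time */
    return ab + c * ab;
}
```

Both functions return the same value for the same inputs; only the amount of work differs.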
The iOS compiler architecture
Objective-C, C, and C++ use Clang as the compiler front end, Swift uses the Swift compiler as its front end, and the back end for all of them is LLVM.
The advantages of LLVM
LLVM is designed around a generic code representation, IR, which is the form used to represent code inside the compiler. Because of this, a front end can be written independently for any language, and a back end can be written independently for any hardware architecture.
Clang
Clang is a subproject of the LLVM project. It is a lightweight compiler for C, C++, and Objective-C, and it serves as the compiler front end in the LLVM architecture.
Enter the following in the terminal:
open /usr/bin
You can see the Clang compiler there.
Compilation process analysis
- Start by creating a .m file and cd to the directory that contains it
- Enter the following in the terminal:
clang -ccc-print-phases main.m
The terminal prints the following phases:
0: Input. Reads the file and finds the source file.
1: Preprocessing. Handles macro replacement and header file inclusion.
2: Compilation. Performs lexical analysis and syntax analysis, and finally generates IR.
3: Back end. LLVM optimizes the code pass by pass and finally generates assembly code.
4: Assembler. Generates the object file.
5: Linker. Links the required dynamic and static libraries and generates the executable file.
6: Architecture binding. Generates the executable file for the target architecture.
Preprocessing
Run the following in the terminal:
clang -E main.m
After it runs, we can see that the header files have been imported and the macros have been replaced.
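To make the preprocessing step concrete, here is a small hypothetical source fragment (plain C, which is also valid inside a .m file; the names SCALE, Count, and scaled are invented for this sketch). Running a file like this through clang -E should replace the macro with its value, while the typedef is left for the compiler proper, because it is not a preprocessor directive.

```c
/* Hypothetical fragment for experimenting with `clang -E`. */
#define SCALE 10        /* macro: replaced textually by the preprocessor */

typedef int Count;      /* typedef: left untouched by the preprocessor */

Count scaled(Count x)
{
    /* After preprocessing, this line reads `return x * 10;`,
       while `Count` still appears exactly as written. */
    return x * SCALE;
}
```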
Compilation
What is lexical analysis? After preprocessing, the code is split into tokens one by one, such as parentheses, equals signs, and strings. This process is called lexical analysis.
What is syntax analysis? Syntax analysis follows lexical analysis and mainly verifies that the syntax is correct. Building on the lexical analysis, it combines token sequences into grammatical phrases, such as programs and expressions, and then assembles all the nodes into an abstract syntax tree (AST). It mainly checks whether the program is structurally correct.
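The token-splitting idea described above can be sketched with a toy lexer in C. This is only an illustration under simplifying assumptions: the type and function names are invented here, only identifiers, integers, double-quoted strings, and single-character punctuation are recognized, and escape sequences inside strings are ignored. Clang's real lexer is far more elaborate.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds for this toy lexer (hypothetical, much simpler than Clang's). */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_STRING, TOK_PUNCT } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;   /* points into the source buffer */
    size_t length;       /* number of characters in the token */
} Token;

/* Splits `src` into at most `max` tokens and returns how many were stored. */
size_t tokenize(const char *src, Token *out, size_t max)
{
    size_t count = 0;
    const char *p = src;

    while (*p != '\0' && count < max) {
        if (isspace((unsigned char)*p)) {        /* skip whitespace */
            p++;
            continue;
        }
        const char *start = p;
        TokenKind kind;

        if (isalpha((unsigned char)*p) || *p == '_') {
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            kind = TOK_IDENT;
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;
            kind = TOK_NUMBER;
        } else if (*p == '"') {
            p++;                                  /* opening quote */
            while (*p != '\0' && *p != '"') p++;
            if (*p == '"') p++;                   /* closing quote */
            kind = TOK_STRING;
        } else {
            p++;                                  /* one punctuation character */
            kind = TOK_PUNCT;
        }

        out[count].kind = kind;
        out[count].start = start;
        out[count].length = (size_t)(p - start);
        count++;
    }
    return count;
}
```

For example, the statement `int a = 10;` splits into five tokens: `int`, `a`, `=`, `10`, and `;`.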
Enter the following in the terminal:
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
The following figure shows the output.
Generating IR (Intermediate Representation)
The code generator translates the syntax tree from top to bottom into IR code.
Enter the following in the terminal:
clang -S -fobjc-arc -emit-llvm main.m
A .ll file is generated in the directory, and you can open it to view the IR code. For Objective-C code, runtime bridging, property synthesis, and ARC processing are performed in this step.
- Basic syntax of IR
@: global identifier
%: local identifier
alloca: allocate memory space
align: memory alignment
i32: 32 bits, 4 bytes
store: write to memory
load: read data
call: call a function
ret: return
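To connect these keywords to real code, the sketch below pairs a tiny C function (the name add is chosen purely for illustration) with, in comments, roughly the unoptimized IR that clang -S -emit-llvm tends to produce for it. Register numbers, attributes, and pointer syntax vary with the Clang version and target, so treat the IR lines as an approximation, not literal output.

```c
/* Approximate unoptimized (-O0) IR is shown in the comments. */
int add(int a, int b)        /* define i32 @add(i32 %0, i32 %1)      */
{
                             /* %3 = alloca i32, align 4  ; slot for a */
                             /* %4 = alloca i32, align 4  ; slot for b */
                             /* store i32 %0, ptr %3, align 4         */
                             /* store i32 %1, ptr %4, align 4         */
    return a + b;            /* %5 = load i32, ptr %3, align 4        */
                             /* %6 = load i32, ptr %4, align 4        */
                             /* %7 = add nsw i32 %5, %6               */
                             /* ret i32 %7                            */
}
```

Here @add is a global identifier, the %-prefixed names are local, and each parameter is first stored to a stack slot and then loaded back before the addition.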
The optimization of the IR
The optimization levels of LLVM are -O0, -O1, -O2, -O3, and -Os. Terminal command:
clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll
Bitcode
When bitcode is turned on, Xcode further optimizes the code and generates .bc intermediate code. Terminal command:
clang -emit-llvm -c main.ll -o main.bc
This generates .bc code from the optimized IR code.
Generating assembly code
Terminal command:
clang -S -fobjc-arc main.bc -o main.s
or
clang -S -fobjc-arc main.ll -o main.s
Assembly code can be generated from either the .bc or the .ll file. The generated assembly code can also be optimized further with the terminal command:
clang -Os -S -fobjc-arc main.m -o main.s
Generate object file
To generate the object file, the assembler takes the assembly code as input, converts it into machine code, and outputs an object file.
Terminal command:
clang -fmodules -c main.s -o main.o
This outputs the assembly file as an object file. Symbols in main.o can be viewed with the nm command:
xcrun nm -nm main.o
undefined: The symbol _printf cannot be found in the current file for now.
external: The symbol can be accessed from outside the file.
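These labels can be related back to source with a small hypothetical C fragment (the function names are invented here). Assuming it is compiled without optimization to an object file and inspected with nm -m, one would expect the function with external linkage to be listed as external, the static helper as non-external, and the library function it calls as undefined until linking. The exact listing depends on the toolchain and the optimization level.

```c
#include <stdio.h>

/* `static` gives internal linkage: expected to be listed as non-external. */
static int double_it(int x)
{
    return 2 * x;
}

/* External linkage: other object files can refer to this symbol. */
int report(int x)
{
    int result = double_it(x);
    /* printf lives in a system library, so the object file only records
       an undefined reference to it, which the linker resolves later. */
    printf("result = %d\n", result);
    return result;
}
```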
Generate an executable file
The linker links the compiled .o files with the required .dylib and .a files to generate a Mach-O file. Terminal command:
clang main.o -o main
In the same way, you can use nm to view the symbols of the executable after linking.
Terminal command:
xcrun nm -nm main
Conclusion
- Compilation process:
Input code -> preprocessing (expansion) -> lexical analysis (tokens) -> syntax analysis -> generate IR -> optimize IR -> generate assembly code -> generate object file -> link dynamic and static libraries to generate the executable file.
- typedef: not handled during preprocessing, because it is not a preprocessor directive.
- A higher optimization level is not always better; too high a level may optimize away useful code.
- .o files cannot be executed on their own and need to be linked against external libraries. Before linking, external symbols are only marked.
- Advantages of LLVM: the front end and back end are separated, so it is highly extensible.
- LLVM can affect compilation speed, and optimizing the executable can improve compilation speed.
- Optimization can be applied at different stages.