This is the 13th day of my participation in the August More Text Challenge
1. LLVM overview
1.1 Interpreted languages and compiled languages
In day-to-day development, for example iOS development, a project must compile successfully in Xcode before it can run, and that compilation is done by the LLVM compiler. A computer cannot understand a high-level language directly; it only understands machine language, so a program written in a high-level language has to be translated into machine language before the computer can execute it. There are two ways to translate: compilation and interpretation. They differ only in when the translation happens. An interpreted language is read and executed on the fly, whereas a compiled language is first translated into binary code that the CPU can execute. OC and Swift, which we use every day, are compiled languages. Python is an interpreted language whose code can be run directly by the Python interpreter, while OC must first be compiled into an executable binary by the compiler.
1.2 Introducing LLVM
LLVM is a framework for building compilers (a compiler infrastructure), written in C++. It is used to optimize the compile time, link time, run time, and idle time of programs written in any programming language, while staying open to developers and compatible with existing scripts. The LLVM project was started in 2000 by Chris Lattner at the University of Illinois at Urbana-Champaign (UIUC) in the United States. Chris Lattner joined Apple Inc. in 2005 and worked on applying LLVM to Apple's development systems. Apple is also a major funder of the LLVM project.
Traditional compiler design:
Compiler front end (Frontend)
The task of the compiler front end is to parse the source code. It performs lexical analysis and syntax analysis, checks the source code for errors, and then builds an abstract syntax tree (AST). The LLVM front end also generates intermediate code (intermediate representation, IR).
The optimizer (Optimizer) is responsible for various optimizations, such as improving the running time of the code and eliminating redundant calculations.
The back end (Backend/CodeGenerator) maps the code onto the target instruction set. It generates machine language and performs machine-specific code optimizations.
On iOS, the compiler front end for Objective-C, C, and C++ is Clang, the front end for Swift is the Swift compiler, and the back end for all of them is LLVM.
The design of LLVM
The advantage of LLVM's design shows when a compiler needs to support multiple source languages or multiple hardware architectures.
Other compilers, such as GCC, are very successful, but because they are designed as monolithic applications, the ways they can be reused are quite limited.
The most important aspect of LLVM's design is its use of a common code representation (IR), the form in which code is represented inside the compiler. With LLVM, a front end can be written independently for any programming language, and a back end can be written independently for any hardware architecture.
- The idea is a middle layer: once the IR layer is in place, supporting a new language only requires adding a front end for it, and supporting a new architecture only requires adding a back end for it.
- Clang
Clang is a sub-project of the LLVM project. It is a lightweight compiler built on the LLVM architecture, originally created to replace GCC and provide faster compilation. There are many benefits to studying Clang.
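The middle-layer idea from the design section can be sketched in a few lines of C. This is a toy model and not LLVM's API: ToyIR, the two front ends, and the two back ends are hypothetical names invented for illustration, and the "target" output strings are made up.

```c
#include <stdio.h>

/* Toy "IR": a single addition of two integers. Hypothetical, not LLVM IR. */
typedef struct { int lhs; int rhs; } ToyIR;

/* Front ends: each parses one source syntax into the shared IR.
   Both return 1 on success and 0 on a parse failure. */
static int frontend_infix(const char *src, ToyIR *ir) {    /* e.g. "1 + 2" */
    return sscanf(src, "%d + %d", &ir->lhs, &ir->rhs) == 2;
}
static int frontend_prefix(const char *src, ToyIR *ir) {   /* e.g. "add 1 2" */
    return sscanf(src, "add %d %d", &ir->lhs, &ir->rhs) == 2;
}

/* Back ends: each lowers the shared IR to one made-up target syntax. */
static void backend_target_a(const ToyIR *ir, char *out, size_t n) {
    snprintf(out, n, "mov r0, %d; add r0, %d", ir->lhs, ir->rhs);
}
static void backend_target_b(const ToyIR *ir, char *out, size_t n) {
    snprintf(out, n, "push %d; push %d; add", ir->lhs, ir->rhs);
}
```

Two front ends and two back ends already cover four language/target combinations, because every front end only has to know the IR and every back end only has to read it. Adding a third language means writing one more front-end function, not one more compiler per target.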
2. LLVM compilation process
Create a main.m file
#include <stdio.h>

int test(int a, int b) {
    return a + b + 3;
}

int main(int argc, const char * argv[]) {
    int a = test(1, 2);
    printf("%d", a);
    return 0;
}
The compilation phases of the source code can be printed with the following command
clang -ccc-print-phases main.m
+- 0: input, "main.m", objective-c
+- 1: preprocessor, {0}, objective-c-cpp-output
+- 2: compiler, {1}, ir
+- 3: backend, {2}, assembler
+- 4: assembler, {3}, object
+- 5: linker, {4}, image
6: bind-arch, "x86_64", {5}, image
0: Input file: locate the source file.
1: Preprocessing: expand macros and import header files.
2: Compilation: perform lexical analysis and syntax analysis, check whether the syntax is correct, and finally generate IR.
3: Back end: LLVM optimizes the code by running a series of passes, each of which does one job, and finally generates assembly code.
4: Assembler: turns the assembly code into an object file.
5: Linker: links the required dynamic libraries and static libraries to generate an executable file, as covered in the earlier article on dyld.
6: Architecture binding: generates the corresponding executable file for each architecture.
2.1 Preprocessing stage
We add a macro and a typedef to the main.m file
#define C 30
typedef int KB_INT_64;
int test(int a, int b) {
    return a + b + C;
}
Run clang -E main.m in the terminal to see whether the macro has been replaced.
Run clang -E main.m >> main1.m to write the preprocessed source to a file and inspect it.
The header files are expanded first, followed by macro replacement.
- It is worth noting that typedef is not replaced: it is handled later by the compiler, not by the preprocessor.
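To make the macro/typedef difference concrete, here is a small variation on the main.m snippet. It is only a sketch: using KB_INT_64 in the function signature is my own change for illustration, and the comment describes what I would expect clang -E to print rather than captured output.

```c
#define C 30               /* macro: replaced textually by the preprocessor */
typedef int KB_INT_64;     /* typedef: a type alias, left for the compiler */

/* After preprocessing, the return statement reads `return a + b + 30;`,
   while KB_INT_64 still appears verbatim, because the preprocessor only
   substitutes macros and knows nothing about types. */
KB_INT_64 test(KB_INT_64 a, KB_INT_64 b) {
    return a + b + C;
}
```

Calling test(1, 2) returns 33 either way; the difference is only in which stage resolves each name.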
2.2 Compilation Phase
Perform lexical analysis and syntax analysis, check whether the syntax is correct, and generate the intermediate code IR.
2.2.1 Lexical analysis
clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
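What a lexer does can be sketched with a toy tokenizer. This is a simplified model, not Clang's lexer: the Token type and the three token kinds are assumptions made for illustration, keywords are not distinguished from identifiers, and multi-character operators are not handled.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_EOF } TokKind;

typedef struct {
    TokKind kind;
    const char *start;  /* first character of the token in the source */
    size_t len;         /* token length in characters */
} Token;

/* Read one token starting at *src and advance *src past it. */
static Token next_token(const char **src) {
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++;          /* skip whitespace */

    Token t = { TOK_EOF, p, 0 };
    if (*p == '\0') { *src = p; return t; }

    if (isalpha((unsigned char)*p) || *p == '_') {   /* identifier or keyword */
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {         /* integer literal */
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else {                                         /* single-char punctuation */
        t.kind = TOK_PUNCT;
        p++;
    }
    t.len = (size_t)(p - t.start);
    *src = p;
    return t;
}

/* Count the tokens in a source string (excluding the end marker). */
static int count_tokens(const char *src) {
    int n = 0;
    while (next_token(&src).kind != TOK_EOF) n++;
    return n;
}
```

For the line `return a + b + 3;` this splits the text into seven tokens: return, a, +, b, +, 3, and the semicolon. The -dump-tokens output is the same idea with far more detail, such as the exact token kind and source location.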
2.2.2 Syntax analysis
After lexical analysis comes syntax analysis. Its job is to verify that the syntax is correct: on the basis of the token sequence it groups tokens into grammatical phrases such as "program", "statement", and "expression", and then assembles all the nodes into an abstract syntax tree (AST). The parser determines whether the source program is structurally correct. We run:
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
- If an imported header file cannot be found, specify the SDK
clang -isysroot (your own SDK path) -fmodules -fsyntax-only -Xclang -ast-dump main.m
clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.5.sdk/ -fmodules -fsyntax-only -Xclang -ast-dump main.m
Let's delete a closing } from the code
and compile again: the syntax error is reported.
- Analysis of the AST for int test(int a,int b){}
- FunctionDecl: a function declaration; here: used test 'int (int, int)'
- ParmVarDecl: a parameter; here: a 'int'
- ParmVarDecl: a parameter; here: b 'int'. The two parameters are at the same level.
- CompoundStmt: the scope of the braces, i.e. the {} region
- DeclStmt: a declaration statement; the variable definition follows
- VarDecl: a variable definition; here: used D 'KB_INT_64':'int' cinit
- IntegerLiteral: a constant of integer type
- BinaryOperator: an operator; here it is +
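In the tree, the return expression appears as nested additions, 20 + (30 + (a + b)), with a + b innermost. The tree records how the operations are grouped rather than a flat left-to-right list; if the expression contained multiplication or division, those would bind before addition.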
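How a parser turns a flat expression into a nested tree can be sketched with a tiny recursive-descent parser. This is a toy under stated assumptions, not Clang's parser: Node, show_grouping, and the other names are hypothetical, operands are single characters, only + and * are supported, and the input is assumed to be well formed.

```c
#include <stdio.h>

/* One AST node: a leaf (op == 0) holding an operand character,
   or a binary operator with two children. */
typedef struct Node {
    char op;                 /* '+', '*', or 0 for a leaf */
    char leaf;               /* operand character when op == 0 */
    struct Node *lhs, *rhs;
} Node;

static Node pool[64];        /* fixed pool: enough for short expressions */
static int pool_used;
static const char *cur;      /* parse cursor into the source string */

static Node *make(char op, char leaf, Node *l, Node *r) {
    Node *n = &pool[pool_used++];
    n->op = op; n->leaf = leaf; n->lhs = l; n->rhs = r;
    return n;
}

static void skip_ws(void) { while (*cur == ' ') cur++; }

/* operand := a single letter or digit */
static Node *parse_operand(void) {
    skip_ws();
    return make(0, *cur++, NULL, NULL);
}

/* term := operand ('*' operand)*   (binds tighter than '+') */
static Node *parse_term(void) {
    Node *n = parse_operand();
    for (skip_ws(); *cur == '*'; skip_ws()) {
        cur++;
        n = make('*', 0, n, parse_operand());
    }
    return n;
}

/* expr := term ('+' term)*   (left-associative) */
static Node *parse_expr(void) {
    Node *n = parse_term();
    for (skip_ws(); *cur == '+'; skip_ws()) {
        cur++;
        n = make('+', 0, n, parse_term());
    }
    return n;
}

/* Write the tree with explicit parentheses so the grouping is visible. */
static char *render(const Node *n, char *out) {
    if (n->op == 0) { *out++ = n->leaf; *out = '\0'; return out; }
    *out++ = '(';
    out = render(n->lhs, out);
    out += sprintf(out, " %c ", n->op);
    out = render(n->rhs, out);
    *out++ = ')'; *out = '\0';
    return out;
}

/* Parse src and write its fully parenthesized form into out. */
static void show_grouping(const char *src, char *out) {
    pool_used = 0;
    cur = src;
    render(parse_expr(), out);
}
```

With this sketch, "a + b + 3" is grouped as ((a + b) + 3), while "a + b * c" is grouped as (a + (b * c)): the precedence rule lives in the shape of the tree, which is what the BinaryOperator nodes in the AST dump express.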
2.3 Generate intermediate code IR
After the steps above, the intermediate code IR is generated. The code generator (Code Generation) walks the syntax tree from top to bottom and translates it into LLVM IR.
- You can run the following command to generate a .ll text file and view the IR code.
clang -S -fobjc-arc -emit-llvm main.m
For OC code, this step also does the runtime bridging: property synthesis, ARC processing, and so on.
// The following is the basic syntax of IR
@ global identifier
% local identifier
alloca allocate space
align memory alignment
i32 32 bits, 4 bytes
store write to memory
load read data
call call a function
ret return
A quick note: the IR generated this way for the test function is not optimized.
Of course, the IR can be optimized. In Xcode, the setting is under target - Build Settings - Optimization Level. The LLVM optimization levels are -O0, -O1, -O2, -O3, and -Os (the first character is a capital O). Here is the command that generates optimized intermediate code IR:
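To connect the IR vocabulary to real code, here is the original test function with comments showing roughly what its unoptimized IR looks like. Treat the IR lines as an approximation and not captured output: register numbers, attributes, and pointer syntax (i32* versus ptr) differ between clang versions.

```c
/* The test function from main.m. Each comment line is one IR instruction
   that an unoptimized build would typically emit for it. */
int test(int a, int b) {
    /* define i32 @test(i32 %0, i32 %1) {     @ = global, % = local      */
    /*   %3 = alloca i32, align 4             stack slot for a           */
    /*   %4 = alloca i32, align 4             stack slot for b           */
    /*   store i32 %0, i32* %3, align 4       write a into its slot      */
    /*   store i32 %1, i32* %4, align 4       write b into its slot      */
    /*   %5 = load i32, i32* %3, align 4      read a back                */
    /*   %6 = load i32, i32* %4, align 4      read b back                */
    /*   %7 = add nsw i32 %5, %6              a + b                      */
    /*   %8 = add nsw i32 %7, 3               (a + b) + 3                */
    /*   ret i32 %8                           return the result          */
    /* }                                                                 */
    return a + b + 3;
}
```

The round trip through alloca, store, and load is what makes unoptimized IR so verbose; an optimizing build can usually keep both parameters in registers and reduce the body to the two add instructions and the ret.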
clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll
The amount of IR code is greatly reduced.
- Since Xcode 7, when bitcode is enabled, Apple performs further optimization and generates .bc intermediate code. We generate the .bc file from the optimized IR code
clang -emit-llvm -c main.ll -o main.bc  # generate .bc from .ll
2.4 Generate assembly code
- We generate assembly code from the final .bc or .ll code
clang -S -fobjc-arc main.bc -o main.s
clang -S -fobjc-arc main.ll -o main.s
- Optimization can also be applied when generating assembly code
clang -Os -S -fobjc-arc main.m -o main.s
The amount of assembly code is reduced.
2.5 Generating an object file
To generate the object file, the assembler takes the assembly code as input, converts it into machine language, and outputs an object file.
clang -fmodules -c main.s -o main.o
You can run the nm command to view the symbols in main.o
$ xcrun nm -nm main.o
The _printf function is listed as undefined, external. undefined means that the symbol _printf cannot be found in the current file for the time being; external means that the symbol is externally accessible.
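The three kinds of symbols can be seen side by side in one small file. This is an illustrative sketch: visible_sum, hidden_double, and report are hypothetical names, and the comments describe how I would expect nm -nm to classify each symbol in an unoptimized Mach-O object file, not captured output.

```c
#include <stdio.h>

/* printf is only declared here (via stdio.h), not defined, so the object
   file records _printf as undefined and external, to be resolved later. */

/* Defined in this file and visible to other files: an external symbol. */
int visible_sum(int a, int b) { return a + b; }

/* `static` limits the function to this file: a non-external symbol. */
static int hidden_double(int x) { return 2 * x; }

/* Uses all three: a local helper, an exported function, and printf. */
int report(void) {
    int v = hidden_double(visible_sum(1, 2));
    printf("%d\n", v);
    return v;
}
```

With optimization enabled, the compiler may inline hidden_double and drop its symbol entirely, which is one reason the symbol table of an optimized build is often shorter.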
2.6 Linking
In the earlier article on dyld, I covered the linking process in detail. It mainly links the required dynamic libraries and static libraries and then generates the executable file.
- Static libraries are merged into the executable.
- Dynamic libraries exist independently.
The linker links the compiled .o files with .dylib and .a files to produce a Mach-O file
clang main.o -o main  # clang drives the link step
View the symbols after linking
$ xcrun nm -nm main
There are two undefined external symbols, _printf and dyld_stub_binder.
The output is now an executable file.
2.7 Binding
As soon as our program is loaded into memory, dyld_stub_binder is bound immediately; dyld enforces this. Linking, at compile time, only marks which library each symbol lives in. Binding happens at run time and ties the address of the external function to its symbol. dyld_stub_binder itself is bound first, and the other external functions are then bound through dyld_stub_binder. Finally, the corresponding Mach-O executable is generated for each architecture.
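The "bind first, then bind everything else through the binder" idea can be modeled with a function pointer. This is a simplified sketch, not dyld's implementation: real_add stands in for a function in a dynamic library, add_stub plays the role of the binder, and all names are hypothetical.

```c
static int resolve_count;          /* how many times the binder ran */

/* Stand-in for the real function that lives in a dynamic library. */
static int real_add(int a, int b) { return a + b; }

static int add_stub(int a, int b); /* forward declaration of the stub */

/* Stand-in for a lazy symbol pointer: starts out pointing at the stub. */
static int (*add_ptr)(int, int) = add_stub;

/* The first call lands here: look up the real address, patch the
   pointer, then forward the call. Later calls skip the stub entirely. */
static int add_stub(int a, int b) {
    resolve_count++;
    add_ptr = real_add;            /* "bind" the symbol to its address */
    return add_ptr(a, b);
}

/* What call sites use: always an indirect call through the pointer. */
static int call_add(int a, int b) { return add_ptr(a, b); }
```

Calling call_add twice runs the stub only once: the first call pays for the lookup, and every later call jumps straight to real_add. That is why the binder itself has to be available before any other external function can be resolved.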
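3. Summary
LLVM is designed to cope with the growing number of architectures and platforms. By generating the intermediate code IR in a middle layer, it improves extensibility and reduces coupling. The whole LLVM process is roughly as follows: source file, preprocessing, lexical and syntax analysis, IR generation, optimization and assembly generation, object file, linking, and finally binding to produce the executable for each architecture.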