This is the 29th day of my participation in the August Wenwen Challenge.
My column
- Explore the underlying principles of iOS
- Summary of iOS underlying principles of exploration
Preface
Today we take a brief look at LLVM, the compiler framework behind Apple's toolchain, and at how to use it. Once the LLVM compilation process is clear, we will write a Clang plug-in to experiment with. Without further ado, let's get started.
LLVM
What is LLVM
LLVM is a framework for building compilers, written in C++. It is designed to optimize programs written in any programming language at compile time, link time, run time, and idle time. It is open to developers and compatible with existing scripts. The LLVM project was started in 2000 by Chris Lattner at the University of Illinois at Urbana-Champaign (UIUC). Lattner joined Apple Inc. in 2005 and worked on applying LLVM across Apple's development systems. Apple is also a major sponsor of the LLVM project.
LLVM is currently used in Apple's iOS development tools, in Xilinx Vivado, and at companies such as Facebook and Google.
- Let's look at the differences between compiled and interpreted languages:
- Interpreted languages: the code is read and executed directly;
- Compiled languages: the code is first compiled into a binary file that the CPU can execute, and only then executed.
Traditional compiler design
graph LR
12((sourceCode)) --> 1[Frontend] --> 2[Optimizer] --> 3[Backend] --> 13((MachineCode))
Compiler Frontend
The task of the compiler front end is to parse the source code. It performs lexical analysis, syntax analysis, and semantic analysis, checks the source code for errors, and then builds the Abstract Syntax Tree (AST). LLVM's front end also generates intermediate representation (IR) code.
Optimizer
The optimizer applies various optimizations that improve the running time of the code, such as eliminating redundant calculations.
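As a rough illustration of "eliminating redundant calculations", the sketch below shows a function before and after a hand-applied common-subexpression elimination. Both function names are hypothetical, and the second function is only a hand-written picture of the kind of rewrite an optimizer may perform, not actual LLVM output.

```c
/* Before: the product a * b is written (and, naively, computed) twice. */
int sum_of_products_naive(int a, int b) {
    return (a * b) + (a * b);
}

/* After: the repeated multiplication is computed once and reused. */
int sum_of_products_reused(int a, int b) {
    int product = a * b;   /* single multiplication */
    return product + product;
}
```

Both functions return the same value for every input that does not overflow. The second one simply does less work, which is the point of this class of optimization.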
Backend/CodeGenerator
The back end maps the code to the target instruction set: it generates machine code and performs machine-specific optimizations.
The iOS compiler architecture
Objective-C, C, and C++ use Clang as the compiler front end, Swift uses the Swift front end, and both share the LLVM back end.
graph LR
1(Clang) --> 2[LLVM Optimizer] --> 3[LLVM Code Generator]
1-1(Swift) --> 2
The design of LLVM
LLVM's design pays off most when a compiler has to support multiple source languages or multiple hardware architectures. Other compilers, such as GCC, have been very successful, but because they were designed as monolithic applications, their parts are hard to reuse. The most important aspect of LLVM's design is a common code representation (IR), the form in which the compiler represents code internally. As a result, a separate front end can be written for any programming language and a separate back end for any hardware architecture.
graph LR
1(( C )) --> 2(Clang C/C++/ObjC Frontend) --> 3[LLVM Optimizer] --> 4[LLVM X86 Backend] --> 5((X86))
1-1((Fortran)) --> 2-1(llvm-gcc Frontend) --> 3
1-2((Haskell)) --> 2-2(GHC Frontend) --> 3
3 --> 4-1[LLVM PowerPC Backend] --> 5-1((PowerPC))
3 --> 4-2[LLVM ARM Backend] --> 5-2((ARM))
Clang
Clang is a subproject of the LLVM project. It is a lightweight compiler built on the LLVM architecture, created as an alternative to GCC with faster compilation times. It compiles C, C++, and Objective-C, and serves as the compiler front end within the overall LLVM architecture. For developers, there are many benefits to exploring Clang.
The compilation process
The compilation phases of a source file can be printed with the following command:
clang -ccc-print-phases main.m
Preprocessing stage
Run the following command to print the preprocessed source:
clang -E main.m
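To make the effect of preprocessing concrete, here is a minimal sketch of what a fragment of main.m could contain. The macro name `OFFSET` and the function name `sum_plus_offset` are assumptions for illustration only; the original article does not show its source file. Plain C is used because it is also valid Objective-C.

```c
/* Hypothetical fragment of main.m (plain C is valid Objective-C). */
#define OFFSET 30   /* object-like macro, expanded by the preprocessor */

int sum_plus_offset(int a, int b) {
    /* In the output of `clang -E`, this line reads: return a + b + 30; */
    return a + b + OFFSET;
}
```

In the preprocessed output the `#define` line is gone and every use of `OFFSET` has been replaced by `30`. The contents of any `#include`d or `#import`ed headers are also pasted in at this stage.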
Compilation phase
Lexical analysis
After preprocessing is complete, lexical analysis is performed: the code is split into tokens such as brackets, equals signs, and strings.
clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
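The idea of splitting code into tokens can be sketched with a deliberately tiny tokenizer. This is not how Clang's lexer is implemented. The function name `count_tokens` and its three simplified rules (identifiers or keywords, integer literals, single punctuation characters) are assumptions chosen to keep the example short.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the tokens in src using three simplified rules:
   identifiers/keywords, integer literals, and single-character punctuation. */
size_t count_tokens(const char *src) {
    size_t count = 0;
    while (*src != '\0') {
        unsigned char c = (unsigned char)*src;
        if (isspace(c)) {
            src++;                       /* whitespace only separates tokens */
        } else if (isalpha(c) || c == '_') {
            while (isalnum((unsigned char)*src) || *src == '_') src++;
            count++;                     /* identifier or keyword */
        } else if (isdigit(c)) {
            while (isdigit((unsigned char)*src)) src++;
            count++;                     /* integer literal */
        } else {
            src++;
            count++;                     /* one punctuation character */
        }
    }
    return count;
}
```

For the input `int a = 10;` this sketch finds five tokens: `int`, `a`, `=`, `10`, and `;`. A real lexer additionally handles multi-character operators, string literals, comments, and source locations, which is what the `-dump-tokens` output shows.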
Syntax analysis
Lexical analysis is followed by syntax analysis, whose task is to verify that the syntax is correct. Building on the result of lexical analysis, the token sequence is combined into grammatical phrases such as "program", "statement", and "expression", and all of the resulting nodes are assembled into an Abstract Syntax Tree (AST). The parser thereby determines whether the source program is structurally correct.
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
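To give a feel for what an AST is, the sketch below builds the tree for the expression `1 + 2 * 3` by hand and evaluates it recursively. The `Node` type and the function names are hypothetical and far simpler than Clang's real AST classes. The sketch only illustrates how operator precedence ends up encoded in the tree's shape.

```c
#include <stddef.h>

typedef enum { NODE_NUMBER, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                 /* used only by NODE_NUMBER */
    const struct Node *left;   /* operands of NODE_ADD / NODE_MUL */
    const struct Node *right;
} Node;

/* Walks the tree recursively, evaluating children before their parent. */
int eval_node(const Node *n) {
    switch (n->kind) {
    case NODE_NUMBER: return n->value;
    case NODE_ADD:    return eval_node(n->left) + eval_node(n->right);
    case NODE_MUL:    return eval_node(n->left) * eval_node(n->right);
    }
    return 0; /* not reached for well-formed trees */
}

/* Builds the tree for 1 + 2 * 3: the multiplication is a child of the
   addition, so it is evaluated first. */
int eval_example(void) {
    Node one   = { NODE_NUMBER, 1, NULL, NULL };
    Node two   = { NODE_NUMBER, 2, NULL, NULL };
    Node three = { NODE_NUMBER, 3, NULL, NULL };
    Node mul   = { NODE_MUL, 0, &two, &three };
    Node add   = { NODE_ADD, 0, &one, &mul };
    return eval_node(&add);
}
```

`eval_example()` returns 7 rather than 9, because the parser's job is to produce exactly this nesting. The `-ast-dump` output above shows the same kind of nesting for a real source file.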
Generate intermediate code IR (Intermediate Representation)
clang -S -fobjc-arc -emit-llvm main.m
Optimizing the IR
LLVM's optimization levels are -O0, -O1, -O2, -O3, and -Os (the first character is the uppercase letter O).
clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll
Generating assembly code
We generate assembly code from the final .bc or .ll file:
clang -S -fobjc-arc main.bc -o main.s
clang -S -fobjc-arc main.ll -o main.s
Optimization can also be applied when generating assembly code:
clang -Os -S -fobjc-arc main.m -o main.s
Generate object file (assembler)
To generate the object file, the assembler takes the assembly code as input, converts it into machine code, and outputs the object file.
clang -fmodules -c main.s -o main.o
Using the nm command, check the symbols in main.o:
$ xcrun nm -nm main.o
(undefined) external _printf
0000000000000000 (__TEXT,__text) external _test
000000000000000a (__TEXT,__text) external _main
_printf is an undefined external symbol.
Undefined means that the definition of the _printf symbol has not yet been found in the current file.
External means that the symbol is accessible from outside the file.
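The distinction between these symbol kinds can be tied back to source code with a small sketch. The function names below are hypothetical and are not taken from the article's main.m. The comments describe what one would generally expect from C linkage rules, not verified nm output for this exact file.

```c
#include <stdio.h>

/* `static` gives internal linkage: the symbol is local to this object file
   (and the compiler may inline it away entirely). */
static int doubled(int x) {
    return x * 2;
}

/* No `static`: external linkage, so other object files can call it. */
int print_doubled(int x) {
    int result = doubled(x);
    /* printf is defined in another library, so in the object file it
       remains undefined until the linker resolves it. */
    printf("%d\n", result);
    return result;
}
```

In an object file built from this sketch, `_print_doubled` would be expected to appear as external, `_printf` as undefined external, and `_doubled`, if it is emitted at all, as non-external.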
Generate an executable (link)
The linker combines the resulting .o files with the libraries they need (.dylib and .a files) to create a Mach-O executable.
clang main.o -o main
Check the symbols after linking:
$ xcrun nm -nm main
(undefined) external _printf (from libSystem)
(undefined) external dyld_stub_binder (from libSystem)
0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100000f6d (__TEXT,__text) external _test
0000000100000f77 (__TEXT,__text) external _main