What is a compiler
The interpreter executes the file
Open a terminal, run vi helloDemo.py, and create a Python file:
Run python helloDemo.py:
- The python interpreter reads helloDemo.py, interprets it, and runs it right away. This is how interpreted languages generally behave.
Compiler executes file
Create a C file with vi helloDemo.c:
Compile it with the C compiler clang by running clang helloDemo.c. Nothing runs right away; instead the compiler produces a file named a.out:
Run ./a.out, and the program prints its result:
Interpreter, compiler location
An interpreted language is executed as it is read. A compiled language is first translated into a binary file that the CPU can read and execute, and only then run.
clang is the compiler for C, Objective-C, and C++, and it can be found in /usr/bin:
python is the interpreter, and it can also be found in /usr/bin:
Overview of LLVM
- LLVM is a compiler infrastructure framework, written in C++, designed to optimize the compile time, link time, run time, and idle time of programs written in any programming language. It stays open to developers and compatible with existing scripts.
- The LLVM project was started in 2000 by Chris Lattner at UIUC (the University of Illinois at Urbana-Champaign). Lattner joined Apple Inc. in 2005 and worked on applying LLVM across Apple's development systems.
- Apple is also a major funder of the LLVM project.
- LLVM has since been adopted by Apple's iOS development tools, Xilinx Vivado, Facebook, Google, and other major companies.
Traditional compiler design
Compiler Frontend
The task of the compiler front end is to parse the source code. It performs lexical analysis, syntax analysis, and semantic analysis, checks the source code for errors, and then builds an Abstract Syntax Tree (AST). LLVM's front end also generates intermediate representation (IR) code.
Optimizer
The optimizer applies various optimizations that improve the code's running time, such as eliminating redundant calculations.
Backend / Code Generator
The back end maps the code onto the target instruction set: it generates machine code and performs machine-specific optimizations.
The iOS compiler architecture
Objective-C, C, and C++ use Clang as the compiler front end, Swift uses the Swift compiler as its front end, and both use LLVM as the back end.
The design of LLVM
- The strength of this design shows most when a compiler has to support multiple source languages or multiple hardware architectures.
- Other compilers, such as GCC, are very successful, but because they were designed as monolithic applications, their uses are limited.
- The most important aspect of LLVM's design is a common code representation (IR), the form in which code is represented inside the compiler. Because of it, a separate front end can be written for any programming language and a separate back end for any hardware architecture.
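The idea of a shared representation can be sketched in a few lines of C. This is only a toy model, not how LLVM is implemented: the AddIR struct, the two "languages", and the two "targets" are all invented for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical shared IR: one integer addition. */
typedef struct {
    int lhs;
    int rhs;
} AddIR;

/* Front end for an infix toy language, e.g. "1 + 2". Returns 1 on success. */
int frontend_infix(const char *src, AddIR *ir) {
    return sscanf(src, "%d + %d", &ir->lhs, &ir->rhs) == 2;
}

/* Front end for a prefix toy language, e.g. "add 1 2". Returns 1 on success. */
int frontend_prefix(const char *src, AddIR *ir) {
    return sscanf(src, "add %d %d", &ir->lhs, &ir->rhs) == 2;
}

/* Back end for an imaginary stack machine. */
int backend_stack(const AddIR *ir, char *out, size_t size) {
    return snprintf(out, size, "push %d; push %d; add", ir->lhs, ir->rhs);
}

/* Back end for an imaginary register machine. */
int backend_register(const AddIR *ir, char *out, size_t size) {
    return snprintf(out, size, "mov r0, %d; add r0, %d", ir->lhs, ir->rhs);
}
```

Two front ends plus two back ends cover all four language/target pairs, because each side only needs to know about AddIR. Adding a third language means writing one more front end, not one per target.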
Clang
Clang is a subproject of the LLVM project. It is a lightweight compiler built on the LLVM architecture, created as an alternative to GCC with faster compilation. It compiles C, C++, and Objective-C, and within the overall LLVM architecture it is the compiler front end. For developers, studying Clang has many benefits.
III. Compilation process
Print the compilation phases of the source code with a command
Create main.m:
#import <stdio.h>
#define C 30
typedef int SSL_32;
int main(int argc, const char * argv[]) {
    int a = 10;
    SSL_32 b = 20;
    printf("%d",a + b + C);
    return 0;
}
Run clang -ccc-print-phases main.m to list the phases:
- Input file: find the source file.
- Preprocessing stage: expand macros and bring in the imported header files.
- Compilation stage: perform lexical analysis and syntax analysis, check that the syntax is correct, and finally generate IR.
- Back end: here LLVM optimizes the code by running it through one Pass after another, each Pass doing a specific job, and finally generates assembly code.
- Assembler: generate the object file.
- Link: link the required dynamic and static libraries to generate the executable.
- Bind architecture: generate the corresponding executable for each target architecture.
Preprocessing stage
Run clang -E main.m >> main1.m to see the header files imported and the macros replaced:
# 1 "main.m"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 379 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "main.m" 2
# 1 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/stdio.h" 1 3 4
# 64 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/stdio.h" 3 4
...
typedef int SSL_32;
int main(int argc, const char * argv[]) {
    int a = 10;
    SSL_32 b = 20; // typedefs are not preprocessing directives
    printf("%d",a + b + 30);
    return 0;
}
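The difference between the macro and the typedef can also be checked from inside a program. The sketch below reuses C and SSL_32 from main.m; the STR/XSTR helpers and the function names are additions made for illustration. XSTR turns whatever the preprocessor substitutes for its argument into a string literal.

```c
#include <string.h>

#define C 30
typedef int SSL_32;

/* Two-level stringification: XSTR(x) expands x as a macro first, then quotes the result. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Returns 1 if the preprocessor replaces the macro C with the given text. */
int macro_expands_to(const char *expected) {
    return strcmp(XSTR(C), expected) == 0;
}

/* Returns 1 if the typedef name passes through preprocessing unchanged. */
int typedef_name_survives(void) {
    return strcmp(XSTR(SSL_32), "SSL_32") == 0;
}

/* Same arithmetic as main.m: the compiler sees 10 + 20 + 30. */
int sum_demo(void) {
    int a = 10;
    SSL_32 b = 20;
    return a + b + C;
}
```

The macro is already gone by the time the compiler proper runs, while SSL_32 is still just a name that the compiler resolves to int later.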
Compilation phase
Lexical analysis
After preprocessing comes lexical analysis, which splits the code into tokens such as parentheses, equals signs, and strings.
clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
- The code is split into tokens, each tagged with its line and column. For example, typedef: Loc=<main.m:5:1> starts at line 5, column 1, and the ( next to main: Loc=<main.m:7:9> starts at line 7, column 9.
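How a lexer arrives at those line and column numbers can be shown with a minimal sketch. This is not Clang's lexer: the Token struct and tokenize function are hypothetical, and the sketch only knows identifiers, numbers, and single-character punctuation (no string literals or comments).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token record: where the token starts and how long it is. */
typedef struct {
    const char *start;
    size_t length;
    int line;   /* 1-based */
    int column; /* 1-based */
} Token;

/* Splits src into at most max tokens and returns how many were stored. */
size_t tokenize(const char *src, Token *out, size_t max) {
    size_t count = 0;
    int line = 1, column = 1;
    const char *p = src;

    while (*p != '\0' && count < max) {
        if (*p == '\n') { line++; column = 1; p++; continue; }
        if (isspace((unsigned char)*p)) { column++; p++; continue; }

        Token t = { p, 0, line, column };
        if (isalnum((unsigned char)*p) || *p == '_') {
            /* identifier, keyword, or number: consume the whole word */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
        } else {
            p++; /* punctuation: one character per token */
        }
        t.length = (size_t)(p - t.start);
        column += (int)t.length;
        out[count++] = t;
    }
    return count;
}
```

Fed the two-line input "typedef int SSL_32;\nint main(", the sketch reports typedef at line 1, column 1 and the ( after main at line 2, column 9, the same kind of location data shown in the Loc=<...> fields.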
Syntax analysis
Lexical analysis is followed by syntax analysis, whose task is to verify that the syntax is correct. Building on the tokens from lexical analysis, it groups token sequences into grammatical phrases such as "program", "statement", and "expression", and then assembles all the nodes into an Abstract Syntax Tree (AST). The parser determines whether the source program is structurally correct.
Run clang -fmodules -fsyntax-only -Xclang -ast-dump main.m:
- FunctionDecl: a function declaration spanning line 7, character 1 to line 13, character 1. The function name main is at line 7, character 5; the return type is int, the first parameter is of type int, and the second parameter is of type const char **.
- ParmVarDecl: the first parameter, argc, of type int.
- ParmVarDecl: the second parameter, argv, of type const char **.
- CompoundStmt: the compound statement, from character 41 of the current line to character 1 of line 13, that is, the contents of the curly braces.
- DeclStmt: a local variable, 10.
- DeclStmt: a local variable, 20.
- CallExpr: a function call.
- ImplicitCastExpr: the function pointer for printf, a function whose parameter list is const char *, ....
- ImplicitCastExpr: the first argument to the function, %d.
- BinaryOperator: the second argument to the function, an int + operation.
- BinaryOperator: the first + operation, of type int.
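The two nested BinaryOperator nodes come from a + b + C being parsed left to right. A small recursive-descent sketch shows how such a tree arises. It is a toy, not Clang's parser: the Node and Parser types and every function name are hypothetical, and the grammar only covers integer literals joined by +.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical AST node: an integer literal, or a '+' with two children. */
typedef enum { NODE_INT, NODE_ADD } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;               /* used when kind == NODE_INT */
    const struct Node *lhs;  /* used when kind == NODE_ADD */
    const struct Node *rhs;
} Node;

typedef struct {
    const char *pos;  /* next unread character */
    Node pool[32];    /* fixed-size node storage keeps the sketch free of malloc */
    size_t used;
    int ok;           /* cleared on the first syntax error */
} Parser;

void parser_init(Parser *p, const char *src) {
    p->pos = src;
    p->used = 0;
    p->ok = 1;
}

static void skip_spaces(Parser *p) {
    while (isspace((unsigned char)*p->pos)) p->pos++;
}

static Node *new_node(Parser *p) {
    if (p->used == sizeof p->pool / sizeof p->pool[0]) { p->ok = 0; return NULL; }
    return &p->pool[p->used++];
}

/* primary := digit+ */
static const Node *parse_primary(Parser *p) {
    skip_spaces(p);
    if (!isdigit((unsigned char)*p->pos)) { p->ok = 0; return NULL; }
    Node *n = new_node(p);
    if (n == NULL) return NULL;
    n->kind = NODE_INT; n->value = 0; n->lhs = NULL; n->rhs = NULL;
    while (isdigit((unsigned char)*p->pos)) n->value = n->value * 10 + (*p->pos++ - '0');
    return n;
}

/* expr := primary ('+' primary)*, left-associative like a + b + 30 */
const Node *parse_expr(Parser *p) {
    const Node *left = parse_primary(p);
    for (;;) {
        if (!p->ok) return NULL;
        skip_spaces(p);
        if (*p->pos != '+') break;
        p->pos++;
        const Node *right = parse_primary(p);
        if (!p->ok) return NULL;
        Node *add = new_node(p);
        if (add == NULL) return NULL;
        add->kind = NODE_ADD; add->value = 0; add->lhs = left; add->rhs = right;
        left = add;
    }
    if (*p->pos != '\0') { p->ok = 0; return NULL; } /* trailing input is a syntax error */
    return left;
}

/* Walks the tree bottom-up to compute the value. */
int eval(const Node *n) {
    return n->kind == NODE_INT ? n->value : eval(n->lhs) + eval(n->rhs);
}
```

For "10 + 20 + 30" the root is a + node whose left child is itself a + node (10 + 20) and whose right child is the literal 30, which mirrors the outer and inner BinaryOperator entries in the dump. Malformed input such as "10 + + 20" is rejected, which is the "structurally correct" check the parser is responsible for.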
If the imported header file cannot be found, specify the SDK: clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator12.2.sdk -fmodules -fsyntax-only -Xclang -ast-dump main.m
Return 0 in main.m (); return 0 in main.m ();
Generate intermediate code IR (Intermediate Representation)
After the steps above, the intermediate code (IR) is generated. The code generator walks the syntax tree from the top down and translates it step by step into LLVM IR. A command can emit a .ll text file so that the IR can be inspected.
For Objective-C code, runtime bridging also happens in this step: property synthesis, ARC handling, and so on.
Basic syntax of IR:
@        global identifier
%        local identifier
alloca   allocate space
align    memory alignment
i32      32 bits, 4 bytes
store    write to memory
load     read data
call     call a function
ret      return
First modify the main.m code:
#import <stdio.h>

int test(int a,int b) {
    return a + b + 3;
}

int main(int argc, const char * argv[]) {
    int a = test(1,2);
    printf("%d",a);
    return 0;
}
Run clang -S -fobjc-arc -emit-llvm main.m to obtain the main.ll file:
...
define i32 @test(i32 %0, i32 %1) #0 {   // test(int a, int b)
  %3 = alloca i32, align 4              // allocate a3, 4-byte aligned
  %4 = alloca i32, align 4              // allocate a4, 4-byte aligned
  store i32 %0, i32* %3, align 4        // a3 = a
  store i32 %1, i32* %4, align 4        // a4 = b
  %5 = load i32, i32* %3, align 4       // int a5 = a3
  %6 = load i32, i32* %4, align 4       // int a6 = a4
  %7 = add nsw i32 %5, %6               // int a7 = a5 + a6
  %8 = add nsw i32 %7, 3                // int a8 = a7 + 3
  ret i32 %8                            // return a8
}
...
- A simple addition takes this many steps, so let's see how the compiler optimizes it.
Optimizing the IR
The optimization levels for LLVM are -O0, -O1, -O2, -O3, and -Os (the first character is a capital letter O).
clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll
...
define i32 @test(i32 %0, i32 %1) local_unnamed_addr #0 {
  %3 = add i32 %0, 3     // 1 + 3
  %4 = add i32 %3, %1    // 4 + 2
  ret i32 %4
}
...
- You can see how much shorter the code is after optimization.
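Read back as C, the unoptimized and optimized IR for test correspond roughly to the two functions below. This is a hand-written paraphrase for comparison, not compiler output; the function names and the slot/v variable names are invented.

```c
/* Paraphrase of the unoptimized IR: each parameter is first written to a
   stack slot (alloca + store) and then read back (load) before it is used. */
int test_unoptimized(int a, int b) {
    int slot3;          /* %3 = alloca i32 */
    int slot4;          /* %4 = alloca i32 */
    slot3 = a;          /* store i32 %0, i32* %3 */
    slot4 = b;          /* store i32 %1, i32* %4 */
    int v5 = slot3;     /* %5 = load i32, i32* %3 */
    int v6 = slot4;     /* %6 = load i32, i32* %4 */
    int v7 = v5 + v6;   /* %7 = add nsw i32 %5, %6 */
    int v8 = v7 + 3;    /* %8 = add nsw i32 %7, 3 */
    return v8;          /* ret i32 %8 */
}

/* Paraphrase of the -Os IR: no stack slots, just two additions. */
int test_optimized(int a, int b) {
    int v3 = a + 3;     /* %3 = add i32 %0, 3 */
    int v4 = v3 + b;    /* %4 = add i32 %3, %1 */
    return v4;          /* ret i32 %4 */
}
```

Both functions return the same value for the same arguments. The optimized form simply drops the store/load round trip through memory.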
In Xcode, under Build Settings -> Code Generation -> Optimization Level, Debug mode does not optimize by default, while Release mode defaults to -Os.
Bitcode
Since Xcode 7, when bitcode is enabled, Apple performs further optimization on the app. Bitcode is the intermediate code stored in a .bc file; we generate it from the optimized IR:
clang -emit-llvm -c main.ll -o main.bc
Generating assembly code
We generate assembly code from the final .bc or .ll file.
clang -S -fobjc-arc main.bc -o main.s
clang -S -fobjc-arc main.ll -o main.s
Optimization can also be applied when generating the assembly code:
clang -Os -S -fobjc-arc main.bc -o main.s
Generate object file (assembler)
To generate the object file, the assembler takes the assembly code as input, converts it into machine code, and outputs an object file. This step belongs to the back end; clang simply provides the command-line interface to it.
clang -fmodules -c main.s -o main.o
Use the nm command to inspect the symbols in main.o by running xcrun nm -nm main.o:
(undefined) external _printf
0000000000000000 (__TEXT,__text) external _test
000000000000000a (__TEXT,__text) external _main
- _printf is an undefined external symbol.
- undefined means the symbol _printf cannot yet be found in the current file.
- external means the symbol is externally accessible.
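Which category a symbol falls into follows from how it is declared in the source. The sketch below uses invented function names, and the nm behavior described in the comments is what one would typically expect from an unoptimized build on macOS, not output captured from the author's machine.

```c
#include <stdio.h>

/* static gives the function internal linkage, so nm would typically list it
   as non-external (or not at all if the compiler inlines it away). */
static int helper_internal(int x) {
    return x + 3;
}

/* Defined here with external linkage: an external symbol in (__TEXT,__text),
   like _test and _main in the listing. */
int test_external(int a, int b) {
    return helper_internal(a + b);
}

/* printf is only declared by stdio.h, not defined in this file, so the object
   file can only record it as undefined external until link time. */
int print_sum(void) {
    return printf("%d", test_external(1, 2));
}
```

Compiling a file like this with clang -c and running xcrun nm -nm on the result is a quick way to compare the three cases side by side.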
Generate an executable (link)
The linker links the .o files with libraries (.dylib, .a) to generate a Mach-O executable.
Run clang main.o -o main:
To see the symbols after linking, run xcrun nm -nm main:
(undefined) external _printf (from libSystem)
(undefined) external dyld_stub_binder (from libSystem)
0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100003f51 (__TEXT,__text) external _test
0000000100003f5e (__TEXT,__text) external _main
0000000100008008 (__DATA,__data) non-external __dyld_private
- _printf is an external function.
- dyld_stub_binder is also an external function, responsible for symbol binding.