What is a compiler

The interpreter executes the file

Open the terminal, run vi helloDemo.py, and create a Python file:

Run python helloDemo.py:

  • With the python interpreter, the helloDemo.py file is interpreted and its result is produced in a single step. This is the typical behavior of interpreted languages.

The compiler executes the file

Create a C file with vi helloDemo.c:

Compile it with clang, the C compiler, by running clang helloDemo.c. The program does not run right away; instead, the compiler produces a file named a.out:

Run ./a.out to execute the program and see the result:

Where the interpreter and the compiler are located

An interpreted language is executed as it is read. A compiled language is first translated into a binary file that the CPU can read and understand, and only then executed.

clang is our compiler for C, Objective-C, and C++, and it can be found in /usr/bin:

python is the interpreter, which can also be found in /usr/bin:

Overview of LLVM

  • LLVM is a compiler infrastructure framework written in C++. It is designed to optimize programs written in any programming language at compile time, link time, run time, and idle time, while staying open to developers and compatible with existing scripts.
  • The LLVM project was started in 2000 by Dr. Chris Lattner at the University of Illinois at Urbana-Champaign (UIUC). Lattner joined Apple Inc. in 2005 and worked on applying LLVM in Apple's development systems.
  • Apple is also a major funder of the LLVM program.
  • Currently LLVM has been adopted by Apple iOS development tools, Xilinx Vivado, Facebook, Google and other major companies.

Traditional compiler design

Compiler Frontend

The task of the compiler front end is to parse the source code. It performs lexical analysis, syntax analysis, and semantic analysis, checks the source code for errors, and then builds an Abstract Syntax Tree (AST). LLVM's front end also generates intermediate representation (IR) code.

Optimizer

The optimizer is responsible for various optimizations that improve the code's running time, such as eliminating redundant calculations.
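As a rough illustration of eliminating redundant calculations, the sketch below folds constant additions in a tiny expression tree. The tuple-based node format and the name fold_constants are assumptions made for this example only; LLVM's optimizer works on IR, not on a structure like this.

```python
# Expression nodes are nested tuples (hypothetical format for this sketch):
#   ("num", 3), ("var", "a"), ("add", left, right)
def fold_constants(node):
    """Return an equivalent expression with constant additions pre-computed."""
    if node[0] != "add":
        return node  # numbers and variables are already as simple as possible
    left = fold_constants(node[1])
    right = fold_constants(node[2])
    if left[0] == "num" and right[0] == "num":
        # Both operands are known now, so the addition need not happen at run time.
        return ("num", left[1] + right[1])
    return ("add", left, right)

expr = ("add", ("var", "a"), ("add", ("num", 1), ("num", 2)))
folded = fold_constants(expr)
print(folded)  # ('add', ('var', 'a'), ('num', 3))
```

The addition 1 + 2 is computed once during the fold, so only the addition involving the variable a remains for run time.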

Backend / Code Generator

The back end maps the code to the target instruction set: it generates machine language and performs machine-specific code optimizations.

The iOS compiler architecture

Objective-C, C, and C++ use Clang as the compiler front end, Swift uses its own swift compiler front end, and both use LLVM as the back end.

The design of LLVM

  • The advantage of LLVM's design shows when a compiler needs to support multiple source languages or multiple hardware architectures.
  • Other compilers, such as GCC, are very successful, but their reuse is limited because each was designed as one monolithic application.
  • The most important aspect of LLVM's design is its common code representation (IR), the form used to represent code inside the compiler. Because of it, a separate front end can be written for any programming language and a separate back end for any hardware architecture.
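The idea of a shared IR can be sketched with a toy example: two front ends for different surface syntaxes emit the same instruction list, and two back ends consume it. Everything here (the push/add instruction format and all function names) is hypothetical and far simpler than LLVM IR.

```python
# Shared IR for this sketch: a list of (opcode, operand) tuples for a stack machine.
def _sum_to_ir(numbers):
    ir = [("push", numbers[0])]
    for n in numbers[1:]:
        ir += [("push", n), ("add", None)]
    return ir

def frontend_infix(source):
    """Hypothetical front end for sums written like '1 + 2 + 3'."""
    return _sum_to_ir([int(tok) for tok in source.split("+")])

def frontend_prefix(source):
    """Hypothetical front end for sums written like '(+ 1 2 3)'."""
    return _sum_to_ir([int(tok) for tok in source.strip("()").split()[1:]])

def backend_run(ir):
    """Back end 1: execute the IR on a stack and return the result."""
    stack = []
    for op, arg in ir:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
    return stack.pop()

def backend_text(ir):
    """Back end 2: render the IR as pseudo-assembly text."""
    return "\n".join(op if arg is None else f"{op} {arg}" for op, arg in ir)

ir = frontend_infix("1 + 2 + 3")
print(ir == frontend_prefix("(+ 1 2 3)"))  # True: both front ends emit the same IR
print(backend_run(ir))                     # 6
```

Adding a third language only needs a new front end, and adding a new target only needs a new back end; neither change touches the other side.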

Clang

Clang is a subproject of the LLVM project. It is a lightweight compiler based on the LLVM architecture, created as an alternative to GCC to provide faster compilation. It compiles C, C++, and Objective-C, and within the overall LLVM architecture it is the compiler front end. For developers, there are many benefits to studying Clang.

III. Compilation process

Print the compilation phases of the source code with a command

Create main.m:

#import <stdio.h>
#define C 30
typedef int SSL_32;

int main(int argc, const char * argv[]) {
    
    int a = 10;
    SSL_32 b = 20;
    printf("%d",a + b + C);
    return 0;
}

clang -ccc-print-phases main.m

  1. Input file: find the source file.
  2. Preprocessing stage: expand macros and import header files.
  3. Compilation stage: perform lexical analysis and syntax analysis, check that the syntax is correct, and finally generate IR.
  4. Back end: LLVM optimizes the code through a series of passes, each of which does one job, and eventually generates assembly code.
  5. Generate the object file.
  6. Link: link the required dynamic and static libraries to generate the executable file.
  7. Generate the executable file for each target architecture.
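The phases can also be driven one at a time. The sketch below only collects the per-stage clang commands used in this article and prints them; the helper name run_stages and the dry_run switch are assumptions for illustration, and the output file name main1.m is just an example.

```python
import shutil
import subprocess

# (phase, command) pairs. The commands mirror the ones used stage by stage in
# this article; the intermediate file names are examples.
STAGES = [
    ("preprocess", ["clang", "-E", "main.m", "-o", "main1.m"]),
    ("compile to IR", ["clang", "-S", "-fobjc-arc", "-emit-llvm", "main.m", "-o", "main.ll"]),
    ("back end", ["clang", "-S", "-fobjc-arc", "main.ll", "-o", "main.s"]),
    ("assemble", ["clang", "-fmodules", "-c", "main.s", "-o", "main.o"]),
    ("link", ["clang", "main.o", "-o", "main"]),
]

def run_stages(stages, dry_run=True):
    """Describe each command; execute it only if dry_run is False and clang exists."""
    lines = []
    for phase, command in stages:
        lines.append(f"{phase}: {' '.join(command)}")
        if not dry_run and shutil.which("clang"):
            subprocess.run(command, check=True)  # stop at the first failing stage
    return lines

for line in run_stages(STAGES):
    print(line)
```

With the default dry_run=True nothing is executed, so the script is safe to run on a machine without clang or without a main.m file.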

Preprocessing stage

Run clang -E main.m >> main1.m to see the imported header file and the macro replacement:

# 1 "main.m"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 379 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "main.m" 2
# 1 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/stdio.h" 1 3 4
# 64 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/stdio.h" 3 4
...
typedef int SSL_32;

int main(int argc, const char * argv[]) {
    int a = 10;
    SSL_32 b = 20; // typedef is not a preprocessing directive, so SSL_32 is unchanged
    printf("%d",a + b + 30); // the macro C has been replaced with 30
    return 0;
}
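To make the macro replacement concrete, here is a minimal sketch of object-like #define expansion. It is an illustration only: the function name expand_object_macros is made up, and a real preprocessor also handles header inclusion, function-like macros, conditionals, and it never rewrites text inside string literals.

```python
import re

DEFINE = re.compile(r"\s*#define\s+(\w+)\s+(.+)")

def expand_object_macros(source):
    """Replace object-like #define macros; all other code is passed through."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = DEFINE.match(line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
            continue  # the directive itself does not appear in the output
        for name, value in macros.items():
            # Whole-word replacement. Unlike a real preprocessor, this would
            # also touch string literals and comments.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
        output.append(line)
    return "\n".join(output)

source = '#define C 30\ntypedef int SSL_32;\nprintf("%d",a + b + C);'
print(expand_object_macros(source))
```

The typedef line comes out unchanged, while C in the printf call becomes 30, which matches what clang -E shows for main.m.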

Compilation phase

Lexical analysis

After preprocessing comes lexical analysis, which breaks the code into tokens such as parentheses, equals signs, and strings.

clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m

  • The code is split into tokens, and each token is annotated with its line and column, for example:
    • typedef: Loc=<main.m:5:1> means the token starts at line 5, column 1.
    • The ( next to main: Loc=<main.m:7:9> means the token starts at line 7, column 9.
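The token dump can be approximated with a small regular-expression tokenizer that records the line and column of each token. This is a simplified sketch, not Clang's lexer: the pattern only knows string literals, identifiers, integers, and single punctuation characters, and the names TOKEN_PATTERN and tokenize are invented for the example.

```python
import re

# Order matters: string literals first, then identifiers, integers, and
# finally any single punctuation character.
TOKEN_PATTERN = re.compile(r'"[^"\n]*"|[A-Za-z_]\w*|\d+|[^\s\w]')

def tokenize(source):
    """Return a list of (text, line, column) tuples; line and column are 1-based."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            tokens.append((match.group(), line_no, match.start() + 1))
    return tokens

code = 'typedef int SSL_32;\n\nint main(int argc, const char * argv[]) {'
for text, line, col in tokenize(code):
    print(f"{text!r:12} Loc=<main.m:{line}:{col}>")
```

In this three-line snippet the ( after main is reported at column 9, the same column that the Clang dump shows for it in the full main.m.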

Syntax analysis

Lexical analysis is followed by syntax analysis, whose task is to verify that the syntax is correct. Building on the tokens from lexical analysis, it combines token sequences into grammatical phrases such as "program", "statement", and "expression", and then assembles all the nodes into an Abstract Syntax Tree (AST). The parser determines whether the source program is structurally correct.

clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

  • FunctionDecl: a function declaration spanning line 7, character 1 to line 13, character 1. The function name main is at line 7, character 5, the return type is int, the first parameter is of type int, and the second parameter is of type const char **.
    • ParmVarDecl: the first parameter, argc, of type int.
    • ParmVarDecl: the second parameter, argv, of type const char **.
    • CompoundStmt: a compound statement, from character 41 of the current line to character 1 of line 13, that is, the contents of the curly braces.
      • DeclStmt: the local variable initialized to 10.
      • DeclStmt: the local variable initialized to 20.
      • CallExpr: a function call.
        • ImplicitCastExpr: the function pointer to printf, with the parameter list const char *, ....
        • ImplicitCastExpr: the first argument to the function, "%d".
        • BinaryOperator: the second argument to the function, a + operation on int values.
          • BinaryOperator: the first + operation on int values.
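The nested BinaryOperator nodes come from + being left-associative: a + b + 30 is parsed as (a + b) + 30. The sketch below builds and prints such a tree for a flat token list. The functions parse_sum and dump are hypothetical helpers for this example, and the printed layout only loosely imitates the -ast-dump output.

```python
def parse_sum(tokens):
    """Parse ['x', '+', 'y', ...] into a left-nested tree of '+' nodes."""
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+":
            raise SyntaxError(f"expected '+', got {tokens[i]!r}")
        if i + 1 >= len(tokens):
            raise SyntaxError("missing operand after '+'")
        # The tree built so far becomes the left operand of the new node.
        tree = ("BinaryOperator '+'", tree, tokens[i + 1])
    return tree

def dump(node, depth=0):
    """Render the tree as indented text, one node per line."""
    pad = "  " * depth
    if isinstance(node, tuple):
        label, left, right = node
        return "\n".join([pad + label, dump(left, depth + 1), dump(right, depth + 1)])
    return pad + str(node)

tree = parse_sum("a + b + 30".split())
print(dump(tree))
```

A malformed token list such as ['a', '+'] raises SyntaxError, which is the toy counterpart of the parser rejecting a structurally incorrect program.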

If the imported header file is not found, you can specify the SDK: clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator12.2.sdk -fmodules -fsyntax-only -Xclang -ast-dump main.m

If the source contains a syntax error, for example when the ; after return 0 in main.m is removed, this step reports the error.

Generate intermediate code IR (Intermediate Representation)

After the steps above are completed, the intermediate code (IR) is generated. Code generation walks the syntax tree from the top down and gradually translates it into LLVM IR. A .ll text file can be generated with a command in order to view the IR code.

Objective-C code is bridged to the runtime in this step: property synthesis, ARC handling, and so on.

Basic syntax of IR:

@        global identifier
%        local identifier
alloca   allocate space
align    memory alignment
i32      32 bits, 4 bytes
store    write to memory
load     read data
call     call a function
ret      return

First modify the main.m code:

#import <stdio.h>

int test(int a,int b) {
    return a + b + 3;
}

int main(int argc, const char * argv[]) {
    int a = test(1,2);
    printf("%d",a);
    return 0;
}

Run clang -S -fobjc-arc -emit-llvm main.m to obtain the main.ll file:

...
define i32 @test(i32 %0, i32 %1) #0 {  // test(int a, int b)
  %3 = alloca i32, align 4           // int a3
  %4 = alloca i32, align 4           // int a4
  store i32 %0, i32* %3, align 4     // a3 = a
  store i32 %1, i32* %4, align 4     // a4 = b
  %5 = load i32, i32* %3, align 4    // int a5 = a3
  %6 = load i32, i32* %4, align 4    // int a6 = a4
  %7 = add nsw i32 %5, %6            // int a7 = a5 + a6
  %8 = add nsw i32 %7, 3             // int a8 = a7 + 3
  ret i32 %8                         // return a8
}
...
  • You can see how many steps there are in a simple addition operation, so let’s see how the compiler optimizes it.

Optimizing the IR

The optimization levels for LLVM are -O0, -O1, -O2, -O3, and -Os (the first character is a capital letter O).

clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll

...
define i32 @test(i32 %0, i32 %1) local_unnamed_addr #0 {
  %3 = add i32 %0, 3     // 1 + 3
  %4 = add i32 %3, %1    // 4 + 2
  ret i32 %4
}
...
  • You can see how much shorter the optimized code is.
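To check that the two versions of @test compute the same value, the sketch below interprets both instruction sequences with a toy interpreter. The tuple encoding of the instructions and the run function are assumptions for illustration; only the alloca, store, load, add, and ret operations used above are modeled, and types and alignment are ignored.

```python
def run(instructions, args):
    """Interpret a tiny subset of LLVM-like instructions (hypothetical encoding)."""
    values = dict(args)   # registers such as "%0"
    memory = {}           # stack slots created by alloca
    for inst in instructions:
        op = inst[0]
        if op == "alloca":            # ("alloca", slot)
            memory[inst[1]] = None
        elif op == "store":           # ("store", value, slot)
            memory[inst[2]] = values[inst[1]]
        elif op == "load":            # ("load", dest, slot)
            values[inst[1]] = memory[inst[2]]
        elif op == "add":             # ("add", dest, lhs, rhs); operands may be literals
            lhs, rhs = (values.get(x, x) for x in inst[2:4])
            values[inst[1]] = lhs + rhs
        elif op == "ret":             # ("ret", value)
            return values[inst[1]]

unoptimized = [
    ("alloca", "%3"), ("alloca", "%4"),
    ("store", "%0", "%3"), ("store", "%1", "%4"),
    ("load", "%5", "%3"), ("load", "%6", "%4"),
    ("add", "%7", "%5", "%6"), ("add", "%8", "%7", 3),
    ("ret", "%8"),
]
optimized = [("add", "%3", "%0", 3), ("add", "%4", "%3", "%1"), ("ret", "%4")]

args = {"%0": 1, "%1": 2}
print(run(unoptimized, args), len(unoptimized))  # 6 9
print(run(optimized, args), len(optimized))      # 6 3
```

Both sequences return 6 for test(1, 2), but the optimized one needs three instructions instead of nine because the stack slots and the store/load round trips are gone.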

In Xcode, the level is set under Build Settings -> Code Generation -> Optimization Level:

  • Debug mode does not optimize by default, and Release mode defaults to the -Os optimization level.

Bitcode

Since Xcode 7, when bitcode is enabled, Apple can perform further optimizations. Bitcode is the intermediate code, stored in a .bc file. We generate the .bc file from the optimized IR code:

clang -emit-llvm -c main.ll -o main.bc

Generating assembly code

We generate assembly code from the final .bc or .ll file.

clang -S -fobjc-arc main.bc -o main.s
clang -S -fobjc-arc main.ll -o main.s

The generated assembly code can also be optimized

clang -Os -S -fobjc-arc main.bc -o main.s

Generate object file (assembler)

The assembler takes the assembly code as input, converts it into machine code, and outputs an object file. This step belongs to the back end; clang simply provides the command-line interface for it.

clang -fmodules -c main.s -o main.o

Use the nm command to check the symbols in main.o by running xcrun nm -nm main.o:

                 (undefined) external _printf
0000000000000000 (__TEXT,__text) external _test
000000000000000a (__TEXT,__text) external _main
  • _printf is an undefined external symbol.
  • undefined means the symbol _printf is not yet found in the current file.
  • external means the symbol is externally accessible.

Generate an executable (link)

The linker combines the .o files with the required libraries (.dylib and .a files) to generate a Mach-O executable.

Run clang main.o -o main:

To see the symbols after linking, run xcrun nm -nm main:

                 (undefined) external _printf (from libSystem)
                 (undefined) external dyld_stub_binder (from libSystem)
0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100003f51 (__TEXT,__text) external _test
0000000100003f5e (__TEXT,__text) external _main
0000000100008008 (__DATA,__data) non-external __dyld_private
  • _printf is an external function.
  • dyld_stub_binder is also an external function; it is responsible for symbol binding.
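The way an undefined symbol gets matched to a library at link time can be sketched as a lookup over symbol tables. This is a deliberately simplified model: the dictionaries and the link function are hypothetical, and a real linker also relocates code, merges sections, and leaves dynamic-library symbols to be bound at load time.

```python
def link(objects, libraries):
    """Match every undefined symbol of the object files to a file that defines it.

    objects / libraries: {file name: {"defined": set, "undefined": set}}.
    Returns {symbol: name of the file that provides it}.
    """
    providers = {}
    for name, info in {**objects, **libraries}.items():
        for symbol in info["defined"]:
            providers.setdefault(symbol, name)  # the first definition wins
    resolved = {}
    for name, info in objects.items():
        for symbol in sorted(info["undefined"]):
            if symbol not in providers:
                raise LookupError(f"undefined symbol {symbol} referenced from {name}")
            resolved[symbol] = providers[symbol]
    return resolved

objects = {"main.o": {"defined": {"_main", "_test"}, "undefined": {"_printf"}}}
libraries = {"libSystem": {"defined": {"_printf", "dyld_stub_binder"}, "undefined": set()}}
print(link(objects, libraries))  # {'_printf': 'libSystem'}
```

This mirrors the nm output: _printf is undefined in main.o and is reported as coming from libSystem after linking. Without the library, the lookup fails, much like a link error for an undefined symbol.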