This article focuses on understanding the LLVM compilation process and the development of the Clang plug-in.

LLVM

LLVM is a framework for building compilers. It is written in C++ and is designed to optimize programs written in any programming language at compile time, link time, run time, and idle time. It stays open to developers and compatible with existing scripts.

Traditional compiler design

A traditional compiler is a pipeline: Source Code -> Frontend -> Optimizer -> Backend (Code Generator) -> Machine Code, as shown in the following figure.

It is mainly divided into three parts:

1. Compiler Frontend

The front end of the compiler parses the source code. It performs lexical analysis, syntax analysis, and semantic analysis, checks the source code for errors, and then builds the Abstract Syntax Tree (AST). The LLVM front end also generates the intermediate representation (IR).

2. Optimizer

The optimizer is responsible for various optimizations that improve the runtime of your code, such as eliminating redundant calculations.

3. Backend/CodeGenerator

The back end maps the code to the target instruction set. It generates machine language for the specified platform and performs machine-specific code optimizations.
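
To make "eliminating redundant calculations" concrete, here is a minimal sketch of constant folding, one classic optimizer pass. The `AddInst` structure and the function names are hypothetical and invented for this illustration; they are not LLVM data structures.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical three-address instruction: dst = lhs + rhs.
// An operand is either an integer literal ("8") or the name of an earlier result.
struct AddInst {
    std::string dst, lhs, rhs;
};

static bool isLiteral(const std::string &s) {
    return !s.empty() && s.find_first_not_of("0123456789") == std::string::npos;
}

// Folds every addition whose operands are both known at compile time.
// Folded results are recorded in `constants`; instructions that still
// depend on a run-time value are returned unchanged.
std::vector<AddInst> foldConstants(const std::vector<AddInst> &code,
                                   std::map<std::string, int> &constants) {
    std::vector<AddInst> remaining;
    for (const AddInst &inst : code) {
        const std::string *operands[2] = {&inst.lhs, &inst.rhs};
        int values[2] = {0, 0};
        bool known = true;
        for (int i = 0; i < 2; ++i) {
            if (isLiteral(*operands[i])) {
                values[i] = std::stoi(*operands[i]);
            } else if (constants.count(*operands[i])) {
                values[i] = constants.at(*operands[i]);
            } else {
                known = false;  // depends on a value only known at run time
            }
        }
        if (known) {
            constants[inst.dst] = values[0] + values[1];
        } else {
            remaining.push_back(inst);
        }
    }
    return remaining;
}
```

Given `rank = 8 + 6` followed by `total = rank + x`, the pass records `rank` as 14 and leaves only the second instruction, because `x` is not known at compile time.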

The iOS compiler architecture

Objective-C, C, and C++ are compiled with Clang as the front end, and Swift with the Swift compiler as the front end. Both use LLVM as the back end.

The design of LLVM

The most important aspect of LLVM is that it supports multiple source languages and multiple hardware architectures. A common code representation, the IR, separates the front ends from the back ends, much like the Bridge design pattern.
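
The decoupling can be sketched with a toy example: two front ends for two made-up syntaxes lower to the same IR, and two back ends consume it, so any front end pairs with any back end. Everything here (`SumIR`, the syntaxes, the two target "machines") is hypothetical and far simpler than real LLVM IR.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical shared IR: a list of integers to be added together.
struct SumIR {
    std::vector<int> operands;
};

// Front end 1: infix syntax such as "8 + 6".
SumIR lowerInfix(const std::string &src) {
    SumIR ir;
    std::istringstream in(src);
    int value;
    char plus;
    if (in >> value) ir.operands.push_back(value);
    while (in >> plus >> value) ir.operands.push_back(value);
    return ir;
}

// Front end 2: prefix syntax such as "(+ 8 6)".
SumIR lowerPrefix(const std::string &src) {
    SumIR ir;
    std::istringstream in(src);
    char paren, plus;
    int value;
    in >> paren >> plus;  // consume "(" and "+"
    while (in >> value) ir.operands.push_back(value);
    return ir;
}

// Back end 1: a made-up stack machine.
std::string emitStackMachine(const SumIR &ir) {
    std::string out;
    for (std::size_t i = 0; i < ir.operands.size(); ++i) {
        out += "push " + std::to_string(ir.operands[i]) + "\n";
        if (i > 0) out += "add\n";
    }
    return out;
}

// Back end 2: a made-up accumulator machine.
std::string emitAccumulator(const SumIR &ir) {
    std::string out = "acc = 0\n";
    for (int v : ir.operands) out += "acc += " + std::to_string(v) + "\n";
    return out;
}
```

With M front ends and N back ends, this design needs M + N components instead of M x N dedicated compilers. Adding a new language means writing only one new lowering function.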

Clang

Clang is a sub-project of the LLVM project. It is the compiler for C, C++, and Objective-C, and within the overall LLVM architecture it is the compiler front end. Learning Clang pays off in real projects. For example, a Clang plug-in can not only check coding conventions but also perform dead-code analysis, automatic instrumentation, offline test analysis, method-name obfuscation, and so on.

A Clang plug-in is not complicated to write or use. The key is how to apply it well in day-to-day work.

The compilation process

Let’s look at the whole process with a simple example. Create a new main.m file

#import <Foundation/Foundation.h>
#define DEFINEEight 8
int main(){
    @autoreleasepool {
        int eight = DEFINEEight;
        int six = 6;
        NSString* site = [[NSString alloc] initWithUTF8String:"starming"];
        int rank = eight + six;
        NSLog(@"%@ rank %d", site, rank);
    }
    return 0;
}

Run the clang -ccc-print-phases main.m command to view the phases.

+- 0: input, "main.m", objective-c
+- 1: preprocessor, {0}, objective-c-cpp-output
+- 2: compiler, {1}, ir
+- 3: backend, {2}, assembler
+- 4: assembler, {3}, object
+- 5: linker, {4}, image
6: bind-arch, "x86_64", {5}, image

The output lists the input followed by six main stages:

  0. Input: find and read the source file
  1. Preprocessing: macro replacement, header file import, conditional compilation
  2. Compilation: lexical analysis, syntax analysis, checking that the code is correct, and IR generation
  3. Back end: LLVM optimizes the IR pass by pass and finally generates assembly code
  4. Assembly: the assembly code is turned into an object file
  5. Linking: dynamic and static libraries are linked and an executable file is generated
  6. Binding: the executable for the corresponding architecture is generated
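
As a compact overview, the sketch below collects one clang invocation per stage, mirroring the commands used in the sections that follow. The helper name `stageCommands` and the `.mi` file name for the preprocessed output are assumptions of this sketch. As in the article, the IR step starts again from the `.m` file rather than from the preprocessed output.

```cpp
#include <string>
#include <vector>

// Returns one clang command line per stage for the source file <stem>.m.
// The intermediate file names are choices made for this illustration.
std::vector<std::string> stageCommands(const std::string &stem) {
    std::vector<std::string> commands;
    // Preprocessing: expand macros and imports.
    commands.push_back("clang -E " + stem + ".m -o " + stem + ".mi");
    // Compilation: parse, check, and emit textual LLVM IR.
    commands.push_back("clang -S -fobjc-arc -emit-llvm " + stem + ".m -o " + stem + ".ll");
    // Back end: turn IR into assembly.
    commands.push_back("clang -S -fobjc-arc " + stem + ".ll -o " + stem + ".s");
    // Assembly: turn assembly into an object file.
    commands.push_back("clang -fmodules -c " + stem + ".s -o " + stem + ".o");
    // Linking: produce the executable.
    commands.push_back("clang " + stem + ".o -o " + stem);
    return commands;
}
```

Calling `stageCommands("main")` yields five command lines that can be run one after another in a shell to inspect each intermediate file.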

Preprocessing stage

Run:

clang -E main.m

After execution, you can see the import of the header file and the macro replacement.
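
Macro replacement can be pictured as a word-by-word substitution. The sketch below handles only object-like macros such as `DEFINEEight`. It is a simplification that ignores string literals, comments, function-like macros, and rescanning, all of which the real preprocessor handles. The function name `expandMacros` is made up for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

static bool isWordChar(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
}

// Replaces every whole word that names a macro with the macro's body.
std::string expandMacros(const std::string &line,
                         const std::map<std::string, std::string> &macros) {
    std::string out;
    std::size_t i = 0;
    while (i < line.size()) {
        if (!isWordChar(line[i])) {  // punctuation and whitespace pass through
            out += line[i++];
            continue;
        }
        std::size_t start = i;
        while (i < line.size() && isWordChar(line[i])) ++i;
        std::string word = line.substr(start, i - start);
        std::map<std::string, std::string>::const_iterator it = macros.find(word);
        out += (it != macros.end()) ? it->second : word;
    }
    return out;
}
```

With the macro table `{DEFINEEight: 8}`, the line `int eight = DEFINEEight;` becomes `int eight = 8;`. A longer identifier such as `DEFINEEightX` is left alone because only whole words are replaced.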

Compilation phase

The compilation stage performs lexical and syntax analysis and checking, and then generates the intermediate code (IR).

1. Lexical analysis

Here the code is sliced into tokens, such as brackets, equals signs, and strings. You can run the following command to view the information.

  clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m

If the header file cannot be found, specify the SDK.

clang -isysroot (your own SDK path) -fmodules -fsyntax-only -Xclang -dump-tokens main.m
clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.1.sdk/ -fmodules -fsyntax-only -Xclang -dump-tokens main.m
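
For intuition about what "slicing into tokens" means, here is a minimal tokenizer sketch. The token kinds are simplified names chosen for this example. Clang's real lexer distinguishes many more kinds (keywords, individual punctuators, and so on) and also records source locations.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Token {
    std::string kind;  // simplified category, not Clang's real token kinds
    std::string text;  // the exact characters of the token
};

static bool isIdentChar(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
}

std::vector<Token> tokenize(const std::string &src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }  // whitespace separates tokens
        std::size_t start = i;
        std::string kind;
        if (std::isalpha(c) || c == '_') {
            while (i < src.size() && isIdentChar(src[i])) ++i;
            kind = "identifier";
        } else if (std::isdigit(c)) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            kind = "numeric_constant";
        } else if (c == '"') {
            ++i;                                    // opening quote
            while (i < src.size() && src[i] != '"') ++i;
            if (i < src.size()) ++i;                // closing quote
            kind = "string_literal";
        } else {
            ++i;                                    // single-character punctuation
            kind = "punctuation";
        }
        Token tok;
        tok.kind = kind;
        tok.text = src.substr(start, i - start);
        tokens.push_back(tok);
    }
    return tokens;
}
```

For the input `int six = 6;` this produces five tokens: `int`, `six`, `=`, `6`, and `;`.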

2. Syntax analysis

After lexical analysis comes syntax analysis, whose task is to verify that the grammar is correct. Building on the tokens, it combines token sequences into grammatical phrases such as programs, statements, and expressions, and then assembles all the nodes into an Abstract Syntax Tree (AST). The parser determines whether a program is structurally correct. You can run the following command to view the AST.

  clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

This prints the abstract syntax tree.
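
How a token sequence becomes a tree can be sketched with a tiny recursive-descent parser. It accepts only sums such as `eight + six`. The `Node` and `Parser` types are hypothetical and unrelated to Clang's AST classes.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string label;                        // "+" or the operand text
    std::vector<std::unique_ptr<Node>> kids;  // empty for leaves
};

class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : toks(std::move(tokens)) {}

    // expr := primary ('+' primary)*
    std::unique_ptr<Node> parseExpr() {
        std::unique_ptr<Node> lhs = parsePrimary();
        while (pos < toks.size() && toks[pos] == "+") {
            ++pos;
            std::unique_ptr<Node> add(new Node());
            add->label = "+";
            add->kids.push_back(std::move(lhs));
            add->kids.push_back(parsePrimary());
            lhs = std::move(add);  // left-associative: the sum so far becomes the left child
        }
        if (pos != toks.size()) throw std::runtime_error("unexpected token: " + toks[pos]);
        return lhs;
    }

private:
    // primary := any token that is not '+'
    std::unique_ptr<Node> parsePrimary() {
        if (pos >= toks.size() || toks[pos] == "+")
            throw std::runtime_error("expected an operand");
        std::unique_ptr<Node> leaf(new Node());
        leaf->label = toks[pos++];
        return leaf;
    }

    std::vector<std::string> toks;
    std::size_t pos = 0;
};

// Renders the tree as an S-expression, e.g. "(+ eight six)".
std::string dump(const Node &n) {
    if (n.kids.empty()) return n.label;
    std::string out = "(" + n.label;
    for (const std::unique_ptr<Node> &kid : n.kids) out += " " + dump(*kid);
    return out + ")";
}
```

A structurally invalid input such as `eight +` makes `parseExpr` throw. This is the toy equivalent of the parser reporting a syntax error.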

3. Generate intermediate code IR (Intermediate Representation)

Once the steps above are complete, the intermediate code (IR) is generated. The code generator translates the syntax tree, top down, into LLVM IR. The following command generates a .ll text file.

  clang -S -fobjc-arc -emit-llvm main.m

In this step the Objective-C code is bridged to the runtime: property synthesis, ARC handling, and so on.

Basic IR syntax

  • @ global symbol
  • % local symbol
  • alloca allocates stack space
  • align memory alignment
  • i32 32 bits, 4 bytes
  • store writes to memory
  • load reads from memory
  • call calls a function
  • ret returns
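
To see a few of these symbols together, the sketch below builds the textual IR of a two-argument add function as a string. The IR is hand-written for illustration and is not the output clang produces for main.m. Unoptimized clang output would also contain `alloca`, `store`, and `load` instructions for each local variable. The helper name `emitAddFunction` is made up.

```cpp
#include <string>

// Emits textual LLVM IR equivalent to: int <name>(int a, int b) { return a + b; }
std::string emitAddFunction(const std::string &name) {
    std::string ir;
    // "@" marks a global symbol (the function); "%" marks local values (the parameters).
    ir += "define i32 @" + name + "(i32 %a, i32 %b) {\n";
    ir += "entry:\n";
    // i32 is a 32-bit integer; the result is a new local value %sum.
    ir += "  %sum = add i32 %a, %b\n";
    // ret returns the value to the caller.
    ir += "  ret i32 %sum\n";
    ir += "}\n";
    return ir;
}
```

Printing `emitAddFunction("add")` gives a small function definition that can be compared line by line with the generated main.ll.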

IR optimization

IR can be optimized for Objective-C code. In Xcode the setting is under Target > Build Settings > Optimization Level. The optimization levels of LLVM are -O0, -O1, -O2, -O3, and -Os (the first character is an uppercase letter O). The following command generates the intermediate code (IR) with optimization enabled:

clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll

bitcode

  • Since Xcode 7, when bitcode is enabled, Apple further optimizes the code and generates .bc intermediate code. We generate the .bc code from the optimized IR code:

clang -emit-llvm -c main.ll -o main.bc

Generating assembly code

We generate assembly code from the final .bc or .ll file:

clang -S -fobjc-arc main.bc -o main.s
clang -S -fobjc-arc main.ll -o main.s

Assembly generation can also be optimized:

clang -Os -S -fobjc-arc main.m -o main.s

Generating the object file

To generate the object file, the assembler takes the assembly code as input, converts it into machine code, and outputs the object file.

clang -fmodules -c main.s -o main.o

You can run the nm command to view the symbols in main.o:

$ xcrun nm -nm main.o

The output lists the symbols in the main.o object file:

  • undefined means that the symbol cannot be found in the current file for now
  • external means that the symbol is externally accessible
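
The meaning of "undefined for now" becomes clearer with a toy model of what the linker does next: it collects the external symbols every input defines and checks which undefined references are still left. The `ObjectFile` structure and the symbol names in the usage note are illustrative only.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of an object file or library.
struct ObjectFile {
    std::set<std::string> defined;    // external symbols it provides
    std::set<std::string> undefined;  // symbols it references but does not define
};

// Returns the symbols that no input defines, i.e. what a linker
// would report as unresolved.
std::set<std::string> unresolvedAfterLink(const std::vector<ObjectFile> &inputs) {
    std::set<std::string> defined;
    for (const ObjectFile &obj : inputs)
        defined.insert(obj.defined.begin(), obj.defined.end());

    std::set<std::string> unresolved;
    for (const ObjectFile &obj : inputs)
        for (const std::string &sym : obj.undefined)
            if (defined.count(sym) == 0) unresolved.insert(sym);
    return unresolved;
}
```

If main.o references `_NSLog` and a library among the inputs defines it, the reference resolves. A symbol that no input defines stays in the returned set.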

Linking

Linking combines the object file with the dynamic and static libraries it needs and generates the executable file.

  • Static libraries are merged into the executable file
  • Dynamic libraries remain independent

clang main.o -o main

View the symbols after linking:

$ xcrun nm -nm main

Binding

Binding generates the Mach-O executable for each corresponding architecture.

Building LLVM

I have put together a script that downloads the sources and generates the Xcode project:

#! /bin/bash
LLVMPath=`pwd`
# 1. Download LLVM project
git clone https://mirrors.tuna.tsinghua.edu.cn/git/llvm-project.git
# 2. Download Clang
cd llvm/tools/
git clone https://mirrors.tuna.tsinghua.edu.cn/git/llvm/clang.git
# 3. Download compiler-rt,libcxx,libcxxabi
cd ../projects
git clone https://mirrors.tuna.tsinghua.edu.cn/git/llvm/compiler-rt.git
git clone https://mirrors.tuna.tsinghua.edu.cn/git/llvm/libcxx.git
git clone https://mirrors.tuna.tsinghua.edu.cn/git/llvm/libcxxabi.git
# 4. Install extra
cd ../tools/clang/tools
git clone https://mirrors.tuna.tsinghua.edu.cn/git/llvm/clang-tools-extra.git
# 5. Install cmake if it is missing
if command -v cmake >/dev/null 2>&1
then
    echo "Cmake has been installed"
else
    echo "Cmake not installed"
    echo "Installing cmake"
    brew install cmake >> /dev/null
    if test $? -eq 0
    then
        echo "Cmake installed successfully"
    else
        echo "Failed to install cmake"
    fi
fi

# 6. Generate the Xcode project
echo "Compile with Xcode"
cd "$LLVMPath"
mkdir llvm_build
cd llvm_build
cmake -G Xcode ../llvm

The llvm_build directory now contains the project that we build with Xcode.

Clang plug-in

Create a CLPlugin folder in this path. In it, create a CLPlugin.cpp file and a CMakeLists.txt file, and add the following to CMakeLists.txt:

add_llvm_library( CLPlugin MODULE BUILDTREE_ONLY
  CLPlugin.cpp
)

Then regenerate the Xcode project:

cmake -G Xcode ../llvm

Write plug-in code

Add the following code to the CLPlugin.cpp file in the CLPlugin directory:

#include <iostream>
#include "clang/AST/AST.h"
#include "clang/AST/DeclObjC.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/Frontend/FrontendPluginRegistry.h"

using namespace clang;
using namespace std;
using namespace llvm;
using namespace clang::ast_matchers;

namespace CLPlugin {
    // 3. Custom callback class, inheriting from MatchCallback; run() is called for every matched node
    class CLMatchCallback: public MatchFinder::MatchCallback{
    private:
        CompilerInstance &CI;
        // Check if it is your own file
        bool isUserSourceCode(const string fileName){
            if (fileName.empty()) return false;
            // Code that is not in Xcode is assumed to belong to the user
            if (fileName.find("/Applications/Xcode.app/") == 0) return false;
            return true;
        }

        // Check whether copy should be used
        bool isShouldUseCopy(const string typeStr){
            if(typeStr.find("NSString") != string::npos ||
               typeStr.find("NSArray") != string::npos ||
               typeStr.find("NSDictionary") != string::npos){
                return true;
            }
            return false;
        }

    public:
        CLMatchCallback(CompilerInstance &CI):CI(CI){}
        void run(const MatchFinder::MatchResult &Result) override {
            // Get the node object from the result
            const ObjCPropertyDecl *propertyDecl = Result.Nodes.getNodeAs<ObjCPropertyDecl>("objcPropertyDecl");
            // Stop if the node has no value (check before dereferencing it)
            if (!propertyDecl) return;

            // Get the file name (including the path)
            string fileName = CI.getSourceManager().getFilename(propertyDecl->getSourceRange().getBegin()).str();
            // Skip system files
            if (!isUserSourceCode(fileName)) return;

            // The type of the node is converted to a string
            string typeStr = propertyDecl->getType().getAsString();
            // Get the attributes of the property
            ObjCPropertyAttribute::Kind attrKind = propertyDecl->getPropertyAttributes();
            // Copy should be used but copy is not used
            if (isShouldUseCopy(typeStr) && !(attrKind & ObjCPropertyAttribute::kind_copy)) {
                // Diagnostic engine
                DiagnosticsEngine &diag = CI.getDiagnostics();
                // Report the error
                diag.Report(propertyDecl->getLocation(), diag.getCustomDiagID(DiagnosticsEngine::Error, "This place should use Copy."));
            }
        }
    };

    // 2. Custom CLConsumer, inheriting from ASTConsumer, used to listen for AST node information (the filter)
    class CLConsumer: public ASTConsumer{
    private:
        // MatchFinder filters the AST nodes
        MatchFinder matcher;
        CLMatchCallback callback;
    public:
        CLConsumer(CompilerInstance &CI):callback(CI){
            // Register a matcher for ObjCPropertyDecl nodes;
            // every match is delivered to the callback
            matcher.addMatcher(objcPropertyDecl().bind("objcPropertyDecl"), &callback);
        }

        // Called back once a top-level declaration is parsed
        bool HandleTopLevelDecl(DeclGroupRef D) override {
            // cout << "parsing..." << endl;
            return true;
        }

        // Called back when the entire file is parsed
        void HandleTranslationUnit(ASTContext &Ctx) override {
            cout<<"File parsing complete!!"<<endl;
            matcher.matchAST(Ctx);
        }
    };

    // 1. Define a class that inherits from PluginASTAction to implement our custom action
    class CLASTAction: public PluginASTAction{
    public:
        bool ParseArgs(const CompilerInstance &CI, const vector<string> &arg) override {
            return true;
        }
        std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI, StringRef InFile) override {
            return unique_ptr<CLConsumer>(new CLConsumer(CI));
        }
    };
}

// 4. Register the plugin
static FrontendPluginRegistry::Add<CLPlugin::CLASTAction> X("CLPlugin", "this is the description");


A brief summary of the writing process:

  • Step 1: write the PluginASTAction subclass to handle the input arguments.
  • Step 2: visit the AST nodes through an ASTConsumer to get the desired content.
  • Step 3: write the MatchCallback callback.
  • Step 4: register the Clang plug-in for external use.
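
The decision logic inside the callback does not depend on Clang, so it can be tried in isolation. The sketch below re-implements the two string checks as free functions and combines them. The `PropertyAttr` bitmask and the function `needsCopyWarning` are stand-ins invented for this example; the plug-in itself uses `ObjCPropertyAttribute::Kind`.

```cpp
#include <string>

// Hypothetical stand-in for the property attribute bitmask.
enum PropertyAttr : unsigned {
    kAttrCopy   = 1u << 0,
    kAttrStrong = 1u << 1
};

// Files under the Xcode bundle are treated as system code.
bool isUserSourceCode(const std::string &fileName) {
    if (fileName.empty()) return false;
    return fileName.find("/Applications/Xcode.app/") != 0;
}

// Types for which the copy attribute is expected.
bool shouldUseCopy(const std::string &typeStr) {
    return typeStr.find("NSString") != std::string::npos ||
           typeStr.find("NSArray") != std::string::npos ||
           typeStr.find("NSDictionary") != std::string::npos;
}

// True when the plug-in would report "This place should use Copy."
bool needsCopyWarning(const std::string &fileName,
                      const std::string &typeStr,
                      unsigned attributes) {
    return isUserSourceCode(fileName) &&
           shouldUseCopy(typeStr) &&
           (attributes & kAttrCopy) == 0;
}
```

A `strong` `NSString *` property in a user file triggers the warning, while the same property declared `copy`, or any property inside the Xcode SDK headers, does not.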

The clang file path is in lib

// Command format
(path to the self-built clang) -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator15.1.sdk/ -Xclang -load -Xclang (plug-in .dylib path) -Xclang -add-plugin -Xclang (plug-in name) -c (source file path)

Replace the SDK path with your own.

Integrating the plug-in into Xcode

To use a Clang plug-in, the dynamic library containing the plug-in registry must be loaded with the -load command-line option. -load loads all registered Clang plug-ins. Use the -plugin option to select the Clang plug-in to run. Additional arguments for the plug-in are passed with -plugin-arg-<plugin-name>.

cc1 is the compiler front end itself, and it is a separate entity from the Clang driver. The driver is mainly responsible for managing and scheduling the compilation jobs. For each job it prepares the arguments and hands them to cc1, which does the front-end work.

There are two ways to get options like -load and -plugin into Clang's cc1 process:

One is to use the -cc1 option directly. Its disadvantage is that the default header search paths are not configured, so the full system path configuration must be specified on the command line.

The other is to use -Xclang to add these options to the cc1 process. Each -Xclang passes the argument that follows it directly to the cc1 process without affecting the Clang driver.
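
Because every cc1 option needs its own -Xclang prefix, the argument list is easy to get wrong by hand. The sketch below builds it programmatically. The function name `pluginDriverArgs` is made up, and the CLPlugin shown earlier ignores its arguments, so any -plugin-arg value is purely illustrative.

```cpp
#include <string>
#include <vector>

// Builds the driver arguments that forward -load, -add-plugin, and
// -plugin-arg-<name> to cc1, each one prefixed with -Xclang.
std::vector<std::string> pluginDriverArgs(const std::string &dylibPath,
                                          const std::string &pluginName,
                                          const std::vector<std::string> &pluginArgs) {
    std::vector<std::string> args;
    args.push_back("-Xclang"); args.push_back("-load");
    args.push_back("-Xclang"); args.push_back(dylibPath);
    args.push_back("-Xclang"); args.push_back("-add-plugin");
    args.push_back("-Xclang"); args.push_back(pluginName);
    for (const std::string &arg : pluginArgs) {
        args.push_back("-Xclang"); args.push_back("-plugin-arg-" + pluginName);
        args.push_back("-Xclang"); args.push_back(arg);
    }
    return args;
}
```

Joining the result with spaces gives the part of the command line that goes between the clang path and the source file, in the same shape as the command format shown earlier.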

  • In Build Settings, add two user-defined settings, CC and CXX
  • CC is the absolute path of the self-built clang
  • CXX is the absolute path of the self-built clang++
  • Next, search for index in Build Settings and change Enable Index-While-Building Functionality from Default to NO

Finally, rebuild the project and the plug-in starts checking the code.