Introduction

In Xcode, pressing Command + B triggers a build, which is the process of compiling the code. Xcode now uses the LLVM compiler: early versions of Xcode used GCC, and for historical reasons the toolchain moved over to LLVM, with the transition becoming official in Xcode 5. LLVM is introduced below.

Compilation principle

Introduction to LLVM

  • The LLVM project is a collection of modular, reusable compiler and toolchain technologies.

  • The ACM presented its 2012 Software System Award to LLVM. Previous recipients of this award include Java, Apache, Mosaic, the World Wide Web, Smalltalk, UNIX, Eclipse, and more.

  • The LLVM project grew out of work started in 2000 by Vikram Adve and Chris Lattner at the University of Illinois at Urbana-Champaign, who wanted to create dynamic compilation techniques for both static and dynamic languages. LLVM is open-source software, originally released under a BSD-style license. In 2005, Apple hired Chris Lattner and formed a team to work on LLVM for use in Apple's development systems. LLVM is now part of the macOS and iOS development tools.

  • The name LLVM was originally an acronym for Low Level Virtual Machine, which caused widespread confusion because the scope of the project was never limited to creating a single virtual machine. The name LLVM is no longer an acronym; it is the full name of the project.

  • As LLVM grew, it became an umbrella term for many compilers and low-level tool technologies, which made the original name even less appropriate, so the developers decided to drop the acronym. LLVM is now simply a brand that applies to all projects under the LLVM umbrella, such as LLVM IR, the LLVM debugger, and the LLVM C++ standard library.

  • Currently both the Android NDK and Xcode use LLVM as the default compiler.

Traditional compiler architecture

  • Frontend: performs lexical analysis, syntax analysis, and semantic analysis on the source code and generates intermediate code
  • Optimizer: optimizes the intermediate code
  • Backend: generates machine code

LLVM architecture

  • In the LLVM system, each language has its own compiler front end: Clang compiles C, C++, and Objective-C, Flang compiles Fortran, swiftc compiles Swift, and so on.

  • All front ends and back ends share one intermediate code, the LLVM Intermediate Representation (LLVM IR).

  • The optimization stage is generic and works on the uniform LLVM IR, so it does not need to be modified to support a new programming language or a new hardware target. Specifically, it applies various optimizations and logical transformations to the bitcode to make the code faster and smaller, for example dead-code stripping and SimplifyCFG.

  • The back end, also called the code generator, compiles the optimized bitcode into machine code for the specified target architecture. For example, the X86 back end compiles bitcode into machine code for the x86 instruction set.

  • In GCC, by contrast, the front end and back end are tightly coupled, so it is very difficult for GCC to support a new language or a new platform.

  • LLVM is now used as a common infrastructure to implement a variety of statically compiled and runtime-compiled languages (the GCC family, Java, .NET, Python, Ruby, Scheme, Haskell, D, etc.).

  • In the LLVM system, the source code of every language is converted into the same bitcode format, and the three modules are independent of each other and fully reusable. To develop a new language, you only need to build a front end that compiles its source code into bitcode; the optimizer and back ends are reused unchanged. Similarly, when a new chip architecture comes out, only a new LLVM back end for that target needs to be written.
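The reuse argument above can be made concrete with a toy three-stage pipeline. This is only a sketch under simplifying assumptions: the "IR" is a made-up stack-machine program (real LLVM IR is a typed, SSA-based language), the source language is limited to sums of integers, and the names frontend, optimize, backendA, and backendB are hypothetical. The point it illustrates is that the optimizer and both back ends only ever see the IR.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy intermediate representation: a tiny stack-machine program.
// Illustration only; this is not LLVM IR.
struct Instr {
    enum Kind { Push, Add } kind;
    int value;  // used by Push only
};
using IR = std::vector<Instr>;

// Front end: parses sums such as "1+2+3" (digits and '+' only) into IR.
IR frontend(const std::string &src) {
    IR ir;
    int number = 0;
    bool first = true;
    for (std::size_t i = 0; i <= src.size(); ++i) {
        if (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
            number = number * 10 + (src[i] - '0');
        } else if (i == src.size() || src[i] == '+') {
            ir.push_back({Instr::Push, number});
            if (!first) ir.push_back({Instr::Add, 0});
            first = false;
            number = 0;
        }
    }
    return ir;
}

// Optimizer: constant-folds the whole program into a single Push.
// It depends only on the IR, not on any front end or back end.
IR optimize(const IR &ir) {
    std::vector<int> stack;
    for (const Instr &in : ir) {
        if (in.kind == Instr::Push) {
            stack.push_back(in.value);
        } else if (stack.size() >= 2) {
            int rhs = stack.back();
            stack.pop_back();
            stack.back() += rhs;
        }
    }
    IR folded;
    if (!stack.empty()) folded.push_back({Instr::Push, stack.back()});
    return folded;
}

// Two interchangeable back ends that print the IR in made-up syntaxes.
std::string backendA(const IR &ir) {
    std::string out;
    for (const Instr &in : ir) {
        if (in.kind == Instr::Push) out += "push " + std::to_string(in.value) + "\n";
        else out += "add\n";
    }
    return out;
}

std::string backendB(const IR &ir) {
    std::string out;
    for (const Instr &in : ir) {
        if (in.kind == Instr::Push) out += "PUSH #" + std::to_string(in.value) + ";\n";
        else out += "ADD;\n";
    }
    return out;
}
```

Calling backendA(optimize(frontend("1+2+3"))) and backendB(optimize(frontend("1+2+3"))) runs the same front end and optimizer and only swaps the last stage, which is the property the LLVM design relies on.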

Clang

  • A subproject of the LLVM project.

  • A C/C++/Objective-C/Objective-C++ compiler front end based on the LLVM architecture.

  • Compared with GCC, Clang has the following advantages:

  • Fast compilation: on some platforms, Clang compiles significantly faster than GCC (in Debug mode, Objective-C code compiles about three times faster than with GCC).

  • Small memory footprint: the AST generated by Clang takes up about one-fifth of the memory of GCC's.

  • Modular design: Clang uses a library-based modular design that makes IDE integration and reuse easy.

  • Readable diagnostics: during compilation, Clang creates and retains a large amount of detailed metadata, which helps with debugging and error reporting.

  • Clear, simple design: the code is easy to understand, extend, and enhance.

To be fair, GCC also has many advantages: for example, it supports a wide range of platforms, and historically it was written in C, so it could be built without a C++ compiler. Those advantages matter less to Apple, for which compilation speed is the priority.

Clang and LLVM

  • LLVM in the broad sense: the entire LLVM architecture
  • LLVM in the narrow sense: the LLVM back end (code optimization, object code generation, etc.)

Objective-C source file compilation process

  • View the compilation phases from the command line:
clang -ccc-print-phases main.m

  • View the preprocessor output:
clang -E main.m -F /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator.sdk/System/Library/Frameworks

1. Lexical analysis

Lexical analysis splits the source code into tokens. Take the following function as an example:

int sum(int a,int b){
    int c = a + b;
    return c;
}

clang -fmodules -E -Xclang -dump-tokens main.m

This command prints the type, value, and location of each token. Clang defines many token types, which can be divided into the following four categories:

  • Keywords: reserved words of the language, such as if, else, while, and for;
  • Identifiers: names such as variable and function names;
  • Literals: constant values such as numbers and strings;
  • Special symbols: operators and punctuation, such as the symbols for addition, subtraction, multiplication, and division.
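The four categories can be illustrated with a very small hand-written lexer. This is a sketch for intuition only and not how Clang's lexer is implemented: the TokenKind enum, the Token struct, the lex function, and the short keyword list are all assumptions made for the example, and string literals, comments, and multi-character operators are deliberately ignored.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// The four token categories described in the text.
enum class TokenKind { Keyword, Identifier, Literal, Punctuator };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits a line of C-like source into tokens.
// Simplified: integer literals only, every symbol is a one-character token.
std::vector<Token> lex(const std::string &src) {
    static const std::set<std::string> keywords = {
        "int", "return", "if", "else", "while", "for"};
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // A word is either a keyword or an identifier.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            std::string word = src.substr(start, i - start);
            TokenKind kind = keywords.count(word) ? TokenKind::Keyword
                                                  : TokenKind::Identifier;
            tokens.push_back({kind, word});
        } else if (std::isdigit(c)) {
            // A run of digits is an integer literal.
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
            tokens.push_back({TokenKind::Literal, src.substr(start, i - start)});
        } else {
            // Anything else is treated as a single-character symbol.
            ++i;
            tokens.push_back({TokenKind::Punctuator, std::string(1, src[start])});
        }
    }
    return tokens;
}
```

For the line "int c = a + b;" this yields seven tokens: one keyword, three identifiers, and three symbols. The real -dump-tokens output additionally records the source location of every token.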

2. Syntax analysis

Using the tokens produced above, the parser groups them according to the grammar into nodes such as VarDecl, and then builds an abstract syntax tree (AST) from the hierarchical relationships between these nodes.

  • Syntax analysis generates the abstract syntax tree (AST):
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

TranslationUnitDecl is the root node and represents a compilation unit; Decl stands for a declaration; Expr is an expression; Literal is a literal, which is a special kind of Expr; Stmt stands for a statement.

Clang has a wide variety of node types, but there are three main kinds: Type, Decl (declaration), and Stmt (statement); the rest derive from these three. By extending these three kinds of nodes, an unlimited variety of code can be represented with a finite set of forms.
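To make the tree shape concrete, here is a hand-built miniature of the AST for the sum function, printed with indentation in the spirit of -ast-dump. It is a sketch with assumptions: the Node struct and the dump and sumFunctionAst functions are hypothetical helpers, the node kinds reuse Clang's class names, and the tree is simplified (the real dump also shows types, source ranges, and implicit cast nodes).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A generic tree node: the kind names a Clang node class, the detail
// carries a name or operator. Not Clang's real data structure.
struct Node {
    std::string kind;
    std::string detail;
    std::vector<Node> children;
};

// Prints one node per line, indenting children by two spaces per level.
std::string dump(const Node &node, int depth = 0) {
    std::string out(static_cast<std::size_t>(depth) * 2, ' ');
    out += node.kind;
    if (!node.detail.empty()) out += " " + node.detail;
    out += "\n";
    for (const Node &child : node.children) out += dump(child, depth + 1);
    return out;
}

// Simplified tree for: int sum(int a, int b) { int c = a + b; return c; }
Node sumFunctionAst() {
    return Node{"FunctionDecl", "sum", {
        Node{"ParmVarDecl", "a", {}},
        Node{"ParmVarDecl", "b", {}},
        Node{"CompoundStmt", "", {
            Node{"DeclStmt", "", {
                Node{"VarDecl", "c", {
                    Node{"BinaryOperator", "+", {
                        Node{"DeclRefExpr", "a", {}},
                        Node{"DeclRefExpr", "b", {}},
                    }},
                }},
            }},
            Node{"ReturnStmt", "", {
                Node{"DeclRefExpr", "c", {}},
            }},
        }},
    }};
}
```

The declarations (FunctionDecl, ParmVarDecl, VarDecl) and the statements and expressions (CompoundStmt, ReturnStmt, BinaryOperator, DeclRefExpr) nest exactly as the source code does, which is what makes the tree convenient for the checks written later in the plug-in.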

3. LLVM IR

LLVM IR has three representations, which are essentially equivalent:

  • Text: an easy-to-read text format, similar to assembly language, with the extension .ll
clang -S -emit-llvm main.m
  • In-memory: the representation held in memory by the compiler
  • Bitcode: a binary format, with the extension .bc
clang -c -emit-llvm main.m

4. Basic IR syntax

The basic syntax of a .ll file is as follows:

  • Comments begin with a semicolon (;)
  • Global identifiers start with @ and local identifiers start with %
  • alloca: allocates memory in the current function's stack frame
  • i32: a 32-bit (4-byte) integer
  • align: memory alignment
  • store: writes data
  • load: reads data
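One way to read these instructions is to mirror, statement by statement, what unoptimized IR for the sum function does. The C++ below is a sketch written for this purpose; the function name sumUnoptimized and the slot variables are made up, and the IR in the comments is hand-written and approximate (value names and pointer spelling vary between Clang versions), not captured compiler output.

```cpp
#include <cstdint>

// Each line imitates one instruction of the unoptimized IR for sum().
// The IR shown in the comments is approximate and for orientation only.
std::int32_t sumUnoptimized(std::int32_t a, std::int32_t b) {
    std::int32_t aSlot;         // %a.addr = alloca i32, align 4
    std::int32_t bSlot;         // %b.addr = alloca i32, align 4
    std::int32_t cSlot;         // %c = alloca i32, align 4
    aSlot = a;                  // store i32 %a, ptr %a.addr, align 4
    bSlot = b;                  // store i32 %b, ptr %b.addr, align 4
    std::int32_t t0 = aSlot;    // %0 = load i32, ptr %a.addr, align 4
    std::int32_t t1 = bSlot;    // %1 = load i32, ptr %b.addr, align 4
    std::int32_t t2 = t0 + t1;  // %add = add nsw i32 %0, %1
    cSlot = t2;                 // store i32 %add, ptr %c, align 4
    std::int32_t t3 = cSlot;    // %2 = load i32, ptr %c, align 4
    return t3;                  // ret i32 %2
}
```

Every local variable and parameter first gets a stack slot (alloca), values reach the slots through store, and each use reads them back with load. The optimization stage later removes most of this traffic.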

Application and Practice

There are many practical applications built on LLVM and Clang, including:

  • LibClang, LibTooling, Clang Plugin

Official reference:

Clang.llvm.org/docs/Toolin…

Application: Syntax tree analysis, language conversion, etc

  • OCLint, Clang Static Analyzer

  • Clang plug-in development

Official reference:

Clang.llvm.org/docs/ClangP…

Clang.llvm.org/docs/Extern…

Clang.llvm.org/docs/RAVFro…

Application: code inspection (naming conventions, code specifications), etc

  • LLVM Pass development

Official reference:

Llvm.org/docs/Writin…

Application: intermediate code optimization, code obfuscation, etc

  • Develop new programming languages

LLVM tutorial – cn. Readthedocs. IO/en/latest/I…

Kaleidoscope – LLVM – tutorial – useful – cn. Readthedocs. IO/zh_CN/lates…

Write and run the Clang plug-in

Clang uses a modular design, so its functionality can be invoked by higher-level applications as a library. For example, code-style checkers, syntax highlighting in IDEs, and syntax checking are all built on the interfaces of the Clang libraries. Clang offers three interfaces to such applications: LibClang, Clang Plugin, and LibTooling.

LibClang offers considerably less functionality than the full Clang API in exchange for staying compatible with more versions of Clang; Clang Plugin and LibTooling expose the full capabilities of Clang. Code for a Clang plug-in is written in much the same way as for LibTooling, except that a plug-in can influence the compilation itself, either by emitting a warning or by stopping the build with an error. In addition, a tool written with LibTooling can easily be converted into a Clang plug-in. The Clang Plugin is therefore the most complete option in terms of functionality.

1. Download source code

Download the LLVM project:

 git clone https://github.com/llvm/llvm-project.git

In the llvm-project repository, the clang directory holds the code of the compiler front end for the C family of languages. The llvm directory contains two parts: the optimizer, which performs platform-independent optimization of the code, and the code generator, which produces platform-dependent assembly code. The lldb directory contains the debugger code, and the lld directory contains the linker code.

2. Source code compilation

macOS is a Unix-like platform, so you can either generate Makefiles or generate an Xcode project for the build.

To generate Makefiles, go to the llvm-project directory and:

  • Create an llvm_make directory in the llvm-project directory
  • Run CMake inside llvm_make
cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles" ../llvm

To generate an Xcode project instead:

  • Install the CMake tool
  • Create an llvm_xcode directory in the llvm-project directory
  • Run CMake inside llvm_xcode
cmake -G Xcode -DLLVM_ENABLE_PROJECTS=clang ../llvm

To learn more about the syntax and functionality of CMake, check out the official documentation.

When running the cmake command, you may see the following error:

-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
CMake Error at CMakeLists.txt:39 (project):
  No CMAKE_C_COMPILER could be found.

CMake Error at CMakeLists.txt:39 (project):
  No CMAKE_CXX_COMPILER could be found.

This indicates that cmake could not find the command-line compiler tools. There are two cases:

  • If the Xcode Command Line Tools are not installed, run the following command to install them:
xcode-select --install
  • If the Xcode Command Line Tools are already installed, just reset them:
sudo xcode-select --reset

After generating the Xcode project, open the generated LLVM.xcodeproj file and select Automatically Create Schemes.

The project can then be built in Xcode, although the build is very slow.

3. Plug-in directory

  • Create a new plug-in directory under clang/tools in the source tree, for example mskj-plugin, and add two files to it: MSKJPlugin.cpp and CMakeLists.txt. CMakeLists.txt tells CMake how to build the plug-in, and MSKJPlugin.cpp is the source file.

  • Append the following line to the end of clang/tools/CMakeLists.txt:
add_clang_subdirectory(mskj-plugin)
  • To define how the plug-in is built, write clang/tools/mskj-plugin/CMakeLists.txt as follows:
add_llvm_library(MSKJPlugin MODULE MSKJPlugin.cpp PLUGIN_TOOL clang)

MSKJPlugin is the name of the plug-in and MSKJPlugin.cpp is its source file. These lines integrate the plug-in code into LLVM's Xcode project as a module so that it can be built and debugged there. After adding the plug-in directory and files, run the cmake command again to regenerate the Xcode project, which will now include MSKJPlugin.cpp.

4. Write plug-in source code

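① Write the PluginASTAction code

Since a Clang plug-in has no main function, its entry point is the ParseArgs function of PluginASTAction, so a plug-in has to implement ParseArgs to handle its arguments. The snippets in this section omit the #include lines and assume using-directives for the clang, clang::ast_matchers, and std namespaces; in a real source file the classes must also appear in the reverse order (handler, consumer, action), because each one uses the next. The code looks like this:

 class MSKJASTAction: public PluginASTAction {
    public:
        // Returns the consumer that will inspect the AST of the input file.
        unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &ci, StringRef iFile) override {
            return unique_ptr<MSKJASTConsumer> (new MSKJASTConsumer(ci));
        }

        // Receives the plug-in arguments; this plug-in accepts none.
        bool ParseArgs(const CompilerInstance &ci, const vector<string> &args) override {
            return true;
        }
};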

② Write the ASTConsumer

PluginASTAction is the entry point for writing Clang plug-ins. It is an abstract base class derived from ASTFrontendAction and provides the entry point and working environment for the AST-based operations that follow.

This interface lets you run custom operations during compilation. To operate on the AST, override CreateASTConsumer to return your own consumer, which then receives the AST. ASTConsumer is an abstract base class for reading the AST and offers a number of entry points: you can override HandleTopLevelDecl() and HandleTranslationUnit() to receive callbacks. HandleTopLevelDecl() is called for each top-level declaration, such as a global variable or a function definition, as it is parsed; HandleTranslationUnit() is called once, after the entire translation unit has been parsed.

class MSKJASTConsumer: public ASTConsumer {
    private:
        MatchFinder matcher;
        MSKJHandler handler;

    public:
        MSKJASTConsumer(CompilerInstance &ci) :handler(ci) {
            // Match every Objective-C @interface declaration and pass it to the handler.
            matcher.addMatcher(objcInterfaceDecl().bind("ObjCInterfaceDecl"), &handler);
        }

        // Runs the registered matchers over the complete AST.
        void HandleTranslationUnit(ASTContext &context) override {
            matcher.matchAST(context);
        }
};

③ Handle the matched nodes

 class MSKJHandler : public MatchFinder::MatchCallback {
    private:
        CompilerInstance &ci;

    public:
        MSKJHandler(CompilerInstance &ci) :ci(ci) {}

        // Called for every node matched by the matcher registered in MSKJASTConsumer.
        void run(const MatchFinder::MatchResult &Result) override {
            if (const ObjCInterfaceDecl *decl = Result.Nodes.getNodeAs<ObjCInterfaceDecl>("ObjCInterfaceDecl")) {
                // Report an error at the first underscore in the class name, if there is one.
                size_t pos = decl->getName().find('_');
                if (pos != StringRef::npos) {
                    DiagnosticsEngine &D = ci.getDiagnostics();
                    SourceLocation loc = decl->getLocation().getLocWithOffset(pos);
                    D.Report(loc, D.getCustomDiagID(DiagnosticsEngine::Error, "MSKJ: No underscores in class names"));
                }
            }
        }
};
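The naming rule itself does not depend on Clang, so it can be factored into plain functions and checked without building the plug-in. The sketch below is one possible way to do that; firstUnderscore and isValidClassName are hypothetical helper names, not part of the plug-in shown above or of any Clang API.

```cpp
#include <string>

// Returns the offset of the first underscore in a class name, or
// std::string::npos when the name contains none.
std::string::size_type firstUnderscore(const std::string &className) {
    return className.find('_');
}

// A class name passes the rule when it contains no underscore.
bool isValidClassName(const std::string &className) {
    return firstUnderscore(className) == std::string::npos;
}
```

In the handler, the offset plays the same role as pos: it is added to the declaration's location so that the diagnostic points at the offending character rather than at the start of the name.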

5. Register the Clang plugin

Write the registration code in the plug-in source. The compiler loads the Clang plug-in from the dynamic library during compilation, and plug-ins are added to the registry with FrontendPluginRegistry::Add<>. The code to register the Clang plug-in is as follows:

static FrontendPluginRegistry::Add<MSKJPlugin::MSKJASTAction> X("MSKJPlugin", "The MSKJPlugin is my first clang-plugin.");

This line goes at the bottom of the plug-in source (the MSKJPlugin:: qualifier assumes the classes above are wrapped in a namespace of that name). The first string, MSKJPlugin, is the name used on the command line to invoke the plug-in later; the second string, "The MSKJPlugin is my first clang-plugin.", is a description of the plug-in.
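The registration line works through a common C++ idiom: a static object whose constructor adds an entry to a global table before main runs. The following self-contained sketch imitates that idiom. It is not LLVM's registry implementation; Registry, Action, HelloAction, and the plug-in name hello-plugin are all invented for the illustration.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Minimal stand-in for a plug-in action.
struct Action {
    virtual ~Action() = default;
    virtual std::string run() = 0;
};

// Toy registry: maps a plug-in name to a description and a factory.
class Registry {
public:
    using Factory = std::function<std::unique_ptr<Action>()>;
    struct Entry {
        std::string description;
        Factory make;
    };

    // Function-local static, so the table exists before any Add object uses it.
    static std::map<std::string, Entry> &entries() {
        static std::map<std::string, Entry> table;
        return table;
    }

    // Constructing an Add<T> object registers T under the given name.
    template <typename T>
    struct Add {
        Add(const std::string &name, const std::string &description) {
            entries()[name] =
                Entry{description, [] { return std::unique_ptr<Action>(new T()); }};
        }
    };
};

struct HelloAction : Action {
    std::string run() override { return "hello"; }
};

// Same shape as the FrontendPluginRegistry::Add<> line in the plug-in source.
static Registry::Add<HelloAction> X("hello-plugin", "A toy plug-in used for illustration.");
```

Because X is a static object, merely loading the library that contains it is enough to make the entry visible, which is why loading the plug-in's dynamic library registers the plug-in without any explicit call.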

6. Use the Clang plugin

After the Xcode project has been regenerated with the cmake command, the MSKJPlugin target can be found under Loadable modules.

Select the MSKJPlugin target and build it to generate a dynamic library file.

The LLVM project ships a complete example plug-in, PrintFunctionNames, which prints the names of top-level functions.

Studying this example is a good way to learn how to use a Clang plug-in.

To use a Clang plug-in, load the dynamic library that contains the plug-in registry with the -load command-line option; this loads all the Clang plug-ins registered in it. Use the -plugin option to select the plug-in to run. Additional arguments for the plug-in are passed with -plugin-arg-<plugin-name>.

cc1 is the compiler front end proper, and it is a separate entity from the Clang driver. The driver schedules the compilation jobs and builds the argument list for each cc1 invocation, while cc1 does the actual front-end work.

There are two ways to get options like -load and -plugin to Clang's cc1 process:

  • Use the -cc1 option directly. The disadvantage is that full system paths must then be specified on the command line;
  • Use -Xclang to forward these options to the cc1 process. -Xclang passes the argument that follows it directly to cc1 without the driver interpreting it.
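Since -Xclang forwards only the single argument that follows it, every cc1 argument needs its own -Xclang prefix. The helper below sketches that transformation for anyone generating the command line from a script or build tool; forwardToCc1 is a hypothetical name, not an existing Clang utility.

```cpp
#include <string>
#include <vector>

// Prefixes every cc1 argument with -Xclang so the driver passes it
// through to the cc1 process untouched.
std::vector<std::string> forwardToCc1(const std::vector<std::string> &cc1Args) {
    std::vector<std::string> driverArgs;
    for (const std::string &arg : cc1Args) {
        driverArgs.push_back("-Xclang");
        driverArgs.push_back(arg);
    }
    return driverArgs;
}
```

For example, the four cc1 arguments -load, a library path, -plugin, and a plug-in name become eight driver arguments, with -Xclang in front of each of them.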

Here is an example of building the Clang plug-in and then loading it using -Xclang:

$ export BD=/path/to/build/directory
$ (cd $BD && make PrintFunctionNames )
$ clang++ -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS \
          -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D_GNU_SOURCE \
          -I$BD/tools/clang/include -Itools/clang/include -I$BD/include -Iinclude \
          tools/clang/tools/clang-check/ClangCheck.cpp -fsyntax-only \
          -Xclang -load -Xclang $BD/lib/PrintFunctionNames.so -Xclang \
          -plugin -Xclang print-fns

In the preceding commands, the build path is set, make builds PrintFunctionNames.so, and the clang++ command loads the plug-in through the -Xclang arguments.

You can also use the -cc1 argument directly, but then you need to specify the full file paths, as follows:

$ clang -cc1 -load ../../Debug+Asserts/lib/libPrintFunctionNames.dylib -plugin print-fns some-input-file.c

7. More

To implement more complex plug-in features, use Clang's API to analyze and process the syntax tree.

Information about AST:

Clang.llvm.org/doxygen/nam…

Clang.llvm.org/doxygen/cla…

Clang.llvm.org/doxygen/cla…

Clang plug-ins are not complicated to write or use; the key is how to apply them well in day-to-day work. Beyond checking code conventions, a Clang plug-in can also be used for dead-code analysis, automatic instrumentation, offline test analysis, method-name obfuscation, and so on.

Conclusion

Understanding how compilation works on iOS gives us a deeper understanding of our programs, so that we can look at problems, and think about their solutions, from the bottom up.

About the author

Fan Chong is a development engineer on the mobile financial development platform in the User Experience Technology Department of Minsheng Technology Co., Ltd.

Thanks!