Summary of basic principles of iOS

This article covers the LLVM compilation process and the development of a Clang plug-in

LLVM

LLVM is a framework system for building compilers. It is written in C++ and is used to optimize the compile time, link time, run time and idle time of programs written in any programming language. It is open to developers and compatible with existing scripts

Traditional compiler design

A traditional compiler is organized as Source Code -> Frontend -> Optimizer -> Backend (Code Generator) -> Machine Code, as shown in the following figure

The iOS compiler architecture

Objective-C, C and C++ are compiled with Clang as the front end, Swift is compiled with the Swift front end, and both use LLVM as the back end, as shown in the figure below

Module description

  • Frontend: the task of the compiler front end is to parse the source code (the compile phase). It performs lexical analysis, syntax analysis and semantic analysis, checks the source code for errors, and then builds the Abstract Syntax Tree (AST). The LLVM front end also generates intermediate code (Intermediate Representation, IR). LLVM can be understood as compiler + optimizer: the optimizer receives IR and still outputs IR, which the back end then translates into the target instruction set
  • Optimizer: responsible for the various optimizations that improve the running time of the code, such as eliminating redundant calculations
  • Backend (Code Generator): maps the code to the target instruction set, generates machine code, and optimizes the machine code
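To make "eliminating redundant calculations" concrete, here is a minimal sketch of constant folding on a toy expression tree. The Expr type and the helper names (makeConstant, makeVariable, makeAdd, fold, countNodes) are assumptions made up for this illustration; they are not LLVM data structures, and real LLVM passes work on IR rather than on a tree like this.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree (not an LLVM type): constants, variables, additions.
struct Expr {
    enum Kind { Constant, Variable, Add };
    Kind kind;
    int value;                       // meaningful when kind == Constant
    std::string name;                // meaningful when kind == Variable
    std::unique_ptr<Expr> lhs, rhs;  // meaningful when kind == Add
};

std::unique_ptr<Expr> makeConstant(int v) {
    std::unique_ptr<Expr> e(new Expr());
    e->kind = Expr::Constant;
    e->value = v;
    return e;
}

std::unique_ptr<Expr> makeVariable(const std::string &n) {
    std::unique_ptr<Expr> e(new Expr());
    e->kind = Expr::Variable;
    e->name = n;
    return e;
}

std::unique_ptr<Expr> makeAdd(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    std::unique_ptr<Expr> e(new Expr());
    e->kind = Expr::Add;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Constant folding: replace "constant + constant" with a single constant.
std::unique_ptr<Expr> fold(std::unique_ptr<Expr> e) {
    assert(e && "fold expects a non-null expression");
    if (e->kind != Expr::Add) return e;
    e->lhs = fold(std::move(e->lhs));
    e->rhs = fold(std::move(e->rhs));
    if (e->lhs->kind == Expr::Constant && e->rhs->kind == Expr::Constant)
        return makeConstant(e->lhs->value + e->rhs->value);
    return e;
}

// Number of nodes in the tree, used to show that folding shrinks it.
int countNodes(const Expr &e) {
    if (e.kind != Expr::Add) return 1;
    return 1 + countNodes(*e.lhs) + countNodes(*e.rhs);
}
```

Folding the tree for a + (1 + 2) leaves a + 3, so the addition of the two constants no longer has to be computed at run time.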

The design of the LLVM

The most important aspect of the LLVM design is the use of a common code representation (IR), which is the form used to represent code inside the compiler. With LLVM, a front end can be written independently for any programming language, and a back end can be written independently for any hardware architecture, as shown below

LLVM is designed to separate the front end from the back end, so a change on one side does not affect the other
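The separation can be sketched in a few lines. In this toy model, every front end produces the same IR type and every back end consumes it, so adding a language or a target means adding one entry rather than one compiler per language/target pair. All names here (IR, FrontEnd, BackEnd, Toolchain, makeToyToolchain) are hypothetical and only illustrate the shape of the design; the "IR" is just a string.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

using IR = std::string;  // toy stand-in for LLVM IR
using FrontEnd = std::function<IR(const std::string &source)>;
using BackEnd = std::function<std::string(const IR &ir)>;

struct Toolchain {
    std::map<std::string, FrontEnd> frontEnds;  // keyed by language
    std::map<std::string, BackEnd> backEnds;    // keyed by target architecture

    // Any registered front end can be combined with any registered back end,
    // because the only thing they share is the IR.
    std::string compile(const std::string &language, const std::string &target,
                        const std::string &source) const {
        auto fe = frontEnds.find(language);
        auto be = backEnds.find(target);
        assert(fe != frontEnds.end() && be != backEnds.end() &&
               "unknown language or target");
        return be->second(fe->second(source));
    }
};

Toolchain makeToyToolchain() {
    Toolchain t;
    t.frontEnds["objc"] = [](const std::string &s) { return "ir(" + s + ")"; };
    t.frontEnds["swift"] = [](const std::string &s) { return "ir(" + s + ")"; };
    t.backEnds["x86_64"] = [](const IR &ir) { return "x86_64:" + ir; };
    t.backEnds["arm64"] = [](const IR &ir) { return "arm64:" + ir; };
    return t;
}
```

With two front ends and two back ends registered, all four combinations work without any front end knowing about any back end.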

Introduction to Clang

Clang is a sub-project of the LLVM project. It is a lightweight compiler built on the LLVM architecture. It was originally created to replace GCC and to provide faster compilation. There are many benefits to studying Clang

LLVM compilation process

  • Create a new file and write the following code
#include <stdio.h>

int test(int a, int b) {
    return a + b + 3;
}


int main(int argc, const char * argv[]) {
    int a = test(1, 2);
    printf("%d", a);
    return 0;
}
  • The compilation phases of the source code can be printed with the following command
// Command
clang -ccc-print-phases main.m

// Compilation phases
// 0 - Input file: find the source file
+- 0: input, "main.m", objective-c
// 1 - Preprocessing stage: macro replacement and header file import
+- 1: preprocessor, {0}, objective-c-cpp-output
// 2 - Compilation stage: lexical analysis, syntax analysis, checking whether the syntax is correct, and finally generating IR
+- 2: compiler, {1}, ir
// 3 - Back end: LLVM optimizes pass by pass, each pass doing some work, and finally generates assembly code
+- 3: backend, {2}, assembler
// 4 - Assembly: the assembly code is turned into an object file
+- 4: assembler, {3}, object
// 5 - Linking: link the required dynamic and static libraries to generate the executable (image file)
+- 5: linker, {4}, image
// 6 - Binding: generate the corresponding executable for the given architecture
6: bind-arch, "x86_64", {5}, image

The phases above are explained one by one below. Phase 0 is just the input file, that is, locating the source file, so it is not discussed further

1. Preprocessing stage

This stage mainly handles macro replacement and header file import. Run the following command; in its output you can see the imported headers and the replaced macros

clang -E main.m >> main2.m

Note that:

  • typedef: an alias created for a data type is not replaced during the preprocessing stage
  • #define: a macro is replaced during the preprocessing stage, so it is often used for code obfuscation to improve app security. The logic is: give the app's core classes, core methods, etc. aliases that look like system names; they are then replaced during preprocessing, which obfuscates the code
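The difference between the two can be checked with a small sketch. The names used here (coreLogin, cjl_sys_init, CJLInt and the CJL_STR helper macros) are hypothetical and exist only for this illustration. The two-level stringify macro turns its argument into a string after macro expansion, so it shows what the compiler actually sees once preprocessing is done.

```cpp
#include <cassert>
#include <string>

// Two-level stringification: the argument is macro-expanded first,
// then turned into a string literal.
#define CJL_STR_IMPL(x) #x
#define CJL_STR(x) CJL_STR_IMPL(x)

// Hypothetical obfuscation alias: every use of coreLogin is rewritten
// to cjl_sys_init by the preprocessor.
#define coreLogin cjl_sys_init

// A typedef is handled by the compiler, not by the preprocessor.
typedef int CJLInt;

// After preprocessing this reads: int cjl_sys_init(int a, int b)
CJLInt coreLogin(CJLInt a, CJLInt b) { return a + b; }

std::string macroNameAfterPreprocessing() { return CJL_STR(coreLogin); }
std::string typedefNameAfterPreprocessing() { return CJL_STR(CJLInt); }

void demonstratePreprocessing() {
    // The macro name has been replaced...
    assert(macroNameAfterPreprocessing() == "cjl_sys_init");
    // ...while the typedef name is left untouched.
    assert(typedefNameAfterPreprocessing() == "CJLInt");
}
```

Because the function is really named cjl_sys_init after preprocessing, that is the name that ends up in the symbol table, which is the effect the obfuscation technique relies on.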

2. Compilation stage

The compilation stage mainly performs lexical and syntax analysis and checking, and then generates the intermediate code (IR)

2.1 Lexical analysis

After preprocessing, lexical analysis is performed: the code is sliced into tokens such as brackets, equals signs and strings.

  • You can run the following command to view the information
clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
  • If the header file cannot be found, specify the SDK
clang -isysroot (your own SDK path) -fmodules -fsyntax-only -Xclang -dump-tokens main.m

clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.1.sdk/ -fmodules -fsyntax-only -Xclang -dump-tokens main.m

Here is the result of the lexical analysis of the code
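For readers who want to see the slicing idea in isolation, here is a toy tokenizer. It is only a sketch: the Token struct, the tokenize function and the three token kinds are assumptions for this example, keywords such as return are reported as plain identifiers, and Clang's real lexer distinguishes many more token kinds and handles far more cases.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Token {
    std::string kind;  // "identifier", "numeric_constant" or "punctuator"
    std::string text;  // the characters that make up the token
};

// Slice a source string into tokens, skipping whitespace.
std::vector<Token> tokenize(const std::string &src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }

        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier (keywords are not distinguished in this sketch).
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back({"identifier", src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Run of digits.
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
            tokens.push_back({"numeric_constant", src.substr(start, i - start)});
        } else {
            // Anything else is treated as a single-character punctuator.
            ++i;
            tokens.push_back({"punctuator", src.substr(start, 1)});
        }
        assert(i > start && "every iteration must consume at least one character");
    }
    return tokens;
}
```

Calling tokenize("return a + b + 3;") on the body of the test function yields seven tokens: return, a, +, b, +, 3 and the closing semicolon.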

2.2 Syntax analysis

After lexical analysis comes syntax analysis, whose task is to verify that the syntax is correct. Building on the result of lexical analysis, the token sequence is combined into grammatical phrases such as programs, statements and expressions, and all the resulting nodes are then assembled into the Abstract Syntax Tree (AST). The parser determines whether the program is structurally correct

  • You can run the following command to view the result of parsing
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
  • If the import header file is not found, specify the SDK
clang -isysroot (your own SDK path) -fmodules -fsyntax-only -Xclang -ast-dump main.m

clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.1.sdk/ -fmodules -fsyntax-only -Xclang -ast-dump main.m

Here are the results of the parsing

The meanings of several node names in the output are:

  • FunctionDecl: a function declaration
  • ParmVarDecl: a parameter
  • CallExpr: a function call
  • BinaryOperator: a binary operator
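As a rough picture of how such nodes nest, the sketch below builds a simplified tree for the expression a + b + 3 from the test function and prints it with indentation. The Node struct and the helpers leaf, binary, buildReturnExpr and dump are hypothetical; Clang's own AST classes are much richer (the real dump also contains implicit cast nodes, types and source locations), and only the kind names BinaryOperator, DeclRefExpr and IntegerLiteral are borrowed from Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Simplified AST node: a kind name, its spelling, and up to two children.
struct Node {
    std::string kind;
    std::string text;
    std::unique_ptr<Node> lhs, rhs;
};

std::unique_ptr<Node> leaf(const std::string &kind, const std::string &text) {
    std::unique_ptr<Node> n(new Node());
    n->kind = kind;
    n->text = text;
    return n;
}

std::unique_ptr<Node> binary(const std::string &op, std::unique_ptr<Node> lhs,
                             std::unique_ptr<Node> rhs) {
    assert(lhs && rhs && "a binary operator needs two operands");
    std::unique_ptr<Node> n = leaf("BinaryOperator", op);
    n->lhs = std::move(lhs);
    n->rhs = std::move(rhs);
    return n;
}

// a + b + 3 is left-associative, so it parses as (a + b) + 3.
std::unique_ptr<Node> buildReturnExpr() {
    return binary("+",
                  binary("+", leaf("DeclRefExpr", "a"), leaf("DeclRefExpr", "b")),
                  leaf("IntegerLiteral", "3"));
}

// Print one node per line, indenting children by two spaces per level.
std::string dump(const Node &n, std::size_t depth = 0) {
    std::string out(depth * 2, ' ');
    out += n.kind + " '" + n.text + "'\n";
    if (n.lhs) out += dump(*n.lhs, depth + 1);
    if (n.rhs) out += dump(*n.rhs, depth + 1);
    return out;
}
```

The outer BinaryOperator has the inner a + b as its left child and the literal 3 as its right child, which is the same nesting the -ast-dump output shows for the return statement.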

2.3 Generating intermediate code (IR)

After the steps above, the intermediate code (IR) is generated. The code generator traverses the syntax tree from top to bottom and translates it into LLVM IR.

  • You can run the following command to generate a .ll text file and view the IR code. For OC code, runtime bridging is also done at this step: property synthesis, ARC handling, etc.
clang -S -fobjc-arc -emit-llvm main.m

// Basic IR syntax
@       global identifier
%       local identifier
alloca  allocate space
align   memory alignment
i32     32 bits, i.e. 4 bytes
store   write to memory
load    read data
call    call a function
ret     return

Below is the generated intermediate code.ll file

Where, the parameters of the test function are interpreted as

The IR generated for OC code can of course be optimized. In Xcode the setting is under target -> Build Settings -> Optimization Level. The LLVM optimization levels are -O0, -O1, -O2, -O3 and -Os (the first character is a capital letter O). The following command generates the intermediate code (IR) with optimization enabled

clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll

This is the optimized intermediate code

  • Since Xcode 7, when bitcode is enabled, Apple optimizes further and generates .bc intermediate code. Here we generate the .bc code from the optimized IR code
clang -emit-llvm -c main.ll -o main.bc

3. Back end

The LLVM back end optimizes the code through a series of passes, each pass doing some work, and finally generates assembly code

Generating assembly code

  • Generate assembly code from the final .bc or .ll code
clang -S -fobjc-arc main.bc -o main.s
clang -S -fobjc-arc main.ll -o main.s
  • Generating assembly code can also be optimized
clang -Os -S -fobjc-arc main.m -o main.s

The generated main.s file is in assembly code format

4. Generating the object file

To generate the object file, the assembler takes the assembly code as input, converts it into machine code, and outputs an object file.

clang -fmodules -c main.s -o main.o

You can run the nm command to view the symbols in main.o

xcrun nm -nm main.o

The following are the symbols in main.o, which is in object file format

  • The _printf function is marked as undefined, external
  • undefined means that the symbol _printf cannot be found in the current file for the time being
  • external means that the symbol is externally accessible

5. Linking

Linking mainly links the required dynamic and static libraries and generates the executable file

  • Static libraries are merged into the executable file
  • Dynamic libraries remain independent

The linker links the compiled .o file with .dylib and .a files to generate a Mach-O file

clang main.o -o main

View the symbols after linking

xcrun nm -nm main

The result is shown below, where undefined means dynamic binding at run time

You can check the format of main from the command line; in this case it is a Mach-O executable

6. Binding

Binding generates the Mach-O executable that corresponds to each target architecture

Summary

To sum up, the compilation process of LLVM is shown in the figure below

Clang plug-in development

1. Preparation

Because of network restrictions in mainland China, the LLVM source code has to be downloaded from a mirror. The mirror links are given below

  • Download the LLVM project
git clone https://mirrors.tuna.tsinghua.edu.cn/git/llvm/llvm.git
  • In the projects directory of LLVM, download compiler-rt, libcxx and libcxxabi
cd ../projects
git clone https://mirrors.tuna.tsinghua.edu.cn/git/llvm/compiler-rt.git
git clone https://mirrors.tuna.tsinghua.edu.cn/git/llvm/libcxx.git
git clone https://mirrors.tuna.tsinghua.edu.cn/git/llvm/libcxxabi.git
  • In the tools directory of Clang, install the extra tools
cd ../tools/clang/tools
git clone https://mirrors.tuna.tsinghua.edu.cn/git/llvm/clang-tools-extra.git

2. LLVM compilation

Since the latest LLVM can only be built with cmake, cmake needs to be installed first

Install cmake

  • Use brew to check whether cmake is already installed; if it is, skip the next step
brew list
  • Install cmake through brew
brew install cmake

Compile the LLVM

There are two ways to compile:

  • Compile LLVM through Xcode
  • Compile LLVM through ninja

Compile LLVM with Xcode

  • Use cmake to generate the Xcode project
mkdir build_xcode
cd build_xcode
cmake -G Xcode ../llvm

Compile Clang using Xcode

  • Select Automatic creation of Schemes

Compile (CMD + B) with the ALL_BUILD scheme selected; this is estimated to take 1+ hours

Note: this can fail with the error "The i386 architecture is deprecated. You should update your ARCHS build setting to remove the i386 architecture". I tried to solve it, but no good solution has been found so far (to be added later)

Alternative: Manually create Schemes and compile Clang + ClangTooling

Compile LLVM with Ninja

  • Compiling with ninja also requires ninja to be installed. Use the following command to install it
brew install ninja
  • Create a build_ninja directory under the LLVM source root; build.ninja will eventually be generated in this directory

  • Create an llvm_release directory under the LLVM source root; the final build output will be placed in this folder

cd build_ninja
cmake -G Ninja ../llvm -DCMAKE_INSTALL_PREFIX=(installation path; on my machine: /Users/XXX/XXX/LLVM/llvm_release)
  • Run the compile and install commands in turn
ninja

ninja install

3. Create plug-ins

Create a CJLPlugin directory under /llvm/tools/clang/tools

Add add_clang_subdirectory(CJLPlugin) to the CMakeLists.txt file in /llvm/tools/clang/tools, where CJLPlugin is the name of the plug-in directory created in the previous step

  • In the CJLPlugin directory, create two new files, CJLPlugin.cpp and CMakeLists.txt, and add the following code to CMakeLists.txt
// 1. Create the files from the terminal, in the CJLPlugin directory
touch CJLPlugin.cpp
touch CMakeLists.txt

// 2. Add the following code to CMakeLists.txt
add_llvm_library(CJLPlugin MODULE BUILDTREE_ONLY
  CJLPlugin.cpp
)

  • Next, use cmake to regenerate the Xcode project by running the following command in the build_xcode directory
cmake -G Xcode ../llvm
  • Finally, CJLPlugin appears under Loadable modules in LLVM's Xcode project, and you can then write the plug-in code in the CJLPlugin directory

Write plug-in code

In the CJLPlugin.cpp file in the CJLPlugin directory, add the following code

// create by CJL
// 2020/11/15

#include <iostream>
#include "clang/AST/AST.h"
#include "clang/AST/DeclObjC.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/Frontend/FrontendPluginRegistry.h"

using namespace clang;
using namespace std;
using namespace llvm;
using namespace clang::ast_matchers;

// Namespace with the same name as the plug-in
namespace CJLPlugin {

// Step 3: the callback invoked when a match is found
// Custom callback class, inherited from MatchCallback
class CJLMatchCallback: public MatchFinder::MatchCallback {

private:
    // How CI is passed along: parameter of CreateASTConsumer in CJLASTAction
    // -> constructor of CJLASTConsumer -> private property of CJLMatchCallback,
    // which receives it through its own constructor
    CompilerInstance &CI;

    // Determine whether the file is a user source file
    bool isUserSourceCode(const string filename) {
        // The file name must not be empty
        if (filename.empty()) return false;
        // Everything that is not source code inside Xcode is treated as user code
        if (filename.find("/Applications/Xcode.app/") == 0) return false;
        return true;
    }

    // Determine whether the property should be declared with copy
    bool isShouldUseCopy(const string typeStr) {
        // Check whether the type is NSString | NSArray | NSDictionary
        if (typeStr.find("NSString") != string::npos ||
            typeStr.find("NSArray") != string::npos ||
            typeStr.find("NSDictionary") != string::npos/*...*/) {
            return true;
        }
        return false;
    }

public:
    CJLMatchCallback(CompilerInstance &CI) :CI(CI) {}

    // Override the run method
    void run(const MatchFinder::MatchResult &Result) {
        // Get the node from the result by its tag (the tag must be the same
        // as the one used in the CJLASTConsumer constructor)
        const ObjCPropertyDecl *propertyDecl = Result.Nodes.getNodeAs<ObjCPropertyDecl>("objcPropertyDecl");
        // Check that the node has a value and belongs to a user file
        if (propertyDecl && isUserSourceCode(CI.getSourceManager().getFilename(propertyDecl->getSourceRange().getBegin()).str())) {
            // Get the attribute description of the node
            ObjCPropertyDecl::PropertyAttributeKind attrKind = propertyDecl->getPropertyAttributes();
            // Get the type of the node and convert it to a string
            string typeStr = propertyDecl->getType().getAsString();
            // cout << "--------- received: " << typeStr << " ---------" << endl;

            // copy should be used, but is not
            if (propertyDecl->getTypeSourceInfo() && isShouldUseCopy(typeStr) && !(attrKind & ObjCPropertyDecl::OBJC_PR_copy)) {
                // Use CI to emit the warning
                // Get the diagnostics engine through CI
                DiagnosticsEngine &diag = CI.getDiagnostics();
                // Report the problem through the diagnostics engine
                /*
                 Location: getBeginLoc, the start position of the node
                 Diagnostic: getCustomDiagID(level, message)
                 */
                diag.Report(propertyDecl->getBeginLoc(), diag.getCustomDiagID(DiagnosticsEngine::Warning, "%0 - copy is recommended here!!")) << typeStr;
            }
        }
    }
};

// Step 2: the scan configuration
// Custom CJLASTConsumer, inherited from ASTConsumer, used to listen for
// information on AST nodes (the filter)
class CJLASTConsumer: public ASTConsumer {
private:
    // Finder/filter for AST nodes
    MatchFinder matcher;
    // The callback object
    CJLMatchCallback callback;

public:
    // The constructor sets up the MatchFinder
    CJLASTConsumer(CompilerInstance &CI) : callback(CI) {
        // Add a matcher: every objcPropertyDecl node is bound to the
        // "objcPropertyDecl" tag (to match objcPropertyDecl nodes).
        // The callback that actually runs is the run method overridden
        // in CJLMatchCallback
        matcher.addMatcher(objcPropertyDecl().bind("objcPropertyDecl"), &callback);
    }

    // Implement two callback methods: HandleTopLevelDecl and HandleTranslationUnit
    // Called once after each top-level declaration is parsed (a top-level
    // node is, for example, a global variable or a function declaration)
    bool HandleTopLevelDecl(DeclGroupRef D) {
        // cout << "parsing..." << endl;
        return true;
    }

    // Called when the whole file has been parsed
    void HandleTranslationUnit(ASTContext &context) {
        // cout << "The file is parsed!" << endl;
        // Hand the context of the parsed file (that is, the AST) to the matcher
        matcher.matchAST(context);
    }
};

// Inherit from PluginASTAction to implement our custom Action,
// that is, the custom AST behavior
class CJLASTAction: public PluginASTAction {

public:
    // Override the ParseArgs and CreateASTConsumer methods
    /*
     Parses the given plug-in command-line arguments.
     - param CI: compiler instance, used for reporting diagnostics.
     - return: true if parsing succeeds. Otherwise the plug-in is destroyed
       and no action is taken. The plug-in is responsible for reporting
       errors using the Diagnostic object of the CompilerInstance.
     */
    bool ParseArgs(const CompilerInstance &ci, const std::vector<std::string> &args) {
        return true;
    }

    // Returns an object of type ASTConsumer; ASTConsumer is an abstract
    // class, that is, the base class
    unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI, StringRef iFile) {
        // Return the custom CJLASTConsumer, a subclass of ASTConsumer
        /*
         CI is used to:
         - determine whether the file belongs to the user
         - emit warnings
         */
        return unique_ptr<CJLASTConsumer>(new CJLASTConsumer(CI));
    }
};

}

// Step 1: register the plug-in and the custom AST Action class
static FrontendPluginRegistry::Add<CJLPlugin::CJLASTAction> CJL("CJLPlugin", "This is CJLPlugin");

The plug-in works in three main steps

  • [Step 1] Register the plug-in and define the custom AST Action class

    • Inherit from PluginASTAction to define the custom ASTAction. Two methods need to be overridden: ParseArgs and CreateASTConsumer. The most important one is CreateASTConsumer, which has a parameter CI, the compiler instance object. It is mainly used for the following two purposes

      • To determine whether a file belongs to the user
      • To emit warnings
    • Register the plug-in with FrontendPluginRegistry, associating the plug-in name with the custom ASTAction class

  • [Step 2] Configure the scan

    • Inherit from ASTConsumer to define the custom CJLASTConsumer, with two properties: a MatchFinder object, matcher, and a callback object of type CJLMatchCallback, callback

    • Implement the constructor, which sets up the MatchFinder object and passes CI on to the callback object

    • Implement two callback methods

      • HandleTopLevelDecl: called once after each top-level declaration is parsed
      • HandleTranslationUnit: called when the whole file has been parsed; it hands the context of the parsed file (that is, the AST) to matcher
  • [Step 3] The callback invoked when a match is found

    • Inherit from MatchFinder::MatchCallback to define the custom callback class CJLMatchCallback

    • Define a private CompilerInstance property to receive the CI passed from the ASTConsumer class

    • Override the run method

      • 1. Get the node from the result by its tag; the tag must be the same as the one used in the CJLASTConsumer constructor
      • 2. Check that the node has a value and belongs to a user file, using the private method isUserSourceCode
      • 3. Get the attribute description of the node
      • 4. Get the type of the node and convert it to a string
      • 5. Determine that copy should be used but is not
      • 6. Get the diagnostics engine through CI
      • 7. Report the problem through the diagnostics engine
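The decision logic in steps 2 to 5 does not depend on Clang itself, so it can be tried out on its own. The sketch below restates the two helper predicates from the plug-in using plain std::string. The kCopyAttr constant and the needsCopyWarning function are assumptions added for this illustration: kCopyAttr stands in for the copy bit that the plug-in reads from the property's attribute mask, and needsCopyWarning merely combines the three checks.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the "copy" bit of a property's attribute mask.
const unsigned kCopyAttr = 1u << 5;

// A file counts as user code if it has a name and does not live inside
// the Xcode bundle (where the SDK headers are).
bool isUserSourceCode(const std::string &filename) {
    if (filename.empty()) return false;
    if (filename.find("/Applications/Xcode.app/") == 0) return false;
    return true;
}

// Types for which the plug-in recommends the copy attribute.
bool isShouldUseCopy(const std::string &typeStr) {
    return typeStr.find("NSString") != std::string::npos ||
           typeStr.find("NSArray") != std::string::npos ||
           typeStr.find("NSDictionary") != std::string::npos;
}

// Combined check: warn only for user files, for the listed types,
// and only when the copy bit is missing from the attribute mask.
bool needsCopyWarning(const std::string &filename, const std::string &typeStr,
                      unsigned attrs) {
    assert(!typeStr.empty() && "a property always has a type");
    return isUserSourceCode(filename) && isShouldUseCopy(typeStr) &&
           !(attrs & kCopyAttr);
}
```

For example, an NSString * property in a file under /Users without the copy bit triggers the warning, while the same property with the copy bit set, or any property declared inside an SDK header, does not.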

So, to sum up, the flowchart for clang plug-in development is as follows

Then test the plug-in in the terminal

// Command format
(path to the self-compiled clang) -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.1.sdk/ -Xclang -load -Xclang (path to the plug-in .dylib) -Xclang -add-plugin -Xclang (plug-in name) -c (path to the source file)

// Example
/Users/XXX/Desktop/build_xcode/Debug/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.1.sdk/ -Xclang -load -Xclang /Users/XXXX/Desktop/build_xcode/Debug/lib/CJLPlugin.dylib -Xclang -add-plugin -Xclang CJLPlugin -c /Users/XXXX/Desktop/XXX XXXX/test demo/testClang/testClang/ViewController.m

4. Integrating the plug-in into Xcode

Load the plug-in

  • Open the test project and add the following in target -> Build Settings -> Other C Flags

Setting up the compiler

  • Because the Clang plug-in must be loaded by the matching version of clang, a version mismatch causes compilation to fail, as shown below

  • Add two user-defined settings, CC and CXX, in Build Settings

    • CC: the absolute path of the self-compiled clang
    • CXX: the absolute path of the self-compiled clang++

  • Next, search for index in Build Settings and change Enable Index-While-Building Functionality from Default to NO

Finally, recompile the test project, and the following results appear