This article explores how LLVM works, how our source code is translated step by step into machine code, and which steps we can hook into to add or change functionality.

Finally, we will hand-write a clang plugin that you can experiment with freely.

Download demo: demo

1. Compilation process

Before writing the Clang plug-in, we need to understand what clang does when compiling a project.

There is no need to read a thick compiler textbook: iOS developers already have clang on their Macs, so we can use clang commands to observe part of the front-end process.

For the sake of a clear view of the compilation process, we will create a new command line project called Testclang without messy dependencies.

Overview of compilation process

Use the native Clang to view the compilation process

clang -ccc-print-phases main.m

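0: input, "main.m", objective-c // source input
1: preprocessor, {0}, objective-c-cpp-output // preprocessing
2: compiler, {1}, ir // compile to IR
3: backend, {2}, assembler // backend generates assembly
4: assembler, {3}, object // assembler generates object code
5: linker, {4}, image // link
6: bind-arch, "x86_64", {5}, image // adapts the image to each platform architecture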

Precompilation

To see the result of precompiling our own code more clearly, we delete the only header import, Foundation, and then add a simple add function.

Use the precompile command to view the results

// precompile
clang -E main.m

You can see that one effect of precompilation is that macros are replaced with their actual values.

If we had not removed the Foundation import, the contents of Foundation would also be expanded into the output at this stage; try it if you are interested. I will not spend space on it here.
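To make the macro replacement concrete, here is a minimal C++ sketch of that one job. It is a toy, not how clang's preprocessor is implemented: the function name expandMacros and the sample source line in the usage note are made up for illustration, and it only handles object-like macros (no #include, no function-like macros, no string literals).

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// True for characters that can appear inside an identifier or number.
bool isWordChar(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
}

// Toy stand-in for macro replacement: every identifier that names a macro
// in `macros` is replaced by its replacement text; everything else is
// copied through unchanged.
std::string expandMacros(const std::string &source,
                         const std::map<std::string, std::string> &macros) {
    std::string out;
    std::size_t i = 0;
    while (i < source.size()) {
        if (!isWordChar(source[i])) {
            out += source[i++];  // punctuation and whitespace pass through
            continue;
        }
        // Collect the whole word so that NUM inside NUMBER is not replaced.
        const std::size_t start = i;
        while (i < source.size() && isWordChar(source[i])) ++i;
        const std::string word = source.substr(start, i - start);
        const bool isIdentifier =
            !std::isdigit(static_cast<unsigned char>(word[0]));
        const auto it = macros.find(word);
        out += (isIdentifier && it != macros.end()) ? it->second : word;
    }
    return out;
}
```

With a macro table that maps NUM to 6, a hypothetical line such as int c = a + NUM; comes out as int c = a + 6;, which is the same kind of substitution clang -E shows.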

Xcode also provides a convenient entry point

Lexical analysis

The lexical analysis stage is the first stage of the compilation process. It is the process of converting sequences of characters into sequences of words (tokens). The task at this stage is to read the source program character by character from left to right, that is, to scan the stream of characters that make up the source program and identify words (also known as word symbols or symbols) according to word-formation rules. The lexical analyzer performs this task.

So let’s take a look at what tokens our simple add function breaks down into.

clang -fmodules -E -Xclang -dump-tokens main.m

As you can see, lexical analysis splits the precompiled code into individual tokens, for example:

  • int is classified as int
  • main is classified as identifier
  • ( is classified as l_paren and ) as r_paren
  • The macro NUM defined in the source code can no longer be found here; it has already been replaced by its actual value 6

Other symbols, such as ',', '+', '-', '=' and ';', each have a corresponding token kind as well.

Syntax analysis

Parsing is a logical phase of the compilation process. The task of grammar analysis is to combine sequences of words into various grammatical phrases, such as “program”, “statement”, “expression” and so on, based on lexical analysis. The parser determines whether the source program is structurally correct.

Again, let’s see what happens to our add function after parsing it.

clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

After parsing, every node in the output carries a type that describes it, for example:

  • the function declaration add is a FunctionDecl
  • the parameter a is a ParmVarDecl
  • the variable b is a VarDecl
  • the integer value 10 is an IntegerLiteral

The syntax checks we discuss later are performed at this step, and these declaration types are also what we use when implementing the plugin.

The output above also includes an error:
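As a rough illustration of how a parser turns a token sequence into typed tree nodes and rejects structurally invalid input, here is a toy recursive-descent parser in C++. It is only a sketch under simplifying assumptions: the grammar is just operands joined by '+', the Node and Parser types and the dump function are hypothetical, and the kind strings are borrowed from node names that appear in clang's AST dump.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

struct Node {
    std::string kind;   // "BinaryOperator", "IntegerLiteral" or "DeclRefExpr"
    std::string value;  // operator symbol, literal digits or identifier name
    std::unique_ptr<Node> lhs, rhs;
};

// Grammar: expression := operand ('+' operand)*
class Parser {
public:
    explicit Parser(std::string text) : text_(std::move(text)) {}

    std::unique_ptr<Node> parseExpression() {
        std::unique_ptr<Node> left = parseOperand();
        skipSpaces();
        while (pos_ < text_.size() && text_[pos_] == '+') {
            ++pos_;
            std::unique_ptr<Node> node(new Node());
            node->kind = "BinaryOperator";
            node->value = "+";
            node->lhs = std::move(left);
            node->rhs = parseOperand();
            left = std::move(node);
            skipSpaces();
        }
        // Anything left over means the input is not structurally valid.
        if (pos_ != text_.size()) throw std::runtime_error("unexpected character");
        return left;
    }

private:
    std::unique_ptr<Node> parseOperand() {
        skipSpaces();
        const std::size_t start = pos_;
        while (pos_ < text_.size() &&
               std::isalnum(static_cast<unsigned char>(text_[pos_]))) {
            ++pos_;
        }
        if (start == pos_) throw std::runtime_error("expected an operand");
        std::unique_ptr<Node> node(new Node());
        node->value = text_.substr(start, pos_ - start);
        const bool isNumber = std::isdigit(static_cast<unsigned char>(node->value[0]));
        node->kind = isNumber ? "IntegerLiteral" : "DeclRefExpr";
        return node;
    }

    void skipSpaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) {
            ++pos_;
        }
    }

    std::string text_;
    std::size_t pos_ = 0;
};

// Render the tree one node per line, indented by depth.
std::string dump(const Node &node, std::size_t depth = 0) {
    std::string out(depth * 2, ' ');
    out += node.kind + " " + node.value + "\n";
    if (node.lhs) out += dump(*node.lhs, depth + 1);
    if (node.rhs) out += dump(*node.rhs, depth + 1);
    return out;
}
```

Parsing a + 10 produces a BinaryOperator node with a DeclRefExpr and an IntegerLiteral child, while an incomplete input such as a + throws, which mirrors the parser's job of deciding whether the program is structurally correct.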

main.m:11:13: error: implicit declaration of function 'add' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
    int d = add(2);

Other stages

The remaining steps (backend, assembler, linker, bind-arch) are not the focus of this article, mainly because the author is still a beginner with them, so they are not covered here.

2. Create the Clang plugin

The previous article, Compiling your own LLVM and Clang, covered how to build LLVM and Clang yourself and how to generate the Xcode template for Clang, so we will not repeat that here.

Clang plugins live in the tools directory of the clang source code you downloaded.

/llvm-project/clang/tools

Create a new test-plugin1 folder in tools. Clang is written entirely in C++, so we need a new C++ file, TestPlugin1.cpp; and because we build with cmake, a CMakeLists file is also required.

The CMakeLists file declares which files the TestPlugin1 plugin contains and what kind of target it is. Older versions used add_llvm_loadable_module; because the two functions overlapped, it is now add_llvm_library with the MODULE keyword.

add_llvm_library(TestPlugin1 MODULE TestPlugin1.cpp)

Then add the test-plugin1 declaration to the CMakeLists file in the same directory as the test-plugin1 folder.

add_clang_subdirectory(test-plugin1)

Finally, regenerate the Xcode template, which is faster because it is compiled incrementally this time.

To summarize the process:

1. Create the test-plugin1 folder in the tools directory.
2. Add the CMakeLists and .cpp files to the test-plugin1 folder (if the plugin has several source files, add more .cpp files).
3. Add the test-plugin1 declaration to the CMakeLists file in the directory that contains the test-plugin1 folder.
4. Regenerate the Xcode template.

3. Configure Xcode

The previous article, Compiling your own LLVM and Clang, covered how to build your own clang. However, Xcode ships with its own default version of clang, and our projects do not use our build automatically, so we need to configure Xcode to make our own compiled clang take effect.

This time we want to simulate normal app development, so we create a new app project: TestApp.

Specify clang

By default, Xcode uses its built-in clang front end. The clang in newer versions of Xcode has too many symbols stripped, so in newer Xcode we need to add CC and CXX settings that point to our own clang.

If they are not specified, an error similar to "error: unable to load plugin" with "Symbol not found" appears.

Add CC and CXX to the build configuration, set to the absolute paths of clang and clang++ respectively.

Note: clang and clang++ are both produced by the LLVM build described in the previous article.

CC = /Volumes/ExDisk/LLVM/llvm/llvm_xcode/Debug/bin/clang
CXX = /Volumes/ExDisk/LLVM/llvm/llvm_xcode/Debug/bin/clang++

Disable Enable Index-While-Building Functionality

Index-While-Building was introduced by Apple to optimize code indexing and is enabled by default. With it on, Xcode builds the index during compilation, which slows compilation down. After turning it off, overall compilation is faster by about 80s (Xcode goes back to its earlier behaviour of building the code index during idle time). Because we use our own clang, which does not support compile-time indexing, leaving it enabled produces the following errors:

clang: error: unknown argument: '-index-store-path'
clang: error: cannot specify -o when generating multiple output files

So here we simply set it to No to turn it off.

Specify additional plug-ins that need to be loaded

Search the build settings for "other c" to find the Other C Flags setting quickly.

Add the following

-Xclang -load -Xclang <plugin path (the .dylib path)> -Xclang -add-plugin -Xclang <plugin name>
// example
-Xclang -load -Xclang /Volumes/ExDisk/LLVM/llvm/llvm_xcode/Debug/lib/TestPlugin1.dylib -Xclang -add-plugin -Xclang TestPlugin

Note: Xcode caches the plugin, so after you recompile it Xcode may still use the old build (a TestPlugin1 without your latest changes). I do not know how to clear this cache (Clean does not help), so I use the following workaround: 1. change the plugin path to a wrong path and press Cmd+B again; 2. change it back to the correct path, and the cache is cleared.

4. Write plug-in code

The code is the easiest part: it only takes a little syntax and a few specific APIs. I will not narrate much here; the code is commented, so let's go straight to it.

#include <cstdio>
#include <iostream>
#include "clang/AST/AST.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendPluginRegistry.h"

using namespace clang;
using namespace std;
using namespace llvm;
using namespace clang::ast_matchers;

namespace TestPlugin {

class TestHandler : public MatchFinder::MatchCallback {
private:
    CompilerInstance &ci;

public:
    TestHandler(CompilerInstance &ci) : ci(ci) {}

    // Check whether the file is a user source file
    bool isUserSourceCode(const string filename) {
        // The file name must not be empty
        if (filename.empty()) return false;
        // Files inside the Xcode.app bundle are system code, not user code
        if (filename.find("/Applications/Xcode.app/") == 0) return false;
        return true;
    }

    void run(const MatchFinder::MatchResult &Result) {
        // Check class names (interfaces): they must not contain an underscore
        if (const ObjCInterfaceDecl *decl = Result.Nodes.getNodeAs<ObjCInterfaceDecl>("ObjCInterfaceDecl")) {
            string filename = ci.getSourceManager().getFilename(decl->getSourceRange().getBegin()).str();
            if (!isUserSourceCode(filename)) return;
            size_t pos = decl->getName().find('_');
            if (pos != StringRef::npos) {
                DiagnosticsEngine &D = ci.getDiagnostics();
                // Point the warning at the underscore itself
                SourceLocation loc = decl->getLocation().getLocWithOffset(pos);
                D.Report(loc, D.getCustomDiagID(DiagnosticsEngine::Warning, "TestPlugin: class names must not contain underscores"));
            }
        }

        // Check variable names: an underscore is only tolerated as the first character
        if (const VarDecl *decl = Result.Nodes.getNodeAs<VarDecl>("VarDecl")) {
            string filename = ci.getSourceManager().getFilename(decl->getSourceRange().getBegin()).str();
            if (!isUserSourceCode(filename)) return;
            size_t pos = decl->getName().find('_');
            if (pos != StringRef::npos && pos != 0) {
                DiagnosticsEngine &D = ci.getDiagnostics();
                SourceLocation loc = decl->getLocation().getLocWithOffset(pos);
                D.Report(loc, D.getCustomDiagID(DiagnosticsEngine::Warning, "TestPlugin2: please use camel case naming; underscores are not recommended"));
            }
        }
    }
};

class TestASTConsumer : public ASTConsumer {
private:
    MatchFinder matcher;
    TestHandler handler;

public:
    TestASTConsumer(CompilerInstance &ci) : handler(ci) {
        // Register the node types we want the handler to be called for
        matcher.addMatcher(objcInterfaceDecl().bind("ObjCInterfaceDecl"), &handler);
        matcher.addMatcher(varDecl().bind("VarDecl"), &handler);
        matcher.addMatcher(objcMethodDecl().bind("ObjCMethodDecl"), &handler);
    }

    // Called once the whole translation unit has been parsed
    void HandleTranslationUnit(ASTContext &Ctx) {
        printf("TestPlugin1: all ASTs have been parsed.");
        DiagnosticsEngine &D = Ctx.getDiagnostics();
        // These can be seen in the build log
        D.Report(D.getCustomDiagID(DiagnosticsEngine::Warning, "TestPlugin Warning"));
        D.Report(D.getCustomDiagID(DiagnosticsEngine::Error, "TestPlugin Error message"));
        matcher.matchAST(Ctx);
    }
};

// Subclass PluginASTAction to implement our custom action
class TestAction : public PluginASTAction {
public:
    unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI, StringRef InFile) {
        return unique_ptr<TestASTConsumer>(new TestASTConsumer(CI));
    }

    bool ParseArgs(const CompilerInstance &CI, const std::vector<std::string> &arg) {
        return true;
    }
};

}

// Tell clang that a new plugin has been registered
static FrontendPluginRegistry::Add<TestPlugin::TestAction> X("TestPlugin", "Test a new Plugin");

The rest of the code is your own logic. For example, the core of the checks above is simply getName followed by find('_').
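Because that core logic is plain string handling, it can be tried outside clang. The following is a minimal sketch that reproduces the two checks with std::string only; the NameKind enum and the findNamingViolation function are hypothetical names introduced here, and the file paths in the usage note are examples, not paths from the demo project.

```cpp
#include <cstddef>
#include <string>

// Same rule as the plugin: empty names and anything under the Xcode.app
// bundle are not treated as user source code.
bool isUserSourceCode(const std::string &filename) {
    if (filename.empty()) return false;
    if (filename.find("/Applications/Xcode.app/") == 0) return false;
    return true;
}

enum class NameKind { ClassName, VariableName };

// Returns the offset of the first underscore if the name breaks the rule,
// or std::string::npos if the name is acceptable.
std::size_t findNamingViolation(const std::string &name, NameKind kind) {
    const std::size_t pos = name.find('_');
    if (pos == std::string::npos) return std::string::npos;
    // Like the plugin's `pos != 0` test, a variable whose first underscore
    // is the leading character (for example _name) is not reported.
    if (kind == NameKind::VariableName && pos == 0) return std::string::npos;
    return pos;
}
```

For example, findNamingViolation("My_Class", NameKind::ClassName) returns offset 2, which is the value the plugin passes to getLocWithOffset so that the warning points at the underscore. A path such as /Applications/Xcode.app/Contents/x.h is skipped by isUserSourceCode, while a project path such as /Users/me/TestApp/main.m is checked.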

5. Summary

This time we wrote a clang plugin that checks code; next time, could we try code obfuscation?