This article explores how LLVM works, how our code is translated step by step into machine code, and which steps we can hook into to add or change functionality.
Finally, we will hand-build a clang plugin that you can play with at will.
Download demo: demo
1. Compilation process
Before writing the Clang plug-in, we need to understand what clang does when compiling a project.
There is no need to read a thick compiler textbook: iOS developers already have clang on their Macs, so we can use clang commands to watch part of the front-end process.
For a clear view of the compilation process, we will create a new command-line project called Testclang that has no messy dependencies.
Overview of compilation process
Use the system clang to view the compilation phases:
clang -ccc-print-phases main.m
0: input, "main.m", objective-c // source input
1: preprocessor, {0}, objective-c-cpp-output // preprocessing
2: compiler, {1}, ir // compile to IR
3: backend, {2}, assembler // back end, generates assembly
4: assembler, {3}, object // assemble into an object file
5: linker, {4}, image // link into an image
6: bind-arch, "x86_64", {5}, image // adapt to the architecture of each platform
Precompilation
To see more clearly what precompilation (the preprocessing stage) does to our own code, we delete the only header import, Foundation, and then add a simple add function.
Use the preprocess command to view the result:
// precompile
clang -E main.m
You can see that one of the effects of precompilation is to replace the macro definition with the real value.
If we had not deleted the Foundation import, the contents of the Foundation headers would also be expanded into the result at this stage. Try it if you are interested; I will not take up space with it here.
Xcode also provides a convenient entry point for this.
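To make the macro replacement described above concrete, here is a minimal sketch. The article's actual main.m is not shown, so the macro value, the body of add, and the helper compute are all hypothetical stand-ins chosen for illustration (the code is plain C-style C++).

```cpp
// Hypothetical stand-in for the article's test file; the real main.m is not shown.
#define NUM 6  // object-like macro: the preprocessor substitutes 6 wherever NUM appears

// A simple add function, assumed here to take two ints.
int add(int a, int b) {
    return a + b;
}

// After preprocessing, the call below reads add(b, 6): the name NUM is gone.
int compute() {
    int b = 10;
    return add(b, NUM);
}
```

Running a file like this through clang -E should show the macro name replaced by its value and the comments stripped, which is the effect the text describes.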
Lexical analysis
The lexical analysis stage is the first stage of the compilation process. It is the process of converting sequences of characters into sequences of words (tokens). The task at this stage is to read the source program character by character from left to right, that is, to scan the stream of characters that make up the source program and identify words (also known as word symbols or symbols) according to word-formation rules. The lexical analyzer performs this task.
So let’s take a look at what tokens our simple add function breaks down into.
clang -fmodules -E -Xclang -dump-tokens main.m
As you can see, lexical analysis splits the precompiled code into individual tokens, for example:
- int is recognized as the token int
- main is recognized as identifier
- ( is recognized as l_paren
- ) is recognized as r_paren
- The macro NUM from the source code can no longer be found here; only its real value, 6, remains.
- Other symbols, such as , + - = and ; each have their own corresponding token as well.
Syntax analysis
Parsing is a logical phase of the compilation process. The task of grammar analysis is to combine sequences of words into various grammatical phrases, such as “program”, “statement”, “expression” and so on, based on lexical analysis. The parser determines whether the source program is structurally correct.
Again, let’s see what happens to our add function after parsing it.
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
After parsing, every node in the output carries a descriptive type, for example:
- Function declaration: FunctionDecl (add)
- Parameter declaration: ParmVarDecl (a)
- Variable declaration: VarDecl (b)
- Integer literal: IntegerLiteral (10)
The syntax checks we discuss later are implemented at this step, and these node types are what we will use when we implement the plug-in.
The output above also reports an error:
main.m:11:13: error: implicit declaration of function 'add' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
int d = add(2);
Other phases
The remaining steps (backend, assembler, linker, bind-arch) are not the focus of this article, mainly because the author is also just getting started with them, so they are not covered here.
2. Create the Clang plugin
The previous article, "Compiling your own LLVM and Clang", covered how to build LLVM and Clang and how to generate the Xcode template for Clang, so we will not repeat that here.
Clang plug-ins can be stored in the tools directory of the clang source tree you downloaded:
/llvm-project/clang/tools
Create a new test-plugin1 folder under tools. Because clang is written entirely in C++, we naturally need a new C++ file, TestPlugin1.cpp. And because we build with cmake, a CMakeLists.txt file is indispensable.
CMakeLists.txt declares which files the TestPlugin1 plug-in contains and what type of target it is. This used to be done with add_llvm_loadable_module; because of duplicated functionality, it is now add_llvm_library with the MODULE keyword:
add_llvm_library(TestPlugin1 MODULE TestPlugin1.cpp)
Then register test-plugin1 in the CMakeLists.txt that sits in the same directory as the test-plugin1 folder:
add_clang_subdirectory(test-plugin1)
Finally, regenerate the Xcode template, which is faster because it is compiled incrementally this time.
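To summarize the process:
1. Create the test-plugin1 folder under tools.
2. Add CMakeLists.txt and the .cpp file to the test-plugin1 folder (if there is more than one source file, list all of them).
3. Add the test-plugin1 declaration to the CMakeLists.txt in the same directory as the test-plugin1 folder.
4. Regenerate the Xcode template.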
3. Tune Xcode
The previous article, "Compiling your own LLVM and Clang", covered how to compile your own Clang. However, Xcode ships with its own default Clang, so our projects will not pick up the self-built one automatically; we need to configure Xcode to use it.
This time we want to simulate normal app development, so we create a new app project: TestApp.
Specify clang
By default, Xcode uses its built-in clang front end. Newer versions of Xcode ship a clang with too many symbols stripped, so in a new Xcode we need to add the CC and CXX settings to point to our own clang.
If they are not specified, errors similar to the following appear:
error: unable to load plugin
Symbol not found
Add CC and CXX to the project configuration, set to the absolute paths of clang and clang++ respectively.
Note: clang and clang++ are both produced by the LLVM build described in the previous article.
CC = /Volumes/ExDisk/LLVM/llvm/llvm_xcode/Debug/bin/clang
CXX = /Volumes/ExDisk/LLVM/llvm/llvm_xcode/Debug/bin/clang++
Disable Enable Index-While-Building Functionality
Index-While-Building was introduced by Apple to optimize code indexing and is enabled by default. With it on, Xcode builds the index during compilation, which slows the build down. After turning it off, overall compilation becomes faster, by about 80s (Xcode goes back to the previous approach and builds the code index in idle time). Because we are using our own clang, which does not support compile-time indexing, the following errors are reported:
clang: error: unknown argument: '-index-store-path'
clang: error: cannot specify -o when generating multiple output files
So here we simply set it to No to turn it off.
Specify additional plug-ins that need to be loaded
Search for other c in the project configuration to quickly find Other C Flags, and add the following:
-Xclang -load -Xclang <plug-in path (dylib path)> -Xclang -add-plugin -Xclang <plug-in name>
// example
-Xclang -load -Xclang /Volumes/ExDisk/LLVM/llvm/llvm_xcode/Debug/lib/TestPlugin1.dylib -Xclang -add-plugin -Xclang TestPlugin
Note: because Xcode caches the plug-in, it may keep using the old version after you recompile it (one without your latest TestPlugin1 changes). Since I do not know how to clear this cache (Clean does not work), I use the following workaround: 1, change the plug-in path to a wrong path and press Cmd+B again; 2, change it back to the correct path, and the cache is cleared.
4. Write plug-in code
The code is the easiest part: just a little syntax and a few specific APIs. I will not narrate much here; the code has comments, so let's go straight to it.
#include <cstdio>
#include <iostream>
#include "clang/AST/AST.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendPluginRegistry.h"

using namespace clang;
using namespace std;
using namespace llvm;
using namespace clang::ast_matchers;

namespace TestPlugin {

class TestHandler : public MatchFinder::MatchCallback {
private:
    CompilerInstance &ci;

public:
    TestHandler(CompilerInstance &ci) : ci(ci) {}

    // Check whether the file is user source code
    bool isUserSourceCode(const string &filename) {
        // The file name must not be empty
        if (filename.empty()) return false;
        // Code inside Xcode.app is not user source code
        if (filename.find("/Applications/Xcode.app/") == 0) return false;
        return true;
    }

    void run(const MatchFinder::MatchResult &Result) {
        // Check class names (interfaces): underscores are not allowed
        if (const ObjCInterfaceDecl *decl = Result.Nodes.getNodeAs<ObjCInterfaceDecl>("ObjCInterfaceDecl")) {
            string filename = ci.getSourceManager().getFilename(decl->getSourceRange().getBegin()).str();
            if (!isUserSourceCode(filename)) return;
            size_t pos = decl->getName().find('_');
            if (pos != StringRef::npos) {
                DiagnosticsEngine &D = ci.getDiagnostics();
                // Point the warning at the underscore itself
                SourceLocation loc = decl->getLocation().getLocWithOffset(pos);
                D.Report(loc, D.getCustomDiagID(DiagnosticsEngine::Warning, "TestPlugin: class names must not contain underscores"));
            }
        }

        // Check variable names: underscores are not allowed (a leading underscore is tolerated)
        if (const VarDecl *decl = Result.Nodes.getNodeAs<VarDecl>("VarDecl")) {
            string filename = ci.getSourceManager().getFilename(decl->getSourceRange().getBegin()).str();
            if (!isUserSourceCode(filename)) return;
            size_t pos = decl->getName().find('_');
            if (pos != StringRef::npos && pos != 0) {
                DiagnosticsEngine &D = ci.getDiagnostics();
                SourceLocation loc = decl->getLocation().getLocWithOffset(pos);
                D.Report(loc, D.getCustomDiagID(DiagnosticsEngine::Warning, "TestPlugin2: please use camel case; underscores are not recommended"));
            }
        }
    }
};

class TestASTConsumer : public ASTConsumer {
private:
    MatchFinder matcher;
    TestHandler handler;

public:
    TestASTConsumer(CompilerInstance &ci) : handler(ci) {
        matcher.addMatcher(objcInterfaceDecl().bind("ObjCInterfaceDecl"), &handler);
        matcher.addMatcher(varDecl().bind("VarDecl"), &handler);
        matcher.addMatcher(objcMethodDecl().bind("ObjCMethodDecl"), &handler);
    }

    void HandleTranslationUnit(ASTContext &Ctx) {
        printf("TestPlugin1: All ASTs have been parsed.\n");
        DiagnosticsEngine &D = Ctx.getDiagnostics();
        // These two messages are visible in the build log
        D.Report(D.getCustomDiagID(DiagnosticsEngine::Warning, "TestPlugin Warning"));
        D.Report(D.getCustomDiagID(DiagnosticsEngine::Error, "TestPlugin Error message"));
        matcher.matchAST(Ctx);
    }
};

class TestAction : public PluginASTAction {
public:
    unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI, StringRef InFile) {
        return unique_ptr<TestASTConsumer>(new TestASTConsumer(CI));
    }

    bool ParseArgs(const CompilerInstance &CI, const std::vector<std::string> &arg) {
        return true;
    }
};

} // namespace TestPlugin

// Tell clang that a new plugin has been registered
static FrontendPluginRegistry::Add<TestPlugin::TestAction> X("TestPlugin", "Test a new Plugin");
Each part of the code carries its own logic; for example, the core of the checks above is getName followed by find('_').
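The getName-then-find('_') logic can be tried out without building clang. The sketch below swaps llvm::StringRef for std::string and keeps only the string handling; the helper firstReportedUnderscore and its allowLeading parameter are hypothetical names introduced for illustration, and isUserSourceCode is a standalone copy of the plug-in's file filter under the same assumption about the Xcode install path.

```cpp
#include <cstddef>
#include <string>

// Standalone copy of the plug-in's file filter: headers that live inside
// Xcode.app are treated as system code and skipped.
bool isUserSourceCode(const std::string &filename) {
    if (filename.empty()) return false;
    // rfind(prefix, 0) == 0 is a "starts with" test
    if (filename.rfind("/Applications/Xcode.app/", 0) == 0) return false;
    return true;
}

// Hypothetical helper: returns the offset of the underscore the plug-in would
// report, or std::string::npos if the name passes the check.
// allowLeading mirrors the extra `pos != 0` condition in the variable check:
// if the FIRST underscore is at offset 0, nothing is reported at all.
std::size_t firstReportedUnderscore(const std::string &name, bool allowLeading) {
    const std::size_t pos = name.find('_');
    if (pos == std::string::npos) return std::string::npos;
    if (allowLeading && pos == 0) return std::string::npos;
    return pos;
}
```

In the plug-in, the returned offset is what gets passed to getLocWithOffset so that the warning points at the underscore rather than at the start of the name. Note that because only the first underscore is inspected, a name such as _my_var passes the variable check.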
5. Summary
This time we wrote a clang plugin that checks code; next time, could we try our hand at code obfuscation?