The code in this article is hosted on GitHub: Github.com/L-Zephyr/cl…

In day-to-day development we often need to read other people’s code. Starting on a completely unfamiliar codebase is usually troublesome: you first have to find the entry point and then trace the logic from there. A source file usually contains many method calls, so this stage tends to be painful, because a lot of time goes into working out how the methods relate to each other before a picture of the logic forms in your mind. If the call graph could be generated automatically from the source code, reading it would be much easier.

We need a tool that automatically generates method call graphs from source code. Such a tool must be able to understand and analyze our code, and nothing understands code better than the compiler. The front end used to compile Objective-C code is Clang, which provides a set of libraries for analyzing source code, so we can build our own tool on top of Clang. Before getting there, a few concepts:

Abstract syntax tree

An Abstract Syntax Tree (AST) is a tree representation of the syntactic structure of source code, in which each node represents a construct in the source. The AST plays a very important role in compilation: Clang parses the input source code into an AST and then generates LLVM IR (intermediate code) from it.

We can use clang-check to look at the AST. First create a source file called test.c:

int square(int num) {
	return num * num;
}

int main() {
	int result = square(2);
}

Run the command clang-check -ast-dump test.c in the terminal to view the resulting AST:

|-FunctionDecl 0x7fa933840e00 </Users/lzephyr/Desktop/test.c:1:1, line:3:1> line:1:5 used square 'int (int)'
| |-ParmVarDecl 0x7fa93302f720 <col:12, col:16> col:16 used num 'int'
| `-CompoundStmt 0x7fa933840fa0 <col:21, line:3:1>
|   `-ReturnStmt 0x7fa933840f88 <line:2:2, col:15>
|     `-BinaryOperator 0x7fa933840f60 <col:9, col:15> 'int' '*'
|       |-ImplicitCastExpr 0x7fa933840f30 <col:9> 'int' <LValueToRValue>
|       | `-DeclRefExpr 0x7fa933840ee0 <col:9> 'int' lvalue ParmVar 0x7fa93302f720 'num' 'int'
|       `-ImplicitCastExpr 0x7fa933840f48 <col:15> 'int' <LValueToRValue>
|         `-DeclRefExpr 0x7fa933840f08 <col:15> 'int' lvalue ParmVar 0x7fa93302f720 'num' 'int'
`-FunctionDecl 0x7fa933841010 <line:5:1, line:7:1> line:5:5 main 'int ()'
  `-CompoundStmt 0x7fa9338411f8 <col:12, line:7:1>
    `-DeclStmt 0x7fa9338411e0 <line:6:2, col:24>
      `-VarDecl 0x7fa9338410c0 <col:2, col:23> col:6 result 'int' cinit
        `-CallExpr 0x7fa9338411b0 <col:15, col:23> 'int'
          |-ImplicitCastExpr 0x7fa933841198 <col:15> 'int (*)(int)' <FunctionToPointerDecay>
          | `-DeclRefExpr 0x7fa933841120 <col:15> 'int (int)' Function 0x7fa933840e00 'square' 'int (int)'
          `-IntegerLiteral 0x7fa933841148 <col:22> 'int' 2
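
To make the tree shape concrete, here is a minimal sketch that rebuilds the ReturnStmt subtree of square by hand and prints it with indentation. AstNode, makeNode, dump and buildReturnSquare are hypothetical names for this illustration only. They are not Clang types, and the ImplicitCastExpr nodes from the real dump are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node: a label plus child nodes.
// Clang's real AST classes (Decl, Stmt, Expr, ...) are far richer.
struct AstNode {
    std::string label;
    std::vector<std::unique_ptr<AstNode>> children;
};

std::unique_ptr<AstNode> makeNode(const std::string &label) {
    std::unique_ptr<AstNode> node(new AstNode);
    node->label = label;
    return node;
}

// Render the tree one node per line, indenting two spaces per depth level.
std::string dump(const AstNode &node, int depth = 0) {
    assert(depth >= 0);
    std::string out(static_cast<std::size_t>(depth) * 2, ' ');
    out += node.label + "\n";
    for (const auto &child : node.children)
        out += dump(*child, depth + 1);
    return out;
}

// Build the subtree for `return num * num;`.
std::unique_ptr<AstNode> buildReturnSquare() {
    std::unique_ptr<AstNode> mul = makeNode("BinaryOperator '*'");
    mul->children.push_back(makeNode("DeclRefExpr 'num'"));
    mul->children.push_back(makeNode("DeclRefExpr 'num'"));
    std::unique_ptr<AstNode> ret = makeNode("ReturnStmt");
    ret->children.push_back(std::move(mul));
    return ret;
}
```

Calling dump(*buildReturnSquare()) yields four lines: ReturnStmt at the root, the BinaryOperator one level below it, and the two DeclRefExpr operands one level below that.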

LibTooling is a library that provides access to, and modification of, the AST. It can be used to write standalone programs, such as the clang-check used above, and it provides a number of convenient ways to access the syntax tree.

A Clang Plugin is similar to LibTooling in that it has full control over the AST. The difference is that a Clang Plugin is injected into the compilation process as a plug-in and can be embedded in Xcode. In fact, a standalone tool written with LibTooling can be converted into a Clang Plugin with only minor changes.

To get the call relationships between functions, we must analyze the AST. Clang provides two methods: ASTMatchers and RecursiveASTVisitor.

### ASTMatchers

ASTMatchers provides a set of functions for writing match expressions in a DSL style. They find the nodes we are interested in, and the bind method binds a matched node to a given name:

StatementMatcher matcher = callExpr(hasAncestor(functionDecl().bind("caller")), 
                                    callee(functionDecl().bind("callee")));

The above expression matches calls to ordinary C functions in the source code, binding the calling function to the string “caller” and the called function to the string “callee”. The FunctionDecl objects can then be retrieved in the callback method under the names caller and callee:

#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
#include "llvm/Support/CommandLine.h"

using namespace clang;
using namespace clang::ast_matchers;
using namespace clang::tooling;

// Option category required by CommonOptionsParser
static llvm::cl::OptionCategory MyToolCategory("my-tool options");

class FindFuncCall : public MatchFinder::MatchCallback {
public:
    virtual void run(const MatchFinder::MatchResult &Result) {
        // Get the caller's function definition
        if (const FunctionDecl *caller = Result.Nodes.getNodeAs<clang::FunctionDecl>("caller")) {
            caller->dump();
        }
        // Get the callee's function definition
        if (const FunctionDecl *callee = Result.Nodes.getNodeAs<clang::FunctionDecl>("callee")) {
            callee->dump();
        }
    }
};

int main(int argc, const char **argv) {
    StatementMatcher matcher = callExpr(hasAncestor(functionDecl().bind("caller")),
                                        callee(functionDecl().bind("callee")));
    MatchFinder finder;
    FindFuncCall callback;
    finder.addMatcher(matcher, &callback);

    // Run the matcher over the input files
    CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
    ClangTool Tool(OptionsParser.getCompilations(), OptionsParser.getSourcePathList());
    Tool.run(newFrontendActionFactory(&finder).get());
    return 0;
}

Each function in the above matching expression (such as callExpr) is called a Matcher, and all matchers can be grouped into three categories:

  • Node Matchers: The core of a match expression, used to match all nodes of a specific type. Every match expression begins with a Node Matcher, and the bind method can only be called on a Node Matcher. A Node Matcher can take any number of arguments: other matchers that are passed in to further constrain the matched node. Note that all matchers passed in as arguments apply to the same matched node. For example:
    DeclarationMatcher matcher = recordDecl(cxxRecordDecl().bind("class"),
                                            hasName("MyClass"));

    This matcher finds a C++ class named “MyClass”. recordDecl is a Node Matcher that matches all class, struct, and union definitions; hasName restricts the match to nodes named “MyClass”; cxxRecordDecl matches C++ class definitions and binds the node to the string “class”.

  • Narrowing Matchers: As the name suggests, these matchers add a condition that narrows the match. In the second example, hasName is a Narrowing Matcher: it matches only nodes whose name is “MyClass”.
  • Traversal Matchers: Use the currently matched node as the starting point and move the match to other nodes reachable from it. In the first example, hasAncestor continues matching among the ancestors of the current node.
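
The three categories can be pictured with a toy combinator DSL. The sketch below is only a simplified model written with the standard library: ToyNode, Matcher, nodeOfKind and the toy hasName and hasAncestor are hypothetical stand-ins and do not reflect how Clang implements its matchers.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy node; `parent` lets a traversal matcher walk upwards.
struct ToyNode {
    std::string kind;
    std::string name;
    const ToyNode *parent;
};

using Matcher = std::function<bool(const ToyNode &)>;

// "Node matcher": matches nodes of one kind and requires every inner
// matcher to hold on that same node.
Matcher nodeOfKind(const std::string &kind, std::vector<Matcher> inner = {}) {
    return [kind, inner](const ToyNode &n) {
        if (n.kind != kind) return false;
        for (const Matcher &m : inner)
            if (!m(n)) return false;
        return true;
    };
}

// "Narrowing matcher": a plain predicate on the current node.
Matcher hasName(const std::string &name) {
    return [name](const ToyNode &n) { return n.name == name; };
}

// "Traversal matcher": applies `inner` to the ancestors of the current node.
Matcher hasAncestor(Matcher inner) {
    assert(inner); // an empty std::function cannot be applied
    return [inner](const ToyNode &n) {
        for (const ToyNode *p = n.parent; p != nullptr; p = p->parent)
            if (inner(*p)) return true;
        return false;
    };
}
```

With a CallExpr node nested inside a FunctionDecl named main, the expression nodeOfKind("CallExpr", {hasAncestor(nodeOfKind("FunctionDecl", {hasName("main")}))}) accepts the call node and rejects every other node, which mirrors the shape of the first matcher in this section.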

RecursiveASTVisitor is another way of accessing the AST that Clang provides. It is simple to use: define three classes that inherit from ASTFrontendAction, ASTConsumer, and RecursiveASTVisitor respectively. The custom MyFrontendAction returns an instance of the custom MyConsumer:

class MyFrontendAction : public clang::ASTFrontendAction {
public:
    virtual std::unique_ptr<clang::ASTConsumer> CreateASTConsumer(
        clang::CompilerInstance &Compiler, llvm::StringRef InFile) {
        return std::unique_ptr<clang::ASTConsumer>(new MyConsumer);
    }
};

After the AST has been parsed, MyConsumer’s HandleTranslationUnit method is called. TranslationUnitDecl is the root node of the AST, and ASTContext holds all the AST information. We get the TranslationUnitDecl from the context and hand it to MyVisitor, where most of our work is done:

class MyConsumer : public clang::ASTConsumer {
public:
    virtual void HandleTranslationUnit(clang::ASTContext &Context) {
      Visitor.TraverseDecl(Context.getTranslationUnitDecl());
    }
private:
    MyVisitor Visitor;
};

To visit the nodes of interest in MyVisitor, simply override the Visit method for that node type. For example, to visit every C++ class definition in the code, override VisitCXXRecordDecl:

class MyVisitor : public RecursiveASTVisitor<MyVisitor> {
public:
    bool VisitCXXRecordDecl(CXXRecordDecl *decl) {
        decl->dump();
        return true; // Return true to continue traversal, false to stop
    }
};
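
The visitor relies on the curiously recurring template pattern: the base class is parameterized on the derived class so that it can call the derived Visit method without virtual dispatch. The following is a simplified, standard-library-only model of that idea and of the "return false to stop" rule. ToyDecl, ToyRecursiveVisitor, NameCollector and collectNames are hypothetical names, and the real RecursiveASTVisitor is considerably more involved.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy declaration node (not a Clang type).
struct ToyDecl {
    std::string name;
    std::vector<ToyDecl> children;
};

// The base class walks the tree in pre-order and calls Derived::VisitDecl
// on every node. A false return value aborts the whole walk.
template <typename Derived>
class ToyRecursiveVisitor {
public:
    bool TraverseDecl(const ToyDecl &decl) {
        if (!static_cast<Derived *>(this)->VisitDecl(decl))
            return false;
        for (const ToyDecl &child : decl.children)
            if (!TraverseDecl(child))
                return false;
        return true;
    }
    // Default hook: do nothing and keep going.
    bool VisitDecl(const ToyDecl &) { return true; }
};

// Collects names in pre-order and stops right after seeing `stopAt`.
class NameCollector : public ToyRecursiveVisitor<NameCollector> {
public:
    explicit NameCollector(std::string stopAt) : stopAt_(std::move(stopAt)) {}
    bool VisitDecl(const ToyDecl &decl) {
        names.push_back(decl.name);
        return decl.name != stopAt_;
    }
    std::vector<std::string> names;
private:
    std::string stopAt_;
};

// Convenience wrapper around the visitor.
std::vector<std::string> collectNames(const ToyDecl &root, const std::string &stopAt) {
    NameCollector collector(stopAt);
    bool finished = collector.TraverseDecl(root);
    // The walk is cut short exactly when the last name seen is `stopAt`.
    assert(finished == (collector.names.back() != stopAt));
    return collector.names;
}
```

For a root with children a, b and c, collectNames(root, "b") visits root, a and b and then stops, so c is never reached.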

Then create the ToolAction using newFrontendActionFactory in the main function:

Tool.run(newFrontendActionFactory<MyFrontendAction>().get());

The Analysis folder of the Clang source code provides a class named CallGraph, and I wrote my own call graph tool with reference to that implementation. The core consists of three classes: CallGraph, CallGraphNode and CGBuilder:

  • CallGraph: inherits from RecursiveASTVisitor and implements the VisitFunctionDecl and VisitObjCMethodDecl methods to iterate over all C functions and Objective-C methods:
    bool VisitObjCMethodDecl(ObjCMethodDecl *MD) {
        if (isInSystem(MD)) { // Ignore the definition in the system library
            return true;
        }
    
        if (canBeCallerInGraph(MD)) {
            addRootNode(MD); // Add a Node to Roots
        }
        return true;
    }

    addRootNode wraps the declaration in a CallGraphNode object and stores it in Roots, a member of map type. It then gets the function body (of type CompoundStmt) and passes it to CGBuilder, which finds the methods called in the body.

    void CallGraph::addRootNode(Decl *decl) {
      CallGraphNode *Node = getOrInsertNode(decl); // Encapsulate decl as Node and add it to Roots
      
      // Initializes CGBuilder to iterate over all method calls in the function
      CGBuilder builder(this, Node, Context);
      if (Stmt *Body = decl->getBody())
          builder.Visit(Body);
    }
  • CallGraphNode: wraps a Decl (the definition of a C function or an Objective-C method), which represents an AST node. Every function called by that function is added to the vector member CalledFunctions.
    class CallGraphNode {
    private:
        // C function or OC method definition
        Decl *decl;
        // Save all nodes called by decl
        SmallVector<CallGraphNode*, 5> CalledFunctions;
        ...
    };
  • CGBuilder: inherits from StmtVisitor. It receives a CallerNode on initialization, traverses the body of the corresponding function, and finds the calls in it: CallExpr and ObjCMessageExpr. CallExpr represents an ordinary C function call, and ObjCMessageExpr represents an Objective-C method call. It gets the definition of the called function, wraps it in a CallGraphNode, and adds it to the CallerNode’s CalledFunctions.
    class CGBuilder : public StmtVisitor<CGBuilder> {
      CallGraph *G;
      CallGraphNode *CallerNode;
      ASTContext &Context;
    public:
      void VisitObjCMessageExpr(ObjCMessageExpr *ME) {
          // Get the Decl of the called method from ObjCMessageExpr
          Decl *decl = ...
          
          // Wrap decl in a CallGraphNode and add it to CallerNode's CalledFunctions
          addCalledDecl(decl);
      }
      ...
    };
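
The data structure behind these three classes can be sketched without Clang at all. In the toy version below, functions are identified by a std::string instead of a Decl pointer, and ToyCallGraph, ToyCallGraphNode and addCall are hypothetical names. Ignoring duplicate edges is a choice made for this sketch, not something the article states about the real tool.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for CallGraphNode: a function name plus the nodes it calls.
// The real class wraps a clang Decl*; a std::string is used here instead.
struct ToyCallGraphNode {
    std::string name;
    std::vector<ToyCallGraphNode *> calledFunctions;
};

class ToyCallGraph {
public:
    // Return the node for `name`, creating it on first use.
    ToyCallGraphNode *getOrInsertNode(const std::string &name) {
        std::unique_ptr<ToyCallGraphNode> &slot = nodes_[name];
        if (!slot) {
            slot.reset(new ToyCallGraphNode);
            slot->name = name;
        }
        return slot.get();
    }

    // Record that `caller` calls `callee`; repeated edges are ignored.
    void addCall(const std::string &caller, const std::string &callee) {
        assert(!caller.empty() && !callee.empty());
        ToyCallGraphNode *from = getOrInsertNode(caller);
        ToyCallGraphNode *to = getOrInsertNode(callee);
        for (ToyCallGraphNode *existing : from->calledFunctions)
            if (existing == to) return;
        from->calledFunctions.push_back(to);
    }

    std::size_t size() const { return nodes_.size(); }

private:
    // Owns every node, keyed by function name.
    std::map<std::string, std::unique_ptr<ToyCallGraphNode>> nodes_;
};
```

Recording the calls main to square twice and main to printf once leaves three nodes in the graph, with two entries in main's calledFunctions.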

So far only a basic version has been implemented. It supports C and Objective-C and provides the most basic functionality; the code is relatively simple, and I will keep optimizing it and adding features. All the code is hosted on GitHub: https://github.com/L-Zephyr/clang-mapper

## Usage

You can download and compile the source code yourself, or use the precompiled clang-mapper binary in the Release folder (compiled with Clang 5.0.0). Graphviz is used to generate the call graphs, so make sure Graphviz is installed correctly before running the tool.

A tool written with LibTooling has to be compiled inside the Clang source tree:

  1. Start by downloading the source code for LLVM and Clang.

  2. Copy the clang-mapper folder to llvm/tools/clang/tools/.

  3. Edit the file llvm/tools/clang/tools/CMakeLists.txt and append add_clang_subdirectory(clang-mapper) at the end.

  4. An out-of-source build is recommended. Create a build folder in the directory that contains the llvm folder, and compile the source code from the build directory:

    $ mkdir build
    $ cd build
    $ cmake -G 'Unix Makefiles' ../llvm
    $ make

    You can also use Ninja, as described in the documentation. The build generates more than 20 GB of intermediate files. When it finishes, you will find the clang-mapper binary in the build/bin/ directory; copy it to /usr/local/bin.

### Basic usage

Pass in any number of files or folders. clang-mapper automatically processes all of them and generates the function call graphs in the current working directory, named after the source files. Below, we use clang-mapper to analyze the core code of the well-known AFNetworking. I didn’t want to mix the analysis results with the source files, so I created a folder called CallGraph and ran the tool from there:

$ cd ./AFNetworking-master
$ mkdir CallGraph
$ cd ./CallGraph
$ clang-mapper ../AFNetworking --

The program then automatically analyzes all source files under ../AFNetworking and generates the corresponding PNG files under CallGraph.

clang-mapper provides several optional command-line arguments:

  • -graph-only: generates only PNG files and does not retain dot files. This is the default option
  • -dot-only: only dot files are generated, not PNG files
  • -dot-graph: generates dot and PNG files at the same time
  • -ignore-header: In iOS development, header files are usually used only for declarations. With this option, .h files in the folder are ignored
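
The dot files mentioned above are Graphviz's plain-text graph format. As a rough illustration of what serializing a call graph to DOT can look like, here is a minimal sketch. toDot is a hypothetical helper, and the exact layout that clang-mapper writes is not shown in this article, so the output format here is an assumption.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Serialize caller -> callee edges as a Graphviz DOT digraph.
// Names are assumed not to contain double quotes, so no escaping is done.
std::string toDot(const std::string &graphName,
                  const std::vector<std::pair<std::string, std::string>> &edges) {
    std::string out = "digraph \"" + graphName + "\" {\n";
    for (const auto &edge : edges) {
        assert(edge.first.find('"') == std::string::npos);
        assert(edge.second.find('"') == std::string::npos);
        out += "  \"" + edge.first + "\" -> \"" + edge.second + "\";\n";
    }
    out += "}\n";
    return out;
}
```

Text produced this way can be saved to a file such as callgraph.dot and rendered with Graphviz, for example with dot -Tpng callgraph.dot -o callgraph.png.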

References

  • https://clang.llvm.org/docs/LibASTMatchersTutorial.html
  • https://clang.llvm.org/docs/RAVFrontendAction.html
  • https://clang.llvm.org/docs/LibASTMatchersReference.html
  • https://clang.llvm.org/docs/IntroductionToTheClangAST.html