Preface

Recently our team's project ran into a hard limit: the code segment exceeded the allowed size, which simply means that the executable produced by compilation was too large. The App Store requirements read as follows:

For iOS and tvOS apps, check that your app size fits within the App Store requirements.

Your app’s total uncompressed size must be less than 4GB. Each Mach-O executable file (for example, app_name.app/app_name) must not exceed these limits:

For apps whose MinimumOSVersion is less than 7.0: maximum of 80 MB for the total of all __TEXT sections in the binary.

For apps whose MinimumOSVersion is 7.x through 8.x: maximum of 60 MB per slice for the __TEXT section of each architecture slice in the binary.

For apps whose MinimumOSVersion is 9.0 or greater: maximum of 500 MB for the total of all __TEXT sections in the binary.

As you can see, an app whose minimum version is iOS 9 or later may carry up to 500 MB of __TEXT in total, while an app that still supports iOS 7.x or 8.x is limited to 60 MB per architecture slice. With our business code growing steadily, we needed a way to remove obsolete code promptly and shrink the code segment.
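To make the thresholds concrete, here is a minimal C++ sketch that encodes the three limits quoted above. The names TextLimit, textLimitFor and fitsTextLimit are hypothetical helpers made up for this illustration, not an Apple API, and the sketch assumes that 1 MB means 1024 * 1024 bytes, which the requirements text does not spell out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types for illustration only (not an Apple API).
struct TextLimit {
    std::uint32_t megabytes;  // size limit in MB, as quoted in the requirements
    bool perSlice;            // true: the limit applies to each architecture slice
};

// Maps the major part of MinimumOSVersion to the __TEXT limit quoted above.
inline TextLimit textLimitFor(int minimumOSMajor) {
    assert(minimumOSMajor > 0);                    // a real OS version is positive
    if (minimumOSMajor < 7) return {80, false};    // below 7.0: 80 MB in total
    if (minimumOSMajor < 9) return {60, true};     // 7.x through 8.x: 60 MB per slice
    return {500, false};                           // 9.0 or greater: 500 MB in total
}

// Returns true when a __TEXT size in bytes stays within the limit.
// Assumption: 1 MB is treated as 1024 * 1024 bytes.
inline bool fitsTextLimit(std::uint64_t textBytes, int minimumOSMajor) {
    const std::uint64_t limitBytes =
        std::uint64_t{textLimitFor(minimumOSMajor).megabytes} * 1024 * 1024;
    return textBytes <= limitBytes;
}
```

Under these assumptions a 61 MB slice is rejected for an app that supports iOS 8 but accepted for one whose minimum version is iOS 9.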

After trying unsuccessfully to analyze the LinkMap file, I found another route: analyze the Clang AST to find methods in the syntax tree that are never explicitly called. Because Objective-C is a dynamic language, a method that is not explicitly called in the source may still be invoked at runtime. Even so, analyzing the AST gives us the list of methods that are not statically called, which we can then check and validate by hand.
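Stripped of all Clang details, the idea is a set difference: the selectors that appear in method definitions minus the selectors that appear in message sends. The sketch below is only an illustration of that idea. The function names are hypothetical, and keying on the selector string alone is a simplifying assumption; a real check would also have to track which class each selector belongs to.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the selectors that are defined somewhere but never appear in a
// message send. std::set keeps its elements sorted, which is what
// std::set_difference requires of both input ranges.
inline std::vector<std::string> staticallyUncalled(
        const std::set<std::string>& defined,
        const std::set<std::string>& called) {
    std::vector<std::string> candidates;
    std::set_difference(defined.begin(), defined.end(),
                        called.begin(), called.end(),
                        std::back_inserter(candidates));
    return candidates;
}

// Usage example: a class defines hello and print:, and only print: is ever
// sent, so hello is the single candidate for manual review.
inline void demoStaticallyUncalled() {
    const std::set<std::string> defined{"hello", "print:"};
    const std::set<std::string> called{"print:"};
    const std::vector<std::string> candidates = staticallyUncalled(defined, called);
    assert(candidates.size() == 1 && candidates[0] == "hello");
}
```

The result is a list of candidates rather than a verdict: anything invoked through performSelector:, KVC or similar dynamic mechanisms never shows up as a static message send.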

Clang & LLVM

Clang and LLVM cannot be covered properly in a few words, and I am not very familiar with this area myself, so I would rather recommend a very in-depth article. There is also a short introductory video about LLVM on YouTube for getting started, and a Google search always turns up plenty of useful results.

To put it simply, Clang is the front end of the LLVM compiler for the C family of languages. It parses C, C++, Objective-C and related languages and emits IR, which LLVM optimizes and its back end then translates into machine code for the target platform.

Get Your Hands Dirty

Compile your Clang

As of this writing, the Clang bundled with Xcode does not support loading plug-ins, so if you want to use Clang plug-ins in a real project you need to replace it with a Clang you have compiled yourself. After checking out each repository into its specified path, you can compile LLVM. Note that LLVM does not support in-place (in-source) builds, so you need to create a separate folder as the build output directory. LLVM can be built in several ways; this article uses CMake, with the following command:

cmake -G Xcode \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_OSX_ARCHITECTURES:STRING=x86_64 \
    -DLLVM_TARGETS_TO_BUILD=host \
    -DLLVM_INCLUDE_TESTS=OFF \
    -DCLANG_INCLUDE_TESTS=OFF \
    -DLLVM_INCLUDE_UTILS=OFF \
    -DLLVM_INCLUDE_DOCS=OFF \
    -DLLVM_INCLUDE_EXAMPLES=OFF \
    -DLLVM_BUILD_EXTERNAL_COMPILER_RT=ON \
    -DLIBCXX_INCLUDE_TESTS=OFF \
    -DCOMPILER_RT_INCLUDE_TESTS=OFF \
    -DCOMPILER_RT_ENABLE_IOS=OFF \
    <LLVM source folder path>

Open the generated .xcodeproj in the output directory and select the ALL_BUILD scheme to build. The build reports an error related to compiler-rt, even though I had checked out all the LLVM repositories. I do not know why it fails, but it does not affect plug-in development, so I ignored it.

Following this article, you can write your own Clang plug-in. My advice is to get your hands dirty and get the plug-in running first; the sections after Section 7 can be skimmed. (The example code in that article has a problem: MobCodeConsumer needs to be changed to MyPluginConsumer.)

Abstract Syntax Tree (AST)

Now that you have successfully run your first Clang plug-in, let's figure out how to use the Clang AST to analyze existing code. If you think back to what you learned in college, or just do a Google search, the explanation of an AST looks something like this:
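In text form, and only as an illustrative sketch, the tree for the statement x = a + b; can be modeled as follows. The Node type and the helper functions are hypothetical and exist only for this example; Clang's real nodes are subclasses of Decl and Stmt.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for illustration; not a Clang class.
struct Node {
    std::string label;
    std::vector<std::unique_ptr<Node>> children;
};

inline std::unique_ptr<Node> makeNode(std::string label) {
    auto node = std::make_unique<Node>();
    node->label = std::move(label);
    return node;
}

// Pre-order dump: each level of the tree is indented by two more spaces.
inline void dump(const Node& node, std::ostream& out, int depth = 0) {
    out << std::string(static_cast<std::size_t>(depth) * 2, ' ') << node.label << '\n';
    for (const auto& child : node.children) {
        assert(child != nullptr);  // the builders below never store null children
        dump(*child, out, depth + 1);
    }
}

// Builds the tree for "x = a + b;": "=" is the root, "x" its left child,
// and the "+" subtree (with leaves "a" and "b") its right child.
inline std::unique_ptr<Node> buildAssignment() {
    auto plus = makeNode("+");
    plus->children.push_back(makeNode("a"));
    plus->children.push_back(makeNode("b"));

    auto assign = makeNode("=");
    assign->children.push_back(makeNode("x"));
    assign->children.push_back(std::move(plus));
    return assign;
}
```

Calling dump(*buildAssignment(), std::cout) prints the operator = at the top, with x and the + subtree indented beneath it and the leaves a and b one level further in.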

The syntax tree is the compiler's "understanding" of the code we write. As shown in the figure above, for the statement x = a + b; the compiler treats the operator = as a node, splits the statement into a left and a right child, and keeps parsing the children until it reaches the leaf nodes. Most of us could easily draw the AST of a basic arithmetic expression, but the code we write in everyday business development is rarely that simple. Take, for example:


                                                        
- (void)viewDidLoad {
    [self doSomething];
}

What does the AST look like for code like this? The good news is that Clang provides a command that prints the AST it builds when compiling a particular file. Start with a simple command-line example project and put the following code in main.m, ahead of the main function:

@interface HelloAST : NSObject
@end

@implementation HelloAST

- (void)hello {
    [self print:@"hello!"];
}

- (void)print:(NSString *)msg {
    NSLog(@"%@", msg);
}

@end

Then, in Terminal, change to the folder that contains main.m and run the following command:


                                                        
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

Let's look at the part of the output that follows the import statement:

The output has a clear tree structure, in which we can find the AST nodes for our class definition, method definitions, and method calls.

The first box is the class definition. The node is named ObjCInterfaceDecl, which means it is an Objective-C class definition (interface declaration). The second box, ObjCMethodDecl, indicates that the node defines an Objective-C method (this covers class methods and instance methods, in ordinary classes as well as in protocols). The third box, ObjCMessageExpr, indicates that the node is a standard Objective-C message-send expression ([obj foo]).

These names correspond to classes defined in Clang, and the information those classes carry is what makes our analysis possible. The details of what each class provides can be looked up in Clang's class documentation.

We also see that inside each method definition, ImplicitParamDecl nodes declare the implicit parameters self and _cmd, which is where the self keyword comes from. Looking at the top of the tree, the root node is a TranslationUnitDecl. Because Clang builds the syntax tree one source file at a time, this node will be the root of all our analysis.

Preliminary analysis

In an Objective-C program, almost all code falls into one of two categories: Decl (declaration) and Stmt (statement). Every ObjCXXXDecl class mentioned above is a subclass of Decl, and every ObjCXXXExpr class is a subclass of Stmt. Looking at the declarations in RecursiveASTVisitor, we can find the corresponding entry points: bool VisitDecl(Decl *D) and bool VisitStmt(Stmt *S). For example, in RecursiveASTVisitor.h we can see the following code:

// ...
#define DEF_TRAVERSE_DECL(DECL, CODE)                                          \
  template <typename Derived>                                                  \
  bool RecursiveASTVisitor<Derived>::Traverse##DECL(DECL *D) {                 \
    bool ShouldVisitChildren = true;                                           \
    bool ReturnValue = true;                                                   \
    if (!getDerived().shouldTraversePostOrder())                               \
      TRY_TO(WalkUpFrom##DECL(D));                                             \
    { CODE; }                                                                  \
    if (ReturnValue && ShouldVisitChildren)                                    \
      TRY_TO(TraverseDeclContextHelper(dyn_cast<DeclContext>(D)));             \
    if (ReturnValue && getDerived().shouldTraversePostOrder())                 \
      TRY_TO(WalkUpFrom##DECL(D));                                             \
    return ReturnValue;                                                        \
  }

// ...
bool WalkUpFromDecl(Decl *D) { return getDerived().VisitDecl(D); }
bool VisitDecl(Decl *D) { return true; }

#define DECL(CLASS, BASE)                                                      \
  bool WalkUpFrom##CLASS##Decl(CLASS##Decl *D) {                               \
    TRY_TO(WalkUpFrom##BASE(D));                                               \
    TRY_TO(Visit##CLASS##Decl(D));                                             \
    return true;                                                               \
  }                                                                            \
  bool Visit##CLASS##Decl(CLASS##Decl *D) { return true; }

The macros above generate the Traverse, WalkUpFrom and Visit methods for each concrete class name. Scrolling up and down in the file, you will find many definitions like these:


                                                        
DEF_TRAVERSE_DECL(ObjCInterfaceDecl, {
    ...
})

DEF_TRAVERSE_DECL(ObjCProtocolDecl, {// FIXME: implement
                                    })

DEF_TRAVERSE_DECL(ObjCMethodDecl, {
    ...
})
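The dispatch that these macros generate can be illustrated without Clang at all. The sketch below is a heavily simplified stand-in, not Clang's implementation: Decl and ObjCMethodDecl are toy structs, and MiniVisitor and SelectorCollector are hypothetical names. It only shows the shape of the mechanism, where WalkUpFromX first walks up to the base class and then calls VisitX on the derived visitor through the curiously recurring template pattern.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for Clang's Decl hierarchy (hypothetical, heavily simplified).
struct Decl {
    virtual ~Decl() = default;
};
struct ObjCMethodDecl : Decl {
    std::string selector;
};

// Mirrors the shape of the macro-generated code for one class.
template <typename Derived>
class MiniVisitor {
public:
    Derived& getDerived() { return static_cast<Derived&>(*this); }

    bool WalkUpFromDecl(Decl* D) { return getDerived().VisitDecl(D); }
    bool VisitDecl(Decl*) { return true; }  // default: do nothing

    bool WalkUpFromObjCMethodDecl(ObjCMethodDecl* D) {
        // Base class first, then the concrete class, stopping on false.
        if (!getDerived().WalkUpFromDecl(D)) return false;
        return getDerived().VisitObjCMethodDecl(D);
    }
    bool VisitObjCMethodDecl(ObjCMethodDecl*) { return true; }  // default: do nothing
};

// A concrete visitor only redefines the Visit methods it cares about.
class SelectorCollector : public MiniVisitor<SelectorCollector> {
public:
    std::vector<std::string> log;

    bool VisitDecl(Decl*) {
        log.push_back("Decl");
        return true;
    }
    bool VisitObjCMethodDecl(ObjCMethodDecl* D) {
        assert(D != nullptr);
        log.push_back("ObjCMethodDecl:" + D->selector);
        return true;
    }
};
```

Walking up from an ObjCMethodDecl whose selector is hello records "Decl" first and "ObjCMethodDecl:hello" second, because the base-class callback runs before the concrete one. No virtual functions are involved in the dispatch; the derived type is known at compile time.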

As you can see, if we want to analyze a particular XXXDecl class, we only need to implement VisitXXXDecl(XXXDecl *D); Stmt subclasses get their Clang callbacks in the same way. Now let's try it out with something small and emit a warning at every method definition and every method call:

// Statements
bool VisitObjCMessageExpr(ObjCMessageExpr *expr) {
    DiagnosticsEngine &D = Instance.getDiagnostics();
    int diagID = D.getCustomDiagID(DiagnosticsEngine::Warning, "Meet Msg Expr : %0");
    D.Report(expr->getLocStart(), diagID) << expr->getSelector().getAsString();
    return true;
}

// Declarations
bool VisitObjCMethodDecl(ObjCMethodDecl *decl) {
    if (!isUserSourceCode(decl)) {
        return true;
    }
    DiagnosticsEngine &D = Instance.getDiagnostics();
    int diagID = D.getCustomDiagID(DiagnosticsEngine::Warning, "Meet Method Decl : %0");
    D.Report(decl->getLocStart(), diagID) << decl->getSelector().getAsString();
    return true;
}

// Helper
bool isUserSourceCode(Decl *decl) {
    std::string filename = Instance.getSourceManager().getFilename(decl->getSourceRange().getBegin()).str();
    if (filename.empty())
        return false;
    // Skip headers that live inside Xcode: /Applications/Xcode.app/xxx
    if (filename.find("/Applications/Xcode.app/") == 0)
        return false;
    return true;
}

Build the project, and you should now see the warnings we emitted in the warning panel.
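The file-name filter in the helper can be tried out on its own, without a CompilerInstance. The function below is a hypothetical standalone variant (the name isUserSourcePath and the toolchainPrefix parameter are made up for this sketch) that works on a plain path string. It uses compare on the leading characters instead of find(...) == 0, which is an optional refinement assumed here: the result is the same, but the string is not scanned past the prefix.

```cpp
#include <cassert>
#include <string>

// Hypothetical standalone version of the file-name check in isUserSourceCode.
// Returns false for an empty file name and for anything under toolchainPrefix.
inline bool isUserSourcePath(
        const std::string& filename,
        const std::string& toolchainPrefix = "/Applications/Xcode.app/") {
    assert(!toolchainPrefix.empty());
    if (filename.empty()) {
        return false;  // no file name available for this declaration
    }
    // compare() looks only at the first toolchainPrefix.size() characters,
    // so a file name shorter than the prefix can never match it.
    return filename.compare(0, toolchainPrefix.size(), toolchainPrefix) != 0;
}
```

With the default prefix, a made-up path such as /Users/me/Project/main.m counts as user code, while a header under /Applications/Xcode.app/ does not. If Xcode is installed somewhere else, the prefix has to be adjusted, which is why the sketch takes it as a parameter.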

Conclusion

We have now written our first Clang plug-in, worked out what the Clang AST nodes mean, and hooked into Clang's callback methods. In the next article we will explore how to check whether a method is actually used.

Resources

Clang
LLVM
A Clang plugin that uses Xcode to check the syntax of iOS