Preface

Time really is pig feed: over the years, people have put on weight, and the WeChat codebase has ballooned too. I remember that when I transferred to the WeChat team in 2014, compiling the WeChat project on my laptop took only ten minutes. Today, the company-issued 2017 27-inch iMac takes nearly half an hour; pull the latest code and, inexplicably, a full recompile is sometimes needed. With compilation this slow, development morale suffers badly. So at the beginning of the year I asked for approval to optimize WeChat's compilation efficiency, and management agreed.

Existing solutions

Before starting, I first surveyed the existing solutions. Roughly, these are the optimization points:

First, optimize project configuration

1. Change Debug Information Format to DWARF

Debug builds do not need to generate a dSYM symbol file; check that every project (especially open-source libraries) sets this correctly.

2. Change Build Active Architecture Only to Yes

Debug builds do not need to build all architectures; check that every project (especially open-source libraries) sets this correctly.

3. Optimize the header search path

Avoid recursive references in Header Search Paths:

When Xcode compiles a source file, it adds -I arguments according to Header Search Paths. The more paths a recursive reference expands to, the more -I arguments there are, and the less efficient the compiler becomes at preprocessing headers. Therefore, avoid recursive search paths. The same applies to Framework Search Paths.

Second, use CocoaPods to manage third-party libraries

This is common industry practice. The CocoaPods plugin cocoapods-packager can package any pod into a static library, saving the time of repeated compilation. The downside is that debugging the source becomes inconvenient; if a library's code is modified repeatedly, the binary must be regenerated and uploaded to an internal server each time, and so on.

Third, CCache

CCache is a tool that caches compilation intermediates without requiring much change to the project configuration or toolchain. Xcode 9 had an occasional bug that triggered a full recompile even when no source had changed, and CCache worked around this problem nicely. But once Xcode 10 fixed the full-recompile issue, this solution was dropped.

Fourth, distcc

Distcc is a distributed compilation tool that distributes local compilation tasks to multiple machines on the network; once the other machines finish compiling, the build products are sent back to the local machine.

Fifth, hardware solutions

For example, putting the Derived Data directory on a RAM disk, or buying the latest iMac Pro…

Practice process

Optimize compilation options

1. Optimize the header search path

After removing some of the recursive reference paths, the overall compilation speed is 20 seconds faster.

2. Disable Enable Index-While-Building Functionality

I stumbled on this option (a new feature in Xcode 9?), which is enabled by default: Xcode builds the code index while compiling, which slows compilation down. Disabling it made the overall build about 80s faster (Xcode reverts to the previous behavior of building the index during idle time).

Kinda optimization

Kinda, a cross-platform (C++) payment framework introduced this year, compiled extremely slowly, taking 30s for a single source file. Moreover, the generated binary accounts for a relatively large share of the app, suggesting a lot of redundant code; in theory, cutting redundant code should also speed up compilation. After analyzing the LinkMap file and running some source files through Xcode's Preprocess, we found the following problems:

  • The proto files generate a lot of code

  • A base class/macro uses a lot of templates

For problem one, you can set the proto file option `option optimize_for = CODE_SIZE;` so that the protobuf compiler generates condensed code. But I did it with my own tool instead (see here for details), which generates even less code.

For problem two, since templates are compile-time polymorphism (increasing code bloat and compile time), the template base class can be changed to a virtual base class resolved at run time. It is also recommended to use hyper_function instead of std::function so that the base class holds a generic function pointer that can store any lambda callback, avoiding templatizing the base class. For example:

```cpp
template <typename Request, typename Response>
class BaseCgi {
public:
    BaseCgi(Request request, std::function<void(Response &)> &callback) {
        _request = request;
        _callback = callback;
    }

    void onRequest(std::vector<uint8_t> &outData) { _request.toData(outData); }

    void onResponse(std::vector<uint8_t> &inData) {
        Response response;
        response.fromData(inData);
        _callback(response);
    }

public:
    Request _request;
    std::function<void(Response &)> _callback;
};

class CgiA : public BaseCgi<RequestA, ResponseA> {
public:
    CgiA(RequestA &request, std::function<void(ResponseA &)> &callback)
        : BaseCgi(request, callback) {}
};
```

Can be changed to:

```cpp
class BaseRequest {
public:
    virtual void toData(std::vector<uint8_t> &outData) = 0;
};

class BaseResponse {
public:
    virtual void fromData(std::vector<uint8_t> &inData) = 0;
};

class BaseCgi {
public:
    template <typename Request, typename Response>
    BaseCgi(Request &request, hyper_function<void(Response &)> callback) {
        _request = new Request(request);
        _response = new Response;
        _callback = callback;
    }

    void onRequest(std::vector<uint8_t> &outData) { _request->toData(outData); }

    void onResponse(std::vector<uint8_t> &inData) {
        _response->fromData(inData);
        _callback(*_response);
    }

public:
    BaseRequest *_request;
    BaseResponse *_response;
    hyper_function<void(BaseResponse &)> _callback;
};

class RequestA : public BaseRequest { ... };
class ResponseA : public BaseResponse { ... };

class CgiA : public BaseCgi {
public:
    CgiA(RequestA &request, hyper_function<void(ResponseA &)> &callback)
        : BaseCgi(request, callback) {}
};
```

BaseCgi changed from a class template to a class in which only the constructor is a template, so the onRequest and onResponse logic is no longer "copy-pasted" by template instantiation for every Request/Response pair. These optimizations made the overall compilation 70s faster and reduced the Kinda binary size by 60%.
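The article does not show hyper_function itself. As a rough illustration of the idea only (a hypothetical implementation, not the one WeChat uses), here is a minimal type-erasing wrapper that behaves like std::function<void(Base &)> but can be constructed from a callback taking a derived type, by hiding the downcast inside the stored lambda:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sketch of a hyper_function-like wrapper: a base-typed callback
// slot that can store a derived-typed callable, so a non-template base class
// can hold callbacks written against any Response subtype.
template <typename Sig> class hyper_function;

template <typename Base>
class hyper_function<void(Base &)> {
    std::function<void(Base &)> fn_;

public:
    hyper_function() = default;

    // Wrap a derived-typed callback; the stored lambda performs the downcast.
    template <typename Derived>
    hyper_function(std::function<void(Derived &)> f)
        : fn_([f = std::move(f)](Base &b) { f(static_cast<Derived &>(b)); }) {}

    void operator()(Base &b) const { fn_(b); }
};
```

With something like this, BaseCgi can store a single hyper_function<void(BaseResponse &)> member while CgiA's callers keep writing lambdas over the concrete ResponseA type.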

Use PCH precompiled header files

A Precompiled Prefix Header (PCH) is a header file that is precompiled once and whose contents are visible to every other source file in the project. It usually holds common macros and headers, which makes writing code more convenient and efficient. In addition, once the PCH is precompiled, source files that use it compile faster. The drawback is that any change to the PCH, or to any header it includes, forces every source file that uses it to recompile, so use it with care.
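As a sketch only (the actual WeChat PCH contents are not shown in the article; all names here are illustrative), a prefix header typically looks like this:

```cpp
// Prefix.pch (illustrative) — everything here is visible to every source file
// in the target; any change here, or in a header included from here, triggers
// a full rebuild of all files that use the PCH.
#ifdef __OBJC__
#import <Foundation/Foundation.h>
#import <UIKit/UIKit.h>
#endif

#ifdef __cplusplus
#include <string>
#include <vector>
#endif
```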

After WeChat adopted PCH precompilation, compilation sped up considerably, nearly 280s faster.

The ultimate optimization

With the optimizations above, WeChat's build time dropped from the original 1,626.4s to 1,182.8s, nearly 450s faster, but it still took 20 minutes, which was unsatisfying. To optimize further, we had to start from the compiler. As with our usual client performance work, before optimizing we analyze the principles, measure where the time goes, and then optimize the hot spots.

First, compilation principle

A compiler is a program that converts one language (usually a high-level language) into another (usually a low-level language). Most compilers consist of three parts:

  • Frontend: parses source code, checks for errors, generates the abstract syntax tree (AST), and lowers the AST to intermediate code

  • Optimizer: Architecture-independent optimizations of intermediate code to improve efficiency and reduce code volume, such as removing invalid if (0) branches

  • Backend: Translates intermediate code into machine code for the target platform
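As a tiny illustration of the optimizer's work (function names are made up), a provably dead branch such as if (0) is deleted entirely, so the two functions below compile to the same code at any level above -O0:

```cpp
#include <cassert>

// Illustrative: the optimizer can prove the if (0) branch never executes and
// removes it, along with the code inside it.
int with_dead_branch(int x) {
    if (0) {
        x += 1000;  // dead code: eliminated by the optimizer
    }
    return x * 2;
}

int without_dead_branch(int x) {
    return x * 2;
}
```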

LLVM implements a more general compilation framework, providing a series of modular compiler components and toolchains. First, it defines LLVM IR (Intermediate Representation). A frontend converts the source language to LLVM IR; the LLVM optimizer optimizes LLVM IR; a backend converts LLVM IR into the machine language of the target platform. As a result, a new compiler for a new language or platform can be created by implementing only a new frontend or backend.

In Xcode, the compiler for C/C++/ObjC is Clang (frontend) + LLVM (backend), Clang for short. The Clang compilation process has several phases:

```
➜ clang -ccc-print-phases main.m
0: input, "main.m", objective-c
1: preprocessor, {0}, objective-c-cpp-output
2: compiler, {1}, ir
3: backend, {2}, assembler
4: assembler, {3}, object
5: linker, {4}, image
6: bind-arch, "x86_64", {5}, image
```

1. Preprocessing

This phase handles header inclusion, macro expansion/replacement, preprocessor directive processing, and comment removal.
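As a minimal illustration (names are made up), macro replacement happens textually at this stage, before the compiler proper ever sees the code:

```cpp
#include <cassert>

// During preprocessing, SQUARE(3) below is rewritten to ((3) * (3)), and this
// comment is stripped at the same stage.
#define SQUARE(x) ((x) * (x))

int square_of_three() {
    return SQUARE(3);
}
```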

2. Compilation

This stage does a lot of work, mainly:

  • Lexical Analysis: converts the code into a stream of tokens, such as parentheses `paren '()'`, brackets `square '[]'`, braces `brace '{}'`, identifiers `identifier`, string literals `string_literal`, numeric constants `numeric_constant`, and so on

  • Syntactic Analysis: the token stream is parsed into an abstract syntax tree (AST)

  • Static Analysis: checks for code errors, such as passing arguments of the wrong type or calling a method an object does not implement

  • Intermediate Code Generation: the syntax tree is translated top-down, step by step, into LLVM IR

3. Generate assembly code

LLVM IR is lowered to assembly code for the current platform; during this stage LLVM applies optimizations according to the Optimization Level set for the build. For example, Debug's -O0 applies no optimization, while Release's -Os tries to optimize code efficiency and minimize size as much as possible.

4. Generate the object file

The assembler converts the assembly code into machine code, producing an object file ending in .o.

5. Link

The Linker links together several object files to produce an executable.

Second, analyzing where the time goes

The Clang/LLVM compiler is open source; you can download it from the official site and, following the compilation pipeline above, build a customized compiler. Just as I was about to do this myself, Aras Pranckevičius had landed change rL357340 in the LLVM project a week earlier: Clang gained a -ftime-trace option that, during compilation, generates a time report in the Chrome tracing (chrome://tracing) format, listing the time spent in every phase. The effect looks like this:

  • Overall compilation (ExecuteCompiler) took 8,423.8ms

  • Front-end (Frontend) takes 5,307.9ms and back-end (Backend) takes 3,009.6ms

  • Within the frontend, processing source file A took xx ms, file B xx ms…

  • Parsing ClassA took xx ms, ClassB xx ms…

  • etc.

That's exactly the time report I wanted! Set the project's CC={YOUR PATH}/clang so that Xcode compiles with the custom-built compiler, and add -ftime-trace to the OTHER_CFLAGS build setting so that every source file emits a time report when compiled. Finally, aggregate all the reports into an overall compile-time breakdown:

As the overall numbers show, the compiler frontend took 7,659.2s, 87% of the total; within that, source (header) processing took 7,146.2s, 71.9% of the total! My guess was that header nesting is severe: each source file pulls in dozens or even hundreds of headers, and every header must go through preprocessing, lexical analysis, syntactic analysis, and so on. Yet the source file often never uses some of the definitions (classes, functions) in those headers, which is why compilation takes so long.

So I wrote a tool that counts how many times each header is referenced, its total processing time, and its header group (the set of all child headers pulled in by a time-consuming top-level header), and tabulated the results (truncated to the Top 10):

Header1 took 1,187.7s to process and was referenced 2,304 times; Header2 took 1,124.9s and was referenced 3,831 times; Header3 through Header10 are all referenced by Header1. So the plan was to optimize the references inside the TopN headers so that each includes as few other headers as possible.

Third, optimizing the time-consuming headers

Usually when we write code, if we need a class we simply include the header in which it is declared; inside header files, however, we can often use a forward declaration instead. So the optimization idea is simple: wherever a forward declaration suffices, use it instead of an include. In practice the changes were so extensive that VAkeee, another colleague in the group, and I spent five working days optimizing just Header1 and Header2, which cut the overall compile time by 80 seconds.
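A minimal sketch of the substitution (Widget/Renderer are made-up names): when a header only stores a pointer or takes a reference, a forward declaration replaces the include, and the full definition is pulled in only by the implementation file:

```cpp
#include <cassert>
#include <string>

// --- Widget.h (sketch): forward declaration instead of #include "Renderer.h" ---
class Renderer;

class Widget {
public:
    explicit Widget(Renderer *r) : renderer_(r) {}
    std::string draw(Renderer &renderer);  // reference parameter: declaration suffices
private:
    Renderer *renderer_;                   // pointer member: declaration suffices
};

// --- Widget.cpp (sketch): only here is the full definition needed ---
class Renderer {  // stand-in for what #include "Renderer.h" would provide
public:
    std::string name() const { return "renderer"; }
};

std::string Widget::draw(Renderer &renderer) { return renderer.name(); }
```

Every file that includes only Widget.h no longer reprocesses Renderer.h (or anything Renderer.h includes), and no longer recompiles when Renderer.h changes.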

But there were dozens more headers to optimize, and this manual work doesn't scale. So we built a tool: use the AST to find the identifiers (types, functions, macros) that appear in the code and the files in which they are defined, then analyze whether including each of those files is actually necessary.

Let's first look at how source code maps to an AST:

```c
// HeaderA.h
struct StructA { int val; };

// HeaderB.h
struct StructB { int val; };

// main.c
#include "HeaderA.h"
#include "HeaderB.h"

int testAndReturn(struct StructA *a, struct StructB *b) {
    return a->val;
}
```

Dump the AST from the console:

```
➜ clang -Xclang -ast-dump -fsyntax-only main.c
TranslationUnitDecl 0x7f8f36834208 <<invalid sloc>> <invalid sloc>
|-RecordDecl 0x7faa62831d78 <./HeaderA.h:12:1, line:14:1> line:12:8 struct StructA definition
| `-FieldDecl 0x7faa6383da38 <line:13:2, col:6> col:6 referenced val 'int'
|-RecordDecl 0x7faa6383da80 <./HeaderB.h:12:1, line:14:1> line:12:8 struct StructB definition
| `-FieldDecl 0x7faa6383db38 <line:13:2, col:6> col:6 val 'int'
`-FunctionDecl 0x7faa6383de50 <main.c:35:1, line:37:1> line:35:5 testAndReturn 'int (struct StructA *, struct StructB *)'
  |-ParmVarDecl 0x7faa6383dc30 <col:19, col:35> col:35 used a 'struct StructA *'
  |-ParmVarDecl 0x7faa6383dd40 <col:38, col:54> col:54 b 'struct StructB *'
  `-CompoundStmt 0x7faa6383dfc8 <col:57, line:37:1>
    `-ReturnStmt 0x7faa6383dfb8 <line:36:2, col:12>
      `-ImplicitCastExpr 0x7faa6383dfa0 <col:9, col:12> 'int' <LValueToRValue>
        `-MemberExpr 0x7faa6383df70 <col:9, col:12> 'int' lvalue ->val 0x7faa6383da38
          `-ImplicitCastExpr 0x7faa6383df58 <col:9> 'struct StructA *' <LValueToRValue>
            `-DeclRefExpr 0x7faa6383df38 <col:9> 'struct StructA *' lvalue ParmVar 0x7faa6383dc30 'a' 'struct StructA *'
```

As you can see, each line contains the AST node type, its location (file name, line, column), and a description. Classes defined in header files are also part of the AST. The common AST node types are Decl (for example, RecordDecl for a struct definition, FunctionDecl for a function definition) and Stmt (for example, CompoundStmt, the function body in braces).

Clang's AST tooling has three important base classes: ASTFrontendAction, ASTConsumer, and RecursiveASTVisitor. The ClangTool class reads the command-line options and initializes the CompilerInstance. The CompilerInstance's ExecuteAction calls the ASTFrontendAction member functions BeginSourceFile (prepare to traverse the AST), Execute (parse the AST), and EndSourceFileAction (finish the traversal). ASTFrontendAction has an important pure virtual function, CreateASTConsumer (called from its own BeginSourceFile), which returns the ASTConsumer object that reads the AST:

```cpp
class MyFrontendAction : public clang::ASTFrontendAction {
public:
    virtual std::unique_ptr<clang::ASTConsumer>
    CreateASTConsumer(clang::CompilerInstance &CI, llvm::StringRef file) override {
        TheRewriter.setSourceMgr(CI.getASTContext().getSourceManager(),
                                 CI.getASTContext().getLangOpts());
        return llvm::make_unique<MyASTConsumer>(&CI);
    }
};

int main(int argc, const char **argv) {
    clang::tooling::CommonOptionsParser op(argc, argv, OptsCategory);
    clang::tooling::ClangTool Tool(op.getCompilations(), op.getSourcePathList());
    int result = Tool.run(clang::tooling::newFrontendActionFactory<MyFrontendAction>().get());
    return result;
}
```

ASTConsumer has several overridable methods that receive callbacks during AST parsing; the one the tool uses is HandleTranslationUnit. Once the AST of a TranslationUnit is fully parsed, HandleTranslationUnit is called back. Inside it, we use a RecursiveASTVisitor object to traverse all AST nodes depth-first:

```cpp
class MyASTVisitor : public clang::RecursiveASTVisitor<MyASTVisitor> {
public:
    explicit MyASTVisitor(clang::ASTContext *Ctx) {}

    bool VisitFunctionDecl(clang::FunctionDecl *decl) {
        // FunctionDecl: StructA and StructB appear here as parameter types
        return true;
    }

    bool VisitMemberExpr(clang::MemberExpr *expr) {
        // MemberExpr: only StructA's member is actually accessed
        return true;
    }

    bool VisitXXX(XXX) { return true; }

    // Conclusion: StructA's definition is really used, so HeaderA.h must be
    // included; StructB is only used through a pointer, so it can be forward-declared.
};

class MyASTConsumer : public clang::ASTConsumer {
private:
    MyASTVisitor Visitor;

public:
    explicit MyASTConsumer(clang::CompilerInstance *aCI)
        : Visitor(&(aCI->getASTContext())) {}

    void HandleTranslationUnit(clang::ASTContext &context) override {
        clang::TranslationUnitDecl *decl = context.getTranslationUnitDecl();
        Visitor.TraverseTranslationUnitDecl(decl);
    }
};
```

The tool framework is roughly as shown above. In fact, the Clang libTooling-based tool include-what-you-use can already update C/C++ header includes in this way:

```
➜ include-what-you-use main.c

(HeaderA.h has correct #includes/fwd-decls)

(HeaderB.h has correct #includes/fwd-decls)

main.c should add these lines:
struct StructB;

main.c should remove these lines:
- #include "HeaderB.h"  // lines 2-2

The full include-list for main.c:
#include "HeaderA.h"  // for StructA
struct StructB;
```

We added ObjC support to IWYU and enhanced its logic to produce better results. (IWYU's output typically still contains many includes and forward declarations, so we prune it to remove the superfluous ones; there isn't space to explain the details here.)

After running the tool over the WeChat source to optimize header includes, overall compile time dropped to 710s. Reducing header dependencies also reduces the chance that modifying one header triggers a large-scale rebuild. We then ran the compile-time analysis tool again to find the current bottleneck:

WCDB headers take too long to process: business code (such as Model classes) does not isolate WCDB well, exposes WINQ, and passively includes WCDB headers. There are several fixes, such as moving the WCDB dependency into category headers (XXModel+WCDB.h), or, as other libraries do, placing the WCDB header in the PCH.

Finally, compile time was optimized down to under 540s, a third of the original; compilation efficiency improved greatly.

Optimization summary

To summarize WeChat's compilation optimizations:

  • A. Optimize the header file search path

  • B, disable Enable index-while-building Functionality

  • C. Optimize PB/ template to reduce redundant code

  • D. Precompile with PCH

  • E. Use tools to optimize header includes; try to avoid including the C++ standard library in header files

Future

Looking forward to Blue Shield supporting distributed compilation for ObjC. Business code can also be modularized, with project files loaded module by module; Kinda, Mini Program, and Mars are currently good examples of this practice.

References

  • How can I make an iOS project compile five times faster

  • An in-depth look at Clang/LLVM compilation for iOS

  • Clang syntax Abstract syntax tree AST

  • time-trace: timeline / flame chart profiler for Clang

  • Introduction to the Clang AST