Startup

App launch is usually measured from the moment the user taps the app icon until the AppDelegate's didFinishLaunching method returns. A launch is either cold or warm.

  • Cold start: none of the app's data is in memory, so everything must be loaded from disk into memory.
    • Killing an app does not guarantee that the next launch is cold. That depends on the system, and on whether the app's memory has been overwritten. Restarting the phone usually produces a cold start.
  • Warm start: the app process has been killed, but its data is still in memory.

The startup optimization discussed here refers to cold starts, which can be divided into two parts:

  • T1, the pre-main phase: before main runs, the operating system loads the app executable into memory and performs a series of loading and linking steps. This is the dyld loading process.
  • T2, the post-main phase: from main until the AppDelegate's didFinishLaunching method finishes. The main task here is to build the first screen and finish rendering it.

T1 + T2 is therefore the time from the user tapping the app icon to the user seeing the app's main screen, and it is the part that needs to be optimized.

Pre-main phase optimization

The dyld loading process was covered in OC Underlying Principles 09: dyld Loading Process. The startup time of the pre-main phase is essentially the time taken by the dyld loading process.

Apple provides a built-in way to measure the time spent before main: in Edit Scheme -> Run -> Arguments -> Environment Variables, click + to add the environment variable DYLD_PRINT_STATISTICS, set it to 1, and then run. The following is the pre-main time of a normal launch on an iPhone 7 Plus (taking WeChat as an example).

The pre-main phase takes 1.7 seconds in total:

  • Dylib loading time: loading the dynamic libraries takes 320.32 ms.

  • Rebase/binding time (offset correction and symbol binding): 160.52 ms.

    • Rebase (offset correction): every method and function in an app's binary has an address that is an offset within that binary file. Each time the app is launched and loaded into memory, the system assigns it a random ASLR (Address Space Layout Randomization) value. For example, if the binary has a test method at offset 0x0001 and the randomly assigned ASLR value is 0x1F00, the method's real memory address is ASLR + offset, which is only known at runtime (0x1F00 + 0x0001 = 0x1F01).
    • Binding: take NSLog as an example. At compile time a symbol, !NSLog, is created in the Mach-O file, and at that point it does not yet point to the real address. At runtime, when the image is loaded from disk into memory, dyld gives the symbol its real address, that is, it binds the address to the symbol in memory. This is also called dynamic library symbol binding. In short, binding is the process of assigning values to symbols.
  • ObjC setup time (time required to register OC classes): the more OC classes there are, the longer it takes.

  • Initializer time: the time taken to execute +load methods and constructors.
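The rebase arithmetic described above can be written out in a few lines of C. This is only a sketch of the addition, not dyld's actual code; the function name runtime_address is hypothetical, and the numbers are the ones from the example in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the address a symbol ends up at after rebasing.
 * slide  - the random ASLR value chosen for this launch
 * offset - the symbol's offset inside the binary file */
uint64_t runtime_address(uint64_t slide, uint64_t offset) {
    /* Guard against wrap-around so the sum is always a valid address. */
    assert(offset <= UINT64_MAX - slide);
    return slide + offset;
}
```

With the values from the text, runtime_address(0x1F00, 0x0001) gives 0x1F01. A launch with a different ASLR value gives a different address for the same offset, which is why every such address has to be corrected at load time.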

For these pre-main phases, the optimization suggestions are:

  • Use as few external dynamic libraries as possible. Apple officially recommends no more than six custom dynamic libraries; if there are more than six, merge them.
  • Reduce the number of OC classes, because the more classes there are, the longer registration takes.
  • Move work that does not have to happen in +load into +initialize, and try not to use C++ virtual functions.
  • In Swift, prefer struct where possible.
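The idea behind moving work out of +load (or out of a C/C++ constructor) is to run it on first use instead of before main. A minimal sketch in plain C follows. The names expensive_setup, shared_config, and setup_call_count are hypothetical, and the plain flag is not thread-safe; in an app, +initialize or dispatch_once would provide the once-only guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state that would otherwise be prepared in +load or a constructor. */
static bool config_ready = false;
static int setup_calls = 0;   /* how many times the expensive setup has run */
static int config_value = 0;

/* Stand-in for expensive work, e.g. parsing a configuration file. */
static void expensive_setup(void) {
    setup_calls++;
    config_value = 42;
}

/* Lazy accessor: the setup cost is paid on first use, not before main. */
int shared_config(void) {
    if (!config_ready) {
        expensive_setup();
        config_ready = true;
    }
    assert(setup_calls == 1); /* invariant: the setup runs exactly once */
    return config_value;
}

/* For illustration only: reports how often the setup has run. */
int setup_call_count(void) {
    return setup_calls;
}
```

Nothing runs until shared_config is first called, so a launch that never needs the value never pays for it.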

Optimization after main

After main, the didFinishLaunching method typically does all sorts of work, and much of it does not have to happen right away. That work can be loaded lazily so that it does not affect startup time.

There are three main kinds of work in didFinishLaunching:

  • [First type] Initializing third-party SDKs
  • [Second type] Configuring the app's runtime environment
  • [Third type] Initializing the app's own utility classes, and so on

The optimization suggestions for the post-main phase are as follows:

  • Reduce the initialization done at launch: lazy-load whatever can be lazy-loaded, defer whatever can be deferred, and move whatever can be initialized in the background off the main thread, so that as little as possible occupies the main thread during startup.
  • Optimize the code logic and remove all unnecessary code to reduce the time each step takes.
  • Where startup work can be initialized on multiple threads, use multithreading.
  • Build the UI in pure code as far as possible, especially the main UI skeleton such as UITabBarController. Try to avoid XIBs and storyboards, which take longer to load than pure code.
  • Delete obsolete classes and methods.
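The "defer whatever can be deferred" suggestion can be sketched as a small task table. This is a plain C illustration under stated assumptions, not code from a real project: the task names, the launch_task type, and run_launch_tasks are all hypothetical, and the counters stand in for real side effects.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*launch_task_fn)(void);

typedef struct {
    launch_task_fn run;  /* the work to perform */
    int deferrable;      /* nonzero: can wait until the first screen is visible */
    int done;            /* nonzero once the task has run */
} launch_task;

/* Counters standing in for real side effects. */
int critical_ran = 0;
int deferred_ran = 0;

/* Hypothetical tasks: only the root UI is needed before the first frame. */
void setup_root_ui(void)      { critical_ran++; }
void init_analytics_sdk(void) { deferred_ran++; }
void warm_image_cache(void)   { deferred_ran++; }

launch_task demo_tasks[] = {
    { setup_root_ui,      0, 0 },
    { init_analytics_sdk, 1, 0 },
    { warm_image_cache,   1, 0 },
};

/* Runs the pending tasks of one kind (critical or deferrable) and
 * returns how many ran. */
size_t run_launch_tasks(launch_task *tasks, size_t count, int deferrable) {
    assert(tasks != NULL || count == 0);
    size_t ran = 0;
    for (size_t i = 0; i < count; i++) {
        if (tasks[i].done) continue;
        if ((tasks[i].deferrable != 0) != (deferrable != 0)) continue;
        tasks[i].run();
        tasks[i].done = 1;
        ran++;
    }
    return ran;
}
```

In this sketch, run_launch_tasks(demo_tasks, 3, 0) would be called inside didFinishLaunching and run_launch_tasks(demo_tasks, 3, 1) later, for example after the first screen has appeared or from a background thread.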

Next, I will focus on one optimization for the pre-main phase: binary rearrangement. The technique became popular after the Douyin R&D team described it in the article "Douyin R&D Practice: A Solution Based on Binary File Rearrangement That Increases App Startup Speed by More Than 15%".

Binary rearrangement principle

As covered in the virtual memory section, when a process accesses a virtual memory page that has no corresponding physical memory, a Page Fault is triggered and the process blocks. The data has to be loaded into physical memory before the access can be retried, which costs performance.

During a cold start, a large number of classes, categories, and third-party libraries have to be loaded and executed, and the resulting Page Faults can take a lot of time. Taking WeChat as an example, let's look at the number of Page Faults during startup:

  • Press CMD+I and select System Trace.

  • Click Start (restart the phone first to clear cached data), stop once the first screen appears, and follow the steps in the figure below.

    The figure shows that WeChat triggered more than 2,800 Page Faults, which is clearly very bad for performance.

Let’s create a demo project of our own and check the order of methods at compile time. Define the following methods in the following order in ViewController:

@implementation ViewController
void (^block1)(void); // forward declaration so that test() can call block1

void test(){
    block1();
}

int test1(){
    return 0;
}

void(^block1)(void) = ^(void){
    
};


- (void)viewDidLoad {
    [super viewDidLoad];
    
    test();
}

+(void)load
{
    [SwiftTest swiftTest];
}
@end
  • In Build Settings -> Write Link Map File, set the value to YES.

  • Press CMD+B to build the demo, then find the LinkMap file at the corresponding path, as shown below. The functions within a class are laid out from top to bottom in source order, and the files are laid out in the order listed in Build Phases -> Compile Sources.

    • LinkMap file location

From the Page Fault count and the layout order above, we can see that the root cause of too many Page Faults is that the methods called at startup are scattered across different pages. The optimization idea is therefore to arrange all the methods called at startup on the same page, or on as few adjacent pages as possible, so that many Page Faults become one. This is the core principle of binary rearrangement, as shown below.

Note: In production, when a Page Fault occurs, iOS also verifies the app's signature for the page being loaded. A Page Fault in the production environment therefore takes longer than one in the Debug environment.
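To make the principle concrete, the following C sketch counts how many distinct pages a set of function addresses touches; on a cold start, each distinct page costs at most one Page Fault. The function name count_touched_pages and the sample addresses are hypothetical, and the 16 KB page size is an assumption that matches arm64 iOS devices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 0x4000u  /* assumed page size: 16 KB */

/* Counts how many distinct pages a list of function addresses touches.
 * Quadratic, which is fine for a sketch. */
size_t count_touched_pages(const uint64_t *addrs, size_t n) {
    assert(addrs != NULL || n == 0);
    size_t pages = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t page = addrs[i] / PAGE_SIZE_BYTES;
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (addrs[j] / PAGE_SIZE_BYTES == page) { seen = 1; break; }
        }
        if (!seen) pages++;
    }
    return pages;
}
```

Four startup functions that each sit on a different page touch four pages. The same four functions packed next to each other by an order file touch one.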

Binary rearrangement practice

Now let's put this into practice. First, some terminology.

LinkMap

The LinkMap is an intermediate product of iOS compilation that records the layout of the binary. To generate it, enable Write Link Map File in Xcode's Build Settings. The LinkMap consists of three parts:

  • Object Files: the path and file number of each link unit used to generate the binary
  • Sections: the address range of each segment/section in the Mach-O
  • Symbols: the address range of each symbol, in order
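As an illustration of the Symbols part, the C sketch below parses a single symbol row. It assumes the usual layout of address, size, file number in square brackets, and symbol name; the type linkmap_symbol, the function parse_linkmap_symbol, and the sample row are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned long long address;  /* start address of the symbol */
    unsigned long long size;     /* size in bytes */
    int file_index;              /* number from the Object Files part */
    char name[256];              /* symbol name */
} linkmap_symbol;

/* Parses one row such as
 *   "0x100004A30\t0x00000034\t[  2] -[ViewController viewDidLoad]".
 * Returns 1 on success, 0 if the line is not a symbol row. */
int parse_linkmap_symbol(const char *line, linkmap_symbol *out) {
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    int fields = sscanf(line, "%llx %llx [%d] %255[^\n]",
                        &out->address, &out->size, &out->file_index, out->name);
    return fields == 4;
}
```

Reading the rows in order and printing their names gives the symbol order of the binary, which is what the comparison before and after rearrangement looks at.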

ld

ld is the linker used by Xcode, and it has an order_file parameter. In Build Settings -> Order File, we can set the path of a file with the .order suffix. The symbols we need are written into this order file in the desired order, and when the project is built they are laid out in that order, which achieves our optimization.

  • Symbols listed in the order file that do not exist in the binary are automatically ignored.

So the essence of binary rearrangement is to reorder the symbols that are loaded at startup.
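An order file lists one symbol per line, spelled the way the symbol appears in the LinkMap. The C sketch below shows the spelling rule used later in this article: OC method names are written as-is, and other names get a leading underscore. The function name order_file_entry is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the order-file spelling of a symbol name into out.
 * OC methods ("+[...]" or "-[...]") are kept as-is; any other
 * symbol gets a leading "_".
 * Returns 1 on success, 0 if out is too small. */
int order_file_entry(const char *name, char *out, size_t out_size) {
    assert(name != NULL && out != NULL);
    int is_objc = strncmp(name, "+[", 2) == 0 || strncmp(name, "-[", 2) == 0;
    int written = snprintf(out, out_size, "%s%s", is_objc ? "" : "_", name);
    return written >= 0 && (size_t)written < out_size;
}
```

For the demo project, the C function test becomes _test, while -[ViewController viewDidLoad] stays unchanged.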

If the project is small, it is possible to write an order file by hand and list the methods in the desired order. However, if the project is large and involves many methods, how do we find out which functions run at startup? There are several approaches:

  • 1. Hook objc_msgSend: calling a method is essentially sending a message, which ends up in objc_msgSend. Because objc_msgSend takes variadic arguments, hooking it has to be done in assembly, which places higher demands on the developer. It also only captures OC methods and Swift methods marked @objc.
  • 2. Static scanning: scan the symbol and function data stored in particular segments or sections of the Mach-O.
  • 3. Clang instrumentation: a batch hook that can achieve 100% symbol coverage, capturing Swift, OC, C, and block functions.

Clang instrumentation

LLVM has simple code coverage instrumentation built in. It inserts calls to user-defined functions at the function level, basic-block level, and edge level. Our batch hook relies on this feature, SanitizerCoverage.

The official reference is the Clang SanitizerCoverage documentation, which provides a detailed overview as well as a brief demo.

  • [Step 1: Configure] Enable SanitizerCoverage

    • For an OC project, add -fsanitize-coverage=func,trace-pc-guard to "Other C Flags" in Build Settings. With Clang tracing enabled, a call to the __sanitizer_cov_trace_pc_guard function is inserted into every method, function, and block at compile time.
    • For a Swift project, or an OC project that also contains Swift, additionally add -sanitize-coverage=func and -sanitize=undefined to "Other Swift Flags".
    • Every binary linked into the app needs SanitizerCoverage turned on in order to cover all calls.
    • The flags can also be configured through the Podfile:
      post_install do |installer|
        installer.pods_project.targets.each do |target|
          target.build_configurations.each do |config|
            config.build_settings['OTHER_CFLAGS'] = '-fsanitize-coverage=func,trace-pc-guard'
            config.build_settings['OTHER_SWIFT_FLAGS'] = '-sanitize-coverage=func -sanitize=undefined'
          end
        end
      end

    After configuring as shown in the figure, the build fails with two errors. They mean that, with this flag set, the instrumented code calls the two functions used in the documentation's demo. Let's implement those two functions.

  • In ViewController.m, implement the two functions __sanitizer_cov_trace_pc_guard_init and __sanitizer_cov_trace_pc_guard. The code is as follows:

    #import "ViewController.h"
    
    #include <stdint.h>
    #include <stdio.h>
    #include <sanitizer/coverage_interface.h>
    #import <dlfcn.h>
    #import <libkern/OSAtomic.h>
    #import "Test-Swift.h"
    
    @interface ViewController ()
    
    @end
    @implementation ViewController
    
    void (^block1)(void); // forward declaration so that test() can call block1

    void test(){
        block1();
    }
    
    int test1(){
        return 0;
    }
    
    void(^block1)(void) = ^ (void){
    
    };
    
    
    - (void)viewDidLoad {
        [super viewDidLoad];

        test();
    }

    + (void)load
    {
        [SwiftTest swiftTest];
    }
    
    // Atomic queue, whose purpose is to ensure write safety, thread safety
    static  OSQueueHead symbolList = OS_ATOMIC_QUEUE_INIT;
    // Define a symbolic structure in the form of a linked list
    typedef struct {
        void *pc;
        void *next;
    }MMNode;
    
    /*
     - start: the start of the guard array
     - stop: the end of the guard array. It points just past the last guard, so the
       last guard sits 4 bytes before stop (each guard is a uint32_t, 4 bytes).
       The value stored there is the number of symbols.
     */
    void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
        static uint64_t N;
        if (start == stop || *start) return;
        printf("INIT: %p - %p\n", start, stop);
        for (uint32_t *x = start; x < stop; x++) {
            *x = ++N;
        }
    }

    /*
     Hooks every method, function, and block call in order to capture symbols.
     It can be called from multiple threads, and it only stores the PC, as a linked-list node.
     - guard: a sentinel that tells us which numbered guard was hit
     */
    void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
        // if (!*guard) return; // This check would filter out +load, so it is commented out

        // Get the PC. __builtin_return_address(level):
        //  - 0: the return address of the current function, i.e. an address inside the hooked caller
        //  - 1: the return address of the current function's caller
        void *PC = __builtin_return_address(0);
        // Create a node and fill it in
        MMNode *node = malloc(sizeof(MMNode));
        *node = (MMNode){PC, NULL};

        // Enqueue the node. The queue links nodes through their next pointer rather than by index,
        // so offsetof(MMNode, next) tells it where that pointer lives inside the struct.
        OSAtomicEnqueue(&symbolList, node, offsetof(MMNode, next));
    }
    @end
  • The __sanitizer_cov_trace_pc_guard_init function

    • Parameter 1, start, is a pointer to an unsigned int (4 bytes). It is the starting position of an array, namely the start of the symbols.
    • Parameter 2, stop, marks the end of that array. The printed stop address is not the address of the last entry but the address just past it. Because each entry takes 4 bytes, the address of the last entry = printed stop address - 0x4.

    • What does the value stored at that address represent? If we add a method, a block, a C/C++ function, or a property (a property adds three), the value increases by the corresponding number, for example by one when a test1 function is added. The value is therefore the number of symbols.
  • The __sanitizer_cov_trace_pc_guard function captures every symbol called during startup and enqueues it

    • The guard parameter is a sentinel that tells us which numbered guard was hit.
    • The symbols are stored in a linked list, so a list node type, MMNode, has to be defined.
    • An atomic queue is created with OSQueueHead to keep reads and writes safe.
    • OSAtomicEnqueue pushes a node onto the queue; the next symbol is reachable through the node's next pointer.
  • A while loop dequeues the symbols, adds the prefix that non-OC functions need, and stores them in an array

    • Reverse the array, because the queue returns the symbols in reverse order.
    • Deduplicate the array and remove the symbol of the handler method itself.
    • Join the symbols in the array into a string and write it to the mm.order file in the sandbox tmp folder.

    Let's write it in the touch method:

    - (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event
    {
        // Collected symbol names, in dequeue order
        NSMutableArray<NSString *> *symbolNames = [NSMutableArray array];

        while (YES) { // Note: the loop itself can also be hooked
            MMNode *node = OSAtomicDequeue(&symbolList, offsetof(MMNode, next));
            if (node == NULL) {
                break;
            }
            Dl_info info = {0};
            dladdr(node->pc, &info);
            free(node);
            if (info.dli_sname == NULL) {
                continue; // No symbol name was found for this address
            }
            NSString *name = @(info.dli_sname);

            // OC methods are stored as-is; any other symbol needs a leading "_"
            BOOL isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["];
            NSString *symbolName = isObjc ? name : [@"_" stringByAppendingString:name];
            [symbolNames addObject:symbolName];
        }

        // The queue returns symbols in reverse order, so walk the array backwards
        NSEnumerator *enumerator = [symbolNames reverseObjectEnumerator];

        // Deduplicate into a new array, keeping the first occurrence of each symbol
        NSMutableArray<NSString *> *funcs = [NSMutableArray arrayWithCapacity:symbolNames.count];
        NSString *name;
        while ((name = [enumerator nextObject])) {
            if (![funcs containsObject:name]) {
                [funcs addObject:name];
            }
        }
        // Remove this method's own symbol
        [funcs removeObject:[NSString stringWithFormat:@"%s", __FUNCTION__]];

        // Join the array into a string, one symbol per line
        NSString *funcStr = [funcs componentsJoinedByString:@"\n"];
        // Write the string to mm.order in the tmp directory (on a real device)
        NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"mm.order"];
        NSData *fileContents = [funcStr dataUsingEncoding:NSUTF8StringEncoding];
        [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
    }

    Run the app on a real device and tap the screen to trigger the method above, then download the generated mm.order file to your Mac.

  • Copy the mm.order file to the chosen location in the project, and set Build Settings -> Order File to its path, ./mm.order.
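The reverse-and-deduplicate step from the touch method can also be sketched in plain C, which makes the resulting order easy to check. The function name reverse_unique and the sample symbol names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies names[] into out[] in reverse order, skipping duplicates and
 * the symbol called skip (which may be NULL). Returns the number of
 * entries written. out must have room for count pointers. */
size_t reverse_unique(const char *const *names, size_t count,
                      const char *skip, const char **out) {
    assert((names != NULL && out != NULL) || count == 0);
    size_t n = 0;
    for (size_t i = count; i-- > 0; ) {
        const char *name = names[i];
        if (skip != NULL && strcmp(name, skip) == 0) continue;
        int seen = 0;
        for (size_t j = 0; j < n; j++) {
            if (strcmp(out[j], name) == 0) { seen = 1; break; }
        }
        if (!seen) out[n++] = name;
    }
    return n;
}
```

Because the queue hands back the most recent call first, reversing restores call order, and keeping only the first occurrence places each symbol at the point where it was first needed.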

Below is a comparison of the LinkMap symbol order before and after configuring the order file (the first figure is before, the second is after).

Before:

After:

Note: avoid infinite loops

  • If Build Settings -> Other C Flags is set to -fsanitize-coverage=trace-pc-guard (without func), the while loop turns into an infinite loop (we are debugging in the touchesBegan method).

  • Turning on assembly debugging shows three calls to __sanitizer_cov_trace_pc_guard.

  • The first bl is for touchesBegan itself.

  • The second bl is caused by the while loop. Any jump gets hooked: a callback is inserted wherever the code branches, whether through b (unconditional jump) or b.cond (conditional jump). Each pass through the loop therefore enqueues a new node, so the queue never becomes empty.

  • The third bl is for printf.

The fix is to change -fsanitize-coverage=trace-pc-guard to -fsanitize-coverage=func,trace-pc-guard. Adding func restricts the hook to function entries, so the loop itself is no longer hooked.
