One. Introduction

App startup time is an important indicator of app performance and shapes the user's first impression of it. This article introduces the background needed to understand startup time and how to measure it.

Two. Startup optimization

2.1 App startup modes

An app can start in one of two modes:

1. Cold startup: the app is launched from scratch.
2. Hot startup: the app is already in memory and still alive in the background.

Startup time should then be tested in both modes. Generally speaking, a launch time (tap the icon -> launch screen appears -> launch screen disappears) under 400 ms is optimal. The system also limits startup time to 20 s; an app that exceeds this limit is killed by the watchdog mechanism. The timeout limit differs for each lifecycle event:

Lifecycle event    Timeout
Launch             20 s
Resume             10 s
Suspend            10 s
Quit               6 s
Background         10 min
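
As a rough illustration of how these limits might be used in a test harness, the sketch below encodes the table as a lookup and classifies a measured launch time. The enum, the function names, and the three-way classification are assumptions made for this example and are not part of any system API; only the 400 ms and 20 s thresholds come from the text above.

```c
#include <assert.h>

/* Lifecycle events from the table above (identifiers are hypothetical). */
typedef enum {
    EVENT_LAUNCH,
    EVENT_RESUME,
    EVENT_SUSPEND,
    EVENT_QUIT,
    EVENT_BACKGROUND,
    EVENT_COUNT
} lifecycle_event;

/* Watchdog limits in seconds, copied from the table. */
static const double watchdog_limit_s[EVENT_COUNT] = { 20.0, 10.0, 10.0, 6.0, 600.0 };

typedef enum { LAUNCH_OPTIMAL, LAUNCH_SLOW, LAUNCH_KILLED } launch_verdict;

/* Returns the watchdog limit for an event, in seconds. */
double watchdog_limit_seconds(lifecycle_event event) {
    assert((int)event >= 0 && (int)event < EVENT_COUNT);
    return watchdog_limit_s[event];
}

/* Classifies a measured launch time given in milliseconds. */
launch_verdict classify_launch(double launch_ms) {
    if (launch_ms < 400.0) return LAUNCH_OPTIMAL;                 /* under 400 ms */
    if (launch_ms <= watchdog_limit_seconds(EVENT_LAUNCH) * 1000.0) {
        return LAUNCH_SLOW;                                       /* within 20 s */
    }
    return LAUNCH_KILLED;                                         /* watchdog limit exceeded */
}
```

A performance test could call classify_launch() on each measured launch and fail the run whenever the result is not LAUNCH_OPTIMAL.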

2.2 App startup process

The startup process is generally divided into two parts: pre-main (before the main function runs) and post-main (after the main function is entered).

2.2.1 Pre-main

Tasks and optimization methods at each stage:

Phase: Load dylibs
Work: dyld reads the list of dependent dynamic libraries from the header of the main executable and then has to locate each dylib. Those dylibs may in turn depend on other dylibs, so what dyld actually loads is the full set of recursive dependencies.
Optimization:
1. Avoid embedded dylibs where possible, because loading them is expensive.
2. Merge existing dylibs and use static archives to reduce the number of dylibs.
3. Load dylibs lazily, but be aware that dlopen() can cause problems of its own and actually does more work overall.

Phase: Rebase and Bind
Work:
1. Rebase adjusts pointers that point inside the image. In the past a dynamic library was loaded at a fixed address, so all pointers and data were already correct for the code. Now that the address space layout is randomized, each pointer has to be corrected by the random offset from the original address.
2. Bind sets up pointers to content outside the image. These external pointers are bound by symbol name, so dyld has to search the symbol table to find the implementation of each symbol.
Optimization:
1. Reduce the number of ObjC classes, methods, and categories.
2. Reduce the number of C++ virtual functions (creating virtual function tables has a cost).
3. Use Swift structs (internally optimized to produce fewer symbols).

Phase: ObjC setup
Work:
1. Register ObjC classes.
2. Insert category definitions into their classes (category registration).
3. Ensure that every selector is unique.
Optimization: Reduce the number of Objective-C classes, selectors, and categories, for example by merging or deleting classes.

Phase: Initializers
Work:
1. ObjC +load methods.
2. C/C++ functions marked __attribute__((constructor)).
3. Construction of C++ static global variables of non-primitive types (usually classes or structs).
Optimization:
1. Do less in +load methods and defer the work to +initialize where possible.
2. Reduce the number of constructor functions and do less inside them.
3. Reduce the number of C++ static global variables.
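
To make the rebase step more concrete, here is a minimal sketch of the arithmetic involved. It assumes a simplified model in which the internal pointers of an image are stored in a plain array; the function names and the array layout are hypothetical and do not reflect how dyld actually encodes rebase information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The slide is the distance between where the image was actually loaded
   and the address it was linked to load at. */
intptr_t compute_slide(uintptr_t actual_base, uintptr_t preferred_base) {
    return (intptr_t)(actual_base - preferred_base);
}

/* Conceptual rebase: every pointer into the image is shifted by the slide. */
void rebase_pointers(uintptr_t *slots, size_t count, intptr_t slide) {
    assert(slots != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        slots[i] += (uintptr_t)slide;   /* unsigned wraparound handles negative slides */
    }
}
```

Every pointer that needs this adjustment costs some work at launch, which is one way to read the advice above about reducing the number of classes, methods, and virtual functions.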

For the pre-main phase, Xcode can report the time spent in each stage: under Product -> Scheme -> Edit Scheme -> Environment Variables, add the variable DYLD_PRINT_STATISTICS with the value 1. The output looks like this:

Total pre-main time: 955.81 milliseconds (100.0%)
dylib loading time: 97.42 milliseconds (10.1%)
rebase/binding time: 55.08 milliseconds (5.7%)
ObjC setup time: 68.65 milliseconds (7.1%)
initializer time: 734.45 milliseconds (76.8%)
slowest intializers: 7.65 milliseconds (0.8%)
libMainThreadChecker.dylib: 36.33 milliseconds (3.8%)
...
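
If this output is collected across many runs, it can help to extract the numbers automatically. The following is a small sketch that pulls the millisecond value out of one line of the log and computes a phase's share of the total. The function names are hypothetical, and the parser assumes each line contains a decimal number immediately followed by the word "milliseconds", as in the sample above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Extracts the value that precedes " milliseconds" in a statistics line.
   Returns 1 on success and stores the value in *out_ms, 0 otherwise. */
int parse_milliseconds(const char *line, double *out_ms) {
    assert(line != NULL && out_ms != NULL);
    const char *unit = strstr(line, " milliseconds");
    if (unit == NULL) return 0;

    /* Walk backwards over the digits and the decimal point. */
    const char *start = unit;
    while (start > line && (isdigit((unsigned char)start[-1]) || start[-1] == '.')) {
        start--;
    }
    if (start == unit) return 0;        /* no number in front of the unit */

    char *end = NULL;
    double value = strtod(start, &end);
    if (end != unit) return 0;          /* malformed number */

    *out_ms = value;
    return 1;
}

/* Share of the total, in percent. */
double share_percent(double part_ms, double total_ms) {
    return total_ms > 0.0 ? 100.0 * part_ms / total_ms : 0.0;
}
```

With the sample above, the initializer phase accounts for roughly three quarters of the pre-main time, which is why the rest of this article concentrates on +load methods and static initializers.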

Here are some additional dyld environment variables:

Variable                            Description
DYLD_PRINT_STATISTICS_DETAILS       Prints detailed statistics such as startup time
DYLD_PRINT_SEGMENTS                 Logs segment mappings
DYLD_PRINT_INITIALIZERS             Logs image initializers
DYLD_PRINT_BINDINGS                 Logs symbol bindings
DYLD_PRINT_APIS                     Logs dyld API calls (for example, dlopen)
DYLD_PRINT_ENV                      Prints the launch environment variables
DYLD_PRINT_OPTS                     Prints the launch command-line arguments
DYLD_PRINT_LIBRARIES_POST_LAUNCH    Logs library loads, but only after main has started
DYLD_PRINT_LIBRARIES                Logs library loads
DYLD_IMAGE_SUFFIX                   Searches for libraries with this suffix first

This method is convenient, but what if we want to measure the time spent in the pre-main phase ourselves? Since we mainly optimize for cold start, let us first look at the cold start process:

It can be summarized in three stages:

1. dyld: load images and dynamic libraries
2. Runtime methods
3. main function initialization

As the figure shows, the work that can be implemented before main belongs to the Run Image Initializers stage (in the Apple demo): +load methods, __attribute__((constructor)) functions, and C++ static object initialization.

+load time monitoring

To measure how long +load methods take, you first need to find every class and category that implements +load. There are two approaches. The first reads all classes in the relevant image, together with their metaclasses, through the runtime API and iterates over the methods of each metaclass one by one; if a method is named load, it is hooked. The library AppStartTime takes this approach. The second reads the __objc_nlclslist and __objc_nlcatlist sections directly at run time with getsectiondata. These sections are written into the __DATA segment of the Mach-O file at compile time and store the non-lazy class list and the non-lazy category list respectively. A non-lazy class or category is simply one that defines a +load method. The library A4LoadMeasure takes this approach.

First, let’s talk about the comparison results of the two schemes:

Library          +load measurement error    Coverage
AppStartTime     about 100 ms               classes
A4LoadMeasure    about 50 ms                classes and categories

Based on the test results we choose the latter, which also covers +load methods defined in categories. In terms of performance, the former calls object_getClass() in a loop, which triggers the realize operation for each class: allocating storage for the class's read-write data, adjusting the layout of instance variables, attaching category methods and properties, and so on; in short, making the class usable. When a large number of classes go through this, it adds extra pre-main time and unnecessary overhead.

// Excerpt. LMLoadInfo, LMLoadInfoWrapper, LMAllLoadNumber, getDataSection,
// shouldRejectClass and swizzleLoadMethod are defined elsewhere in the library.
#import <Foundation/Foundation.h>
#import <objc/runtime.h>
#import <mach-o/loader.h>

// Collect every class and category in the image that implements +load.
static NSArray <LMLoadInfo *> *getNoLazyArray(const struct mach_header *mhdr) {
    NSMutableArray *noLazyArray = [NSMutableArray new];

    // Non-lazy class list: classes that define +load.
    unsigned long bytes = 0;
    Class *clses = (Class *)getDataSection(mhdr, "__objc_nlclslist", &bytes);
    for (unsigned int i = 0; i < bytes / sizeof(Class); i++) {
        LMLoadInfo *info = [[LMLoadInfo alloc] initWithClass:clses[i]];
        if (!shouldRejectClass(info.clsname)) [noLazyArray addObject:info];
    }

    // Non-lazy category list: categories that define +load.
    bytes = 0;
    Category *cats = (Category *)getDataSection(mhdr, "__objc_nlcatlist", &bytes);
    for (unsigned int i = 0; i < bytes / sizeof(Category); i++) {
        LMLoadInfo *info = [[LMLoadInfo alloc] initWithCategory:cats[i]];
        if (!shouldRejectClass(info.clsname)) [noLazyArray addObject:info];
    }

    return noLazyArray;
}

// Swizzle every +load implementation found on the metaclass.
static void hookAllLoadMethods(LMLoadInfoWrapper *infoWrapper) {
    unsigned int count = 0;
    Class metaCls = object_getClass(infoWrapper.cls);
    Method *methodList = class_copyMethodList(metaCls, &count);
    for (unsigned int i = 0, j = 0; i < count; i++) {
        Method method = methodList[i];
        SEL sel = method_getName(method);
        const char *name = sel_getName(sel);
        if (!strcmp(name, "load")) {
            LMLoadInfo *info = nil;
            // Written as >= so that an empty infos array cannot underflow the comparison.
            if (j >= infoWrapper.infos.count) {
                info = [[LMLoadInfo alloc] initWithClass:infoWrapper.cls];
                [infoWrapper insertLoadInfo:info];
                LMAllLoadNumber++;
            } else {
                info = infoWrapper.infos[j];
            }
            ++j;
            swizzleLoadMethod(infoWrapper.cls, method, info);
        }
    }
    free(methodList);
}

A4LoadMeasure uses the LMAllLoadNumber counter to decide when the last +load has run and the results can be printed. This is a bit of a trick, and it introduces some calculation error.
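
The idea behind swizzling +load is to wrap the original implementation in a timer. The sketch below shows that wrapping step in plain C, under stated assumptions: load_record, measure_load, and sample_load are hypothetical names, an ordinary function pointer stands in for the original +load implementation, and the C11 timespec_get wall clock is used in place of whatever timer the libraries themselves use.

```c
#include <assert.h>
#include <time.h>

/* One measurement: which +load ran and how long it took (hypothetical layout). */
typedef struct {
    const char *name;
    double elapsed_ms;
} load_record;

static int sample_load_calls = 0;

/* Stand-in for an original +load implementation. */
void sample_load(void) {
    sample_load_calls++;
}

/* Milliseconds between two timestamps. */
static double diff_ms(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Calls the original function and records the time spent inside it. */
load_record measure_load(const char *name, void (*original_load)(void)) {
    struct timespec start, end;
    assert(original_load != 0);
    timespec_get(&start, TIME_UTC);
    original_load();                      /* run the wrapped implementation */
    timespec_get(&end, TIME_UTC);
    load_record record = { name, diff_ms(start, end) };
    return record;
}
```

Calling measure_load("Sample", sample_load) runs the wrapped function exactly once and returns its name together with the elapsed time. The timer calls themselves add a small overhead to every measurement, which is one plausible source of the errors listed in the comparison table.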

__attribute__((constructor)) and C++ static object initialization

__attribute__ is a compiler attribute mechanism that originated in GNU C; see the article "iOS attribute" for details. Relative to +load, main, and +initialize, constructor functions are called in the following order:

+load -> __attribute__((constructor)) -> main -> +initialize
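
The part of this ordering that does not depend on the Objective-C runtime can be checked in plain C. The sketch below, which assumes a GCC- or Clang-compatible compiler, records the step at which a constructor function runs; the variable and function names are made up for this example. +load and +initialize are not shown because they need the Objective-C runtime.

```c
#include <assert.h>

static int g_step = 0;              /* incremented as each stage runs */
static int g_constructor_step = 0;  /* step at which the constructor ran */

/* Runs during image initialization, before main is entered. */
__attribute__((constructor))
static void before_main(void) {
    g_constructor_step = ++g_step;
}

/* Returns 1 if the constructor was the first stage to run. */
int constructor_ran_first(void) {
    return g_constructor_step == 1;
}
```

By the time any code called from main executes, constructor_ran_first() already returns 1, because the constructor ran while the image was being initialized.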

Next, let's compare the two third-party libraries again:

Library          Static initializer measurement error
AppStartTime     about 30 ms
A4LoadMeasure    about 40 ms

Statistically, the two sets of numbers are similar; what matters most is being able to print the function pointer of each initializer. A4LoadMeasure measures the time of C++ static initializers from inside an __attribute__((constructor)) function, an approach I could not make sense of. Reading the __mod_init_func section (which holds the addresses of the global initializer functions) is the approach I prefer.

Initialization functions are generally executed in the following order:

initializeMainExecutable -> ImageLoader::runInitializers -> ImageLoader::doInitialization -> ImageLoaderMachO::doModInitFunctions

The last function contains the main processing logic. Its code is shown below:

// This function handles __mod_init_func
void ImageLoaderMachO::doModInitFunctions(const LinkContext& context)
{
    if ( fHasInitializers ) {
        const uint32_t cmd_count = ((macho_header*)fMachOData)->ncmds;
        const struct load_command* const cmds = (struct load_command*)&fMachOData[sizeof(macho_header)];
        const struct load_command* cmd = cmds;
        for (uint32_t i = 0; i < cmd_count; ++i) {
            if ( cmd->cmd == LC_SEGMENT_COMMAND ) {
                const struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
                const struct macho_section* const sectionsStart = (struct macho_section*)((char*)seg + sizeof(struct macho_segment_command));
                const struct macho_section* const sectionsEnd = &sectionsStart[seg->nsects];
                for (const struct macho_section* sect=sectionsStart; sect < sectionsEnd; ++sect) {
                    const uint8_t type = sect->flags & SECTION_TYPE;
                    if ( type == S_MOD_INIT_FUNC_POINTERS ) {
                        Initializer* inits = (Initializer*)(sect->addr + fSlide);
                        const size_t count = sect->size / sizeof(uintptr_t);
                        
                        for (size_t j=0; j < count; ++j) {
                            Initializer func = inits[j];
                            // <rdar://problem/8543820&9228031> verify initializers are in image
                            if ( ! this->containsAddress((void*)func) ) {
                                dyld::throwf("initializer function %p not in mapped image for %s\n", func, this->getPath());
                            }
                            func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
                        }
                    }
                }
            }
            cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
        }
    }
}

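The check if ( ! this->containsAddress((void*)func) ) verifies that the address of each initializer lies inside the current image. If the function pointers in __mod_init_func have been replaced with functions that live in a different image, then other images fail this check and throw an exception. In demo projects the problem is not obvious, but it becomes more apparent as the project architecture grows more complex.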
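
The range check can be modelled in a few lines. The sketch below is a simplification rather than dyld's implementation: an image is reduced to a single contiguous address range, initializers are represented as plain addresses, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified image: one contiguous mapped range. */
typedef struct {
    uintptr_t start;
    size_t size;
} image_range;

/* Returns 1 if addr lies inside the image's mapped range. */
int image_contains(const image_range *image, uintptr_t addr) {
    assert(image != NULL);
    return addr >= image->start && addr - image->start < image->size;
}

/* Returns the index of the first initializer that lies outside the image,
   or -1 if every initializer is inside it. */
int first_foreign_initializer(const image_range *image,
                              const uintptr_t *initializers, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (!image_contains(image, initializers[i])) return (int)i;
    }
    return -1;
}
```

In this model, an initializer address that falls outside the image is reported as foreign. That is the situation in which the real check throws "initializer function %p not in mapped image".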

Three. Project description

The project can currently be integrated through CocoaPods:

pod 'A0PreMainTime'

# ****** Subcomponents can also be introduced separately ******
# Time detection in pre-main phase
pod 'A0PreMainTime/PreMainTime'
# Business time measurement
pod 'A0PreMainTime/TimeMonitor'

For details, see A0PreMainTime

References:

Calculate the time of the +load method

How to accurately measure the startup time of an iOS App

App startup time optimization

iOS startup time optimization

Mobile iOS performance optimization exploration

Meituan Takeout iOS App cold startup governance

Further reading:

dyld

Submit to AppStore Issue: Unsupported architecture x86