One. Introduction
App startup time is an important indicator of app performance; it is the user's first impression of how the app performs. This article covers the background on startup time and how to measure it.
Two. Startup optimization
2.1 App startup modes
First of all, an app can start in one of two modes:
1. Cold start: the app is launched from scratch, with no process in memory.
2. Hot start: the app's process is still alive in the background, so it is simply brought back to the foreground.
Startup time should be measured in both modes. As a rule of thumb, a launch time (tap icon -> launch screen shown -> launch screen dismissed) under 400 ms is ideal. The system caps startup at 20 s, and an app that exceeds this is killed by the watchdog. The watchdog timeout differs per lifecycle phase:
Lifecycle phase | Timeout |
---|---|
Launch | 20 s |
Resume | 10 s |
Suspend | 10 s |
Quit | 6 s |
Background | 10 min |
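The thresholds above can be captured in a small lookup table, which is handy when checking measured launch times in automated tests. This is a minimal C++ sketch. `Phase`, `watchdogLimit` and `classifyLaunch` are hypothetical names. The 400 ms figure is the rule of thumb quoted above, not a system constant.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Lifecycle phases from the watchdog table (hypothetical enum).
enum class Phase { Launch, Resume, Suspend, Quit, Background };

// Watchdog limit per phase, using the values listed in the table above.
std::chrono::seconds watchdogLimit(Phase p) {
    using namespace std::chrono;
    switch (p) {
        case Phase::Launch:     return seconds(20);
        case Phase::Resume:     return seconds(10);
        case Phase::Suspend:    return seconds(10);
        case Phase::Quit:       return seconds(6);
        case Phase::Background: return minutes(10);  // converts losslessly to seconds
    }
    return seconds(0);
}

// Rough classification of a measured cold-launch duration.
std::string classifyLaunch(std::chrono::milliseconds launch) {
    if (launch < std::chrono::milliseconds(400)) return "good";  // under the ~400 ms rule of thumb
    if (launch < watchdogLimit(Phase::Launch)) return "slow";    // survives, but worth optimizing
    return "killed";                                             // the watchdog would terminate the app
}
```

For example, `classifyLaunch(std::chrono::milliseconds(1500))` returns `"slow"`: the app survives the watchdog but misses the 400 ms target.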
2.2 App startup process
The startup process is generally divided into two phases: pre-main (before main() is called) and post-main (after main() is called).
2.2.1 Pre-main
The work done at each stage, and how to optimize it:
Phase | Work | Optimization |
---|---|---|
Load dylibs | dyld reads the list of dependent dynamic libraries from the main executable's header and then locates each dylib. A dylib may itself depend on other dylibs, so what dyld actually loads is the full set of recursive dependencies | 1. Avoid embedded dylibs where possible, because loading them is expensive; 2. Merge existing dylibs and use static archives to reduce the number of dylibs; 3. Load dylibs lazily with dlopen(), but be aware that this can cause problems and actually does more work overall |
Rebase and Bind | 1. Rebase fixes up pointers that point inside the image. Dynamic libraries used to be loaded at a fixed address, so all pointers and data were already correct. With address space layout randomization, each such pointer must be adjusted by the random offset (the slide). 2. Bind fixes up pointers that point outside the image. These external pointers are bound by symbol name, so dyld must search symbol tables to find each symbol's implementation | 1. Reduce the number of ObjC classes, methods, and categories; 2. Reduce the number of C++ virtual functions (virtual function tables are expensive to set up); 3. Use Swift structs (internally optimized to use fewer symbols) |
ObjC setup | 1. Register ObjC classes; 2. Register categories, inserting their definitions into the corresponding classes; 3. Ensure every selector is unique | Reduce the number of Objective-C classes, selectors, and categories by merging or deleting OC classes |
Initializers | 1. ObjC +load methods; 2. C++ functions marked with the constructor attribute; 3. Construction of C++ static global variables of non-trivial types (usually classes or structs) | 1. Do less in +load and defer work to +initialize where possible; 2. Reduce the number of constructor functions and the work done in them; 3. Reduce the number of C++ static global variables |
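The advice to move work out of initializers applies to C++ globals as well. The sketch below is minimal, and `CountryTable` is a hypothetical type standing in for any expensive object. A namespace-scope instance would be built by a static initializer during pre-main. A function-local static is only built on first use, which is similar in spirit to moving work from +load to +initialize.

```cpp
#include <cassert>
#include <map>
#include <string>

// Counts how many times the (hypothetical) expensive table has been built.
static int g_buildCount = 0;

// A non-trivial type: its constructor does real work.
struct CountryTable {
    std::map<std::string, std::string> codes;
    CountryTable() {
        ++g_buildCount;
        codes = {{"CN", "China"}, {"US", "United States"}};  // stand-in for expensive setup
    }
};

// Eager version (avoid): a namespace-scope object like
//   static CountryTable g_table;
// is constructed by a static initializer before main(), i.e. during pre-main.

// Lazy version: the function-local static is constructed on the first call,
// which moves the cost out of pre-main (initialization is thread-safe since C++11).
const CountryTable& countryTable() {
    static CountryTable table;
    return table;
}
```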
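For the pre-main phase, Xcode can report how long each stage takes. Open Product -> Scheme -> Edit Scheme, and under Run -> Arguments -> Environment Variables add DYLD_PRINT_STATISTICS with the value 1. On launch, the console prints output like this: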
Total pre-main time: 955.81 milliseconds (100.0%)
dylib loading time: 97.42 milliseconds (10.1%)
rebase/binding time: 55.08 milliseconds (5.7%)
ObjC setup time: 68.65 milliseconds (7.1%)
initializer time: 734.45 milliseconds (76.8%)
slowest intializers: 7.65 milliseconds (0.8%)
libMainThreadChecker.dylib: 36.33 milliseconds (3.8%)
...
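To track these numbers across builds, you can parse the log lines and express each stage as a share of the total. For the initializer stage above, that is 734.45 / 955.81, or about 76.8%. This is a minimal sketch. `parseDyldStats` and `sharePercent` are hypothetical helpers, and the parser assumes one `label: value milliseconds` entry per line.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parses lines of the form "<label>: <value> milliseconds (...)" and returns
// label -> milliseconds. Lines without "milliseconds" are skipped.
std::map<std::string, double> parseDyldStats(const std::string& text) {
    std::map<std::string, double> result;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        auto colon = line.find(':');
        if (colon == std::string::npos || line.find("milliseconds") == std::string::npos) continue;
        std::string label = line.substr(0, colon);
        label.erase(0, label.find_first_not_of(' '));  // trim leading spaces
        label.erase(label.find_last_not_of(' ') + 1);  // trim trailing spaces
        result[label] = std::stod(line.substr(colon + 1));  // stod skips leading whitespace
    }
    return result;
}

// Share of the total pre-main time, in percent.
double sharePercent(const std::map<std::string, double>& stats, const std::string& label) {
    return 100.0 * stats.at(label) / stats.at("Total pre-main time");
}
```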
Some other useful dyld environment variables:
Variable | Description |
---|---|
DYLD_PRINT_STATISTICS_DETAILS | Prints a more detailed breakdown of startup time |
DYLD_PRINT_SEGMENTS | Logs segment mapping |
DYLD_PRINT_INITIALIZERS | Logs image initializers as they run |
DYLD_PRINT_BINDINGS | Logs symbol bindings |
DYLD_PRINT_APIS | Logs dyld API calls (for example, dlopen) |
DYLD_PRINT_ENV | Prints the launch environment variables |
DYLD_PRINT_OPTS | Prints the launch command-line arguments |
DYLD_PRINT_LIBRARIES_POST_LAUNCH | Logs library loads, but only those after main runs |
DYLD_PRINT_LIBRARIES | Logs library loads |
DYLD_IMAGE_SUFFIX | Searches for libraries with this suffix first |
This is very convenient, but what if we want to measure the pre-main phase ourselves? Since we mainly optimize cold start, let's first walk through the cold start process:
It can be summarized in three stages:
1. dyld: load the executable image and its dynamic libraries;
2. Runtime: Objective-C runtime setup and image initializers;
3. main: initialization after main() is called.
As the figure (from Apple's demo) shows, the Run Image Initializers stage executes before main. It covers +load methods, __attribute__((constructor)) functions, and C++ static object initialization.
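The two C/C++ items in that list can be reproduced in a few lines. +load has no C++ equivalent and is left out. This is a minimal sketch that assumes GCC or Clang, because `__attribute__((constructor))` is a compiler extension. The flag and timestamp globals and `msSinceEarlyInit` are hypothetical. Both initializers run before `main()`, and a timestamp recorded in a constructor function gives a rough reference point for measuring the time that elapses before `main()`.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Plain scalars are constant-initialized (zero) before any initializer runs,
// so values written by the constructor function below are never overwritten.
static bool g_ctorFunctionRan = false;
static bool g_staticObjectBuilt = false;
static Clock::rep g_earlyInitTicks = 0;

// 1. A constructor function (GCC/Clang extension): runs before main().
__attribute__((constructor)) static void earlyInit() {
    g_ctorFunctionRan = true;
    g_earlyInitTicks = Clock::now().time_since_epoch().count();
}

// 2. A C++ static object of non-trivial type: its constructor also runs before main().
struct Registrar {
    Registrar() { g_staticObjectBuilt = true; }
};
static Registrar g_registrar;

// Milliseconds elapsed since earlyInit() ran (for example, called from main()).
long long msSinceEarlyInit() {
    Clock::duration elapsed = Clock::now().time_since_epoch() - Clock::duration(g_earlyInitTicks);
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

The relative order of the two initializers is not guaranteed, so only the fact that both have run by `main()` should be relied on.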
+load time monitoring
To time +load, you first need to find every class and category that implements it. There are two ways to do this. The first is the approach of the library AppStartTime. It uses the Runtime API to read all classes (and their metaclasses) in the relevant image, walks each metaclass's instance method list, and hooks any method named load. The second is the approach of A4LoadMeasure. It uses getsectiondata at runtime to read the __objc_nlclslist and __objc_nlcatlist sections, which the compiler writes into the __DATA segment of the Mach-O file. These two sections hold the non-lazy class list and the non-lazy category list, respectively. A "non-lazy" class or category is simply one that defines a +load method.
A comparison of the two approaches:
Library | +load timing error | Coverage |
---|---|---|
AppStartTime | ~100 ms | Classes only |
A4LoadMeasure | ~50 ms | Classes and categories |
Based on these results we choose the latter, which also covers +load in categories. Performance is another reason. The former calls object_getClass() on every class in a loop, which triggers the class's realization: allocating storage for its read-write data, adjusting the layout of its instance variables, and attaching category methods and properties, that is, everything needed to make the class usable. With a large number of classes, this adds pre-main time and unnecessary overhead.
#import <Foundation/Foundation.h>
#import <objc/runtime.h>
#import <mach-o/loader.h>
#include <string.h>
// LMLoadInfo, LMLoadInfoWrapper, getDataSection, shouldRejectClass,
// swizzleLoadMethod and LMAllLoadNumber are defined elsewhere in A4LoadMeasure.

// Collect every class and category in this image that implements +load.
static NSArray<LMLoadInfo *> *getNoLazyArray(const struct mach_header *mhdr) {
    NSMutableArray *noLazyArray = [NSMutableArray new];
    unsigned long bytes = 0;
    // __objc_nlclslist: non-lazy classes (classes that define +load)
    Class *clses = (Class *)getDataSection(mhdr, "__objc_nlclslist", &bytes);
    for (unsigned int i = 0; i < bytes / sizeof(Class); i++) {
        LMLoadInfo *info = [[LMLoadInfo alloc] initWithClass:clses[i]];
        if (!shouldRejectClass(info.clsname)) [noLazyArray addObject:info];
    }
    bytes = 0;
    // __objc_nlcatlist: non-lazy categories (categories that define +load)
    Category *cats = (Category *)getDataSection(mhdr, "__objc_nlcatlist", &bytes);
    for (unsigned int i = 0; i < bytes / sizeof(Category); i++) {
        LMLoadInfo *info = [[LMLoadInfo alloc] initWithCategory:cats[i]];
        if (!shouldRejectClass(info.clsname)) [noLazyArray addObject:info];
    }
    return noLazyArray;
}

// Swizzle every +load implementation on the metaclass. A class whose
// categories also define +load carries several methods named "load".
static void hookAllLoadMethods(LMLoadInfoWrapper *infoWrapper) {
    unsigned int count = 0;
    Class metaCls = object_getClass(infoWrapper.cls);
    Method *methodList = class_copyMethodList(metaCls, &count);
    for (unsigned int i = 0, j = 0; i < count; i++) {
        Method method = methodList[i];
        const char *name = sel_getName(method_getName(method));
        if (!strcmp(name, "load")) {
            LMLoadInfo *info = nil;
            // Use j >= count rather than j > count - 1: count is unsigned,
            // so count - 1 wraps around when infos is empty.
            if (j >= infoWrapper.infos.count) {
                info = [[LMLoadInfo alloc] initWithClass:infoWrapper.cls];
                [infoWrapper insertLoadInfo:info];
                LMAllLoadNumber++;
            } else {
                info = infoWrapper.infos[j];
            }
            ++j;
            swizzleLoadMethod(infoWrapper.cls, method, info);
        }
    }
    free(methodList);
}
A4LoadMeasure uses LMAllLoadNumber, the count of hooked +load methods, to detect the last call and print the results at that point. This counting introduces some measurement error and is a bit of a hack.
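The hook-and-count idea can be illustrated without the Objective-C runtime. In this minimal C++ sketch, each hypothetical `LoadEntry` stands in for one +load implementation. The hypothetical `hookAll` replaces each implementation with a timing wrapper, in the same spirit as swizzleLoadMethod, and a countdown plays the role of LMAllLoadNumber, triggering the report when the last hooked call returns.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one class's +load implementation.
struct LoadEntry {
    std::string name;
    std::function<void()> imp;
};

struct LoadTiming {
    std::string name;
    long long micros;
};

static std::vector<LoadTiming> g_timings;
static int g_remaining = 0;    // plays the role of LMAllLoadNumber
static bool g_reported = false;

// Replace each implementation with a wrapper that times the original call,
// then "reports" once the last hooked entry has finished.
void hookAll(std::vector<LoadEntry>& entries) {
    g_remaining = static_cast<int>(entries.size());
    for (auto& e : entries) {
        auto original = e.imp;
        auto name = e.name;
        e.imp = [original, name] {
            auto start = std::chrono::steady_clock::now();
            original();
            auto end = std::chrono::steady_clock::now();
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
            g_timings.push_back({name, static_cast<long long>(us)});
            if (--g_remaining == 0) g_reported = true;  // last one: print the results here
        };
    }
}
```

The sketch also shows why counting is fragile. The report fires only when the countdown reaches exactly zero. If the total counted is larger than the number of calls that really happen, the final print never runs. If the total is smaller, the print runs too early.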
__attribute__((constructor)) and C++ static initialization
__attribute__ is a compiler extension from GNU C (see iOS attribute for an introduction). Functions marked __attribute__((constructor)) are called in the following order relative to +load, main, and +initialize:
+load -> __attribute__((constructor)) -> main -> +initialize
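The C/C++ part of this order can be checked directly. This is a minimal sketch that assumes GCC or Clang. A constructor function logs an event before `main()`. A lazily run function logs an event on first use. That function is a hypothetical `ensureInitialized`, standing in for +initialize, which the runtime calls when a class receives its first message. +load is Objective-C only and is not modeled here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Function-local static, so it is safe to use from a constructor function.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// Runs before main(), like the constructor step in the order above.
__attribute__((constructor)) static void recordConstructor() {
    events().push_back("constructor");
}

// Rough analogue of +initialize: runs once, the first time it is needed.
void ensureInitialized() {
    static bool once = [] {
        events().push_back("initialize");
        return true;
    }();
    (void)once;
}
```

If `main()` logs `"main"` and then calls `ensureInitialized()` twice, the log reads constructor, main, initialize.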
Next, let's compare the two third-party libraries again:
Library | Static initializer timing error |
---|---|
AppStartTime | ~30 ms |
A4LoadMeasure | ~40 ms |
The two results are close, and in both cases the key output is the pointer of each initializer function. What I do not understand is why A4LoadMeasure reports the time of __attribute__((constructor)) functions under C++ static initializers. I prefer the approach of reading the __mod_init_func section, which holds the addresses of the initializer functions.
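Reading the initializer pointers and timing each one follows the same loop structure as the inner loop of doModInitFunctions. This is a minimal, portable C++ sketch. Instead of locating `__mod_init_func` in a Mach-O image, it takes a plain array of function pointers. `Initializer` is simplified to `void (*)()` (dyld's version also receives argc, argv, envp and more), and `runAndTime` is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Simplified initializer signature (assumption; dyld passes more arguments).
using Initializer = void (*)();

struct InitTiming {
    Initializer func;  // the function pointer, as the libraries above print it
    double ms;         // wall-clock time spent in the call
};

// Call each initializer in order and record how long each one took.
std::vector<InitTiming> runAndTime(const Initializer* inits, std::size_t count) {
    std::vector<InitTiming> out;
    for (std::size_t j = 0; j < count; ++j) {
        auto start = std::chrono::steady_clock::now();
        inits[j]();
        std::chrono::duration<double, std::milli> elapsed = std::chrono::steady_clock::now() - start;
        out.push_back({inits[j], elapsed.count()});
    }
    return out;
}
```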
Initializers are reached through the following call chain:
initializeMainExecutable -> ImageLoader::runInitializers -> ImageLoader::doInitialization -> ImageLoaderMachO::doModInitFunctions
The last function contains the core logic. Its source is below:
// This function handles __mod_init_func
void ImageLoaderMachO::doModInitFunctions(const LinkContext& context)
{
    if ( fHasInitializers ) {
        const uint32_t cmd_count = ((macho_header*)fMachOData)->ncmds;
        const struct load_command* const cmds = (struct load_command*)&fMachOData[sizeof(macho_header)];
        const struct load_command* cmd = cmds;
        for (uint32_t i = 0; i < cmd_count; ++i) {
            if ( cmd->cmd == LC_SEGMENT_COMMAND ) {
                const struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
                const struct macho_section* const sectionsStart = (struct macho_section*)((char*)seg + sizeof(struct macho_segment_command));
                const struct macho_section* const sectionsEnd = &sectionsStart[seg->nsects];
                for (const struct macho_section* sect=sectionsStart; sect < sectionsEnd; ++sect) {
                    const uint8_t type = sect->flags & SECTION_TYPE;
                    if ( type == S_MOD_INIT_FUNC_POINTERS ) {
                        Initializer* inits = (Initializer*)(sect->addr + fSlide);
                        const size_t count = sect->size / sizeof(uintptr_t);
                        for (size_t j=0; j < count; ++j) {
                            Initializer func = inits[j];
                            // <rdar://problem/8543820&9228031> verify initializers are in image
                            if ( ! this->containsAddress((void*)func) ) {
                                dyld::throwf("initializer function %p not in mapped image for %s\n", func, this->getPath());
                            }
                            func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
                        }
                    }
                }
            }
            cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
        }
    }
}
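Note the check if (!this->containsAddress((void*)func)): every initializer pointer must lie inside the image currently being initialized. Suppose a measurement tool hooks initializers by replacing the pointers in another image's __mod_init_func with a function from its own image. When that other image is initialized, the check fails and dyld throws an exception. In a demo project this rarely shows up, but the more complex the project structure (for example, the more dynamic libraries it has), the more likely the problem becomes.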
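The check, together with the slide arithmetic (`sect->addr + fSlide`), can be modeled with plain integers. In this minimal sketch, `Image`, `verifyInitializers` and `slid` are hypothetical. A pointer to a hook that lives in another image, such as a measurement library's own code, falls outside the image's mapped range and triggers the same kind of exception that dyld throws.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical model of a loaded image and its address range after ASLR.
struct Image {
    const char* path;
    std::uintptr_t start;  // unslid base + slide
    std::uintptr_t end;    // one past the last mapped byte

    bool containsAddress(std::uintptr_t addr) const { return addr >= start && addr < end; }
};

// Mirrors the check in doModInitFunctions: every initializer pointer must
// point into the image being initialized, otherwise throw.
void verifyInitializers(const Image& image, const std::vector<std::uintptr_t>& inits) {
    for (std::uintptr_t func : inits) {
        if (!image.containsAddress(func)) {
            throw std::runtime_error("initializer function not in mapped image");
        }
    }
}

// Rebase-style arithmetic: an unslid (on-disk) address plus the ASLR slide.
std::uintptr_t slid(std::uintptr_t unslidAddr, std::uintptr_t slide) { return unslidAddr + slide; }
```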
Three. Project description
The project can currently be integrated with CocoaPods:
pod 'A0PreMainTime'
# ****** Or add the subspecs individually ******
# Pre-main time measurement
pod 'A0PreMainTime/PreMainTime'
# Business logic time measurement
pod 'A0PreMainTime/TimeMonitor'
For details, see A0PreMainTime