Do you get annoyed when you tap an app icon on your home screen and have to wait for the first page to appear, maybe to the point of uninstalling the app? For user experience and retention, fast startup is a must.

One: Startup types

“Cold start” and “hot start”

  • Cold start: before the user taps the app, its process is not yet in the system. The system has to create a new process and assign it to the app. This is a complete app startup.

Ideally startup finishes within 400 ms, because the launch animation lasts 400 ms.

  • Hot start: after a cold start, the user sends the app to the background while its process is still in the system, and then returns to the app. (A hot start does much less work.)

When we talk about startup optimization, we mean the cold start.

A cold start is mainly divided into three stages:

  • Before main() executes (the pre-main phase).
  • After main() executes (from the start of main until setting self.window.rootViewController completes).
  • After the first screen is rendered (from setting self.window.rootViewController until the didFinishLaunchingWithOptions method returns).
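To make the three stages concrete, the sketch below splits a cold start into stage durations from four timestamps. The struct names, the field names, and the idea of recording millisecond marks by hand are assumptions for illustration; the article does not prescribe how the marks are collected.

```c
#include <assert.h>

/* Hypothetical timestamps, in milliseconds since the process was created. */
typedef struct {
    double process_start_ms;      /* process created after the tap          */
    double main_entered_ms;       /* main() begins executing                */
    double root_vc_set_ms;        /* self.window.rootViewController is set  */
    double did_finish_launch_ms;  /* didFinishLaunchingWithOptions returns  */
} LaunchMarks;

typedef struct {
    double pre_main_ms;         /* stage 1 */
    double main_to_root_vc_ms;  /* stage 2 */
    double first_screen_ms;     /* stage 3 */
    double total_ms;
} LaunchStages;

/* Splits a cold start into the three stages described above. */
LaunchStages split_launch(LaunchMarks m) {
    /* The marks must be recorded in chronological order. */
    assert(m.process_start_ms <= m.main_entered_ms);
    assert(m.main_entered_ms <= m.root_vc_set_ms);
    assert(m.root_vc_set_ms <= m.did_finish_launch_ms);

    LaunchStages s;
    s.pre_main_ms        = m.main_entered_ms - m.process_start_ms;
    s.main_to_root_vc_ms = m.root_vc_set_ms - m.main_entered_ms;
    s.first_screen_ms    = m.did_finish_launch_ms - m.root_vc_set_ms;
    s.total_ms           = m.did_finish_launch_ms - m.process_start_ms;
    return s;
}
```

Comparing total_ms against the 400 ms target shows at a glance which stage uses up most of the budget.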

Startup time (Pre-main phase)

In the pre-main phase we can measure the elapsed time by adding an environment variable.

In Xcode, choose Product → Scheme → Edit Scheme… → Run → Arguments → Environment Variables, click + and add an environment variable named DYLD_PRINT_STATISTICS with the value 1.

Below is the actual project launch time:

Reading:

The phase before main() took about 1.7 seconds in total:

  • Dylib loading time: loading the dynamic libraries took 64.05 ms.

    • Loading dynamic libraries takes time, and dynamic libraries have dependencies of their own. System dynamic libraries live in the shared cache, but custom dynamic libraries cannot use it, so they cost more to load. Apple therefore recommends no more than six custom dynamic libraries; beyond that, merge them to reduce load time.

    • Merging dynamic libraries requires their source code, so we can only merge our own libraries. Third-party SDKs in everyday use usually cannot be merged.

  • Rebase/Binding time: pointer offset correction and symbol binding took 213.88 ms.

    • Rebase: the system uses ASLR to randomize the address space, so at runtime the pointers inside the image have to be corrected by rebasing: ASLR slide + offset address.

    • Binding: external symbols are used whose function addresses cannot be known at compile time. So at runtime dyld loads the shared cache and the linked dynamic libraries, and then binds the external symbols to their addresses.

  • ObjC setup time: Objective-C class initialization took 789.76 ms.

    • This is the Objective-C class registration process: the runtime reads the data segments of the binary to find the Objective-C metadata and registers the classes. At launch the runtime builds two tables, one for classes and one for categories, and every registered class and category is inserted into them, which costs time.

    • This time is hard to optimize, except by reducing the number of classes and categories defined in the project.

    • Reduce the use of +load methods in classes and categories so that classes are loaded lazily.

  • Initializer time: executing +load methods and constructors took 726.45 ms.

    • Use +initialize instead of +load whenever possible.
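The difference between work done before main() and work deferred to first use can be shown in plain C, which Objective-C files also accept. This is a minimal sketch, not the runtime's +load or +initialize machinery: __attribute__((constructor)) is a real Clang/GCC attribute whose functions run before main() and therefore count toward initializer time, while eager_setup, lazy_setup_if_needed, feature_value, and the two counters are hypothetical names.

```c
#include <assert.h>

/* Hypothetical counters that make the ordering observable. */
static int eager_runs = 0;  /* work done before main(), like +load      */
static int lazy_runs  = 0;  /* work done on first use, like +initialize */

/* Runs before main() on Clang and GCC, so its cost is paid at every launch. */
__attribute__((constructor))
static void eager_setup(void) {
    eager_runs++;
}

/* Runs only when the feature is first needed (single-threaded sketch). */
static void lazy_setup_if_needed(void) {
    if (lazy_runs == 0) {
        lazy_runs++;
    }
}

/* Entry point of the hypothetical feature. */
int feature_value(void) {
    lazy_setup_if_needed();
    assert(lazy_runs == 1);  /* setup has happened exactly once */
    return 42;
}
```

By the time main() starts, eager_setup has already run, whether or not the feature is ever used; the lazy path costs nothing until feature_value is first called.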

The whole process is shown below:

Tools

Time Profiler

In the Call Tree options at the bottom of the window, check Hide System Libraries to filter out system libraries and see the time taken by methods on the main thread.

But Time Profiler is only good for coarse-grained analysis. Let’s see how it works:

By default, Time Profiler samples every 1 ms, collects only the call stacks of running threads, and then aggregates them statistically. For example, method3 is not hit by any of the five samples shown in the figure below, so method3 does not appear in the aggregated stacks. The time shown in Time Profiler is therefore not the actual execution time of the code, but the time estimated from how often a stack appears in the samples.
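The sampling effect can be reproduced with a small simulation. This is a sketch under assumptions: the Span struct, the microsecond timeline, and samples_for_method are invented for illustration and are not part of Instruments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trace entry: a method running from start_us to end_us. */
typedef struct {
    int  method_id;
    long start_us;
    long end_us;  /* exclusive */
} Span;

/* Counts how many sampling ticks land inside spans of the given method.
   Ticks occur at 0, interval_us, 2 * interval_us, ... below total_us. */
int samples_for_method(const Span *spans, size_t count, int method_id,
                       long interval_us, long total_us) {
    assert(interval_us > 0);
    int hits = 0;
    for (long t = 0; t < total_us; t += interval_us) {
        for (size_t i = 0; i < count; i++) {
            if (spans[i].method_id == method_id &&
                t >= spans[i].start_us && t < spans[i].end_us) {
                hits++;
            }
        }
    }
    return hits;
}
```

With a 1 ms interval over 5 ms, a method that runs from 2.5 ms to 2.9 ms falls between two ticks and gets zero samples, even though it really executed for 0.4 ms.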

System Trace

System Trace can support refined analysis.

Fine-grained analysis means looking at a short, marked period of time, which can be marked with Points of Interest. In addition, System Trace is useful for analyzing virtual memory and thread state:

  • Virtual Memory: focus on Page In events, because the startup path triggers many of them and they are relatively expensive.
  • Thread State: focus on the blocked and preempted states, keeping in mind that the main thread is not always running.
  • System Load: threads have priorities, and the number of high-priority threads should not exceed the number of CPU cores.

Two: Practice

The pre-main stage

Reducing dynamic libraries

Reducing the number of dynamic libraries reduces the time spent creating the launch closure and loading dynamic libraries at startup. It is recommended to keep the number of dynamic libraries below six.

The recommended approach is to switch from dynamic libraries to static libraries, which also reduces package size. Another way is to merge dynamic libraries, but this is often not feasible in practice. Finally, don’t link libraries (including system libraries) that you don’t need, because they slow down closure creation.

Tips: How to view dynamic libraries?

  • In the project’s Products folder, find the .app file, right-click it and choose Show in Finder.
  • In that directory, right-click the .app and choose Show Package Contents.
  • Find the Frameworks folder and open it.

Remove useless classes and code

Removing dead code reduces the time spent on rebase, bind, and runtime initialization. You can also try merging classes and categories with similar functionality.

AppCode can be used to detect useless code.

+load migration

Besides the time the method itself takes, +load also causes a lot of Page In events.

  • Option 1: if possible, move the work done in +load until after the first screen has rendered.

  • Option 2: use +initialize instead of +load, and use dispatch_once() instead of +initialize.
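The run-exactly-once idea behind dispatch_once can be sketched in portable C11. dispatch_once itself is Apple's API and is not used here; OnceToken, run_once, expensive_setup, and use_feature are hypothetical names, and the spin-wait is a simplification that a production implementation would replace with proper blocking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { ONCE_IDLE = 0, ONCE_RUNNING = 1, ONCE_DONE = 2 };

/* A zero-initialized static token starts in the ONCE_IDLE state. */
typedef struct {
    atomic_int state;
} OnceToken;

/* Runs fn at most once per token, even when called from several threads.
   fn must not call run_once on the same token, or it would wait forever. */
void run_once(OnceToken *token, void (*fn)(void)) {
    assert(token != NULL && fn != NULL);
    int expected = ONCE_IDLE;
    if (atomic_compare_exchange_strong(&token->state, &expected, ONCE_RUNNING)) {
        fn();  /* this caller won the race and performs the setup */
        atomic_store(&token->state, ONCE_DONE);
        return;
    }
    /* Another caller is running or has finished; wait until it is done. */
    while (atomic_load(&token->state) != ONCE_DONE) {
        /* spin */
    }
}

static int setup_calls = 0;
static void expensive_setup(void) { setup_calls++; }
static OnceToken setup_token;

/* The setup cost is paid on first use instead of before main(). */
void use_feature(void) {
    run_once(&setup_token, expensive_setup);
}
```

Calling use_feature repeatedly performs the setup only on the first call, which is the property that lets work move out of +load and +initialize.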

Static initializer migration

Static initializers, like +load methods, also cause a lot of Page In events. They usually come from C++ code, so reduce the number of C++ static global variables.

After main

Image resources

Startup loads a lot of images. Is there a way to reduce the time spent loading them?

Manage images in an asset catalog instead of placing them directly in the bundle. Asset catalogs are optimized at compile time, and loading an image from one is faster because [UIImage imageNamed:] otherwise has to search the bundle for the file. When loading from an asset catalog, most of the time is spent on the first lookup, so putting the images needed at startup into a small, separate asset catalog reduces that lookup cost.

Creating a UIImage requires I/O, and the image is decoded while the first frame is rendered. You can hide this cost by preloading on a background thread, that is, by creating the UIImage objects ahead of time.

As shown below, images are only used in the later stages, “RootWindow creation” and “first frame render”, so the preloading thread can be started early in the startup process.

Phased loading

didFinishLaunchingWithOptions always contains some initialization. This initialization has to happen, but depending on what each piece is for, we can delay when it starts. For our project, I divided initialization into three types:

  • Logs, statistics, and other events that must be configured as soon as the app starts
  • Events such as project configuration, environment configuration, user information initialization, push, and IM
  • Other SDK and configuration events

The first type must start first because of its nature, so it stays in didFinishLaunchingWithOptions. The second type must be ready before the user enters the main body of the app, so it goes into a second batch that starts once the user sees the ad page and the ad countdown begins. The third type is not required up front, so it can be placed in viewDidAppear after the first screen has rendered, where it has no impact on startup time.
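The three batches can be modeled as a task table that is run one stage at a time. This is a sketch under assumptions: the Stage enum, StartupTask, run_stage, and the three example tasks are hypothetical and only stand in for the real SDK setup calls.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    STAGE_LAUNCH,       /* must run inside didFinishLaunchingWithOptions */
    STAGE_AD_PAGE,      /* may run once the ad page is on screen         */
    STAGE_FIRST_SCREEN  /* may wait until the first screen has appeared  */
} Stage;

typedef struct {
    const char *name;
    Stage       stage;
    void      (*run)(void);
} StartupTask;

/* Hypothetical tasks; each one only records that it ran. */
static int logging_ready, push_ready, share_sdk_ready;
static void setup_logging(void)   { logging_ready = 1; }
static void setup_push(void)      { push_ready = 1; }
static void setup_share_sdk(void) { share_sdk_ready = 1; }

static const StartupTask kStartupTasks[] = {
    { "logging",   STAGE_LAUNCH,       setup_logging   },
    { "push",      STAGE_AD_PAGE,      setup_push      },
    { "share SDK", STAGE_FIRST_SCREEN, setup_share_sdk },
};
static const size_t kStartupTaskCount =
    sizeof kStartupTasks / sizeof kStartupTasks[0];

/* Runs every task registered for the given stage; returns how many ran. */
size_t run_stage(Stage stage) {
    size_t executed = 0;
    for (size_t i = 0; i < kStartupTaskCount; i++) {
        if (kStartupTasks[i].stage == stage) {
            assert(kStartupTasks[i].run != NULL);
            kStartupTasks[i].run();
            executed++;
        }
    }
    return executed;
}
```

In an app, run_stage(STAGE_LAUNCH) would be called from didFinishLaunchingWithOptions, run_stage(STAGE_AD_PAGE) when the ad page appears, and run_stage(STAGE_FIRST_SCREEN) from viewDidAppear of the first screen.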

The home page UI uses pure code

Build the home page UI in code instead of loading it from a storyboard; a storyboard costs more resources at startup.

Three: Binary rearrangement

Physical memory and virtual memory

Virtual memory vs. physical memory

Physical memory: physical memory is the memory on the memory modules. In the early days, all the data of a process was loaded into physical memory, and the CPU accessed process data through physical memory addresses. This approach leads to the following problems:

  • Out of memory: If too many applications are started, the memory space is insufficient.
  • Waste of memory: When an application becomes larger and larger, users may only use part of its functions. If all functions are loaded into the memory, memory usage will be wasted.
  • Memory data security issues: You can directly modify the data in the physical memory by accessing the physical address.

In order to solve these problems of physical memory, the CPU can access process data indirectly through virtual memory rather than directly through physical memory address.

Virtual memory: virtual memory is an intermediate layer between processes and physical memory. It is created by the system and managed by paging. The structure of virtual memory is shown as follows:

Each process has its own virtual address space; on a 32-bit system its size is 4 GB. Virtual memory is divided into pages. Each page is 16 KB on iOS and 4 KB on many other systems. For each page, the system records the mapping between the virtual addresses of the data in it and the corresponding physical addresses. Virtual memory is therefore essentially a mapping table from the virtual addresses of a process's data to physical addresses.
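The page arithmetic implied by these sizes is simple enough to write down. The helper names and constants below are hypothetical; only the 16 KB and 4 KB page sizes come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_IOS   16384u  /* 16 KB, as stated above for iOS */
#define PAGE_SIZE_OTHER 4096u   /* 4 KB                           */

/* Index of the page that contains the given virtual address. */
uint64_t page_index(uint64_t address, uint64_t page_size) {
    assert(page_size > 0);
    return address / page_size;
}

/* Offset of the address within its page. */
uint64_t page_offset(uint64_t address, uint64_t page_size) {
    assert(page_size > 0);
    return address % page_size;
}

/* Number of pages touched by a block of `length` bytes starting at `start`. */
uint64_t pages_spanned(uint64_t start, uint64_t length, uint64_t page_size) {
    assert(page_size > 0 && length > 0);
    return page_index(start + length - 1, page_size)
         - page_index(start, page_size) + 1;
}
```

A 1000-byte function that starts at offset 16000 straddles two 16 KB pages, which is why where a function lands in the binary matters as much as how big it is.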

When virtual memory is used, the CPU accesses process data as follows:

  • After the process starts, the system creates a virtual address space for it, which records the virtual address of each piece of process data. At this point the process has not been loaded into physical memory, so the physical address recorded for each page is 0x00000… .
  • When part of the process becomes active, the CPU finds the corresponding physical address from the virtual address of the data and accesses the data in physical memory through that physical address.
  • If no physical address is found for a page, meaning the process data associated with that page has not been loaded into physical memory, a Page Fault is triggered. The current process is interrupted, the data for that page is loaded into physical memory, the page then records the physical address of the data, and the CPU accesses the data through the physical address.
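The steps above can be mimicked with a toy model in which a page is loaded the first time any address in it is touched. This is a simulation with invented names (SimAddressSpace, sim_access), not a description of the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE  16384u
#define SIM_PAGE_COUNT 64u

/* Toy address space: one "resident" flag per page plus a fault counter. */
typedef struct {
    bool     resident[SIM_PAGE_COUNT];
    unsigned fault_count;
} SimAddressSpace;

/* Simulates one memory access. Returns true if the access caused a page
   fault, that is, if the page had to be loaded before the access. */
bool sim_access(SimAddressSpace *space, uint64_t virtual_address) {
    uint64_t page = virtual_address / SIM_PAGE_SIZE;
    assert(space != NULL && page < SIM_PAGE_COUNT);
    if (space->resident[page]) {
        return false;              /* already mapped: no fault          */
    }
    space->resident[page] = true;  /* "load" the page into physical RAM */
    space->fault_count++;
    return true;
}
```

The first access to each page faults and later accesses to the same page do not, so the number of faults during startup equals the number of distinct pages the startup code touches.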

Therefore, virtual memory has the following advantages over direct access to physical memory:

  • More efficient memory usage: with paging, only the data associated with active pages is loaded into physical memory. When physical memory is full, inactive pages are overwritten and the data of active pages is loaded, which makes memory use more efficient.
  • Memory data is more secure: each time a process starts, the system recreates its virtual memory and assigns it an ASLR (Address Space Layout Randomization) value. The virtual address of a piece of data is the ASLR random value + the offset value, so the virtual address changes on every launch. The CPU accesses physical memory indirectly through virtual memory, and physical addresses are never exposed in the process, which keeps memory data secure.
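The "ASLR random value + offset value" rule is plain addition, as the sketch below shows. The function names and the sample slide and offset values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual address at runtime = random ASLR slide + fixed offset. */
uint64_t runtime_address(uint64_t aslr_slide, uint64_t offset) {
    return aslr_slide + offset;
}

/* Recovers the fixed offset from a runtime address and the slide. */
uint64_t unslide_address(uint64_t address, uint64_t aslr_slide) {
    assert(address >= aslr_slide);
    return address - aslr_slide;
}
```

Two launches with different slides place the same function at different virtual addresses, while subtracting the slide always yields the same offset. That fixed offset is what tools use to map a runtime address back to a symbol.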

Without randomization, a program’s code would be loaded at the same virtual address on every launch, which is not safe. ASLR emerged to solve this fixed-address problem.

ASLR

ASLR (Address Space Layout Randomization) is a security technique against buffer overflow attacks. By randomizing the layout of linear regions such as the heap, the stack, and shared library mappings, it makes the target address harder for an attacker to predict and prevents the attacker from locating the attack code directly, which helps prevent overflow attacks.

Most major operating systems already implement ASLR:

  • Linux: ASLR was added in kernel version 2.6.12;
  • Windows: Windows Server 2008, Windows 7, Windows Vista, and Windows Server 2008 R2 enable ASLR by default, but it only applies to dynamically linked libraries and executables;
  • Mac OS X: in Mac OS X Leopard 10.5 (2007), Apple introduced random address offsets for some libraries, but the implementation did not provide the complete protection that ASLR defines. Mac OS X Lion 10.7 added ASLR support for all applications; Apple claims the improved support helps both 32-bit and 64-bit applications avoid more of these attacks. Starting with OS X Mountain Lion 10.8, the kernel, kernel extensions (kexts), and zones are also randomly placed during system startup.
  • iOS (iPhone, iPod touch, iPad): Apple introduced ASLR in iOS 4.3;
  • Android: Android 4.0 provides address space layout randomization (ASLR) to help protect the system and third-party applications from attacks that exploit memory management problems, and Android 4.1 added support for position-independent code.

When a process accesses a virtual page whose data has not been loaded into physical memory, a Page Fault occurs and the process is blocked until the system has loaded the data into physical memory. Each page is loaded quickly, within milliseconds, but a cold start can trigger a large number of page faults, which adds up to a noticeable cost in startup time.

The principle of binary rearrangement

Let’s start with an example

#import "ViewController.h"
@interface ViewController ()
@end
@implementation ViewController
+ (void)test1 {
    NSLog(@"test1");
}

- (void)viewDidLoad {
    [super viewDidLoad];
    [self test2];
}

- (void)test2 {
    NSLog(@"test2");
}

+ (void)load {
    [self test1];
}
@end

When this code runs, the methods execute in the order load, test1, viewDidLoad, test2.

But what is the actual order of symbols?

The Link Map File records the symbol order of the project at link time, listed method by method and function by function. After enabling it in the build settings (Write Link Map File), you can view the symbol order as shown in the following figure:

Let’s compare the order in which the code was written with the order in which the symbols end up in the binary.

We find that the symbol order follows the order in which the files are compiled, and within each file the methods appear in the order they are written, from top to bottom.

With this ordering, the code needed at startup is spread across many pages. The startup methods are not clustered together, which results in a large number of page faults.

All binary rearrangement does is place the methods that are called at startup next to each other.
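The effect of clustering can be estimated with a small layout simulation. This sketch assumes functions are laid out back to back in list order; the Func struct, the sizes in kDemoLayout, and both helpers are invented for illustration and do not model the real linker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    size;             /* code size in bytes */
    bool        used_at_startup;
} Func;

/* Hypothetical layout: startup methods interleaved with unused code. */
static const Func kDemoLayout[7] = {
    { "load",        8192,  true  }, { "unusedA", 16384, false },
    { "test1",       8192,  true  }, { "unusedB", 16384, false },
    { "viewDidLoad", 8192,  true  }, { "unusedC", 16384, false },
    { "test2",       8192,  true  },
};

/* Counts the distinct pages that contain code of a startup function. */
size_t startup_pages(const Func *funcs, size_t count, uint32_t page_size) {
    assert(page_size > 0);
    uint64_t offset = 0, next_uncounted_page = 0;
    size_t pages = 0;
    for (size_t i = 0; i < count; i++) {
        assert(funcs[i].size > 0);
        uint64_t first = offset / page_size;
        uint64_t last  = (offset + funcs[i].size - 1) / page_size;
        if (funcs[i].used_at_startup) {
            if (first < next_uncounted_page) first = next_uncounted_page;
            if (last >= first) {  /* count only pages not counted before */
                pages += (size_t)(last - first + 1);
                next_uncounted_page = last + 1;
            }
        }
        offset += funcs[i].size;
    }
    return pages;
}

/* Rearrangement: copy startup functions first, keeping relative order. */
void move_startup_first(const Func *in, Func *out, size_t count) {
    size_t n = 0;
    for (size_t i = 0; i < count; i++)
        if (in[i].used_at_startup) out[n++] = in[i];
    for (size_t i = 0; i < count; i++)
        if (!in[i].used_at_startup) out[n++] = in[i];
}
```

With 16 KB pages, the interleaved demo layout touches four pages at startup, and the rearranged layout touches two, because the four 8 KB startup methods now fill two pages completely.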

Viewing the number of Page Faults

Open Instruments:

Click Start, stop recording once the home page has appeared, select Main Thread, then select Virtual Memory. Here File Backed Page In shows 2451.

Clang instrumentation

Binary rearrangement is a technique Apple itself already uses. The key is an .order file: the linker lays out methods in the order of the symbols listed in that file.

Below are the symbols in the .order file of objc4-750:

Step 1: Build Settings configuration

Using the demo above, create an .order file in the project directory and add the following symbols, listed in the order in which the demo actually calls the methods. Then recompile and look at the link map file again.

+[ViewController load]
+[ViewController test1]
-[ViewController viewDidLoad]
-[ViewController test2]

The final symbol order is the same as the order in the .order file.

So how do I get the project to load the .order file?

Perform the following configuration:

In Build Settings → Other C Flags, add -fsanitize-coverage=trace-pc-guard

-fsanitize-coverage=trace-pc-guard

As described above, Objective-C is compiled with Clang (front end) + LLVM (back end). Clang performs lexical analysis, syntax analysis, syntax tree generation, and other early-stage work on the code. Adding -fsanitize-coverage=trace-pc-guard tells Clang that we want to trace execution: Clang inserts a call to __sanitizer_cov_trace_pc_guard at the edges of every method, function, and block (by default this also covers control-flow edges inside them, such as loops). Every call therefore passes through __sanitizer_cov_trace_pc_guard, which completes the instrumentation and is equivalent to a hook installed at compile time.

Step 2: Add auxiliary code

After adding the flag, a normal build reports errors because the callbacks __sanitizer_cov_trace_pc_guard_init and __sanitizer_cov_trace_pc_guard are not defined yet.

Add these two functions as shown in the Clang documentation, with the call to __sanitizer_symbolize_pc commented out:

#import "ViewController.h"
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

@interface ViewController ()
@end

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];
}

// Called once per module (executable or DSO) before any guard is hit.
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t N;  // Counter for the guards.
    if (start == stop || *start) return;  // Initialize only once.
    printf("INIT: %p %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++)
        *x = ++N;  // Guards should start from 1.
}

// Called on every instrumented edge.
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;  // Duplicate the guard check.
    void *PC = __builtin_return_address(0);
    char PcDescr[1024] = "";  // Stays empty while the next line is commented out.
    // __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
    printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}

@end

__sanitizer_cov_trace_pc_guard_init

The compiler inserts this callback into each DSO as a module constructor. start and stop mark the beginning and end of the guards for the entire binary (executable or DSO), so the range between them reflects the number of instrumented symbols.

start:0x104ded4c0 stop:0x104ded4f8

  • Subtracting 4 bytes from the stop address gives the address of the last guard. The guards from start to stop hold the values 1 to 14. Repeated experiments show that each time a new method (function, block) is added to the code, the count increases by 1.
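The numbering can be reproduced outside the sanitizer runtime. The sketch below mirrors the loop of the Clang sample under hypothetical names (guard_counter, init_guards) and operates on an ordinary array instead of the compiler-generated guard section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shared across calls, like the static counter N in the Clang sample. */
static uint32_t guard_counter;

/* Numbers the guards in [start, stop) with consecutive ids, starting at 1
   for the first module. Returns the number of guards it initialized, or 0
   if the range is empty or was already initialized. */
size_t init_guards(uint32_t *start, uint32_t *stop) {
    assert(start <= stop);
    if (start == stop || *start) return 0;  /* initialize only once */
    for (uint32_t *guard = start; guard < stop; guard++) {
        *guard = ++guard_counter;
    }
    return (size_t)(stop - start);
}
```

Each guard is a 4-byte uint32_t, so the 0x38 bytes between the start and stop addresses printed above correspond to 56 / 4 = 14 guards, and the last guard sits at stop minus 4 bytes.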

__sanitizer_cov_trace_pc_guard

Add the touchesBegan method, set a breakpoint in it, and open the disassembly.

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
    NSLog(@"touchesBegan method");
}

The disassembly shows that __sanitizer_cov_trace_pc_guard is called before the method body executes. This call is there because we added -fsanitize-coverage=trace-pc-guard.

Step 3: Get the symbols and generate the .order file

In __sanitizer_cov_trace_pc_guard, find the line void *PC = __builtin_return_address(0); and set a breakpoint on it.

Printing PC gives the address 0x0000000102355eb8

The corresponding stack information is printed

Open the disassembly and take a look

Because Clang inserts a call to __sanitizer_cov_trace_pc_guard at the edge of each method, the callback has to return into the instrumented method once it finishes. The return address here is 0x0000000102355eb8, which lies in main, so execution returns to main.

__builtin_return_address(0) returns the return address of the current function, that is, an address inside its caller (here main).

Next we import <dlfcn.h> and get the function information through Dl_info

typedef struct dl_info {
        const char      *dli_fname;     /* Pathname of shared object */
        void            *dli_fbase;     /* Base address of shared object */
        const char      *dli_sname;     /* Name of nearest symbol */
        void            *dli_saddr;     /* Address of nearest symbol */
} Dl_info;

This dli_sname is exactly the symbol name we want!

Saving symbols in an atomic queue and reading them back

#import "ViewController.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sanitizer/coverage_interface.h>
#import <dlfcn.h>
#import <libkern/OSAtomic.h>

@interface ViewController ()
@end

@implementation ViewController

// Atomic queue that stores every PC seen by the guard callback
static OSQueueHead symbolList = OS_ATOMIC_QUEUE_INIT;

// Queue node
typedef struct {
    void *pc;    // saved PC address
    void *next;
} WJNode;

void (^testBlock)(void) = ^(void) {
    NSLog(@"block called");
};

void test1(void) {
    NSLog(@"test1 called");
    testBlock();
}

- (void)viewDidLoad {
    [super viewDidLoad];
    test1();
}

void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t N;
    if (start == stop || *start) return;
    printf("INIT: %p %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++)
        *x = ++N;
}

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;
    void *PC = __builtin_return_address(0);
    // Save the PC; it is resolved to a symbol later with dladdr
    WJNode *node = malloc(sizeof(WJNode));
    *node = (WJNode){PC, NULL};
    // Add the node to the queue
    OSAtomicEnqueue(&symbolList, node, offsetof(WJNode, next));
}

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
    NSLog(@"touchesBegan");
    while (YES) {
        WJNode *node = OSAtomicDequeue(&symbolList, offsetof(WJNode, next));
        if (node == NULL) {
            break;
        }
        Dl_info info = {0};
        dladdr(node->pc, &info);
        free(node);
        printf("%s \n", info.dli_sname);
    }
}

@end
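OSAtomicEnqueue and OSAtomicDequeue behave as a last-in, first-out queue, so symbols come back newest first. A portable sketch of the same behavior with C11 atomics is shown below; Node, AtomicStack, stack_push, and stack_pop are hypothetical names and this is not Apple's implementation. The pop side is only safe with a single consumer, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct Node {
    void        *pc;    /* saved return address */
    struct Node *next;
} Node;

typedef struct {
    _Atomic(Node *) head;
} AtomicStack;

/* Lock-free push: may be called from any number of threads. */
void stack_push(AtomicStack *stack, Node *node) {
    assert(stack != NULL && node != NULL);
    Node *old_head = atomic_load(&stack->head);
    do {
        node->next = old_head;  /* link in front of the current head */
    } while (!atomic_compare_exchange_weak(&stack->head, &old_head, node));
}

/* Pop the most recently pushed node, or NULL if the stack is empty.
   Assumes a single consumer; concurrent pops could hit the ABA problem. */
Node *stack_pop(AtomicStack *stack) {
    assert(stack != NULL);
    Node *old_head = atomic_load(&stack->head);
    while (old_head != NULL &&
           !atomic_compare_exchange_weak(&stack->head, &old_head,
                                         old_head->next)) {
        /* head changed under us; old_head was refreshed, so retry */
    }
    return old_head;
}
```

Pushing a, b, c and then popping yields c, b, a. This is why the recorded symbols have to be reversed before they are written out in call order.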

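The problem with this code is that the while loop is also instrumented: every iteration calls the guard callback, which enqueues another node, so the loop never ends. To instrument only function entries, change the flag to: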

-fsanitize-coverage=func,trace-pc-guard

That solves the problem, but not completely: methods are repeated in the output, and C functions such as main lack the leading underscore.

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
    NSLog(@"touchesBegan method");
    NSMutableArray<NSString *> *symbolArray = [NSMutableArray new];
    while (YES) {
        WJNode *node = OSAtomicDequeue(&symbolList, offsetof(WJNode, next));
        if (node == NULL) {
            break;
        }
        Dl_info info = {0};
        dladdr(node->pc, &info);
        free(node);
        if (info.dli_sname == NULL) {
            continue;  // No symbol found for this address
        }
        NSString *name = @(info.dli_sname);
        // Add the leading underscore to everything except Objective-C methods
        BOOL isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["];
        NSString *symbolName = isObjc ? name : [@"_" stringByAppendingString:name];
        [symbolArray addObject:symbolName];
    }
    // Reverse the order: the queue returns the most recent call first
    NSEnumerator *enu = [symbolArray reverseObjectEnumerator];
    // Remove duplicates
    NSMutableArray<NSString *> *funcs = [NSMutableArray new];
    NSString *funcName;
    while ((funcName = [enu nextObject])) {
        if (![funcs containsObject:funcName]) {
            [funcs addObject:funcName];
        }
    }
    // Remove the current touchesBegan method itself
    [funcs removeObject:[NSString stringWithFormat:@"%s", __FUNCTION__]];
    for (NSString *str in funcs) {
        NSLog(@"%@", str);
    }
    // Generate the .order file: join the array into one string
    NSString *funcStr = [funcs componentsJoinedByString:@"\n"];
    // File path
    NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"wj.order"];
    // File contents
    NSData *fileContents = [funcStr dataUsingEncoding:NSUTF8StringEncoding];
    [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
}
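The three clean-up steps (reverse, add the underscore, remove duplicates) do not depend on Foundation and can be sketched in C. build_order_list, is_objc_method, and the two size limits are hypothetical, and the input is assumed to be the symbol names in dequeue order, newest first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 64
#define MAX_NAME    128

/* Objective-C methods start with "+[" or "-[" and keep their name as is. */
static bool is_objc_method(const char *name) {
    return strncmp(name, "+[", 2) == 0 || strncmp(name, "-[", 2) == 0;
}

/* Turns dequeued names (newest first) into .order entries: call order,
   no duplicates, and a leading underscore for C functions and blocks.
   Returns the number of entries written to `out`. */
size_t build_order_list(const char *const *dequeued, size_t count,
                        char out[][MAX_NAME], size_t out_capacity) {
    assert(dequeued != NULL || count == 0);
    size_t written = 0;
    for (size_t i = count; i-- > 0; ) {  /* walk from oldest to newest */
        char name[MAX_NAME];
        snprintf(name, sizeof name, "%s%s",
                 is_objc_method(dequeued[i]) ? "" : "_", dequeued[i]);

        bool seen = false;
        for (size_t j = 0; j < written; j++) {
            if (strcmp(out[j], name) == 0) { seen = true; break; }
        }
        if (!seen && written < out_capacity) {
            strcpy(out[written++], name);  /* keep the first occurrence */
        }
    }
    return written;
}
```

Writing the resulting entries to a file, one per line, gives the same kind of content as the wj.order file produced above.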

Let’s see what happens

Once we have the .order file, put it in the root directory of the project and set its path in Build Settings → Order File.

Symbol order before using the .order file

Symbol order after using the .order file
