Do you get frustrated when you tap an app on your home screen and have to wait for its first page to appear, maybe to the point of uninstalling it? For user experience and retention, fast startup is a must.
One: Startup type
“Cold start” and “hot start”
- Cold start: before the user taps the app, its process does not yet exist in the system. The system needs to create a new process and assign it to the app. This is a complete app startup process. The best startup time is within 400ms, because the launch animation lasts 400ms.
- Hot start: after a cold start, the user sends the app to the background and the app's process is still in the system. The user then returns to the app. (A hot start does less work.)
What we call startup optimization refers to the cold start.
Cold start
It is mainly divided into three stages:
- Before main() executes (the pre-main phase).
- After main() executes (from the start of main to the point where setting self.window.rootViewController has completed).
- After the first screen is rendered (from the point where setting self.window.rootViewController has completed to the end of the didFinishLaunchingWithOptions method).
Startup time (pre-main phase)
In the pre-main phase we can get the elapsed time by adding an environment variable.
In Xcode, choose Product → Scheme → Edit Scheme… → Run → Arguments → Environment Variables, click +, and add an environment variable named DYLD_PRINT_STATISTICS with the value 1.
Below is the launch time of an actual project:
Reading the output:
The phase before main() took 1.7 seconds in total:
- Dylib loading time: loading the dynamic libraries took 64.05ms.
  - Loading dynamic libraries takes time, and dynamic libraries have dependencies of their own. System dynamic libraries live in the shared cache, but custom dynamic libraries cannot use the shared cache, so they cost more time. Apple therefore officially recommends no more than six custom dynamic libraries; beyond that, merge them to reduce the loading time.
  - Merging dynamic libraries requires their source code, so we can only merge our own. The third-party SDKs we use day to day may not be mergeable.
- Rebase/Binding time: pointer offset correction and symbol binding took 213.88ms.
  - Rebase: the system uses ASLR to randomize the address space, so at runtime the pointers inside the image have to be corrected by Rebase, using the ASLR value + offset address.
  - Binding: for external symbols, function addresses cannot be resolved at compile time. At runtime, dyld loads the shared cache and the linked dynamic libraries, then performs binding to bind the external symbols.
- ObjC setup time: class registration took 789.76ms.
  - This is the Objective-C class registration process: the runtime reads the data segments of the binary to find Objective-C metadata and then registers the classes. When the application starts, the system builds two tables, one for classes and one for categories. Every class and category is inserted into these tables as it is registered, which costs time.
  - This time is difficult to optimize, except by reducing the number of classes and categories defined in the project.
  - Reduce the use of load methods in classes and categories so that classes can load lazily.
- Initializer time: executing the load methods and constructors took 726.45ms.
  - Use the initialize method instead of the load method whenever possible.
The whole process is shown below:
Tools
Time Profiler
Under Call Tree at the bottom of the window, enable Hide System Libraries to filter out the system libraries and see the time taken by your own methods on the main thread.
But Time Profiler is only good for coarse-grained analysis. Let’s see how it works:
By default, Time Profiler samples every 1ms, collecting only the call stacks of running threads, and then aggregates the samples statistically. For example, if method3 is not captured by any of the five samples shown in the figure below, method3 does not appear in the aggregated stack. So the time shown in Time Profiler is not the actual execution time of the code, but the time for which a stack appears in the sampled statistics.
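To make the sampling effect concrete, here is a minimal C sketch. The timeline, the method names, and the helper samples_for are illustrative assumptions, not how Instruments is actually implemented; the point is only that a method running entirely between two sample points collects zero samples.

```c
#include <string.h>

/* One stretch of main-thread work; times are in microseconds. */
typedef struct {
    const char *name;
    int start_us;
    int end_us; /* exclusive */
} Span;

/* Hypothetical timeline: method3 runs for only 300 us, between two samples. */
static const Span timeline[] = {
    {"method1", 0, 2200},
    {"method2", 2200, 4100},
    {"method3", 4100, 4400},
    {"method1", 4400, 5000},
};
static const int span_count = sizeof(timeline) / sizeof(timeline[0]);

/* Counts how many samples, taken every interval_us, land inside `name`. */
int samples_for(const char *name, int interval_us, int total_us) {
    int hits = 0;
    for (int t = 0; t < total_us; t += interval_us) {
        for (int i = 0; i < span_count; i++) {
            if (t >= timeline[i].start_us && t < timeline[i].end_us &&
                strcmp(timeline[i].name, name) == 0) {
                hits++;
            }
        }
    }
    return hits;
}
```

With a 1000 us interval over 5000 us there are five samples: samples_for reports 3 for method1, 2 for method2, and 0 for method3, even though method3 really ran for 300 us.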
System Trace
System Trace supports fine-grained analysis.
For fine-grained analysis we need to mark a short period of time, which can be done with Points of Interest. In addition, System Trace is useful for analyzing virtual memory and thread state:
- Virtual Memory: focus on Page In events, because there are many of them on the startup path and they are relatively expensive.
- Thread State: focus on the blocked and preempted states, keeping in mind that the main thread is not always running.
- System Load: threads have priorities, and the number of high-priority threads should not exceed the number of CPU cores.
Two: Practice
The pre-main stage
Reducing dynamic libraries
Reducing the number of dynamic libraries also reduces the time needed to build the launch closure and to load the libraries at startup. It is recommended to keep the number of dynamic libraries below 6.
The recommended approach is to switch from dynamic libraries to static libraries, which also reduces the package size. Another way is to merge dynamic libraries, but this is rarely feasible in practice. Finally, don’t link libraries (including system libraries) that you don’t need, because they slow down the creation of the closure.
Tips: How to view dynamic libraries?
- In the project's Products folder, find the .app file of our project, right-click it, and choose Show in Finder.
- In that directory, right-click the .app file and choose Show Package Contents.
- Find the Frameworks folder and open it.
Remove useless classes and code
Taking unused code offline reduces the time spent in Rebase, Bind, and runtime initialization. You can also try merging classes and categories with similar functionality.
AppCode can be used to detect useless code.
+load migration
Besides the time the method itself takes, +load also causes a lot of Page In events.
- Option 1: if possible, move the work done in +load to after the first screen has rendered.
- Option 2: use +initialize instead of +load, or go one step further and use dispatch_once() at the point of use instead of +initialize.
Static initializer migration
Static initialization, like the +load method, also causes a lot of Page In events. It usually comes from C++ code, so reduce the number of C++ static global variables.
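The idea behind both migrations is to replace work that runs unconditionally before main() with work that runs once, on first use. The text names dispatch_once(); the sketch below uses pthread_once, its POSIX counterpart, so that it stays plain C. The table and the function names are hypothetical.

```c
#include <pthread.h>

static int squares[256];
static int build_count = 0; /* how many times the table was built */
static pthread_once_t squares_once = PTHREAD_ONCE_INIT;

/* Expensive setup that would otherwise sit in +load or a static initializer. */
static void build_squares(void) {
    for (int i = 0; i < 256; i++) {
        squares[i] = i * i;
    }
    build_count++;
}

/* Builds the table on first use only; later calls go straight to the lookup. */
int square_of(unsigned char n) {
    pthread_once(&squares_once, build_squares);
    return squares[n];
}

/* Exposed so a caller can confirm that the setup ran at most once. */
int squares_build_count(void) { return build_count; }
```

Nothing runs at launch here: the cost of build_squares is paid by the first caller of square_of, and pthread_once guarantees it is paid only once even if several threads race to call it.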
After main
Image resources
Startup needs a lot of images. Is there a way to optimize their loading time?
Use an asset catalog to manage images instead of putting them directly in the bundle. Asset catalogs are optimized at compile time, and loading an image from an asset catalog is faster than loading it from the bundle, because [UIImage imageNamed:] has to walk through the bundle to find the file. Most of the time spent loading an image from an asset catalog goes into the first lookup, so the indexing cost can be reduced by putting the images needed at startup into a small asset catalog of their own.
Every UIImage you create requires IO, and the image is decoded when the first frame is rendered. You can therefore save time by preloading on a background thread, that is, by creating the UIImage objects ahead of time.
As shown below, images are only used in the later stages, “RootWindow creation” and “first frame render”, so a background preloading task can be started early in the startup process.
Phased loading
There is always some initialization in didFinishLaunchingWithOptions. It all has to be done, but depending on what each piece is for, we can delay some of it appropriately. For our project, I divided initialization into three types:
- Logs, statistics, and other things that must be configured as soon as the app starts
- Things such as project configuration, environment configuration, user information initialization, push, and IM
- Other SDKs and configuration
The first category, by its nature, must start first, so it stays in didFinishLaunchingWithOptions. The second category must be loaded before the user enters the main body of the app, so it goes in the second batch: it starts once the user sees the ad page and the ad countdown begins. The third category is not required at launch, so it can be placed in the viewDidAppear method after the first screen has rendered, where it has no impact on startup time at all.
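The three categories above can be modeled as a small task table keyed by phase. This is a minimal sketch in C rather than the project's actual code; the task names, the Phase enum, and run_phase are all hypothetical.

```c
#include <stddef.h>

typedef enum {
    PHASE_LAUNCH = 0,      /* must run in didFinishLaunchingWithOptions */
    PHASE_BEFORE_HOME = 1, /* can run while the ad page is showing */
    PHASE_AFTER_FIRST = 2  /* can wait until the first screen has appeared */
} Phase;

typedef struct {
    const char *name;
    Phase phase;
    void (*run)(void);
} StartupTask;

static int tasks_run = 0;
/* Stand-in for real SDK setup; it only counts invocations. */
static void noop_setup(void) { tasks_run++; }

static const StartupTask tasks[] = {
    {"logging", PHASE_LAUNCH, noop_setup},
    {"statistics", PHASE_LAUNCH, noop_setup},
    {"user info", PHASE_BEFORE_HOME, noop_setup},
    {"push", PHASE_BEFORE_HOME, noop_setup},
    {"IM", PHASE_BEFORE_HOME, noop_setup},
    {"other SDKs", PHASE_AFTER_FIRST, noop_setup},
};

/* Runs every task registered for `phase` and returns how many ran. */
int run_phase(Phase phase) {
    int ran = 0;
    for (size_t i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++) {
        if (tasks[i].phase == phase) {
            tasks[i].run();
            ran++;
        }
    }
    return ran;
}
```

The app would call run_phase(PHASE_LAUNCH) at launch, run_phase(PHASE_BEFORE_HOME) when the ad countdown starts, and run_phase(PHASE_AFTER_FIRST) from viewDidAppear. Keeping the phase next to each task makes it cheap to move a task to a later batch.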
Build the home page UI in pure code
Build the home page UI in pure code instead of loading it from a storyboard; a storyboard costs more resources at startup.
Three: Binary rearrangement
Physical memory and virtual memory
Physical memory
Physical memory refers to the memory on the memory module. In the early days, all the data of a process was loaded into physical memory, and the CPU accessed the process data through physical addresses. This approach leads to the following problems:
- Out of memory: if too many applications are started, there is not enough memory.
- Wasted memory: as an application grows larger, users may only use part of its functions. If all of it is loaded into memory, memory is wasted.
- Memory data security: data in physical memory can be modified directly by accessing its physical address.
To solve these problems, the CPU accesses process data indirectly through virtual memory rather than directly through physical addresses.
Virtual memory
Virtual memory is an intermediate layer between processes and physical memory. It is created by the system and managed by paging. The structure of virtual memory is shown below:
Each process has its own virtual memory, 4GB in size (for a 32-bit address space). The virtual memory is divided into many pages; each page is 16KB on iOS and 4KB on other systems. Each entry in a page records the virtual address and the physical address of a data item in the process. Virtual memory is therefore essentially a mapping table between the virtual addresses and physical addresses of the data in a process.
When virtual memory is used, the CPU accesses process data as follows:
- After the process is started, the system creates the corresponding virtual memory for it, which records the virtual address of each item of data in the process. At this point the process has not been loaded into physical memory, so the physical address recorded in each page is 0x00000….
- When part of the process becomes active, the CPU locates the corresponding physical address from the virtual address of the data and accesses the data in physical memory through that physical address.
- If the page has no corresponding physical address, that is, the process data associated with the page has not been loaded into physical memory, a Page Fault is triggered. It interrupts the current process, the data for that page is loaded into physical memory, the page records the physical address of each piece of data, and the CPU then accesses the data in memory by physical address.
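The access steps above can be imitated with a toy page table. This is only a sketch under simplifying assumptions: a tiny 64-page address space, a single resident flag per page instead of a real physical address, and hypothetical function names.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 16384u /* 16KB, the iOS page size mentioned above */
#define PAGE_COUNT 64u   /* size of this toy address space, in pages */

static bool resident[PAGE_COUNT]; /* true once a page is in "physical memory" */
static unsigned fault_count = 0;

/* Returns the index of the page that holds a virtual address. */
uint32_t page_of(uint32_t vaddr) { return vaddr / PAGE_SIZE; }

/* Simulates one access: a non-resident page triggers a fault and is loaded. */
bool access_addr(uint32_t vaddr) {
    uint32_t page = page_of(vaddr) % PAGE_COUNT;
    if (resident[page]) {
        return false; /* page already loaded: no fault */
    }
    fault_count++;
    resident[page] = true; /* "load" the page so the access can proceed */
    return true;
}

unsigned faults_so_far(void) { return fault_count; }
```

Accessing 0x0010 faults because page 0 is not yet resident, accessing 0x3FFF afterwards does not because it lies in the same page, and accessing 0x4000 faults again because it is the first byte of page 1.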
Therefore, virtual memory has the following advantages over direct access to physical memory:
- More efficient memory usage: with paging, only the data associated with active pages is loaded into physical memory. When physical memory is full, inactive pages are overwritten and the data of active pages is loaded, which improves the efficiency of memory use.
- Memory data is more secure: each time a process starts, the system recreates its virtual memory and assigns it a random ASLR (Address Space Layout Randomization) value. The virtual address of a piece of data is ASLR random value + offset, so the virtual address changes on every launch, and the CPU accesses physical memory only indirectly through virtual memory. Physical addresses are never exposed in this process, which keeps memory data secure.
Without randomization, a program's code would be loaded into virtual memory at the same address on every launch, which is not safe. ASLR emerged to solve this problem of fixed addresses.
ASLR
ASLR (Address Space Layout Randomization) is a security technique against buffer overflow attacks. By randomizing the layout of linear regions such as the heap, the stack, and shared library mappings, it makes the target address harder for an attacker to predict and prevents the attacker from jumping directly to the attack code, thereby preventing overflow attacks.
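As a small worked example of "ASLR random value + offset": pointers stored in the binary are computed at link time, so at load time each one is shifted by that launch's random value. The function name rebase and the sample values are hypothetical; real rebasing is performed by dyld from metadata in the binary.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds this launch's random slide to every pointer slot recorded in the binary. */
void rebase(uint64_t *slots, size_t count, uint64_t slide) {
    for (size_t i = 0; i < count; i++) {
        slots[i] += slide;
    }
}
```

For instance, with a slide of 0x2a4c000, a slot holding the link-time address 0x100004000 becomes 0x102a50000. A different launch picks a different slide, so the same slot ends up at a different virtual address.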
Most major operating systems already implement ASLR:
- Linux: ASLR was added in kernel version 2.6.12;
- Windows: Windows Server 2008, Windows 7, Windows Vista, and Windows Server 2008 R2 enable ASLR by default, but only for dynamically linked libraries and executables;
- Mac OS X: Apple introduced random address offsets for some libraries in Mac OS X Leopard 10.5 (2007), but that implementation did not provide the complete protection that ASLR defines. Mac OS X Lion 10.7 provides ASLR support for all applications; Apple claims that the improved support better protects both 32-bit and 64-bit applications from such attacks. Starting with OS X Mountain Lion 10.8, the kernel, kernel extensions (kext), and zones are also randomized at system boot;
- iOS (iPhone, iPod touch, iPad): Apple introduced ASLR in iOS 4.3;
- Android: Android 4.0 provides address space layout randomization (ASLR) to help protect the system and third-party applications from attacks that exploit memory management problems; Android 4.1 adds position-independent code.
When the system accesses virtual memory whose data has not yet been loaded into physical memory, a Page Fault occurs and the process is blocked. The system has to load the data into physical memory before the process can continue running. Although each page is loaded very quickly, within milliseconds, a cold start can trigger a large number of page faults, which adds up to a noticeable cost in startup time.
The principle of binary rearrangement
Let’s start with an example
#import "ViewController.h"

@interface ViewController ()
@end

@implementation ViewController

+ (void)test1 {
    NSLog(@"test1");
}

- (void)viewDidLoad {
    [super viewDidLoad];
    [self test2];
}

- (void)test2 {
    NSLog(@"test2");
}

+ (void)load {
    [self test1];
}

@end
When this code runs, the methods execute in the order load, test1, viewDidLoad, test2.
But what is the actual order of the symbols?
The Link Map File records the symbol order of the project as produced at link time, listed method by method and function by function. Once link map output is enabled in the build settings, you can view the symbol order as shown in the following figure:
Let’s compare the order in which the code was written with the order in the link map.
We find that the symbols in the link map follow the order in which the files are compiled, and that within each file the methods follow the order in which they are written, from top to bottom.
With this ordering, the methods needed at startup are not clustered together but spread across many pages, which results in a large number of page faults.
All binary rearrangement does is place all the methods that are called at startup next to each other.
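A minimal sketch of why clustering helps: count how many distinct 16KB pages a set of startup methods falls on before and after rearrangement. The offsets below are made up for illustration and pages_touched is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 16384u

/* Counts the distinct 16KB pages covered by a list of function offsets. */
size_t pages_touched(const uint32_t *offsets, size_t count) {
    size_t distinct = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t page = offsets[i] / PAGE_SIZE;
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (offsets[j] / PAGE_SIZE == page) {
                seen = 1;
                break;
            }
        }
        if (!seen) distinct++;
    }
    return distinct;
}

/* Hypothetical offsets of four startup methods inside the code segment. */
static const uint32_t scattered[] = {0x00120, 0x0C3F0, 0x1A008, 0x27B40};
static const uint32_t clustered[] = {0x00120, 0x00180, 0x001F0, 0x00260};
```

The scattered layout touches 4 pages (pages 0, 3, 6, and 9), so it can cost up to four page faults, while the clustered layout touches a single page.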
Viewing the Page Fault count
Open Instruments:
Click Start, stop recording once the home page has appeared, then select Main Thread and Virtual Memory. File Backed Page In shows 2451.
Clang instrumentation
Binary rearrangement is a technique Apple already uses itself. The key is an .order file: the linker lays out the methods in the order of the symbols in that file.
Below are the symbols in the .order file of objc4-750:
Step 1: Build Settings configuration
Using the demo above, we create an .order file in the project directory and add the following symbols, in the same order as the methods are actually called in the demo. Then we recompile and look at the link map file.
+[ViewController load]
+[ViewController test1]
-[ViewController viewDidLoad]
-[ViewController test2]
The final symbol order is the same as the order in our .order file.
So how do we get the project to load the .order file? Set its path in Build Settings → Order File.
Writing the file by hand does not scale, so the symbols are collected through instrumentation instead. In Build Settings → Other C Flags, add -fsanitize-coverage=trace-pc-guard.
-fsanitize-coverage=trace-pc-guard
As we know, Objective-C is compiled with Clang (front end) + LLVM (back end), and Clang handles lexical analysis, syntax analysis, and syntax tree generation in the early stage of compilation. Adding -fsanitize-coverage=trace-pc-guard tells Clang that we want to trace calls: Clang inserts a call to __sanitizer_cov_trace_pc_guard at the edge of each method (functions and blocks alike), so every method call passes through __sanitizer_cov_trace_pc_guard. This completes the instrumentation and is equivalent to a hook applied at compile time.
Step 2: Add auxiliary code
Add __sanitizer_cov_trace_pc_guard_init and __sanitizer_cov_trace_pc_guard as shown in the Clang documentation. The lines that fail to build in a normal project are commented out.
#import "ViewController.h"
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

@interface ViewController ()
@end

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];
}

// Called once per module with the range of guards to initialize.
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t N;  // Counter for the guards.
    if (start == stop || *start) return;  // Initialize only once.
    printf("INIT: %p %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++)
        *x = ++N;  // Guards should start from 1.
}

// Called on every instrumented edge.
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;
    // void *PC = __builtin_return_address(0);
    char PcDescr[1024] = "";  // stays empty while symbolization is commented out
    // __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
    printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}

@end
__sanitizer_cov_trace_pc_guard_init
The compiler inserts this callback into each DSO as a module constructor. start and stop mark the beginning and end of the guard array for the whole binary (executable or DSO), so the range between them reflects the number of instrumented symbols.
start: 0x104ded4c0
stop: 0x104ded4f8
Subtracting 4 bytes from the stop address gives the address of the last guard, and the values stored from start to stop are 1 to 14. Repeated experiments show that each time a new method (function or block) is added to the code, the count increases by 1.
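The count of 14 follows directly from the two addresses, since each guard is a 4-byte uint32_t. The sketch below checks that arithmetic and replays the numbering loop on an ordinary array; guard_count and number_guards are hypothetical names used only for this illustration, not part of the sanitizer interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 4-byte guards between two addresses such as the ones printed above. */
size_t guard_count(uint64_t start, uint64_t stop) {
    return (size_t)((stop - start) / sizeof(uint32_t));
}

/* Same numbering loop as the init callback, applied to an ordinary array. */
uint32_t number_guards(uint32_t *start, uint32_t *stop) {
    uint32_t n = 0;
    if (start == stop || *start) return 0; /* empty or already initialized */
    for (uint32_t *x = start; x < stop; x++) {
        *x = ++n; /* guards are numbered from 1 */
    }
    return n;
}
```

Here 0x104ded4f8 - 0x104ded4c0 is 0x38 = 56 bytes, and 56 / 4 = 14 guards. A second call to number_guards on the same array returns 0, which mirrors the "initialize only once" check in the callback.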
__sanitizer_cov_trace_pc_guard
Add the touchesBegan method, set a breakpoint on its print statement, and open the assembly view.
- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
    NSLog(@"touchesBegan method");
}
__sanitizer_cov_trace_pc_guard is called before the method body executes. This call was inserted by -fsanitize-coverage=trace-pc-guard.
Step 3: Get the symbols and generate the .order file
Uncomment void *PC = __builtin_return_address(0); and set a breakpoint on it.
Printing PC gives the address 0x0000000102355eb8.
The corresponding stack information is printed.
Open the assembly view and take a look.
Clang inserts the __sanitizer_cov_trace_pc_guard call at the edge of each method. When __sanitizer_cov_trace_pc_guard finishes, it must return to the method that called it. Its return address is 0x0000000102355eb8, which lies inside main.
__builtin_return_address(0) returns the return address of the current function, that is, an address inside its caller (here main).
Next we import <dlfcn.h>, whose dladdr function fills in a Dl_info structure:
typedef struct dl_info {
const char *dli_fname; /* Pathname of shared object */
void *dli_fbase; /* Base address of shared object */
const char *dli_sname; /* Name of nearest symbol */
void *dli_saddr; /* Address of nearest symbol */
} Dl_info;
The dli_sname field is exactly the symbol we want!
Save the symbols in an atomic queue and read them out later
#import "ViewController.h"
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sanitizer/coverage_interface.h>
#import <dlfcn.h>
#import <libkern/OSAtomic.h>

@interface ViewController ()
@end

@implementation ViewController

// Atomic queue that collects the recorded PCs
static OSQueueHead symbolList = OS_ATOMIC_QUEUE_INIT;

// Node type stored in the queue
typedef struct {
    void *pc;    // saved PC address
    void *next;  // link to the next node
} WJNode;

void (^testBlock)(void) = ^(void) {
    NSLog(@"block call");
};

void test1(void) {
    NSLog(@"test1 call");
    testBlock();
}

- (void)viewDidLoad {
    [super viewDidLoad];
    test1();
}

void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t N;
    if (start == stop || *start) return;
    printf("INIT: %p %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++)
        *x = ++N;
}

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;
    void *PC = __builtin_return_address(0);
    WJNode *node = malloc(sizeof(WJNode));
    *node = (WJNode){PC, NULL};
    // Enqueue the node; symbolization happens later, outside the hook
    OSAtomicEnqueue(&symbolList, node, offsetof(WJNode, next));
}

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
    NSLog(@"touchesBegan");
    while (YES) {
        WJNode *node = OSAtomicDequeue(&symbolList, offsetof(WJNode, next));
        if (node == NULL) {
            break;
        }
        Dl_info info = {0};
        dladdr(node->pc, &info);
        printf("%s \n", info.dli_sname);  // name of the nearest symbol
        free(node);
    }
}

@end
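The problem is that the while loop is instrumented too, so every iteration enqueues a new entry and the loop never drains the queue. To hook only function entries, change the flag in Other C Flags to:
-fsanitize-coverage=func,trace-pc-guard
That solves the loop problem, but not everything: methods appear more than once, and C functions such as main lack the leading underscore that their symbols need.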
- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
    NSLog(@"touchesBegan method");
    NSMutableArray<NSString *> *symbolArray = [NSMutableArray new];
    while (YES) {
        WJNode *node = OSAtomicDequeue(&symbolList, offsetof(WJNode, next));
        if (node == NULL) {
            break;
        }
        Dl_info info = {0};
        dladdr(node->pc, &info);
        free(node);
        if (info.dli_sname == NULL) {
            continue;  // no symbol found for this address
        }
        NSString *name = @(info.dli_sname);
        // Objective-C methods are used as-is; functions and blocks need a leading underscore
        BOOL isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["];
        NSString *symbolName = isObjc ? name : [@"_" stringByAppendingString:name];
        [symbolArray addObject:symbolName];
    }
    // The queue returns the newest entry first, so reverse to get call order
    NSEnumerator *enu = [symbolArray reverseObjectEnumerator];
    // Remove duplicates
    NSMutableArray<NSString *> *funcs = [NSMutableArray new];
    NSString *funcName;
    while (funcName = [enu nextObject]) {
        if (![funcs containsObject:funcName]) {
            [funcs addObject:funcName];
        }
    }
    // Remove the current touchesBegan method itself
    [funcs removeObject:[NSString stringWithFormat:@"%s", __FUNCTION__]];
    for (NSString *str in funcs) {
        NSLog(@"%@", str);
    }
    // Generate the .order file: join the array into a string
    NSString *funcStr = [funcs componentsJoinedByString:@"\n"];
    // File path
    NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"wj.order"];
    // File contents
    NSData *fileContents = [funcStr dataUsingEncoding:NSUTF8StringEncoding];
    [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
}
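The clean-up steps in the method above (reverse the dequeued list, drop repeats, add the underscore) can also be isolated as plain C, which makes them easy to check without a device. This is a sketch with hypothetical helper names and a fixed 128-byte name buffer, not a replacement for the method.

```c
#include <stdio.h>
#include <string.h>

/* Writes the .order form of a symbol: ObjC methods as-is, others with '_'. */
void order_name(const char *symbol, char *out, size_t out_size) {
    int is_objc = strncmp(symbol, "+[", 2) == 0 || strncmp(symbol, "-[", 2) == 0;
    snprintf(out, out_size, "%s%s", is_objc ? "" : "_", symbol);
}

/*
 * Walks `dequeued` from last to first (the queue hands back the newest entry
 * first), skips repeats, and fills `out` with names in first-call order.
 * Returns how many names were written.
 */
size_t build_order(const char *const *dequeued, size_t count,
                   char out[][128], size_t out_cap) {
    size_t written = 0;
    for (size_t i = count; i-- > 0;) {
        char name[128];
        order_name(dequeued[i], name, sizeof(name));
        int seen = 0;
        for (size_t j = 0; j < written; j++) {
            if (strcmp(out[j], name) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen && written < out_cap) {
            strcpy(out[written++], name);
        }
    }
    return written;
}
```

Given the dequeued names test1, -[ViewController viewDidLoad], main, +[ViewController load], main (newest first), build_order yields _main, +[ViewController load], -[ViewController viewDidLoad], _test1: four entries, one per line of the .order file.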
Let’s see what happens.
Once we have the .order file, we put it in the root directory of our project and set its path in Build Settings → Order File.
Symbol order before using the .order file
Symbol order after using the .order file