iOS Cold Start Optimization: Binary Rearrangement & Clang Instrumentation
1. Cold start
1.1 What is a cold start?
A cold start is a launch in which none of the app's data is in memory, so everything must be loaded from disk.
Note: relaunching the app does not necessarily mean a cold start.
- When memory runs low and the system kills the app, the next launch is a cold start.
- If the app's data is still in memory when the app is reopened, the launch is a hot start.
- The system decides whether a launch is cold or hot; we cannot control it.
- The first launch of the app after the device restarts is always a cold start.
1.2 How do I measure the cold start time?
Generally speaking, app startup time is measured in two stages, with the main function as the dividing point:
- The code after main is our own, so we can measure how long it takes to get from main to the first screen:
  - Print the current time in main.
  - Print the current time in the viewDidLoad method of the first view controller displayed.
  - The difference between the two times is the time spent after main.
- Everything before main is the pre-main stage. The system does this work, so we cannot time it directly and must rely on the statistics the system reports.
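The post-main measurement described above can be sketched in plain C. This is a minimal sketch rather than the author's code: `launch_mark` and `launch_elapsed_ms` are hypothetical helper names, and the C11 function `timespec_get` is assumed to be available (it reads the wall clock, which is good enough for a rough figure).

```c
#include <time.h>

/* Hypothetical helpers; the names are illustrative and not part of any SDK. */
static struct timespec launch_start;

/* Call at the top of main(). */
void launch_mark(void) {
    timespec_get(&launch_start, TIME_UTC);
}

/* Call in viewDidLoad of the first view controller.
   Returns the milliseconds elapsed since launch_mark(). */
double launch_elapsed_ms(void) {
    struct timespec now;
    timespec_get(&now, TIME_UTC);
    return (double)(now.tv_sec - launch_start.tv_sec) * 1000.0
         + (double)(now.tv_nsec - launch_start.tv_nsec) / 1.0e6;
}
```

In a real project both functions would live in one source file and be declared in a header shared by main.m and the view controller.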
1.2.1 What happens in the pre-main stage?
Let's look at the time the pre-main stage takes in a project.
- To see the system's statistics, add an environment variable:
- In Xcode, go to Edit Scheme -> Run -> Arguments -> Environment Variables.
- Add the environment variable DYLD_PRINT_STATISTICS with the value 1.
Here is the loading time of my project:
The reported time breaks down into the following parts:
- Dylib loading time: system dynamic libraries load quickly because they are already optimized. Apple recommends no more than 6 external (non-system) dynamic libraries; with more than 6, consider merging them, which is very effective for startup. For example, WeChat had eight or nine dynamic libraries in its early days and now has six.
- Rebase/binding time:
  - Rebase: the time spent correcting addresses by an offset.
    - When the binary is compiled, every function gets an address, which is an offset relative to the start of the binary.
    - At startup, when the binary is loaded into virtual memory, Apple's security mechanism ASLR (Address Space Layout Randomization) places the whole binary at a random offset.
    - For example, function A has an offset of 0x003 within the binary. At startup the binary is assigned a random offset of 0x100, so the actual address of function A in memory is 0x003 + 0x100 = 0x103.
    - Rebasing is the process of computing these actual virtual memory addresses.
  - Binding: symbol binding for dynamic libraries, that is, the time it takes to bind a symbol name to its implementation.
    - For example, to call NSLog, the loader must first find the Foundation library, then find the implementation of NSLog in it, and bind the name to that implementation.
- ObjC setup time: the time needed to register all Objective-C classes. The more classes, the longer it takes. One estimate is that 20,000 custom Objective-C classes take about 800 milliseconds. Deleting unused classes reduces this time.
- Initializer time: the time taken by +load methods and C++ static constructors. Avoid implementing +load, and defer work until after main where possible.
- Slowest initializers: the libraries whose initializers took the longest, 6 in this case (the last one is my project).
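The rebase arithmetic described above (offset in the binary plus the random ASLR offset) can be written out as a one-line function. This is only an illustration of the calculation; `rebased_address` is a hypothetical name, and the real work is done by the loader for every pointer that needs fixing.

```c
#include <stdint.h>

/* Address of a symbol in memory = its offset within the binary + the ASLR offset. */
static uintptr_t rebased_address(uintptr_t offset_in_binary, uintptr_t aslr_offset) {
    return offset_in_binary + aslr_offset;
}
```

With the article's numbers, `rebased_address(0x003, 0x100)` gives 0x103. The more pointers the binary contains that need this correction, the longer the rebase step takes.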
1.2.2 Summary of pre-main optimizations
- Reduce the number of external dynamic libraries.
- Delete unused classes and methods.
- Initialize classes lazily where possible; that is, avoid implementing +load.
- Load startup data on background threads.
- Build the UI in code rather than with XIBs or storyboards, which add parsing, conversion, and rendering work.
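The advice to prefer lazy loading over +load can be illustrated with a small C sketch. Everything here is hypothetical (`build_cache`, `cache_get`, and the value 42 are stand-ins): the point is only that the setup cost moves from before main to the first use. The sketch is not thread-safe; in Objective-C the same idea is usually expressed with +initialize or dispatch_once.

```c
#include <stdbool.h>

static bool cache_ready = false;   /* becomes true once the setup has run */
static int  cache_value = 0;

/* Hypothetical expensive setup that might otherwise sit in +load or a constructor. */
static void build_cache(void) {
    cache_value = 42;              /* stand-in for real work */
    cache_ready = true;
}

/* Lazy accessor: the setup runs on the first call instead of before main. */
static int cache_get(void) {
    if (!cache_ready) {
        build_cache();
    }
    return cache_value;
}
```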
These techniques depend on each project's own code, so the concrete steps differ from project to project.
There is another optimization whose steps are the same for every project and which works for any of them: binary rearrangement!
2. Binary rearrangement
Learning about binary rearrangement begins with knowing how data is loaded into memory.
☞ How is data loaded into memory
We have already seen how data is loaded into memory. When a virtual memory page has no corresponding physical memory page, a page fault occurs. During a cold start none of the app's data is in physical memory, so a large number of page faults occur, and they take a lot of time. Is there room for optimization here? There is: binary rearrangement!
Before looking at binary rearrangement itself: when a project is compiled into a binary, in what order are the classes and their method implementations laid out?
2.2.1 In what order are method implementations laid out in the binary?
- In ViewController, write a few arbitrary methods.
- Look at the compile order of the source files (Build Phases -> Compile Sources).
Next, check the symbol order in the Link Map file.
- Enable the Link Map file (Build Settings -> Write Link Map File -> Yes).
- Build the project to generate the Link Map file.
- Find the Link Map file:
  - In the project navigator, right-click the built app and choose Show in Finder.
  - Navigate up from the app to the directory that contains Intermediates.noindex.
  - Go to Intermediates.noindex -> TraceDemo.build -> Debug-iphonesimulator -> TraceDemo.build -> TraceDemo-LinkMap-normal-x86_64.txt
- Open the Link Map file and find your class and method names.
- The Link Map shows the symbol order directly: classes appear from top to bottom in the compile order of the source files, and within each class the methods appear in the order they are written in the source.
2.2.2 Why is binary rearrangement necessary?
Suppose the methods are called at startup in this order: load -> test2 -> viewDidLoad -> test1
But the symbols in the binary follow the order in which the methods are written, from top to bottom, not the order in which they are called.
A cold start loads the binary page by page. Many pages contain a few methods that are needed at startup alongside many that are not, but because memory is paged, a page can only be loaded as a whole. As a result, a large number of methods that are never executed during startup are loaded into memory as well, which increases the startup time.
For example, suppose startup touches 100 pages, each page holds 20 methods, and only 2 methods on each page are actually used during startup. Startup then really needs only 2 * 100 = 200 methods. If those 200 methods are placed right next to each other, they fit in 200 / 20 = 10 pages, which is 90 pages fewer than 100, and the time spent is greatly reduced.
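The page count in the example above is a ceiling division. A minimal sketch, where `pages_when_packed` is a hypothetical helper and every method is assumed to take the same amount of space:

```c
#include <assert.h>
#include <stddef.h>

/* Pages needed to hold `methods` methods when they are packed next to each other. */
static size_t pages_when_packed(size_t methods, size_t methods_per_page) {
    assert(methods_per_page > 0);
    return (methods + methods_per_page - 1) / methods_per_page;  /* round up */
}
```

With 2 startup methods on each of 100 pages and 20 methods per page, `pages_when_packed(2 * 100, 20)` is 200 / 20 = 10 pages instead of 100.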
2.2.3 How do I perform binary rearrangement?
1. The idea
When the project is compiled into a binary, find the methods needed at startup and place them together; that is binary rearrangement.
Two key points: finding the methods needed at startup, and reordering them.
2. Reordering methods
Reordering is actually quite simple, because Xcode already provides the mechanism. Xcode uses the linker ld, and ld accepts an order file parameter. With an order file configured, the linker lays out the symbols of the binary (and therefore of the Link Map) in the order we specify. libobjc itself is built with an order file in the same way.
[Step 1] Create an xxx.order file in the project root directory and write the names of the methods or functions in the order you want them arranged. (A symbol that does not exist causes no error; it is simply ignored.)
[Step 2] Search for Order File in Build Settings and set it to the file created in the project root directory.
Recompile and check the Link Map file. Sure enough, the symbols are arranged in the order we specified.
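An order file is plain text with one symbol per line, in the desired order. The sketch below writes such a file; `write_order_file` is a hypothetical helper, and the symbol spellings in the usage note assume the ViewController methods from the earlier example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes one symbol per line, in the given order.
   Returns 0 on success, -1 on a write error. */
static int write_order_file(FILE *out, const char *const *symbols, size_t count) {
    assert(out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (fprintf(out, "%s\n", symbols[i]) < 0) {
            return -1;
        }
    }
    return 0;
}
```

For the call order load -> test2 -> viewDidLoad -> test1, the lines would be `+[ViewController load]`, `-[ViewController test2]`, `-[ViewController viewDidLoad]`, and `-[ViewController test1]`.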
3. Static instrumentation: finding all methods called during cold start
Next we need to write the symbols into the order file. It must contain every symbol executed during startup, which is impractical to list by hand. "All symbols" here includes methods, functions, C++ constructors, Swift methods, and blocks.
We use SanitizerCoverage, the simple code coverage instrumentation built into LLVM. It inserts calls to user-defined functions at the function, basic block, and edge levels.
- edge (default): edges are instrumented; a call to the user-defined function is inserted at every jump, such as loops, branches, and function calls.
- bb: basic blocks are instrumented.
- func: only the entry block of each function is instrumented (these are the symbols we want to reorder).
According to the documentation:
- Set Other C Flags / Other C++ Flags to -fsanitize-coverage=func,trace-pc-guard (include func; otherwise loops are instrumented as well, which can create an endless loop).
- If the project contains Swift, set Other Swift Flags to -sanitize-coverage=func -sanitize=undefined
- The compiler inserts a call to a module constructor, so we implement this function:
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);
The guards occupy the memory from start up to 4 bytes before stop, one uint32_t each; in this demo they hold the values 1 to 19.
From this function we therefore know the number of function entry blocks instrumented in the current project.
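The guard numbering can be seen in a small standalone version of the callback. This sketch follows the shape of the example in the Clang SanitizerCoverage documentation; the variable names and the printed message are my own, and in a real build the instrumented module calls the function, not your code.

```c
#include <stdint.h>
#include <stdio.h>

/* Called by each instrumented module at load time with the bounds of its guard array. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint32_t next_id;                 /* running counter shared by all modules */
    if (start == stop || *start) return;     /* empty range, or already initialized */
    for (uint32_t *guard = start; guard < stop; guard++) {
        *guard = ++next_id;                  /* guards are numbered from 1 */
    }
    printf("guards: %td\n", stop - start);   /* number of instrumented entry blocks */
}
```

Since each guard is 4 bytes, the pointer difference `stop - start` is the guard count, which is how the function count of the project can be read off.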
- At the beginning of every function, the compiler inserts the following call when generating the binary:
__sanitizer_cov_trace_pc_guard(&guard_variable)
That is, this function is called whenever any method starts executing.
- We implement this function, and in it we obtain the return address, that is, the address execution returns to when the hook finishes. That address lies inside the method being hooked:
void *PC = __builtin_return_address(0);
- We store that address in a system atomic queue. (Underneath it is really a stack; the linked structure plus atomic operations preserve the order.) The atomic queue is used to avoid races between threads. The address is stored like this:
// Store the node in the atomic queue.
// offsetof(type, member) returns the offset of member within the struct: 8 bytes here, because the pc pointer before it is 8 bytes.
OSAtomicEnqueue(&symbolList, node, offsetof(SYNode, next));
The offset tells the queue where the next field sits inside each SYNode: 8 bytes from its start, just past pc. The queue stores the address of the next node in that field, so every node points to the next one, which makes it easy to retrieve everything in the queue.
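To show how the offset is used and why the queue behaves like a stack, here is a portable stand-in written with C11 atomics. It is not Apple's OSAtomicEnqueue implementation: `DemoQueue`, `demo_enqueue`, and `demo_dequeue` are hypothetical names, and the dequeue shown is only safe with a single consumer.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    void *pc;     /* recorded return address */
    void *next;   /* link to the node that was pushed before this one */
} SYNode;

typedef struct {
    _Atomic(void *) head;   /* most recently pushed node, or NULL when empty */
} DemoQueue;

/* Push: link the node to the current head, then swap the head atomically. */
static void demo_enqueue(DemoQueue *q, void *node, size_t next_offset) {
    void **next_slot = (void **)((char *)node + next_offset);
    void *old_head = atomic_load(&q->head);
    do {
        *next_slot = old_head;   /* old_head is refreshed when the swap fails */
    } while (!atomic_compare_exchange_weak(&q->head, &old_head, node));
}

/* Pop the most recently pushed node (single consumer only in this sketch). */
static void *demo_dequeue(DemoQueue *q, size_t next_offset) {
    void *node = atomic_load(&q->head);
    if (node == NULL) return NULL;
    void **next_slot = (void **)((char *)node + next_offset);
    atomic_store(&q->head, *next_slot);
    return node;
}
```

Nodes come back in the reverse of the order they were pushed, which is why the collected names have to be reversed later.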
- [Step 4] In the screen-tap handler (touchesBegan:withEvent:):
  - Iterate over the addresses stored in the atomic queue.
  - Use dladdr to look up the name of the method that contains each address, and store the names in an array.
typedef struct dl_info {
    const char *dli_fname;  /* file */
    void       *dli_fbase;  /* file address */
    const char *dli_sname;  /* symbol name */
    void       *dli_saddr;  /* function start address */
} Dl_info;
int dladdr(const void *, Dl_info *);
  - Since the atomic queue is a stack (last in, first out), reverse the array.
  - Since a method may be called many times, remove duplicates.
  - Remove the tap handler itself, which is the last method recorded.
  - Convert the array of method names to a string and write it to a file in the sandbox.
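The reverse, de-duplicate, and prefix steps above can be sketched in C without any Apple APIs. Both function names are hypothetical, and the input is assumed to be in dequeue order (most recent call first).

```c
#include <stdio.h>
#include <string.h>

/* Writes the order-file spelling of `name` into `out`: Objective-C methods
   ("+[...]" or "-[...]") are kept as-is, other symbols get a leading underscore. */
static void order_symbol(const char *name, char *out, size_t out_size) {
    int is_objc = strncmp(name, "+[", 2) == 0 || strncmp(name, "-[", 2) == 0;
    snprintf(out, out_size, "%s%s", is_objc ? "" : "_", name);
}

/* Walks `names` from last to first (undoing the stack order) and keeps only the
   first occurrence of each name. `out` must have room for `count` entries.
   Returns the number of names kept. */
static size_t first_call_order(const char *const *names, size_t count,
                               const char **out) {
    size_t kept = 0;
    for (size_t i = count; i-- > 0; ) {
        int seen = 0;
        for (size_t j = 0; j < kept; j++) {
            if (strcmp(out[j], names[i]) == 0) { seen = 1; break; }
        }
        if (!seen) out[kept++] = names[i];
    }
    return kept;
}
```

The linear duplicate check is quadratic overall, which is acceptable for a one-off tool that processes a few thousand symbols.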
The complete code is as follows:
//
//  ViewController.m
//  TraceDemo
//
//  Created by Hank on 2020/3/16.
//  Copyright © 2020 Hank. All Rights Reserved.
//

#import "ViewController.h"
#import <dlfcn.h>
#import <libkern/OSAtomic.h>
#import "TraceDemo-Swift.h"

// Node stored in the atomic queue: one recorded return address.
typedef struct {
    void *pc;
    void *next;
} SYNode;

// Atomic queue that collects every recorded address.
static OSQueueHead symbolList = OS_ATOMIC_QUEUE_INIT;

@interface ViewController ()
@end

@implementation ViewController

+ (void)initialize { }

void (^block1)(void) = ^(void) { };

void test() { block1(); }

+ (void)load { }

- (void)viewDidLoad {
    [super viewDidLoad];
    [SwiftTest swiftTestLoad];
    test();
}

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event {
    NSMutableArray<NSString *> *symbolNames = [NSMutableArray array];
    while (YES) {
        SYNode *node = OSAtomicDequeue(&symbolList, offsetof(SYNode, next));
        if (node == NULL) { break; }
        Dl_info info;
        int found = dladdr(node->pc, &info);
        free(node);
        // Skip addresses that dladdr cannot resolve to a symbol name.
        if (found == 0 || info.dli_sname == NULL) { continue; }
        NSString *name = @(info.dli_sname);
        // Objective-C methods are written as-is; other symbols get a leading underscore.
        BOOL isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["];
        NSString *symbolName = isObjc ? name : [@"_" stringByAppendingString:name];
        [symbolNames addObject:symbolName];
    }
    // Reverse the order
    NSEnumerator *emt = [symbolNames reverseObjectEnumerator];
    // Remove duplicates
    NSMutableArray<NSString *> *funcs = [NSMutableArray arrayWithCapacity:symbolNames.count];
    NSString *name;
    while (name = [emt nextObject]) {
        if (![funcs containsObject:name]) {
            [funcs addObject:name];
        }
    }
    // Remove this method itself
    [funcs removeObject:[NSString stringWithFormat:@"%s", __FUNCTION__]];
    // Convert the array to a string
    NSString *funcStr = [funcs componentsJoinedByString:@"\n"];
    NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"demo.order"];
    NSData *fileContents = [funcStr dataUsingEncoding:NSUTF8StringEncoding];
    [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
    NSLog(@"%@", funcStr);
}

void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t N;  // Counter for the guards.
    if (start == stop || *start) return;  // Initialize only once.
    printf("INIT: %p %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++)
        *x = ++N;  // Guards should start from 1.
}

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    // The return address lies inside the method being hooked.
    void *PC = __builtin_return_address(0);
    SYNode *node = malloc(sizeof(SYNode));
    *node = (SYNode){PC, NULL};
    // Enqueue
    OSAtomicEnqueue(&symbolList, node, offsetof(SYNode, next));
}

@end
2.2.4 How do I verify the effect of binary rearrangement?
1. Check the number of page faults for your project. Note: uninstall the app or restart the phone first, so that none of the app's data is in memory. If the app's data is still in physical memory, the launch is not a cold start and the count will be misleadingly low.
2. Open Instruments -> System Trace.
3. Select a real device and your project, and click Start. When the first screen is displayed, click Stop.
4. Search for Main Thread and select Virtual Memory. The File Backed Page In count is the number of page faults.
Before optimization: the project has 427 page faults.
After optimization: the project has 286 page faults.
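The size of the improvement follows directly from the two counts above. A minimal sketch, with `reduction_percent` as a hypothetical helper:

```c
#include <assert.h>

/* Percentage by which `after` is lower than `before`. */
static double reduction_percent(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

Going from 427 to 286 page faults removes 141 of them, so `reduction_percent(427, 286)` is roughly 33.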
That concludes the article: binary rearrangement reduced the page faults at startup from 427 to 286, a reduction of approximately 33%.