TikTok R&D Practice: Increasing App Startup Speed by More Than 15% with Binary Rearrangement
I. Principle
1. Virtual memory and physical memory
Early computers had no virtual addresses. When a program was launched, all of it was loaded into physical memory, and processes were laid out one after another, so a process could reach another process's data simply by adding an offset to its own addresses. That is unsafe.
As software grew faster than hardware, programs needed more and more memory and physical memory ran short. With several programs open, a new program could only be loaded once enough memory was free, which is why on early computers you sometimes had to close a running program before a new one could be opened.
In addition, a program in use touches only part of its memory at any time, so loading the whole program into memory at launch wastes space.
Virtual memory emerged for these reasons. Once launched, a program believes it has a large memory space to itself, but that space is virtual. Virtual memory is related to physical memory through a mapping table, one per process, as the following two tables show.
When process 1 runs, it is given a memory space, but its accesses do not go to physical memory directly. Each address it accesses is mapped through process 1's mapping table to a location in physical memory. This is called address translation. It requires the CPU and the operating system to cooperate, because the mapping tables are managed by the operating system.
The contiguous addresses we see when debugging are therefore an illusion that holds only inside the process: every access goes through the process's mapping table to obtain the real physical address. The memory of other processes has no entry in that table, so its physical addresses cannot be reached. This solves the memory safety problem.
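The address translation described above can be sketched in C as follows. This is a toy model, not how any real kernel or MMU is implemented: the `PageTable` type, the `translate` function, and the 4 KB page size are all assumptions made for illustration.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u        /* assumed 4 KB pages */
#define TOY_NUM_PAGES 4u           /* toy address space: 4 virtual pages */
#define NO_FRAME      UINT32_MAX   /* marks a virtual page with no mapping */

/* Hypothetical per-process mapping table:
   index = virtual page number, value = physical frame number. */
typedef struct {
    uint32_t frame_of_page[TOY_NUM_PAGES];
} PageTable;

/* Translate a virtual address through one process's table.
   Returns 0 and stores the physical address on success, -1 if unmapped. */
int translate(const PageTable *pt, uint32_t vaddr, uint32_t *paddr) {
    uint32_t page   = vaddr / TOY_PAGE_SIZE;
    uint32_t offset = vaddr % TOY_PAGE_SIZE;
    if (page >= TOY_NUM_PAGES || pt->frame_of_page[page] == NO_FRAME) {
        return -1;   /* nothing in this process's table: no access possible */
    }
    *paddr = pt->frame_of_page[page] * TOY_PAGE_SIZE + offset;
    return 0;
}
```

Because each process has its own table, two processes can use the same virtual address and still end up in different physical frames, and an address with no entry simply cannot be translated.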
Memory usage problem:
With paged memory management, the mapping table works in units of pages, not bytes. Linux uses 4 KB pages and iOS uses 16 KB pages, while the Mac used here has 4 KB pages: entering pagesize in the Mac terminal returns 4096.
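The same value can be read from a program. The sketch below uses the POSIX `sysconf(_SC_PAGESIZE)` call; the helper names `page_size_bytes` and `pages_needed` are made up for this example.

```c
#include <assert.h>
#include <unistd.h>

/* Ask the operating system for its page size in bytes (POSIX). */
long page_size_bytes(void) {
    long size = sysconf(_SC_PAGESIZE);
    assert(size > 0);
    return size;
}

/* Number of pages needed to hold `bytes` bytes, rounded up,
   since memory is mapped a whole page at a time. */
long pages_needed(long bytes) {
    long size = page_size_bytes();
    return (bytes + size - 1) / size;
}
```

Even a single byte costs a whole page, which is why the layout of code across pages matters later in this article.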
Why is there enough memory after paging? Because an application's memory is virtual, the program believes it has a lot of memory when it starts. Let's look at the following figure.
When an application is loaded, its data is not all placed in memory, because pages are loaded lazily. When the process accesses a virtual address, the page table is consulted first. If the entry for that page is 0, the page is not in physical memory, and the system blocks the process. This is called a page fault (also known as a page-miss exception). The system then loads the corresponding page from disk into physical memory and points the virtual page at the physical page just loaded. If a free physical page exists, it is used. If not, a page belonging to another process has to be overwritten, and the operating system has specific algorithms for choosing which one. This ensures that the pages the current process is actually using are in memory, and it is what makes memory management flexible.
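The lazy loading just described can be modeled with a few lines of C. This is only a simulation under stated assumptions: `ToyProcess` and `touch_address` are hypothetical names, the page size is fixed at 16 KB, and eviction of other processes' pages is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 16384u   /* assumed 16 KB pages, as on iOS */
#define TOY_NUM_PAGES 8u       /* toy virtual address space */

typedef struct {
    bool   resident[TOY_NUM_PAGES];  /* false plays the role of a 0 page-table entry */
    size_t faults;                   /* page faults taken so far */
} ToyProcess;

/* Simulate one memory access. Returns true if it caused a page fault. */
bool touch_address(ToyProcess *p, uint32_t vaddr) {
    uint32_t page = vaddr / TOY_PAGE_SIZE;
    if (page >= TOY_NUM_PAGES) return false;  /* outside the toy address space */
    if (p->resident[page]) return false;      /* already in physical memory */
    p->faults++;                 /* the OS would block the process and read the page from disk */
    p->resident[page] = true;    /* the entry now points at physical memory */
    return true;
}
```

Only the first access to each page is expensive; later accesses to the same page find it resident.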
One problem remains. Virtual memory solves the safety and efficiency problems, but it introduces a new security problem: virtual addresses are fixed at compile and link time, so an attacker can easily analyze them and then hook or inject code at known addresses. This is where ASLR (Address Space Layout Randomization) comes in: every time a process is loaded, it is given a random offset, so its virtual addresses change on every launch. iOS has used ASLR since iOS 4.
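The effect of the random offset is plain arithmetic, sketched below. The function names are invented for this example and the addresses used in the note after the code are illustrative values, not measurements.

```c
#include <stdint.h>

/* Address a symbol ends up at once the image is loaded with a given
   random offset (often called the slide). */
uint64_t runtime_address(uint64_t link_time_addr, uint64_t slide) {
    return link_time_addr + slide;
}

/* Undo the offset to get back the address fixed at link time. */
uint64_t unslide_address(uint64_t runtime_addr, uint64_t slide) {
    return runtime_addr - slide;
}
```

For example, a function linked at 0x100005a0c and loaded with an offset of 0xe0000 runs at 0x1000e5a0c. On the next launch the offset differs, so the runtime address differs too, while the link-time address stays the same.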
Binary rearrangement:
On iOS, handling a page fault involves not only loading the data into memory but also verifying the page's signature, so it takes comparatively long. The cost per page varies widely, from 0.1 ms to 0.8 ms, which goes unnoticed in normal use. At startup, however, a lot of code has to be loaded and the cost adds up. Code in a Mach-O file is laid out by its position in the compiled files, not by the order in which it is called, so the code needed at startup may be scattered across many pages and trigger many page faults. If we put all the startup code on one or two pages, startup gets considerably faster. This is called binary rearrangement.
It is unsafe for processes to access physical memory directly, so the operating system builds a layer of virtual memory on top of physical memory. For efficiency and ease of management, both virtual and physical memory are divided into pages. When a process accesses a virtual page that has no physical page behind it, a page fault is triggered: physical memory is allocated and, if needed, the data is read in from disk via mmap.
For apps distributed through the App Store, a page fault also performs signature verification, so it takes longer than one might expect:
By default, when the binary is generated, object files (.o) are written in the order they are linked, and functions are written in the order they appear inside each object file.
A static library (.a) is a set of .o files packed with ar. Its contents can be listed with ar -t.
Default layout:
To simplify, suppose there are only two pages, page1 and page2, and the green method1 and method3 are called at startup. The system must take two page faults to execute that code.
But if we place method1 and method3 next to each other, a single page fault is enough. This is the core principle of binary rearrangement.
In our experience, each page fault eliminated by the rearrangement saves 0.6~0.8 ms of startup time.
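The two-page example can be checked with a small C sketch that lays functions out back to back and counts the pages holding startup code. The `Func` type, the `startup_pages` function, and the function sizes in the note below are assumptions chosen to mirror the example, not values from a real binary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    size;     /* code size in bytes */
    bool        startup;  /* called during launch? */
} Func;

/* Lay the functions out back to back in array order and count how many
   distinct pages contain at least one byte of startup code. */
size_t startup_pages(const Func *funcs, size_t n, uint32_t page_size) {
    size_t   pages = 0;
    uint32_t offset = 0;
    int64_t  last_counted = -1;   /* highest page number already counted */
    for (size_t i = 0; i < n; i++) {
        if (funcs[i].startup && funcs[i].size > 0) {
            int64_t first = offset / page_size;
            int64_t last  = (offset + funcs[i].size - 1) / page_size;
            if (first <= last_counted) first = last_counted + 1;
            if (last >= first) {
                pages += (size_t)(last - first + 1);
                last_counted = last;
            }
        }
        offset += funcs[i].size;   /* next function starts right after this one */
    }
    return pages;
}
```

With 16 KB pages and sizes of 12288, 8192, 4096, and 8192 bytes for method1 to method4, the default order (method1, method2, method3, method4) puts startup code on 2 pages, while the reordered layout (method1, method3, method2, method4) needs only 1.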
II. Implementation
1. System Trace debugging
As described above, memory is divided into virtual and physical memory and is managed in pages. Startup calls many methods. Whenever they are not on the same page, a page fault occurs, and this is time consuming. If all the methods called at startup sit on the same pages, startup time drops considerably. This requires binary rearrangement to place all the methods called at startup together.
- First, open the project and press Command + I to launch the Instruments debugging tool.
- Select System Trace, which shows the data for each thread in our project.
- Click Start. Search for Main Thread, select our app, click Main Thread, and then go to Main Thread –> Virtual Memory.
- File Backed Page In is the number of page faults.
- If the app is killed and launched again, the value of File Backed Page In becomes very small, which shows that even after the app is killed, some of its data is still cached by the system.
- What is a true cold start? Kill the app, launch some other apps, and then launch the app again: File Backed Page In becomes large again.
- Note: the app's pages remain in system memory after it is killed. If system memory runs short, other apps overwrite the old app's pages.
- Binary rearrangement takes effect at the link stage, and the executable is generated after the reordering, so we can only optimize at build time, not on an already generated IPA.
2. Binary rearrangement
Binary rearrangement can be configured in Xcode. First we need to determine the order of the symbols. Xcode uses a linker called ld, and ld has an option called order_file. We write the symbols into an order file in the order we want and give Xcode the path to that file. When Xcode builds, the symbols are laid out in the binary executable in the order listed in the file.
An example of such a file can be found in Apple's objc4-750 source code.
Opened, it has the following format:
It is full of function symbols. Open the project and search for order file in Build Settings.
Note that the order file path is specified here. Once it is set, Xcode lays out the symbols in the order written in the file.
Now let's write a demo and compile it. When Xcode builds, it compiles the .m files to .o files in the order of the files listed under Build Phases –> Compile Sources, and then links the .o files together to create the executable.
As an experiment, write a load method in both ViewController and AppDelegate, and run it:
// In ViewController.m
+ (void)load
{
    NSLog(@"ViewController");
}
// In AppDelegate.m
+ (void)load
{
    NSLog(@"AppDelegate");
}
Compile Sources order in Build Phases:
Run and look at the output:
Now change the order of Compile Sources
Run and look at the output again:
The print order matches the Compile Sources order, which verifies the conclusion above.
To see the symbol order of the entire project, go to Build Settings and search for Link Map.
The link map is the table of linked symbols. Set Write Link Map File to YES so that the build writes it out, then press Command + R to run. Starting from the .app file under Products, go to Intermediates.noindex –> <project name>.build –> Debug-iphoneos –> <project name>.build –> <project name>-LinkMap-normal-arm64.txt. This file contains the linked symbols in order.
Object files: lists which .o files were linked
Sections:
- Address
- Size
- Segment: __TEXT is the code segment, read-only; __DATA is the data segment, readable and writable
- Section
Then comes the part we care about:
Symbols:
- Address: the address of the method's code
- Size: the space the method occupies
- File: the file number
- Name: the method symbol in the .o file
To check an Address, take the project's executable out of the .app, open it with MachOView, and look at the Assembly view under Section.
In MachOView, the address 0x100004B70 from the symbol table contains assembly code, that is, the code we wrote compiled to assembly, so this address is a code address. Binary rearrangement therefore reorders the code, putting the code called at startup first to reduce the number of pages (16 KB each) loaded at startup.
To add an order file, create a file named hank.order and write the symbols into it.
Then put it in the root directory of the project, search for order file in Build Settings, and enter the path of the file.
Build again and open the linkmap-normal-arm64.txt file.
The symbols now follow the order in the order file. If the order file contains method symbols that do not exist in the project, Xcode filters them out automatically, with no ill effect.
Another way to view the symbol table is to cd to the directory of the project's executable in the terminal and run
nm <executable file name>
This lists all the symbols. To see only the symbols defined in the binary, such as our custom methods, in symbol-table order:
nm -Up TraceDemo
To view the system symbols (undefined symbols that are resolved from system libraries):
nm -up TraceDemo
3. Get all the methods called when the APP starts
That is binary rearrangement, but how do we know which methods are called when the app starts?
The usual way to trace calls is hooking, but here we would need to hook every method in the project.
The first way: use fishhook to hook the system's objc_msgSend function. All Objective-C methods are invoked by sending messages, but objc_msgSend takes variable arguments, so it can only be hooked with assembly. In addition, initialize, blocks, and direct function calls cannot be captured this way.
The second way: Clang instrumentation. Official documentation: Clang
With it, Objective-C methods, functions, and blocks can all be hooked.
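1. In Build Settings, add the following flag to Other C Flags:
-fsanitize-coverage=trace-pc-guard
To instrument only function entries, use this form instead:
-fsanitize-coverage=func,trace-pc-guard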
2. Then compile. The build fails with the following error:
Undefined symbol: ___sanitizer_cov_trace_pc_guard_init
We copy the implementation of this function into the project, and the error is gone
__sanitizer_cov_trace_pc_guard_init
Now let's analyze __sanitizer_cov_trace_pc_guard_init. It takes two parameters, start and stop. Set a breakpoint and look at the memory at start and stop.
Reading memory at start, every 4 bytes is one element of an array, and the elements are numbered in order: 1, 2, 3, 4, and so on. Now look at stop. As the name suggests, stop marks the end, so by the same rule we step back 4 bytes from stop to read the last element, and find 13. This is the number of symbols in our project's own files. Every method, function, and block counts, so we can add a few more methods, functions, or blocks and check again to verify this.
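The numbering seen in memory is what an initializer like the following produces. This is a C sketch modeled on the example in the Clang SanitizerCoverage documentation; the article does not reproduce the code that was pasted into the project, so treat the exact body as an assumption.

```c
#include <stdint.h>
#include <stdio.h>

/* Called once per instrumented module. [start, stop) is an array with
   one 32-bit guard for every instrumented method, function, or block. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t n;                    /* counter shared by all modules */
    if (start == stop || *start) return;  /* empty range, or already initialized */
    printf("INIT: %p %p\n", (void *)start, (void *)stop);
    for (uint32_t *x = start; x < stop; x++) {
        *x = (uint32_t)++n;               /* guards are numbered 1, 2, 3, ... */
    }
}
```

Since each guard is 4 bytes and stop points one past the last guard, the element 4 bytes before stop holds the total count, which is the 13 read in the debugger above.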
__sanitizer_cov_trace_pc_guard
Next, let's look at __sanitizer_cov_trace_pc_guard
When we run the app, a lot of guard lines are printed
Then we implement a touch handler
-(void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event{
}
Tap the screen and look at the output
To verify this, we define a function and a block. Tapping the screen calls the function, and the function calls the block. Let's look again
void (^block1)(void) = ^(void) {
};
void test() {
    block1();
}
Output:
guard: 0x100d8381c a PC
guard: 0x100d83814 8 PC
guard: 0x100d83810 7 PC
One tap calls __sanitizer_cov_trace_pc_guard three times (for touchesBegan, the function, and the block), which confirms it
To verify further, add breakpoints in touchesBegan, the function, and the block, then turn on the assembly view and run
The bl instruction calls a method or function. Continue past the breakpoint
The call is also present in test. Continue once more
The call is present in the block as well. Once Clang's code coverage instrumentation is configured and the two functions above are implemented, Clang statically inserts one line of code into every method, function, and block, at the very beginning, which achieves a global hook
Now add a breakpoint inside __sanitizer_cov_trace_pc_guard itself
Then run
The call stack on the left shows that each time a method is called, __sanitizer_cov_trace_pc_guard is called from within that method
The example code contains a PC variable, so let's add a breakpoint there and print PC. Let all the startup functions run first, then enable the breakpoint and tap the screen to trigger touchesBegan and hit it
Then type bt in the console to look at the call stack
Let's look at the address 0x0000000104349abc
This address is inside touchesBegan, but not at the beginning of touchesBegan, so let's subtract four bytes from it
Put a breakpoint in the touchesBegan method, run to it, and open the assembly view
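Because bl means call, the instruction at 0x104349AB8, near the start of touchesBegan, is the call to __sanitizer_cov_trace_pc_guard, and the address right after it (shown as 0x00000001000BdABC in the assembly view, presumably from a different launch with a different ASLR offset) is where execution resumes after the call. The PC that was printed, 0x104349ABC, is exactly this next address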
Let's look at the call stack again
Each frame in the stack shows a function name followed by an offset such as +64. The function's start address plus the offset is the actual position inside the function. Here, the offset for touchesBegan is 44.
What follows that position is the real implementation of touchesBegan, which is this part of the assembly
This means that the PC obtained in __sanitizer_cov_trace_pc_guard is the address of the instruction that follows the call, inside the calling function
Let's look at the assembly for the call to __sanitizer_cov_trace_pc_guard
Every function or method returns. At the low level, each bl stores the address of the next instruction in register x30, and when the called function reaches ret, execution returns to the address held in x30. For example, if we tap the screen with a breakpoint in __sanitizer_cov_trace_pc_guard and read x30, we get an address inside touchesBegan
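In C, the value that bl leaves in x30 can be read with the compiler builtin `__builtin_return_address(0)`, available in Clang and GCC. The sketch below records it in a small buffer; the names `record_caller`, `recorded`, and `recorded_count` are hypothetical and exist only for this example.

```c
#include <stddef.h>

#define MAX_RECORDS 64

void  *recorded[MAX_RECORDS];   /* return addresses seen so far */
size_t recorded_count;

/* Store the address the current call will return to. On arm64 this is
   the value the bl instruction placed in x30. noinline keeps a real
   call (and therefore a real return address) in place. */
__attribute__((noinline))
void record_caller(void) {
    void *pc = __builtin_return_address(0);
    if (recorded_count < MAX_RECORDS) {
        recorded[recorded_count++] = pc;
    }
}
```

Each recorded value points just after the call site, so it lies inside the function that made the call. That is what makes it usable for identifying the caller.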
So inside __sanitizer_cov_trace_pc_guard, the return address is the address of the next instruction in the caller. Because __sanitizer_cov_trace_pc_guard is called at the very start of each hooked function, this address lies inside the hooked function and identifies it
Since we have an address inside the function, we can use dladdr to look up the name of the function
#import <dlfcn.h>
dladdr(<#const void *#>, <#Dl_info *#>)
The first argument is the address. The second argument is a pointer to a Dl_info structure. Let's look at its definition
typedef struct dl_info {
    const char *dli_fname; /* Pathname of shared object */
    void *dli_fbase; /* Base address of shared object */
    const char *dli_sname; /* Name of nearest symbol */
    void *dli_saddr; /* Address of nearest symbol */
} Dl_info;
Let’s print it out:
void *PC = __builtin_return_address(0);
Dl_info info;
dladdr(PC, &info);
printf("fname:%s \nfbase:%p \nsname:%s \nsaddr:%p\n", info.dli_fname, info.dli_fbase, info.dli_sname, info.dli_saddr);
Output:
fname:/private/var/containers/Bundle/Application/38C6E838-7D51-4546-9882-BF5858D08C16/TraceDemo.app/TraceDemo
fbase:0x1000e0000
sname:-[ViewController touchesBegan:withEvent:]
saddr:0x1000e5a0c
So we know:
- fname: the file path
- fbase: the base address of the file
- sname: the symbol name of the function
- saddr: the symbol address of the function, that is, the start address of the function
Since we can get the symbol of every function that is called, we can use this to collect all the functions, methods, and blocks called while the app starts, and then generate the order file for binary rearrangement automatically. The code:
#import <dlfcn.h>
#import <libkern/OSAtomic.h>

// Atomic list that collects one node per callback invocation
static OSQueueHead symbolList = OS_ATOMIC_QUEUE_INIT;

// Node holding one recorded return address
typedef struct {
    void *pc;
    void *next;
} SYNode;

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    // if (!*guard) return;  // Duplicate the guard check.
    /* Accurately locate where to start and where to end!
       Make a judgement to write a condition inside this! */
    void *PC = __builtin_return_address(0);
    SYNode *node = malloc(sizeof(SYNode));
    *node = (SYNode){PC, NULL};
    // Push the node onto the list
    OSAtomicEnqueue(&symbolList, node, offsetof(SYNode, next));
}

- (void)createOrderFile {
    NSMutableArray<NSString *> *symbolNames = [NSMutableArray array];
    while (YES) {
        SYNode *node = OSAtomicDequeue(&symbolList, offsetof(SYNode, next));
        if (node == NULL) {
            break;
        }
        Dl_info info;
        int found = dladdr(node->pc, &info);
        free(node); // the node was allocated in the callback
        if (found == 0 || info.dli_sname == NULL) {
            continue; // no symbol for this address
        }
        NSString *name = @(info.dli_sname);
        BOOL isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["];
        // C functions and blocks need a leading underscore
        NSString *symbolName = isObjc ? name : [@"_" stringByAppendingString:name];
        [symbolNames addObject:symbolName];
    }
    // Reverse, so the symbols are in call order
    NSEnumerator *emt = [symbolNames reverseObjectEnumerator];
    // Remove duplicates
    NSMutableArray<NSString *> *funcs = [NSMutableArray arrayWithCapacity:symbolNames.count];
    NSString *name;
    while (name = [emt nextObject]) {
        if (![funcs containsObject:name]) {
            [funcs addObject:name];
        }
    }
    // Remove this method's own symbol
    [funcs removeObject:[NSString stringWithFormat:@"%s", __FUNCTION__]];
    // Join the array into a string
    NSString *funcStr = [funcs componentsJoinedByString:@"\n"];
    NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"hank.order"];
    NSData *fileContents = [funcStr dataUsingEncoding:NSUTF8StringEncoding];
    [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
    NSLog(@"%@", funcStr);
}
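The post-processing in createOrderFile (prefixing, reversing, and removing duplicates) can also be shown in plain C, without Foundation. This is a sketch under assumptions: `order_symbol` and `call_order_unique` are hypothetical helpers, and the input is taken to arrive newest first, which is what the reverse enumeration in the code above implies.

```c
#include <stdio.h>
#include <string.h>

/* Objective-C method symbols ("-[Cls sel]" or "+[Cls sel]") are written
   as they are; C functions and blocks get a leading underscore. */
void order_symbol(const char *name, char *out, size_t out_len) {
    int is_objc = strncmp(name, "+[", 2) == 0 || strncmp(name, "-[", 2) == 0;
    snprintf(out, out_len, "%s%s", is_objc ? "" : "_", name);
}

/* `newest_first` holds names in the order they were popped (last call
   first). Writes them to `out` in call order, keeping only the first
   occurrence of each name. Returns the number of names written;
   `out` must have room for n entries. */
size_t call_order_unique(const char **newest_first, size_t n, const char **out) {
    size_t count = 0;
    for (size_t i = n; i-- > 0; ) {          /* walk backwards = call order */
        int seen = 0;
        for (size_t j = 0; j < count; j++) {
            if (strcmp(out[j], newest_first[i]) == 0) { seen = 1; break; }
        }
        if (!seen) out[count++] = newest_first[i];
    }
    return count;
}
```

Keeping the first occurrence matters because the order file should list each symbol at the point where startup first needs it.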