This is the 27th day of my participation in the August Genwen Challenge.
In the previous two articles, I introduced some basic concepts and ideas for launch optimization. This article focuses on one optimization for the pre-main stage: binary rearrangement.
1. Principle of binary rearrangement
As covered in the virtual memory section, when a process accesses a virtual memory page that has no corresponding physical page, a Page Fault is triggered and the process is blocked. The data has to be loaded into physical memory before the access can be retried, which costs time.
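To make the cost of a Page Fault observable, here is a minimal sketch that counts the faults a process takes while touching newly allocated pages. It assumes a POSIX system where getrusage reports ru_minflt and ru_majflt; the helper names total_page_faults and faults_from_touching are hypothetical, and the exact count depends on the allocator and the OS.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Total page faults (minor + major) charged to this process so far, or -1 on error. */
static long total_page_faults(void) {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_minflt + usage.ru_majflt;
}

/* Allocates page_count pages, writes one byte into each of them, and returns
   how many page faults were recorded while doing so (or -1 on error). */
long faults_from_touching(size_t page_count) {
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || page_count == 0) {
        return -1;
    }
    /* volatile keeps the compiler from removing the writes below. */
    volatile unsigned char *region = malloc(page_count * (size_t)page_size);
    if (region == NULL) {
        return -1;
    }
    long before = total_page_faults();
    for (size_t i = 0; i < page_count; i++) {
        region[i * (size_t)page_size] = 1; /* first touch of each page */
    }
    long after = total_page_faults();
    free((void *)region);
    if (before < 0 || after < 0) {
        return -1;
    }
    return after - before;
}
```

Calling faults_from_touching(256) on a typical system reports a number close to the page count, since each first touch has to be backed by physical memory.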
Because of Page Faults, a cold start of an app is expensive: a large number of classes, categories, and third-party libraries have to be loaded and executed, and the resulting Page Faults take a lot of time. Taking WeChat as an example, let's look at the number of Page Faults in the startup stage:
- Press the CMD+I shortcut and select System Trace
- Click Start (restart the phone first to clear cached data), stop once the first screen appears, and follow the operation in the following figure
The figure shows that WeChat triggered more than 2800 Page Faults during startup, which clearly hurts launch performance.
- Next, we use a demo to see how methods are ordered at compile time. Define the following methods, in this order, in ViewController:
@implementation ViewController

void test1() {
    printf("1");
}

void test2() {
    printf("2");
}

- (void)viewDidLoad {
    [super viewDidLoad];
    test1();
}

+ (void)load {
    printf("3");
    test2();
}

@end
- In Build Settings, set Write Link Map File to YES
- Press CMD+B to build the demo, then open the link map file at the corresponding path. As shown below, the functions within a class are laid out from top to bottom in the order they are written, and the files are ordered according to Build Phases -> Compile Sources
From the number of Page Faults and the layout order, we can see that the root cause of too many Page Faults is that the methods called at startup are scattered across different pages. Our optimization idea is therefore to place all the methods called at startup next to each other, so that they share a page (or as few pages as possible) and many Page Faults become one. This is the core principle of binary rearrangement, as shown below.
Note: in a production build, iOS also performs signature verification on a page when a Page Fault loads it, so Page Faults in the iOS production environment take more time than in the Debug environment.
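The effect of packing startup functions together can be sketched by counting how many distinct pages a set of function addresses falls on. This is only an illustration: the function name count_touched_pages is hypothetical, the addresses in the usage note are made up, and the 16 KB page size (0x4000) is an assumption that matches arm64 iOS devices.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the distinct pages that contain at least one of the given addresses.
   Quadratic in count, which is acceptable for a small illustrative input. */
size_t count_touched_pages(const uint64_t *addresses, size_t count, uint64_t page_size) {
    size_t pages = 0;
    for (size_t i = 0; i < count; i++) {
        uint64_t page = addresses[i] / page_size;
        int seen = 0;
        /* Check whether an earlier address already landed on this page. */
        for (size_t j = 0; j < i; j++) {
            if (addresses[j] / page_size == page) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            pages++;
        }
    }
    return pages;
}
```

With four startup functions at 0x0, 0x4000, 0x8000, and 0xC000 the function returns 4, while the same four functions packed at 0x0, 0x40, 0x80, and 0xC0 give 1. Each touched page can cost one Page Fault on a cold start, so the second layout is the one binary rearrangement aims for.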
2. Binary rearrangement practice
Before the hands-on part, let's go over a few terms.
Linkmap is an intermediate product of iOS compilation that records the layout of the binary. To generate it, enable Write Link Map File in Xcode's Build Settings. The Link Map consists of three parts:
- Object Files: the path and file number of each link unit used to generate the binary
- Sections: the address range of each segment/section in the Mach-O
- Symbols: the address range of each symbol, in order
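As an illustration of the Symbols part, the sketch below parses one symbol row into its fields. It assumes rows of the form address, size, file number in square brackets, then the symbol name, separated by whitespace; the struct LinkMapSymbol, the function parse_symbol_line, and the sample address in the usage note are hypothetical and not taken from a real build.

```c
#include <stdio.h>
#include <string.h>

/* One row of the Symbols part of a link map. */
typedef struct {
    unsigned long long address; /* start address of the symbol */
    unsigned long long size;    /* size in bytes */
    int file_index;             /* number from the Object Files part */
    char name[256];             /* symbol name, e.g. -[ViewController viewDidLoad] */
} LinkMapSymbol;

/* Parses a row such as "0x100005F30 0x00000027 [  1] -[ViewController viewDidLoad]".
   Returns 1 and fills *out on success, 0 if the line is not a symbol row. */
int parse_symbol_line(const char *line, LinkMapSymbol *out) {
    int consumed = 0;
    int matched = sscanf(line, "%llx %llx [%d]%n",
                         &out->address, &out->size, &out->file_index, &consumed);
    if (matched != 3 || consumed == 0) {
        return 0;
    }
    /* The name is everything after the closing bracket, up to the line end. */
    const char *name = line + consumed;
    while (*name == ' ' || *name == '\t') {
        name++;
    }
    size_t length = strcspn(name, "\r\n");
    if (length == 0 || length >= sizeof(out->name)) {
        return 0;
    }
    memcpy(out->name, name, length);
    out->name[length] = '\0';
    return 1;
}
```

Feeding every row of the Symbols part through parse_symbol_line gives the symbols in address order, which is the order that an order file changes.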
ld
ld is the linker used by Xcode. It has an order_file parameter, which we can set through Build Settings -> Order File by pointing it to a file with the .order suffix. The symbols listed in this order file are laid out in the listed order when the project is linked, which is how the optimization is achieved.
So the essence of binary rearrangement is to rearrange the symbols that are loaded at startup.
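An order file is plain text with one symbol per line, spelled the way the link map spells it: C functions carry a leading underscore, while Objective-C methods are written as -[Class method] or +[Class method]. The sketch below produces that spelling from a bare symbol name; the helper name order_file_entry is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Writes the order-file spelling of symbol into out.
   Objective-C methods (starting with "+[" or "-[") are kept as they are;
   every other symbol gets a leading underscore.
   Returns 1 on success, 0 if the result does not fit into out. */
int order_file_entry(const char *symbol, char *out, size_t out_size) {
    int is_objc = strncmp(symbol, "+[", 2) == 0 || strncmp(symbol, "-[", 2) == 0;
    int written = snprintf(out, out_size, "%s%s", is_objc ? "" : "_", symbol);
    return written >= 0 && (size_t)written < out_size;
}
```

For the demo above, the entries would be +[ViewController load], _test2, -[ViewController viewDidLoad], and _test1, each on its own line.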
If the project is small, it is possible to write an order file by hand and list the methods manually. If the project is large and involves many methods, however, how do we obtain the functions that run at startup? There are several approaches:
1. Hook objc_msgSend: as we know, an OC method call is essentially a message send, which ends up in objc_msgSend. Because objc_msgSend takes variadic arguments, the hook has to be written in assembly, which places higher demands on the developer. It also only captures OC methods and Swift methods marked @objc.
2. Static scanning: scan the symbol and function data stored in specific segments or sections of the Mach-O.
3. Clang instrumentation: a batch hook that can reach 100% symbol coverage, capturing Swift, OC, C, and block functions.
3. Clang instrumentation
LLVM comes with simple code coverage instrumentation built in, called SanitizerCoverage. It inserts calls to user-defined functions at the function level, basic-block level, and edge level. Our batch hook relies on SanitizerCoverage.
The official clang documentation for SanitizerCoverage gives a detailed overview, along with a brief demo.
- [Step 1: Configuration] Enable SanitizerCoverage
  - For an OC project, add -fsanitize-coverage=func,trace-pc-guard to Other C Flags in Build Settings
  - For a Swift project, additionally add -sanitize-coverage=func and -sanitize=undefined to Other Swift Flags
  - SanitizerCoverage must be enabled for every binary linked into the app in order to fully cover all calls
  - The flags can also be configured through the podfile:
post_install do |installer|
  installer.pods_project.targets.each do |target|
    target.build_configurations.each do |config|
      config.build_settings['OTHER_CFLAGS'] = '-fsanitize-coverage=func,trace-pc-guard'
      config.build_settings['OTHER_SWIFT_FLAGS'] = '-sanitize-coverage=func -sanitize=undefined'
    end
  end
end
- Step 2: Create a new OC file, CJLOrderFile, and override two methods
  - The __sanitizer_cov_trace_pc_guard_init method
    - Parameter 1, start, is a pointer to an unsigned int (4 bytes). It is the starting position of the guard array, that is, the starting position of the symbols (read from high to low)
    - Parameter 2, stop: because the data is read from high addresses to low, the printed stop address is not the address of the last entry but the address just past it. Since each entry takes 4 bytes, real address of the last entry = printed stop address - 0x4
    - What does the value stored at that address represent? After adding a method, a block, a C++ function, or a property (a property adds three), the value grows by the corresponding number, for example after adding a test1 method. The value is therefore the number of instrumented symbols
  - The __sanitizer_cov_trace_pc_guard method captures every symbol called during startup and enqueues it
    - The guard parameter is a sentinel that tells us which numbered symbol was called
    - Storing the symbols requires a linked list, so we define a list node, CJLNode
    - An atomic queue is created with OSQueueHead to make reads and writes thread-safe
    - OSAtomicEnqueue enqueues each node; the next symbol is reached through the node's next pointer
#import <Foundation/Foundation.h>
#import <dlfcn.h>
#import <libkern/OSAtomic.h>

// Atomic queue that holds the captured nodes
static OSQueueHead queue = OS_ATOMIC_QUEUE_INIT;
// Set by getOrderFile when collection ends
static BOOL collectFinished = NO;

// Linked-list node: the captured PC plus the link to the next node
typedef struct {
    void *pc;
    void *next;
} CJLNode;

/*
 - start: starting position of the guard array
 - stop: not the address of the last guard, but the address just past it;
   the last guard is at stop - 4 (each guard is an unsigned int, 4 bytes)
 */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t N;  // running guard counter
    if (start == stop || *start) return;  // initialize only once
    printf("INIT: %p - %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++) {
        *x = ++N;  // number the guards 1, 2, 3, ...
    }
}

/*
 Hooks every method, function, and block call, and may run on multiple threads.
 This function only stores the PC, as a linked-list node.
 - guard is a sentinel that identifies which symbol was called
 */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    // if (!*guard) return;

    // Get the PC.
    // __builtin_return_address(0) is the return address of the current function,
    // i.e. an address inside the instrumented function that called this hook;
    // 1 would be the return address of that function's caller.
    void *PC = __builtin_return_address(0);
    // Create a node and fill it in
    CJLNode *node = malloc(sizeof(CJLNode));
    *node = (CJLNode){PC, NULL};
    // Nodes are reached through the next pointer rather than by index,
    // so pass the offset of next within the struct: offsetof(struct type, member)
    OSAtomicEnqueue(&queue, node, offsetof(CJLNode, next));
}
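The meaning of start and stop can be checked in isolation, without the sanitizer. The sketch below applies the same numbering loop to an ordinary array; the function name number_guards and the five-element array in the usage note are hypothetical stand-ins for the real guard section.

```c
#include <stddef.h>
#include <stdint.h>

/* Numbers the guards in the half-open range [start, stop) as 1, 2, 3, ...
   and returns how many guards were numbered. */
size_t number_guards(uint32_t *start, uint32_t *stop) {
    uint32_t next = 0;
    for (uint32_t *guard = start; guard < stop; guard++) {
        *guard = ++next;
    }
    return (size_t)(stop - start);
}
```

For a uint32_t guards[5], calling number_guards(guards, guards + 5) returns 5. The pointer guards + 5 plays the role of stop: it points just past the array, and the last guard sits 4 bytes below it, at guards + 4, holding the value 5, which is the symbol count.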
- Step 3: Get all symbols and write them to a file
  - A while loop dequeues the symbols, adds the prefix to non-OC symbols, and stores them in an array
  - The array is reversed, because the queue returns the nodes in reverse order
  - The array is de-duplicated, and the symbol of the function itself is removed
  - The symbols in the array are joined into a string and written to the cjl.order file
extern void getOrderFile(void(^completion)(NSString *orderFilePath)) {
    collectFinished = YES;
    __sync_synchronize();
    // Symbol of this function itself, excluded from the result
    NSString *functionExclude = [NSString stringWithFormat:@"_%s", __FUNCTION__];
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(0.01 * NSEC_PER_SEC)), dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        // Create the symbol array
        NSMutableArray<NSString *> *symbolNames = [NSMutableArray array];
        // Drain the queue
        while (YES) {
            CJLNode *node = OSAtomicDequeue(&queue, offsetof(CJLNode, next));
            if (node == NULL) break;
            // Resolve the PC to symbol info
            Dl_info info = {0};
            dladdr(node->pc, &info);
            free(node);  // release the node allocated in the hook
            // printf("%s \n", info.dli_sname);
            if (info.dli_sname) {
                NSString *name = @(info.dli_sname);
                // OC methods are kept as is; other symbols get a "_" prefix
                BOOL isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["];
                NSString *symbolName = isObjc ? name : [@"_" stringByAppendingString:name];
                [symbolNames addObject:symbolName];
            }
        }
        if (symbolNames.count == 0) {
            if (completion) {
                completion(nil);
            }
            return;
        }
        // Reverse
        NSEnumerator *emt = [symbolNames reverseObjectEnumerator];
        // De-duplicate
        NSMutableArray<NSString *> *funcs = [NSMutableArray arrayWithCapacity:symbolNames.count];
        NSString *name;
        while (name = [emt nextObject]) {
            if (![funcs containsObject:name]) {
                [funcs addObject:name];
            }
        }
        // Remove this function itself
        [funcs removeObject:functionExclude];
        // Join the array into a string
        NSString *funcStr = [funcs componentsJoinedByString:@"\n"];
        NSLog(@"Order:\n%@", funcStr);
        // Write the string to a file
        NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"cjl.order"];
        NSData *fileContents = [funcStr dataUsingEncoding:NSUTF8StringEncoding];
        BOOL success = [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
        if (completion) {
            completion(success ? filePath : nil);
        }
    });
}
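The reverse and de-duplicate steps can be sketched in plain C, independent of Foundation. OSQueueHead is a LIFO list, so dequeuing returns the most recently captured symbol first; walking that sequence backwards restores call order, and keeping only the first occurrence of each name removes repeats. The function name call_order_unique and the sample symbols in the usage note are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* names: symbols in dequeue order (most recent call first).
   out:   receives the symbols in first-call order with duplicates dropped;
          it must have room for count entries.
   Returns the number of entries written to out. */
size_t call_order_unique(const char *const *names, size_t count, const char **out) {
    size_t written = 0;
    /* Walk the input backwards to undo the LIFO order. */
    for (size_t i = count; i-- > 0;) {
        int seen = 0;
        for (size_t j = 0; j < written; j++) {
            if (strcmp(out[j], names[i]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            out[written++] = names[i];
        }
    }
    return written;
}
```

If the dequeue order is _test2, +[ViewController load], _test2, _main, the result is _main, _test2, +[ViewController load]: the order in which each symbol was first called.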
- Step 4: Call getOrderFile in didFinishLaunchingWithOptions. Note that where you call it is up to you; generally it is called once the first screen has rendered
- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    [self test11];
    getOrderFile(^(NSString *orderFilePath) {
        NSLog(@"OrderFilePath:%@", orderFilePath);
    });
    return YES;
}

- (void)test11 {
}
At this point, cjl.order contains only these three methods.
- [Step 5: Copy the file to the specified location and configure the path] Generally, put the file in the main project directory and set Build Settings -> Order File to ./cjl.order. The comparison below shows the symbol order before the configuration (top) and after it (bottom)
Note: avoid an endless loop
If Build Settings -> Other C Flags is set to -fsanitize-coverage=trace-pc-guard, the while loop will run forever (we debugged this in the touchesBegan method)
- With assembly debugging enabled, we found three calls to __sanitizer_cov_trace_pc_guard
  - The first bl is for touchesBegan
  - The second bl is caused by the while loop. Wherever there is a jump, it gets hooked; that is, every bl and b is hooked
  - The third bl is for printf
Solution: in Build Settings -> Other C Flags, change -fsanitize-coverage=trace-pc-guard to -fsanitize-coverage=func,trace-pc-guard
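Why the loop never ends can be modeled with a counter instead of a real queue. In this simplified sketch, each pass through the drain loop first enqueues however many nodes the instrumentation adds per iteration and then dequeues one. The function name drain_iterations, its parameters, and the iteration cap are hypothetical, and the point at which the hook fires inside the loop is an approximation.

```c
/* Simulates draining a queue that starts with `pending` nodes.
   hooks_per_iteration is the number of nodes the instrumentation enqueues on
   every pass through the loop: 1 when the loop's jump is hooked, 0 with
   function-level instrumentation only.
   Returns the number of passes until the dequeue comes back empty,
   or -1 if the queue is still not empty after max_iterations passes. */
long drain_iterations(long pending, long hooks_per_iteration, long max_iterations) {
    long iterations = 0;
    while (iterations < max_iterations) {
        iterations++;
        pending += hooks_per_iteration; /* hook fires on the loop's jump */
        if (pending == 0) {
            return iterations;          /* dequeue returned NULL: loop exits */
        }
        pending--;                      /* one node dequeued and processed */
    }
    return -1;
}
```

With hooks_per_iteration set to 0, a queue of 3 nodes is drained in 4 passes (three dequeues plus the final empty check). With it set to 1, every pass puts back what it takes out, so the queue never empties, which matches the endless loop seen without the func option.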