Before doing a binary rearrangement, you need to understand a few concepts: physical memory, virtual memory, and paged memory management.

Physical memory

Early operating systems had only physical memory.

When an application was started, the whole application was loaded into memory and laid out at actual physical addresses.

This leads to some problems, such as:

  • Memory runs out quickly, because every application is loaded in full
  • It is not safe: the app accesses memory through real physical addresses, so it can reach memory that does not belong to it

Virtual memory

In iOS, each process has its own virtual memory, in one-to-one correspondence, and the virtual memory is 4 GB in size (the 32-bit case). The virtual memory is divided into many pages of 16 KB each.

With virtual memory, the way the CPU accesses a process's data changes as follows:

  • After a process starts, the system creates virtual memory for it, with a page table that records the virtual address of each piece of process data (the "process 1 virtual page table" in the figure).
  • When part of the process becomes active, the MMU (Memory Management Unit) translates the virtual address of the data into the corresponding physical address, and the CPU then accesses the data in physical memory through that physical address.
  • If the page has no corresponding physical address (P2 in the "process 1 virtual page table" in the figure), the process data for that page has not yet been loaded into physical memory, and a page fault is triggered. The current process is interrupted, the data for that page is loaded into physical memory, the page table records its physical address, and the CPU then accesses the data through that address. This takes on the order of milliseconds.
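The lookup described in the list above can be sketched as a toy page table. This is only an illustration, not how the kernel or the MMU is implemented: the `PageTable` type, the 8-page address space, and the "next free frame" loader are all assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 16384u  /* 16 KB pages, as described in the text */
#define NUM_PAGES 8u      /* hypothetical toy address space of 8 pages */

typedef struct {
    bool     resident;    /* is the page currently in physical memory? */
    uint32_t frame;       /* physical frame number, valid when resident */
} PageEntry;

typedef struct {
    PageEntry entries[NUM_PAGES];
    uint32_t  next_free_frame;  /* hypothetical loader: hands out frames in order */
    unsigned  fault_count;      /* number of page faults so far */
} PageTable;

/* Translate a virtual address to a physical one.
   On a miss, count a page fault and "load" the page before continuing. */
uint64_t translate(PageTable *pt, uint64_t vaddr) {
    uint64_t page   = vaddr / PAGE_SIZE;
    uint64_t offset = vaddr % PAGE_SIZE;
    assert(pt != NULL && page < NUM_PAGES);

    PageEntry *entry = &pt->entries[page];
    if (!entry->resident) {
        pt->fault_count++;                        /* page fault */
        entry->frame = pt->next_free_frame++;     /* load into the next frame */
        entry->resident = true;
    }
    return (uint64_t)entry->frame * PAGE_SIZE + offset;
}
```

The first access to a page pays for the fault; later accesses to the same page only do the translation.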

Advantages of virtual memory over the early physical-memory-only model

  • More efficient memory usage: with paging, only the data of active pages is loaded into physical memory. When physical memory is full, the data of inactive pages is overwritten and the data of active pages is loaded in its place. This improves memory utilization
  • Memory data is safer: each time the process starts, the system re-creates its virtual memory and assigns it an ASLR (Address Space Layout Randomization) value. The virtual address of data is the ASLR random value plus an offset, so it changes on every launch. In addition, the CPU reaches physical memory only indirectly through virtual memory, and physical addresses are never exposed to the process, which keeps memory data safe

ASLR (Address Space Layout Randomization)

Without ASLR, virtual memory still has a security weakness:

  • Every process's virtual address space starts at 0 (0 to 4 GB). If an attacker performs static analysis and finds the offset of a function, that function sits at the same virtual address on every launch

ASLR remedies this weakness.

ASLR stands for Address Space Layout Randomization. By placing the key data areas of a process at random positions in its address space, it prevents an attacker from reliably jumping to a specific location in memory to exploit a function.

Put more simply: each time the process starts, the system re-creates its virtual memory and assigns it an ASLR value. The virtual address of data is the ASLR random value plus an offset, so the virtual address changes on every launch.
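The "random value + offset" rule is plain addition, which a short sketch makes concrete. The function names and the slide and offset values used with them are hypothetical, chosen only to show that the same static offset lands at a different address on each launch.

```c
#include <assert.h>
#include <stdint.h>

/* Runtime virtual address = random slide chosen at launch + fixed offset
   of the data inside the binary. */
uint64_t runtime_address(uint64_t aslr_slide, uint64_t static_offset) {
    return aslr_slide + static_offset;
}

/* Inverse: recover the static offset from a runtime address
   when the slide for this launch is known. */
uint64_t static_offset_of(uint64_t runtime_addr, uint64_t aslr_slide) {
    assert(runtime_addr >= aslr_slide);
    return runtime_addr - aslr_slide;
}
```

For example, a function at offset 0x1f40 would be at 0x5f40 with a slide of 0x4000 and at 0x9f40 with a slide of 0x8000, so an address found by static analysis alone is not enough to locate it at run time.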

With that in mind, let’s move on to binary rearrangements

Binary rearrangement

As mentioned above, accessing data that has not been loaded into physical memory triggers a page fault, which interrupts the process. Each fault takes only milliseconds, but there is one situation in which a large number of them occur: app launch. The core of improving app startup speed through binary rearrangement is to reduce the number of page faults at launch.

Before reordering, we can check the default order in which the project's code is laid out by looking at the LinkMap file (the LinkMap records the layout of the binary).

  • In Xcode, go to Build Settings > Write Link Map File and set it to YES
  • Run the project, then go to Products in the project navigator, find xxx.app, and right-click > Show in Finder
  • Go up two levels and follow the path Intermediates.noindex/<project name>.build/Debug-iphonesimulator/<project name>.build/<project name>-LinkMap-normal-arm64.txt to find the LinkMap file

The contents are as follows:

By default the order follows Compile Sources, and the methods within a single file follow the order in which they are written.

We can also see the number of page faults at app launch with the System Trace template in Xcode's Instruments.

As shown in the screenshot (a real project, with project information blurred):

One thing to note: an app incurs many page faults the first time it is opened and far fewer when it is opened again. If you open several other apps first and then open the app under test, it again incurs many page faults.

This is due to an operating system mechanism: when an app is killed, the physical memory it used is not cleared immediately. It stays in place until it is overwritten by another app.

So what we want to do is put all the code needed at launch together at the front of the binary, so that launch touches as few pages as possible and unnecessary page faults are avoided.

In summary, there are two steps:

  • Find all the functions that are called when the app launches
  • Change the order in which App data is loaded into memory
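To see why grouping launch code matters, the sketch below counts how many distinct 16 KB pages a set of function addresses touches. It is a simplified model: the addresses are made up, and each function is assumed to fit inside a single page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 16384u  /* 16 KB pages */

/* Count the distinct pages touched by a list of function addresses.
   Each distinct page costs at most one page fault on a cold launch.
   Simplification: every function is assumed to fit inside one page. */
size_t pages_touched(const uint64_t *addrs, size_t count) {
    assert(addrs != NULL || count == 0);
    size_t distinct = 0;
    for (size_t i = 0; i < count; i++) {
        uint64_t page = addrs[i] / PAGE_SIZE;
        bool seen = false;
        for (size_t j = 0; j < i; j++) {
            if (addrs[j] / PAGE_SIZE == page) { seen = true; break; }
        }
        if (!seen) distinct++;
    }
    return distinct;
}
```

Four launch functions scattered at 0x0100, 0x4100, 0x8100 and 0xC100 touch four pages, while the same four packed at 0x0100 to 0x0400 touch one. Reordering aims for the second layout.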

Get the sequence of methods called at launch time in a project

  • Hook objc_msgSend(): covers all Objective-C methods; not considered in this article (see my other article for an introduction to hooking)
  • fishhook: hooks C functions
  • Clang instrumentation (see the official documentation)

This article uses Clang instrumentation:

Principle: at compile time, a call to __sanitizer_cov_trace_pc_guard is statically inserted inside every function, and we implement that callback in the project. Every time the app calls a function (Objective-C method, C function, block, everything), __sanitizer_cov_trace_pc_guard is called, so we can record every function used during app launch.
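As a rough mental model of that principle, the sketch below inserts the trace call by hand. `trace_entry` is a hypothetical stand-in for __sanitizer_cov_trace_pc_guard, and it records function names rather than return addresses to keep the example portable. The function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLS 64

/* Function names in the order they were entered. */
const char *g_calls[MAX_CALLS];
size_t g_call_count = 0;

/* Hypothetical stand-in for the coverage callback. */
void trace_entry(const char *function_name) {
    assert(function_name != NULL);
    if (g_call_count < MAX_CALLS) {
        g_calls[g_call_count++] = function_name;
    }
}

/* With instrumentation enabled the compiler adds the equivalent of the
   first statement of each function below; here it is written by hand. */
void load_config(void) { trace_entry(__func__); }
void build_ui(void)    { trace_entry(__func__); }

void app_launch(void) {
    trace_entry(__func__);
    load_config();
    build_ui();
}
```

After calling app_launch(), g_calls holds app_launch, load_config, build_ui in that order, which is the kind of list the later steps turn into an order file.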

  • Configure Other C Flags
In Build Settings > Other C Flags, add:
-fsanitize-coverage=func,trace-pc-guard
The form given in the official documentation, -fsanitize-coverage=trace-pc-guard, also inserts hook code inside while loops, statically adding __sanitizer_cov_trace_pc_guard several times and causing an endless loop. So we add the func argument, which means only function entries are hooked.
  • Import the header files
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>
#include <dlfcn.h>
  • Implement the callback functions

void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  // if (!*guard) return;
  void *PC = __builtin_return_address(0);  // Address inside the instrumented caller.
  Dl_info info;
  if (dladdr(PC, &info) == 0 || info.dli_sname == NULL) return;  // No symbol found.
  printf("%s\n", info.dli_sname);  // Print the method name.
}

You can check this with a quick debugging session in LLDB. To make the result obvious, do the following:

  • Create a new project, write any number of methods, and add the code above to it (anywhere, for example ViewController.m)
  • After running the project, add a breakpoint in __sanitizer_cov_trace_pc_guard (as shown in the figure), then tap the screen to trigger the tap handler and hit the breakpoint

The log shows that start and stop are 0x102591898 and 0x1025918f8, respectively. Perform a memory read to see the contents of the last memory address:

(lldb) x 0x1025918f4

Why 0x1025918f8 - 0x4?

start and stop are uint32_t pointers, so each element occupies 4 bytes. stop points just past the last element (see figure), so to read the contents of the last element you subtract 0x4.

If we add another method and repeat the same steps to read the number in the red box, we find that this number is the number of methods (note: the 18 in the red box is hexadecimal, which means there are 24 methods).
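The pointer arithmetic behind this can be checked with a small sketch. The helper names are hypothetical and the addresses are treated as plain integers, assuming each guard is a 4-byte uint32_t as stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of uint32_t guards in the half-open range [start, stop). */
size_t guard_count(uint64_t start, uint64_t stop) {
    assert(stop >= start);
    return (size_t)((stop - start) / sizeof(uint32_t));
}

/* Address of the last guard: one uint32_t (4 bytes) before stop. */
uint64_t last_guard_address(uint64_t stop) {
    assert(stop >= sizeof(uint32_t));
    return stop - sizeof(uint32_t);
}
```

With the logged values, last_guard_address(0x1025918f8) is 0x1025918f4, the address read in LLDB, and guard_count(0x102591898, 0x1025918f8) is 0x60 / 4 = 24, which is 0x18 in hexadecimal.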

We can also turn on the disassembly view and see what happens:

Xcode > Debug > Debug Workflow > Always Show Disassembly

Add a touchesBegan:withEvent: method to the current ViewController and set a breakpoint inside it. After tapping the screen:

A call to __sanitizer_cov_trace_pc_guard has been injected. If we call testMethod() from inside touchesBegan:withEvent:, we can see that testMethod() is also injected with __sanitizer_cov_trace_pc_guard.

The recording above simply prints the names with printf. In a large real project, we can write the method names to a local file instead. Following the article "iOS startup optimization: binary rearrangement", BinarySortTool.h exposes a class method + (void)writeSortedFileMethod; which you can call once the app has launched to write the file.

// BinarySortTool.m
// Created by qwer on 2021/8/10.

#import "BinarySortTool.h"
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sanitizer/coverage_interface.h>
#include <dlfcn.h>
#import <libkern/OSAtomic.h>

// Atomic queue
static OSQueueHead symbolList = OS_ATOMIC_QUEUE_INIT;

// Node stored in the queue
typedef struct {
    void *pc;
    void *next;
} SymbolNode;

void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t N;  // Counter for the guards.
    if (start == stop || *start) return;  // Initialize only once.
    printf("INIT: %p %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++)
        *x = ++N;
}

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    // if (!*guard) return;  // Duplicate the guard check.
    void *PC = __builtin_return_address(0);
    SymbolNode *node = malloc(sizeof(SymbolNode));
    if (node == NULL) return;
    *node = (SymbolNode){PC, NULL};
    OSAtomicEnqueue(&symbolList, node, offsetof(SymbolNode, next));
}

@implementation BinarySortTool

+ (void)writeSortedFileMethod {
    NSMutableArray<NSString *> *symbolNames = [NSMutableArray array];
    while (YES) {
        // offsetof gives the offset of a member relative to the start of its struct
        SymbolNode *node = OSAtomicDequeue(&symbolList, offsetof(SymbolNode, next));
        if (node == NULL) break;
        Dl_info info;
        int found = dladdr(node->pc, &info);
        free(node);
        if (found == 0 || info.dli_sname == NULL) continue;
        NSString *name = @(info.dli_sname);
        // Prefix non-Objective-C symbols with _
        BOOL isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["];
        NSString *symbolName = isObjc ? name : [@"_" stringByAppendingString:name];
        // Remove duplicates
        if (![symbolNames containsObject:symbolName]) {
            [symbolNames addObject:symbolName];
        }
    }
    // The queue is dequeued last-in first-out, so reverse it
    NSArray *symbolAry = [[symbolNames reverseObjectEnumerator] allObjects];
    // Write the result to a file
    NSString *funcString = [symbolAry componentsJoinedByString:@"\n"];
    NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"binary.order"];
    NSData *fileContents = [funcString dataUsingEncoding:NSUTF8StringEncoding];
    BOOL result = [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
    if (result) {
        NSLog(@"%@", filePath);
    } else {
        NSLog(@"file write error");
    }
}

@end
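The symbol-name handling in writeSortedFileMethod (leave Objective-C method names as they are, prefix everything else with an underscore, and drop duplicates) can be tried out in plain C without Foundation. This is a sketch with hypothetical helper names, not a replacement for the method above; it keeps the first occurrence of each name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Objective-C methods look like "+[Class sel]" or "-[Class sel]" and are
   written unchanged; any other symbol gets a leading underscore. */
void order_file_symbol(const char *name, char *out, size_t out_size) {
    assert(name != NULL && out != NULL && out_size > 0);
    bool is_objc = strncmp(name, "+[", 2) == 0 || strncmp(name, "-[", 2) == 0;
    snprintf(out, out_size, "%s%s", is_objc ? "" : "_", name);
}

/* Remove duplicates in place, keeping the first occurrence of each name.
   Returns the new count. Quadratic, which is acceptable for a sketch. */
size_t dedupe_keep_first(const char **names, size_t count) {
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        bool seen = false;
        for (size_t j = 0; j < kept; j++) {
            if (strcmp(names[j], names[i]) == 0) { seen = true; break; }
        }
        if (!seen) names[kept++] = names[i];
    }
    return kept;
}
```

For instance, order_file_symbol turns main into _main and leaves -[ViewController viewDidLoad] untouched, and dedupe_keep_first reduces _main, _foo, _main, _bar, _foo to _main, _foo, _bar.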

At this point we have the .order file. Because of project privacy concerns, no screenshot of its contents is provided.

Now that we have the .order file, only the last step remains.

Change the order in which App data is loaded into memory

Go to Build Settings and set Order File to the path of the .order file.

This completes the launch optimization through binary rearrangement. We can verify the page fault count using the System Trace described above. Remember that when an app is killed, its physical memory is not wiped immediately, so before measuring, open several other apps first, or clear all background apps and restart the device.


Reference:

Douyin development practice: app startup speed increased by more than 15% with a solution based on binary file rearrangement

iOS startup optimization: binary rearrangement