iOS Martial Arts Secrets: article index

Preface

Startup is the first impression an App gives the user and is crucial to the user experience. Imagine an App that takes more than 5 seconds to start: would you still want to use it?

When a project starts out it rarely has these problems, but as business requirements grow, so does the code. Left unchecked, startup time keeps rising until it becomes unacceptable.

Starting from the principles of optimization, this article explains how I found the symbols needed during startup through Clang instrumentation, and then changed the build settings so that the binary is rearranged, improving the App's startup speed.

Demo for this section

1. Basic Concepts (Background Knowledge)

① Virtual memory vs. physical memory

In the early days, data was accessed directly from a physical address. There were two problems with this approach:

  1. Out of memory
  2. Memory data security issues

①.1 Solution to insufficient memory: virtual memory

To solve problem 1, an intermediate layer is added between the process and physical memory. This layer is called virtual memory, and it mainly manages physical memory when several processes exist at the same time. It improves CPU utilization and lets multiple processes be loaded simultaneously and on demand. In essence, virtual memory is a mapping table from virtual addresses to physical addresses

  • Each process has its own independent virtual address space starting at 0. On a 32-bit system its size is a fixed 4 GB (64-bit systems have a far larger address space). Virtual memory is divided into pages (16 KB on iOS, 4 KB on most other systems) and is loaded one page at a time. Processes cannot access each other's memory, which keeps data safe between processes.

  • Within a process, only part of the code is active at any given time, so only the active parts need to be kept in physical memory, which avoids wasting it.

  • When the CPU accesses data, it uses a virtual address. That address is translated by looking up the mapping table to find the corresponding physical address, and the CPU then accesses that physical address.

  • If the page containing the virtual address has not been loaded into physical memory, a Page Fault occurs and the current process is blocked. The data is first loaded into physical memory, and then the address is translated and read. Loading on demand like this avoids wasting memory.

The following figure shows the relationship between virtual memory and physical memory

①.2 Security of memory data: ASLR technology

As explained above, the start address and size of virtual memory are fixed. This means the address of any given piece of data is the same on every launch, which makes the data easy to locate and attack. To solve this problem, Apple introduced ASLR in iOS 4.3.

ASLR concept: Address Space Layout Randomization is a security technique against buffer overflow attacks. It randomizes the layout of the heap, the stack and shared library mappings, which makes it harder for an attacker to predict target addresses and prevents the attacker from directly locating attack code.

Its purpose is to place sensitive code and data (such as login, registration and payment logic) at addresses that a malicious program cannot know in advance. Randomizing the address space layout makes attacks harder.

Because of ASLR, the addresses at which the executable and dynamic libraries are loaded into virtual memory change on every launch. Pointers inside the image must therefore be fixed up at launch (load time) to point to the correct addresses: correct memory address = ASLR offset (slide) + offset within the file

②. Executable file

Different operating systems use different executable file formats. The kernel reads the executable into memory and determines the binary format from the magic number in the file header

Of these, PE (Windows), ELF (Linux) and Mach-O (macOS/iOS) are all described as variants of the Common Object File Format (COFF). COFF's main contribution was introducing the concept of "segments" into object files: different object files can contain different numbers and types of segments.

③. Universal binary files

Different CPU platforms, such as arm64 and x86, support different instruction sets. Apple's universal binary format therefore packages Mach-O files for several architectures together, and the system picks the appropriate Mach-O for its CPU. The universal binary format is also called the fat binary format, as shown in the figure below

The universal binary format is defined in <mach-o/fat.h>. You can find this header by downloading the xnu source and going to xnu -> EXTERNAL_HEADERS -> mach-o. A universal binary starts with the Fat Header, a fat_header structure. The Fat Archs that follow describe how many Mach-O files the universal binary contains, and each Mach-O is described by a fat_arch structure. The two structures are defined as follows:

So, to sum up:

  1. The universal binary is a storage format introduced by Apple. It can hold binary instructions for several architectures at once, so the system can automatically pick the architecture best suited to the current CPU
  2. Because universal binaries contain several architectures, they are much larger than single-architecture binaries and take up more disk space. However, the system loads only the most suitable slice, so code for the other architectures does not occupy memory and execution is just as efficient
  3. You can also merge and split Mach-O files with lipo:
    1. View the architectures of a Mach-O file: lipo -info MachOFile
    2. Merge: lipo -create MachO1 MachO2 -output OutputFilePath
    3. Split: lipo MachOFile -thin Architecture -output OutputFilePath

④. Mach-O files

Mach-O is short for the Mach Object file format, a file format for executables, dynamic libraries and object code. As a replacement for the a.out format, Mach-O offers better extensibility and faster access to symbol table information

Being familiar with the Mach-O format helps you better understand Apple's low-level mechanisms and the steps dyld takes to load a Mach-O file

④.1 Viewing Mach-O files

To inspect a specific Mach-O file, you can use MachOView: drag the Mach-O executable into the MachOView tool to open it

④.2 Mach-O file format

For OS X and iOS, Mach-O is the executable file format, which includes the following file types

  • Executable: Executable file
  • Dylib: dynamic link library
  • Bundle: dynamic library that cannot be linked and can only be loaded at run time using dlopen
  • Image: refers to any one of Executable, Dylib or Bundle
  • Framework: a collection of Dylib, resource files, and header files

The following illustration shows the Mach-O image file format

That is the format of a Mach-O file. A complete Mach-O file is divided into three main parts:

  • Header: the Mach-O header, which records the CPU architecture, the file type and information about the load commands
  • Load Commands: describe how the data in the file is organized; different kinds of data are described by different load commands
  • Data: holds the data of each segment. A Mach-O segment is similar to a segment in an ELF file. Each segment contains one or more sections that hold the actual code and data, such as symbol tables and dynamic symbol tables

Header. The Mach-O header contains key information about the entire Mach-O file so that the CPU can quickly learn its basic properties. In <mach-o/loader.h>, 32-bit and 64-bit CPUs use the mach_header and mach_header_64 structures respectively to describe the header. mach_header is the first thing the loader reads, and it determines the CPU architecture, the file type, the number of load commands and so on. Compared with the 32-bit mach_header, mach_header_64 has just one extra field, reserved

The filetype field records the type of the Mach-O file:

#define MH_OBJECT   0x1     /* Relocatable object file (.o) */
#define MH_EXECUTE  0x2     /* Executable file */
#define MH_DYLIB    0x6     /* Dynamic library */
#define MH_DYLINKER 0x7     /* Dynamic linker (dyld) */
#define MH_DSYM     0xa     /* Companion file holding symbol information for debugging and analysis */

The corresponding header in MachOView is shown below

Load Commands. In a Mach-O file, the load commands are instructions for the loader. Their number and total size are already given in the header (ncmds and sizeofcmds). In <mach-o/loader.h> they are defined as follows

The load commands can be viewed in MachOView. They record a lot of information, such as the location of the dynamic linker, the program entry point, the dependent libraries, and the locations of the code and of the symbol table, as follows

The segment_command_64 structure used by LC_SEGMENT_64 load commands is defined as follows

Data. The load commands are followed by the data area. It holds read-only and read-write content such as methods, symbol tables, string tables and code, plus the data the dynamic linker needs (rebase and symbol binding information, and so on). Its main job is to store the actual data. Most Mach-O files contain the following three segments:

  • __TEXT (code segment): read-only; contains functions and read-only strings
  • __DATA (data segment): readable and writable; contains global variables that can be read and written
  • __LINKEDIT: contains metadata about methods and variables (locations, offsets) used by the dynamic linker, as well as information such as the code signature.

Sections make up a large share of the data area. In <mach-o/loader.h>, a section is described by the section_64 structure (on 64-bit architectures such as arm64), which is defined as follows

Sections can be seen in MachOView, mainly under the __TEXT and __DATA segments, as shown below

Common sections include the following

So, to sum up, the format diagram for Mach-O is shown below

2. App startup

It is unsafe for processes to access physical memory directly, so the operating system adds a layer of virtual memory on top of physical memory. On top of that, Apple also uses ASLR (Address Space Layout Randomization) for protection (introduced earlier).

In iOS, virtual memory is mapped to physical memory in units of pages. When a process accesses a virtual memory page whose physical memory is not present, a Page Fault interrupt occurs and the page is loaded. Each Page Fault is fast on its own, but an App can trigger thousands (or more) of them during startup, and the time adds up.

A page on iOS is 16KB.

What we usually call startup is the whole period from tapping the App icon until the first screen is displayed. It covers pre-main plus the time from main to the end of didFinishLaunchingWithOptions.

There are also two important concepts: cold start and hot start. Some readers may think that killing the App and reopening it is a cold start, but that is not the case.

  • Cold start

A cold start occurs only when the program has exited completely and the pages it loaded have since been overwritten by other processes, or on the first launch after the device restarts or the App is installed.

  • Hot start

The program is killed and then restarted immediately. The pages it loaded earlier are still in physical memory and can be reused without a full reload, so a hot start is relatively fast.

The startup optimization discussed here generally refers to cold start, which is divided into two parts:

  • T1: the pre-main stage, before the main function. The operating system loads the App's executable into memory and performs a series of loading and linking tasks; in short, this is the dyld loading process
  • T2: the post-main stage, from main to the end of the AppDelegate's didFinishLaunching method, during which the first screen is built and rendered

T1 + T2 is therefore the time from the user tapping the App icon to seeing the App's main interface, and that is the part we need to optimize

①. Optimization of pre-main stage

The startup time of the pre-main phase is essentially the time of the dyld loading process

For the time before main, Apple provides a built-in measurement. In Edit Scheme -> Run -> Arguments -> Environment Variables, click + to add the environment variable DYLD_PRINT_STATISTICS with the value 1, then run. Below is the pre-main time of a normal launch on an iPhone 6s Plus (taking WeChat as an example)

The pre-main phase takes 1.1 seconds in total

  • Dylib loading time: loading the dynamic libraries takes 297.53 ms

    • The dynamic loader finds and reads the dynamic libraries the application depends on, and each library may have its own dependencies. Loading Apple's system frameworks is highly optimized, but loading embedded frameworks can be time-consuming. To speed up dynamic library loading, Apple recommends using fewer dynamic libraries or merging them.
    • The suggested target is at most six additional (non-system) frameworks.
  • Rebase/Binding time (offset correction/symbol binding time): 133.43 ms

    • Rebasing adjusts pointers inside the image, and binding sets pointers to symbols outside the image. To speed up rebase/binding, we need fewer pointer fix-ups.
    • Rebase (offset correction): in the binary generated for any app, the addresses of all methods and functions are offsets within that binary. At run time, once the binary is loaded into memory, the system assigns it a random ASLR (Address Space Layout Randomization) value on every launch; this security mechanism shifts the start of the binary by a random amount. For example, suppose the test method has offset 0x0001 and the randomly assigned ASLR value is 0x1f00. The real memory address of test is then ASLR + offset, determined at run time: 0x1f00 + 0x0001 = 0x1f01
    • Binding: take NSLog as an example. At compile time, the generated Mach-O file contains a symbol for NSLog that does not yet point to the real function. At run time, when the image is loaded from disk into memory, dyld gives the symbol its real address, that is, binds the address to the symbol. This is also called dynamic library symbol binding. In a word: binding is the process of assigning values to symbols
  • ObjC Setup Time (time to register OC classes): the more OC classes, the longer it takes

    • The Objective-C runtime must register classes, categories and selectors. Any improvement to rebase/binding time also speeds up this setup
    • An application with a large number of Objective-C classes, selectors and categories can add 800 ms to startup time.
    • If the application uses C++ code, use fewer virtual functions.
    • Using Swift structs is also usually faster
  • Initializer Time (time spent running +load methods and constructors)

    • This is the time spent running initializers. If you use Objective-C +load methods, replace them with +initialize where possible.

②. Optimization of main function stage

After main, all sorts of things happen in didFinishLaunching, and much of it does not need to happen right away. Such work can be loaded lazily so that it does not affect startup time.

There are three main kinds of work in didFinishLaunching:

  • [First kind] Initializing third-party SDKs
  • [Second kind] Configuring the App's runtime environment
  • [Third kind] Initializing in-house utility classes, etc.

The optimization suggestions for the main function stage are as follows:

  • Reduce startup initialization: lazily load what can be lazy, defer what can be deferred, and move to the background what can be initialized in the background, so that the main thread is occupied as little as possible during startup
  • Optimize the code logic: remove unnecessary logic and reduce the time each step takes
  • Where initialization during startup can run on multiple threads, use multiple threads
  • Build the UI in code wherever possible, especially the main UI framework such as UITabBarController. Avoid Xib or Storyboard, which take more time than pure code
  • Delete obsolete classes and methods

Binary rearrangement: an optimization that reduces Page Faults

The basic concepts and ideas of startup optimization were briefly introduced above. Now we focus on one optimization for the pre-main stage: binary rearrangement

①. Principle of binary rearrangement

As we saw in the virtual memory section, when a process accesses a virtual memory page whose physical memory does not exist, a Page Fault is triggered and the process is blocked. The data must then be loaded into physical memory before it can be accessed, which costs performance.

During a cold start, an App must load and execute a large number of classes, categories and third-party libraries, and the Page Faults this produces can take a lot of time. Taking WeChat as an example, let's look at the number of Page Faults during startup

  • Press CMD+I to open Instruments and select System Trace

  • Click Start (restart the phone first so that no cached data remains), stop once the first screen appears, and follow the steps in the figure below

As the figure shows, WeChat has 2900+ Page Faults, which, as you can imagine, has a significant impact on performance.

  • Next, use a Demo to see the order of methods at compile time: define the following methods in ViewController in this order

  • In Build Settings, set Write Link Map File to YES

  • Build the Demo with CMD+B, then find the link map file in the corresponding path. Right-click the product and choose Show In Finder to open the build folder:

    • Go up two levels from the product and find Intermediates.noindex:

    • 1 – Start optimizing demo-linkmap-normal-arm64.txt

  • Function order (written order), as shown below: functions within a class are laid out from top to bottom in the order they were written, while files are laid out in the order listed in Build Phases -> Compile Sources

Conclusion: from the Page Fault count and the layout order above, we can see that the root cause of too many Page Faults is that the methods called at startup are spread across different pages. Our optimization idea is therefore to put all the methods called at startup together, ideally on a single page, so that many Page Faults become one. This is the core principle of binary rearrangement, as shown below

Note: in a production (release) environment, iOS also verifies the App's code signature when a Page Fault reloads a page. A Page Fault in production therefore takes longer than in a Debug environment

②. Binary rearrangement practice

Now for some concrete practice. First, a few terms

②.1 Link Map

Link Map is an intermediate product of the iOS build process that records the layout of the binary. You need to enable Write Link Map File in Xcode's Build Settings. A Link Map consists of three parts:

  • Object Files: the paths and file numbers of the link units used to generate the binary
  • Sections: the address range of each segment/section in the Mach-O
  • Symbols: the address range of each symbol, in order

②.2 ld

ld is the linker used by Xcode, and it has an -order_file option. We can set the path of a file with the .order suffix in Build Settings -> Order File. In that file we list the symbols in the order we want them laid out, and at link time the linker places them in the binary in that order. This is how we achieve our optimization

So the essence of binary rearrangement is to rearrange the symbols used during startup

If the project is small, you can write an order file by hand and list the methods in order. But if the project is large and involves many methods, how do we find the functions that run during startup? Here are a few ideas:

  1. Hook objc_msgSend: an Objective-C method call is essentially a message send that ends up in objc_msgSend. But objc_msgSend takes variable arguments, so hooking it requires assembly, which is demanding for developers. It also only captures OC methods and Swift methods marked @objc
  2. Static scan: scan the symbols and function data stored in specific segments and sections of the Mach-O file
  3. Clang instrumentation: hooks everything in batch and can reach 100% symbol coverage, capturing Swift, OC, C and block functions

②.3 A first try at binary rearrangement

For binary rearrangement, the key is the order file

  • In the objc source project discussed earlier, you can find an order file:

  • Open the order file and you can see that it lists function symbols in sorted order

  • That's because Apple's own libraries are binary-rearranged too

Load ->test1->test2->ViewDidAppear->main

  • Create a tcj.order file in the root directory of the Demo project

    touch tcj.order

  • Manually write the functions into tcj.order in the desired order (including a nonexistent function, hello)

  • In Build Settings, search for Order File and add ./tcj.order

  • Build with Command + B and check the link map file again:

    • The nonexistent function in the order file (hello) is skipped by the linker
    • The other function symbols are laid out in our order
    • Functions not listed in the order file keep their default order, after the listed ones
  • The problem: we wrote the functions into the order file by hand. How do we know which function comes first and which comes after?

    • Our goal: collect every function called from launch up to a chosen point (for example, the first screen), and have them written into our order file automatically, in call order (implemented with Clang instrumentation)

②.4 Clang instrumentation

To really implement binary rearrangement, we need to collect the symbols of all methods, functions and so on executed during startup, keep their order, and write them to the order file.

Douyin has published an article about its practice, "A solution based on binary file rearrangement improves APP startup speed by more than 15%", but the article also mentions some bottlenecks:

The solution based on static scan + runtime trace still has a few bottlenecks:

  • Initializer functions cannot be hooked
  • Hooking some blocks fails
  • C++ indirect calls through registers cannot be found by static scanning

The current rearrangement scheme covers 80% to 90% of symbols. In the future, we will try other schemes, such as compile-time instrumentation, to cover 100% of symbols so that the rearrangement achieves the best effect.

The article also points to compile-time instrumentation as the solution.

Before we talk about Clang instrumentation: what is a hook?

Hooking means intercepting a function: you obtain the memory address and implementation of the original function symbol, hook it, and then do whatever you want

  • For example: you come across a car stopped on the highway. You can ride along in it (add your own code), or you can simply take it and drive it yourself (replace the implementation).

Clearly, what we want here is to hook every function called until startup ends, attach some code that records the function names in order, and generate our order file from them

Q: Is there an API that lets me hook anything I want, whether Swift, OC or C functions? A: Yes, Clang instrumentation. The compiler generates the syntax tree for all the code, so it decides what code is emitted and can insert hooks everywhere.

Clang instrumentation: LLVM has a simple built-in code coverage tool, SanitizerCoverage. It inserts calls to user-defined functions at the function level, the basic-block level and the edge level. SanitizerCoverage is what we need for our batch hook.

The official documentation for Clang's SanitizerCoverage gives a detailed overview as well as a short demo

We create a TraceDemo project and try it out following the official example

Add the trace

  • Following the official description, we enable tracing with a compiler flag and implement the callback functions it requires.

In TraceDemo, add -fsanitize-coverage=trace-pc-guard to Build Settings -> Other C Flags and build. The build fails with: Undefined symbol: ___sanitizer_cov_trace_pc_guard_init and Undefined symbol: ___sanitizer_cov_trace_pc_guard

The official documentation tells us to implement these two functions:

#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

// This callback is inserted by the compiler as a module constructor
// into every DSO. 'start' and 'stop' correspond to the
// beginning and end of the section with the guards for the entire
// binary (executable or DSO). The callback will be called at least
// once per DSO and may be called multiple times with the same parameters.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

// This callback is inserted by the compiler on every edge in the
// control flow (some optimizations apply).
// Typically, the compiler will emit the code like this:
// if(*guard)
// __sanitizer_cov_trace_pc_guard(guard);
// But for large functions it will emit a simple call:
// __sanitizer_cov_trace_pc_guard(guard);
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if(! *guard)return;  // Duplicate the guard check.
  // If you set *guard to 0 this code will not be called again for this edge.
  // Now you can get the PC and do whatever you want:
  // store it somewhere or symbolize it and print right away.
  // The values of `*guard` are as you set them in
  // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
  // and use them to dereference an array or a bit vector.
  void *PC = __builtin_return_address(0);
  char PcDescr[1024];
  // This function is a part of the sanitizer run-time.
  // To use it, link with AddressSanitizer or other sanitizer.
  __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
  printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}


We add this code to ViewController.m. Because it is an Objective-C file rather than C++, extern "C" is not needed and can be removed. __sanitizer_symbolize_pc() still causes an error, because it belongs to the sanitizer runtime, which we do not link, so we comment it out.

The function __sanitizer_cov_trace_pc_guard_init numbers the guards, which lets it count the instrumented methods (in this simple demo, one guard per function, method or block).

When we run it, we can see the following

Reading the memory, we see something like a counter. The printed stop is the end position and each entry takes 4 bytes, so moving back 4 bytes from stop gives the last entry.

The two parameters:

  • Parameter 1, start: a pointer to uint32_t (4 bytes each). It is the start of an array of guards, i.e. the position of the first guard
  • Parameter 2, stop: points just past the last guard, so it is not the address of the last guard itself. Because each guard takes 4 bytes, the last guard's real address = printed stop address - 0x4
  • start and stop delimit the guards of the current binary; the unit is a 4-byte uint32_t
  • If you add a few more functions, you will find that the stop address increases accordingly
  • The range is the half-open interval [start, stop), so the last guard sits at stop - 1 (4 bytes before stop), not at stop

In little-endian order, the bytes 0e 00 00 00 represent 0x0000000e, i.e. 14.

So what does this value, the last entry just before stop, represent? When you add a few more methods, blocks, C++ functions or properties, it increases by the corresponding number.

For example, add a touchesBegan method to ViewController.m and run:

In little-endian order, 0f 00 00 00 represents 0x0000000f, i.e. 15.

Then add a function test() and run: in little-endian order, 10 00 00 00 represents 0x00000010, i.e. 16.

Then add a block and run: in little-endian order, 11 00 00 00 represents 0x00000011, i.e. 17.

So the total increase is 3 (a block is an anonymous function). The counter counts functions, methods and blocks; three were added, so the value grew by 3

The tidied-up code:

#import "ViewController.h"
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

@interface ViewController ()

@end

@implementation ViewController

// The block must be declared before test(), which calls it.
void(^block)(void) = ^(void){

};

void test(void)
{
    block();
}

- (void)viewDidLoad {
    [super viewDidLoad];
}

// Called at startup with the guard range [start, stop) of this binary.
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

// Called on every instrumented edge.
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Duplicate the guard check.
  // __sanitizer_symbolize_pc() lives in the sanitizer runtime, which we don't link,
  // so print the raw return address instead of a symbolized description.
  void *PC = __builtin_return_address(0);
  printf("guard: %p %x PC %p\n", guard, *guard, PC);
}

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event{

    test();
}

@end


Run the project and clear the console, then tap the screen:

We set breakpoints in touchesBegan, test, block and __sanitizer_cov_trace_pc_guard and run the code:

[Verification 1] The execution order is: touchesBegan -> __sanitizer_cov_trace_pc_guard -> test -> __sanitizer_cov_trace_pc_guard -> block -> __sanitizer_cov_trace_pc_guard

[Verification 2] Step into the assembly of touchesBegan:

Other functions show the same pattern in their assembly: each one calls __sanitizer_cov_trace_pc_guard when it is triggered.

Adding the flag to Other C Flags is enough to enable tracing. At compile time, LLVM inserts a call to __sanitizer_cov_trace_pc_guard on every edge of the control flow; for a simple function without branches, that is just the function entry. (With -fsanitize-coverage=func,trace-pc-guard, only function entries are instrumented.) In other words, Clang instrumentation is a call to __sanitizer_cov_trace_pc_guard inserted into the generated code.

In short, __sanitizer_cov_trace_pc_guard lets us capture every symbol executed during startup and put them all in a list.

Once we have the symbols, we need to store them. We cannot use a plain array, because some of the code may run on background threads and an array would have thread-safety problems. Here we use an atomic queue:

#import "ViewController.h"
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>
#import <libkern/OSAtomic.h>
#import <dlfcn.h>

@interface ViewController(a)

@end

@implementation ViewController

// Define atomic queue: features 1. First in last out 2. Thread safety 3
static OSQueueHead symbolList = OS_ATOMIC_QUEUE_INIT;

// Define the symbolic structure list
typedef struct{
    void *pc;
    void *next;
} SymbolNode;

void test()
{
    block();
}

void(^block)(void) = ^ (void){
    
};

- (void)viewDidLoad {
    [super viewDidLoad];
}

/* -start: start position -stop: not the address of the last symbol, but the address of the last symbol in the entire symbol table =stop-4 (because stop is an unsigned int, 4 bytes). Stop stores the value of the symbol */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

/* Hooks every method, function and block call; used to capture symbols and may run on multiple threads. It only stores PCs, in a linked list. guard is the sentinel for the instrumented edge that called it. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    // if (!*guard) return;  // Duplicate-guard check from the docs. Commented out: +load runs before the guards are initialized, so its guard is still 0 and it would be skipped.

    // Get the PC
    /* __builtin_return_address(0): return address of the current frame, i.e. an address inside the instrumented function that called this hook.
       __builtin_return_address(1): return address of the caller's frame, i.e. an address inside that function's caller. */
    void *PC = __builtin_return_address(0);

    // Allocate a node for this PC
    SymbolNode *node = malloc(sizeof(SymbolNode));
    if (node == NULL) {
        return;
    }
    *node = (SymbolNode){PC, NULL};

    // Enqueue
    // Nodes are linked through the next pointer rather than accessed by index, so pass offsetof(SymbolNode, next) to tell the queue where that pointer lives.
    OSAtomicEnqueue(&symbolList, node, offsetof(SymbolNode, next));

    Dl_info info;       // Symbol information for PC
    dladdr(PC, &info);  // Resolve PC into image path/base and nearest symbol name/address
    printf("fnam:%s \n fbase:%p \n sname:%s \n saddr:%p \n",
           info.dli_fname,
           info.dli_fbase,
           info.dli_sname,
           info.dli_saddr);
}

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event{
    
    test();
}

@end

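The OSAtomic queue is an Apple-specific API. As a hedged, portable alternative, the same last-in, first-out structure can be sketched with C11 <stdatomic.h> as a Treiber stack; push_pc and pop_pc are hypothetical names. The sketch assumes pushes may come from any thread but popping happens on a single thread after recording stops, which sidesteps the ABA problem of concurrent lock-free pops. Note that pop_pc returns the most recently pushed PC first, which is why the export code later reverses the list.

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct Node {
    void *pc;
    struct Node *next;
} Node;

static _Atomic(Node *) head = NULL;

/* Push one PC. Safe from any number of threads: retry the CAS until head
   has not changed between reading it and swapping in the new node. */
int push_pc(void *pc) {
    Node *n = malloc(sizeof *n);
    if (!n) return 0;
    n->pc = pc;
    Node *old = atomic_load(&head);
    do {
        n->next = old;  /* on CAS failure, old is refreshed with the current head */
    } while (!atomic_compare_exchange_weak(&head, &old, n));
    return 1;
}

/* Pop the newest PC, or NULL when empty. Assumed single consumer. */
void *pop_pc(void) {
    Node *n = atomic_load(&head);
    while (n && !atomic_compare_exchange_weak(&head, &n, n->next)) {
        /* n was refreshed with the current head; retry */
    }
    if (!n) return NULL;
    void *pc = n->pc;
    free(n);
    return pc;
}
```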

After running, the console prints a lot of output. Taking one entry as an example, the sname field is the symbol name we need.

Now we export the symbols we need by tapping the screen. Note that C functions and Swift methods need a leading underscore (this can be confirmed in the LinkMap file mentioned earlier).

- (void)touchesBegan:(NSSet<UITouch *> *)touches withEvent:(UIEvent *)event{
    
    NSMutableArray <NSString *>* symbolNames = [NSMutableArray array];
    
    // Other C Flags must be -fsanitize-coverage=func,trace-pc-guard (func = instrument only function entries).
    // Without func, every edge is instrumented, including this while loop, so each iteration enqueues a new node (__sanitizer_cov_trace_pc_guard) and the loop never ends.
    while (YES) {
        // Dequeue one node; NULL means the queue is empty
        SymbolNode * node = OSAtomicDequeue(&symbolList, offsetof(SymbolNode, next));
        
        if (node == NULL) {
            break;
        }
        
        Dl_info info = {0};
        // Resolve the node's PC into symbol information
        dladdr(node->pc, &info);
        // Release the node
        free(node);
        // Skip PCs without a symbol name (@() would throw on NULL)
        if (info.dli_sname == NULL) {
            continue;
        }
        NSString * name = @(info.dli_sname);
        
        BOOL isObjc = [name hasPrefix:@"+["] || [name hasPrefix:@"-["]; // OC methods are kept as-is
        NSString * symbolName = isObjc ? name : [@"_" stringByAppendingString:name]; // C functions and Swift methods get a leading underscore
        [symbolNames addObject:symbolName];
        printf("%s \n",info.dli_sname);

    }
    
    // The queue is LIFO, so symbols come out in reverse call order; reverse them back
    NSEnumerator * emt = [symbolNames reverseObjectEnumerator];
    // Create an array
    NSMutableArray<NSString*>* funcs = [NSMutableArray arrayWithCapacity:symbolNames.count];
    // Temporary variable
    NSString * name;
    // Iterate, de-duplicate, and add to funcs
    while ((name = [emt nextObject])) {
        // Append only if not already present
        if (![funcs containsObject:name]) {
            [funcs addObject:name];
        }
    }
    // Delete the current method, because the click method is not required for startup
    [funcs removeObject:[NSString stringWithFormat:@"%s",__FUNCTION__]];
    // File path
    NSString * filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"tcj.order"];
    // Array to string
    NSString *funcStr = [funcs componentsJoinedByString:@"\n"];
    // File contents
    NSData * fileContents = [funcStr dataUsingEncoding:NSUTF8StringEncoding];
    // Create a file on the path
    [[NSFileManager defaultManager] createFileAtPath:filePath contents:fileContents attributes:nil];
    
    NSLog(@"%@", filePath);

}
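The post-processing in touchesBegan (reverse the LIFO order, de-duplicate, prefix an underscore unless the name is an Objective-C method, drop the tap handler) can be sketched independently of Foundation. build_order_lines is a hypothetical helper, and the fixed 128-byte line buffers are an assumption made to keep the sketch short; longer symbols would be truncated.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn symbol names in dequeue order (newest first)
   into order-file lines in call order (oldest first), without duplicates.
   OC methods ("+[" / "-[") keep their names; C functions and Swift symbols
   get the leading underscore that the linker uses. `skip` (may be NULL) is
   a raw name to drop, like the tap handler in the article. */
size_t build_order_lines(const char *const *dequeued, size_t n,
                         const char *skip, char out[][128], size_t max_out) {
    size_t count = 0;
    for (size_t i = n; i-- > 0;) {  /* walk backwards = call order */
        const char *name = dequeued[i];
        if (!name || (skip && strcmp(name, skip) == 0)) continue;

        int is_objc = strncmp(name, "+[", 2) == 0 || strncmp(name, "-[", 2) == 0;
        char line[128];
        snprintf(line, sizeof line, "%s%s", is_objc ? "" : "_", name);

        int seen = 0;
        for (size_t j = 0; j < count; j++) {
            if (strcmp(out[j], line) == 0) { seen = 1; break; }
        }
        if (!seen && count < max_out) {
            strcpy(out[count++], line);
        }
    }
    return count;
}
```

For dequeued names {"-[ViewController touchesBegan:withEvent:]", "test", "main", "test", "+[Foo load]"} with the tap handler skipped, the result is "+[Foo load]", "_test", "_main".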

There is a big pitfall here: if you tap the screen right away, the console keeps printing in an infinite loop.

Set a breakpoint inside the while loop, then look at the assembly:

The assembly contains a total of 10 calls to __sanitizer_cov_trace_pc_guard. Because the while loop's own branches are instrumented, every iteration calls __sanitizer_cov_trace_pc_guard, which enqueues a new node; the loop dequeues that node, triggers the hook again, and so on, so the queue never empties. The fix is to instrument only function entries by adding func to Other C Flags:

-fsanitize-coverage=func,trace-pc-guard

With func, only the entry of each function is instrumented.

Run it again and tap the screen: the loop now terminates.
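The infinite loop can be reproduced with a toy model that tracks only the queue length. drain_iterations is hypothetical: instrument_loop = 1 stands for the loop edge being instrumented (trace-pc-guard without func), so each iteration enqueues one node before dequeuing one, and limit stands in for "forever".

```c
/* Toy model of the export loop. queued: nodes already in the queue.
   instrument_loop: 1 if the loop edge itself calls the hook (no func),
   0 if only function entries are instrumented (func,trace-pc-guard).
   Returns how many nodes were dequeued before the queue emptied, or
   `limit` if it never did. */
int drain_iterations(int queued, int instrument_loop, int limit) {
    int iterations = 0;
    while (iterations < limit) {
        if (instrument_loop) queued++;  /* hook fires on the loop edge and enqueues a node */
        if (queued == 0) break;         /* OSAtomicDequeue returned NULL */
        queued--;                       /* dequeue one node */
        iterations++;
    }
    return iterations;
}
```

With func, five queued symbols are drained in five iterations; without it, the count never drops to zero and the loop only stops at the artificial limit.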

Note:

  1. The if (!*guard) return; check must be commented out; otherwise +load methods are not recorded.
  2. Without func, trace-pc-guard does not hook only function entries: it hooks every edge (branch) in the code. The while loop's branch is also an edge, so it triggers __sanitizer_cov_trace_pc_guard on every iteration and the loop never ends.
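Note 1 can be modeled in plain C under an assumption the article implies: +load runs before the guard init function has numbered the guards, so its guard is still 0. guards, guard_init, hook and recorded are hypothetical stand-ins; guard_init reuses the numbering logic of __sanitizer_cov_trace_pc_guard_init shown earlier, and skip_zero = 1 models keeping the if (!*guard) return; check.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t guards[3];      /* stand-in for the compiler-generated guard array */
static uint32_t recorded[16];   /* guard values seen by the hook */
static size_t recorded_count;

/* Same numbering as __sanitizer_cov_trace_pc_guard_init: guards start at 1. */
void guard_init(uint32_t *start, uint32_t *stop) {
    static uint64_t n;
    if (start == stop || *start) return;  /* initialize only once */
    for (uint32_t *x = start; x < stop; x++) *x = (uint32_t)++n;
}

/* skip_zero = 1 keeps the duplicate-guard check; 0 comments it out. */
void hook(uint32_t *guard, int skip_zero) {
    if (skip_zero && !*guard) return;
    if (recorded_count < 16) recorded[recorded_count++] = *guard;
}
```

A call made before initialization (the +load case in this model) is dropped when the check is kept, and recorded when it is commented out.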

On a real device, the order file is written to the app's tmp folder.

Open Window -> Devices and Simulators (shortcut ⇧ + ⌘ + 2):

Download the app container to a location of your choice, choose Show Package Contents, and the order file is in the tmp folder.

Swift binary rearrangement: can Swift also be rearranged? Of course!

Swift binary rearrangement works the same way as for OC; only the LLVM front end differs.

  • The OC front end is Clang, so add -fsanitize-coverage=func,trace-pc-guard to Other C Flags
  • The Swift front end is swiftc, so add -sanitize-coverage=func and -sanitize=undefined to Other Swift Flags

We add a Swift class to the project and call it from the +load method of ViewController:

Build Settings -> Other Swift Flags:

After running, tap the screen and check the console:

Addendum: Swift symbols are name-mangled, but as long as the code does not change, the mangled names do not change either. In short, regenerate the order file once the code is frozen for release.

Finally, set Write Link Map File back to NO and delete the Other C Flags/Other Swift Flags settings.

These flags make the compiler insert calls to __sanitizer_cov_trace_pc_guard throughout our code. They are only needed to generate the order file, not for the rearrangement itself, so remove them. Also remove the __sanitizer_cov_trace_pc_guard implementation from ViewController.

At this point, Clang instrumentation and automatic generation of the order file are complete. Once you have the order file, try it on your own project.

Writing in the back

Binary rearrangement packs the functions needed at startup closer together and reduces the number of page faults. When collecting the symbol table, Clang instrumentation hooks Objective-C methods, Swift methods, C functions and Blocks uniformly, without treating any of them differently. Compared with TikTok's earlier approach, it is much simpler and has a lower barrier to entry.
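The page-fault argument can be checked with a little arithmetic. Assuming the 16 KB iOS page size mentioned earlier, the hypothetical helper pages_touched counts how many distinct pages a set of startup function addresses falls on; each distinct page that is not yet resident costs at most one page fault.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the distinct pages (of size page_size) that the given addresses touch. */
size_t pages_touched(const uintptr_t *addrs, size_t n, uintptr_t page_size) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t page = addrs[i] / page_size;
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (addrs[j] / page_size == page) { seen = 1; break; }
        }
        if (!seen) count++;
    }
    return count;
}
```

Four startup functions scattered at offsets 0, 20000, 40000 and 70000 land on pages 0, 1, 2 and 4, so they touch four pages; after rearrangement packs them at 0, 100, 200 and 300, they touch a single page.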