Preface

The symbol table has always been an important part of reverse engineering, and iOS apps usually strip their symbol table before release to make reverse analysis harder.

This article introduces a self-written tool for restoring symbol tables for iOS applications.

Here is the result first: Alipay after its symbol table has been restored:

The article is a bit long, so please be patient and read to the end; the highlights are at the end.

Why restore the symbol table

In reverse engineering, dynamic analysis with a debugger is essential, and Xcode + LLDB is a very good debugger. For example, we can easily view the call stack in Xcode, and the screenshot above clearly shows the RPC call process of an Alipay login.

In fact, if we had not restored the symbol table, we would have seen the following debugging view:

For the same function call procedure, Xcode’s display could hardly be more different.

The reason is that when Xcode displays symbols in the call stack, it only displays symbols that are in the symbol table. For our debugging process to go smoothly, we need to restore the symbol table in the executable file.

What is a symbol table

To restore the symbol table, we first need to know what the symbol table is and how it is stored in a Mach-O file.

The symbol table is stored in the Mach-O file's __LINKEDIT segment and involves two parts: the Symbol Table and the String Table.

Here we use MachOView to open the Alipay executable and find the Symbol Table item.

The symbol table is a contiguous array in which every entry is a struct nlist:

struct nlist {
    union {
        uint32_t n_strx;   // offset of the symbol name in the string table
    } n_un;
    uint8_t n_type;
    uint8_t n_sect;
    int16_t n_desc;
    uint32_t n_value;      // address of the symbol in memory
};

The fields that matter here are the first and the last. The first is the offset of the symbol name in the string table and gives the function name; the last is the symbol's address in memory, similar to a function pointer (this is only the general structure; see the official Mach-O file format documentation for details).

That is, if we know the relationship between symbol names and memory addresses, we can construct symbol table data from this structure in reverse.
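As a rough illustration of that reverse construction, here is a minimal Python sketch. It is not the tool's actual code. It assumes a 64-bit binary, which uses the wider struct nlist_64 (n_value is 64 bits), and it marks every symbol as defined in section 1, which is a simplification. The helper names build_tables and read_entry are hypothetical.

```python
import struct

# Assumed layout of struct nlist_64, little-endian, no padding:
# n_strx (uint32), n_type (uint8), n_sect (uint8), n_desc (uint16), n_value (uint64)
NLIST_64 = struct.Struct("<IBBHQ")

N_SECT = 0x0E  # n_type value meaning "symbol is defined in section n_sect"


def build_tables(symbols, sect=1):
    """Return (symbol_table_bytes, string_table_bytes) for a {name: address} dict."""
    strtab = bytearray(b"\x00")  # offset 0 stands for the empty name
    symtab = bytearray()
    for name, address in sorted(symbols.items(), key=lambda kv: kv[1]):
        n_strx = len(strtab)                      # offset of this name
        strtab += name.encode("utf-8") + b"\x00"  # names are NUL-terminated
        symtab += NLIST_64.pack(n_strx, N_SECT, sect, 0, address)
    return bytes(symtab), bytes(strtab)


def read_entry(symtab, strtab, index):
    """Decode entry `index` back into (name, address)."""
    n_strx, _type, _sect, _desc, n_value = NLIST_64.unpack_from(
        symtab, index * NLIST_64.size)
    end = strtab.index(b"\x00", n_strx)
    return strtab[n_strx:end].decode("utf-8"), n_value
```

Writing these two byte strings into __LINKEDIT and updating the load commands that point at them is the part a real tool still has to do; the sketch only covers the data format.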

Now that you know how to construct the symbol table, the next step is to collect the mapping between symbol names and memory addresses.

Getting the symbol table for OC methods

Because of how the OC language works, the compiler writes class names, method names, and so on into the final executable, so we can use the structure of the Mach-O file to recover all the classes in the project. This is what the well-known reverse-engineering tool class-dump does. The same metadata that class-dump parses also records the implementation address of each method.

So we just need to modify the source of class-dump to get the information we want.

Symbol table recovery tool

With the data format understood and the data source identified, we can write the tool.

The implementation will not be explained in detail. The tool is open source and linked from my documentation: docs.qq.com/doc/DTWxwWE…

Here’s how to use this tool:

1. Download and compile the source code

git clone --recursive https://github.com/tobefuturer/restore-symbol.git
cd restore-symbol && make
./restore-symbol

2. Restore the OC symbol table, which is very simple

./restore-symbol ./origin_AlipayWallet -o ./AlipayWallet_with_symbol

origin_AlipayWallet is the Mach-O file decrypted with Clutch, which has no symbol table; -o is followed by the output file location.

3. Re-sign the Mach-O file and repackage it to see the effect

After the symbol table is restored, the file contains about 20 MB of additional symbol table data.

Look at the call stack in Xcode

As you can see, the symbols of the OC methods have been restored, and the general calling process is visible in the call stack. However, Alipay makes heavy use of block callbacks, so a large share of the symbols is still not displayed correctly.

So let’s see how we can restore the symbol of this block.

Getting the symbol information for blocks

Again, in order to recover the symbolic information about blocks, we need to know how blocks are stored in the file.

The in-memory structure of a block

First, let’s examine how blocks are stored in memory at runtime. A block exists as a structure in memory. The structure is roughly as follows:

struct __block_impl {
    /* In memory a block is laid out like an NSObject: it starts with an isa pointer */
    Class isa;
    /* These two fields can be ignored here */
    int flags;
    int reserved;
    /* The actual function pointer! */
    void (*invoke)(...);
    ...
};

Depending on the situation, the isa pointer of a block takes one of three values, each representing a different type of block:

  1. _NSConcreteStackBlock

    A block on the stack. It is created by allocating space for the block structure on the stack and then assigning values to fields such as isa.

  2. _NSConcreteMallocBlock

    A block on the heap. When a block is passed to GCD or is held by an object, it is copied from the stack to the heap, and the copy's type becomes _NSConcreteMallocBlock.

  3. _NSConcreteGlobalBlock

    A global static block. When the block does not depend on its context, that is, it captures no variables from outside and only uses variables defined inside the block, its memory can be laid out at compile time in the global static constant area.

The second type only appears at run time, so we focus on types 1 and 3 and analyze how these two isa pointers relate to the address of the block's symbol.

Association between block isa pointers and symbol addresses

This part is analyzed with IDA, a disassembler. Here are two small practical examples to illustrate:

1. _NSConcreteStackBlock

Suppose our source code is a simple block like this:

@implementation ViewController
- (void)viewDidLoad {
    int t = 2;
    void (^foo)() = ^() {
        NSLog(@"%d", t); // the block references the external variable t
    };
    foo();
}
@end

After compiling, the actual assembly looks like this:

At runtime, the actual block construction process looks like this:

  1. Create stack space for blocks
  2. Assign the block's isa pointer (it always references the global variable _NSConcreteStackBlock)
  3. Get the address of a function and assign it to the function pointer

So we can extract the following pattern:

Here comes the key point!

Any code that uses a stack block has to load __NSConcreteStackBlock as the isa pointer, and right after that it loads a function address, which is the address of the block's function.

Read this sentence carefully together with the following diagram (the same file as above, but with the symbol table stripped)

Using this feature, we can make the following inferences in reverse analysis:

If you find a reference to __NSConcreteStackBlock in an OC method, there must be a function address nearby, which is a block in the OC method.

For example, in the figure above, viewDidLoad references __NSConcreteStackBlock and immediately afterwards loads the address of sub_100049D4. So sub_100049D4 is a block in viewDidLoad, and its symbol name should be viewDidLoad_block.
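The inference can be sketched in plain Python over a simplified model of a disassembly. This is only an illustration and not the author's IDA script. It assumes each function has already been reduced to the ordered list of symbols and addresses it references, and the name find_stack_blocks and both input structures are hypothetical.

```python
STACK_ISA = "__NSConcreteStackBlock"


def find_stack_blocks(functions, code_addresses):
    """Map block function addresses to the function that creates them.

    functions: {name: [referenced symbol name or address, in instruction order]}
    code_addresses: set of addresses known to be function starts
    Returns {block_address: owning function name}.
    """
    owners = {}
    for name, refs in functions.items():
        expecting_invoke = False
        for ref in refs:
            if ref == STACK_ISA:
                expecting_invoke = True   # the isa pointer was just loaded
            elif expecting_invoke and ref in code_addresses:
                owners[ref] = name        # next code address is the block body
                expecting_invoke = False
    return owners
```

With the example from the figure, a viewDidLoad entry whose references are __NSConcreteStackBlock followed by 0x100049D4 yields 0x100049D4 as a block owned by viewDidLoad. A real script would take the reference lists from the disassembler's cross-reference data rather than from a hand-built dict.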

2. _NSConcreteGlobalBlock

A global static block is a block that does not refer to an external variable, so it can be allocated at compile time and does not have to worry about copying blocks and so on. It exists in the constant area of the executable.

If you don’t understand, here’s an example:

Let’s change the source code to look like this:

@implementation ViewController
- (void)viewDidLoad {
    void (^foo)() = ^() {
        NSLog(@"%d", 123);
    };
    foo();
}
@end

This will look like this after compilation:

Following the same idea, we can make this inference during reverse analysis:

  1. A reference to _NSConcreteGlobalBlock is found in the static constant area
  2. There must be a block structure stored at that location
  3. The value at byte offset 16 of that structure (after the 8-byte isa and the two 4-byte int fields on a 64-bit binary) is the block's function address
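The three steps above can be sketched as a scan over raw bytes. This is a simplified illustration, not the published script. It assumes a 64-bit little-endian image whose pointers are already resolved, as in a memory dump. In the file on disk the isa slot is typically filled in by the dynamic linker at load time, which is one reason to rely on IDA's cross-references instead. The function name and parameters are hypothetical.

```python
import struct

INVOKE_OFFSET = 16  # isa (8 bytes) + flags (4) + reserved (4) on 64-bit


def find_global_blocks(data, base_address, isa_address):
    """Find global block literals in a constant section.

    data: bytes of the section, loaded at base_address
    isa_address: address of _NSConcreteGlobalBlock in this image
    Returns a list of (block_literal_address, invoke_address).
    """
    found = []
    # Walk 8-byte-aligned slots that leave room for the invoke pointer.
    for offset in range(0, len(data) - INVOKE_OFFSET - 8 + 1, 8):
        (isa,) = struct.unpack_from("<Q", data, offset)
        if isa == isa_address:
            (invoke,) = struct.unpack_from("<Q", data, offset + INVOKE_OFFSET)
            found.append((base_address + offset, invoke))
    return found
```

Each invoke address returned this way is a block function that still needs a name, which is where the owning method comes in.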

3. Nested structure of blocks

In real code, a block may be nested inside another block:

- (void)viewDidLoad {
  dispatch_async(background_queue ,^{
    ...
    dispatch_async(main_queue, ^{
      ...     
    });
  });
}

So blocks have parent-child relationships. If we collect these relationships, they form a forest in the graph-theory sense, which can be handled with a simple recursive depth-first search. The detailed process is not described here.
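A minimal sketch of such a depth-first traversal follows, assuming the parent of every block has already been collected. The naming scheme (appending _block, with a numeric suffix for later siblings) merely extends the viewDidLoad_block example and is an assumption, not necessarily what the script emits. The function name name_blocks is hypothetical.

```python
def name_blocks(method_names, parent_of):
    """Assign a symbol name to every block by walking the forest.

    method_names: {address: OC method name}, the roots of the trees
    parent_of: {block_address: address of the function that creates it}
    Returns {block_address: symbol name}.
    """
    children = {}
    for block, parent in parent_of.items():
        children.setdefault(parent, []).append(block)

    names = {}

    def visit(address, name):
        # Number sibling blocks in address order so the names are stable.
        for index, child in enumerate(sorted(children.get(address, [])), 1):
            suffix = "_block" if index == 1 else "_block_%d" % index
            names[child] = name + suffix
            visit(child, names[child])  # descend into nested blocks

    for address, name in method_names.items():
        visit(address, name)
    return names
```

For the nested dispatch_async example, the outer block would be named after viewDidLoad and the inner block after the outer one, so each name records the whole chain of parents.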

Block symbol table extraction script (IDA + Python)

Putting the above ideas together, the search depends on the cross-reference information that IDA provides, and IDA offers a programming interface for extracting it.

IDA provides a Python SDK, and the finished script is also stored in the repository: docs.qq.com/doc/DTWxwWE…

Extract the block symbol table

Here is a brief introduction to how to use the above script

  1. Open Alipay's Mach-O file in IDA
  2. Wait for the analysis to complete. It may take an hour
  3. Press Alt + F7 or choose File -> Script file... from the menu bar
  4. Wait for the script to finish running, which should take 30 to 60 seconds. A pop-up window like this appears while it runs
  5. When the pop-up disappears, the block symbol table has been extracted
  6. In the directory of the file opened in IDA, a file named block_symbol.json is generated, containing the block symbol table in JSON format

Restore symbol table & actual analysis

Import the symbol table of the block into the Mach-o file using the previous symbol table recovery tool

./restore-symbol ./origin_AlipayWallet -o ./AlipayWallet_with_symbol -j block_symbol.json

-j is followed by the previously obtained JSON symbol table

Finally, we get an executable that contains both the OC method symbols and the block symbols

Here’s a brief case study to give you an idea of how powerful this tool can be.

  1. In Xcode, set a breakpoint on -[UIAlertView show]
  2. Run the app, enter a phone number and a wrong password on the Alipay login page, and tap Login
  3. Xcode stops when the 'password error' alert pops up, and the call stack is shown on the left

One picture shows the whole login process of Alipay
