A Deep Dive into iOS: Xcode’s Support for Assembly

To do a good job, you must sharpen your tools first — The Analects of Confucius · Duke Ling of Wei

A good IDE should provide not only a comfortable, concise and convenient environment for editing source code, but also a powerful debugging environment. Xcode is by far the best IDE for iOS app development (although Visual Studio 2017 has also started to support it); after all, Xcode and iOS both come from Apple. The one annoyance is that the system and the build environment are so tightly coupled that every minor update of the mobile operating system forces you to download a new multi-gigabyte version of Xcode, which is a real nuisance! There are many disassembly tools on the market, such as IDA and Hopper Disassembler, as well as system tools such as otool and LLDB. Some of them are good at static analysis and some at debugging, so I won’t analyze them here. For inspecting system internals and analyzing a program in real time while it runs, I think Xcode itself is also very good. If we want to understand the system in depth, we need to learn some assembly and master a few tools, and Xcode is the most convenient one at hand.

Switching Xcode to assembly mode

Have you ever seen the following screen after a crash while running your app attached to Xcode?

Don’t panic! This is simply Xcode’s assembly-mode view. We can see it not only when the program crashes; we can also enter it deliberately. This article is mostly an introduction to Xcode features that you may or may not have used before. To switch between assembly code and source code, use the Debug -> Debug Workflow -> Always Show Disassembly menu item.

Remember that you must set a breakpoint and run to it before the assembly instructions are shown!

In the earlier article on the instruction sets underlying iOS, we said that the simulator runs Intel instructions while a real device runs ARM instructions. Here we can see how the assembly instructions differ between the simulator and a real device:

From the three figures above you can see that source code differs greatly from assembly code, and that assembly code differs between instruction sets. The differences in assembly code simply reflect the different instructions that different CPUs run. Remember the instruction sets from the previous article? The former runs in the simulator, so it shows x64 instructions; the latter runs on a real device, so it shows ARM64 instructions. Can you spot the similarities and differences by comparing the pictures?

All code in a system consists of functions or methods, including those defined in classes and in blocks. At compile time, all defined functions and methods are compiled and linked into machine instructions and stored in the code segment of the file. The machine instructions within one function are stored contiguously, but different functions are not necessarily adjacent to each other.
Each assembly instruction in the pictures above corresponds to exactly one machine instruction. Note that although assembly code is displayed, what is actually stored and executed is machine code; assembly is shown only because it is easier to read and understand.
The address in front of each instruction is the memory address at which that instruction is stored. You might ask: don’t instructions run on the CPU? Yes, instructions are executed by the CPU, but they are stored in memory or on disk. The CPU has a register called IP (Intel) or PC (ARM) that holds the memory address of the next instruction to execute. Each instruction is fetched from the address held in IP/PC and executed, and IP/PC is then updated to point to the instruction that follows. This repeats over and over (in practice the CPU caches some instructions from memory in its internal cache to speed things up, rather than fetching every instruction from memory).
The first address of each function or method is its entry address: calling the function really means telling the CPU to jump to this address and execute from there. More precisely, we set the IP/PC register to the function’s entry address. For a method of an OC class, the entry address is the method’s IMP.
In the simulator you will find that instructions differ in length, from 1 byte to 7 bytes here, so the offsets between instructions vary, whereas on a real device every instruction is exactly 4 bytes. This is a significant difference between CISC and RISC instruction sets: CISC instructions are variable-length while RISC instructions are fixed-length. You will also find that the simulator needs fewer assembly instructions than the real device. This is another CISC/RISC difference: CISC instructions are complex and numerous, and a single one can do more than a RISC instruction; RISC instructions are simple, so some operations take several of them.
Comments in assembly mode start with a semicolon (;). When exploring internal implementations in assembly, it is recommended to read the AT&T-syntax assembly in the simulator, because its comments are more detailed than those shown for code running on a real device.
Assembly instructions have the general format: opcode, operand 1, operand 2, operand 3. An operand is a constant, a register, or a memory address. Operands such as RAX, RSI, RDI, R0, R1… are CPU registers (I’ll say more about registers in the next article). In the lower-left part of the Xcode window you can see the values of all registers of the current CPU, and you can print and modify them.
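
The way IP/PC walks through consecutively stored instructions can be sketched in a few lines of Python. This is only an illustration, not anything Xcode runs: the `offsets` helper is a hypothetical name, and the x86-64 instruction lengths are made-up values; only the fixed 4-byte ARM64 width comes from the text above.

```python
def offsets(lengths):
    """Return the byte offset of each instruction from the function entry."""
    result = []
    pc = 0  # offset of the next instruction, playing the role of IP/PC
    for length in lengths:
        result.append(pc)
        pc += length  # sequential execution: advance by the instruction size
    return result

# ARM64 (RISC): every instruction is 4 bytes wide.
arm64 = offsets([4] * 5)

# x86-64 (CISC): lengths vary; these five values are hypothetical.
x86_64 = offsets([1, 3, 4, 7, 2])

print(arm64)   # [0, 4, 8, 12, 16]
print(x86_64)  # [0, 1, 4, 8, 15]
```

The first list matches the regular <+0>, <+4>, <+8> pattern seen on a device, while the second shows why simulator offsets look irregular.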

Breakpoints

Some of you might ask: why don’t I see any assembly code after turning on assembly mode? That’s because you haven’t set a breakpoint in your code! What is a breakpoint, and why does it stop a program? Normally the CPU executes instructions in sequence. If a special event or a higher-priority task arrives while a task is executing, the CPU must suspend the current code and run the higher-priority code. This mechanism is called an interrupt. Hardware interrupts are triggered by events from external devices; the CPU also provides a software interrupt instruction. When code executes a software interrupt instruction, the program pauses and the CPU hands control to the operating system, which runs the interrupt handler. When we set a breakpoint at a line of source or at an instruction, the system saves the instruction at that location in a temporary breakpoint list and replaces it in memory with a software interrupt instruction. When the program reaches the breakpoint, it therefore executes the software interrupt instruction, which traps into the system and runs the interrupt handler. The breakpoint’s handler then waits for the user to act. For example, when the user presses Ctrl + F7, the handler takes the real instruction saved in the temporary breakpoint list, restores it to its original memory location, points the next instruction to execute back at that real instruction, and executes it. This is how execution continues past a breakpoint. (Understanding the full implementation of breakpoints requires some assembly knowledge, so I won’t expand on it here; I will explain how breakpoints are implemented in detail in a separate chapter.)
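
The save, replace and restore steps described above can be modelled with a toy machine. This is a simplified Python sketch, not how LLDB is actually implemented: `ToyDebugger`, `TRAP` and the instruction names are all hypothetical, and re-arming the breakpoint after the single step is an assumption added so the breakpoint stays usable.

```python
TRAP = "trap"  # stands in for the software-interrupt instruction

class ToyDebugger:
    def __init__(self, program):
        self.memory = list(program)  # "code segment": one instruction per slot
        self.saved = {}              # temporary breakpoint list: address -> real instruction
        self.pc = 0                  # address of the next instruction to execute
        self.executed = []           # trace of the real instructions that ran

    def set_breakpoint(self, address):
        self.saved[address] = self.memory[address]  # remember the real instruction
        self.memory[address] = TRAP                 # patch in the trap

    def run(self):
        """Run until a trap is hit (return its address) or the program ends (return None)."""
        while self.pc < len(self.memory):
            if self.memory[self.pc] == TRAP:
                return self.pc  # control passes to the debugger
            self.executed.append(self.memory[self.pc])
            self.pc += 1
        return None

    def resume(self):
        """Continue past the breakpoint at the current pc."""
        address = self.pc
        self.memory[address] = self.saved[address]  # restore the real instruction
        self.executed.append(self.memory[address])  # single-step it
        self.pc += 1
        self.memory[address] = TRAP                 # re-arm the breakpoint (assumption)
        return self.run()

dbg = ToyDebugger(["push", "mov", "add", "ret"])
dbg.set_breakpoint(2)
print(dbg.run())     # 2
print(dbg.executed)  # ['push', 'mov']
print(dbg.resume())  # None
print(dbg.executed)  # ['push', 'mov', 'add', 'ret']
```

The program never notices the patch: the trace of executed instructions is the same as it would be without a breakpoint.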

Symbolic breakpoints

When we set a breakpoint on a line of code or on an instruction, the program pauses when it reaches that point. In assembly mode you will see the breakpoint on an assembly instruction, and in source mode on a source line. Besides setting breakpoints in code, we can also set symbolic breakpoints. Consider the following three scenarios:

The frame of a view in our program keeps being changed at run time for some unknown reason, and we don’t know where the change is made. One solution is to override the setFrame: method, set a breakpoint in it, and see when it is called.
A crash occurs inside a system method called by our shipped program, but because it is a system method we cannot see its source code and so cannot analyze the crash (for example, many crashes arrive without any context).
I know assembly language and would like to study how a method of a system framework is implemented.

How would we solve these three problems? All three can be handled with symbolic breakpoints. Normally we set breakpoints in source code to debug a program; when there is no source code, we can set symbolic breakpoints instead. Setting one is simple: in Xcode, choose Debug -> Breakpoints -> Create Symbolic Breakpoint, or use the shortcut Option + Command + \ :

Once a symbolic breakpoint is set, execution stops right before any function or method with that symbol name runs, so we can look at the method’s internal implementation. It also helps us analyze and reproduce crashes that occur without context and outside our own source code, and so locate the problem. Here is the assembly we see after hitting two such symbolic breakpoints:

VCTest1`-[ViewController setA:]:
->  0x1029855e0 <+0>:  sub    sp, sp, #0x20 ; =0x20
    0x1029855e4 <+4>:  adrp   x8, 4
    0x1029855e8 <+8>:  add    x8, x8, #0x70 ; =0x70
    0x1029855ec <+12>: str    x0, [sp, #0x18]
    0x1029855f0 <+16>: str    x1, [sp, #0x10]
    0x1029855f4 <+20>: str    w2, [sp, #0xc]
    0x1029855f8 <+24>: ldr    w2, [sp, #0xc]
    0x1029855fc <+28>: ldr    x0, [sp, #0x18]
    0x102985600 <+32>: ldrsw  x8, [x8]
    0x102985604 <+36>: add    x8, x0, x8
    0x102985608 <+40>: str    w2, [x8]
    0x10298560c <+44>: add    sp, sp, #0x20 ; =0x20
    0x102985610 <+48>: ret    

-----------------

libsystem_c.dylib`abs:
->  0x1813dd984 <+0>: cmp    w0, #0x0 ; =0x0
    0x1813dd988 <+4>: cneg   w0, w0, mi
    0x1813dd98c <+8>: ret    

Can you now see the internal implementation of the setter setA: and of the function abs?
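
As a reading aid for the abs listing, the two instructions before ret can be mimicked in Python. This is a sketch of the behaviour only: `abs_w0` is a hypothetical name, and the masking at the end is there because Python integers are unbounded while the w0 register holds 32 bits.

```python
def abs_w0(w0):
    """Mimic `cmp w0, #0x0` followed by `cneg w0, w0, mi` on a signed 32-bit value."""
    negative = w0 < 0  # cmp sets the N flag; the `mi` condition tests it
    if negative:
        w0 = -w0       # cneg: negate only when the condition holds
    # Wrap the result to the signed 32-bit range of the w0 register.
    w0 &= 0xFFFFFFFF
    return w0 - (1 << 32) if w0 & 0x80000000 else w0

print(abs_w0(-5))      # 5
print(abs_w0(7))       # 7
print(abs_w0(-2**31))  # -2147483648
```

The last call shows the usual two's-complement corner case: the most negative 32-bit value has no positive counterpart, so negating it wraps back to itself.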

Debugging

Debugging is one of the most basic skills a programmer should master. I won’t cover debugging commands and techniques in detail here, since many other articles do. Here are the shortcuts for stepping through code while debugging:

In source mode

F7: Step into. Executes one line and enters a function when a call is encountered.
F6: Step over. Executes one line without entering the function when a call is encountered.
F8: Step out. Finishes the current function and returns to the line after the call to it.

In assembly mode

Control + F7: Step into by instruction. Executes one instruction and enters a function when a call is encountered.
Control + F6: Step over by instruction. Executes one instruction without entering the function when a call is encountered.

Switching between multiple threads:

Control + Shift + F7: Step Into Thread. Single-steps on the current thread only.
Control + Shift + F6: Step Over Thread. Steps over on the current thread only.

When execution stops at a breakpoint, we can enter debugging commands on the LLDB command line. I will only introduce the expr command here. expr is really the full form of p and po: besides displaying values, it can modify data, call methods, and more. Here are some common uses of expr:

expr variable|expression // Shows the value of a variable or expression.
expr -f h -- variable|expression // Shows the variable or expression in hexadecimal.
expr -f b -- variable|expression // Shows the variable or expression in binary.
expr -O -- oc_object // Equivalent to po oc_object.
expr -P 3 -- oc_object // Enhanced version of the command above: also shows the structure of the object's data members; the number after -P is the depth to display.
expr my_struct->a = my_array[3] // Assigns a value to member a of my_struct.
expr (char*)_cmd // Shows the name of the current OC method.
expr (IMP)[self methodForSelector:_cmd] // Executes a method call.

Viewing memory addresses

When a program runs, the operating system creates a process for it together with a virtual memory space. The operating system divides this virtual memory space into a code area, a global data area, a heap area, a stack area, and so on. Each area has its own purpose: the code area holds the program’s code (also called the image); the global data area holds global data, constants, and descriptive information (for example, all OC class definitions used by the runtime are stored here); the heap area is used for dynamic memory allocation; the stack area holds the local variables of functions. So both code and data live in memory at run time. How much memory a process can address depends on the operating system. In general, each process on a 32-bit operating system has an address space of 2^32 bytes = 4 GB, and on a 64-bit operating system 2^64 bytes = 16 EB. Note that this is the addressable virtual space, not physical memory; the operating system maps virtual space to physical memory through paging. The virtual memory space of a process is linear and can be stored to and accessed contiguously. To make it accessible, the operating system numbers it, and each number is a memory address. An address is also called a pointer, so a pointer to a variable is that variable’s address in memory. To understand memory and addresses, think of memory as an array and an address as the index used to access its elements: we read and write array elements by index, and the CPU accesses data in memory by address. Addresses in a process start at 0 and increase byte by byte up to the limit of the virtual memory space.

To open the memory viewer, choose Debug -> Debug Workflow -> View Memory or press Shift + Command + M:
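
The array analogy and the address-space arithmetic can be checked with a short Python sketch. It is illustrative only: the 16-byte `memory` buffer and the address 0x8 are arbitrary choices, not values from any real process.

```python
# Address-space sizes implied by the pointer width.
GIB = 2**30
EIB = 2**60
print(2**32 // GIB)  # 4  -> 4 GiB with 32-bit addresses
print(2**64 // EIB)  # 16 -> 16 EiB with 64-bit addresses

# A tiny "memory": the index plays the role of the address.
memory = bytearray(16)
address = 0x8
memory[address] = 0x2A       # write one byte at that address
pointer = address            # a pointer is just a stored address
print(hex(memory[pointer]))  # 0x2a
```
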

The image above shows where the method names of a class are located in memory and how they are laid out. The View Memory feature makes it easy to understand and analyze code and data structures in memory. You can type any address you want to inspect into the address field. For example, to see the machine instructions of a function, enter the function’s address as shown in assembly mode, and the machine-code bytes of the function are displayed. One more thing to note: memory is displayed byte by byte starting from the lowest address, and multi-byte values are stored low-order byte first, so a value such as an int must be read from its highest byte back to its lowest.
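
The byte order of an int in a memory view can be reproduced with Python's standard struct module. The value 0x12345678 is an arbitrary example, and the sketch assumes a little-endian layout, which is what x86-64 and ARM64 iOS devices use.

```python
import struct

value = 0x12345678
raw = struct.pack("<i", value)  # 32-bit int in little-endian byte order

# Bytes in increasing address order, the way a memory viewer lists them.
shown = " ".join(f"{b:02x}" for b in raw)
print(shown)  # 78 56 34 12

# Reading from the highest address back to the lowest recovers the value.
print(hex(int.from_bytes(raw, "little")))  # 0x12345678
```
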

Calculator application

During debugging, code, addresses and some data are usually displayed in hexadecimal, so calculations on them, especially address offsets, are also done in hexadecimal. You can do this in LLDB with the expr or p command. If you prefer a graphical interface, launch the macOS Calculator app and choose Programmer from the View menu. The programmer interface looks like this (as a programmer, don’t tell me you don’t know how to use these functions):

The bc command

If you prefer to calculate on the command line, the system also provides a command-line calculator: bc. Its official description is "An arbitrary precision calculator language". To enter bc in interactive mode, run: bc -i

In bc, ibase=[2|8|10|16] sets the base of the numbers you enter, and obase=[2|8|10|16] sets the base in which results are displayed. You can also set the number of decimal places in the output with scale=n, and you can use expressions, operators, variables, and even functions. bc is clearly more than a simple calculator: you can write programs in it! For details, run man bc in a terminal to read the bc manual. Here is some code written in bc (entered after running bc -i):

sum = 0
for (i = 0; i < 100; i++)
{
   sum += i
}
sum
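
For cross-checking results outside bc, the same calculations can be sketched in Python. This is only a comparison aid: the two addresses are taken from the setA: listing earlier in this article, while `int(..., 16)` and `format` are rough stand-ins for bc's ibase and obase rather than exact equivalents.

```python
# Same loop as the bc program: sum of 0..99.
total = 0
for i in range(100):
    total += i
print(total)  # 4950

# Parse in one base, print in another (roughly ibase=16 and obase=2 or 8).
n = int("FF", 16)
print(n)               # 255
print(format(n, "b"))  # 11111111
print(format(n, "o"))  # 377

# Address-offset arithmetic on the setA: listing.
entry = 0x1029855e0     # address of <+0>
ret_addr = 0x102985610  # address of the ret instruction
offset = ret_addr - entry
print(hex(offset), offset)  # 0x30 48
print(offset // 4 + 1)      # 13 instructions of 4 bytes each
```
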
