iOS assembly
There are many kinds of assembly language. Common ones include 8086 assembly, ARM assembly, and x86 assembly.
ARM assembly
The iOS architecture evolved from ARMv6 to ARMv7 and ARMv7s, and finally to today's ARM64. ARMv6, ARMv7, ARMv7s, and ARM64 are all instruction sets of ARM processors. ARMv7 and ARMv7s are the architectures used by 32-bit processors on real devices, while ARM64 is the architecture used by 64-bit processors on real devices. The iPhone 5C was the last 32-bit ARM iPhone; starting with the iPhone 5S, all iPhones use the ARM64 architecture. On a real device, ARM64 assembly looks like this:
TestFont`-[ViewController test]:
0x10286e574 <+0>: sub sp, sp, #0x20 ; =0x20
0x10286e578 <+4>: mov w8, #0x14
0x10286e57c <+8>: mov w9, #0xa
0x10286e580 <+12>: str x0, [sp, #0x18]
0x10286e584 <+16>: str x1, [sp, #0x10]
-> 0x10286e588 <+20>: str w9, [sp, #0xc]
0x10286e58c <+24>: str w8, [sp, #0x8]
0x10286e590 <+28>: add sp, sp, #0x20 ; =0x20
0x10286e594 <+32>: ret
x86 assembly
x86 assembly is the assembly language used by the simulator (on an Intel-based Mac). Both its instructions and its syntax differ from those of ARM64 assembly, as shown below:
TestFont`-[ViewController test]:
0x10b089520 <+0>: pushq %rbp
0x10b089521 <+1>: movq %rsp, %rbp
0x10b089524 <+4>: movq %rdi, -0x8(%rbp)
0x10b089528 <+8>: movq %rsi, -0x10(%rbp)
-> 0x10b08952c <+12>: movl $0xa, -0x14(%rbp)
0x10b089533 <+19>: movl $0x14, -0x18(%rbp)
0x10b08953a <+26>: popq %rbp
0x10b08953b <+27>: retq
Why learn ARM64 assembly?
Code debugging
In everyday development, when a program crashes during debugging, the debugger usually points to the specific line of code that crashed. Sometimes, however, the crash is stranger, for example one inside a system library, and the cause is very hard to pin down. Debugging at the assembly level can then lead to the cause with far less effort.
Reverse debugging
When reverse engineering someone else's app, we can use LLDB to set breakpoints on memory addresses, but when a breakpoint is hit, LLDB shows assembly code rather than Objective-C source. To reverse engineer and dynamically debug other apps, you therefore need to know assembly.
Introduction to ARM64 assembly
To learn ARM64 assembly, start with three topics: registers, instructions, and the stack.
Registers
ARM64 has 34 registers, as follows
General-purpose registers
- There are 29 64-bit general-purpose registers, x0 ~ x28
- w0 ~ w28 are the lower 32 bits of x0 ~ x28
- x0 ~ x7 are usually used to pass function arguments; if there are more arguments, the rest are passed on the stack
- x0 usually holds the return value of a function
Some people also count x0 ~ x30 as general-purpose registers, and architecturally x29 and x30 have 32-bit views (w29 and w30) as well. In practice, however, x29 and x30 serve special purposes (frame pointer and link register), so this article treats only x0 ~ x28 as general-purpose registers.
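The relationship between an x register and its w view can be sketched in C. This is only an illustration of the bit layout, not code from the original project; the helper names `w_view` and `write_w` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: the value that reading wN yields for a given xN.
   wN is simply the lower 32 bits of xN. */
uint32_t w_view(uint64_t x) {
    return (uint32_t)(x & 0xFFFFFFFFu);
}

/* Hypothetical helper: the value xN holds after an instruction writes wN.
   On ARM64, a write to wN clears the upper 32 bits of xN. */
uint64_t write_w(uint32_t w) {
    return (uint64_t)w;
}
```

For example, if x0 holds 0x000000010320c980, reading w0 gives 0x0320c980.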
Program counter
The PC (Program Counter) register holds the address of the instruction the CPU is currently executing. Use register read pc in LLDB to view the value stored in it
(lldb) register read pc
pc = 0x000000010286e588 TestFont`-[ViewController test] + 20 at ViewController.m:28
(lldb)
Stack pointer
- SP (Stack Pointer)
- FP (Frame Pointer), also known as x29
Link register
The LR (Link Register), also known as x30, stores the return address of the function.
Program status register
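The ARM architecture contains a Current Program Status Register (CPSR) and five Saved Program Status Registers (SPSR). The saved program status registers are used for exception handling. On ARM64 the condition flags are part of PSTATE, which LLDB still displays as cpsr.
- Each bit of the program status register has a specific purpose; only a few commonly used flag bits are described here
- N, Z, C and V are the condition code flags. Their contents are changed by the results of arithmetic or logical operations, and they can determine whether an instruction is executed. Their meanings are as follows
- N (bit 31): negative flag, set when the result is negative
- Z (bit 30): zero flag, set when the result is zero
- C (bit 29): carry flag, set when an addition carries out or a subtraction does not borrow
- V (bit 28): overflow flag, set when a signed operation overflows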
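The N, Z, C and V flags occupy bits 31, 30, 29 and 28 of the status value that LLDB prints for cpsr. As a minimal sketch, the following C helper pulls those four bits out of a raw 32-bit value; the `Flags` type and `decode_nzcv` name are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four condition flags. */
typedef struct {
    bool n; /* negative */
    bool z; /* zero */
    bool c; /* carry / no borrow */
    bool v; /* signed overflow */
} Flags;

/* Extract N, Z, C, V from bits 31..28 of a raw cpsr value. */
Flags decode_nzcv(uint32_t cpsr) {
    Flags f;
    f.n = ((cpsr >> 31) & 1u) != 0;
    f.z = ((cpsr >> 30) & 1u) != 0;
    f.c = ((cpsr >> 29) & 1u) != 0;
    f.v = ((cpsr >> 28) & 1u) != 0;
    return f;
}
```

Decoding 0x60000000 this way gives Z = 1 and C = 1 with N and V clear, since the top hexadecimal digit 6 is 0110 in binary.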
Instructions
Mov instruction
The mov instruction copies the value of another register, a shifted register, or an immediate value into the destination register
The actual use of the MOV instruction in ARM64 assembly
- Create a new test.s file in Xcode and add the following code to the test.s file
; .text means the code is placed in the text segment
.text
; .global exposes the symbol that follows; otherwise it cannot be called from outside. The symbol name starts with _
.global _test
; the _test function
_test:
; mov instruction: load the immediate value 4 into register x0
mov x0, #0x4
mov x1, x0
; in assembly, ret marks the return from a function
ret
- Create a new test.h header file in Xcode to expose the _test method in test.s
#ifndef test_h
#define test_h
void test(void);
#endif /* test_h */
- Call test() in viewDidLoad, then use register read x0 in LLDB to read the value stored in the register
(lldb) register read x0
x0 = 0x000000010320c980
(lldb) si
(lldb) register read x0
x0 = 0x0000000000000004
(lldb) register read x1
x1 = 0x00000001e60f3bc7 "viewDidLoad"
(lldb) si
(lldb) register read x1
x1 = 0x0000000000000004
Setting a breakpoint on the assembly instructions and single-stepping shows that the values of registers x0 and x1 change after each mov instruction executes
RET instruction
The ret instruction returns from a function. Its key effect is to copy the value of the LR (x30) register into the PC register
- Call test() in viewDidLoad, set a breakpoint on the test() call, and run
- Use register read to view the values of the LR and PC registers
(lldb) register read lr
lr = 0x00000001021965a4 TestFont`-[ViewController viewDidLoad] + 68 at ViewController.m:23
(lldb) register read pc
pc = 0x00000001021965a4 TestFont`-[ViewController viewDidLoad] + 68 at ViewController.m:23
(lldb)
At this point, the LR and PC registers both hold 0x00000001021965a4, the address in viewDidLoad of the instruction that calls test()
- Step into the test() function using the si command
- Reading LR and PC again shows that LR now holds the address of the instruction after the call to test(), that is, the next instruction the caller executes once test() has returned. PC holds the address of the instruction about to be executed, as follows
(lldb) register read lr
lr = 0x00000001021965a8 TestFont`-[ViewController viewDidLoad] + 72 at ViewController.m:24
(lldb) register read pc
pc = 0x0000000102196abc TestFont`test
- After test() finishes, the program jumps to the address stored in LR, 0x00000001021965a8. Reading LR and PC once more shows that PC now holds the address that was stored in LR
(lldb) register read lr
lr = 0x00000001021965a8 TestFont`-[ViewController viewDidLoad] + 72 at ViewController.m:24
(lldb) register read pc
pc = 0x00000001021965a8 TestFont`-[ViewController viewDidLoad] + 72 at ViewController.m:24
(lldb)
The add instruction
The add instruction adds two operands and stores the result in the destination register.
In ARM64 assembly, the operands are usually registers in the range x0 ~ x28. Execute the following assembly code
.text
.global _test
_test:
mov x0, #0x4
mov x1, #0x3
add x0, x1, x0
ret
After the code executes, x0 = 7, as the following LLDB session shows
(lldb) register read x0
x0 = 0x0000000000000004
(lldb) si
(lldb) register read x1
x1 = 0x0000000000000003
(lldb) si
(lldb) register read x0
x0 = 0x0000000000000007
Sub instruction
The sub instruction subtracts operand 2 from operand 1 and stores the result in the destination register. The variant that additionally subtracts the inverse of the C condition flag in CPSR is sbc (subtract with carry)
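The arithmetic can be sketched in C to keep the two forms apart: plain sub computes operand 1 minus operand 2, while the form that also subtracts the inverse of the C flag is sbc. The function names `sub64` and `sbc64` are hypothetical and only model the result value, not the flag updates.

```c
#include <stdint.h>

/* Model of sub: result = op1 - op2 (wraps around on unsigned underflow). */
uint64_t sub64(uint64_t op1, uint64_t op2) {
    return op1 - op2;
}

/* Model of sbc: result = op1 - op2 - NOT(C).
   carry is the current value of the C flag (0 or 1). */
uint64_t sbc64(uint64_t op1, uint64_t op2, unsigned carry) {
    return op1 - op2 - (carry ? 0u : 1u);
}
```

With C set, sbc gives the same result as sub; with C clear, the result is one less.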
CMP instruction
The cmp instruction compares the contents of one register with the contents of another register or an immediate value and updates the condition flags in the CPSR register. It performs a subtraction but discards the result
- Execute the following assembly code
.text
.global _test
_test:
mov x0, #0x4
mov x1, #0x3
cmp x0, x1
ret
- The CPSR register values are printed as follows before and after the CMP code is executed
(lldb) register read cpsr
cpsr = 0x60000000
(lldb) si
(lldb) si
(lldb) si
(lldb) register read cpsr
cpsr = 0x20000000
(lldb)
After the cmp executes, the CPSR value changes to 0x20000000. Converted to binary, the top four bits (N, Z, C, V) are 0010: C = 1 because 4 - 3 needs no borrow, while N, Z and V are 0 because the result is positive, nonzero, and does not overflow
- Modify the assembly code by swapping the x0 and x1 operands of cmp, as follows
_test:
mov x0, #0x4
mov x1, #0x3
cmp x1, x0
ret
- The value of the CPSR register is read again before and after the CMP code executes
(lldb) register read cpsr
cpsr = 0x60000000
(lldb) s
(lldb) register read cpsr
cpsr = 0x80000000
(lldb)
At this point, the CPSR value becomes 0x80000000. Converted to binary, the top four bits (N, Z, C, V) are 1000: N = 1 because 3 - 4 is negative, and C = 0 because the subtraction borrows
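The two CPSR values observed above can be reproduced with a small C model of how a 64-bit cmp sets the flags. This is a sketch of the flag rules only; the function name `cmp_nzcv` is hypothetical, and the returned value has all bits other than N, Z, C and V cleared.

```c
#include <stdint.h>

/* Model of "cmp a, b" on 64-bit registers: compute a - b, discard the
   result, and return the flags packed into bits 31..28 (N, Z, C, V). */
uint32_t cmp_nzcv(uint64_t a, uint64_t b) {
    uint64_t r = a - b;
    uint32_t n = (uint32_t)(r >> 63);  /* sign bit of the result */
    uint32_t z = (r == 0) ? 1u : 0u;   /* result is zero */
    uint32_t c = (a >= b) ? 1u : 0u;   /* set when no borrow is needed */
    /* Signed overflow: a and b have different signs, and the sign of
       the result differs from the sign of a. */
    uint32_t v = (uint32_t)(((a ^ b) & (a ^ r)) >> 63);
    return (n << 31) | (z << 30) | (c << 29) | (v << 28);
}
```

Here `cmp_nzcv(4, 3)` yields 0x20000000 and `cmp_nzcv(3, 4)` yields 0x80000000, matching the two LLDB sessions.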
B instruction
The b instruction is the simplest jump instruction. When the program encounters b, it jumps unconditionally to the target address given after b and continues execution there.
BL instruction
The bl instruction is another jump instruction. Before jumping, it stores the address of the instruction following the bl in the LR (x30) register, then jumps to the label and starts executing the code there. When ret is encountered, the address stored in LR (x30) is loaded back into the PC register, so the program returns to the instruction after the bl and continues executing.
- Start by executing the following assembly code
.text
.global _test
label:
mov x0, #0x1
mov x1, #0x8
ret
_test:
mov x0, #0x4
bl label
mov x1, #0x3
cmp x1, x0
ret
- Set a breakpoint on the bl label instruction and read the values of the LR and PC registers
- Execute bl label to jump to label, then read the LR (x30) and PC registers again. The address stored in LR (x30) is now the address of mov x1, #0x3
- After all the code in label has executed, the program returns to the address stored in LR, that is, mov x1, #0x3, and PC also holds the address of mov x1, #0x3.
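How b, bl and ret move PC and LR can be modelled with a tiny C struct. This is a simplified sketch, not a real emulator: the `Cpu` type and the `exec_*` functions are hypothetical names, and the model relies only on the fact that every ARM64 instruction is 4 bytes long.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state: just the two registers of interest. */
typedef struct {
    uint64_t pc; /* address of the instruction being executed */
    uint64_t lr; /* link register (x30) */
} Cpu;

/* b: jump unconditionally; LR is left untouched. */
void exec_b(Cpu *cpu, uint64_t target) {
    cpu->pc = target;
}

/* bl: save the address of the next instruction in LR, then jump. */
void exec_bl(Cpu *cpu, uint64_t target) {
    cpu->lr = cpu->pc + 4;
    cpu->pc = target;
}

/* ret: continue at the address stored in LR. */
void exec_ret(Cpu *cpu) {
    cpu->pc = cpu->lr;
}
```

Starting with pc = 0x1021965a4 and calling `exec_bl` with target 0x102196abc leaves lr = 0x1021965a8; a following `exec_ret` sets pc to 0x1021965a8, the same sequence of values seen in the ret walkthrough earlier.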
Addressing modes
An addressing mode is the way the processor finds the physical address of an operand from the address information given in an instruction. ARM supports the following common addressing modes.
Immediate addressing
Register addressing
Register indirect addressing
Base-plus-index addressing
Multiple-register addressing
Relative addressing
Stack addressing
Stack operations
Types of functions
- A leaf function is one that does not call any other function
- A non-leaf function is one that calls other functions
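The distinction can be illustrated with two small C functions; the names `square` and `sum_of_squares` are made up for this sketch. The difference matters in assembly because a non-leaf function executes bl, which overwrites LR (x30), so it has to save its own return address (typically on the stack) before calling anything. An optimizing compiler may inline the calls, so this shows the source-level idea rather than guaranteed machine code.

```c
/* Leaf function: it calls no other function, so LR is never overwritten
   and it can simply ret when done. */
int square(int x) {
    return x * x;
}

/* Non-leaf function: it calls square(), so each call overwrites LR and
   the function must preserve its own return address first. */
int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}
```

Calling `sum_of_squares(3, 4)` returns 25.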