Preface
Starting with this article, I will share knowledge about iOS reverse engineering and security attack and defense. Before we can analyze anything in reverse, we need a working knowledge of assembly, so this article covers the basics of assembly language as preparation. I hope you can master them.
1. Assembly language
1.1 History of assembly language
Let’s take a look at the history of assembly language 👇
Machine language
Machine language 👉 machine instructions made up of zeros and ones. For example, the following instructions 👇
- Add: 0100 0000
- Subtract: 0100 1000
- Multiply: 1111 0111 1110 0000
- Divide: 1111 0111 1111 0000
Assembly language
Assembly language uses mnemonics in place of machine code, for example 👇
- Add: INC EAX, which the assembler translates into 0100 0000
- Subtract: DEC EAX, which the assembler translates into 0100 1000
- Multiply: MUL EAX, which the assembler translates into 1111 0111 1110 0000
- Divide: DIV EAX, which the assembler translates into 1111 0111 1111 0000
So what is a mnemonic? You can think of a mnemonic as a readable name for a machine instruction: it lets us write add, subtract, multiply and divide as words, which are then translated into the zeros and ones of machine language.
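The mapping from mnemonic to machine code can be pictured as a lookup table. The sketch below is only an illustration, not a real assembler: the names `Encoding`, `assemble` and `byte_to_bits` are made up for this example, and the byte values are the 32-bit x86 encodings of the four instructions listed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One row of a toy "assembler" table: a mnemonic and its machine code. */
typedef struct {
    const char *mnemonic;
    uint8_t bytes[2];
    size_t length;   /* how many entries of bytes are used */
} Encoding;

/* 32-bit x86 encodings of the four instructions from the list above. */
static const Encoding kTable[] = {
    {"INC EAX", {0x40, 0x00}, 1},   /* 0100 0000 */
    {"DEC EAX", {0x48, 0x00}, 1},   /* 0100 1000 */
    {"MUL EAX", {0xF7, 0xE0}, 2},   /* 1111 0111 1110 0000 */
    {"DIV EAX", {0xF7, 0xF0}, 2},   /* 1111 0111 1111 0000 */
};

/* Returns the table row for a mnemonic, or NULL if it is unknown. */
const Encoding *assemble(const char *mnemonic) {
    for (size_t i = 0; i < sizeof kTable / sizeof kTable[0]; i++) {
        if (strcmp(kTable[i].mnemonic, mnemonic) == 0) {
            return &kTable[i];
        }
    }
    return NULL;
}

/* Writes one byte as eight '0'/'1' characters plus a terminating NUL. */
void byte_to_bits(uint8_t byte, char out[9]) {
    for (int bit = 7; bit >= 0; bit--) {
        out[7 - bit] = ((byte >> bit) & 1) ? '1' : '0';
    }
    out[8] = '\0';
}
```

Looking up "INC EAX" and passing its single byte to `byte_to_bits` gives the string 01000000, which is the machine code shown in the list.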
High-level languages
Next come the high-level programming languages we use in everyday development, such as C++, Java, OC and Swift, which are closer to human natural language. 👇
- Add: A+B, which the compiler turns into 0100 0000
- Subtract: A-B, which the compiler turns into 0100 1000
- Multiply: A*B, which the compiler turns into 1111 0111 1110 0000
- Divide: A/B, which the compiler turns into 1111 0111 1111 0000
To sum up, languages developed roughly as follows 👇
Machine language (0s and 1s) --> assembly language (mnemonics, translated into machine code by an assembler) --> high-level languages (close to human natural language)
#### Additional: Code execution process
The code execution process is shown in the figure 👇
The figure shows:
- Assembly language and machine language correspond one to one: each machine instruction has a corresponding assembly instruction
- Assembling: assembly language can be assembled into machine language. Disassembling: machine language can be disassembled back into assembly language
- A high-level language can be compiled into assembly language / machine language, but assembly language / machine language can almost never be turned back into the high-level language
1.2 Characteristics of assembly language
- It can directly access and control hardware devices such as memory and the CPU, getting the most out of the hardware
- It is not subject to compiler limitations and gives complete control over the generated binary code
- The object code is short, uses little memory and executes quickly
- Assembly instructions are mnemonics for machine instructions and correspond to them one to one. Each CPU has its own machine instruction set and assembly instruction set, so assembly language is not portable
- It requires a lot of knowledge: developers need to understand the structure of hardware such as the CPU, and the code is not easy to write, debug or maintain
- It is not case sensitive: for example, mov is the same as MOV
Uses
One more plug for assembly: what can you do once you have learned it? 😂👇
- Write drivers and operating systems (for example, some key parts of the Linux kernel)
- Write programs or code fragments with high performance requirements, which can be mixed with a high-level language (inline assembly)
- Software security
- Virus analysis and prevention
- Reverse engineering, packing, unpacking, cracking, game cheats, antivirus evasion, encryption and decryption, vulnerabilities, hacking
- The best starting point and most efficient way to understand the entire computer system
- Lay the foundation for writing efficient code
- Understand what code really does underneath
Let's finish with a bit of showing off:
The lower the level, the simpler it gets! Real programmers need to know one very important language: assembly!
Several kinds of assembly language are commonly discussed today:
- 8086 assembly (the 8086 processor is a 16-bit CPU)
- Win32 assembly
- Win64 assembly
- ARM assembly (embedded, Mac, iOS)
The iPhone uses ARM assembly 👇, but it differs between devices because their CPU architectures differ.
Architecture | Devices |
---|---|
armv6 | iPhone, iPhone2, iPhone3G, first- and second-generation iPod Touch |
armv7 | iPhone3GS, iPhone4, iPhone4S, iPad, iPad2, iPad3 (The New iPad), iPad mini, iPod Touch 3G, iPod Touch4 |
armv7s | iPhone5, iPhone5C, iPad4 (iPad with Retina Display) |
arm64 | iPhone5S, iPhoneX, iPad Air, iPad mini2 |
2. Essential background knowledge
Before learning assembly properly, we need to understand hardware structures such as the CPU.
The picture above shows an app being executed
- The most important hardware involved is the CPU and memory
- In assembly, most instructions operate on the CPU and memory
Additional: image files
As we know, an application is stored on disk as an executable file, such as an EXE on a PC or a Mach-O file on iOS. When the executable file is loaded into memory, it becomes an image file. The image is identical to the executable file because it is copied from disk into memory, which is why it is called an image.
2.1 The bus
What is a bus? Take a look at 👇
The picture above shows the Apple A11 CPU chip. Every CPU chip has many pins, and these pins are connected to the bus. The CPU interacts with external devices through the bus, so the bus is the bridge between the CPU and memory.
Bus: a collection of wires.
Bus classification
Buses are divided into three main categories, as shown in the following figure 👇
- Address bus: The CPU uses the address bus to specify storage units
- Data bus: data transfer channel between CPU and memory/other components
- Control bus: the CPU controls external devices through the control bus
For example
The figure above shows the CPU reading data from unit 3 of memory. The general process is 👇
- Before it can read or write data in memory, the CPU must first locate the memory address. The CPU passes the address 3 to memory over the address bus, that is, it addresses unit 3 of memory;
- To operate on the data in unit 3, the operation must be specified as a read or a write. The CPU uses the control bus to tell memory which operation to perform, in this example a read;
- Memory receives the operation the CPU wants to perform and passes the data in unit 3 to the CPU over the data bus.
At this point, the whole interaction between the CPU and memory is complete.
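The three steps can be sketched as one toy "bus transaction" in C. This only models the roles of the three buses and assumes an 8-unit memory; `Memory`, `BusOp` and `bus_transaction` are names invented for the illustration, and real hardware does not work through function calls.

```c
#include <stdint.h>

#define MEMORY_UNITS 8

/* Operation signalled on the (toy) control bus. */
typedef enum { BUS_READ, BUS_WRITE } BusOp;

/* A toy memory in which each unit holds one byte. */
typedef struct {
    uint8_t units[MEMORY_UNITS];
} Memory;

/*
 * One bus transaction:
 *   address -> what the CPU puts on the address bus
 *   op      -> what the CPU signals on the control bus
 *   data    -> the data bus (output for a read, input for a write)
 * Returns 0 on success, -1 if the address is out of range.
 */
int bus_transaction(Memory *mem, unsigned address, BusOp op, uint8_t *data) {
    if (address >= MEMORY_UNITS) {
        return -1;
    }
    if (op == BUS_READ) {
        *data = mem->units[address];   /* memory drives the data bus */
    } else {
        mem->units[address] = *data;   /* the CPU drives the data bus */
    }
    return 0;
}
```

Calling `bus_transaction(&mem, 3, BUS_READ, &data)` corresponds to the example above: address 3 on the address bus, "read" on the control bus, and the contents of unit 3 coming back in `data`.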
2.1.1 Address bus
The width of the address bus determines the CPU's addressing capability. For example, the address bus of the 8086 is 20 bits wide, so it can address 2^20 = 1M (1048576) units. Here 1M is a count.
The difference between a count and a capacity: 1M and 1MB
- 1M is a count whose value is 1048576.
- 1MB is a capacity. The unit of memory is the byte (B), and each byte holds 8 bits. A commercial 512MB memory module, for example, holds 512x1024x1024 bytes.
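Both numbers are easy to check in code. A minimal sketch, with the helper names `addressable_units` and `megabytes_to_bytes` chosen just for this example:

```c
#include <stdint.h>

/* Number of distinct addresses an address bus of the given width can select. */
uint64_t addressable_units(unsigned bus_width) {
    return (uint64_t)1 << bus_width;   /* 2^bus_width, valid for widths below 64 */
}

/* Size in bytes of a memory module given in megabytes (1MB = 1024 * 1024 bytes). */
uint64_t megabytes_to_bytes(uint64_t megabytes) {
    return megabytes * 1024 * 1024;
}
```

`addressable_units(20)` returns 1048576 (the 1M of the 8086), and `megabytes_to_bytes(512)` returns 536870912, the number of bytes in a 512MB module.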
See the memory diagram 👇
2.1.2 Data Bus
The width of the data bus determines how much data the CPU can transfer at a time, that is, the speed of data transfer.
- Each data line can carry only one binary bit at a time; for example, 8 data lines carry one 8-bit binary value (one byte) at a time
- The width of the data bus is the total number of data lines
For example, the data bus of the 8086 is 16 bits wide, so at most 2 bytes of data can be transferred at a time.
Throughput is the total amount of data the CPU transfers in a single transfer, and it equals the data bus width.
2.1.3 Control bus
- The width of the control bus determines the CPU's ability to control other devices. The number of control lines determines how many kinds of control the CPU can exert over external devices
- The width of the control bus is the total number of control lines
2.2 Memory
The bus discussion above mentioned memory, but how exactly does memory interact with the CPU and other devices? Let's take a look at the physical structure of memory 👇
The figure above shows 👇
- The CPU is connected to other hardware devices via the bus
- Memory here means RAM, the main memory
The following figure shows memory divided by physical address, covering main memory, video memory, the video card and the network card
Low addresses in memory are for the user and high addresses are for the system 👇
The size of the memory address space is limited by the width of the CPU's address bus. For example, the address bus of the 8086 is 20 bits wide, so it can locate 2^20 different memory units (the address range is 0x00000 to 0xFFFFF). The memory space of the 8086 is therefore 1MB:
- 0x00000~0x9FFFF: main memory, readable and writable
- 0xA0000~0xBFFFF: video memory. Data written here is output to the display by the video card. Readable and writable
- 0xC0000~0xFFFFF: stores various hardware/system information, read-only
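The address map above amounts to a simple range check. The following sketch assumes exactly the three ranges listed; the `Region` enum and the function names are made up for the illustration.

```c
#include <stdint.h>

typedef enum {
    REGION_MAIN_MEMORY,    /* 0x00000 - 0x9FFFF, readable and writable */
    REGION_VIDEO_MEMORY,   /* 0xA0000 - 0xBFFFF, readable and writable */
    REGION_ROM,            /* 0xC0000 - 0xFFFFF, read-only */
    REGION_INVALID         /* outside the 20-bit address space */
} Region;

/* Maps an 8086 physical address to the region it falls in. */
Region classify_address(uint32_t address) {
    if (address <= 0x9FFFF) return REGION_MAIN_MEMORY;
    if (address <= 0xBFFFF) return REGION_VIDEO_MEMORY;
    if (address <= 0xFFFFF) return REGION_ROM;
    return REGION_INVALID;
}

/* Returns 1 if a write to the address would be accepted, 0 otherwise. */
int is_writable(uint32_t address) {
    Region region = classify_address(address);
    return region == REGION_MAIN_MEMORY || region == REGION_VIDEO_MEMORY;
}
```

For example, `classify_address(0xA0000)` returns `REGION_VIDEO_MEMORY`, while `is_writable(0xC0000)` returns 0 because that range is read-only.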
2.3 Number bases
The base we know best is decimal. Once we start programming we also meet binary, octal, hexadecimal and so on. What they mean 👇
- Octal consists of eight symbols: 0 1 2 3 4 5 6 7. Carry one at every eight
- Decimal consists of ten symbols: 0 1 2 3 4 5 6 7 8 9. Carry one at every ten
- And so on: base N consists of N symbols. Carry one at every N
2.3.1 Obstacles to learning number bases
Many people struggle with number bases because they think about every other base in terms of decimal and always convert to decimal before calculating. This way of learning is wrong! Why convert to decimal at all? Only because decimal is the base we know best.
Every base is complete in itself, and the best way to learn number bases is to forget about decimal and forget about converting between bases!
##### Exercise: when does 1 + 1 equal 3?
Of course, one could answer 👉 when you miscalculate! Ha ha! Instead, let's throw away the established decimal symbols and define our own 10 symbols, such as 👇
A B E S 7 (carry one at every ten)
Now, can 1 plus 1 equal 3? Definitely YES! So what is the point of this? The traditional decimal symbols and our custom symbols are different. If we do not give others our table of 10 symbols, they cannot recover our actual data, so a custom symbol table can be used for encryption!
To sum up 👇
Decimal consists of ten symbols with a carry at every ten, and the symbols can be customized!!
2.3.2 Arithmetic rules for bases
Do an exercise 👉 octal arithmetic
2 + 3 = __, 2 * 3 = __, 4 + 5 = __, 4 * 5 = __.
277 + 333 = __, 276 * 54 = __, 237 - 54 = __, 234 / 4 = __.
Octal addition table
0 1 2 3 4 5 6 7 10 11 12 13 14 15 16 17 20 21 22 23 24 25 26 27...
1+1 = 2
1+2 = 3    2+2 = 4
1+3 = 4    2+3 = 5    3+3 = 6
1+4 = 5    2+4 = 6    3+4 = 7    4+4 = 10
1+5 = 6    2+5 = 7    3+5 = 10   4+5 = 11   5+5 = 12
1+6 = 7    2+6 = 10   3+6 = 11   4+6 = 12   5+6 = 13   6+6 = 14
1+7 = 10   2+7 = 11   3+7 = 12   4+7 = 13   5+7 = 14   6+7 = 15   7+7 = 16
Octal multiplication table
0 1 2 3 4 5 6 7 10 11 12 13 14 15 16 17 20 21 22 23 24 25 26 27...
1*1 = 1
1*2 = 2    2*2 = 4
1*3 = 3    2*3 = 6    3*3 = 11
1*4 = 4    2*4 = 10   3*4 = 14   4*4 = 20
1*5 = 5    2*5 = 12   3*5 = 17   4*5 = 24   5*5 = 31
1*6 = 6    2*6 = 14   3*6 = 22   4*6 = 30   5*6 = 36   6*6 = 44
1*7 = 7    2*7 = 16   3*7 = 25   4*7 = 34   5*7 = 43   6*7 = 52   7*7 = 61
Practice: the four operations
  277      236      276      234
+ 333    -  54    *  54    /   4
-----    -----    -----    -----
Please calculate!
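Once you have worked the exercises by hand, you can check your answers with a small program. This is just a checking aid built on the C standard library (`strtoul` with base 8 and the `%lo` format); the wrapper names `parse_octal` and `format_octal` are assumptions of this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parses a string of octal digits, e.g. "277" is 191 in decimal. */
unsigned long parse_octal(const char *digits) {
    return strtoul(digits, NULL, 8);
}

/* Formats a value in octal into out, e.g. 410 in decimal becomes "632". */
void format_octal(unsigned long value, char *out, size_t size) {
    snprintf(out, size, "%lo", value);
}
```

For instance, formatting `parse_octal("277") + parse_octal("333")` gives 632, and `parse_octal("234") / parse_octal("4")` formats as 47.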
2.3.3 Shorthand for binary
Binary: 1011 1011 1100
Groups of three bits: 101 110 111 100
Octal: 5 6 7 4
Groups of four bits: 1011 1011 1100
Hexadecimal: B B C
Counting from 0 to 1111 in binary gives: 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
Binary is long to write, so each group of four bits is replaced by a simpler symbol: 0 1 2 3 4 5 6 7 8 9 A B C D E F. That is hexadecimal.
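The grouping can be verified by parsing the bit string and comparing it with the octal and hexadecimal spellings of the same value. A minimal sketch; `parse_binary` is a helper written for this example.

```c
#include <stdint.h>

/* Parses a string of '0'/'1' characters into a value; other characters
 * such as spaces are ignored, so grouped input like "1011 1011" works. */
uint32_t parse_binary(const char *bits) {
    uint32_t value = 0;
    for (; *bits != '\0'; bits++) {
        if (*bits == '0' || *bits == '1') {
            value = (value << 1) | (uint32_t)(*bits - '0');
        }
    }
    return value;
}
```

`parse_binary("1011 1011 1100")` equals both the C hexadecimal literal `0xBBC` and the C octal literal `05674`, matching the two groupings above.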
2.4 Data width
Numbers in mathematics have no size limit and can be infinitely large. In a computer, however, hardware constraints limit the length of data (we call this the data width), and anything beyond the maximum width is discarded. Example 👇
#import <UIKit/UIKit.h>
#import "AppDelegate.h"

int test() {
    // 0x1FFFFFFFF needs 33 bits, so it does not fit in a 32-bit int
    int cTemp = 0x1FFFFFFFF;
    return cTemp;
}

int main(int argc, char * argv[]) {
    printf("%x\n", test());
    @autoreleasepool {
        return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
    }
}
Debugging at a breakpoint shows that cTemp has overflowed.
You can also take the address obtained in the debugger and enter it under Debug -> Debug Workflow -> View Memory to inspect the memory 👇
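What "discarded" means here can be shown without a debugger. The sketch below does the truncation explicitly through unsigned types, where C defines the result, rather than relying on the implementation-defined conversion in the sample above; the function names are made up for the illustration.

```c
#include <stdint.h>

/* Keeps only the low 32 bits of a 64-bit value, as a 32-bit store would. */
uint32_t truncate_to_32_bits(uint64_t value) {
    return (uint32_t)value;   /* well defined: reduction modulo 2^32 */
}

/* Reinterprets a 32-bit pattern as a signed two's-complement number. */
int64_t as_signed_32(uint32_t bits) {
    return (bits & 0x80000000u) ? (int64_t)bits - 4294967296LL : (int64_t)bits;
}
```

`truncate_to_32_bits(0x1FFFFFFFF)` returns `0xFFFFFFFF`: the 33rd bit is lost. Read as a signed 32-bit number, that pattern is -1.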
2.4.1 Common data widths in computers
- Bit: a bit is one binary digit, 0 or 1
- Byte: a byte consists of 8 bits and is the smallest unit of memory
- Word: a word consists of two bytes (16 bits), called the high byte and the low byte
- Doubleword: a doubleword consists of two words (32 bits)
Data stored in a computer is divided into signed and unsigned numbers. The figure below helps to explain this 👇
- Unsigned numbers are converted directly
- Signed numbers put the sign in the highest bit: 0 means positive and 1 means negative:
Positive numbers: 0 1 2 3 4 5 6 7
Negative numbers: F E D C B A 9 8
Represent: -1 -2 -3 -4 -5 -6 -7 -8
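For a 4-bit value, the signed and unsigned readings of the same bit pattern can be written as two tiny functions. A sketch under the usual two's-complement convention; the function names are chosen for this example.

```c
/* Interprets the low 4 bits as a signed two's-complement value (-8..7). */
int signed_nibble(unsigned nibble) {
    nibble &= 0xFu;
    return (nibble & 0x8u) ? (int)nibble - 16 : (int)nibble;
}

/* Interprets the low 4 bits as an unsigned value (0..15). */
unsigned unsigned_nibble(unsigned nibble) {
    return nibble & 0xFu;
}
```

The pattern F reads as 15 when unsigned and as -1 when signed, and 8 reads as 8 when unsigned and as -8 when signed, while 0 through 7 read the same either way.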
Practice
- A decimal system uses the 10 symbols 2, 9, 1, 7, 6, 5, 4, 8, 3, A and carries one at every ten. What is 123 + 234 = ____
1A6
We can write out our custom decimal symbols and then look the values up in the table, carrying one at every ten 👇
Decimal: 0 1 2 3 4 5 6 7 8 9
Custom:  2 9 1 7 6 5 4 8 3 A
92 99 91 97 96 95 94 98 93 9A
12 19 11 17 16 15 14 18 13 1A
72 79 71 77 76 75 74 78 73 7A
62 69 61 67 66 65 64 68 63 6A
52 59 51 57 56 55 54 58 53 5A
42 49 41 47 46 45 44 48 43 4A
82 89 81 87 86 85 84 88 83 8A
32 39 31 37 36 35 34 38 33 3A
A2 A9 A1 A7 A6 A5 A4 A8 A3 AA
922
The answer can be found by converting against the regular decimal notation.
- A base-9 system uses the 9 symbols 2, 9, 1, 7, 6, 5, 4, 8, 3
9926
Similarly 👇
Decimal: 0 1 2 3 4 5 6 7 8
Custom:  2 9 1 7 6 5 4 8 3
92 99 91 97 96 95 94 98 93
12 19 11 17 16 15 14 18 13
72 79 71 77 76 75 74 78 73
62 69 61 67 66 65 64 68 63
52 59 51 57 56 55 54 58 53
42 49 41 47 46 45 44 48 43
82 89 81 87 86 85 84 88 83
32 39 31 37 36 35 34 38 33
922
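The table lookup can also be automated. The sketch below treats the symbol table as a string whose length is the base and whose character positions are the digit values; `decode_custom` and `encode_custom` are hypothetical helpers written for this illustration, and they take the roundabout route through ordinary integers that the article warns against, purely as a way to check hand calculations.

```c
#include <stddef.h>
#include <string.h>

/* Decodes text written with a custom symbol table; the base is the table
 * length. Returns -1 if a character is not in the table. */
long decode_custom(const char *text, const char *symbols) {
    long base = (long)strlen(symbols);
    long value = 0;
    for (; *text != '\0'; text++) {
        const char *pos = strchr(symbols, *text);
        if (pos == NULL) {
            return -1;
        }
        value = value * base + (long)(pos - symbols);
    }
    return value;
}

/* Encodes a non-negative value with the custom table into out
 * (NUL-terminated). Returns 0 on success, -1 if out is too small. */
int encode_custom(long value, const char *symbols, char *out, size_t size) {
    long base = (long)strlen(symbols);
    char reversed[64];
    size_t n = 0;
    do {
        reversed[n++] = symbols[value % base];   /* lowest digit first */
        value /= base;
    } while (value > 0 && n < sizeof reversed);
    if (value > 0 || n + 1 > size) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = reversed[n - 1 - i];
    }
    out[n] = '\0';
    return 0;
}
```

With the ten-symbol table `"291765483A"`, decoding 123 and 234, adding them and encoding the sum yields 1A6, and encoding one hundred yields 922, the last entry of the ten-symbol table above.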
2.5 CPU & registers
The internal components are connected by a bus 👇
A CPU contains a controller, an arithmetic unit and registers. Registers provide temporary storage for data.
What is a register? What does it do? 👇
The CPU computes very quickly. For performance, the CPU sets aside a small temporary storage area inside itself and copies data from memory into it before operating on the data. We call this small temporary storage area a register.
On ARM64 CPUs, a register whose name begins with X is a 64-bit register, and a register whose name begins with W is a 32-bit register. No 16-bit or 8-bit registers are provided. A 32-bit register is the lower 32 bits of the corresponding 64-bit register and does not exist independently. Note the following 2 points 👇
- For programmers, the most important parts of the CPU are the registers: the CPU is controlled by changing the contents of registers
- The number and structure of registers differ between CPUs
2.5.1 Floating point and vector registers
Because floating-point numbers are stored and computed in a special way, the CPU provides floating-point registers to handle them
- Floating-point registers: 64-bit D0-D31, 32-bit S0-S31
Current CPUs support vector operations (which are used heavily in graphics processing), and vector registers are provided to support them.
- Vector registers: 128-bit V0-V31
2.5.2 General-purpose registers
- General-purpose registers are also called data/address registers. They are usually used for temporary storage during data calculations, accumulation, counting, holding addresses and so on. They are defined mainly to hold the operands of CPU instructions and are used in the CPU like ordinary variables.
- ARM64 has 31 64-bit general-purpose registers, X0 to X30, as well as XZR (the zero register). Some of these general-purpose registers also have specific purposes.
- W0 through W30 are 32 bits wide. Because a 64-bit CPU is compatible with 32-bit code, 32-bit code uses only the lower 32 bits of the 64-bit registers.
- For example, w0 is the lower 32 bits of x0!
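The relationship between w0 and x0 can be modelled with a 64-bit integer. This is a software model only, with invented names (`Register64`, `read_w`, `write_w`); it also reflects the ARM64 rule that writing a W register clears the upper 32 bits of the corresponding X register.

```c
#include <stdint.h>

/* A 64-bit general-purpose register such as x0. */
typedef struct {
    uint64_t x;
} Register64;

/* Reading w0: the lower 32 bits of x0. */
uint32_t read_w(const Register64 *reg) {
    return (uint32_t)(reg->x & 0xFFFFFFFFu);
}

/* Writing w0: on ARM64 the upper 32 bits of x0 are cleared to zero. */
void write_w(Register64 *reg, uint32_t value) {
    reg->x = (uint64_t)value;
}
```

If x0 holds `0x1122334455667788`, `read_w` returns `0x55667788`. After `write_w(&reg, 0xAABBCCDD)`, x0 holds `0x00000000AABBCCDD`.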
Note: readers who know 8086 assembly will remember the special segment registers CS, DS, SS and ES, which hold the base addresses of the segments. They belong to Intel-architecture CPUs and do not exist on ARM.
Typically, the CPU first loads data from memory into a general-purpose register and then operates on the data in that register. Consider the following example 👇 Suppose a red memory location holds the value 3, and we want to add 1 to it and store the result in a blue memory location 👇
- The CPU first loads the value of the red memory location into the X0 register:
mov X0, red memory location
- Then it adds 1 to register X0:
add X0, 1
- Finally, it stores the value to the blue memory location:
mov blue memory location, X0
These three lines are pseudo-instructions; real ARM64 code reads and writes memory with ldr and str rather than mov.
2.5.3 PC Register (Program Counter)
- The PC register is also called the instruction pointer register. It holds the address of the instruction the CPU is currently reading
- Instructions and data in memory or on disk are no different: both are binary information
- When the CPU works, it treats some information as instructions and some as data, giving different meanings to the same kind of information
  - For example, 1110 0000 0000 0011 0000 1000 1010 1010
  - It can be regarded as the data 0xE00308AA
  - It can also be regarded as the instruction mov x0, x8
- On what basis does the CPU interpret information in memory as instructions?
  - The CPU treats the contents of the memory unit the PC points to as an instruction
  - If something in memory has ever been executed by the CPU, then the memory unit holding it must have been pointed to by the PC
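The "same bits, two meanings" point can be made concrete. ARM64 instructions are 32 bits wide and stored little-endian, so the bytes E0 03 08 AA in memory form the instruction word 0xAA0803E0, which is the encoding of mov x0, x8 (the assembler treats it as an alias of orr x0, xzr, x8). In the sketch below, `load_le32` and `encode_mov_x` are helper names invented for this example.

```c
#include <stdint.h>

/* Assembles four bytes, lowest address first, into a little-endian
 * 32-bit word. */
uint32_t load_le32(const uint8_t bytes[4]) {
    return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}

/* Encodes "mov xd, xm", i.e. "orr xd, xzr, xm":
 * fixed bits 0xAA0003E0, source register in bits 16-20,
 * destination register in bits 0-4. */
uint32_t encode_mov_x(unsigned rd, unsigned rm) {
    return 0xAA0003E0u | ((uint32_t)(rm & 31u) << 16) | (uint32_t)(rd & 31u);
}
```

Here `load_le32` applied to the bytes E0 03 08 AA and `encode_mov_x(0, 8)` both give `0xAA0803E0`. Whether those four bytes are "data" or "the instruction mov x0, x8" depends only on whether the PC ever points at them.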
Case demonstration
The example below demonstrates reading and writing the PC register, reusing the overflow example above 👇
Note: debug on a real device!
#import <UIKit/UIKit.h>
#import "AppDelegate.h"

int test() {
    int cTemp = 0x1FFFFFFFF;
    return cTemp;
}

int main(int argc, char * argv[]) {
    printf("%x\n", test());
    @autoreleasepool {
        return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
    }
}
Run it. The registers in the demo are as follows: 👇
Then let's look at the assembly code 👇
PC register debugging
Next, let's try debugging the PC register. First print the value of the PC register in the console with the command 👇
register read pc
The PC register currently holds the address 0x0000000100AC9520. Hold control and click Step into to advance one instruction, then print again
The PC register now holds 0x0000000100AC9524. Continue 👇
The PC register now holds 0x0000000100AC9528.
So one instruction occupies 4 bytes of memory.
Write
Besides reading the PC register, you can of course also write to it.
First, stop at a breakpoint on the first line 👇
Then enter the write command 👇
register write pc 0x10260151c
In the figure above, the PC can no longer be read with register read because execution is stopped at the breakpoint. If we step into now, where will execution stop? 👇
Verification shows that execution stops on the line after 0x10260151c. This indicates that the instruction at address 0x10260151c, the address we wrote into the PC register, has been executed and the PC has moved on to the next instruction, so the instruction at 0x102601520 has not been executed yet.
Cache
The A11 ARM processor in the iPhone X has a 64KB level 1 cache and an 8MB level 2 cache.
Before executing an instruction, the CPU reads the instruction from memory into the CPU. Registers are much faster than memory reads and writes, so for performance the CPU also integrates a cache. When a program runs, the code and data to be executed are first copied into the cache (this happens automatically, not in the program itself), and the CPU reads instructions from the cache to execute them.
2.5.4 bl instruction
- Where the CPU fetches instructions from is determined by the contents of the PC, so we can control which instructions the CPU executes by changing the contents of the PC
- ARM64 provides the mov instruction (a data transfer instruction), which can be used to modify the value of most registers, for example
  - mov x0,#10, mov x1,#20
- However, the mov instruction cannot be used to set the value of the PC; ARM64 provides no such capability
- ARM64 provides other instructions to modify the value of the PC, collectively called branch instructions. The simplest one is the bl instruction
bl instruction practice
Here are two pieces of code! Assuming the program starts executing at _A, write down the order in which the instructions are executed. What is the final value of register x0?
_A:
mov x0,#0xa0
mov x1,#0x00
add x1, x0, #0x14
mov x0,x1
bl _B
mov x0,#0x0
ret
_B:
add x0, x0, #0x10
ret
Let's go straight to a real device. First add the assembly code above to the project 👉 Cmd+N --> Empty --> asm.s (.s is the extension for assembly source files) 👇
Next, write the assembly code 👇
.text
.global _A,_B
_A:
mov x0,#0xa0
mov x1,#0x00
add x1, x0, #0x14
mov x0,x1
bl _B
mov x0,#0x0
ret
_B:
add x0, x0, #0x10
ret
Note: the two lines .text and .global _A,_B must be declared.
Then 👉 how do we get the assembly code to run?
- Wherever you want to call it, first declare the function 👇 (for example in the VC, the view controller).
Build with Cmd+B, and it succeeds!
- Add a breakpoint where A() is executed and run the program with assembly debugging enabled 👇
As the figure above shows, the assembly is the code of the _A function we wrote earlier! 👏 👏 👏
Debugging the assembly
Next, start debugging with LLDB. The steps below lead up to the instruction that sets x0 to 0x0 👇
First step to the next instruction and check the register values 👇
Then continue 👇
And again 👇
Enter function B 👇 (hold the control key and click Step into to enter function B)
Continue stepping 👇
Stepping further returns to function A 👇
If you keep stepping, you will find an infinite loop between the last two instructions! Why? See the next article!
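If you want to trace the exercise without a device, the nine instructions can be simulated in a few lines of C. This is a toy model, not how the CPU or LLDB works internally: instructions are numbered by position instead of by address, `lr` stands for the link register x30, and all names are made up for the sketch. It relies on two ARM64 facts: bl stores the address of the following instruction in x30, and ret branches to whatever x30 holds.

```c
#include <stdint.h>

/* The nine instructions of _A and _B, numbered by position. */
typedef enum {
    A_MOV_X0_A0,   /* 0: mov x0, #0xa0     */
    A_MOV_X1_0,    /* 1: mov x1, #0x00     */
    A_ADD_X1,      /* 2: add x1, x0, #0x14 */
    A_MOV_X0_X1,   /* 3: mov x0, x1        */
    A_BL_B,        /* 4: bl _B             */
    A_MOV_X0_0,    /* 5: mov x0, #0x0      */
    A_RET,         /* 6: ret               */
    B_ADD_X0,      /* 7: add x0, x0, #0x10 */
    B_RET          /* 8: ret               */
} Instruction;

typedef struct {
    uint64_t x0, x1;
    int pc;   /* index of the next instruction to execute */
    int lr;   /* x30: the index that ret branches to */
} Cpu;

/* Executes the instruction at cpu->pc and updates pc. */
void step(Cpu *cpu) {
    int next = cpu->pc + 1;
    switch ((Instruction)cpu->pc) {
    case A_MOV_X0_A0: cpu->x0 = 0xa0; break;
    case A_MOV_X1_0:  cpu->x1 = 0x00; break;
    case A_ADD_X1:    cpu->x1 = cpu->x0 + 0x14; break;
    case A_MOV_X0_X1: cpu->x0 = cpu->x1; break;
    case A_BL_B:      cpu->lr = A_MOV_X0_0;   /* bl saves the return point */
                      next = B_ADD_X0; break;
    case A_MOV_X0_0:  cpu->x0 = 0x0; break;
    case A_RET:       next = cpu->lr; break;  /* lr still holds index 5 */
    case B_ADD_X0:    cpu->x0 = cpu->x0 + 0x10; break;
    case B_RET:       next = cpu->lr; break;
    }
    cpu->pc = next;
}

/* Runs from the start of _A for a fixed number of steps.
 * lr starts at -1, standing in for the caller's return address. */
Cpu run(int steps) {
    Cpu cpu = {0, 0, A_MOV_X0_A0, -1};
    for (int i = 0; i < steps; i++) {
        step(&cpu);
    }
    return cpu;
}
```

In this model x0 reaches 0xc4 inside _B and becomes 0x0 after the return to _A. From then on the ret in _A keeps jumping back to `mov x0, #0x0`, because bl overwrote the return address that _A itself was called with. That matches the loop between the last two instructions seen in the debugger.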
Conclusion
This article began with the history and characteristics of assembly language, then explored hardware such as the CPU and memory, which exchange data over the bus. The final examples focused on registers: reading and writing the PC register with LLDB commands and debugging the bl instruction. I hope you will try the steps yourself to deepen your understanding. Thanks for reading!