Preface
This article focuses on:
- The status register
- Judgment, selection, and loops
I. Status Register (CPSR)
What is a status register?
Inside the CPU there is a special kind of register (its number and structure vary from processor to processor). In ARM this register is called the Current Program Status Register (CPSR).
How it differs from other registers:
- Other registers store data; the whole register carries a single meaning.
- The CPSR works bit by bit. In other words, each bit has its own meaning and records a specific piece of information.
Field layout
The CPSR is 32 bits wide, and its layout is roughly as follows:
- The low eight bits of the CPSR (including I, F, T and M) are called the control bits. A program cannot modify them unless the CPU is running in a privileged mode.
- Bits 8 to 27 are reserved.
- N, Z, C and V are the condition code flag bits. Their contents are changed by the results of arithmetic or logical operations, and they can determine whether an instruction is executed. They are very important.
[Figure: overall bit layout of the CPSR]
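To make the layout concrete, here is a minimal C sketch that extracts the four condition flags from a 32-bit CPSR value. The `Flags` struct and the `decode_flags` helper are hypothetical names introduced only for illustration; the bit positions are the ones described in the layout.

```c
#include <stdint.h>

/* Hypothetical container for the four condition flags. */
typedef struct {
    unsigned n, z, c, v;
} Flags;

/* Extract N, Z, C and V from bits 31, 30, 29 and 28 of a CPSR value. */
static Flags decode_flags(uint32_t cpsr) {
    Flags f;
    f.n = (cpsr >> 31) & 1u;  /* bit 31: negative */
    f.z = (cpsr >> 30) & 1u;  /* bit 30: zero */
    f.c = (cpsr >> 29) & 1u;  /* bit 29: carry */
    f.v = (cpsr >> 28) & 1u;  /* bit 28: overflow */
    return f;
}
```

For example, `decode_flags(0x60000000)` gives N = 0, Z = 1, C = 1, V = 0, because the top four bits are 0110.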
Example: viewing the CPSR
Next, let's look at the value of the CPSR in the console through a simple example:
void funcA() {
    int a = 1;
    int b = 2;
    if (a == b) {
        printf("a == b");
    } else {
        printf("error");
    }
}
View the disassembly, then use LLDB to change the value of the CPSR.
As you can see, by changing the value of the CPSR we forcibly change the execution logic of the code, so that it ends up executing printf("a == b");.
Inline assembly
To write assembly code in an OC file, besides creating a new .s assembly file as described in 01- Assembly Basics (1), there is another way: inline assembly.
Embedding assembly in C/OC code requires the asm keyword (__asm__ and __asm can also be used). Which spelling is accepted depends on the compiler; on iOS they are equivalent. Inside asm, the code list, output operand list, input operand list, and changed resource list are separated by three ":" characters:
asm(code list : output operand list : input operand list : changed resource list);
There seems to be no way to write inline assembly directly in Swift, but it can be handled by bridging to OC.
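As an illustration of the four lists, here is a minimal sketch using the GCC/Clang extended asm syntax. The `add_one` helper is a hypothetical name, and the non-ARM64 fallback branch is an assumption added so the sketch also builds on other machines.

```c
#include <stdint.h>

/* Hypothetical helper: returns x + 1, using inline assembly on ARM64. */
static uint32_t add_one(uint32_t x) {
#if defined(__aarch64__)
    uint32_t result;
    __asm__(
        "adds %w0, %w1, #1\n"  /* code list: result = x + 1, updating the flags */
        : "=r"(result)         /* output operand list: %0 is written */
        : "r"(x)               /* input operand list: %1 is read */
        : "cc"                 /* changed resource list: the condition flags */
    );
    return result;
#else
    /* Portable fallback (assumption): same result without inline assembly. */
    return x + 1u;
#endif
}
```

The `%w0` and `%w1` operand modifiers select the 32-bit w view of the registers the compiler picks, so no register has to be hard-coded.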
Let's take a look at the four highest bits of the CPSR (N, Z, C and V) and what each one means.
1.1 N (Negative, sign flag bit)
Bit 31 of the CPSR is N, the sign flag bit. It records whether the result is negative after the relevant instruction executes.
- If the result is negative, N = 1
- If the result is non-negative, N = 0
Example
Next we execute a simple piece of assembly to look at the value of the N flag:
void funcA() {
asm(
"mov w0,#0xffffffff\n"
"adds w0,w0,#0x0\n"
);
}
- Before the adds instruction executes, the CPSR is 0x60000000. The top four bits are 0110, so N = 0.
- After the adds instruction executes, the CPSR is 0x80000000. The top four bits are 1000, so N = 1.
Note: in the ARM64 instruction set, only some instructions update the status register, such as adds, subs and ands. They are mostly arithmetic or logical instructions.
1.2 Z (Zero, zero flag bit)
Bit 30 of the CPSR is Z, the zero flag bit. It records whether the result is 0 after the relevant instruction executes.
- If the result is 0, Z = 1
- If the result is not 0, Z = 0
Note that the result and the value of Z are opposite!
We can understand it this way:
- In a computer, 1 means logical true and 0 means logical false.
- When the result is 0, the condition "the result is 0" is true, so Z = 1.
- When the result is not 0, the condition "the result is 0" is false, so Z = 0.
Example
Take a look at the following example:
void funcA() {
asm(
"mov w0,#0x0\n"
"adds w0,w0,#0x0\n"
);
}
- Before adds executes, the CPSR value is 0x60000000, where Z = 1.
- After adds executes, the CPSR value is 0x40000000, where Z is still 1 because 0 + 0 = 0.
Now modify the sample code:
void funcA() {
asm(
"mov w0,#0x0\n"
"adds w0,w0,#0x1\n"
);
}
You can observe how the CPSR value changes in this case.
Stopping at breakpoints before and after the adds as before, the CPSR is 0x60000000 and then 0x00000000, which correspond to Z = 1 and Z = 0.
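The N and Z rules can be checked without a debugger. The following is a minimal C sketch that models a 32-bit adds in portable code; the helper names `n_flag_after_adds` and `z_flag_after_adds` are hypothetical, and the model covers only these two flags.

```c
#include <stdint.h>

/* N flag after a 32-bit adds: the top (sign) bit of the result. */
static unsigned n_flag_after_adds(uint32_t a, uint32_t b) {
    uint32_t result = a + b;  /* wraps modulo 2^32, like the w registers */
    return result >> 31;
}

/* Z flag after a 32-bit adds: 1 when the result is zero, otherwise 0. */
static unsigned z_flag_after_adds(uint32_t a, uint32_t b) {
    uint32_t result = a + b;
    return result == 0u;
}
```

With the operands from the examples, `n_flag_after_adds(0xffffffff, 0)` is 1, `z_flag_after_adds(0, 0)` is 1, and `z_flag_after_adds(0, 1)` is 0.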
1.3 C (Carry, carry flag bit)
Bit 29 of the CPSR is C, the carry flag bit. It is generally relevant to unsigned arithmetic.
- Addition: when the operation produces a carry (unsigned overflow), C = 1; otherwise C = 0.
- Subtraction (including cmp): when the operation produces a borrow (unsigned overflow), C = 0; otherwise C = 1.
For an unsigned number that is N bits wide, the highest bit of its binary representation, bit N-1, is its most significant bit, while the imaginary bit N is the next higher bit relative to the most significant bit.
[Figure: the most significant bit (bit N-1) and the imaginary higher bit (bit N)]
Carry & borrow
We have mentioned carry and borrow, so let's explain them.
First, the carry case. When two values are added, a carry from the most significant bit into a higher bit may be produced.
For example, adding two 32-bit values, 0xaaaaaaaa + 0xaaaaaaaa, produces a carry. Since the carry cannot be stored in 32 bits, we loosely say that it is lost. In fact the CPU does not discard the carry; it records it in a special register. ARM uses the C bit to record it. For example, consider the following instructions:
void funcA() {
    asm(
        "mov w0,#0xaaaaaaaa\n"  // 0xa in binary is 1010
        "adds w0,w0,w0\n"       // equivalent to 1010 << 1: carry 1 (unsigned overflow), so C = 1
        "adds w0,w0,w0\n"       // equivalent to 0101 << 1: carry 0 (no unsigned overflow), so C = 0
        "adds w0,w0,w0\n"       // repeats the operations above
        "adds w0,w0,w0\n"
    );
}
- Before the first adds executes: CPSR = 0x60000000, top four bits 0110
- After the first adds: CPSR = 0x30000000, top four bits 0011
- After the second adds: CPSR = 0x90000000, top four bits 1001
- After the third adds: CPSR = 0x30000000, top four bits 0011
- After the fourth adds: CPSR = 0x90000000, top four bits 1001
In conclusion, the first and third adds cause an unsigned overflow, so C = 1; the second and fourth adds do not, so C = 0.
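The sequence of CPSR values can be reproduced with a small software model. This is a minimal sketch, assuming only the flag rules described in this article; `AddsResult` and `adds32` are hypothetical names, and the function models a 32-bit adds rather than calling any real API.

```c
#include <stdint.h>

/* Result of a simulated 32-bit adds: the sum plus the flags in bits 31..28. */
typedef struct {
    uint32_t result;
    uint32_t nzcv;
} AddsResult;

static AddsResult adds32(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a + (uint64_t)b;  /* keeps the imaginary bit 32 */
    uint32_t r = (uint32_t)wide;
    uint32_t n = r >> 31;                       /* sign bit of the result */
    uint32_t z = (r == 0u);                     /* result is zero */
    uint32_t c = (uint32_t)(wide >> 32);        /* carry out of bit 31 */
    uint32_t v = (~(a ^ b) & (a ^ r)) >> 31;    /* same-sign operands, different-sign result */
    AddsResult out = { r, (n << 31) | (z << 30) | (c << 29) | (v << 28) };
    return out;
}
```

Starting from 0xaaaaaaaa and feeding each result back in as both operands, the four calls produce the flag values 0x30000000, 0x90000000, 0x30000000 and 0x90000000, matching the debugger output above.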
Now the borrow case. When two values are subtracted, a borrow from a higher bit may be needed.
For example, subtracting two 32-bit values, 0x00000000 - 0x000000ff, produces a borrow. It is equivalent to computing 0x100000000 - 0x000000ff, which gives 0xffffff01. Because one bit was borrowed, the C bit records the borrow, so C = 0. For example, consider the following instructions:
void funcA() {
asm(
"mov w0,#0x0\n"
"subs w0,w0,#0xff\n"
"subs w0,w0,#0xff\n"
"subs w0,w0,#0xff\n"
);
}
Debugging works the same way as in the carry case, so I won't repeat the steps here. The CPSR changes as follows: 0x60000000 -> 0x80000000 -> 0xa0000000 -> 0xa0000000, and the top four bits are 0110 -> 1000 -> 1010 -> 1010. So the first subtraction borrows one bit (unsigned overflow) and C = 0; the second and third subtractions do not borrow (no unsigned overflow), so C = 1.
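The borrow case can be modelled the same way. The sketch below is an illustration under the flag rules stated in this article; `SubsResult` and `subs32` are hypothetical names.

```c
#include <stdint.h>

/* Result of a simulated 32-bit subs: the difference plus the flags in bits 31..28. */
typedef struct {
    uint32_t result;
    uint32_t nzcv;
} SubsResult;

static SubsResult subs32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;                       /* wraps modulo 2^32 on borrow */
    uint32_t n = r >> 31;                     /* sign bit of the result */
    uint32_t z = (r == 0u);                   /* result is zero */
    uint32_t c = (a >= b);                    /* no borrow needed -> C = 1 */
    uint32_t v = ((a ^ b) & (a ^ r)) >> 31;   /* signed overflow of a - b */
    SubsResult out = { r, (n << 31) | (z << 30) | (c << 29) | (v << 28) };
    return out;
}
```

Calling `subs32(0, 0xff)` gives the result 0xffffff01 with flags 0x80000000 (borrow, C = 0); subtracting 0xff again from that result gives flags 0xa0000000 (no borrow, C = 1).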
1.4 V (Overflow, overflow flag bit)
Bit 28 of the CPSR is V, the overflow flag bit. When a signed operation produces a result outside the range the machine can represent, it is called an overflow.
- Positive + positive giving a negative number is an overflow
- Negative + negative giving a positive number is an overflow
- Positive + negative cannot overflow
- On overflow V = 1; with no overflow V = 0
Since the CPU does not know whether the operands are signed, it sets both C and V on every such operation: C reports unsigned overflow and V reports signed overflow. Both flag bits are produced at the same time.
This is easy to understand, so I won't work through an example here.
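For readers who would still like to see the two flags side by side, here is a minimal C sketch of how C and V are derived from the same addition. The helper names `carry_after_add` and `overflow_after_add` are hypothetical.

```c
#include <stdint.h>

/* C flag: unsigned overflow, i.e. the 32-bit sum wrapped around. */
static unsigned carry_after_add(uint32_t a, uint32_t b) {
    return (uint32_t)(a + b) < a;
}

/* V flag: signed overflow, i.e. both operands share a sign
   and the sign of the result differs from it. */
static unsigned overflow_after_add(uint32_t a, uint32_t b) {
    uint32_t r = a + b;
    return (~(a ^ b) & (a ^ r)) >> 31;
}
```

The two flags are independent: 0x7fffffff + 1 sets V but not C (positive + positive gives a negative number), while 0xffffffff + 1 sets C but not V (as signed numbers this is -1 + 1 = 0).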
II. Judgment, Selection, and Loops
Before we talk about judgment, selection, and loops, let's look at the five major regions of memory:
- Stack: parameters, local variables and temporary data. Readable and writable.
- Heap: dynamically allocated memory. Readable and writable.
- Global/static region: readable and writable.
- Constant region: read-only.
- Code region: stores code. Readable and executable.
For details, please refer to the article on the five major memory regions that I wrote earlier.
2.1 Basic Knowledge
Global variables and constants
How are global variables and constants read in assembly? Let's start with the following example:
int g = 12;
int func(int a, int b) {
printf("test");
int c = a + g + b;
return c;
}
View the disassembly. The value of the x0 register shows that the argument to printf comes from these two instructions:
0x102c6dc54 <+20>: adrp x0, 1
0x102c6dc58 <+24>: add x0, x0, #0x5ec ; =0x5ec
x0 holds an address in the string constant region. How do these two instructions compute the value 0x0000000102c6e5ec?
adrp (Address Page) addresses memory by page.
- 0x102c6dc54 <+20>: adrp x0, 1 locates the start of a page of data (the starting position within the file):
  - Shift the immediate 1 left by 12 bits, which gives 0x1000.
  - Clear the low 12 bits of the current PC value: 0x102c6dc54 -> 0x102c6d000.
  - 0x102c6d000 + 0x1000 gives 0x102c6e000. In hexadecimal terms, the low three digits of the PC are set to 0 and the immediate is added at the fourth digit; the result goes into x0.
- 0x102c6dc58 <+24>: add x0, x0, #0x5ec adds the offset within the page: 0x102c6e000 + 0x5ec gives 0x102c6e5ec.
This gives the address of the constant string "test".
In 0x102c6d000, the trailing 000 spans 0x000 to 0xfff, i.e. 0 to 4095, which is 4096 bytes or 4 KB. That is what locating the start of a page of data means.
- On macOS the page size is 4 KB.
- On iOS the page size is 16 KB, which remains compatible: 4 KB * 4 = 16 KB.
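The adrp + add arithmetic described above can be written out as a short C sketch. The function names `adrp_page` and `adrp_add` are hypothetical, and the sketch assumes a non-negative immediate as in this example.

```c
#include <stdint.h>

/* Page base computed by "adrp xN, imm" executed at address pc. */
static uint64_t adrp_page(uint64_t pc, uint64_t imm) {
    uint64_t pc_page = pc & ~(uint64_t)0xfff;  /* clear the low 12 bits of the PC */
    return pc_page + (imm << 12);              /* add imm pages of 4 KB each */
}

/* Final address after the following "add xN, xN, #offset". */
static uint64_t adrp_add(uint64_t pc, uint64_t imm, uint64_t offset) {
    return adrp_page(pc, imm) + offset;
}
```

With the values from the listing, `adrp_page(0x102c6dc54, 1)` is 0x102c6e000 and `adrp_add(0x102c6dc54, 1, 0x5ec)` is 0x102c6e5ec, the address of "test".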
Let's continue debugging and look at how the assembly handles the global variable g.
The corresponding instructions work the same way as above and end up computing x9 = 0x0000000102c715f0, which is the address of the global variable g.
In summary, global variables and constants are both located through a base address plus an offset.
Restoring code with a disassembly tool
Next, we use a disassembly tool to demonstrate restoring assembly code to high-level code.
- First, build the project to be restored, open the .app bundle, find the Mach-O file and drag it into Hopper.
- After Hopper finishes its analysis, search for the function to be analyzed.
- First look at the assembly code:
0000000100005c40 sub sp, sp, #0x20 ; CODE XREF=-[ViewController viewDidLoad]+76
0000000100005c44 stp x29, x30, [sp, #0x10]
0000000100005c48 add x29, sp, #0x10
0000000100005c4c stur w0, [x29, #-0x4]
0000000100005c50 str w1, [sp, #0x8]
0000000100005c54 adrp x0, #0x100006000 ; argument #1 for method imp___stubs__printf
0000000100005c58 add x0, x0, #0x5ec ; "test"
0000000100005c5c bl imp___stubs__printf
0000000100005c60 ldur w8, [x29, #-0x4]
0000000100005c64 adrp x9, #0x100009000
0000000100005c68 add x9, x9, #0x5f0 ; _g
0000000100005c6c ldr w10, [x9] ; _g
0000000100005c70 add w8, w8, w10
0000000100005c74 ldr w10, [sp, #0x8]
0000000100005c78 add w8, w8, w10
0000000100005c7c str w8, [sp, #0x4]
0000000100005c80 ldr w8, [sp, #0x4]
0000000100005c84 mov x0, x8
0000000100005c88 ldp x29, x30, [sp, #0x10]
0000000100005c8c add sp, sp, #0x20
0000000100005c90 ret
The "test" operand above resolves to 0x100006000 + 0x5ec = 0x1000065ec.
- You can look up 0x1000065ec in MachOView.
- Similarly, look at the global variable g:
0000000100005c64 adrp x9, #0x100009000
0000000100005c68 add x9, x9, #0x5f0 ; _g
The address of the global variable g is 0x1000095f0.
- Next, we annotate the assembly above to restore it:
0000000100005c40 sub sp, sp, #0x20 ; CODE XREF=-[ViewController viewDidLoad]+76
0000000100005c44 stp x29, x30, [sp, #0x10]
0000000100005c48 add x29, sp, #0x10
// save the parameters w0 and w1
0000000100005c4c stur w0, [x29, #-0x4]
0000000100005c50 str w1, [sp, #0x8]
// take the constant "test"
0000000100005c54 adrp x0, #0x100006000 ; argument #1 for method imp___stubs__printf
0000000100005c58 add x0, x0, #0x5ec ; "test"
// call printf("test")
0000000100005c5c bl imp___stubs__printf
// w8 = a
0000000100005c60 ldur w8, [x29, #-0x4]
// take the global variable g
0000000100005c64 adrp x9, #0x100009000
0000000100005c68 add x9, x9, #0x5f0 ; _g
// w10 = g
0000000100005c6c ldr w10, [x9] ; _g
// w8 += w10
0000000100005c70 add w8, w8, w10
// w10 = b
0000000100005c74 ldr w10, [sp, #0x8]
// w8 += w10
0000000100005c78 add w8, w8, w10
0000000100005c7c str w8, [sp, #0x4]
0000000100005c80 ldr w8, [sp, #0x4]
0000000100005c84 mov x0, x8
0000000100005c88 ldp x29, x30, [sp, #0x10]
// restore the stack balance
0000000100005c8c add sp, sp, #0x20
0000000100005c90 ret
After this restoration, isn't it just the original source code? The result is exactly the same. You can work through this example yourself to deepen the impression.
2.2 Judgment
Next, let's look at how judgment logic is executed in assembly.
if
Take a look at the most familiar form, the if statement:
int g = 12;
void func(int a, int b) {
    if (a > b) {
        g = a;
    } else {
        g = b;
    }
}
We look at the assembly directly in Hopper, the disassembly tool used above.
Is the assembly hard to read? It is actually quite simple: a cmp comparison is performed, and then either code block 1 or code block 2 executes, which are the if block and the else block.
CMP instruction
CMP compares the contents of one register with the contents of another register or with an immediate value. It does not store the result; it only updates the flag bits (CPSR) accordingly. A CMP is generally followed by a jump once the comparison is done, usually a B instruction.
B jump instruction
B itself means jump; its variants perform other operations as well:
Instruction name | Instruction meaning |
---|---|
bl | Jumps to the label and updates the lr register with the return address. Used for function calls, so that the callee can return. |
br | Jumps to the address held in a register. |
b.gt | If the comparison result is greater than, jumps to the label; otherwise does not jump. |
b.ge | If the comparison result is greater than or equal to, jumps to the label; otherwise does not jump. |
b.lt | If the comparison result is less than, jumps to the label; otherwise does not jump. |
b.le | If the comparison result is less than or equal to, jumps to the label; otherwise does not jump. |
b.eq | If the comparison result is equal, jumps to the label; otherwise does not jump. |
b.ne | If the comparison result is not equal, jumps to the label; otherwise does not jump. |
b.hi | If the comparison result is unsigned greater than, jumps to the label; otherwise does not jump. |
b.hs | If the comparison result is unsigned greater than or equal to, jumps to the label; otherwise does not jump. |
b.lo | If the comparison result is unsigned less than, jumps to the label; otherwise does not jump. |
b.ls | If the comparison result is unsigned less than or equal to, jumps to the label; otherwise does not jump. |
Note: the conditional branch that follows cmp tests the else condition.
Back to the example assembly: the instruction at 0000000100005c7c is b.le loc_100005c94, so code block 1 is the if (greater than) case and code block 2 is the else case.
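How a conditional branch reads the flags left by cmp can be sketched in C. This is a minimal model, not compiler output: `Flags`, `cmp32` and the `cond_*` helpers are hypothetical names, and the flag tests follow the standard ARM condition code definitions.

```c
#include <stdint.h>

typedef struct {
    unsigned n, z, c, v;
} Flags;

/* Flags set by a 32-bit "cmp a, b": a subtraction whose result is discarded. */
static Flags cmp32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.n = r >> 31;                        /* result is negative */
    f.z = (r == 0u);                      /* operands are equal */
    f.c = (a >= b);                       /* no borrow */
    f.v = ((a ^ b) & (a ^ r)) >> 31;      /* signed overflow */
    return f;
}

/* One predicate per conditional branch in the table. */
static int cond_eq(Flags f) { return f.z; }
static int cond_ne(Flags f) { return !f.z; }
static int cond_gt(Flags f) { return !f.z && f.n == f.v; }
static int cond_ge(Flags f) { return f.n == f.v; }
static int cond_lt(Flags f) { return f.n != f.v; }
static int cond_le(Flags f) { return f.z || f.n != f.v; }
static int cond_hi(Flags f) { return f.c && !f.z; }
static int cond_hs(Flags f) { return f.c; }
static int cond_lo(Flags f) { return !f.c; }
static int cond_ls(Flags f) { return !f.c || f.z; }
```

For instance, after `cmp32(1, 2)` the b.le predicate holds and b.gt does not, which is why the b.le in the example skips the if block. After `cmp32(0xffffffff, 1)` both b.lt and b.hi hold: the same bits are -1 when read as signed and a very large number when read as unsigned.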
2.3 Loops
Next, we look at which instructions implement loop logic in assembly.
2.3.1 do-while
First look at the do-while loop:
void func() {
int nSum = 0;
int i = 0;
do {
nSum = nSum + 1;
i++;
} while (i < 100);
}
The assembly shown in Hopper is also very simple:
- Two variables are initialized. Their stack offsets are 0xc and 0x8, exactly 4 bytes apart (matching the int type).
- Then the do body of the loop executes.
- cmp is the while condition; with b.lt, if the condition holds, execution jumps back to the do body.
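The shape described above (body first, then cmp and a backward b.lt) can be mirrored in C with a label and goto. This is a minimal sketch for illustration, not the compiler's actual output; `sum_do_while` is a hypothetical name, and it returns nSum only so that the result can be inspected.

```c
/* do-while written the way the assembly is laid out. */
static int sum_do_while(void) {
    int nSum = 0;
    int i = 0;
loop_body:
    nSum = nSum + 1;      /* the do body always runs at least once */
    i++;
    if (i < 100) {        /* cmp i, #100 followed by b.lt loop_body */
        goto loop_body;
    }
    return nSum;
}
```

Because the condition is tested after the body, the branch uses the same condition as the source (`<` becomes b.lt) and jumps backward.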
2.3.2 while
Again, take a look at the example:
void func() {
    int nSum = 0;
    int i = 0;
    while (i < 100) {
        nSum = nSum + 1;
        i++;
    }
}
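For a while loop the condition is checked before the body, so the compiled code branches out of the loop on the inverted condition. The sketch below mirrors that layout with goto; it is an illustration rather than actual compiler output, `sum_while` is a hypothetical name, and it returns nSum only so the result can be checked.

```c
/* while loop written with the check first and an inverted exit branch. */
static int sum_while(void) {
    int nSum = 0;
    int i = 0;
loop_check:
    if (i >= 100) {       /* cmp i, #100 followed by b.ge loop_end */
        goto loop_end;
    }
    nSum = nSum + 1;
    i++;
    goto loop_check;      /* unconditional b back to the check */
loop_end:
    return nSum;
}
```

The source condition `i < 100` turns into its negation, `i >= 100`, because the branch decides when to leave the loop rather than when to stay in it.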
2.3.3 for
Finally, let's look at the most common loop, the for loop:
void func() {
    int nSum = 0;
    for (int i = 0; i < 100; i++) {
        nSum = nSum + 1;
    }
}
Note: in the assembly of both for and while, the condition is tested with b.ge.
2.4 Selection
Finally, let's look at which instructions implement selection logic in assembly.
switch
void func(int a) {
    switch (a) {
        case 1: printf("case 1"); break;
        case 2: printf("case 2"); break;
        case 3: printf("case 3"); break;
        default: printf("case default"); break;
    }
}
More than 3 cases
void func(int a) {
    switch (a) {
        case 1: printf("case 1"); break;
        case 2: printf("case 2"); break;
        case 3: printf("case 3"); break;
        case 4: printf("case 4"); break;
        default: printf("case default"); break;
    }
}
Looking at the assembly as analyzed in Hopper: apart from w8 -= 1, the rest is divided into code blocks. Let's take a closer look:
- Code block 1
0000000100005be0 mov x9, x8
0000000100005be4 ubfx x9, x9, #0x0, #0x20
0000000100005be8 cmp x9, #0x3
0000000100005bec str x9, [sp]
- mov x9, x8: copies register x8 into register x9; the value is the parameter.
- ubfx x9, x9, #0x0, #0x20: ubfx extracts a bit field and clears the remaining bits. Here it keeps 32 bits of x9 starting at bit 0 and clears the higher bits (0x0 is decimal 0, and 0x20 is decimal 32).
- cmp x9, #0x3: compares x9 with 0x3. Here 0x3 is the difference between the maximum case and the minimum case.
- str x9, [sp]: stores x9, i.e. the low 32 bits of x8, on the stack.
After that comes b.hi, which means unsigned greater than: if it holds, execution jumps straight to the default branch.
- Code block 2
0000000100005bf4 adrp x8, #0x100005000
0000000100005bf8 add x8, x8, #0xc64
The address stored in x8 is 0x100005c64; see the adrp instruction in 02- Assembly Basics (2).
- Code block 3
0000000100005bfc ldr x11, [sp]
0000000100005c00 ldrsw x10, [x8, x11, lsl #2]
0000000100005c04 add x9, x8, x10
0000000100005c08 br x9
- ldr x11, [sp]: loads the value saved on the stack into x11. This is the earlier x9, i.e. the low 32 bits of x8.
- ldrsw x10, [x8, x11, lsl #2]: lsl #2 shifts left by two bits, so the address is x8 + (x11 << 2); the signed 32-bit word stored there is loaded into x10.
- add x9, x8, x10: very simple, x9 = x8 + x10.
- br x9: jumps to the address held in register x9.
Example: computing x9
Now let's work through the value of x9, assuming the argument is 2, i.e. the call func(2):
- x9 was copied from x8 (the input value), so x9 is the low 32 bits of x8 and its value is 1 (subs w8, w8, #0x1 has already subtracted 1). So after ldr x11, [sp], x11 is 1.
- Then, for ldrsw x10, [x8, x11, lsl #2]: 1 (x11) << 2 = 4, and 4 + 0x100005c64 (the address in x8) = 0x100005c68. Looking up that address in code block 5, the value loaded into x10 is 0xffffffb8. As a signed decimal number, 0xffffffb8 is -72.
- Then, for add x9, x8, x10: x10 is -72, whose magnitude in hexadecimal is 0x48. So x8 + x10 = 0x100005c64 - 0x48 (adding a negative number is a subtraction) = 0x100005c1c, and x9 ends up as 0x100005c1c.
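The address arithmetic of this walk-through can be replayed in C. This is a minimal sketch: `kCaseOffsets` and `jump_target` are hypothetical names, and the four offsets are the signed values of the 32-bit words listed in code block 5 of this article.

```c
#include <stdint.h>

/* Signed offsets stored in the table:
   0xffffffa8, 0xffffffb8, 0xffffffc8, 0xffffffd8. */
static const int32_t kCaseOffsets[4] = { -88, -72, -56, -40 };

/* Mirrors "ldrsw x10, [x8, x11, lsl #2]" followed by "add x9, x8, x10". */
static uint64_t jump_target(uint64_t table_address, uint64_t index) {
    int64_t offset = kCaseOffsets[index];      /* ldrsw: load and sign-extend */
    return table_address + (uint64_t)offset;   /* add: table address + offset */
}
```

With the table at 0x100005c64, `jump_target(0x100005c64, 1)` is 0x100005c1c, the start of the case 2 code; indexes 0, 2 and 3 give 0x100005c0c, 0x100005c2c and 0x100005c3c.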
- Code block 4
0000000100005c0c adrp x0, #0x100006000
0000000100005c10 add x0, x0, #0x5c8
0000000100005c14 bl imp___stubs__printf
0000000100005c18 b _func+144
0000000100005c1c adrp x0, #0x100006000
0000000100005c20 add x0, x0, #0x5cf
0000000100005c24 bl imp___stubs__printf
0000000100005c28 b _func+144
0000000100005c2c adrp x0, #0x100006000
0000000100005c30 add x0, x0, #0x5d6
0000000100005c34 bl imp___stubs__printf
0000000100005c38 b _func+144
0000000100005c3c adrp x0, #0x100006000
0000000100005c40 add x0, x0, #0x5dd
0000000100005c44 bl imp___stubs__printf
0000000100005c48 b _func+144
Clearly, this code block executes the logic of each case. In the example above, the resulting value of x9 is 0x100005c1c, so execution jumps to the assembly of case 2.
- Code block 5
0000000100005c64 db 0xa8 ; '.', DATA XREF=_func+48
0000000100005c65 db 0xff ; '.'
0000000100005c66 db 0xff ; '.'
0000000100005c67 db 0xff ; '.'
0000000100005c68 db 0xb8 ; '.'
0000000100005c69 db 0xff ; '.'
0000000100005c6a db 0xff ; '.'
0000000100005c6b db 0xff ; '.'
0000000100005c6c db 0xc8 ; '.'
0000000100005c6d db 0xff ; '.'
0000000100005c6e db 0xff ; '.'
0000000100005c6f db 0xff ; '.'
0000000100005c70 db 0xd8 ; '.'
0000000100005c71 db 0xff ; '.'
0000000100005c72 db 0xff ; '.'
0000000100005c73 db 0xff ; '.'
This code block works like a table: you can look up the value stored at a given address.
Summary of the compiled switch logic
- First, parameter - minimum case gives the index into the table.
- The index is compared with (maximum case - minimum case) using an unsigned comparison to check whether it is in range.
  - Not in range: jump to default.
  - In range: read the entry at table base address + (index << 2).
- Jump to the corresponding case logic based on the offset read from the table.
Note: why not store the addresses directly in the table? 1. An address is too long. 2
switch summary
- When a switch statement has fewer than 3 branches, there is no need for a table structure; it is equivalent to if.
- When the differences between the case constants are large, the compiler weighs efficiency against memory and compiles the switch much like if-else. For example, 100, 200, 300, 400 compiles like if-else, while 10, 20, 30, 40 generates a table. So it is best to use consecutive values when writing switch logic. As for the exact rule, the compiler optimizes based on the number of cases and the differences between them: the more cases, the smaller the differences and the more consecutive the values, the more likely the compiler is to generate a jump table; otherwise it generates if-else.
- When there are many branches, a table is generated at compile time (each jump-table entry is four bytes).
- The number of entries in the jump table is maximum case - minimum case + 1, i.e. the number of possible values.
- The code addresses of the case branches are sequential; the approach trades space for time.
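The compiled switch logic summarized above (index, unsigned range check, table lookup) can be sketched in C for the four-case example. `kCaseNames` and `dispatch` are hypothetical names, and the sketch returns the string that would be printed instead of jumping to code.

```c
#include <stdint.h>

/* One table entry per possible value: maximum case - minimum case + 1 = 4. */
static const char *const kCaseNames[4] = { "case 1", "case 2", "case 3", "case 4" };

static const char *dispatch(int a) {
    uint32_t index = (uint32_t)a - 1u;  /* parameter - minimum case */
    if (index > 3u) {                   /* unsigned compare with max case - min case */
        return "case default";          /* out of range: default branch */
    }
    return kCaseNames[index];           /* in range: look up the table entry */
}
```

The unsigned comparison handles both ends with a single test: for a = 0 or any negative a, the subtraction wraps to a very large unsigned index, which is greater than 3 and therefore reaches the default branch.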
Conclusion
- The status (flag) register CPSR
  - In ARM64, the CPSR (32 bits) is the status register
  - The highest four bits (28, 29, 30, 31) are the flag bits: N and Z (execution result), C and V (unsigned/signed overflow)
  - N flag (negative flag bit)
    - Negative: N = 1; non-negative: N = 0
  - Z flag (zero flag bit)
    - Result is 0: Z = 1; result is not 0: Z = 0
  - C flag (unsigned overflow)
    - Addition: carry gives C = 1, otherwise C = 0
    - Subtraction: borrow gives C = 0, otherwise C = 1
  - V flag (signed overflow)
    - Positive + positive = negative: overflow, V = 1
    - Negative + negative = positive: overflow, V = 1
    - Positive + negative cannot overflow: V = 0
    - Overflow: V = 1; no overflow: V = 0
- Judgment, selection, and loops