Preface

This article focuses on:

  1. Status register
  2. Conditionals, loops, and selection

I. Status Register (CPSR)

What is a status register?

Inside the CPU there is a special kind of register (its number and structure may vary from processor to processor). In ARM, this register is called the Current Program Status Register (CPSR).

How it differs from other registers:

  • Other registers are used to store data; the whole register carries a single meaning.
  • The CPSR works bit by bit: each bit has its own meaning and records a specific piece of information.

Field layout

The CPSR register is 32 bits wide and its layout is roughly as follows:

  • The low eight bits of the CPSR (including I, F, T and M) are called the control bits. A program cannot modify them; only when the CPU is running in a privileged mode can a program change the control bits.
  • Bits 8 ~ 27 are reserved.
  • N, Z, C and V are the condition code flag bits. Their contents are changed by the results of arithmetic or logical operations, and they can determine whether an instruction is executed. This is important!
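
As a rough illustration of the layout above, the following C sketch pulls the four condition flags out of a 32-bit CPSR value. The names `Flags` and `decode_cpsr` are invented for this example; only the bit positions (31 down to 28) come from the text.

```c
#include <stdint.h>

/* Hypothetical helper type: one field per condition flag. */
typedef struct {
    unsigned n, z, c, v;
} Flags;

/* Extract bits 31..28 of a 32-bit CPSR value. */
Flags decode_cpsr(uint32_t cpsr) {
    Flags f;
    f.n = (cpsr >> 31) & 1u;  /* bit 31: negative */
    f.z = (cpsr >> 30) & 1u;  /* bit 30: zero */
    f.c = (cpsr >> 29) & 1u;  /* bit 29: carry */
    f.v = (cpsr >> 28) & 1u;  /* bit 28: overflow */
    return f;
}
```

For instance, `decode_cpsr(0x60000000)` yields N = 0, Z = 1, C = 1, V = 0, because the top four bits of that value are 0110.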

The overall layout is shown below:

Example: viewing the CPSR

Next, let's look at the value of the CPSR in the console through a simple example, such as:

void funcA() {
    int a = 1;
    int b = 2;
    if (a == b) {
        printf("a == b");
    } else {
        printf("error");
    }
}

View the disassembly:

Next, use LLDB to change the CPSR value:

As you can see, by changing the value of the CPSR we forcibly change the execution path of the code, so that printf("a == b"); is executed in the end.

Inline assembly

To write assembly code in an OC file, besides creating a new .s assembly file as described in 01 - Assembly Basics (1), there is another way: inline assembly.

Embedding assembly in C/OC code requires the asm keyword (__asm__ and __asm can also be used). Which spellings are accepted depends on the compiler; on iOS they are equivalent. Inside asm, the code list, output operand list, input operand list, and list of changed resources are separated by three ":" characters:

asm(code list : output operand list : input operand list : list of changed resources);
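
To make the four-part syntax concrete, here is a minimal sketch of an extended asm statement in the GCC/Clang style. The function name `add_with_asm` and the plain C fallback are assumptions added for this illustration so that the snippet also builds on non-ARM64 machines; `%w0`, `%w1` and `%w2` select the 32-bit views of the operand registers.

```c
/* Adds two ints. On AArch64 the addition is done by an inline `add`;
   elsewhere a plain C fallback keeps the sketch portable. */
int add_with_asm(int a, int b) {
    int result;
#if defined(__aarch64__)
    __asm__("add %w0, %w1, %w2"   /* code list */
            : "=r"(result)        /* output operand list */
            : "r"(a), "r"(b));    /* input operand list */
    /* The list of changed resources is omitted here because a plain
       `add` modifies neither the flags nor memory. */
#else
    result = a + b;               /* fallback for non-ARM64 builds */
#endif
    return result;
}
```

Calling `add_with_asm(2, 3)` returns 5 on either path. Trailing lists may be left out entirely, which is why only two ":" separators appear here.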

Swift does not appear to support inline assembly directly, but this can be handled by bridging to OC.

Let's take a look at the four highest bits of the CPSR, N, Z, C and V, and what each bit means:

1.1 N (Negative) (sign flag)

Bit 31 of the CPSR is N, the sign flag. It records whether the result is negative after the relevant instruction is executed.

  • If the result is negative, N = 1
  • If the result is non-negative, N = 0

Example

Next we execute a simple assembly instruction and look at the value of the N flag:

void funcA() {
    asm(
        "mov w0,#0xffffffff\n"
        "adds w0,w0,#0x0\n"
        );
}

  • Before the adds instruction is executed, the CPSR is 0x60000000 and the top four bits are 0110, so N = 0.

  • After the adds instruction is executed, the CPSR is 0x80000000 and the top four bits are 1000, so N = 1.

⚠️ Note: in the ARM64 instruction set, only some instructions affect the status register, such as adds, subs, ands and cmp (plain add and sub leave the flags unchanged). They are mostly instructions that perform logical or arithmetic operations.

1.2 Z (Zero) (zero flag)

Bit 30 of the CPSR is Z, the zero flag. It records whether the result is 0 after the relevant instruction is executed.

  • If the result is 0, Z = 1
  • If the result is not 0, Z = 0

⚠️ Note that the result value and the Z value are opposite!

This can be understood as follows:

  • In a computer, 1 represents logical true and 0 represents logical false.
  • When the result is 0, the condition "the result is 0" is true, so Z = 1.
  • When the result is not 0, the condition "the result is 0" is false, so Z = 0.

Example

Take a look at the following example:

void funcA() {
    asm(
        "mov w0,#0x0\n"
        "adds w0,w0,#0x0\n"
        );
}
  • Before adds executes: the CPSR value is 0x60000000, where Z = 1.

  • After adds executes: the CPSR value is 0x40000000, where Z is still 1, because the result 0 + 0 is 0.

Modify the sample code:

void funcA() {
    asm(
        "mov w0,#0x0\n"
        "adds w0,w0,#0x1\n"
        );
}

You can view the change of the CPSR value in this case.

Setting breakpoints before and after adds as before, the CPSR is 0x60000000 and then 0x00000000, which correspond to Z = 1 and Z = 0.
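
The N and Z rules can be restated in a few lines of portable C. This is only a sketch of the rule, not of how the hardware computes it; the function names `flag_n` and `flag_z` are made up for the example.

```c
#include <stdint.h>

/* N is simply the top bit (bit 31) of the 32-bit result. */
unsigned flag_n(uint32_t result) {
    return result >> 31;
}

/* Z is 1 exactly when the result is zero, and 0 otherwise. */
unsigned flag_z(uint32_t result) {
    return result == 0;
}
```

Applied to the three examples above: 0xffffffff + 0 gives N = 1, 0 + 0 gives Z = 1, and 0 + 1 gives Z = 0.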

1.3 C (Carry) (carry flag)

Bit 29 of the CPSR is C, the carry flag. It is generally used for operations on unsigned numbers.

  • Addition: when the operation produces a carry (unsigned overflow), C = 1; otherwise C = 0.
  • Subtraction (including cmp): when the operation produces a borrow (unsigned overflow), C = 0; otherwise C = 1.

For an unsigned number that is N bits wide, the highest bit of its binary representation, bit N-1, is the most significant bit, while the imaginary bit N is the next higher bit beyond the most significant bit. As shown in the following figure:

Carry & borrow

We have mentioned carry and borrow, so let's explain them.

Carry

First look at the carry case. When two values are added, the addition may produce a carry from the most significant bit into a higher bit.

For example, adding two 32-bit values, 0xAAAAAAAA + 0xAAAAAAAA, produces a carry. Since the carry cannot be stored in 32 bits, we loosely say that the carry is lost. In fact, the CPU does not discard the carry; it records it in a special register. ARM uses the C bit to record the carry. For example, the following instructions:

void funcA() {
    asm(
        "mov w0,#0xaaaaaaaa\n"   // 0xa in binary is 1010
        "adds w0,w0,w0\n"        // equivalent to 1010 << 1: carry 1 (unsigned overflow), so C is set to 1
        "adds w0,w0,w0\n"        // equivalent to 0101 << 1: carry 0 (no unsigned overflow), so C is set to 0
        "adds w0,w0,w0\n"        // repeat the above operation
        "adds w0,w0,w0\n"
        );
}
  • Before the adds instructions are executed

  • After the first adds is executed

  • After the second adds is executed

  • After the third adds is executed

  • After the fourth adds is executed

In summary, the value of the CPSR changes as follows: 0x60000000 -> 0x30000000 -> 0x90000000 -> 0x30000000 -> 0x90000000. The top four bits change as 0110 -> 0011 -> 1001 -> 0011 -> 1001. So the first and third adds overflow as unsigned additions, giving C = 1, while the second and fourth do not overflow, giving C = 0.
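
The sequence above can be reproduced off-device with a small software model of a 32-bit adds. This is a sketch under the flag definitions given in this article; `adds32` and `run_adds_chain` are hypothetical helper names, and the result is packed as a 4-bit NZCV value (the top hex digit of the CPSR).

```c
#include <stdint.h>

/* Model of a 32-bit `adds`: stores the sum and returns NZCV as a 4-bit value. */
unsigned adds32(uint32_t a, uint32_t b, uint32_t *sum) {
    uint64_t wide = (uint64_t)a + b;          /* keep the 33rd bit */
    uint32_t r = (uint32_t)wide;
    unsigned n = r >> 31;
    unsigned z = (r == 0);
    unsigned c = (unsigned)(wide >> 32);      /* carry out of bit 31 */
    /* signed overflow: operands share a sign that the result does not */
    unsigned v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    *sum = r;
    return (n << 3) | (z << 2) | (c << 1) | v;
}

/* Doubles w0 `count` times (adds w0,w0,w0) and records NZCV after each step. */
void run_adds_chain(uint32_t w0, unsigned out[], int count) {
    for (int i = 0; i < count; i++) {
        out[i] = adds32(w0, w0, &w0);
    }
}
```

Starting from 0xaaaaaaaa with four steps, the recorded values are 0x3, 0x9, 0x3, 0x9, matching the CPSR values 0x30000000 and 0x90000000 observed in the debugger.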

Borrow

Now look at the borrow case. When two values are subtracted, the subtraction may need to borrow from a higher bit.

For example, subtracting two 32-bit values, 0x00000000 - 0x000000FF, produces a borrow. It is equivalent to calculating 0x100000000 - 0x000000FF, which gives 0xFFFFFF01. Because one bit was borrowed and the C bit marks the borrow, C = 0. For example, the following instructions:

void funcA() {
    asm(
        "mov w0,#0x0\n"
        "subs w0,w0,#0xff\n"
        "subs w0,w0,#0xff\n"
        "subs w0,w0,#0xff\n"
        );
}

Debugging this in the same way as the carry case (not repeated here), the CPSR changes as follows: 0x60000000 -> 0x80000000 -> 0xA0000000 -> 0xA0000000. The top four bits change as 0110 -> 1000 -> 1010 -> 1010. So the first subtraction borrows one bit (unsigned overflow), giving C = 0; the second and third subtractions do not borrow, so C = 1.
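
The inverted meaning of C for subtraction is easy to get wrong, so here is a small software model of a 32-bit subs. It is a sketch that follows the rules stated above; `subs32` is a hypothetical name and the return value is the 4-bit NZCV field.

```c
#include <stdint.h>

/* Model of a 32-bit `subs`: on ARM, C = 1 means "no borrow". */
unsigned subs32(uint32_t a, uint32_t b, uint32_t *diff) {
    uint32_t r = a - b;                       /* wraps modulo 2^32 */
    unsigned n = r >> 31;
    unsigned z = (r == 0);
    unsigned c = (a >= b);                    /* 1 when no borrow is needed */
    /* signed overflow: operands differ in sign and the result's sign differs from a */
    unsigned v = (((a ^ b) & (a ^ r)) >> 31) & 1u;
    *diff = r;
    return (n << 3) | (z << 2) | (c << 1) | v;
}
```

Subtracting 0xff three times starting from 0 returns 0x8, 0xA, 0xA in turn, which corresponds to the CPSR values 0x80000000, 0xA0000000, 0xA0000000 listed above.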

1.4 V (Overflow) (overflow flag)

Bit 28 of the CPSR is V, the overflow flag. When an operation on signed numbers produces a result outside the range the machine can represent, it is called an overflow.

  • Positive + positive giving a negative result is an overflow
  • Negative + negative giving a positive result is an overflow
  • Positive + negative cannot overflow
  • Overflow: V = 1; no overflow: V = 0

Since the CPU does not know whether the operands are signed, each such operation updates both C and V: C reflects the unsigned interpretation and V reflects the signed interpretation. Both flag bits are produced at the same time.

This is easy to understand, so no debugging example is given here.
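
Even without a debugger session, the difference between C and V can be sketched in portable C by asking the same question twice, once for the unsigned view and once for the signed view. The two function names are invented for this illustration.

```c
#include <stdint.h>

/* Unsigned view: does a + b exceed 32 bits? (the situation C records) */
int unsigned_add_overflows(uint32_t a, uint32_t b) {
    return a > UINT32_MAX - b;
}

/* Signed view: does a + b leave the int32_t range? (the situation V records) */
int signed_add_overflows(int32_t a, int32_t b) {
    if (b > 0) {
        return a > INT32_MAX - b;   /* positive + positive may wrap to negative */
    }
    return a < INT32_MIN - b;       /* negative + negative may wrap to positive */
}
```

For the bit pattern 0x7fffffff plus 1, the signed view overflows while the unsigned view does not; for 0xffffffff plus 1 it is the other way around, because the signed view sees -1 + 1 = 0.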

II. Conditionals, Loops, and Selection

Before we talk about conditionals, loops, and selection, let's look at the five main memory regions:

  • The stack area: parameters, local variables, and temporary data. Readable and writable
  • The heap area: dynamically allocated. Readable and writable
  • Global static region: readable and writable
  • The constant area: read only
  • Code section: stores code; readable and executable

For detailed instructions, please refer to my earlier article on the five major memory partitions.

2.1 Basic Knowledge

Global variables and constants

How are global variables and constants read in assembly? Let's start with the following example:

int g = 12;

int func(int a, int b) {
    printf("test");
    int c = a + g + b;
    return c;
}

View the disassembly:

In the figure above, we can see from the value of the x0 register that the argument passed to printf comes from:

0x102c6dc54 <+20>: adrp   x0, 1
0x102c6dc58 <+24>: add    x0, x0, #0x5ec            ; =0x5ec 

x0 holds an address in the string constant area. How do these two instructions calculate the value 0x0000000102C6E5EC?

  1. adrp (Address Page): addresses memory by page.
  2. 0x102c6dc54 <+20>: adrp x0, 1 locates the start of a page of data (the starting position of the page in the file)
    • Shift the value 1 left by 12 bits to get 0x1000.
    • Clear the low 12 bits of the current PC value: 0x102c6dc54 -> 0x102c6d000.
    • 0x102c6d000 + 0x1000 gives 0x102c6e000. In other words, set the last three hex digits of the PC to 0 and add the immediate that follows x0 (here 1) to the fourth digit from the right.
  3. 0x102c6dc58 <+24>: add x0, x0, #0x5ec adds the offset within the page
    • 0x102c6e000 + 0x5ec gives 0x102c6e5ec

This gives the address of the constant string "test".
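
The address arithmetic of the adrp and add pair can be checked with a few lines of C. This is a sketch of the calculation walked through above, not of the instruction encoding; the function and parameter names are invented for the example.

```c
#include <stdint.h>

/* Address produced by `adrp xN, page_imm` followed by `add xN, xN, #offset`. */
uint64_t adrp_add(uint64_t pc, int64_t page_imm, uint64_t offset) {
    uint64_t page_base = pc & ~(uint64_t)0xfff;                     /* clear low 12 bits of PC */
    uint64_t target_page = page_base + ((uint64_t)page_imm << 12);  /* imm * 4096 */
    return target_page + offset;                                    /* add the in-page offset */
}
```

With the values from the listing, `adrp_add(0x102c6dc54, 1, 0x5ec)` evaluates to 0x102c6e5ec, the address of "test".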

The trailing 000 of 0x102C6D000 covers the range 000 ~ FFF, i.e. 0 ~ 4095, which is 4096 values, or 4K. That is, it locates the start of a page of data.

  • The page size on macOS is 4K
  • The page size on iOS is 16K. The scheme is still compatible: 4K * 4 = 16K.

Let's continue debugging and look at how the assembly handles the global variable g:

The instructions in the red box above work the same way as before and finally compute the value of x9 as 0x0000000102C715f0, which is the address of the global variable g.

In summary, global variables and constants are both accessed through a base address + offset.

Disassembler tool restore

Next, we use a disassembly tool to demonstrate restoring assembly code to high-level code.

  • First, build the project to be restored, open the .app, find the Mach-O file and drag it into Hopper:

  • After Hopper finishes its analysis, search for the function to be analyzed:

  • First look at the assembly code:
0000000100005c40         sub        sp, sp, #0x20                               ; CODE XREF=-[ViewController viewDidLoad]+76
0000000100005c44         stp        x29, x30, [sp, #0x10]
0000000100005c48         add        x29, sp, #0x10
0000000100005c4c         stur       w0, [x29, #-0x4]
0000000100005c50         str        w1, [sp, #0x8]
0000000100005c54         adrp       x0, #0x100006000                            ; argument #1 for method imp___stubs__printf
0000000100005c58         add        x0, x0, #0x5ec                              ; "test"
0000000100005c5c         bl         imp___stubs__printf
0000000100005c60         ldur       w8, [x29, #-0x4]
0000000100005c64         adrp       x9, #0x100009000
0000000100005c68         add        x9, x9, #0x5f0                              ; _g
0000000100005c6c         ldr        w10, [x9]                                   ; _g
0000000100005c70         add        w8, w8, w10
0000000100005c74         ldr        w10, [sp, #0x8]
0000000100005c78         add        w8, w8, w10
0000000100005c7c         str        w8, [sp, #0x4]
0000000100005c80         ldr        w8, [sp, #0x4]
0000000100005c84         mov        x0, x8
0000000100005c88         ldp        x29, x30, [sp, #0x10]
0000000100005c8c         add        sp, sp, #0x20
0000000100005c90         ret

The "test" line above resolves to 0x100006000 + 0x5EC = 0x1000065EC.

  • You can look up 0x1000065ec in MachOView:

  • Similarly, look at the global variable g:
0000000100005c64         adrp       x9, #0x100009000
0000000100005c68         add        x9, x9, #0x5f0                              ; _g

The address of the global variable g is 0x1000095f0:

  • Next, we restore the above assembly code as follows:
0000000100005c40         sub        sp, sp, #0x20                               ; CODE XREF=-[ViewController viewDidLoad]+76
0000000100005c44         stp        x29, x30, [sp, #0x10]
0000000100005c48         add        x29, sp, #0x10
// Parameters w0 and w1
0000000100005c4c         stur       w0, [x29, #-0x4]
0000000100005c50         str        w1, [sp, #0x8]
// Take the constant "test"
0000000100005c54         adrp       x0, #0x100006000                            ; argument #1 for method imp___stubs__printf
0000000100005c58         add        x0, x0, #0x5ec                              ; "test"
// Call printf("test")
0000000100005c5c         bl         imp___stubs__printf
// w8 = a
0000000100005c60         ldur       w8, [x29, #-0x4]
// Take the global variable g
0000000100005c64         adrp       x9, #0x100009000
0000000100005c68         add        x9, x9, #0x5f0                              ; _g
// w10 = g
0000000100005c6c         ldr        w10, [x9]                                   ; _g
// w8 += w10
0000000100005c70         add        w8, w8, w10
// w10 = b
0000000100005c74         ldr        w10, [sp, #0x8]
// w8 += w10
0000000100005c78         add        w8, w8, w10
0000000100005c7c         str        w8, [sp, #0x4]
0000000100005c80         ldr        w8, [sp, #0x4]
0000000100005c84         mov        x0, x8
0000000100005c88         ldp        x29, x30, [sp, #0x10]
// Stack balance
0000000100005c8c         add        sp, sp, #0x20
0000000100005c90         ret

After restoring it as above, isn't this exactly the original code from before? The result is exactly the same! You can work through this example yourself to reinforce the idea.

2.2 Conditionals

Next, let's look at how conditional logic is executed in assembly.

if

Take a look at the most familiar if statement, for example:

int g = 12;

void func(int a, int b) {
    if (a > b) {
        g = a;
    } else {
        g = b;
    }
}

Let's look directly at the assembly using Hopper, the disassembly tool from above:

Is the assembly in the figure above hard to follow? It is quite simple: a cmp comparison is performed first, and then either block 1 or block 2 is executed, which are the if block and the else block.

cmp instruction

cmp compares the contents of one register with the contents of another register or with an immediate value. It does not store the result; it only updates the flags (CPSR) accordingly. A cmp is generally followed by a jump once the comparison is done, usually a b instruction!

b jump instruction

b itself means a jump; its variants are listed below:

Instruction   Meaning
bl            Jumps to the label and updates the lr register, which is used for the function return.
br            Jumps to the address held in a register.
b.gt          If the comparison result is greater than (gt = greater than), jumps to the label; otherwise no jump.
b.ge          If the comparison result is greater than or equal to, jumps to the label; otherwise no jump.
b.lt          If the comparison result is less than, jumps to the label; otherwise no jump.
b.le          If the comparison result is less than or equal to, jumps to the label; otherwise no jump.
b.eq          If the comparison result is equal, jumps to the label; otherwise no jump.
b.ne          If the comparison result is not equal, jumps to the label; otherwise no jump.
b.hi          If the comparison result is unsigned greater than, jumps to the label; otherwise no jump.
b.hs          If the comparison result is unsigned greater than or equal to, jumps to the label; otherwise no jump.
b.lo          If the comparison result is unsigned less than, jumps to the label; otherwise no jump.
b.ls          If the comparison result is unsigned less than or equal to, jumps to the label; otherwise no jump.

⚠️ Note: the branch condition that follows cmp is the condition for jumping to the else block.

Back to the example assembly: the instruction at 0000000100005c7c, b.le loc_100005c94, branches on less than or equal, so block 1 is the if (greater than) case and block 2 is the else case.
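
One way to see why the compiler emits b.le for an `a > b` test is to rewrite the function with goto, which mirrors the branch structure. This is an illustrative sketch only; the function name `func_lowered` and the labels are invented, and the comments show the kind of instruction each line stands for rather than the exact Hopper output.

```c
int g = 12;

/* Goto form of: if (a > b) { g = a; } else { g = b; } */
void func_lowered(int a, int b) {
    if (a <= b) goto else_block;   /* cmp a, b ; b.le else_block (inverted condition) */
    g = a;                         /* block 1: the `if` body */
    goto done;                     /* b done */
else_block:
    g = b;                         /* block 2: the `else` body */
done:
    return;
}
```

The source condition is "greater than", but the branch tests the opposite so that execution can fall straight through into the if body.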

2.3 Loops

Next, let's look at which instructions implement loop logic in assembly.

2.3.1 do-while

First look at the do-while loop, for example:

void func() {
    int nSum = 0;
    int i = 0;
    do {
        nSum = nSum + 1;
        i++;
    } while (i < 100);
}

Hopper assembly:

The assembly above is also very simple:

  • Two variables are initialized; their stack offsets are 0xc and 0x8, exactly 4 bytes each (matching the int type)
  • Then the do part of the loop is executed
  • cmp evaluates the while condition; if the condition is met, b.lt jumps back to the do part
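
The three steps above map directly onto a goto version of the loop. This is a sketch: `do_while_lowered` is a made-up name, and the function returns nSum (the original returns nothing) purely so that the result can be checked.

```c
/* Goto form of the do-while loop; returns nSum so the result can be inspected. */
int do_while_lowered(void) {
    int nSum = 0;                  /* the two initialized variables */
    int i = 0;
loop_body:
    nSum = nSum + 1;               /* the do part runs before any test */
    i++;
    if (i < 100) goto loop_body;   /* cmp i, #100 ; b.lt loop_body */
    return nSum;
}
```

Because the test sits at the bottom, the condition is used as written (less than) and the body always runs at least once.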

2.3.2 while

Again, take a look at the example:

void func() {
    int nSum = 0;
    int i = 0;
    while (i < 100) {
        nSum = nSum + 1;
        i++;
    }
}

2.3.3 for

Finally, let's look at the most common for loop example:

void func() {
    int nSum = 0;
    for (int i = 0; i < 100; i++) {
        nSum = nSum + 1;
    }
}

⚠️ Note: in the assembly for both for and while, the condition is checked with b.ge.
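
The reason b.ge appears for an `i < 100` condition is that the test sits at the top of the loop and branches out when the condition fails. A goto sketch of that shape follows; the name `while_lowered`, the `limit` parameter and the return value are additions for illustration.

```c
/* Goto form of: while (i < limit) { nSum = nSum + 1; i++; } */
int while_lowered(int limit) {
    int nSum = 0;
    int i = 0;
loop_test:
    if (i >= limit) goto loop_end; /* cmp i, limit ; b.ge loop_end (inverted condition) */
    nSum = nSum + 1;               /* loop body */
    i++;
    goto loop_test;                /* b loop_test */
loop_end:
    return nSum;
}
```

Unlike the do-while form, a limit of 0 skips the body entirely, since the exit test runs before the first iteration.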

2.4 Selection

Finally, let's look at which instructions implement selection logic in assembly.

switch

void func(int a) {
    switch (a) {
        case 1:
            printf("case 1");
            break;
        case 2:
            printf("case 2");
            break;
        case 3:
            printf("case 3");
            break;
        default:
            printf("case default");
            break;
    }
}

More than 3 cases

void func(int a) {
    switch (a) {
        case 1:
            printf("case 1");
            break;
        case 2:
            printf("case 2");
            break;
        case 3:
            printf("case 3");
            break;
        case 4:
            printf("case 4");
            break;
        default:
            printf("case default");
            break;
    }
}

The figure above shows the assembly code as analyzed in Hopper. Apart from w8 -= 1, the rest is divided into code blocks. Next, let's take a closer look:

  • Code block 1
0000000100005be0         mov        x9, x8
0000000100005be4         ubfx       x9, x9, #0x0, #0x20
0000000100005be8         cmp        x9, #0x3
0000000100005bec         str        x9, [sp]
  1. mov x9, x8: copies the value of register x8, which holds the parameter value, into register x9.
  2. ubfx x9, x9, #0x0, #0x20: ubfx extracts a bit field and zero-fills everything above it. Here it keeps 32 bits of x9 starting at bit 0 and clears the upper 32 bits (0x0 is 0 in decimal, 0x20 is 32 in decimal).
  3. cmp x9, #0x3: compares the value of x9 with 0x3. Here 0x3 is the difference between the maximum case and the minimum case.
  4. str x9, [sp]: stores x9 on the stack, i.e. the low 32 bits of x8 are saved to the stack.

The b.hi that follows means unsigned greater than: when the condition holds, execution jumps directly to the default branch.
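
The point of the unsigned comparison is that a single test rejects values both below the minimum case and above the maximum case. A minimal sketch for the four-case switch above, with `goes_to_default` as an invented name:

```c
#include <stdint.h>

/* Returns 1 when `a` falls outside cases 1..4 and must go to default. */
int goes_to_default(int a) {
    uint32_t index = (uint32_t)a - 1u;  /* parameter - minimum case (the w8 -= 1 step) */
    return index > 3u;                  /* unsigned compare with max case - min case */
}
```

For a = 0 the subtraction wraps around to 0xffffffff, which is far greater than 3 when read as unsigned, so values below the minimum case are caught by the same comparison as values above the maximum.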

  • Code block 2
0000000100005bf4         adrp       x8, #0x100005000
0000000100005bf8         add        x8, x8, #0xc64               

The address stored in x8 is 0x100005C64; see the explanation of the adrp instruction in 02 - Assembly Basics (2).

  • Code block 3
0000000100005bfc         ldr        x11, [sp]
0000000100005c00         ldrsw      x10, [x8, x11, lsl #2]
0000000100005c04         add        x9, x8, x10
0000000100005c08         br         x9
  1. ldr x11, [sp]: loads the value saved on the stack into x11. This is the value of x9 stored earlier, i.e. the low 32 bits of x8.
  2. ldrsw x10, [x8, x11, lsl #2]: lsl #2 shifts left by two bits, so the address is x8 + (x11 << 2); the 32-bit value at that address is loaded into x10 and sign-extended.
  3. add x9, x8, x10: this one is very simple, x9 = x8 + x10.
  4. br x9: jumps to the address held in register x9.

Example: calculating x9

Now let's work through the calculation of the value in x9, assuming the argument is 2, that is, the call func(2):

  1. x9 is copied from x8 (the parameter value), so x9 holds the low 32 bits of x8. Its value is 1 (the argument 2 minus 1, from subs w8, w8, #0x1), so after ldr x11, [sp], x11 is 1.
  2. Then, for ldrsw x10, [x8, x11, lsl #2]: 1 (x11) << 2 = 4, and 4 + 0x100005C64 (the address in x8) = 0x100005C68, which is the address to load from. Looking it up in code block 5, the value loaded into x10 is 0xffffffb8:

Read as a signed 32-bit number, 0xffffffB8 is -72 in decimal:

  3. Then, add x9, x8, x10: x10 is -72, whose magnitude in hexadecimal is 0x48, so x8 + x10 = 0x100005C64 - 0x48 (adding a negative number is a subtraction) = 0x100005C1C. The final value of x9 is 0x100005C1C.
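
The lookup and sign extension can be replayed in C using the table bytes listed in code block 5. This is a sketch of the arithmetic only; `table_bytes` and `jump_target` are names invented here, and the caller is assumed to pass an index that already passed the range check (0 to 3).

```c
#include <stdint.h>

/* Bytes of the offset table from code block 5 (little-endian, 4 bytes per entry). */
static const uint8_t table_bytes[16] = {
    0xa8, 0xff, 0xff, 0xff,  0xb8, 0xff, 0xff, 0xff,
    0xc8, 0xff, 0xff, 0xff,  0xd8, 0xff, 0xff, 0xff,
};

/* Model of: ldrsw x10, [x8, x11, lsl #2] ; add x9, x8, x10 */
uint64_t jump_target(uint64_t table_base, uint64_t index) {
    const uint8_t *p = table_bytes + (index << 2);       /* base + (index << 2) */
    uint32_t raw = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                 | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    /* sign-extend 32 -> 64 bits without relying on implementation-defined casts */
    int64_t offset = (int64_t)raw - ((raw & 0x80000000u) ? ((int64_t)1 << 32) : 0);
    return table_base + (uint64_t)offset;                /* base + signed offset */
}
```

With the table at 0x100005c64, index 1 (the call func(2)) gives 0x100005c1c, and indexes 0, 2 and 3 give 0x100005c0c, 0x100005c2c and 0x100005c3c, the start addresses of the four case blocks.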
  • Code block 4
0000000100005c0c         adrp       x0, #0x100006000
0000000100005c10         add        x0, x0, #0x5c8
0000000100005c14         bl         imp___stubs__printf
0000000100005c18         b          _func+144
0000000100005c1c         adrp       x0, #0x100006000
0000000100005c20         add        x0, x0, #0x5cf
0000000100005c24         bl         imp___stubs__printf
0000000100005c28         b          _func+144
0000000100005c2c         adrp       x0, #0x100006000
0000000100005c30         add        x0, x0, #0x5d6
0000000100005c34         bl         imp___stubs__printf
0000000100005c38         b          _func+144
0000000100005c3c         adrp       x0, #0x100006000
0000000100005c40         add        x0, x0, #0x5dd
0000000100005c44         bl         imp___stubs__printf
0000000100005c48         b          _func+144

It is clear that the assembly in this code block executes the logic of the case bodies. In the example above, the resulting value of x9 is 0x100005C1C, which jumps to the assembly for case 2:

  • Code block 5
0000000100005c64         db  0xa8 ; '.', DATA XREF=_func+48
0000000100005c65         db  0xff ; '.'
0000000100005c66         db  0xff ; '.'
0000000100005c67         db  0xff ; '.'
0000000100005c68         db  0xb8 ; '.'
0000000100005c69         db  0xff ; '.'
0000000100005c6a         db  0xff ; '.'
0000000100005c6b         db  0xff ; '.'
0000000100005c6c         db  0xc8 ; '.'
0000000100005c6d         db  0xff ; '.'
0000000100005c6e         db  0xff ; '.'
0000000100005c6f         db  0xff ; '.'
0000000100005c70         db  0xd8 ; '.'
0000000100005c71         db  0xff ; '.'
0000000100005c72         db  0xff ; '.'
0000000100005c73         db  0xff ; '.'

This code block works like a table: you can look up the value stored at a given address.

Assembly execution summary
  1. First, compute parameter - minimum case to get the index into the table.
  2. Compare the index with (maximum case - minimum case) as unsigned values to check whether it is within range.
    • Not in range: jump to default
    • In range: read the entry at table header address + (index << 2)
  3. Execute the corresponding case logic based on the offset address.

⚠️ Note: why not directly save the address in the table? 1. The address is too long. 2

Switch summary

  1. When a switch statement has < 3 branches there is no need for a table structure; it is equivalent to if.

  2. When the difference between the case constants is large, the compiler weighs efficiency against memory and may compile the switch into something like if-else. For example, 100, 200, 300, 400 compiles like if-else, while 10, 20, 30, 40 generates a table. So it is best to use consecutive values when writing switch logic. As for the exact logic, the compiler chooses based on the number of cases and the gaps between them: the more cases, the smaller the gaps and the more consecutive the values, the more likely the compiler is to generate a jump table; otherwise it uses if-else.

  3. When there are many branches, a table is generated at compile time (each jump table entry is four bytes).

  4. The number of entries in the jump table is maximum case - minimum case + 1, i.e. the number of possible values.

  5. The case branch code addresses are laid out sequentially, trading space for time.

Conclusion

  1. The status (flag) register CPSR
    • The CPSR register (32 bits) in ARM64 is the status register
    • The highest four bits (28, 29, 30, 31) are the flag bits: N and Z (execution result), C and V (unsigned / signed overflow)
      • N flag (sign flag)
        • Negative result: N = 1; non-negative result: N = 0
      • Z flag (zero flag)
        • Result is 0: Z = 1; result is not 0: Z = 0
      • C flag (unsigned overflow)
        • Addition: carry C = 1, otherwise C = 0
        • Subtraction: borrow C = 0, otherwise C = 1
      • V flag (signed overflow)
        • Positive + positive = negative: overflow, V = 1
        • Negative + negative = positive: overflow, V = 1
        • Positive + negative cannot overflow: V = 0
        • Overflow: V = 1; no overflow: V = 0
  2. Conditionals, loops, and selection