Preface

The executable that an app installs on a phone is a binary file: the instructions an iPhone runs are machine code executed by the phone's CPU. Static analysis therefore means analyzing that binary, and analyzing a binary requires reading assembly language, which is the foundation of static analysis in application reverse engineering. What is assembly? Assembly is a language whose instructions are mnemonics for machine instructions, in essentially one-to-one correspondence with them. Why learn it? Code with strict performance requirements can mix assembly into a high-level language; it is needed for virus analysis and prevention; and it lays the foundation for writing efficient code. The closer you get to the bottom layer, the simpler things become. Assembly is a language every serious programmer should know.

Registers

The diagram shows how an app executes: the program on disk is an executable file, which is loaded into memory as an image file; the CPU reads and writes that memory and ultimately controls what is displayed on the screen. What the programmer needs to care about most is how the CPU reads and writes memory. The CPU computes much faster than memory can be accessed, so for performance the CPU sets aside a small temporary storage area inside itself. To perform an operation, the data is copied from memory into this area and the operation is carried out there. This small temporary storage area is called a register.

Floating point and vector registers

Because floating-point numbers are stored and operated on differently from integers, the CPU provides dedicated floating-point registers to handle them.

  • Floating-point registers: 64-bit D0-D31, 32-bit S0-S31
  • Vector registers: 128-bit V0-V31

General-purpose registers

  • General-purpose registers, also known as data/address registers, are typically used for temporary storage during computation: accumulating sums, counting, holding addresses, and so on. Their main purpose is to hold the operands of CPU instructions; the CPU uses them like ordinary variables.
  • ARM64 has 31 64-bit general-purpose registers, X0 to X30, plus XZR (the zero register). Some of these general-purpose registers are reserved for specific purposes at times. A 64-bit register occupies 8 bytes.
  • W0 to W28 are 32-bit registers. Because a 64-bit CPU also supports 32-bit operations, an instruction can use only the low 32 bits of a 64-bit register; W0, for example, is the low 32 bits of X0.
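The relationship between X0 and W0 can be modelled in C with fixed-width integers. This is only a sketch, not real register access: `read_w` and `write_w` are hypothetical helper names, and `write_w` models the AArch64 rule that writing a W register clears the upper 32 bits of the corresponding X register.

```c
#include <stdint.h>

/* Hypothetical helper: the W view is the low 32 bits of the X register. */
uint32_t read_w(uint64_t x) {
    return (uint32_t)(x & 0xFFFFFFFFu);
}

/* Hypothetical helper: writing W replaces the whole X register with the
   zero-extended 32-bit value, so the upper 32 bits become zero. */
uint64_t write_w(uint64_t old_x, uint32_t w) {
    (void)old_x;  /* the previous upper bits are discarded */
    return (uint64_t)w;
}
```

For example, reading the W view of 0x1122334455667788 yields 0x55667788, and writing 0x10 through the W view leaves the full register equal to 0x10.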

As you can see here, the FP register is the X29 register and the LR register is the X30 register; FP and LR are just aliases for them.

SP, FP, and LR registers

  • SP register: always holds the address of the top of the stack. It is not a general-purpose register.
  • FP register: also known as X29. It is a general-purpose register, but at times it is used to hold the address of the bottom of the stack.
  • LR register: also known as X30. It is a general-purpose register. When a function is called, the address of the next instruction is placed in LR (X30); it records the "way home" in the function call stack. An example follows later.
  • On ARM64, stack operations are 16-byte aligned.
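The 16-byte alignment rule means a function that needs an arbitrary number of bytes for locals reserves the next multiple of 16. A minimal sketch of that rounding, where `align16` is a hypothetical helper name rather than anything the toolchain exposes:

```c
#include <stddef.h>

/* Hypothetical helper: round a stack frame size up to the next
   multiple of 16 bytes, as SP adjustments on ARM64 require. */
size_t align16(size_t bytes) {
    return (bytes + 15u) & ~(size_t)15u;
}
```

A frame that needs 0x28 bytes would therefore be allocated with `sub sp, sp, #0x30`, while a frame of exactly 0x40 bytes needs no padding.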

PC register

  • The PC is the instruction pointer register: it holds the address of the instruction the CPU is about to fetch. Take care to distinguish it from the LR (X30) register.
  • In memory or on disk, there is no difference between instructions and data: both are just binary information.
  • While working, the CPU treats some information as instructions and some as data, giving the same kind of information different meanings. The PC register is what tells the CPU that something is an instruction rather than data: whatever the PC points to is fetched and executed as an instruction.

Let's write a demo and set a breakpoint to look at the PC register:

int test(){
    int a=10;
    int b=20;
    return a+b;
}

The breakpoint stops at the beginning of the test function.

The value of the PC register is the address of the instruction about to be executed. The instruction on line 4 is at 0x1026ddeb4, the one on line 5 is at 0x1026ddeb8, and the one on line 6 is at 0x1026ddebc. Consecutive instruction addresses differ by 4 bytes, that is, each instruction occupies four bytes. Manually change the PC value at the breakpoint to 0x1026ddebc and then single-step. With PC set to line 6 (0x1026ddebc), a single step lands on line 7. This shows that PC points at the instruction about to be executed: single-stepping executes that instruction and then moves on to line 7.
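The address arithmetic in this experiment is simple enough to write down. The sketch below assumes straight-line code with no branches; `next_pc` and `instructions_between` are hypothetical helper names used only for illustration.

```c
#include <stdint.h>

#define INSTRUCTION_SIZE 4u  /* every ARM64 instruction is 4 bytes */

/* Address of the instruction that follows the one at `pc`
   (branches are ignored; they set the PC explicitly). */
uint64_t next_pc(uint64_t pc) {
    return pc + INSTRUCTION_SIZE;
}

/* Number of instructions between two addresses in straight-line code. */
uint64_t instructions_between(uint64_t from, uint64_t to) {
    return (to - from) / INSTRUCTION_SIZE;
}
```

With the addresses from the debugger session, `next_pc(0x1026ddeb4)` gives 0x1026ddeb8, and 0x1026ddebc is two instructions past 0x1026ddeb4.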

The stack

The CPU works through registers when it reads and writes memory, so let's look at the iOS memory layout and a few simple assembly instructions.

The overall memory layout is shown in the following figure. The stack grows downward, from high addresses to low addresses; the heap grows upward, from low addresses to high addresses. A stack overflow can be understood simply as the stack, growing down, running into the heap, growing up. Also note that although the stack grows downward, data is read and written toward higher addresses, as the examples below illustrate.

  • str: reads data out of a register and stores it to memory. For example, str x0, [sp] takes the value in register X0 and stores it in the memory at the address held in SP. str stores one register; stp stores two.
  • ldr: reads data out of memory and loads it into a register. For example, ldr x0, [sp] takes the value from the memory at the address held in SP and loads it into X0. Likewise, ldr loads one register and ldp loads two.
  • bl: places the address of the next instruction in the LR (X30) register and jumps to the label to continue execution.
  • ret: by default uses the value in the LR (X30) register, telling the CPU that this is the address of the next instruction. X30 holds the function's return address; when ret executes, execution continues at the address stored in X30. When function calls are nested, X30 must be pushed onto the stack.

Function call stack

With the concepts above in hand, let's look at an example of a function call stack:

sub    sp, sp, #0x40          ; stretch the stack by 0x40 (64) bytes
stp    x0, x1, [sp, #0x30]    ; push registers x0 and x1 onto the stack
...
ldp    x1, x0, [sp, #0x30]    ; load them back in swapped order: x0 and x1 are exchanged
add    sp, sp, #0x40          ; balance the stack
ret
  • sub: subtracts from sp, that is, stretches the stack space. The stack grows downward, from high addresses to low addresses. Here the stack is stretched by 0x40 (64) bytes; sp points to the top of the stack, so sp = sp - 0x40.
  • stp: stores the values of registers x0 and x1 to memory starting at address sp + 0x30. Note that data is written toward higher addresses: x0 lands at the lower address (sp + 0x30) and x1 at the higher address (sp + 0x38).
  • ldp: loads the value at sp + 0x30, that is, the value just saved from x0, into register x1; then, moving 8 bytes up in memory, it loads the value just saved from x1 into x0. This exchanges the values of x0 and x1. To follow this, pay attention to the order of x0 and x1 in each instruction.
  • add: adds to sp, balancing the stack and releasing the stack space. Whatever the stack stretches, it must release.
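The four instructions above can be replayed in C against a small byte array standing in for stack memory. This is a simplified model under stated assumptions: the `Cpu` struct, `store64`, `load64`, and `swap_via_stack` are hypothetical names, and `sp` is an offset into the array rather than a real address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the state the example touches. */
typedef struct {
    uint64_t x0, x1;
    uint64_t sp;          /* byte offset into `stack`; grows downward */
    uint8_t  stack[256];
} Cpu;

void store64(Cpu *cpu, uint64_t addr, uint64_t value) {
    memcpy(&cpu->stack[addr], &value, sizeof value);
}

uint64_t load64(const Cpu *cpu, uint64_t addr) {
    uint64_t value;
    memcpy(&value, &cpu->stack[addr], sizeof value);
    return value;
}

/* Replays: sub / stp x0, x1 / ldp x1, x0 / add */
void swap_via_stack(Cpu *cpu) {
    cpu->sp -= 0x40;                         /* sub sp, sp, #0x40       */
    store64(cpu, cpu->sp + 0x30, cpu->x0);   /* stp x0, x1, [sp, #0x30] */
    store64(cpu, cpu->sp + 0x38, cpu->x1);   /*   x1 goes 8 bytes higher */
    cpu->x1 = load64(cpu, cpu->sp + 0x30);   /* ldp x1, x0, [sp, #0x30] */
    cpu->x0 = load64(cpu, cpu->sp + 0x38);
    cpu->sp += 0x40;                         /* add sp, sp, #0x40       */
}
```

Starting with `sp` at the end of the array, x0 = 0xa and x1 = 0xb, the call leaves x0 = 0xb, x1 = 0xa, and `sp` back where it started, which is what "stack balance" means.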

The graphic aid is as follows:

Nested calls to a function

Write some assembly with nested function calls as follows:

.text
.global _test, _test2
_test:
    sub sp, sp, #0x10    // stretch the stack space
    mov x0, #0x10
    mov x1, #0x30
    add x1, x0, #0xa0
    mov x0, x1
    bl _test2
    mov x0, #0x0
    add sp, sp, #0x10    // balance the stack
    ret

_test2:
    add x0, x1, #0x10
    ret

Set a breakpoint in the assembly and step into the test() function.

As shown, the breakpoint is set at line 7. Before the bl instruction on line 7 executes, the value of the LR (X30) register is 0x000000100ad5f34. The LR register stores the address of the next instruction (the way home), so after bl executes, LR should hold the address of line 8, 0x10ad6518. Step in to check.

Sure enough, after the bl instruction enters the test2() function, the LR register holds 0x10ad6518, the address of line 8, the instruction that follows bl. When test2() executes ret, it returns to the address stored in LR, so LR ensures that after test() calls test2(), execution correctly returns to test() and continues. Now let's see whether test() itself can carry on.

The function is stuck in an infinite loop: when test() executes ret, it jumps to the address in LR (line 8), runs down to ret again, and so on. The reason is that LR no longer holds the address that test() should return to in its caller; bl overwrote it, so test() can never get back. We therefore need to save that return address on the stack. The improved code is as follows:

.text
.global _test, _test2
_test:
    sub sp, sp, #0x10    // stretch the stack space
    str x30, [sp]        // save the LR value on the stack to protect it
    mov x0, #0x10
    mov x1, #0x30
    add x1, x0, #0xa0
    mov x0, x1
    bl _test2
    mov x0, #0x0
    ldr x30, [sp]        // restore LR from the stack so that ret can return
    add sp, sp, #0x10    // balance the stack
    ret

_test2:
    add x0, x1, #0x10
    ret
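The difference between the two versions of test() can be modelled in C by tracking only what LR holds. This is a sketch under stated assumptions: the two addresses are made up for illustration, and `ret_target_of_test` is a hypothetical function that reports where test()'s final ret would jump.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical addresses, for illustration only. */
enum { CALLER_RETURN = 0x1000, AFTER_BL_IN_TEST = 0x2018 };

/* Returns the address that test()'s final `ret` jumps to. */
uint64_t ret_target_of_test(bool save_lr_on_stack) {
    uint64_t lr = CALLER_RETURN;   /* set by the caller's `bl _test` */
    uint64_t stack_slot = 0;

    if (save_lr_on_stack) {
        stack_slot = lr;           /* save x30 at [sp] */
    }

    lr = AFTER_BL_IN_TEST;         /* `bl _test2` overwrites LR */
    /* _test2 runs; its `ret` jumps to lr, i.e. back into _test */

    if (save_lr_on_stack) {
        lr = stack_slot;           /* reload x30 from [sp] */
    }
    return lr;                     /* `ret` jumps to whatever LR holds */
}
```

Without the save, the result is the address just after bl inside test(), which is the infinite loop seen in the debugger; with the save, the result is the caller's return address.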

Analysis: to preserve the address that test() must return to, we have to store it on the function's stack. Each function has its own stack space, whereas registers are shared by all functions, so the address cannot safely stay in a register. On entering test(), the address is written to the function's stack memory; when it is time to return, it is read back from the stack into LR, and the function returns normally. The compiler optimizes the assembly above into the form below, and optimized assembly is what we actually see in reverse analysis:

.text
.global _test, _test2
_test:
    // sub sp, sp, #0x10
    // str x30, [sp]
    str x30, [sp, #-0x10]!    // sp = sp - 0x10, then save x30 at [sp]
    mov x0, #0x10
    mov x1, #0x30
    add x1, x0, #0xa0
    mov x0, x1
    bl _test2
    mov x0, #0x0
    // ldr x30, [sp]          // restore LR from the stack so that ret can return
    // add sp, sp, #0x10
    ldr x30, [sp], #0x10      // load x30 from [sp], then sp = sp + 0x10
    ret

_test2:
    add x0, x1, #0x10
    ret
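The two combined instructions rely on pre-indexed and post-indexed addressing: a pre-indexed store updates sp first and then stores, while a post-indexed load reads first and then updates sp. A minimal C model of that ordering, where `Frame`, `push_lr`, and `pop_lr` are hypothetical names and `sp` is an offset into a byte array:

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t x30;
    uint64_t sp;          /* byte offset into `stack` */
    uint8_t  stack[64];
} Frame;

/* Pre-indexed store, the `[sp, #-0x10]!` form:
   update sp first, then store x30 at the new [sp]. */
void push_lr(Frame *f) {
    f->sp -= 0x10;
    memcpy(&f->stack[f->sp], &f->x30, sizeof f->x30);
}

/* Post-indexed load, the `[sp], #0x10` form:
   load x30 from the current [sp], then update sp. */
void pop_lr(Frame *f) {
    memcpy(&f->x30, &f->stack[f->sp], sizeof f->x30);
    f->sp += 0x10;
}
```

Calling `push_lr`, clobbering `x30`, and then calling `pop_lr` restores both the original `x30` and the original `sp`, which is exactly what the one-instruction prologue and epilogue achieve.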

Parameters of a function

Write a demo and see how the compiler translates it for us:

int sum(int a, int b) {
    return a + b;
}

// Called from inside another function:
sum(10, 20);

The assembly is as follows. Before calling sum(), argument 10 (0xa) is stored in w0 and argument 20 (0x14) in w1. Note that w0 is just the low 32 bits of x0. Now look at the assembly for sum(): add w0, w8, w9 computes w0 = w8 + w9, and w0 serves as the function's return value register.

  • On ARM64, function parameters are passed in the eight registers X0 to X7 (W0 to W7 for 32-bit values). If there are more than eight parameters, the extras are pushed onto the stack, and pushing costs efficiency. So when defining an Objective-C method, declare at most six parameters, because there are also two hidden parameters of types id and SEL (self and _cmd).
  • The return value of the function is placed in the X0 register.
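The register budget described above reduces to a little arithmetic. The sketch below is deliberately simplified: it counts only integer and pointer arguments, ignoring floating-point arguments and large structs, and the function names are hypothetical.

```c
#include <stdbool.h>

#define ARG_REGISTER_COUNT 8   /* X0 to X7 */
#define OBJC_HIDDEN_ARGS   2   /* self and _cmd */

/* True if the integer/pointer argument at zero-based `index` fits in X0-X7. */
bool arg_in_register(int index) {
    return index >= 0 && index < ARG_REGISTER_COUNT;
}

/* How many of `count` integer/pointer arguments spill to the stack. */
int stack_arg_count(int count) {
    return count > ARG_REGISTER_COUNT ? count - ARG_REGISTER_COUNT : 0;
}

/* Same question for an Objective-C method with `visible` declared parameters. */
int objc_stack_arg_count(int visible) {
    return stack_arg_count(visible + OBJC_HIDDEN_ARGS);
}
```

Under these assumptions a C function with ten arguments spills two of them, and an Objective-C method starts spilling at its seventh declared parameter.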

Conclusion

  • A general-purpose register occupies eight bytes, and an instruction occupies four bytes
  • X0 is the return value register
  • LR (X30) is the "way home" register: it holds the return address

Assembly is a hard language to read but a fun one to learn.