Static analysis is an important step in reverse engineering. We are reverse engineering iOS apps, and an app installed on an iPhone is essentially an executable binary, because that is what the iPhone's CPU executes. Static analysis therefore means analyzing binaries, and to analyze binaries we have to understand assembly language.

The evolution of programming languages


1. Machine language: machine instructions made up of zeros and ones.

Add: 0100 0000
Subtract: 0100 1000
Multiply: 1111 0111 1110 0000
Divide: 1111 0111 1111 0000

2. Assembly language: uses mnemonics in place of raw machine instructions.

Add: INC EAX is assembled to 0100 0000
Subtract: DEC EAX is assembled to 0100 1000
Multiply: MUL EAX is assembled to 1111 0111 1110 0000
Divide: DIV EAX is assembled to 1111 0111 1111 0000

3. High-level languages: C, C++, Objective-C, Swift, Java, and others that are closer to natural human language.

For example, in C (simplified):

Add: A+B is compiled to 0100 0000
Subtract: A-B is compiled to 0100 1000
Multiply: A*B is compiled to 1111 0111 1110 0000
Divide: A/B is compiled to 1111 0111 1111 0000

Figure 1 shows how code is compiled for the target device:

Assembly language and machine language correspond one to one: every machine instruction has a matching assembly instruction.

Assembly language can be assembled into machine language, and machine language can be disassembled back into assembly language.

High-level languages can be compiled into assembly language and machine language, but it is almost impossible to restore assembly or machine language to the original high-level source.
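The one-to-one correspondence can be sketched as a pair of lookup tables. The Python snippet below is only an illustration built from the four x86 encodings listed earlier; the names `ASSEMBLE`, `DISASSEMBLE`, `assemble`, and `disassemble` are hypothetical, and a real assembler does far more than consult a fixed table.

```python
# Mnemonic -> machine code, taken from the four examples in the text.
ASSEMBLE = {
    "INC EAX": "0100 0000",
    "DEC EAX": "0100 1000",
    "MUL EAX": "1111 0111 1110 0000",
    "DIV EAX": "1111 0111 1111 0000",
}

# Because the mapping is one-to-one, it can simply be inverted.
DISASSEMBLE = {bits: mnemonic for mnemonic, bits in ASSEMBLE.items()}


def assemble(mnemonic: str) -> str:
    """Translate one mnemonic into its bit pattern."""
    return ASSEMBLE[mnemonic]


def disassemble(bits: str) -> str:
    """Translate one bit pattern back into its mnemonic."""
    return DISASSEMBLE[bits]


print(assemble("INC EAX"))
print(disassemble("1111 0111 1110 0000"))
```

Going from a high-level expression such as A+B to these bits loses the variable names and structure, which is why the reverse direction has no equivalent table.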

Introduction to the CPU


In assembly, most instructions deal with the CPU and memory, so learning assembly requires a general understanding of the CPU.

The bus

A bus is a collection of wires. Buses are divided into three kinds: the address bus, the data bus, and the control bus.

(1) Address bus: its width determines the addressing capability of the CPU. For example, the 8086 has a 20-bit address bus, so it can address 2^20 locations, or 1 MB.

(2) Data bus: its width determines how much data the CPU can transfer at once, and therefore the data transfer speed. For example, the 8086 has a 16-bit data bus, so it can transfer at most 2 bytes at a time.

(3) Control bus: its width determines how much control the CPU has over other devices, that is, how many kinds of control signals it can send.
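The arithmetic behind the 8086 figures can be checked with a few lines of Python. This is a minimal sketch; the function names `addressable_locations` and `bytes_per_transfer` are made up for illustration, and it assumes byte-addressed memory.

```python
def addressable_locations(address_bus_bits: int) -> int:
    """Number of distinct addresses an address bus of this width can express."""
    return 2 ** address_bus_bits


def bytes_per_transfer(data_bus_bits: int) -> int:
    """Maximum number of whole bytes moved in one transfer."""
    return data_bus_bits // 8


locations = addressable_locations(20)   # 8086 address bus: 20 bits
print(locations)                        # number of addresses
print(locations // (1024 * 1024))       # the same figure in MB, one byte per address
print(bytes_per_transfer(16))           # 8086 data bus: 16 bits
```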

Number bases

Barriers to learning

Many people fail to learn number bases because they lean on decimal to reason about every other base, converting to decimal before doing any arithmetic. This way of learning is wrong.

Why convert to decimal at all? Only because decimal is the base we know best. Every base is complete in its own right, and the best way to learn bases is to set aside decimal and the conversions between bases.

Definition of base

(1) Octal consists of eight symbols, 0 1 2 3 4 5 6 7, and carries at eight.
(2) Decimal consists of ten symbols, 0 1 2 3 4 5 6 7 8 9, and carries at ten.
(3) Base N consists of N symbols and carries at N.

Question: When does 1 + 1 = 3 hold?

Suppose decimal is defined with these ten symbols, in this order: 0, 1, 3, 2, 8, A, B, E, S, 7

Then 1 + 1 = 3, because 3 is the symbol that follows 1 in this table.

What is the purpose of this?

Traditional decimal and this custom decimal are not the same. Unless we share the symbol table, others cannot recover the actual values from what we write, so the technique can be used as a form of encryption.
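The custom symbol table can be tried out with a short Python sketch. The helper names `to_symbols`, `from_symbols`, and `add` are hypothetical; the only thing taken from the text is the symbol order. The scheme is a simple symbol substitution, so it hides values only from someone who does not know the table.

```python
STANDARD = "0123456789"
CUSTOM = "01328ABES7"   # the ten symbols from the text, in the given order


def to_symbols(value: int, symbols: str) -> str:
    """Write a non-negative integer using the given symbol table."""
    base = len(symbols)
    if value == 0:
        return symbols[0]
    digits = []
    while value > 0:
        value, remainder = divmod(value, base)
        digits.append(symbols[remainder])
    return "".join(reversed(digits))


def from_symbols(text: str, symbols: str) -> int:
    """Read a string written with the given symbol table."""
    base = len(symbols)
    value = 0
    for ch in text:
        value = value * base + symbols.index(ch)
    return value


def add(a: str, b: str, symbols: str) -> str:
    """Add two numbers that are both written in the given symbol table."""
    return to_symbols(from_symbols(a, symbols) + from_symbols(b, symbols), symbols)


print(add("1", "1", STANDARD))   # ordinary decimal
print(add("1", "1", CUSTOM))     # custom decimal from the text
print(to_symbols(8, "01234567")) # octal carries at eight
```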

Width of data

Numbers in mathematics have no size limit and can be arbitrarily large. In a computer, however, hardware constraints give data a fixed length (the data width), and any bits beyond the maximum width are discarded.

Common data widths in computers

Bit: a single binary digit, 0 or 1.
Byte: eight bits.
Word: two bytes (16 bits), called the high byte and the low byte.
Doubleword: two words (32 bits).
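Discarding the bits that do not fit can be modeled with a bit mask. The following Python sketch is illustrative only; `WIDTHS` and `truncate` are names invented here, and Python integers are unbounded, so the mask stands in for the hardware limit.

```python
# Data widths from the list above, in bits.
WIDTHS = {"byte": 8, "word": 16, "doubleword": 32}


def truncate(value: int, bits: int) -> int:
    """Keep only the lowest `bits` bits, as fixed-width hardware would."""
    return value & ((1 << bits) - 1)


print(hex(truncate(0x1FF, WIDTHS["byte"])))             # ninth bit is dropped
print(hex(truncate(0x100, WIDTHS["byte"])))             # only the dropped bit was set
print(hex(truncate(0x12345678, WIDTHS["word"])))        # high word is dropped
print(hex(truncate(0x1_0000_0001, WIDTHS["doubleword"])))
```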

Data stored in a computer is treated as either signed or unsigned, as shown in Figure 2:

Unsigned numbers: the bits are read directly as the value.

Signed numbers (4-bit example):
Positive: 0 1 2 3 4 5 6 7 represent 0 to 7.
Negative: F E D C B A 9 8 represent -1 -2 -3 -4 -5 -6 -7 -8.
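The signed mapping is the two's complement scheme: when the top bit is set, the value is the unsigned reading minus 2^width. A small Python sketch, with the hypothetical helper name `as_signed` and a default width of 4 bits chosen to match the example:

```python
def as_signed(value: int, bits: int = 4) -> int:
    """Interpret the lowest `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1          # keep only the bits that fit
    sign_bit = 1 << (bits - 1)
    if value & sign_bit:              # top bit set: the number is negative
        return value - (1 << bits)
    return value


for raw in range(16):
    print(f"{raw:X} -> unsigned {raw:2d}, signed {as_signed(raw):2d}")
```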

CPU registers

The CPU contains a control unit, an arithmetic unit, and registers. Registers hold data temporarily.

On ARM64 CPUs, a register name beginning with X denotes a 64-bit register, and a name beginning with W denotes a 32-bit register. There are no 16-bit or 8-bit general-purpose registers to access. A 32-bit register is the lower 32 bits of the corresponding 64-bit register and does not exist independently.

  • For programmers, registers are the most important part of the CPU: changing the contents of registers is how the CPU is controlled
  • The number and structure of registers differ from one CPU to another

General-purpose registers

A general-purpose register, also called a data/address register, is typically used for temporary storage, accumulation, counting, and holding addresses during computation. These registers mainly hold the operands of CPU instructions and serve as ordinary variables inside the CPU.

ARM64 has 31 64-bit general-purpose registers, X0 through X30, plus the zero register XZR, for 32 register names in total. Some of these general-purpose registers are conventionally reserved for specific purposes.

  • W0 through W30 are the 32-bit views. A 64-bit CPU remains compatible with 32-bit operations by using only the lower 32 bits of a 64-bit register. For example, w0 is the lower 32 bits of x0.
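The relationship between an X register and its W view can be modeled in Python. This is a sketch rather than an emulator: the class `XRegister` is invented for illustration, and the zeroing of the upper 32 bits on a W write is AArch64 behavior that the text above does not spell out.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


class XRegister:
    """One 64-bit register with a 32-bit W view onto its low half."""

    def __init__(self, value: int = 0):
        self.x = value & MASK64

    @property
    def w(self) -> int:
        # Reading Wn returns the lower 32 bits of Xn.
        return self.x & MASK32

    @w.setter
    def w(self, value: int) -> None:
        # Writing Wn replaces the low half and clears the upper 32 bits.
        self.x = value & MASK32


x0 = XRegister(0x1122_3344_5566_7788)
print(hex(x0.w))      # low half of x0
x0.w = 0xFFFF_FFFF
print(hex(x0.x))      # upper half is now zero
```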

Typically, the CPU first loads data from memory into a general-purpose register and then operates on the data in that register.

PC Register (Program Counter)

  • The PC is the instruction pointer register; it holds the address of the instruction the CPU is currently fetching
  • In memory or on disk, instructions and data are indistinguishable: both are just binary information
  • The CPU treats some of that information as instructions and some as data, giving the same bits different meanings
    • For example, take the four bytes 1110 0000 0000 0011 0000 1000 1010 1010
    • They can be read as the data 0xE00308AA
    • They can also be read as the instruction mov x0, x8 (ARM64 instruction words are stored little-endian, so the word is 0xAA0803E0)
  • On what basis does the CPU interpret information in memory as instructions?
    • The CPU treats the contents of the memory location that the PC points to as an instruction
    • If something in memory has been executed by the CPU, the PC must have pointed to its memory location at some point
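The two readings of the same four bytes can be reproduced in Python. The sketch below is an assumption-laden illustration, not a disassembler: it recognizes only the register form of MOV between X registers (an alias of ORR with XZR), the names `MOV_REG_MASK`, `MOV_REG_BITS`, and `decode` are invented here, and register number 31 is not given special treatment.

```python
raw = bytes([0xE0, 0x03, 0x08, 0xAA])       # the four bytes from the example

as_data = int.from_bytes(raw, "big")        # bytes read in the order written
word = int.from_bytes(raw, "little")        # ARM64 instruction words are little-endian

# Fixed bits of "mov Xd, Xm"; Rd (bits 0-4) and Rm (bits 16-20) are variable.
MOV_REG_MASK = 0xFFE0FFE0
MOV_REG_BITS = 0xAA0003E0


def decode(instruction: int):
    """Return assembly text for a register-to-register MOV, else None."""
    if (instruction & MOV_REG_MASK) == MOV_REG_BITS:
        rd = instruction & 0x1F
        rm = (instruction >> 16) & 0x1F
        return f"mov x{rd}, x{rm}"
    return None


print(hex(as_data))
print(hex(word))
print(decode(word))
```

Which reading applies depends only on whether the PC ever points at those bytes.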

Floating point and vector registers

Because floating-point numbers are stored and operated on differently from integers, the CPU provides floating-point registers to handle them.

  • Floating-point registers: 64-bit D0-D31, 32-bit S0-S31
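To see why floating-point storage differs from integer storage, the bit patterns can be inspected with Python's `struct` module. The sketch assumes the IEEE 754 single and double formats, which are what a 32-bit S register and a 64-bit D register hold; the helper names `single_bits` and `double_bits` are hypothetical.

```python
import struct


def single_bits(value: float) -> int:
    """Bit pattern of `value` as a 32-bit IEEE 754 single (S-register size)."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def double_bits(value: float) -> int:
    """Bit pattern of `value` as a 64-bit IEEE 754 double (D-register size)."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


print(f"{single_bits(1.0):#010x}")    # sign, exponent, and fraction fields, not the integer 1
print(f"{double_bits(1.0):#018x}")
print(f"{single_bits(-2.0):#010x}")
```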

Current CPUs also support vector computing, which is heavily used in graphics processing, and provide vector registers for it.

  • Vector registers: 128-bit V0-V31

The cache

The A11 ARM processor in the iPhone X has a 64 KB level 1 cache and an 8 MB level 2 cache.

Before executing an instruction, the CPU must read it from memory. Register access is much faster than memory access, so the CPU integrates a cache to improve performance. While a program runs, the code and data it is about to use are copied into the cache (this is handled by the cache hardware), and the CPU then reads instructions from the cache and executes them.

The BL instruction

  • Where the CPU fetches instructions from is determined by the contents of the PC, so changing the PC lets us steer the CPU to a target instruction
  • ARM64 provides the MOV instruction (a data-transfer instruction) for changing the values of most registers, for example mov x0,#10 and mov x1,#20
  • However, MOV cannot be used to set the PC; ARM64 does not provide that capability

ARM64 provides other instructions for modifying the PC. These are collectively called branch instructions, and one of the simplest is BL (branch with link).

Practicing the BL instruction

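.text
.global _A,_B
_A:
mov x0,#0x0000      // x0 = 0x0000
mov x1,#0xffff      // x1 = 0xffff
add x0,x1,#0x00ff   // x0 = x1 + 0x00ff = 0x100fe
mov x1,x0           // x1 = 0x100fe
bl _B               // x30 (LR) = address of the next instruction, then jump to _B
mov x0,#0x00        // execution resumes here after _B returns; x0 = 0
ret                 // returns via x30, which bl overwrote; real code saves x30 before bl


_B:
add x0,x0,#0x00     // x0 = x0 + 0, so x0 is unchanged
ret                 // return to the address in x30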

1. mov x0,#0x0000 ; x0 = 0x0000
2. mov x1,#0xffff ; x1 = 0xffff
3. add x0,x1,#0x00ff ; add 0x00ff to x1 and store the result in x0, so x0 = 0x100fe
4. mov x1,x0 ; copy x0 into x1, so x1 = 0x100fe
5. bl _B ; jump to _B, which adds 0 to x0, so x0 is still 0x100fe when _B returns
6. mov x0,#0x00 ; assign 0x00 to x0, so x0 = 0x00

So x0 ends up as 0.
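The register values can be checked by stepping through the program with a toy interpreter. The Python sketch below is a simplification made for illustration: list indexes stand in for addresses, only mov, add, bl, and ret are modeled, `lr` plays the role of x30, and the run stops when it reaches the final ret of _A. The names `PROGRAM`, `END_OF_A`, and `run` are hypothetical.

```python
MASK64 = (1 << 64) - 1

# Each tuple is one instruction; the list index stands in for its address.
PROGRAM = [
    ("mov", "x0", 0x0000),        # 0: _A
    ("mov", "x1", 0xFFFF),        # 1
    ("add", "x0", "x1", 0x00FF),  # 2
    ("mov", "x1", "x0"),          # 3
    ("bl", 7),                    # 4: call _B
    ("mov", "x0", 0x00),          # 5
    ("ret",),                     # 6: end of _A, where this sketch stops
    ("add", "x0", "x0", 0x00),    # 7: _B
    ("ret",),                     # 8
]
END_OF_A = 6


def run(program, stop_at):
    """Execute until the PC reaches `stop_at`; return registers and a trace."""
    regs = {"x0": 0, "x1": 0, "lr": 0}
    pc = 0
    trace = []
    while pc != stop_at:
        current = pc
        op, *args = program[pc]
        if op == "mov":
            dst, src = args
            regs[dst] = regs[src] if isinstance(src, str) else src
            pc += 1
        elif op == "add":
            dst, src, imm = args
            regs[dst] = (regs[src] + imm) & MASK64
            pc += 1
        elif op == "bl":
            regs["lr"] = pc + 1   # remember where to come back to
            pc = args[0]          # then change the PC to the target
        elif op == "ret":
            pc = regs["lr"]       # jump back to the saved address
        trace.append((current, regs["x0"], regs["x1"]))
    return regs, trace


regs, trace = run(PROGRAM, END_OF_A)
for index, x0, x1 in trace:
    print(f"{index}: x0={x0:#x} x1={x1:#x}")
```

The trace shows the PC jumping from index 4 to 7 on bl and from 8 back to 5 on ret, which is the only way this program changes the PC out of sequence.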