This is the first day of my participation in the Gwen Challenge in November.
In reverse engineering, static analysis is a very important step. The executable that an app installs on a phone is essentially a binary file, and the iPhone's CPU executes binary instructions, so static analysis is built on analyzing that binary.
1. Get to know assembly
Programming languages evolved from machine language to assembly language to high-level languages. The C/C++/Java/Objective-C/Swift code we write today still passes through these layers before it runs on a device:
- Assembly language and machine language correspond one to one: each machine instruction has a corresponding assembly instruction
- Assembly language can be assembled into machine language, and machine language can be disassembled into assembly language
- High-level languages can be compiled into assembly language or machine language, but it is almost impossible to restore assembly language or machine language to the original high-level code
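The one-to-one mapping described above can be sketched with a toy assembler and disassembler. This is only an illustration: the `ENCODINGS` table and the `assemble`/`disassemble` helpers are hypothetical names made up for this example, and the table covers just three fixed ARM64 encodings. A real assembler computes the encoding from the operand fields instead of looking it up.

```python
import struct

# Toy lookup table: a few ARM64 (AArch64) instructions and their fixed
# 32-bit encodings. Illustrative only; real tools build the encoding
# from the opcode and operand fields.
ENCODINGS = {
    "nop": 0xD503201F,
    "ret": 0xD65F03C0,
    "mov x0, #0": 0xD2800000,
}
DECODINGS = {word: text for text, word in ENCODINGS.items()}


def assemble(lines):
    """Turn assembly lines into little-endian machine code bytes."""
    # Mnemonics are lowercased first, so MOV and mov are treated the same.
    return b"".join(
        struct.pack("<I", ENCODINGS[line.strip().lower()]) for line in lines
    )


def disassemble(code):
    """Turn machine code bytes back into assembly lines, 4 bytes at a time."""
    words = struct.unpack("<%dI" % (len(code) // 4), code)
    return [DECODINGS[word] for word in words]


program = ["MOV X0, #0", "RET"]
machine_code = assemble(program)
print(machine_code.hex())         # the bytes the CPU actually executes
print(disassemble(machine_code))  # the same program, recovered as text
```

Going from bytes back to assembly needs nothing more than the same table read in reverse. Recovering the original high-level source is a different problem, because variable names, types, and structure are no longer present in the bytes.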
1.1 Characteristics of assembly language
- It can directly access and control hardware such as memory and the CPU, making full use of what the hardware can do
- It gives complete control over the generated binary code, free of the compiler's limitations
- The object code is short, occupies little memory, and executes fast
- Assembly instructions are mnemonics for machine instructions and correspond to them one to one. Each CPU has its own machine instruction set and assembly instruction set, so assembly language is not portable
- There is a lot to learn: developers need to understand the CPU and other hardware, and assembly is hard to write, debug, and maintain
- It is case insensitive; for example, mov is the same as MOV
1.2 Purpose of assembly
- Writing drivers and operating systems (such as some key parts of the Linux kernel)
- Programs or code snippets that need high performance, which can be mixed with a high-level language (inline assembly)
- Software security
  - Virus analysis and prevention
  - Reverse engineering, packing and unpacking, cracking, game cheats, antivirus evasion, encryption and decryption, vulnerability research, hacking
- The best starting point and most efficient way to understand the entire computer system
- Laying the foundation for writing efficient code
- Getting to the bottom of how code works
  - What is the nature of a function?
  - How is a method executed at the lowest level?
  - What does the compiler really do for us?
  - What key details of DEBUG and RELEASE modes have we overlooked?
1.3 Types of assembly languages
- The assembly languages most often discussed today are:
  - 8086 assembly (the 8086 processor is a 16-bit CPU)
  - Win32 assembly
  - Win64 assembly
  - ARM assembly (embedded devices, Mac, iOS)
- The iPhone uses ARM assembly, but the exact architecture varies from device to device depending on the CPU
Architecture | Devices |
---|---|
armv6 | iPhone, iPhone 3G, iPod touch (1st and 2nd generation) |
armv7 | iPhone 3GS, iPhone 4, iPhone 4S, iPad, iPad 2, iPad 3 (The New iPad), iPad mini, iPod touch (3rd and 4th generation) |
armv7s | iPhone 5, iPhone 5c, iPad 4 (iPad with Retina Display) |
arm64 | iPhone 5s, iPhone X, iPad Air, iPad mini 2 |
2. Some essential common sense
To learn assembly well, we must first understand the hardware, such as the CPU, and how an app or program is executed. The hardware that matters most is the CPU and memory, because most assembly instructions operate on one or the other.
2.1 The bus
Each CPU chip has a number of pins connected to buses, through which the CPU interacts with the outside world. A bus is a collection of wires, and there are three kinds:
- Address bus: its width determines the addressing capability of the CPU. For example, the address bus of the 8086 is 20 bits wide, so it can address 2^20 = 1M memory units.
- Data bus: its width determines how much data the CPU can transfer at a time, and therefore the speed of data transfer. For example, the data bus of the 8086 is 16 bits wide, so at most 2 bytes can be transferred at a time.
- Control bus: its width determines how many kinds of control the CPU can exert over other devices.
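The two 8086 figures above follow directly from the bus widths. Here is a minimal sketch of the arithmetic; the function names are made up for this example:

```python
def addressable_units(address_bus_width):
    """Number of distinct memory units an address bus of this width can select."""
    return 2 ** address_bus_width


def bytes_per_transfer(data_bus_width):
    """Number of whole bytes a data bus of this width moves in one transfer."""
    return data_bus_width // 8


units_8086 = addressable_units(20)
print(units_8086)               # 1048576 memory units
print(units_8086 // 1024 ** 2)  # 1, i.e. 1M units
print(bytes_per_transfer(16))   # 2 bytes per transfer
```

Each extra address line doubles the addressable range, which is why a 32-bit address bus reaches 4G units while the 20-bit bus of the 8086 stops at 1M.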
2.2 Memory
The size of the memory address space is limited by the width of the CPU's address bus. The address bus of the 8086 is 20 bits wide, so it can locate 2^20 different memory units (addresses 0x00000 to 0xFFFFF), which gives the 8086 a 1 MB address space. Addresses 0x00000 to 0x9FFFF are main memory, which can be read and written. Addresses 0xA0000 to 0xBFFFF are video memory, which can also be read and written; data written here is output to the display by the video card. Addresses 0xC0000 to 0xFFFFF store hardware or system information and are read-only.
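The 8086 memory layout described above can be written down as a small lookup table. This is a sketch for illustration only; `MEMORY_MAP` and `region_of` are hypothetical names, and the region labels simply paraphrase the text.

```python
# (start, end, description) for each region of the 1 MB address space.
MEMORY_MAP = [
    (0x00000, 0x9FFFF, "main memory (read/write)"),
    (0xA0000, 0xBFFFF, "video memory (read/write)"),
    (0xC0000, 0xFFFFF, "hardware/system information (read-only)"),
]


def region_of(address):
    """Return the description of the region that contains this address."""
    for start, end, description in MEMORY_MAP:
        if start <= address <= end:
            return description
    raise ValueError("address 0x%X is outside the 20-bit address space" % address)


# Size of each region in KB; together they cover the full 1024 KB.
sizes_kb = [(end - start + 1) // 1024 for start, end, _ in MEMORY_MAP]
print(sizes_kb)            # [640, 128, 256]
print(region_of(0xB8000))  # video memory (read/write)
```

The three regions add up to exactly 1024 KB, matching the 2^20 units that a 20-bit address bus can reach.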