iOS Reverse Development (21): Assembly Basics

1. Introduction to assembly

Why do I need to learn assembly?

I learned assembly language at university and found it hard to follow. Since then I have mostly used high-level languages at work and have rarely touched assembly directly. Still, understanding assembly gives you a better grasp of why high-level functions are written the way they are and how to write them for maximum performance. When debugging, you can set a symbolic breakpoint and step through the assembly code directly. If you can read assembly, much of this becomes easy.

In reverse engineering, static analysis is a crucial step. The executable that an app installs on a phone is a binary file, because the CPU in the iPhone ultimately executes binary instructions. Static analysis therefore means analyzing that binary. Once we understand assembly, we can locate and call the relevant code ourselves from the disassembly.

The lower the level, the simpler things are! Assembly is a language every serious programmer should know.

What is assembly?

An assembly instruction is the symbolic form of a machine instruction: its opcode is written as a mnemonic, and its address fields are written directly as labels, variable names, constants, and so on. The assembler translates assembly instructions into machine instructions, and the two correspond essentially one to one. Assembler directives (also called pseudo-instructions) give the assembler user-defined symbols, data types, the size of data areas, the format and load location of the object program, and similar information; their job is to tell the assembler how to assemble. Source code written in assembly language must be converted into executable machine code by the matching assembler. This process is called assembling.

1.1 History of assembly language

The development of assemblers

The first prototype assembler was developed on EDSAC (Electronic Delay Storage Automatic Calculator). In that system, each instruction in a user program consisted of a single-letter instruction code, a decimal address, and a terminating letter. The first assembler was the Symbolic Optimal Assembly Program (SOAP), developed for the IBM 650 in the mid-1950s. That computer used a magnetic drum as its memory, and each instruction specified the drum position of its successor. SOAP was originally written not to introduce symbolic assembly language but to place instructions sensibly on the drum and so make programs run faster. The Symbolic Assembly Program (SAP) for the IBM 704 was an important milestone: most later assemblers were modeled on it, and their main characteristics have not changed fundamentally. As software developed and spread, assemblers absorbed ideas from macro processors, high-level language translators, and other systems, which led to macro assemblers and high-level assemblers.

Machine language

Machine instructions are made up of zeros and ones:
Add: 0100 0000
Subtract: 0100 1000
Multiply: 1111 0111 1110 0000
Divide: 1111 0111 1111 0000

Assembly language

Mnemonics replace the machine code, for example:
Add: INC EAX assembles to 0100 0000
Subtract: DEC EAX assembles to 0100 1000
Multiply: MUL EAX assembles to 1111 0111 1110 0000
Divide: DIV EAX assembles to 1111 0111 1111 0000

High-level programming languages

C, C++, Java, Objective-C, and Swift are closer to natural human language. For example, in C:
Add: A+B compiles to 0100 0000
Subtract: A-B compiles to 0100 1000
Multiply: A*B compiles to 1111 0111 1110 0000
Divide: A/B compiles to 1111 0111 1111 0000

Our code on a terminal device looks like this:

Assembly language and machine language correspond one to one: every machine instruction has a matching assembly instruction.

Assembly language can be assembled into machine language, and machine language can be disassembled back into assembly language.

High-level languages can be compiled into assembly language or machine language, but assembly or machine language can almost never be restored to the original high-level source.

1.2 Features of Assembly

It can directly access and control hardware such as memory and the CPU, making full use of the hardware's capabilities.
It gives complete control over the generated binary code, free from the limitations of a compiler.
The object code is short, uses little memory, and executes quickly.
Assembly instructions are mnemonics for machine instructions and correspond to them one to one. Each CPU has its own machine instruction set and assembly instruction set, so assembly language is not portable.
There is a lot to learn: developers need to understand the CPU and other hardware, and assembly code is hard to write, debug, and maintain.
Mnemonics are case-insensitive; for example, mov is the same as MOV.

1.3 Types of assembly languages

Several assembly languages are commonly discussed today:

8086 assembly (the 8086 processor is a 16-bit CPU)

Win32 assembly

Win64 assembly

ARM Assembly (Embedded, Mac, iOS)

iPhones use ARM assembly, but the CPU architecture differs from device to device.

Architecture	Devices
armv6	iPhone, iPhone 3G, iPod Touch (1st and 2nd generation)
armv7	iPhone 3GS, iPhone 4, iPhone 4S, iPad, iPad 2, iPad 3 (The New iPad), iPad mini, iPod Touch 3G, iPod Touch 4
armv7s	iPhone 5, iPhone 5C, iPad 4 (iPad with Retina Display)
arm64	iPhone 5S, iPhone 13, iPad Air, iPad mini 2

1.4 Types of assembler

Simple assembler

A simple assembler is also called a “load-and-go” assembler. It is widely used because of its simplicity. Its distinguishing feature is that the assembled machine-language program is placed directly in memory, ready to run. The storage locations occupied by the object program are fixed at assembly time and cannot be changed later, so this approach cannot combine several independently assembled subroutines into one program; it can only call library subroutines whose locations do not conflict with the object program.

Modular assembler

The modular assembler was developed to support modular programming. Besides overcoming the shortcomings of the simple assembler, it lets different program modules be designed, coded, and debugged in parallel, and only the affected modules need to change when the program changes. Each assembled module is called an object module, and a linker combines multiple object modules into a complete executable program.

Conditional assembler

The main feature of a conditional assembler is that it can select which program segments to assemble. It suits writing configurable programs or packages, so that software can be tailored to the user's needs and the device configuration. Such assemblers usually introduce directives like “conditional branch” and “branch”, which selectively assemble program segments or steer the assembler's processing path according to conditions the user specifies.

Macro assembler

The main feature of a macro assembler is that it adds macro processing to the assembler, so users can define and use macro instructions easily. It suits code that appears in many places, follows a fixed format, and varies only in a few parameters. This shortens the program and improves readability, and it also means the format of those code fragments can be changed by editing the macro definition once rather than every use.

High-level assembler

A high-level assembler uses high-level language statements to express control structure. It keeps the advantages of assembly language while borrowing the ease of writing and reading of high-level languages. The user writes the control part of the program with high-level control statements (conditional statements, loops, functions, and procedures), but can still use assembly language directly to control storage allocation, access hardware registers, and express algorithms that are awkward in a high-level language. The first high-level assembler was PL/360, developed by N. Wirth for the IBM 360. In PL/360, the control part of a program is written with high-level control statements and the data-processing part with IBM 360 assembly instructions. Since then there have been ALGOL-like assemblers and the FORTRAN-like assembler FAT.

2. Essential background for assembly

2. Compile the necessary knowledge

To learn assembly well, we first need to understand hardware such as the CPU and how an app or program is executed. The most important hardware here is the CPU and memory: in assembly, most instructions operate on the CPU and memory.

2.1 App execution process

How is an app or program executed?

2.2 The bus

Each CPU chip has many pins. The pins connect to buses, through which the CPU interacts with external devices.
Bus: a collection of wires
Bus classification

The address bus

The data bus

The control bus

Example 1: The CPU reads data from memory unit 3
The address bus

Its width determines the addressing capability of the CPU

The address bus of the 8086 is 20 bits wide, so its addressing capability is 1 MB (2^20 bytes)

The data bus

Its width determines how much data the CPU can transfer at a time, and therefore the data transfer speed

The data bus of the 8086 is 16 bits wide, so it can transfer up to 2 bytes at a time

The control bus

Its width determines how many kinds of control the CPU can exert over other devices.

Example 2:

A CPU with an addressing capability of 8 KB has an address bus width of __13__.

The address bus widths of the 8080, 8088, 80286, and 80386 are 16, 20, 24, and 32 respectively. Their addressing capabilities are therefore __64__ KB, __1__ MB, __16__ MB, and __4__ GB.

The data bus widths of the 8080, 8088, 8086, 80286, and 80386 are 8, 8, 16, 16, and 32 respectively. The amount of data each can transfer at a time is therefore __1__ B, __1__ B, __2__ B, __2__ B, and __4__ B.

To read 1024 bytes of data from memory, the 8086 must perform at least __512__ reads and the 80386 at least __256__ reads.
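The arithmetic behind Example 2 can be sketched in C. This is only an illustration: the function names are made up for this example, and it assumes byte-addressed memory and a data bus width that is a multiple of 8 bits.

```c
#include <stdint.h>

/* Bytes addressable with an address bus of `width` lines (width < 64). */
uint64_t addressable_bytes(unsigned width) {
    return (uint64_t)1 << width;
}

/* Smallest address bus width that can address `bytes` byte locations. */
unsigned address_bus_width(uint64_t bytes) {
    unsigned width = 0;
    while (width < 63 && ((uint64_t)1 << width) < bytes) {
        width++;
    }
    return width;
}

/* Minimum number of transfers needed to move `total_bytes` over a data bus
 * that is `data_bus_bits` wide (assumed to be a multiple of 8, at least 8). */
uint64_t transfers_needed(uint64_t total_bytes, unsigned data_bus_bits) {
    uint64_t bytes_per_transfer = data_bus_bits / 8;
    return (total_bytes + bytes_per_transfer - 1) / bytes_per_transfer;
}
```

For instance, addressable_bytes(20) gives 1048576 (1 MB) for the 8086, and transfers_needed(1024, 16) gives 512.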

2.3 Memory

The size of the memory address space is limited by the width of the CPU's address bus. The address bus of the 8086 is 20 bits wide, so it can locate 2^20 different memory units (addresses 0x00000 to 0xFFFFF). The memory space of the 8086 is therefore 1 MB.

0x00000 to 0x9FFFF: Main memory. Readable and writable.

0xA0000 to 0xBFFFF: Video memory. Data written here is output to the display by the video card. Readable and writable.

0xC0000 to 0xFFFFF: Stores various hardware and system information. Read-only.

2.4 The CPU

The CPU contains a control unit, arithmetic units, and registers. Registers hold data temporarily.
The CPU computes very quickly. For performance, it has small temporary storage areas of its own and copies data from memory into them before operating on it. These small storage areas are called registers.
On ARM64 CPUs, a register name beginning with X denotes a 64-bit register and a name beginning with W denotes a 32-bit register. There are no 16-bit or 8-bit registers to access. A 32-bit register is the lower 32 bits of the corresponding 64-bit register and does not exist independently.

2.4.1 Caching

The A11 ARM processor in the iPhone X has a 64 KB level 1 cache and an 8 MB level 2 cache.
Before executing an instruction, the CPU reads it from memory into the CPU. Registers are much faster than memory reads and writes, so for performance the CPU also integrates a cache. When a program runs, the code and data it is about to use are copied into the cache (this happens automatically, without the program's involvement), and the CPU fetches instructions from the cache to execute them.

2.5 Number bases

Many people struggle with number bases because they always think about other bases in terms of decimal and convert to decimal before doing any arithmetic. That approach is wrong. Why convert to decimal at all? Only because decimal is the base we know best. Every base is complete in itself, and the best way to learn a base is to forget about decimal and about converting between bases!

2.5.1 Definition of base

Octal consists of eight symbols: 0, 1, 2, 3, 4, 5, 6, 7

Decimal consists of ten symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Base N consists of N symbols and carries one every N

Under what circumstances does 1 + 1 equal 3?

Suppose decimal is defined with these ten symbols: 0, 1, 3, 2, 8, A, B, E, S, 7

With decimal defined this way, 1 + 1 = 3. Exactly right!

Decimal consists of ten symbols and carries one every ten; the symbols themselves can be customized!

Conventional decimal notation and custom decimal notation differ only in their symbols. If we do not tell anyone our symbol table, they cannot recover our actual data. So custom decimal symbols can be used as a form of encryption!

2.5.2 Base multiplication table

If you were given a base-8 problem and only a pen, how long would it take you to work it out?

Work out the following base-8 sums, differences, products, and quotients: 2 + 3 = __, 2 * 3 = __, 4 + 5 = __, 4 * 5 = __, 277 + 333 = __, 276 * 54 = __, 237 - 54 = __, 234 / 4 = __.

If these 8 problems were in base 10, I am sure many readers could solve them quickly, because we memorized the multiplication table in primary school. Without the multiplication table, multiplying and dividing in base 10 would be hard too.
So to calculate in other bases, we need their addition and multiplication tables as well.

2.5.3 Arithmetic in other bases

Octal addition table

Counting in octal: 0 1 2 3 4 5 6 7 10 11 12 13 14 15 16 17 20 21 22 23 24 25 26 27 …

1+1 = 2
1+2 = 3	2+2 = 4
1+3 = 4	2+3 = 5	3+3 = 6
1+4 = 5	2+4 = 6	3+4 = 7	4+4 = 10
1+5 = 6	2+5 = 7	3+5 = 10	4+5 = 11	5+5 = 12
1+6 = 7	2+6 = 10	3+6 = 11	4+6 = 12	5+6 = 13	6+6 = 14
1+7 = 10	2+7 = 11	3+7 = 12	4+7 = 13	5+7 = 14	6+7 = 15	7+7 = 16

Octal multiplication table

Counting in octal: 0 1 2 3 4 5 6 7 10 11 12 13 14 15 16 17 20 21 22 23 24 25 26 27 …

1*1 = 1
1*2 = 2	2*2 = 4
1*3 = 3	2*3 = 6	3*3 = 11
1*4 = 4	2*4 = 10	3*4 = 14	4*4 = 20
1*5 = 5	2*5 = 12	3*5 = 17	4*5 = 24	5*5 = 31
1*6 = 6	2*6 = 14	3*6 = 22	4*6 = 30	5*6 = 36	6*6 = 44
1*7 = 7	2*7 = 16	3*7 = 25	4*7 = 34	5*7 = 43	6*7 = 52	7*7 = 61

Example 3: The four operations

277 + 333 = __
236 - 54 = __
276 * 54 = __
234 / 4 = __
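To check hand calculations like those in Example 3, a small C sketch can do the arithmetic on octal operands. The helper names parse_octal and format_octal are invented for this illustration; they rely only on the standard strtoul and snprintf functions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a string of octal digits, such as "277", as a number. */
unsigned long parse_octal(const char *digits) {
    return strtoul(digits, NULL, 8);
}

/* Write `value` as octal digits into `out` (at most out_size - 1 digits). */
void format_octal(unsigned long value, char *out, size_t out_size) {
    snprintf(out, out_size, "%lo", value);
}
```

With these helpers, formatting parse_octal("277") + parse_octal("333") yields 632, 236 - 54 yields 162, 276 * 54 yields 20250, and 234 / 4 yields 47, all in octal. The same results can be reached by hand with the tables above.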

2.5.4 Binary

Binary shorthand

Binary: 1 0 1 1 1 0 1 1 1 1 0 0
Groups of three bits: 101 110 111 100, which is octal 5 6 7 4
Groups of four bits: 1011 1011 1100, which is hexadecimal B B C

Binary, counting from 0 to 1111: 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
Binary is too cumbersome to write, so each group of four bits is replaced by a simpler symbol: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. That is hexadecimal.
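The grouping shown above can be verified with a short C sketch. The function parse_binary is a hypothetical helper written for this example; it accepts a string of 0s and 1s and skips any other character, so the bits may be written in groups.

```c
/* Read a string of binary digits as a number.
 * Characters other than '0' and '1' (for example spaces) are skipped. */
unsigned long parse_binary(const char *bits) {
    unsigned long value = 0;
    for (; *bits != '\0'; ++bits) {
        if (*bits == '0' || *bits == '1') {
            value = (value << 1) | (unsigned long)(*bits - '0');
        }
    }
    return value;
}
```

parse_binary("1011 1011 1100") equals both the hexadecimal constant 0xBBC and the octal constant 05674, which is the same bit pattern grouped by four and by three bits.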

2.5.6 Data width

Numbers in mathematics have no size limit and can be arbitrarily large. In a computer, however, hardware constraints give data a limited length (called the data width), and anything beyond the maximum width is discarded.

    #import <UIKit/UIKit.h>
    #import "AppDelegate.h"

    int test() {
        // 0x1FFFFFFFF needs 33 bits, so it does not fit in a 32-bit int:
        // the bit beyond the width is discarded.
        int cTemp = 0x1FFFFFFFF;
        return cTemp;
    }

    int main(int argc, char * argv[]) {
        printf("%x\n", test());
        @autoreleasepool {
            return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
        }
    }
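The same truncation can be shown with fixed-width unsigned types, where the C language defines the result precisely. This is a minimal sketch; the function names are chosen only for this illustration.

```c
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit value; higher bits are discarded. */
uint32_t keep_low_32(uint64_t value) {
    return (uint32_t)value;
}

/* Keep only the low 8 bits of a 32-bit value. */
uint8_t keep_low_8(uint32_t value) {
    return (uint8_t)value;
}
```

keep_low_32(0x1FFFFFFFF) gives 0xFFFFFFFF: the 33rd bit does not fit in the 32-bit width and is lost, just as in the example above.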

2.5.7 Common data widths in computers

Bit: a bit is one binary digit, 0 or 1.
Byte: a byte consists of eight bits. It is the smallest unit of memory.
Word: a word consists of two bytes (16 bits), called the high byte and the low byte.
Doubleword: a doubleword consists of two words (32 bits).

Unsigned numbers convert directly! Signed numbers (4 bits):
Positive numbers: 0 1 2 3 4 5 6 7
Negative numbers: F E D C B A 9 8 stand for -1 -2 -3 -4 -5 -6 -7 -8
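The signed reading of a 4-bit value described above is two's complement, and can be sketched in C as follows. The function name signed4 is an assumption made for this example.

```c
/* Interpret the low 4 bits of `nibble` as a signed (two's complement) number.
 * 0x0..0x7 map to 0..7, and 0x8..0xF map to -8..-1. */
int signed4(unsigned nibble) {
    nibble &= 0xFu;
    return nibble >= 8u ? (int)nibble - 16 : (int)nibble;
}
```

So signed4(0xF) is -1 and signed4(0x8) is -8, while signed4(0x7) stays 7.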

2.5.8 Customizing Base Symbols

Example 4: Decimal is now defined with the 10 symbols 2, 9, 1, 7, 6, 5, 4, 8, 3, A, carrying one every 10. Then 123 + 234 = ____

Decimal: 0 1 2 3 4 5 6 7 8 9
Custom: 2 9 1 7 6 5 4 8 3 A
92 99 91 97 96 95 94 98 93 9A
12 19 11 17 16 15 14 18 13 1A
72 79 71 77 76 75 74 78 73 7A
62 69 61 67 66 65 64 68 63 6A
52 59 51 57 56 55 54 58 53 5A
42 49 41 47 46 45 44 48 43 4A
82 89 81 87 86 85 84 88 83 8A
32 39 31 37 36 35 34 38 33 3A
A2 A9 A1 A7 A6 A5 A4 A8 A3 AA
922

Here you could simply convert to ordinary base 10 and look the answer up! In other bases, though, we cannot convert so easily, so we have to learn to work from the table.

Example 5: Now there are 9 symbols, 2, 9, 1, 7, 6, 5, 4, 8, 3, carrying one every 9. Then 123 + 234 = ____

Digit value: 0 1 2 3 4 5 6 7 8
Custom: 2 9 1 7 6 5 4 8 3
92 99 91 97 96 95 94 98 93
12 19 11 17 16 15 14 18 13
72 79 71 77 76 75 74 78 73
62 69 61 67 66 65 64 68 63
52 59 51 57 56 55 54 58 53
42 49 41 47 46 45 44 48 43
82 89 81 87 86 85 84 88 83
32 39 31 37 36 35 34 38 33
922
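Examples 4 and 5 can be checked mechanically. The sketch below is one possible approach, not the author's method: it treats the symbol table as a string whose position i holds the symbol for digit value i, and takes the base from the table length. All function names are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Value of `text` written with the custom symbols; -1 on an unknown symbol. */
long custom_decode(const char *symbols, const char *text) {
    long base = (long)strlen(symbols);
    long value = 0;
    for (; *text != '\0'; ++text) {
        const char *pos = strchr(symbols, *text);
        if (pos == NULL) {
            return -1;
        }
        value = value * base + (long)(pos - symbols);
    }
    return value;
}

/* Write `value` with the custom symbols into `out`; 0 on success, -1 on error. */
int custom_encode(const char *symbols, long value, char *out, size_t out_size) {
    long base = (long)strlen(symbols);
    char reversed[64];
    size_t n = 0;
    if (value < 0 || base < 2) {
        return -1;
    }
    do {                                   /* collect digits, lowest first */
        reversed[n++] = symbols[value % base];
        value /= base;
    } while (value > 0 && n < sizeof reversed);
    if (n + 1 > out_size) {
        return -1;
    }
    for (size_t i = 0; i < n; ++i) {       /* reverse into the output */
        out[i] = reversed[n - 1 - i];
    }
    out[n] = '\0';
    return 0;
}

/* Add two numbers written with the custom symbols. */
int custom_add(const char *symbols, const char *a, const char *b,
               char *out, size_t out_size) {
    long x = custom_decode(symbols, a);
    long y = custom_decode(symbols, b);
    if (x < 0 || y < 0) {
        return -1;
    }
    return custom_encode(symbols, x + y, out, out_size);
}
```

With the table "291765483A" (Example 4), custom_add on "123" and "234" produces "1A6". With the nine-symbol table "291765483" (Example 5), it produces "725".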

2.6 Registers

2.6.1 Introduction to Registers

What is a register

Registers are small storage areas inside the CPU that temporarily hold the operands and results of operations. A register is a common sequential logic circuit, one that contains only storage elements. Those elements are latches or flip-flops; since one latch or flip-flop stores 1 binary bit, N of them make an N-bit register. Registers are an integral part of the CPU. They are high-speed storage units of limited capacity that can temporarily hold instructions, data, and addresses.

The function of a register is to store binary code, and it is built from flip-flops that have a storage function. One flip-flop stores 1 bit of binary code, so a register that stores n bits needs n flip-flops.

By function, registers are divided into basic registers and shift registers. A basic register can only load data in parallel and output data in parallel. In a shift register, the data can be shifted right or left one bit at a time under the control of a shift pulse. Its data can be loaded and output in parallel, loaded and output serially, loaded in parallel and output serially, or loaded serially and output in parallel.

In a computer, registers are internal components of the CPU and include general-purpose registers, special-purpose registers, and control registers. Registers have very high read and write speeds, so data transfer between registers is very fast.

The components inside the CPU are connected by a bus

For programmers, the most important parts of the CPU are the registers: the CPU is controlled by changing their contents
The number and structure of registers differ from CPU to CPU

Characteristics of registers

A register has at least the following four functions: ① Clear: erase the number currently held in the register. ② Load: store the input number in the register when a load pulse arrives. ③ Hold: keep the stored number unchanged until a new write pulse arrives. ④ Output: present the stored number at the output when an output pulse arrives.

Registers with only these functions are called data registers. Some registers can also shift, and are called shift registers.

Registers can be accessed serially or in parallel. Storing or reading all the bits of a binary number in one step is called parallel mode. Storing an n-bit number into a register, or reading it out, 1 bit at a time is called serial mode. Parallel mode completes the operation with a single clock pulse, so it is fast, but it needs n input and output data lines. Serial mode needs several clock pulses to complete an input or output operation, so it is slower, but it needs only one input or output data line; with so few lines, it suits long-distance transmission.

In a digital circuit, a circuit used to store binary data or code is called a register. Registers are built from flip-flops with a storage function. One flip-flop stores 1 bit of binary code, so a register that stores n bits must be built from n flip-flops.

The flip-flops in a register only need to be able to set to 1 and reset to 0, so level-triggered, pulse-triggered, and edge-triggered flip-flops can all be used to build a register.

The 74HC175 is a 4-bit register built from CMOS edge-triggered flip-flops. By the nature of edge triggering, the state of each flip-flop's output depends only on the state of its D input at the moment the rising edge of CLK arrives. So although the 74LS75 and the 74HC175 are both 4-bit registers, they behave differently because they use different types of flip-flop.

To make registers more flexible, some register circuits add control logic that provides asynchronous clear, three-state output control, and hold. Hold here means that when the CLK signal arrives, the flip-flop does not follow the input at D but keeps its previous state.

In the two register circuits described above, all bits are loaded at the same time and the data in the flip-flops appears at the outputs in parallel, so this mode is called parallel-in, parallel-out.

2.6.2 Register types

2.6.2 Register type

2.6.2.1 General-purpose registers

The 8086 general-purpose register set consists of four 16-bit registers, AX, BX, CX, and DX, which hold 16-bit data or addresses. Each can also be used as two 8-bit registers, written AH, AL, BH, BL, CH, CL, DH, and DL, which hold only 8-bit data, not addresses. These are the high and low 8 bits of AX, BX, CX, and DX respectively. If AX=1234H, then AH=12H and AL=34H. In principle, general-purpose registers are interchangeable across instructions. To shorten instruction encodings, however, the 8086 gives some of them dedicated roles. For example, string instructions must use CX as the count register holding the length of the string. The instruction then does not need to encode CX as an operand, which shortens the string instruction.

AX (AH, AL): accumulator. Some instructions require AX (or AL) as the source or destination register. Input/output instructions must go through AX or AL. For example, reading the contents of port address 43H into the CPU is written IN AL, 43H or IN AX, 43H. The destination operand must be AL or AX, not any other register.
BX (BH, BL): base register. BX can be used as an indirect address register and a base address register; BH and BL can be used as 8-bit general data registers.
CX (CH, CL): count register. CX acts as the counter in loop and string operations, and its contents are modified automatically after the instruction executes, hence the name.
DX (DH, DL): data register. Besides serving as a general-purpose register, it is used as the port address register in I/O instructions and as an auxiliary accumulator in multiply and divide instructions.
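The relationship between a 16-bit register and its two 8-bit halves can be modeled in C with shifts and masks. This is only a model of the AX=1234H example above; the function names are hypothetical.

```c
#include <stdint.h>

/* High 8 bits of a 16-bit register value (AH for AX). */
uint8_t high_byte(uint16_t reg) {
    return (uint8_t)(reg >> 8);
}

/* Low 8 bits of a 16-bit register value (AL for AX). */
uint8_t low_byte(uint16_t reg) {
    return (uint8_t)(reg & 0xFFu);
}

/* Rebuild the 16-bit value from its high and low halves. */
uint16_t join_bytes(uint8_t high, uint8_t low) {
    return (uint16_t)(((unsigned)high << 8) | low);
}
```

For AX = 0x1234, high_byte returns 0x12 and low_byte returns 0x34, matching AH=12H and AL=34H.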

ARM64 has 31 64-bit general-purpose registers, x0 through x30. They usually hold ordinary data, which is why they are called general-purpose registers (some of them also have special uses).

Correspondingly, w0 through w30 are the 32-bit registers. 64-bit CPUs are compatible with 32-bit code, so you can use just the lower 32 bits of a 64-bit register. For example, w0 is the lower 32 bits of x0!

Typically, the CPU loads data from memory into a general-purpose register and then operates on the data in that register
Suppose the red block of memory holds the value 3, and we want to add 1 to it and store the result in the blue block of memory

In pseudo-assembly, the CPU first loads the value of the red memory into register X0 (mov X0, red memory), then adds 1 to X0 (add X0, 1), and finally stores the value into the blue memory (mov blue memory, X0)

2.6.2.2 Segment registers

The 8086/8088 CPU can directly address 1 MB of memory, which requires a 20-bit address, but all of its internal registers are 16 bits wide and can address only 64 KB directly. Segmentation solves this: the 1 MB address space is divided into logical segments of at most 64 KB each, and these segments can be placed anywhere in the address space.
The CPU has four 16-bit segment registers: the code segment register CS, the data segment register DS, the stack segment register SS, and the extra segment register ES. Each gives the starting address of its logical segment, called the “segment base address”. The segment base address is combined with an offset within the segment to form a 20-bit physical address; the offset can come from a register or from memory.
For example, the code segment register CS holds the base address of the current code segment, and the instruction pointer register IP holds the offset of the next instruction to execute. With CS=2000H and IP=001AH, the combined 20-bit address of the storage unit is 2001AH.
The code segment holds executable instructions; the data segment and extra segment hold the data being operated on. Operands are usually in the current data segment, and in string instructions the destination operand must be in the current extra segment. The stack segment is the stack area used while the program runs, accessed first in, last out. Each segment register designates one active segment, and the registers are not interchangeable. When a program is small, the code, data, and stack segments can share one segment, that is, fit within 64 KB. When the program or data exceeds 64 KB, multiple code, data, stack, and extra segments can be defined. The current segment is the one whose address is in the segment register, and the register's contents can be changed to point to another segment. Sometimes, for clarity, an instruction carries a segment override prefix to specify which segment its operand is in.
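The way a segment base and an offset combine into a 20-bit address can be sketched in C. The formula used here (segment shifted left by 4 bits, plus offset, wrapped to 20 bits) is the standard 8086 real-mode rule rather than something spelled out in the text above, and the function name is an assumption for this example.

```c
#include <stdint.h>

/* 8086 real-mode address: segment * 16 + offset, kept to 20 bits. */
uint32_t physical_address(uint16_t segment, uint16_t offset) {
    uint32_t address = ((uint32_t)segment << 4) + offset;
    return address & 0xFFFFFu;   /* the 8086 has only 20 address lines */
}
```

physical_address(0x2000, 0x001A) gives 0x2001A, the value in the CS/IP example above.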

2.6.2.3 Instruction pointer register

PC register (Program Counter)

The PC is the instruction pointer register. It holds the address of the instruction the CPU is currently fetching
In memory or on disk, instructions and data are indistinguishable: both are just binary information
The CPU treats some information as instructions and some as data, giving the same information different meanings

For example, 1110 0000 0000 0011 0000 1000 1010 1010 can be read as the data 0xE00308AA or as the instruction mov x0, x8

On what basis does the CPU interpret information in memory as instructions?

The CPU treats the contents of the memory cell that the PC points to as an instruction. If something in memory has been executed by the CPU, the memory cell holding it must have been pointed to by the PC

The 8086/8088 CPU has a 16-bit instruction pointer register, IP, which holds the offset within the current code segment of the next instruction to execute. While a program runs, the BIU (bus interface unit) updates IP automatically so that it always points to the next instruction. IP therefore controls the flow of instruction execution and is an important register. An 8086 program cannot access IP directly, but certain instructions modify it. For example, on an interrupt or a subroutine call, the 8086 pushes the offset of the next instruction onto the stack to preserve it; when the interrupt handler or subroutine returns, that value is popped from the stack back into IP so the main program can continue. A jump instruction loads the new target address into IP, which transfers control.
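To make the "same bits, two readings" point concrete, the sketch below takes the four bytes E0 03 08 AA from the example, reads them as a little-endian 32-bit word (0xAA0803E0), and pulls out the register fields. It relies on the A64 encoding of mov between registers as ORR Xd, XZR, Xm, which is an architectural detail not covered in the text above; the function names are made up for this illustration.

```c
#include <stdint.h>

/* Combine four bytes in memory order (little-endian) into one 32-bit word. */
uint32_t word_from_bytes(const uint8_t bytes[4]) {
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Register fields of an A64 data-processing (register) instruction. */
unsigned field_rd(uint32_t insn) { return insn & 0x1Fu; }          /* bits 4:0   */
unsigned field_rn(uint32_t insn) { return (insn >> 5) & 0x1Fu; }   /* bits 9:5   */
unsigned field_rm(uint32_t insn) { return (insn >> 16) & 0x1Fu; }  /* bits 20:16 */

/* True if insn is "mov Xd, Xm", i.e. ORR Xd, XZR, Xm with no shift. */
int is_mov_x_register(uint32_t insn) {
    return (insn & 0xFFE0FFE0u) == 0xAA0003E0u;
}
```

For the bytes E0 03 08 AA, field_rd gives 0 and field_rm gives 8, which is why a disassembler shows mov x0, x8. Whether those bytes are data or an instruction depends only on whether the PC ever points at them.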

2.6.2.4 Pointer and index registers

BP (Base Pointer Register): base pointer register.
SP (Stack Pointer Register): stack pointer register.
SI (Source Index Register): source index register.
DI (Destination Index Register): destination index register.
These registers hold offsets within a segment and are used to form operand addresses for stack operations and indexed operations. BP and SP, called the pointer registers, are used together with SS to access the current stack segment. BP is typically used for indirect addressing when the operand is in the stack segment: SS combined with BP forms the operand address, with BP holding the offset of the “base address” of a data area within the current stack segment, hence the name base pointer.
SP is used in stack operations: the PUSH and POP instructions take the offset within the current stack segment from SP, so SP is called the stack pointer, and it always points to the top of the stack.
SI and DI, called the index registers, are commonly used with DS to provide the offset within the current data segment. In string instructions, the offset of the source operand is in SI and the offset of the destination operand is in DI; the two cannot be swapped, or the transfer goes in the wrong direction. In string instructions SI and DI are addressed implicitly: SI is used with DS, and DI is used with ES.

2.6.2.5 Flag register (FR)

The flag register FR is also called the program status word register.
FR is a 16-bit register with 9 significant bits, which hold status flags and control flags. There are six status flags, CF, PF, AF, ZF, SF, and OF, which record status information about the running program. Later instructions often use them as the basis for decisions. There are three control flags, IF, DF, and TF, which control how the CPU operates. They are set explicitly by the program.
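The nine flags can be tested with bit masks. The bit positions below follow the standard 8086 flag layout, which the text above does not list, so treat them as background knowledge added for this sketch; the constant and function names are illustrative.

```c
#include <stdint.h>

/* Bit masks for the nine significant bits of the 8086 flag register. */
enum {
    FLAG_CF = 1 << 0,   /* carry */
    FLAG_PF = 1 << 2,   /* parity */
    FLAG_AF = 1 << 4,   /* auxiliary carry */
    FLAG_ZF = 1 << 6,   /* zero */
    FLAG_SF = 1 << 7,   /* sign */
    FLAG_TF = 1 << 8,   /* trap (control) */
    FLAG_IF = 1 << 9,   /* interrupt enable (control) */
    FLAG_DF = 1 << 10,  /* direction (control) */
    FLAG_OF = 1 << 11   /* overflow */
};

/* Returns 1 if the flag selected by `mask` is set in `fr`, otherwise 0. */
int flag_is_set(uint16_t fr, unsigned mask) {
    return (fr & mask) != 0;
}
```

For example, with fr = 0x0041 both the carry flag and the zero flag are set, and flag_is_set(fr, FLAG_SF) returns 0.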

2.6.2.6 Data Address register

Data address register is usually used for temporary storage, accumulation, counting, address storage and other functions of data calculation. The main purpose of these registers is to store operands in CPU instructions and use them as regular variables in the CPU.

ARM64 64-bit: x0-x30, XZR(zero register) 32-bit: W0-W30, WZR(zero register) The 8086 assembly has a special register segment register :CS,DS,SS,ES four registers to hold the base address of these segments, this belongs to the Intel architecture CPU. Not in ARM

2.6.2.7 Floating point and vector registers

Because floating point numbers are stored and operated on differently from integers, the CPU provides dedicated floating point registers to handle them.

Floating point registers: 64-bit: D0-D31; 32-bit: S0-S31.

Modern CPUs also support vector operations, which are used heavily in graphics processing and related fields, and provide a set of vector registers to support them.

Vector registers: 128-bit: V0-V31.

2.6.2.8 Status Register

Inside the CPU there is a special kind of register (its number and structure may vary from processor to processor). In ARM this kind of register is called the status register, CPSR.
The CPSR differs from other registers. Other registers store data, and the whole register carries a single meaning. The CPSR works bit by bit: each bit has its own meaning and records specific information.

Note: the CPSR register is 32 bits

The lower 8 bits of the CPSR (I, F, T, and M[4:0]) are called the control bits. A program cannot modify them unless the CPU is running in a privileged mode.

N, Z, C, and V are the condition code flags. Their contents can be changed by the results of arithmetic or logical operations, and they can determine whether an instruction is executed. This is important.

N (Negative)

Bit 31 of the CPSR is N, the sign flag. It records whether the result of the relevant instruction is negative. If the result is negative, N = 1; if it is non-negative, N = 0.

Note: in the ARM64 instruction set, the instructions that affect the status register are mostly arithmetic or logical instructions in their flag-setting forms, such as adds, subs, and ands.

Z (Zero)

Bit 30 of the CPSR is Z, the zero flag. It records whether the result of the relevant instruction is 0. If the result is 0, Z = 1. If the result is not 0, Z = 0.
The value of Z follows from what it records: whether the result of the relevant instruction is 0. If the result is 0, Z records the affirmative answer, “the result is 0”. In computers, 1 means logically true, so Z = 1 when the result is 0. If the result is not 0, Z records the negative answer. In computers, 0 means logically false, so Z = 0 when the result is not 0.
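
The rules for N and Z can be written down in a few lines. This is a sketch, not how the hardware is implemented; `nz_flags` is a hypothetical helper, and the 32-bit default width is an assumption that matches the W registers.

```python
def nz_flags(result, width=32):
    """Return (N, Z) for a result truncated to `width` bits.

    N copies the most significant bit; Z is 1 exactly when the result is 0.
    """
    result &= (1 << width) - 1
    n = (result >> (width - 1)) & 1
    z = 1 if result == 0 else 0
    return n, z
```

Note that Z = 1 means "the result is zero", which is the inversion the paragraph above explains.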

C (Carry)

Bit 29 of the CPSR is C, the carry flag. It is generally meaningful for operations on unsigned numbers.
Addition: C = 1 if the operation produces a carry (unsigned overflow), otherwise C = 0.
Subtraction (including CMP): C = 0 if the operation produces a borrow (unsigned overflow), otherwise C = 1.
For an unsigned number that is N bits wide, the highest bit of its binary representation, bit N-1, is its most significant bit, and the imaginary bit N is the next higher bit relative to the most significant bit. As shown below:

Carry

We know that adding two values can produce a carry out of the most significant bit into the next higher bit. For example, adding two 32-bit values, 0xAAAAAAAA + 0xAAAAAAAA, produces a carry. Since the carry cannot be stored in 32 bits, we often just say that it is lost. In fact, the CPU does not discard the carry but records it in a special register. ARM uses the C bit to record the carry. For example, consider the following instructions:

mov w0,#0xaaaaaaaa ; the binary of 0xA is 1010
adds w0,w0,w0 ; after execution equals 1010 << 1, carry 1 (unsigned overflow), so C is set to 1
adds w0,w0,w0 ; after execution equals 0101 << 1, carry 0 (no unsigned overflow), so C is set to 0
adds w0,w0,w0 ; repeat the above
adds w0,w0,w0
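
The same sequence can be checked with a small simulation. This is a sketch under the carry rule stated above for addition; `adds32` is a made-up helper that models a 32-bit add and returns the result together with the C bit.

```python
MASK32 = 0xFFFFFFFF


def adds32(a, b):
    """32-bit add that also returns C: 1 if there is a carry out of bit 31."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


w0 = 0xAAAAAAAA
trace = []                      # (w0 after the add, C) for each adds w0,w0,w0
for _ in range(4):
    w0, c = adds32(w0, w0)
    trace.append((hex(w0), c))
```

Each doubling shifts the 1010... pattern left by one bit, so the top bit that falls out alternates between 1 and 0, and C alternates with it.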

Borrow

Subtracting two values can produce a borrow from the next higher bit. For example, subtracting two 32-bit values, 0x00000000 - 0x000000FF, produces a borrow. After borrowing, the calculation is equivalent to 0x100000000 - 0x000000FF, which gives 0xFFFFFF01. Because one bit was borrowed, the C bit marks the borrow: C = 0. For example:

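mov w0,#0x0
subs w0,w0,#0xff ; 0x0 - 0xff needs a borrow, so C = 0 and w0 = 0xffffff01
subs w0,w0,#0xff ; 0xffffff01 - 0xff needs no borrow, so C = 1
subs w0,w0,#0xff ; again no borrow, so C = 1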
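
The borrow case can be simulated the same way. In this sketch `subs32` is a hypothetical helper that follows the subtraction rule given earlier: C = 0 when a borrow occurs, otherwise C = 1.

```python
MASK32 = 0xFFFFFFFF


def subs32(a, b):
    """32-bit subtract that also returns C: 0 if a borrow was needed, else 1."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    c = 1 if a >= b else 0      # no borrow exactly when a >= b (unsigned)
    return result, c


w0 = 0x0
trace = []                      # (w0 after the subtract, C) for each subs
for _ in range(3):
    w0, c = subs32(w0, 0xFF)
    trace.append((hex(w0), c))
```

Only the first subtraction borrows; after that w0 is a large unsigned value and subtracting 0xff no longer needs one.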

V (Overflow)

Bit 28 of the CPSR is V, the overflow flag. When an operation on signed numbers produces a result outside the range the machine can represent, this is called overflow.

Positive + positive giving a negative result is an overflow.
Negative + negative giving a positive result is an overflow.
Positive + negative can never overflow.
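
These three rules reduce to a comparison of sign bits. The following is a minimal sketch for 32-bit addition; `adds32_overflow` is an invented name, and the operands are treated as two's complement values.

```python
MASK32 = 0xFFFFFFFF


def adds32_overflow(a, b):
    """32-bit add that also returns V, the signed overflow flag.

    V = 1 when both operands have the same sign and the result's sign differs.
    """
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    v = 1 if (sign_a == sign_b and sign_r != sign_a) else 0
    return result, v


pos_pos = adds32_overflow(0x7FFFFFFF, 0x00000001)   # largest positive + 1
neg_neg = adds32_overflow(0x80000000, 0x80000000)   # most negative + itself
pos_neg = adds32_overflow(0x7FFFFFFF, 0xFFFFFFFF)   # positive + (-1)
```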

2.6.3 Working principle of registers

In computers and other computing systems, the register is a very important and essential digital circuit component. It is usually built from flip-flops (D flip-flops), and its main function is to temporarily store numbers or instructions. One flip-flop can hold one bit of binary code, so holding N bits of binary code requires N flip-flops.
A register must be able to receive, store, and output data. It consists of flip-flops and gate circuits. The register accepts data only when it receives a “store pulse” (also called a “store instruction” or “write instruction”), and it outputs data only when it receives a “read” instruction.
Registers store numbers in either parallel or serial mode. In parallel mode, all bits of the number are written into their corresponding positions in the register at the same time. In serial mode, the number is fed into the register bit by bit through a single input.
Registers can also be read in parallel or serial mode. In parallel mode, all bits of the number appear on their respective outputs at the same time. In serial mode, the number appears bit by bit on a single output.
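
The difference between parallel and serial storage can be modeled in software. This is a loose sketch, not a circuit description: the `Register` class and its method names are invented for illustration, and each list element stands in for one flip-flop.

```python
class Register:
    """An N-bit register modeled as a list of N one-bit flip-flops."""

    def __init__(self, width):
        self.bits = [0] * width

    def load_parallel(self, bits, write_pulse):
        """Store all bits at once, but only while the write pulse is active."""
        if not write_pulse:
            return
        if len(bits) != len(self.bits):
            raise ValueError("bit count must match the register width")
        self.bits = [b & 1 for b in bits]

    def shift_in_serial(self, bit, write_pulse):
        """Store one bit per pulse: shift everything left and append the new bit."""
        if write_pulse:
            self.bits = self.bits[1:] + [bit & 1]

    def read_parallel(self):
        """All bits appear on the outputs at the same time."""
        return list(self.bits)


parallel = Register(4)
parallel.load_parallel([1, 0, 1, 1], write_pulse=True)   # one pulse, four bits

serial = Register(4)
for bit in [1, 0, 1, 1]:                                 # four pulses, one bit each
    serial.shift_in_serial(bit, write_pulse=True)
```

Both registers end up holding the same value; the serial one simply needs one pulse per bit.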

2.6.4 Register addressing

Register addressing uses the value of a register as the operand. It is widely used by all kinds of microprocessors and is a highly efficient addressing mode.
In register addressing, the operand is stored in a register inside the CPU, and the instruction names the register that holds it. On the 8086, register operands can be the 8-bit registers AH, AL, BH, BL, CH, CL, DH, and DL, or the 16-bit registers AX, BX, CX, DX, SP, BP, SI, DI, and so on. Because register addressing does not need to access memory over the bus, the instruction executes faster.
Put another way, register addressing treats the contents of a general purpose register as the operand. On the 8051 microcontroller, for example, the registers that can be addressed this way are A, B, DPTR, and R0~R7. B is register-addressed only in multiplication and division instructions and is directly addressed in all other instructions. A can be addressed either as a register or directly; when addressed directly, it is written ACC.

2.6.5 Registers of the ARM processor

The ARM microprocessor has a total of 37 32-bit registers, of which 31 are general purpose registers and 6 are status registers. However, these registers cannot all be accessed at the same time. Which registers are programmatically accessible depends on the working state and the specific operating mode of the microprocessor. At any time, though, the general purpose registers R0 to R14, the program counter PC, and one or two status registers are accessible.
Taking the ARM9 processor as an example, its 37 32-bit registers include:

(1) R0 ~ R12: 32-bit general purpose registers used for data operations. Note, however, that most 16-bit Thumb instructions can access only R0 to R7, whereas 32-bit Thumb-2 instructions can access all registers.
(2) Stack pointer: the lowest two bits of the stack pointer are always 0, which means the stack is always 4-byte aligned.
(3) Link register: when a subroutine is called, R14 stores the return address.
(4) Program counter: points to the current program address. Modifying its value changes the program's execution flow.
(5) 6 status registers (1 CPSR, 5 SPSR): used to record the CPU's operating state and the running state of the program. All are 32 bits wide, but only some of the bits are currently used.

3. Common assembly instructions

3.1 bl instruction

Where the CPU fetches instructions from is determined by the contents of the PC, so we can make the CPU execute the instructions we want by changing the contents of the PC.
ARM64 provides a MOV instruction (a data transfer instruction) that can change the values of most registers, for example MOV X0,#10 or MOV X1,#20.
However, the MOV instruction cannot be used to set the value of the PC; ARM64 does not provide that capability.
ARM64 provides other instructions to modify the PC. These are collectively called branch (jump) instructions, and the simplest of them is the BL instruction.
BL instruction example:

Here are two pieces of code. Assuming the program executes _A first, write down the order in which the instructions are executed. What is the final value of register X0?

_A:
    mov x0,#0xa0
    mov x1,#0x00
    add x1, x0, #0x14
    mov x0,x1
    bl _B
    mov x0,#0x0
    ret

_B:
    add x0, x0, #0x10
    ret
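
One way to check an answer is to trace the listing with a tiny interpreter. This is only a sketch: the tuple encoding of the instructions, the `run` function, and the use of list indexes as addresses are all assumptions made for this example. It models bl as "save the index of the next instruction in x30, then jump", and ret as "jump to x30". The trace stops on reaching the final ret of _A, because nothing in the listing says what x30 held when _A was entered.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

PROGRAM = [
    ("mov", "x0", 0xA0),         # 0  _A
    ("mov", "x1", 0x00),         # 1
    ("add", "x1", "x0", 0x14),   # 2
    ("mov", "x0", "x1"),         # 3
    ("bl", "_B"),                # 4
    ("mov", "x0", 0x0),          # 5
    ("ret",),                    # 6  final ret of _A
    ("add", "x0", "x0", 0x10),   # 7  _B
    ("ret",),                    # 8
]
LABELS = {"_A": 0, "_B": 7}


def run(program, labels, start, stop):
    """Execute from label `start` until index `stop` is reached (not executed)."""
    regs = {"x0": 0, "x1": 0, "x30": None}   # x30 acts as the link register
    pc = labels[start]
    trace = []
    while True:
        trace.append(pc)
        if pc == stop:
            return trace, regs
        op, *args = program[pc]
        next_pc = pc + 1
        if op == "mov":
            dst, src = args
            regs[dst] = regs[src] if isinstance(src, str) else src
        elif op == "add":
            dst, src, imm = args
            regs[dst] = (regs[src] + imm) & MASK64
        elif op == "bl":
            regs["x30"] = pc + 1             # remember where to come back to
            next_pc = labels[args[0]]
        elif op == "ret":
            next_pc = regs["x30"]
        pc = next_pc


trace, regs = run(PROGRAM, LABELS, start="_A", stop=6)
```

In this model the instructions run in the order 0, 1, 2, 3, 4, 7, 8, 5, 6, and x0 is 0 when the final ret of _A is reached. At that point x30 still holds the value written by bl _B.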