Preface
I recently started learning assembly and came across a good video series posted on Bilibili by an author named iOS Xiaoxian, so I plan to follow along with it. This article is my set of notes from watching the video. A link to the original video is at the end, in case you want to watch it yourself.
Machine language
- Machine instructions made up of zeros and ones.
- For example, 0101 0001 1101 0110
**Note:** Machine instructions are ultimately converted into electrical signals.
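
To make the "zeros and ones" concrete, here is a minimal sketch that groups the example bit pattern above into bytes and prints them in hexadecimal, the form a debugger usually shows. The helper name `bits_to_bytes` is made up for illustration and is not from the video.

```python
def bits_to_bytes(bit_string: str) -> bytes:
    """Convert a string of 0s and 1s (spaces allowed) into raw bytes."""
    bits = bit_string.replace(" ", "")
    if len(bits) % 8 != 0:
        raise ValueError("expected a whole number of 8-bit bytes")
    # Take 8 bits at a time and parse each group as a base-2 number.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


raw = bits_to_bytes("0101 0001 1101 0110")
print(raw.hex(" "))  # 51 d6
```

The 16 bits are simply two bytes, 0x51 and 0xd6; what they mean as an instruction depends entirely on the CPU that reads them.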
Assembly language
- Uses symbols (mnemonics) in place of machine language, so it is also called symbolic language
- For example: mov ax, bx
High-level language
- C++, Java, OC, Swift: closer to human natural language
- For example: int a = b
Our code on a terminal device looks like this:
**Note:** ax and bx are registers. mov ax, bx copies the value of the bx register into ax, and at the CPU level it is the bit pattern 1000100111011000.
Reverse engineering relies on a concept called disassembly. An app installed on a phone is machine code that has been encrypted, or "shelled". A tool can remove the shell (often called smashing the shell), which yields the plain machine code. That machine code can then be disassembled into assembly language, because an assembly instruction and a machine instruction correspond one to one, but it cannot be restored to the original high-level language.
Assembly instructions differ between platforms, so the same C/OC code may generate different assembly code on different platforms, and different high-level code may also generate the same assembly code.
High-level languages are built on top of assembly language, so the assembly we see contains a lot of code that sets up the environment needed to support high-level syntax features, such as object-oriented features.
- Assembly language and machine language correspond one to one: each machine instruction has a corresponding assembly instruction
- Assembly language can be assembled into machine language, and machine language can be disassembled into assembly language
- High-level languages can be compiled into assembly language or machine language, but assembly language or machine language can almost never be restored to the high-level language
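
The assemble/disassemble round trip can be sketched with a toy example. It handles only register-to-register mov between 16-bit 8086 registers, using the standard 0x89 opcode and ModRM byte layout. The function names `assemble_mov` and `disassemble_mov` are hypothetical, and a real assembler does far more than this.

```python
# 8086 16-bit register numbers as used in the ModRM byte.
REGISTERS = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
MOV_RM16_R16 = 0x89  # opcode for "mov r/m16, r16"


def assemble_mov(instruction: str) -> bytes:
    """Encode 'mov dst, src' (register to register) as two bytes."""
    # lower() makes the mnemonic and register names case insensitive.
    mnemonic, operands = instruction.strip().lower().split(None, 1)
    if mnemonic != "mov":
        raise ValueError("this sketch only knows 'mov'")
    dst, src = (name.strip() for name in operands.split(","))
    # ModRM: mod=11 (register operand), reg=source, rm=destination.
    modrm = 0b11000000 | (REGISTERS.index(src) << 3) | REGISTERS.index(dst)
    return bytes([MOV_RM16_R16, modrm])


def disassemble_mov(code: bytes) -> str:
    """Decode the two bytes produced by assemble_mov back into text."""
    opcode, modrm = code
    if opcode != MOV_RM16_R16 or modrm >> 6 != 0b11:
        raise ValueError("not a register-to-register mov in this sketch")
    src = REGISTERS[(modrm >> 3) & 0b111]
    dst = REGISTERS[modrm & 0b111]
    return f"mov {dst}, {src}"


machine_code = assemble_mov("MOV AX, BX")
print(f"{int.from_bytes(machine_code, 'big'):016b}")  # 1000100111011000
print(disassemble_mov(machine_code))                  # mov ax, bx
```

The printed bit pattern matches the one quoted in the note above. No comparable table lookup exists for getting back from these bytes to a line such as int a = b, which is why restoring high-level source is so much harder.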
Assembly language features
- It can directly access and control hardware such as memory and the CPU, making full use of the hardware's capabilities
- It gives complete control over the generated binary code, free of compiler limitations
- The object code is short, occupies little memory, and executes fast
- Assembly instructions are mnemonics for machine instructions and correspond to them. Each CPU model has its own machine instruction set and assembly instruction set, so assembly language is not portable
- There is a lot to learn: developers need to understand the CPU and other hardware structures, so assembly is hard to write, debug, and maintain
- It is case insensitive: for example, mov is the same as MOV
**Note:** Assembly can access the CPU directly, and registers are part of the CPU, so assembly can make full use of the hardware and is not limited by the compiler. This is why it is used to write viruses or do security-related work: a high-level language restricts many capabilities, whereas assembly works directly with the CPU, and since the operating system is itself just a set of instructions, assembly can do everything the operating system can do.
"Executes fast" is relative to high-level languages. A small piece of high-level code can generate a great deal of assembly, because the language has to set up an environment, and features such as object orientation need many assembly instructions to support them. Assembly written by hand avoids that overhead, which is what "short object code, little memory, fast execution" refers to.
As for "short object code, little memory": code has to be loaded into memory before it runs, a step called loading. In a high-level language you may write very little code, yet it generates a lot of assembly and so occupies more memory. The final binary also has to set up the environment and link many libraries (calling NSLog, for example, pulls in several system libraries), so the resulting machine code is larger.
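
As a rough analogy only, the expansion of one high-level statement into several lower-level instructions can be observed with Python's standard `dis` module. Python bytecode is not machine code or assembly, so this only illustrates the general idea, and the function `copy_value` is a made-up example. The exact instructions printed depend on the Python version.

```python
import dis


def copy_value(b):
    a = b      # one high-level statement
    return a


# Each entry is one bytecode instruction executed by the Python interpreter.
instructions = list(dis.get_instructions(copy_value))
for ins in instructions:
    print(ins.opname, ins.argrepr)

print(len(instructions), "bytecode instructions for a two-line function")
```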
Use of assembly
- Writing drivers and operating systems (such as some key parts of the Linux kernel)
- Programs or code snippets with high performance requirements, which can be mixed with a high-level language (inline assembly)
- Software security
  - Virus analysis and prevention
  - Reverse engineering, packing, unpacking, cracking, cheats (plug-ins), antivirus evasion, encryption and decryption, vulnerabilities, hacking
- The best starting point and most efficient way to understand the whole computer system
- Laying the foundation for writing efficient code
- Getting to the bottom of how code works
  - What is the nature of a function?
  - sizeof
  - How is ++a + ++a + ++a executed at the low level?
  - What does the compiler really do for us?
  - What key details of DEBUG and RELEASE modes have we been missing?
  - ...
Create a new project named Assembly001. In Xcode, set Debug > Debug Workflow > Always Show Disassembly.
Execute a program
The instruction movl %eax, -0x14(%rbp) stores the value of %eax, which is 4 bytes (32 bits), into memory at -0x14(%rbp). Reading memory at 0x10cc7f795 shows the bytes at that address, starting with 89 45 ec:
    0x10cc7f795: 89 45 ec e8 e9 02 00 00 8b 7d f8 48 8b 75 f0 48  .E.......}.H.u.H
    0x10cc7f7a5: 8b 0d cd 24 00 00 48 8b 15 be 24 00 00 89 7d e8  ...$..H...$...}.
In the memory read output starting at 0x10cc7f774, the bytes 48 83 ec 30 are the machine code for subq $0x30, %rsp:
    0x10cc7f774: 48 83 ec 30 c7 45 fc 00 00 00 00 89 7d f8 48 89  H..0.E......}.H.
    0x10cc7f784: 75 f0 bf 01 00 00 00 be 02 00 00 00 e8 bb ff ff  u...............
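
The link between these bytes and the instructions can be checked by hand. The sketch below splits the ModRM byte into its bit fields for the first bytes of each dump. It is not a disassembler: the helper names `split_modrm` and `signed8` are made up for illustration, and the interpretation of each field is given in the comments from the standard x86-64 encoding rules.

```python
def split_modrm(byte: int) -> tuple:
    """Split an x86 ModRM byte into its (mod, reg, rm) bit fields."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111


def signed8(byte: int) -> int:
    """Interpret one byte as a two's-complement signed value."""
    return byte - 0x100 if byte >= 0x80 else byte


# First bytes of the dump at 0x10cc7f774: 48 83 ec 30
rex, opcode, modrm, imm8 = bytes.fromhex("4883ec30")
print(hex(rex), hex(opcode), split_modrm(modrm), hex(imm8))
# 0x48 0x83 (3, 5, 4) 0x30
#   0x48 is a REX prefix selecting 64-bit operands;
#   opcode 0x83 with reg=5 means "sub r/m, imm8";
#   mod=3, rm=4 selects the register rsp  ->  subq $0x30, %rsp

# First bytes of the dump at 0x10cc7f795: 89 45 ec
opcode, modrm, disp8 = bytes.fromhex("8945ec")
print(hex(opcode), split_modrm(modrm), signed8(disp8))
# 0x89 (1, 0, 5) -20
#   opcode 0x89 means "mov r/m32, r32"; reg=0 is eax;
#   mod=1, rm=5 means an 8-bit displacement from rbp;
#   -20 is -0x14  ->  movl %eax, -0x14(%rbp)
```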
Enter s challenge to test
Types of assembly language
- Assembly languages commonly discussed today:
  - 8086 assembly (the 8086 processor is a 16-bit CPU)
  - Win32 assembly
  - Win64 assembly
  - ARM assembly (embedded, Mac, iOS)
  - ...
- iPhones use ARM assembly, but the CPU architecture differs from device to device:
Architecture | Devices |
---|---|
armv6 | iPhone, iPhone2, iPhone3G, iPod Touch (1st and 2nd generation) |
armv7 | iPhone3GS, iPhone4, iPhone4S, iPad, iPad2, iPad3 (The New iPad), iPad mini, iPod Touch 3G, iPod Touch4 |
armv7s | iPhone5, iPhone5C, iPad4 (iPad with Retina Display) |
arm64 | iPhone6s, iPhone6s Plus, iPhone6, iPhone6 Plus, iPhone5S, iPad Air, iPad mini2 |
- For learning purposes, it is recommended to start with the classic 8086 assembly:
  - Its structure is simple and easy to understand
  - Its instructions are simple and easy to remember
  - The underlying principles are the same
A Mach-O file is an executable, and an executable must be loaded into memory to run. Local images, bundles, and NIB files are not executables. They are data that also has to be loaded into memory, but only when it is used, whereas the executable is loaded first. Once it is loaded, the CPU reads instructions using the address bus and, after reading them, controls the screen of the terminal device through the control bus. Each pixel is an RGBA value whose four components each range from 0 to 255. In binary, 255 is 1111 1111: 8 binary digits, or one byte.
- The hardware that matters most here is the CPU and memory
- In assembly, most instructions deal with the CPU and memory
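
The pixel arithmetic above can be verified with a small sketch. It assumes one byte per channel in R, G, B, A order. The helper name `pack_rgba` and the sample colour are made up for illustration, and real frame buffers may order or pad channels differently.

```python
def pack_rgba(r: int, g: int, b: int, a: int) -> bytes:
    """Pack four 0-255 channel values into 4 bytes, one byte per channel."""
    return bytes([r, g, b, a])  # bytes() raises ValueError outside 0..255


print(f"{255:08b}")                # 11111111
pixel = pack_rgba(255, 128, 0, 255)
print(len(pixel), pixel.hex(" "))  # 4 ff 80 00 ff
```

The largest channel value, 255, fills all 8 bits of a byte, so under this assumption a single pixel occupies 4 bytes.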
Reference:
Underlying principles (decompilation) – First introduction to assembly