Computers execute machine code. Assembly is machine code rendered in a human-readable form, and reading it is the basis of reverse engineering. GCC generates assembly every time we compile, but that step is normally hidden. For learning purposes we compile with the optimization level -Og.
Some details that C hides are visible in assembly language:
Program counter: called %rip in x86-64, it gives the memory address of the next instruction to be executed.
Integer register file: contains 16 named locations, each storing a 64-bit value. These registers can hold addresses or integer data. Some of them hold temporary data.
Condition code registers: hold status information about the most recently executed arithmetic or logical instruction. They are used to implement conditional changes in control flow.
Vector registers: a set of registers that can each hold one or more integer or floating-point values.
Let's look at an example from the book. Create a file mstore.c with the following contents:
long mult2(long, long);

void multstore(long x, long y, long *des) {
    long t = mult2(x, y);
    *des = t;
}
$ gcc -Og -S mstore.c  # generate the assembly file mstore.s
$ gcc -Og -c mstore.c  # generate the object file mstore.o
These commands produce mstore.s and mstore.o. The definition of multstore can be found in the .s file.
Each line there corresponds to one machine instruction (apart from a few directives). The .o file contains binary data, so to inspect the machine code we need a disassembler. On Linux, you can use objdump:
$ objdump -d mstore.o
mstore.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <multstore>:
0: f3 0f 1e fa endbr64
4: 53 push %rbx
5: 48 89 d3 mov %rdx,%rbx
8: e8 00 00 00 00 callq d <multstore+0xd>
d: 48 89 03 mov %rax,(%rbx)
10: 5b pop %rbx
11: c3 retq
Another important step in generating an executable is linking. Suppose we have a main.c file that contains the following code:
#include <stdio.h>

void multstore(long, long, long *);

int main(void) {
    long d;
    multstore(2, 3, &d);
    printf("2 * 3 ---> %ld\n", d);
    return 0;
}

long mult2(long a, long b) {
    long s = a * b;
    return s;
}
We can build the executable prog as follows:
$ gcc -Og -o prog main.c mstore.c
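We then disassemble the executable:
$ objdump -d prog > prog.s
prog.s contains the following section. It is almost identical to the disassembly of mstore.o; the main differences are that the linker has assigned final addresses and filled in the target of the call to mult2.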
00000000000011d5 <multstore>:
11d5: f3 0f 1e fa endbr64
11d9: 53 push %rbx
11da: 48 89 d3 mov %rdx,%rbx
11dd: e8 e7 ff ff ff callq 11c9 <mult2>
11e2: 48 89 03 mov %rax,(%rbx)
11e5: 5b pop %rbx
11e6: c3 retq
11e7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
11ee: 00 00
In the .s file, all lines beginning with '.' are directives that guide the assembler and linker. We can usually ignore them.
Data formats
Intel uses the term "word" for a 16-bit data type. Accordingly, 32-bit quantities are called double words and 64-bit quantities are called quad words.
Most GCC-generated assembly instructions carry a one-character suffix indicating the operand size. For example, the data movement instruction has four variants: movb (move byte), movw (move word), movl (move double word), and movq (move quad word). The suffix l stands for double word. Assembly code also uses the suffix l for both a 4-byte integer and an 8-byte double-precision floating-point number. This causes no ambiguity, because floating-point code uses an entirely different set of instructions and registers.
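As a rough illustration of the naming scheme, the sketch below maps an integer operand size in bytes to the suffix described above. The helper name suffix_for_size is made up for this example and is not part of any toolchain.

```c
#include <stddef.h>

/* Hypothetical helper: map an integer operand size in bytes to the
 * one-character suffix used in GCC's AT&T-syntax assembly.
 * Returns '?' for sizes that have no integer suffix. */
char suffix_for_size(size_t bytes)
{
    switch (bytes) {
    case 1: return 'b';  /* byte:        movb */
    case 2: return 'w';  /* word:        movw */
    case 4: return 'l';  /* double word: movl */
    case 8: return 'q';  /* quad word:   movq */
    default: return '?';
    }
}
```

On x86-64 Linux, where int is 4 bytes and long is 8 bytes, suffix_for_size(sizeof(int)) gives 'l' and suffix_for_size(sizeof(long)) gives 'q', matching movl and movq.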
16 general-purpose registers
Operand specifiers
Data movement instructions
Note that the first operand is the source and the second is the destination.
When a smaller source value is copied to a larger destination, there are two kinds of extension: zero extension and sign extension.
Zero extension fills all the extra bits with zeros, while sign extension fills them with copies of the source's sign bit.
The cltq instruction has no operands: it always uses %eax as the source and %rax as the destination of the sign extension.
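In C, the two kinds of extension correspond to widening an unsigned value versus a signed one. A minimal sketch follows. The function names are invented for illustration, and the instructions named in the comments are only what a compiler would typically choose.

```c
#include <stdint.h>

/* Zero extension: the upper 56 bits of the result are 0
 * (the job of a movzbq-style instruction). */
uint64_t zero_extend_byte(uint8_t b)
{
    return (uint64_t)b;
}

/* Sign extension: the upper 56 bits are copies of bit 7 of the source
 * (the job of a movsbq-style instruction). */
int64_t sign_extend_byte(int8_t b)
{
    return (int64_t)b;
}

/* Widening a signed 32-bit value to 64 bits, which is what cltq
 * does when it extends %eax into %rax. */
int64_t sign_extend_int(int32_t v)
{
    return (int64_t)v;
}
```

The byte 0xFF becomes 255 under zero extension, while the signed byte -1 (the same bit pattern) stays -1 under sign extension.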
Data movement example
Consider the following code:
long exchange(long *xp, long y) {
    long x = *xp;
    *xp = y;
    return x;
}
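The disassembly is shown below. The parameters xp and y arrive in %rdi and %rsi. The first mov reads *xp into %rax, which holds the return value, and the second mov writes y to *xp.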
0000000000000000 <exchange>:
0: f3 0f 1e fa endbr64
4: 48 8b 07 mov (%rdi),%rax
7: 48 89 37 mov %rsi,(%rdi)
a: c3 retq
Pushing and popping stack data
pushq %rbp is equivalent to subq $8,%rsp followed by movq %rbp,(%rsp).
popq %rax is equivalent to movq (%rsp),%rax followed by addq $8,%rsp.
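The two-step behavior of push and pop can be modeled in C with an array and a pointer. This is only a toy sketch: stack_mem, sp, push64 and pop64 are names invented here, the array size is arbitrary, and there is no overflow or underflow checking.

```c
#include <stdint.h>

#define STACK_WORDS 16            /* arbitrary size for the illustration */

/* A toy model of the stack: it grows toward lower addresses,
 * and sp (standing in for %rsp) points at the top element. */
static uint64_t stack_mem[STACK_WORDS];
static uint64_t *sp = stack_mem + STACK_WORDS;   /* empty stack */

/* Push: decrement the stack pointer, then store. */
void push64(uint64_t value)
{
    sp -= 1;          /* like subq $8,%rsp (one uint64_t is 8 bytes) */
    *sp = value;      /* like movq src,(%rsp) */
}

/* Pop: load, then increment the stack pointer. */
uint64_t pop64(void)
{
    uint64_t value = *sp;   /* like movq (%rsp),dst */
    sp += 1;                /* like addq $8,%rsp */
    return value;
}
```

Pushing 1 and then 2 and popping twice returns 2 and then 1, the last-in, first-out order of the hardware stack.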
Arithmetic and logical operations
Load effective address
The load effective address instruction leaq is actually a variant of the movq instruction. It has the form of an instruction that reads from memory into a register, but it never references memory: it writes the computed address itself to the destination register. Let's take an example from the book:
long scale(long x, long y, long z)
{
    long t = x + 4 * y + 12 * z;
    return t;
}
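In the book's compiled version, the whole expression is evaluated with leaq instructions.
The leaq instruction can perform addition and limited forms of multiplication (scaling an index by 1, 2, 4, or 8).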
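To see how x + 4 * y + 12 * z fits that restricted form, the expression can be split into steps that each look like base + index * scale. The sketch below is one plausible decomposition. The register names in the comments assume the usual System V argument order (x in %rdi, y in %rsi, z in %rdx), and your compiler may emit something different.

```c
/* Each statement has the shape base + index * scale with a scale of
 * 1, 2, 4 or 8, which is what a single leaq can compute. */
long scale_steps(long x, long y, long z)
{
    long t1 = x + y * 4;    /* leaq (%rdi,%rsi,4), %rax : x + 4*y   */
    long t2 = z + z * 2;    /* leaq (%rdx,%rdx,2), %rdx : 3*z       */
    long t3 = t1 + t2 * 4;  /* leaq (%rax,%rdx,4), %rax : adds 12*z */
    return t3;
}
```

The factor 12 is not a legal scale, so it is built as 3 * 4: one step computes 3*z and the next scales it by 4 while adding the partial sum.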
Unary and binary operations
The second group of operations is unary: a single operand serves as both source and destination. The third group is binary: the second operand is both a source and the destination. When that second operand is a memory location, the processor must read the value from memory, perform the operation, and write the result back.
Shift operations
The last group is the shift operations, where the shift amount is given first and the value to shift second. Both arithmetic and logical right shifts are available. The shift amount can be an immediate value or the single-byte register %cl. Only the low-order bits of %cl are used: for an operand w bits wide, the amount is taken from the low-order m bits, where 2^m = w. For example, when %cl holds 0xFF, salb shifts by 7 bits, salw by 15, sall by 31, and salq by 63. There are two names for the left-shift instruction, sal and shl; both have the same effect, filling from the right with zeros. The right-shift instructions differ: sar performs an arithmetic shift (filling with copies of the sign bit) and shr performs a logical shift (filling with zeros). The destination operand of a shift can be a register or a memory location.
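The masking of the shift count and the difference between the two right shifts can be mimicked in C for the 64-bit case. This is a sketch with invented function names. Right-shifting a negative signed value is implementation-defined in C, so sar64 builds the arithmetic shift from unsigned operations, and the final conversion back to int64_t assumes the two's-complement wraparound that GCC and Clang provide.

```c
#include <stdint.h>

/* salq/shlq: left shift, using only the low 6 bits of the count. */
uint64_t sal64(uint64_t x, unsigned cl)
{
    return x << (cl & 63);
}

/* shrq: logical right shift, vacated bits are filled with zeros. */
uint64_t shr64(uint64_t x, unsigned cl)
{
    return x >> (cl & 63);
}

/* sarq: arithmetic right shift, vacated bits copy the sign bit. */
int64_t sar64(int64_t x, unsigned cl)
{
    unsigned n = cl & 63;
    uint64_t shifted = (uint64_t)x >> n;
    if (x < 0 && n > 0)
        shifted |= ~(UINT64_MAX >> n);   /* set the top n bits */
    return (int64_t)shifted;             /* assumes two's complement */
}
```

With a count of 0xFF, sal64 shifts by 63, just as salq does. Shifting -8 right by one gives -4 with sar64 but a large positive number with shr64.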
Special arithmetic operations
imulq is two's-complement (signed) multiplication and mulq is unsigned multiplication. In their one-operand forms, both compute the full 128-bit product of two 64-bit values. One argument must be in %rax and the other is given as the instruction's source operand. The product is stored in %rdx (high-order 64 bits) and %rax (low-order 64 bits). Although the name imulq is used for two different multiplication operations (the one-operand form here and the ordinary two-operand form), the assembler can tell which is intended by counting the operands. Here is an example from the book:
#include <inttypes.h>

typedef unsigned __int128 uint128_t;

void store_uprod(uint128_t *dest, uint64_t x, uint64_t y) {
    *dest = x * (uint128_t) y;
}
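To make the split between %rdx and %rax concrete, the 128-bit product can be separated into its two 64-bit halves in C. The helper mul_hi_lo is a name invented for this sketch, and unsigned __int128 is a GCC/Clang extension rather than standard C.

```c
#include <stdint.h>

/* unsigned __int128 is a compiler extension (GCC and Clang). */
typedef unsigned __int128 uint128_t;

/* Hypothetical helper: full 64x64 -> 128-bit unsigned multiply,
 * returning the two halves that mulq leaves in %rdx and %rax. */
void mul_hi_lo(uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo)
{
    uint128_t product = (uint128_t)x * y;
    *hi = (uint64_t)(product >> 64);   /* what %rdx would hold */
    *lo = (uint64_t)product;           /* what %rax would hold */
}
```

For example, multiplying UINT64_MAX by 2 gives a high half of 1 and a low half of UINT64_MAX - 1, a result that would not fit in a single 64-bit register.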
Now let's look at division:
void remdiv(long x, long y, long *qp, long *rp) {
    long q = x / y;
    long r = x % y;
    *qp = q;
    *rp = r;
}
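On x86-64, signed division is done by idivq, which takes the 128-bit dividend in %rdx:%rax and the divisor as its operand, and leaves the quotient in %rax and the remainder in %rdx. For a 64-bit dividend, cqto first sign-extends %rax into %rdx. Because one instruction produces both results, a function like remdiv usually needs only a single division. The C library function ldiv has the same "both results at once" shape. The sketch below uses it, with remdiv_ldiv being a name chosen for this example.

```c
#include <stdlib.h>

/* Same interface as remdiv, but using ldiv, which returns the quotient
 * and remainder together, much as idivq leaves them in %rax and %rdx.
 * Behavior is undefined when y == 0 or when the quotient overflows
 * (LONG_MIN divided by -1). */
void remdiv_ldiv(long x, long y, long *qp, long *rp)
{
    ldiv_t result = ldiv(x, y);
    *qp = result.quot;   /* truncated toward zero */
    *rp = result.rem;    /* chosen so that x == quot * y + rem */
}
```

Dividing -7 by 2 yields a quotient of -3 and a remainder of -1: the quotient is truncated toward zero, so the remainder takes the sign of the dividend.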
Control
Machine code provides two basic low-level mechanisms for implementing conditional behavior: it tests data values and then alters either the control flow or the data flow based on the result.
Condition codes
In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation. The most commonly used condition codes are CF (carry flag), ZF (zero flag), SF (sign flag), and OF (overflow flag).
Suppose an add instruction computes t = a + b, where the variables are of type int. The condition codes are then set according to these C expressions: CF is (unsigned) t < (unsigned) a, ZF is t == 0, SF is t < 0, and OF is (a < 0 == b < 0) && (t < 0 != a < 0).
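The flag settings for t = a + b can be written out as a small C function. This is a sketch for 32-bit operands. The struct and function names are invented here, and the addition is done in unsigned arithmetic because signed overflow is undefined behavior in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the four flags after computing t = a + b. */
struct flags {
    bool cf;  /* carry: unsigned overflow */
    bool zf;  /* zero: result is 0 */
    bool sf;  /* sign: result is negative */
    bool of;  /* overflow: signed (two's-complement) overflow */
};

struct flags flags_after_add(int32_t a, int32_t b)
{
    /* Add in unsigned arithmetic so that wraparound is well defined. */
    uint32_t ut = (uint32_t)a + (uint32_t)b;
    bool t_negative = (ut >> 31) != 0;   /* sign bit of the result */

    struct flags f;
    f.cf = ut < (uint32_t)a;
    f.zf = ut == 0;
    f.sf = t_negative;
    /* Signed overflow: operands share a sign that the result lacks. */
    f.of = ((a < 0) == (b < 0)) && (t_negative != (a < 0));
    return f;
}
```

Adding 1 to INT32_MAX sets SF and OF but not CF, while adding 1 to -1 sets CF and ZF but not OF. Carry tracks unsigned overflow and OF tracks signed overflow.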
Accessing the condition codes
Condition codes are usually not read directly. There are three common ways to use them: 1) set a single byte to 0 or 1 depending on some combination of the condition codes; 2) conditionally jump to some other part of the program; 3) conditionally transfer data. Instructions of the first kind are called the SET instructions.
The book's example is a function that returns the result of a < b: a cmpq sets the condition codes, setl writes 0 or 1 into the low-order byte %al, and movzbl clears the remaining upper bytes of %eax.
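A signed "less than" is read from the flags as SF ^ OF: after comparing a with b, the sign of a - b gives the answer unless the subtraction overflowed, in which case the sign is inverted. The sketch below emulates this for 64-bit values. The function name setl_after_cmp is made up for this example.

```c
#include <stdint.h>

/* Hypothetical emulation of a comparison of a with b followed by setl:
 * the comparison computes a - b only to set the flags, and setl
 * writes SF ^ OF into a single byte. */
unsigned char setl_after_cmp(int64_t a, int64_t b)
{
    uint64_t ut = (uint64_t)a - (uint64_t)b;   /* wraps like the hardware */
    int sf = (int)(ut >> 63);                  /* sign bit of a - b */
    int a_neg = a < 0;
    int b_neg = b < 0;
    /* Subtraction overflows when the operands have different signs
     * and the sign of the result differs from the sign of a. */
    int of = (a_neg != b_neg) && (sf != a_neg);
    return (unsigned char)(sf ^ of);
}
```

The overflow case is the interesting one: INT64_MIN - 1 wraps to a positive value, so SF is 0, but OF is 1 and the function still reports that INT64_MIN is less than 1.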
Jump instructions
jmp is an unconditional jump; the others jump depending on the condition codes, typically set by a preceding comparison.
Implementing conditional branches with conditional control
Let's look at an example:
long lt_cnt = 0;
long ge_cnt = 0;

long absdiff_se(long x, long y)
{
    long result;
    if (x < y)
    {
        lt_cnt++;
        result = y - x;
    }
    else
    {
        ge_cnt++;
        result = x - y;
    }
    return result;
}
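The control flow a compiler produces for an if-else like this can be approximated in C with goto: a comparison, one conditional jump past the "then" part, and two straight-line blocks. The version below is a sketch of that shape. The function name gotodiff_se and the label are chosen for illustration, and the globals are redeclared only so the block stands alone.

```c
long lt_cnt = 0;
long ge_cnt = 0;

/* Goto-style rewrite of absdiff_se: one conditional jump over the
 * "then" part, mirroring a compare followed by a conditional jump. */
long gotodiff_se(long x, long y)
{
    long result;
    if (x >= y)
        goto x_ge_y;      /* taken when the test x < y fails */
    lt_cnt++;
    result = y - x;
    return result;
x_ge_y:
    ge_cnt++;
    result = x - y;
    return result;
}
```

Note that the jump condition is the negation of the test in the source: the code falls through into the x < y case and jumps only when x >= y.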
Implementing conditional branches with conditional moves
The conventional way to implement conditional operations is through a conditional transfer of control: the program follows one execution path when the condition holds and another when it does not. This mechanism is simple and general, but it can be very inefficient on modern processors. In some restricted cases, the same result can be achieved with conditional data transfer (conditional move) instructions, which better match the performance characteristics of modern processors.
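The idea behind a conditional move can be shown in C by computing both outcomes first and then selecting one. The sketch below covers a plain absolute difference without the counter updates of absdiff_se, since side effects like those rule out evaluating both branches. The name cmovdiff is used here for illustration, and whether the compiler actually emits a conditional move depends on the compiler and its options.

```c
/* Conditional-move style: evaluate both outcomes, then select one.
 * This is only valid because neither expression has side effects. */
long cmovdiff(long x, long y)
{
    long rval = y - x;       /* value for the x < y case */
    long eval = x - y;       /* value for the x >= y case */
    long ntest = x >= y;
    if (ntest)
        rval = eval;         /* a compiler may turn this into a cmov */
    return rval;
}
```

Because there is no jump whose direction the processor has to predict, the instruction pipeline stays full regardless of how the comparison turns out. The cost is that both expressions are always evaluated.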