Computer programs and assembly language
At its core, a program is an input/output process, much like the input and output that fill our daily lives. Seeing, hearing, smelling, and touching are all ways of receiving input, which the brain or body then processes, often instinctively, to produce output in the form of actions. These processes are so natural and integrated that we rarely notice them.
Computer programs follow the same pattern. They take a piece of information as input, process it in a specific way, and produce an output. The same output can also be interpreted differently depending on the context. As in the movie “Deal or No Deal,” a loud clang and a broken cup may be followed by “500 axemen” rushing in, or by someone jumping out to say “I’m sorry, I’m undercover.”
In mathematical terms, this is y = f(x): x is the input, f() is the processing, and y is the output. y = f(x) abstractly describes both the way we respond to the world and the core function of the CPU: computing.
The CPU’s work is inseparable from memory; in a sense, memory is all the CPU knows. Memory stores the code of f(), the input data x, and the result y produced by f(x).
Observing this process, we can find:
- You need a way to access some memory address so that you can find f(), x
- There needs to be a way to read the information contained in f() and x
- You need to have a way of doing all kinds of operations, so you can do f()
- There has to be some way to store y
No matter how complex a program is, the underlying process is the same. All of it can be written directly in machine code, but machine code consists only of zeros and ones and is far too unfriendly for humans to read or write. This is where assembly language comes in: it lets us control low-level execution in a form people can read and write. In short, assembly language + assembler -> machine code = the actual processing the program performs.
From here on, we use the 8086 CPU as the operating environment to get a first look at the ideas behind assembly language and build up the basics. Everything below assumes the 8086 unless otherwise specified.
Registers
Consider the simplest operation, c = a + b, where a and b are the operands and c is the result. Although a and b may come from memory, the values taking part in the computation should be held independently of where they came from, so an additional place is needed to hold them.
The component that holds this information so the operation can be carried out is the register. For example, a can be placed in register 1 and b in register 2. Supporting information, such as where to find a and b and where to store c, can also be kept in registers.
The 8086 has 14 registers to help carry out operations. Each register has its own name, and by function they can be grouped into segment registers, special registers, general registers, and pointer registers. Each is explained further when it is first used.
Each register is 16 bits wide, that is, it can hold 16 bits of information. For readability, the contents are written in hexadecimal. For example, “0010 0101 0110 1100 B” = “256C H”, where “B” marks a binary number and “H” marks a hexadecimal number.
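The conversion in this example can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up here and are not part of the 8086 environment.

```python
# Convert the 16-bit pattern from the text between binary and hexadecimal.
bits = "0010 0101 0110 1100"            # register contents written in binary

value = int(bits.replace(" ", ""), 2)   # parse the bit string as a base-2 number
hex_text = format(value, "04X")         # four hexadecimal digits for 16 bits

# Each group of four bits maps to exactly one hexadecimal digit.
groups = [format(int(group, 2), "X") for group in bits.split()]

print(hex_text)  # 256C
print(groups)    # ['2', '5', '6', 'C']
```

Because one hexadecimal digit stands for exactly four bits, a 16-bit register always fits in four hexadecimal digits.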
The bus
Ever wonder why information in a computer is represented in binary?
The answer has to do with the electronics that carry the information. At any instant, a wire is at either a high or a low voltage level: when the voltage reaches a certain threshold, it is at the high level and represents 1; otherwise it is at the low level and represents 0. Loosely speaking, it is 1 if the power is on and 0 if it is off. So a wire can represent only one of two states at any instant, and if you observe it over a period of time, you might get something like “1, 0, 0, 1”.
Of course, for efficiency, you can use more than one wire at a time.
With eight wires, for example, you can get the message “1001 1101” all at once.
The information itself is meaningless; only when we set rules do the different values take on meaning, just as we sometimes agree on codes: 1 for gaming, 2 for busy, 3 for going out, and so on.
The CPU treats every storage component as memory, and memory exchanges information with the CPU over the bus.
The bus is divided into three kinds:
- Address bus: Selects the storage unit to access.
- Data bus: Carries the data read from or written to the target storage unit.
- Control bus: Carries control signals; its width determines how many ways the CPU can control external devices.
If there are 10 address lines, that is, the address bus is 10 bits wide, then it can address 2^10 = 1024 different storage units.
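The relationship between the number of address lines and the number of storage units that can be selected is simple arithmetic. A minimal Python sketch, with addressable_units as a hypothetical helper name:

```python
def addressable_units(address_lines: int) -> int:
    """Number of distinct addresses that the given number of wires can select."""
    return 2 ** address_lines


print(addressable_units(10))  # 1024
print(addressable_units(20))  # 1048576, that is 1024 * 1024 byte-sized units (1MB)
```

Each extra address line doubles the number of storage units that can be reached.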
Memory
Any component with storage capability can be regarded as memory. Memory is divided into equal storage units, each of which holds eight bits of binary information, also known as one byte.
Each storage unit has a logical ID (its address). You can locate the target storage unit by this ID.
Storage capacity can be expressed in several equivalent units, which convert as follows:
1TB = 1024GB
1GB = 1024MB
1MB = 1024KB
1KB = 1024B
1B = 8Bit
Now consider how the CPU and memory exchange information. Reading a piece of data, for example, goes as follows:
- The CPU sends address information over the address bus to select the storage unit at address X
- The CPU sends a read command to the memory over the control bus
- The memory sends the data Y stored at address X to the CPU over the data bus
In short, the address bus selects the location, the control bus says whether to read or write, and the data bus carries the data to or from that location. From the CPU’s point of view, every storage device connected to it by the bus is treated as memory: it takes input from somewhere in memory, operates on it, and writes the output somewhere in memory. The CPU only performs the calculation and does not care what the final result means; that is up to the device. For example, when the CPU sends data to the graphics card, it is the graphics card that knows what to display on the screen.
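The read steps above can be mimicked with a toy model. The Python sketch below is only an illustration under simplifying assumptions: ToyMemory and its access method are hypothetical names, and each call stands for one transaction using all three buses.

```python
class ToyMemory:
    """Hypothetical model of memory as a row of byte-sized storage units."""

    def __init__(self, size):
        self.units = bytearray(size)

    def access(self, address, control, data=None):
        """One transaction: address (address bus), control (control bus), data (data bus)."""
        if control == "read":
            return self.units[address]   # memory puts the stored byte on the data bus
        if control == "write":
            self.units[address] = data   # CPU puts the byte to store on the data bus
            return None
        raise ValueError("control must be 'read' or 'write'")


memory = ToyMemory(1024)           # 10 address lines would cover 1024 units
memory.access(3, "write", 0x08)    # store 08H in the unit at address 3
value = memory.access(3, "read")   # read the same unit back

print(value)  # 8
```

The memory never interprets the byte it stores; what the value means is decided by whoever reads it.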
Addressing
If the CPU could produce addresses as wide as the address bus in a single step, it could reach any storage unit directly. The 8086, however, has a 20-bit address bus, while its registers hold only 16 bits. So how do you form a 20-bit address out of 16-bit values?
16-bit addresses range from 0 to FFFF, while 20-bit addresses range from 0 to F FFFF. Written in hexadecimal, a 20-bit number has one more digit than a 16-bit number.
Addressing is therefore done by combining two values, which introduces the concept of physical address = segment address + offset address.
The same location can be described by different segment and offset pairs. Suppose the end point is 100 meters from the starting point. “If I stand at the starting point, I need to walk 100 meters to reach the end point” and “if I stand 50 meters from the starting point, I need to walk another 50 meters to reach the end point” both identify where the end point is.
Returning to CPU addressing: for segment address A and offset address B, physical address = A * 16 + B. For example, to address F FFFF, A = F000 with B = FFFF is one valid pair, and so is A = FFF0 with B = 00FF.
In this way, the CPU takes the segment address as the first value and the offset address as the second, combines them into a 20-bit physical address, and sends that address over the address bus.
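The formula physical address = A * 16 + B is easy to try out. A minimal Python sketch follows; physical_address is a hypothetical helper name, and the mask to 20 bits is an assumption added here to reflect the 20-bit address range discussed above.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment address and a 16-bit offset address."""
    return (segment * 16 + offset) & 0xFFFFF   # assumption: keep only 20 bits


# Two different segment:offset pairs that name the same storage unit.
first = physical_address(0xF000, 0xFFFF)
second = physical_address(0xFFF0, 0x00FF)

print(format(first, "05X"))                            # FFFFF
print(format(second, "05X"))                           # FFFFF
print(format(physical_address(0x1000, 0x0000), "05X")) # 10000
```

Multiplying by 16 shifts the segment address left by one hexadecimal digit, which is why 1000:0 corresponds to 10000H.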
Memory structure
To the CPU, everything is memory, but memory itself can be roughly divided into RAM (random access memory) and ROM (read-only memory). From an assembly point of view, the RAM area is at our disposal, while ROM can only be read, not written. The reason is that the ROM on the motherboard holds the BIOS (basic input/output system), the program that runs first after power-on. It is where everything begins and an important link to external devices, so its contents should not be rewritten.
Get to know some registers
- AX, BX, CX, DX: These are the general registers, which hold intermediate data during calculations. For compatibility, each of these 16-bit registers can be split into two 8-bit registers. Taking AX as an example, it splits into AH and AL, the high and low 8 bits of AX respectively.
- CS and IP: The segment register CS indicates where the program code starts. At any moment, the CPU treats whatever CS points to as code. Note that we are now working with computation and memory access directly, without going through an operating system. An operating system would normally decide and protect what each memory area is used for; here we have to designate the role of each memory area ourselves. The IP register gives the offset of the next instruction, so the address of the next instruction = the value of CS * 16 + the value of IP.
- DS: The segment register DS specifies the location of the data segment. When reading or writing a memory unit, the physical address accessed = the value of DS * 16 + the offset address given in the instruction.
- SS and SP: A region of memory can be set aside to serve as a stack. The segment register SS gives the starting position of this region, and the special register SP gives the offset, so the top of the stack = SS * 16 + SP. Note that the stack grows toward lower addresses. For example, when SP = 0012H and 2 bytes are pushed onto the stack, SP becomes 0012H - 0002H = 0010H, and the top of the stack now points to the lower address.
Getting to know word units
Besides the storage unit, data is sometimes measured in word units; one word unit occupies two storage units.
When we read a number, we read it from the highest digit to the lowest. For example, the value 1234H in register AX is read in the order “1234”. The capacity of a register is exactly one word unit. When the register is stored to memory, its high byte goes to the higher memory address and its low byte goes to the lower memory address. For example, AH = 12 is stored at position 1 and AL = 34 is stored at position 0, and positions 0 and 1 together form one word unit.
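The byte order of a word unit can be reproduced in a few lines of Python. This is an illustrative sketch only; the two-byte memory array and the variable names are assumptions made for the example.

```python
memory = bytearray(2)         # positions 0 and 1 form one word unit

ax = 0x1234
ah, al = ax >> 8, ax & 0xFF   # high byte 12H, low byte 34H

memory[0] = al                # low byte goes to the lower address
memory[1] = ah                # high byte goes to the higher address

# Reading the word back combines the two bytes in the same order.
restored = memory[0] | (memory[1] << 8)

print(memory.hex())             # 3412
print(format(restored, "04X"))  # 1234
```

This is why a memory dump shows the bytes 34 12 for a register value that we read as 1234.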
Understanding assembly instructions
The original intention of assembly language is to make better use of machine instructions to complete computation. For each machine instruction, there is a corresponding assembly instruction.
Consider an operation that sends the contents of register BX to AX:
Machine instruction: 1000 1001 1101 1000
Assembly instruction: mov ax, bx
By contrast, machine instructions are unintelligible and error-prone; dense runs of zeros and ones are a disaster for anyone trying to write or read them. The mnemonics used in assembly instructions are far more human-friendly.
It is easy to see that writing assembly language means building programs out of assembly instructions. The following instructions are used below:
- mov: Moves content from one place to another.
- add: Adds two values and stores the result.
- sub: Subtracts one value from another and stores the result.
- jmp: Jumps unconditionally to a given location.
Each instruction supports several operand formats, which are explained when used.
Setting up the environment
With this knowledge, you can watch the CPU’s calculations more directly. Now we need to set up an 8086 environment in which to write assembly.
You can install Windows 2000 in a virtual machine, or download the emulator from the official DOSBox website and then install the required plug-ins. For details about installing DOSBox on a Mac, see Installing DOSBox on a Mac.
After the installation is complete, do the following on the virtual machine:
Start -> Run -> Command to open the DOS window, then enter debug to reach the target interface
If you use DOSBox, do the following:
Type mount c "any path you choose" to mount that path as the DOSBox working disk
Type c: to switch to the mounted path
Type debug to reach the target interface
The Debug commands
On the Debug page, you can view register and memory information with a handful of frequently used commands, each described further below:
- R: View and change the value of the CPU register
- D: View the contents of the target memory
- E: Overwrite the contents of the target memory
- U: Translates the contents of the target memory into assembly instructions
- T: Executes a machine instruction
- A: Translate the input assembly instructions into machine instructions and write them into memory
R: view and change register contents
// View the registers
r
The first two lines of output list the current values of all registers. The part underlined in red is CS:IP; as mentioned above, these two registers give the starting position of the next instruction. Here the segment address is 073F and the offset address is 0100, written together as 073F:0100. The 0000 that follows is the machine code of the instruction pointed to, and the corresponding assembly instruction is shown in the part underlined in blue.
// Rewrite AX
rax
1200
// Check the registers
r
You can see that the change has taken effect. Other registers can be rewritten in the same way with the r command.
D: view the contents of memory
// View the contents of memory starting at 1000:0
d 1000:0
The preceding command shows the contents starting from physical address 10000H. If no range is specified, 128 storage units are displayed by default. The address is on the left, the contents in hexadecimal (16 storage units per line) are in the middle, and the corresponding ASCII characters are on the right. (The memory in my screenshot can be considered empty because I use DOSBox and mounted it to a clean directory.)
For example, you can enter d 1000:0 f to show only the 16 storage units starting from physical address 10000H.
E: overwrite the contents of memory
e 1000:0 12 00 'hello world' 00 34
The format of the e command is “e segment address:offset address data”, where the data items (numbers or strings) are separated by spaces. After the command is executed, the data is written starting from 10000H.
// View the contents of memory
d 1000:0
The change has taken effect.
A: translate assembly instructions into machine code and write them to memory
Every machine instruction has a corresponding assembly instruction, but machine code is hard to remember. With the a command, you can write several instructions to memory at once in assembly form.
a 1000:10
mov bx,ax
mov bl,ah
// View what was written
d 1000:10
“mov register 1, register 2” means moving the contents of register 2 to register 1. The mov instruction supports the following formats:
- mov register 1, register 2: Moves the contents of register 2 to register 1
- mov register, data: Moves data to a register
- mov register, memory unit: Moves the contents of a memory unit to a register
- mov memory unit, register: Moves the contents of a register to a memory unit
- mov segment register, register: Moves the contents of a register to the segment register
The a command has translated the assembly instructions into the corresponding machine code and written it to memory starting from physical address 10010H.
U: view the assembly instructions corresponding to machine code in memory
// View the assembly instructions just written with the a command
u 1000:10
If no range is specified, the u command translates 32 bytes of machine code by default. You can see that the machine code starting at physical addresses 10010H and 10012H corresponds to the assembly instructions we just wrote with the a command.
T: execute the next instruction
When executing, the CPU runs the instruction that CS:IP points to. At the moment, CS:IP does not point to the instructions we wrote with the a command.
// View register information
r
// Point CS:IP at 1000:10
rcs
1000
rip
10
// View register information again
r
CS:IP now points to the instruction “mov bx, ax”, and the part underlined in green shows the corresponding content.
// View the registers
r
// Execute one instruction
t
// Execute the next instruction
t
The blue arrows mark the change in BX after each instruction is executed, and the red arrows mark how IP advances to the offset of the next instruction.
The first instruction, “mov bx, ax”, moves the value of AX to BX, which changes from 0000 to 1200. The second instruction, “mov bl, ah”, moves AH (the upper 8 bits of AX) to BL (the lower 8 bits of BX); AH = 12, so BX changes from 1200 to 1212.
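The two steps traced above can also be mimicked outside Debug. The following Python sketch is only an illustration: the Register16 class and its high and low properties are hypothetical names invented here, not part of any emulator.

```python
class Register16:
    """Hypothetical model of a 16-bit general register with two 8-bit halves."""

    def __init__(self, value=0):
        self.value = value & 0xFFFF

    @property
    def high(self):
        return self.value >> 8      # upper 8 bits, like AH or BH

    @property
    def low(self):
        return self.value & 0xFF    # lower 8 bits, like AL or BL

    @low.setter
    def low(self, byte):
        # Replace only the lower 8 bits and keep the upper 8 bits unchanged.
        self.value = (self.value & 0xFF00) | (byte & 0xFF)


ax = Register16(0x1200)
bx = Register16(0x0000)

bx.value = ax.value        # mov bx, ax
after_first = bx.value
bx.low = ax.high           # mov bl, ah
after_second = bx.value

print(format(after_first, "04X"))   # 1200
print(format(after_second, "04X"))  # 1212
```

Writing to the low half leaves the high half untouched, which is why BX ends up as 1212 rather than 0012.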
More on assembly instructions
With the help of the Debug window, you can look up the format and meaning of an assembly instruction and then watch the CPU execute it. Below are a few points worth singling out to make that observation go more smoothly.
Reads and writes to memory units
Consider the mov format “mov register, memory unit”, which writes the contents of a memory address into a register, for example “mov ax, [2]”. As mentioned earlier, the segment register DS points to the data segment, so this example loads into AX the word unit that starts at DS:2. The number in [2] is the offset address.
Now, we implement this example:
// Write the assembly command to 10020H
a1000:20
mov ax, [2]
// make CS: IP point to this command
rcs
1000
rip
20
// Check what was written to memory
d1000:00 2f
The machine code of the instruction has been written to the target memory address.
// The goal is to read the word unit at 10002H, whose bytes in memory are 68 65
// Point DS at segment 1000
rds
1000
// Check the register information before the instruction is executed
r
// Execute the instruction
t
The word unit at address 1000:2, stored in memory as the bytes 68 65, has been loaded into register AX, so AX = 6568. Remember that in a word unit, the high byte sits at the higher memory address and the low byte at the lower memory address.
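The result AX = 6568 can be reproduced with a small model of the memory contents written earlier by the e command. This Python sketch is an illustration under simplifying assumptions; read_word and the 1MB bytearray are hypothetical stand-ins for what the CPU does.

```python
def read_word(memory, segment, offset):
    """Read the word unit at segment:offset, low byte first."""
    address = segment * 16 + offset
    return memory[address] | (memory[address + 1] << 8)


memory = bytearray(0x100000)   # 1MB of byte-sized storage units

# Same data as: e 1000:0 12 00 'hello world' 00 34
data = bytes([0x12, 0x00]) + b"hello world" + bytes([0x00, 0x34])
base = 0x1000 * 16
memory[base:base + len(data)] = data

ds = 0x1000
ax = read_word(memory, ds, 2)   # mov ax, [2]

print(format(memory[base + 2], "02X"), format(memory[base + 3], "02X"))  # 68 65
print(format(ax, "04X"))                                                 # 6568
```

The bytes 68 and 65 are the ASCII codes of “h” and “e”, the first two characters of the string written at offset 2.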
Stack operation
SS:SP designates a memory area to be used as the stack, and the stack grows toward lower addresses.
// Write the stack instructions
a 1000:30
push bx
push ax
pop bx
// Check what was written
d 1000:30 3f
The instructions have been written starting at 10030H.
// The next command points to 1000:30
rcs
1000
rip
30
// Set 10040H ~ 1004FH as stack space, remember that SP points to the top of the stack
rss
1004
rsp
f
// Perform push bx
t
// Execute push ax
t
The two instructions write the values of BX and AX to the stack, in that order.
// Check the registers
r
// Check the contents of the stack
d 1003:0 2f
// Run the pop instruction
t
// Check the contents of the stack again
d 1003:0 2f
After the two push operations, the contents of BX and AX are stored on the stack. Because two word units were stored, the top-of-stack offset SP changes from F to B. After pop bx is executed, the content at the top of the stack is moved into BX, and SP moves back by one word unit, from B to D.
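The movement of SP described above can be traced with a small model. The Python sketch below is only an illustration: the Stack class is a hypothetical name, and the starting values AX = 6568 and BX = 1212 are assumed to carry over from the earlier steps.

```python
class Stack:
    """Hypothetical model of a stack addressed through SS:SP."""

    def __init__(self, ss, sp):
        self.ss, self.sp = ss, sp
        self.memory = {}                     # physical address -> byte

    def push(self, word):
        self.sp = (self.sp - 2) & 0xFFFF     # the stack grows toward lower addresses
        address = self.ss * 16 + self.sp
        self.memory[address] = word & 0xFF   # low byte at the lower address
        self.memory[address + 1] = word >> 8

    def pop(self):
        address = self.ss * 16 + self.sp
        word = self.memory[address] | (self.memory[address + 1] << 8)
        self.sp = (self.sp + 2) & 0xFFFF
        return word


stack = Stack(ss=0x1004, sp=0x000F)
ax, bx = 0x6568, 0x1212                      # assumed values from the earlier steps

sp_trace = []
stack.push(bx)
sp_trace.append(stack.sp)
stack.push(ax)
sp_trace.append(stack.sp)
bx = stack.pop()                             # pop bx
sp_trace.append(stack.sp)

print([format(sp, "X") for sp in sp_trace])  # ['D', 'B', 'D']
print(format(bx, "04X"))                     # 6568
```

Because the last value pushed was AX, popping into BX replaces 1212 with 6568, a change that is easy to spot in the register display.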
Other points to note
We can rewrite any register value directly in the Debug window with the r command. However, an instruction of the form “mov register, data” can only load the general registers directly; some registers, such as the segment registers, cannot be loaded this way. So when an assembly instruction needs to set one of those registers, it has to go through two steps: “mov general register, data” -> “mov other register, general register”.
An exercise for the reader
With the knowledge above, you can look up the meaning of an assembly instruction and observe its execution. If you are interested, use the example below to check whether each step matches your expectations, and to see how much of the article you have taken in.
// Clear an area of memory so that its contents are easy to observe
e 1000:50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e 1000:60 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e 1000:70 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e 1000:80 12 34 56 78 1A 2B 3C 4D 00 00 00 00 00 00
// Enter the instruction
a 1000:50
mov ax, 1080
mov ds, ax
mov ax, 0000
mov sp, f
mov ax, [0]
add ax, [2]
mov bx, [4]
add bx, [6]
push ax
push bx
pop ax
pop bx
push [4]
push [6]
// Set the necessary registers
CS -> 1000, IP -> 50, SS -> 1070
Note:
- The format of add is similar to mov: it adds the two operands and stores the result in the first one.
- When values are added, any carry out of the highest bit is not kept in the result; it is simply discarded.
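The truncation described in the second note can be illustrated with a short Python sketch. add16 and add8 are hypothetical helper names, and the sample operands are chosen here only to show the effect.

```python
def add16(a: int, b: int) -> int:
    """Add two 16-bit values and keep only the low 16 bits of the result."""
    return (a + b) & 0xFFFF


def add8(a: int, b: int) -> int:
    """Add two 8-bit values and keep only the low 8 bits of the result."""
    return (a + b) & 0xFF


print(format(add16(0x1200, 0x0034), "04X"))  # 1234
print(format(add16(0xFFFF, 0x0001), "04X"))  # 0000
print(format(add8(0xC5, 0x93), "02X"))       # 58
```

In the last line the full sum is 158H, but only the low 8 bits, 58H, fit in an 8-bit register.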
Conclusion
This article covers the basics you should have before learning assembly language, including:
- In a computer, information is stored as binary zeros and ones, which can be written in a higher base, such as hexadecimal, for easier reading.
- Each storage unit consists of 8 bits, or 1 byte. Each word unit contains two storage units.
- Each machine instruction has a corresponding assembly instruction that is closer to how humans read and write.
- The CPU’s job is to compute y = f(x). To do so, it needs storage to hold programs, input data, and results, and during the calculation it needs registers to hold the necessary working information.
- Each register is 16 bits wide, and each general register can be split into two 8-bit registers.
- The CPU and memory exchange information over the bus, and the transfer capacity of a bus depends on its width.
- With the Debug commands, you can view memory and register information to observe how the CPU computes.
If you want to experiment further now, you can look up the other commands yourself.
Most of this material comes from Assembly Language, fourth edition, by Wang Shuang, chapters 1-3.
Reference
Assembly Language, fourth edition, Wang Shuang, chapters 1-3