It all begins with the Turing machine

A Turing machine is mainly composed of a storage unit, a control unit, an arithmetic unit, and a read/write head for external data. It operates on a paper tape divided into cells; each cell can record one character, and characters come in two kinds: data characters and instruction characters. The tape moves through the machine one cell at a time. The read/write head reads the character in each cell in turn, and the control unit determines whether it is data or an instruction: a data character is stored in the storage unit, while an instruction character causes the arithmetic unit to fetch data from the storage unit, perform the corresponding operation, and write the result to the next cell of the tape through the read/write head.

The basic mode of operation of a Turing machine is the same as that of modern computers: data and instructions are stored in memory (the tape, or memory cells), and the processor reads and computes on them. In a computer, the CPU performs the calculations while data sits in storage. We often hear about disks, SSDs, and memory, but the CPU does not read and execute instructions directly from these stores; instead, it uses a tiered caching strategy.

The memory hierarchy

Why a hierarchy is needed

We are all familiar with disks: data survives power loss, and capacity is large, often measured in terabytes, but reads are very slow. Memory reads are nearly 100 times faster than disk, yet still "slow" compared with CPU execution speed. In addition, memory sits on the motherboard, and data travels to the CPU over circuit traces; relative to the CPU's execution speed, this transmission time is not negligible either.

The smaller a memory's physical size, the more limited its capacity; and the faster its read/write speed, the higher its power consumption and cost. Moreover, the farther a memory is from the CPU, the longer data transfer takes. So, for now, no single kind of memory can feed data to the CPU at the speed the CPU processes it.

The scheme computers use is to tier the memory: data the CPU uses most frequently is stored in faster memory closer to the CPU (the cache), while less frequently used data is stored in memory that is slower, farther from the CPU, larger in capacity, and cheaper. When the CPU reads data, it reads from the cache first, and only falls back to the more distant memory on a miss.

The hierarchical cache scheme is feasible because of the principle of locality in programs. Think of the code we write day to day: most of the work happens in for loops that compute on and read the same handful of variables. So while the CPU executes a program, a few data regions are read and written at high frequency. Caching these "hot spot" data makes subsequent reads much faster. Statistics suggest the cache hit rate can reach 95%, meaning only 5% of accesses fall through to main memory.
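To make the locality idea concrete, here is a rough sketch (the function and variable names are made up for the example): a simple summing loop makes thousands of accesses, but they concentrate on one hot variable plus one sequential pass over an array.

```python
# Rough illustration (not a real cache): count how many distinct memory
# "locations" a typical loop touches versus how many total accesses it makes.

def access_trace(n):
    """Record the locations a simple summing loop reads or writes."""
    trace = []
    total = 0                      # hot variable: touched every iteration
    data = list(range(n))          # array traversed once
    for i in range(n):
        trace.append(("var", "total"))   # access the same hot variable
        trace.append(("arr", i))         # access one new array element
        total += data[i]
    return trace

trace = access_trace(1000)
distinct = len(set(trace))
print(len(trace), distinct)        # 2000 total accesses, 1001 distinct locations
```

Half of all accesses hit a single variable, which is exactly the kind of hot spot a cache exploits.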

Levels of the memory hierarchy

Generally, storage is divided into the following levels:

  • register

  • CPU cache:

      • L1-cache

      • L2-cache

      • L3-cache

  • memory

  • Disk/SSD

Disk/SSD

The SSD/disk is the farthest from the CPU and the slowest type of storage. Its advantages are lower cost and that data survives power loss. SSDs (solid-state drives) have a structure similar to memory and are roughly 10 to 1,000 times slower than memory. Disk reads are slower still, on the order of a million times slower than memory, and disks have been gradually replaced by SSDs.

memory

Memory plugs into the motherboard at some distance from the CPU, which reads it over the motherboard bus. It costs somewhat more than disk but reads much faster, at roughly 200-300 CPU cycles per access. In terms of capacity, personal computers typically have 8-16 GB of memory, while servers can have several terabytes.

CPU cycle: an instruction can be divided into several stages, such as instruction fetch and instruction execution; the time required to complete one stage is a CPU cycle.

CPU cache

CPU caches are divided into L1 (level-1 cache), L2 (level-2 cache), and L3 (level-3 cache). Each CPU core has its own L1 and L2 caches, while all cores of a CPU share a single L3 cache.

Distance from the CPU: L1 < L2 < L3

Capacity: L1 (tens to hundreds of KB) < L2 (hundreds of KB to several MB) < L3 (several MB to dozens of MB)

Read/write speed: L1 (2-4 CPU cycles) > L2 (10-20 CPU cycles) > L3 (20-60 CPU cycles)

(The L1 cache is split into an instruction area and a data area, as explained below.)

Note that the smallest unit in a CPU cache is a cache block (cache line), not a variable. Cache and memory can be mapped in various ways; for example, in direct mapping, cache line number = memory block number mod total number of cache lines. Given a memory address, the memory block number is computed from it, and the cache line is then found from the mapping. If that line currently holds the block, the data is read directly from the cache; otherwise it is fetched from memory.
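The direct-mapped policy just described (cache line = block number mod number of lines) can be sketched as follows; the 8-line cache and 64-byte block size are illustrative parameters, not real hardware values.

```python
# Minimal sketch of a direct-mapped cache:
# cache line number = memory block number mod total number of cache lines.

NUM_LINES = 8
BLOCK_SIZE = 64  # bytes per cache block (made-up parameter)

# each cache line remembers which memory block it currently holds
cache = [None] * NUM_LINES

def lookup(address):
    """Return 'hit' or 'miss' for a byte address, filling the cache on a miss."""
    block = address // BLOCK_SIZE          # which memory block the byte is in
    line = block % NUM_LINES               # direct mapping: block mod lines
    if cache[line] == block:
        return "hit"
    cache[line] = block                    # replace whatever was in that line
    return "miss"

print(lookup(0))     # miss: cache is cold
print(lookup(32))    # hit: same 64-byte block as address 0
print(lookup(512))   # miss: block 8 maps to line 0, evicting block 0
print(lookup(0))     # miss again: block 0 was just evicted (a conflict miss)
```

The last lookup shows why mapping policy matters: two blocks that share a line keep evicting each other even though the cache has free lines elsewhere.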

register

Registers are where the CPU actually reads and writes during instruction execution. They are the memory closest to the CPU and the fastest to access: a read or write completes in half a CPU cycle. A CPU contains dozens to hundreds of registers, each of which is small and can hold only a limited amount of data (4-8 bytes).

Most registers in a 32-bit CPU hold at most 4 bytes; most registers in a 64-bit CPU hold at most 8 bytes.

Registers fall into several categories according to their purpose. To prepare for the instruction-execution walkthrough below, let's first look at these:

  • General-purpose register: used to store program data and parameters.

  • Instruction register: each instruction the CPU executes is first read from memory into the instruction register, from which the CPU then reads and executes it.

  • Instruction pointer register: stores the memory address of the next instruction the CPU will execute; the CPU reads that instruction into the instruction register according to this address. The instruction pointer register is also called the IP register.

  • Segment register: to access a larger physical space, the CPU locates a physical memory address as base address + offset, and the segment register stores the base address. CS is a segment register that holds the base of the instruction segment; together with the IP register, it locates an instruction's address in memory.

    Suppose a register stores at most 4 bytes of data: 4 bytes = 4 × 8 = 32 bits, representing values from 0 to 2^32 - 1, which converts to 4 GB. So this register can address at most a 4 GB range, but memory, as mentioned earlier, can reach several terabytes; a single register therefore cannot express the full range of memory addresses. The addressing mode "base address + offset = physical address" greatly extends addressing capability: for example, a 32-bit base shifted left by 32 bits plus a 32-bit offset can represent a 64-bit memory address (16 EiB). Note that the computer's final addressing range is determined by the address bus, described below.
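The arithmetic in this paragraph can be checked directly; the base and offset values in the second half are hypothetical.

```python
# A 4-byte (32-bit) register can name 2**32 distinct addresses, i.e. a
# 4 GiB range, so a single register cannot address several TiB of memory.

register_bits = 4 * 8                 # 4 bytes = 32 bits
addressable = 2 ** register_bits      # number of distinct addresses
print(addressable)                    # 4294967296 addresses = 4 GiB
print(addressable // 2**30)           # 4 (GiB)

# base + offset: shifting a 32-bit base left by 32 bits and adding a
# 32-bit offset yields a 64-bit address, as the text describes
base, offset = 0x1, 0x10              # hypothetical values
physical = (base << 32) + offset
print(hex(physical))                  # 0x100000010
```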

Bus – The bridge between the CPU and the outside world

According to the storage hierarchy above, data is first loaded from disk into memory and then read into the caches and registers inside the CPU for processing. Reads and writes between the CPU and its caches happen inside the CPU, while reads and writes to memory go over the bus on the motherboard.

A bus can be regarded as a bundle of wires that transmit data by varying the voltage on each wire: high voltage represents 1 and low voltage represents 0.

According to the kind of information transmitted, buses are divided into the address bus, the data bus, and the control bus.

Consider the instruction "read the data at memory location 3", which contains several pieces of information:

  • The memory location of the operation is 3 (address information)

  • Operation commands are read commands (control information)

  • Data transmission (data information)

Each of the three buses carries its corresponding kind of information: the CPU sends the memory address to the memory over the address bus, issues the memory-read command over the control bus, and the memory returns the data to the CPU over the data bus.

Image from Assembly Language (3rd edition)

The address bus

Before we talk about the address bus, let's look at how memory addresses are laid out. Memory is divided into storage units, numbered starting from zero; the number can be regarded as the unit's location in memory.

Each storage unit holds eight bits, that is, one byte of data. A memory with 128 storage units can therefore store 128 bytes.

The CPU uses the address bus to select a storage unit. The number of lines in the address bus determines the addressable range: for example, a CPU with 16 address lines can address at most 2^16 storage units.

Suppose a 16-bit CPU has 20 address lines: how can a 16-bit CPU put a 20-bit address on the bus at once?

In fact, the answer was already given above: the CPU synthesizes the 20-bit address using the "base address" + "offset" method.

Image from Assembly Language (3rd edition)
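On the real 8086 (a 16-bit CPU with a 20-bit address bus), the combination is physical address = segment × 16 + offset, which is also how CS:IP locates the next instruction. A small sketch:

```python
# 8086-style real-mode addressing: a 16-bit segment and a 16-bit offset
# combine into a 20-bit physical address.

def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    return ((segment << 4) + offset) & 0xFFFFF   # keep 20 bits

print(hex(physical_address(0x2000, 0x1F60)))   # 0x21f60
print(hex(physical_address(0xFFFF, 0xFFFF)))   # wraps around within 20 bits
```

Shifting left by 4 bits is the "× 16"; because segment and offset overlap, many segment:offset pairs name the same physical address.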

The data bus

The CPU transmits data to memory and other devices through the data bus. The width of the data bus (the number of lines) determines the data transfer rate between the CPU and the outside world: 8 data lines transfer one byte (8 bits) at a time, and 16 data lines transfer two bytes (16 bits) at a time.
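As a quick illustration of the relationship between bus width and transfer count (the byte counts below are arbitrary examples):

```python
import math

# The width of the data bus sets how many bytes move per bus transfer,
# so moving a fixed amount of data takes fewer transfers on a wider bus.

def transfers_needed(total_bytes, bus_width_bits):
    bytes_per_transfer = bus_width_bits // 8
    return math.ceil(total_bytes / bytes_per_transfer)

print(transfers_needed(1024, 8))    # 1024 transfers on an 8-bit bus
print(transfers_needed(1024, 16))   # 512 transfers on a 16-bit bus
print(transfers_needed(1024, 64))   # 128 transfers on a 64-bit bus
```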

Control bus

The CPU controls external devices through the control bus. The more control lines there are, the more kinds of control the CPU can exert over external devices, so the width of the control bus determines the CPU's ability to control them.

Instruction execution

Now that we know about the various stores and buses, let’s look at how programs are loaded from disk into memory and then executed by the CPU.

The programs we write must be translated by a compiler into instructions the CPU understands. When the program starts, its instructions and data are stored in two memory segments, and the PC pointer (the IP register plus the CS register) is set to the starting address of the instruction segment, meaning the CPU will read instructions from that address onward and execute them.

Instruction parsing

An instruction is first read into the instruction register; before the CPU can execute it, the instruction must be parsed.

As we know, the contents of memory are binary (the instructions above are written in hexadecimal), so when the CPU reads an instruction to execute, it first parses the binary. Take "0x8C400104" as an example and expand it into binary:

The above instructions are divided into three parts: opcode, register number and memory address:

  • The six bits on the left are called the opcode; "100011" indicates the load instruction.

  • The middle four bits specify the register number. 0001 indicates the R1 register.

  • The last 22 bits indicate the memory address to operate on.

Therefore, this instruction loads the contents of the specified memory address into register R1.
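The split can be reproduced with a few shifts and masks; note that the 6/4/22 format here is the article's illustrative encoding, not a documented real-world ISA.

```python
# Decoding 0x8C400104 by the 6-bit / 4-bit / 22-bit split described above.

instruction = 0x8C400104

opcode   = (instruction >> 26) & 0b111111      # top 6 bits
register = (instruction >> 22) & 0b1111        # next 4 bits
address  = instruction & ((1 << 22) - 1)       # low 22 bits

print(bin(opcode))    # 0b100011  -> the load opcode
print(register)       # 1         -> register R1
print(hex(address))   # 0x104     -> memory address to load from
```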

To summarize, when the program executes:

  1. The instructions and data of the program are stored in the memory instruction segment and data segment respectively, and the PC pointer refers to the starting address of the instruction segment.

  2. The CPU reads the instructions that the PC pointer points to and stores them in an instruction register.

  3. The CPU uses the address bus to specify the memory address to be accessed. Send “read instructions” through the control bus.

  4. The memory passes data to the CPU via the data bus, and the CPU stores this data in an instruction register.

  5. The CPU parses the instruction contents in the instruction register.

  6. The CPU executes instructions through the arithmetic unit and the control unit.

  7. The PC pointer increments to the memory address of the next instruction.

So: address, decode, execute. This is the execution cycle of an instruction, and every instruction goes through this sequence.
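The cycle above can be sketched as a toy fetch-decode-execute loop over the same illustrative instruction format; the memory contents and register file here are made up for the example.

```python
# Toy fetch-decode-execute loop (6-bit opcode, 4-bit register, 22-bit
# address, as in the parsing example). Only the load opcode is handled.

memory = {0x104: 42}                 # data segment: one value at 0x104
program = [0x8C400104]               # instruction segment: load [0x104] -> R1
registers = [0] * 16                 # general-purpose registers R0..R15
pc = 0                               # PC pointer, indexes into the program

while pc < len(program):
    instr = program[pc]              # fetch: read the instruction at PC
    opcode = (instr >> 26) & 0x3F    # decode: split into the three fields
    reg    = (instr >> 22) & 0xF
    addr   = instr & 0x3FFFFF
    if opcode == 0b100011:           # execute: load from memory into a register
        registers[reg] = memory[addr]
    pc += 1                          # step the PC to the next instruction

print(registers[1])                  # 42
```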

Instruction prefetching

The CPU executes instructions very fast, but memory reads and writes are slow; if each instruction were fetched from memory only when needed, the instruction execution cycle would be dominated by the fetch.

As we learned earlier, the CPU has a multi-level cache, so multiple instructions can be read from memory into the L1 cache at once, letting instruction fetch keep up with the CPU's execution speed.

Meanwhile, to prevent cached data from overwriting cached instructions and stalling execution, the L1 cache is split into an instruction area and a data area.

Do L2 and L3 need separate instruction and data areas? No: L2 and L3 do not need to serve instruction prefetch directly.

How to execute instructions faster

To execute instructions faster still, we can use the CPU's instruction pipelining technique.

In the process above, the arithmetic unit sits idle while an instruction is being decoded. To raise instruction throughput, the arithmetic unit should be kept busy: with pipelining, as soon as the first instruction finishes decoding, the CPU starts addressing the second, and so on, so that when one instruction finishes executing, the next has already been decoded and can execute immediately.

Pictures from the Internet
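Under an idealized model (each instruction passes through k one-cycle stages, with no stalls or hazards), the benefit of pipelining is easy to quantify:

```python
# Cycle counts with and without pipelining, assuming k single-cycle
# stages per instruction (an idealized model with no stalls).

def sequential_cycles(n, k):
    return n * k                     # each instruction runs start-to-finish alone

def pipelined_cycles(n, k):
    return k + (n - 1)               # first instruction fills the pipe, then
                                     # one instruction completes per cycle

print(sequential_cycles(100, 3))     # 300 cycles: fetch/decode/execute serially
print(pipelined_cycles(100, 3))      # 102 cycles with a 3-stage pipeline
```

For long instruction streams the pipelined time approaches one instruction per cycle, a nearly k-fold speedup in this model.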

In one sentence

  1. The program is stored in memory, and the CPU reads the instructions and performs the calculations.

  2. Because the CPU executes instructions very fast, and no single memory can simultaneously offer fast reads and writes, low heat, low power consumption, and large capacity, memory is organized hierarchically, with multiple cache levels to match the CPU's execution speed.

  3. Data transfer between the CPU and memory goes over the bus on the motherboard: the address bus carries the memory address to the memory, the control bus issues the command type, and the data bus carries the data back to the CPU.

  4. Registers are the memory from which the CPU directly reads instructions and operand data, and they fall into several classes by purpose. For data, values are read into general-purpose registers, which the CPU reads and writes. For instructions, the CPU first locates the instruction's memory address from the CS segment register and the instruction pointer register, stores the instruction in the instruction register, and then reads and executes it from there.

  5. Instruction execution comprises addressing, decoding, and execution. To avoid fetching every instruction from memory on demand, instructions can be preread into the CPU's L1 cache; meanwhile, pipelining keeps the CPU's arithmetic unit busy.
