Preface

Some time ago I wrote a run of ten technical stories about the low-level workings of the CPU, and many readers asked me to write about CPU registers.

Registers are too complex a topic to fit into a story. So, after a long wait, this article discusses in detail the complicated set of registers in x86/x64 CPUs.

Warning: this is a long article and it moves fast, so please fasten your seat belt. Take-off!

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p6-tt.byteimg.com/origin/pgc-image/273f10c688ab472b94806e9bba4bd0ee?from=pc)

Since ENIAC, widely regarded as the world's first general-purpose electronic computer, was unveiled in 1946, computer technology has been developing for more than 70 years.

From the original behemoths dedicated to mathematical calculation, to the later era of mainframes and servers, from the boom in personal computers to the Internet wave that swept the world, and on to today's fast-changing mobile Internet and cloud computing, computers have kept changing form and are now everywhere.

Over the past 70 years, countless programming languages have emerged, and countless applications have been developed through these languages.

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p1-tt.byteimg.com/origin/pgc-image/e28b73c43dbf437da1068498fc44c67e.png?from=pc)

But no matter the application or the programming language, the program logic is ultimately handed to the CPU to execute (this is a slight simplification, of course: besides the CPU there are coprocessors, GPUs and so on). So understanding how the CPU works is of great benefit for building a solid foundation in computing.

Over these 70-plus years, CPUs of many architectures have emerged:

MIPS

PowerPC

x86/x64

IA64

ARM

· · · · · ·

This article takes x86/x64, the most widely used architecture on the market, as its subject, and ties together the underlying working principles of the CPU by studying the functions of 100 registers inside it.

Through this article, you will learn:

CPU instruction execution principle

Memory addressing technology

Principle of software debugging technology

Interrupt and exception handling

System calls

CPU multitasking technology

What is a register?

Registers are small storage areas inside the CPU. They temporarily hold the data and results involved in calculations, as well as information the CPU needs in order to operate.

x86 CPUs follow the complex instruction set (CISC) route, providing a rich set of instructions for powerful functionality, along with a large number of registers to support them. This article covers the following registers:

General-purpose registers

Flag register

Instruction pointer register

Segment register

Control register

Debug register

Descriptor register

Task register

MSR registers

General-purpose registers

First come the general-purpose registers, the most common and basic registers used by executing code. During program execution, most instructions do their work by operating on these registers.

“General-purpose” means these registers have no special dedicated use and are left for programs to use “at will”. Note the quotation marks around “at will”: for some of these registers the CPU has unwritten rules, so take care when using them.

eax: The accumulator, typically used for arithmetic such as addition; the return value of a function call is also placed here

ebx: The base register, typically used for data access

ecx: Commonly used as a counter, for example in loops

edx: Holds the port number when reading from or writing to I/O ports

esp: The stack pointer, pointing to the top of the stack

ebp: The frame base pointer, pointing to the base of the current stack frame; local variables that a function stores on the stack are usually located as ebp + offset

esi: Holds the source address for string operations

edi: Holds the destination address for string operations; edi and esi are used together to copy strings

In the x64 architecture, the general-purpose registers above are extended to 64 bits and given new names. For compatibility with 32-bit programs, the old names still work, and using one accesses the lower 32 bits of the corresponding 64-bit register.

rax rbx rcx rdx rsp rbp rsi rdi
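
To make the relationship between the old and new names concrete, here is a minimal sketch in Python that models rax as a plain integer. The helper names read_eax and write_eax are my own, for illustration only. The zero-extension on a 32-bit write is x86-64 behavior that the text above does not cover, so treat it as an added detail.

```python
MASK32 = 0xFFFFFFFF

def read_eax(rax):
    """Reading eax returns the lower 32 bits of the 64-bit rax."""
    return rax & MASK32

def write_eax(old_rax, value):
    """Model a 32-bit write to eax in 64-bit mode.

    The old upper 32 bits of rax are discarded (zero-extension),
    so old_rax does not influence the result.
    """
    return value & MASK32

rax = 0x1122334455667788
print(hex(read_eax(rax)))               # 0x55667788
print(hex(write_eax(rax, 0xDEADBEEF)))  # 0xdeadbeef
```

The same masking idea applies to the other pairs, such as rbx/ebx or rsp/esp.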

In addition to extending the existing general-purpose registers, the X64 architecture introduces eight new general-purpose registers:

r8-r15

In the old 32-bit era there were few general-purpose registers, so function arguments were mostly passed on the thread's stack (some were passed in registers, such as the famous C++ this pointer, which uses ecx, but not many registers were available).

In the x64 era, register resources are plentiful, and the vast majority of arguments are passed in registers. The advantage of passing arguments in registers is speed: it reduces the number of memory reads and writes.

Of course, whether arguments go on the stack or in registers is decided not by the programming language but by the compiler when it generates CPU instructions. A compiler that insisted on passing arguments on the stack on an x64 CPU would also work. The high-level language is unaware of any of this.
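
As a rough illustration of register-based argument passing, the sketch below assigns integer arguments to registers in the order used by two common x64 calling conventions and spills the rest to the stack. The register orders come from the System V AMD64 and Microsoft x64 conventions, which this article does not name, and place_args is a hypothetical helper that ignores floating-point arguments and other details.

```python
# Integer argument registers, in order, for two common x64 conventions.
SYSV_INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
WIN64_INT_ARG_REGS = ["rcx", "rdx", "r8", "r9"]

def place_args(args, regs):
    """Return (arguments placed in registers, arguments left for the stack)."""
    in_regs = dict(zip(regs, args))
    on_stack = list(args[len(regs):])
    return in_regs, on_stack

in_regs, on_stack = place_args([1, 2, 3, 4, 5, 6, 7, 8], SYSV_INT_ARG_REGS)
print(in_regs)    # the first six arguments travel in registers
print(on_stack)   # the remaining two go through memory
```

With only a handful of registers, as on 32-bit x86, almost everything would land in the second list.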

Flag register

The flag register, which contains many flag bits, records a series of states during the execution of instructions by the CPU. These flags are mostly set and modified by the CPU automatically:

CF carry flag

PF parity flag

ZF zero flag

SF sign flag

OF overflow flag (signed, two's-complement overflow)

TF trap flag (single-stepping)

IF interrupt enable flag

· · · · · ·

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p6-tt.byteimg.com/origin/pgc-image/61770fc7c7d64e648f3d6fc270547ac3.png?from=pc)

Under the x64 architecture, the original EFLAGS register is extended to the 64-bit RFLAGS, but the upper 32 bits add no new flags and are reserved for future use.
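
To see how the CPU sets the status flags automatically, here is a small Python sketch that computes CF, PF, ZF, SF and OF for an 8-bit addition. The function add8_flags is my own illustrative helper. It models only the arithmetic flags of one instruction, not the full flags register.

```python
def add8_flags(a, b):
    """Add two 8-bit values and report the flags an x86 add would set."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    return {
        "result": result,
        "CF": int(full > 0xFF),                        # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),    # even number of 1 bits
        "ZF": int(result == 0),                        # result is zero
        "SF": result >> 7,                             # top bit of the result
        # Signed overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }

print(add8_flags(0x7F, 0x01))   # 127 + 1 overflows the signed range
print(add8_flags(0xFF, 0x01))   # 255 + 1 carries out and wraps to zero
```

The first call sets OF and SF but not CF; the second sets CF and ZF but not OF, which shows why signed and unsigned overflow need separate flags.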

Instruction pointer register

eip, the instruction pointer, is arguably the most important register in the CPU. It holds the address of the next instruction to execute. The CPU's job is to keep fetching the instruction eip points to and executing it, while eip moves on to point to the following instruction. This cycle repeats endlessly, and it is the CPU's basic daily work.
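
The fetch-and-execute cycle can be sketched with a toy machine in Python. Everything here is a simplification of my own: the four-instruction set is invented, ip is a list index rather than a memory address, and the jnz instruction tests a register directly instead of the zero flag as real x86 does.

```python
def run(program, max_steps=1000):
    """Run a toy program; ip plays the role of the instruction pointer."""
    regs = {"eax": 0, "ecx": 0}
    ip = 0
    steps = 0
    while ip < len(program) and steps < max_steps:
        op, *args = program[ip]   # fetch the instruction ip points to
        ip += 1                   # ip now points to the next instruction
        if op == "mov":
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] += args[1]
        elif op == "dec":
            regs[args[0]] -= 1
        elif op == "jnz":         # jump by overwriting ip
            if regs[args[0]] != 0:
                ip = args[1]
        else:
            raise ValueError(f"unknown instruction: {op}")
        steps += 1
    return regs

program = [
    ("mov", "ecx", 3),   # loop counter
    ("mov", "eax", 0),
    ("add", "eax", 5),   # loop body starts at index 2
    ("dec", "ecx"),
    ("jnz", "ecx", 2),   # repeat while ecx is not zero
]
print(run(program))      # {'eax': 15, 'ecx': 0}
```

Note that a jump is nothing more than a write to ip, which is why controlling this register means controlling what the CPU runs next.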

When exploiting a vulnerability, an attacker goes to great lengths to change the address held in the instruction pointer so that malicious code gets executed.

Similarly, under the X64 architecture, the 32-bit EIP is upgraded to the 64-bit RIP register.

Segment register

Segment registers are closely related to CPU memory addressing techniques.

Back in the era of the 16-bit 8086 CPU, memory was a scarce resource, and the CPU used segmented memory addressing:

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p3-tt.byteimg.com/origin/pgc-image/2ba520c77daf4b5fad9de0ddc313b6a4.png?from=pc)

A 16-bit register can address only 64 KB. By introducing the concept of segments, the memory space is divided into regions called segments, and a location is addressed as segment base address + offset within the segment.

So where is the segment base address stored? The 8086 CPU has several dedicated registers that hold segment base addresses, which is where the name segment register comes from.

Segment registers are also 16 bits wide.

There are six segment registers. The first four date from the early 16-bit days; two more, fs and gs, were added in the 32-bit era.

cs: code segment

ds: data segment

ss: stack segment

es: extra segment

fs: data segment

gs: data segment

What is stored in the segment register is closely related to the memory addressing mode in which the CPU is currently operating.

When the CPU is in 16-bit real-address mode, the segment register holds the segment base address in units of 16 bytes. When addressing, the contents of the segment register are shifted left by 4 bits (that is, multiplied by 16) to obtain the segment base address, and the offset within the segment is added to give the final address.
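
The real-mode calculation is simple enough to write out. The sketch below is a minimal Python version; the function name is mine, and the final mask to 20 bits models the 8086's 20 address lines, a detail assumed here rather than stated above.

```python
def real_mode_address(segment, offset):
    """Compute the physical address for segment:offset in real-address mode."""
    base = (segment & 0xFFFF) << 4          # shift left 4 bits = multiply by 16
    return (base + (offset & 0xFFFF)) & 0xFFFFF

print(hex(real_mode_address(0x1234, 0x5678)))   # 0x179b8
```

Because the base moves in 16-byte steps, many different segment:offset pairs refer to the same physical address.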

When the CPU is in protected mode, the segment register stores not the address of the memory segment but a segment selector, which indicates which segment the register currently “points to”. The selector has the following structure:

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p6-tt.byteimg.com/origin/pgc-image/7a0aad37b798491d9a977e962515fd01.png?from=pc)

The contents of the 16-bit segment register are divided into three fields:

RPL: The requested privilege level, that is, one of the privilege levels we usually call ring 0 to ring 3.

TI: 0 indicates that the global descriptor table (GDT) is used, and 1 indicates that the local descriptor table (LDT) is used.

Index: The index of an entry in the selected descriptor table. Each entry in the table describes a memory segment.
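
The three fields can be pulled out of a 16-bit selector with a few shifts and masks. This Python sketch assumes the standard layout: the privilege level in bits 0 to 1, TI in bit 2 and the index in bits 3 to 15. The function name and the sample value are my own.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into its three fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,        # which entry in the descriptor table
        "ti": (selector >> 2) & 1,     # 0 = GDT, 1 = LDT
        "rpl": selector & 0b11,        # privilege level, 0 to 3
    }

print(decode_selector(0x23))   # {'index': 4, 'ti': 0, 'rpl': 3}
```

A selector of 0x23 therefore means entry 4 of the GDT, requested at ring 3.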

Two tables were just mentioned: the global descriptor table (GDT) and the local descriptor table (LDT). They are introduced in the descriptor register section below. For now it is enough to know that the CPU needs them to support segmented memory management, that they live in memory, and that each entry in them is a descriptor recording information about one memory segment.

In protected mode, the segment register, the segment descriptor and the memory segment it finally refers to are linked together as shown below:

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p3-tt.byteimg.com/origin/pgc-image/92dea583919b4fff9f0f2d1946a19787.png?from=pc)

The general-purpose registers, segment registers, flags register and instruction pointer together make up a basic instruction execution environment. A thread's context is essentially these registers, and switching threads means swapping their contents.

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p6-tt.byteimg.com/origin/pgc-image/dddbc8f698594d9abec5d58d6f9fbb44.png?from=pc)

Control register

The control registers are a very important group of registers in the CPU. We know that the EFLAGS register records key state of the currently running thread.

So where does the CPU keep key information about its own operation? The answer is the control registers!

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p6-tt.byteimg.com/origin/pgc-image/e661f2b4c3684fafbe86bca596b5c253.png?from=pc)

A 32-bit CPU has five control registers, CR0 to CR4, and 64-bit CPUs add CR8. Each has a different function, but all of them store important information about how the CPU is operating:

CR0: Stores CPU control flags and operating status

CR1: Reserved

CR2: Holds the address that caused the fault when a page fault occurs

CR3: Stores key information about the current process's virtual address space: the address of the page directory

CR4: Also stores information about the CPU's operation and the current task

CR8: New, used by the 64-bit extension

CR0 is particularly important because it contains so much important CPU information that it deserves a separate look:

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p1-tt.byteimg.com/origin/pgc-image/091a8c48996745de8767960433d16a21.png?from=pc)

Some important flag bits have the following meanings:

PG: Whether paging is enabled

AM: Whether alignment checking of memory accesses is enabled

WP: Whether write protection is enabled. If enabled, an attempt to write to a read-only page triggers an exception, even from kernel-level code. This mechanism is often used to implement copy-on-write

PE: Whether protected mode is enabled
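
As a quick way to read these flags out of a raw CR0 value, here is a Python sketch. The bit positions (PE at bit 0, WP at bit 16, AM at bit 18, PG at bit 31) follow the architectural layout of CR0; decode_cr0 and the sample value are illustrative choices of mine.

```python
# Bit positions of the four CR0 flags discussed above.
CR0_BITS = {"PE": 0, "WP": 16, "AM": 18, "PG": 31}

def decode_cr0(cr0):
    """Return which of the four flags are set in a CR0 value."""
    return {name: bool((cr0 >> bit) & 1) for name, bit in CR0_BITS.items()}

# Sample value with protected mode, write protection,
# alignment checking and paging all switched on.
print(decode_cr0(0x80050033))
```

Reading the real CR0 requires privileged code, so a value like this would normally come from a kernel debugger or a crash dump.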

Besides CR0, another register worth attention is CR3, which holds the page directory address of the virtual address space used by the current process. It is the starting point of virtual address translation. When the process address space is switched, CR3 is switched along with it.
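
To show why CR3 is the starting point of translation, the sketch below walks a toy two-level page table in Python. It assumes classic 32-bit paging with 4 KB pages, where a virtual address splits into a 10-bit directory index, a 10-bit table index and a 12-bit offset. The nested dictionaries stand in for the page directory that CR3 points to, and all names and values are hypothetical.

```python
def split_virtual_address(va):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    va &= 0xFFFFFFFF
    return va >> 22, (va >> 12) & 0x3FF, va & 0xFFF

def translate(va, page_directory):
    """Walk the toy page directory (what CR3 would point to) to a physical address."""
    dir_index, table_index, offset = split_virtual_address(va)
    page_table = page_directory[dir_index]   # first level, found via CR3
    frame_base = page_table[table_index]     # second level
    return frame_base + offset

# One mapping: virtual page 0x08048000 -> physical frame 0x5000.
page_directory = {32: {72: 0x5000}}
print(hex(translate(0x08048123, page_directory)))   # 0x5123
```

Giving each process its own page_directory is the software picture of switching CR3 on a process switch: the same virtual address then resolves through a different set of tables.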

Debug register

Inside x86/x64 CPUs there is also a group of registers that support software debugging.

For us programmers, debugging is a routine and essential skill. But have you ever wondered why a program can be debugged at all?

The key is that a program can be paused and resumed, and paused exactly where we set a breakpoint. So how does a program stop when it hits a breakpoint?

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p3-tt.byteimg.com/origin/pgc-image/93f25f131c124ddca11b53f8c1a6f034?from=pc)

This is easy for high-level languages that are interpreted (PHP, Python, JavaScript) or run on a virtual machine (Java), because their execution is under the control of the interpreter or virtual machine.

For “low-level” languages such as C and C++, the code is compiled directly into CPU machine instructions, so debugging requires support from the CPU itself.

On x86/x64, an ordinary breakpoint, where the program stops at a chosen location, is implemented with a software interrupt instruction: int 3.

Note that int here is not the integer type of a high-level language. It is an assembly instruction, short for interrupt, and int 3 raises the interrupt with vector number 3.

When we set a breakpoint in a debugger, the debugger overwrites the first byte of the original instruction at that location with an int 3 instruction, whose machine code is 0xCC. This is transparent to us: the debugger still shows the original instruction, but what is actually in memory is no longer the original.
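
The byte-swapping trick can be modeled in a few lines of Python. This is only a sketch of the bookkeeping: ToyDebugger is a made-up class, the "process memory" is a bytearray, and a real debugger would go through operating system APIs to read and write the target process.

```python
INT3 = 0xCC  # machine code of the int 3 instruction

class ToyDebugger:
    def __init__(self, memory):
        self.memory = memory   # stands in for the target process's code
        self.saved = {}        # address -> original byte

    def set_breakpoint(self, address):
        """Remember the original byte, then patch in int 3."""
        if address not in self.saved:
            self.saved[address] = self.memory[address]
            self.memory[address] = INT3

    def clear_breakpoint(self, address):
        """Put the original byte back."""
        self.memory[address] = self.saved.pop(address)

    def read_for_display(self, address):
        """What the debugger shows the user: the original byte, not 0xCC."""
        return self.saved.get(address, self.memory[address])

# push ebp; mov ebp, esp; ret
code = bytearray([0x55, 0x89, 0xE5, 0xC3])
dbg = ToyDebugger(code)
dbg.set_breakpoint(1)
print(hex(code[1]), hex(dbg.read_for_display(1)))   # 0xcc 0x89
```

The gap between what is in memory and what read_for_display returns is exactly the transparency described above.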

Incidentally, two 0xCC bytes form the GBK encoding of the Chinese character “烫” (“scalding hot”). Some compilers fill the thread stack with large numbers of 0xCC bytes, so when a program goes wrong we often see a long run of “烫” characters.
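
This is easy to check for yourself. The short Python snippet below decodes a run of 0xCC bytes with the standard library's gbk codec; the six-byte buffer is just an example.

```python
fill = bytes([0xCC] * 6)     # six bytes of stack filler
text = fill.decode("gbk")    # every pair of 0xCC bytes is one character
print(text)
```

An odd number of 0xCC bytes would leave a dangling half character, which is why the garbage on screen usually ends with some other symbol.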

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p6-tt.byteimg.com/origin/pgc-image/f0264a4550b743b589525ff5834314b5?from=pc)

When the int 3 instruction executes, the CPU automatically invokes the interrupt handling procedure (although this is not a true interrupt): it fetches entry 3 of the IDT, which the IDTR register points to, and runs the handler recorded there.

The interrupt descriptor table is set up in advance when the operating system boots, so once this instruction executes, the operating system's handler steps in to deal with the event.

Put simply, the operating system freezes the process that triggered the event and reports the event to the debugger, which then knows that the target process has hit a breakpoint. At that point we programmers can inspect the target process through the debugger's UI or command line, viewing the stack, memory and variables as we like.

When we choose to continue, the debugger restores the original instruction that int 3 replaced and tells the operating system: I'm done, unfreeze the target process.

That is a brief description of how ordinary breakpoints work. Now consider a scenario: we have a bug in which the value of a global integer variable keeps changing for no apparent reason, and many threads and many functions could be modifying it. How do we find out who is doing it?

We need a new kind of breakpoint: a hardware breakpoint.

This is where the protagonist of this section, the debug registers, comes in.

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p3-tt.byteimg.com/origin/pgc-image/1cf74a43fd8044c6a2ad3039204114f1.png?from=pc)

The x86 CPU provides eight debug registers DR0 to DR7.

DR0 to DR3: These are the four registers used to store addresses

DR4 to DR5: These two are special and are controlled by the DE bit of the CR4 register mentioned above. If the DE bit of CR4 is 1, DR4 and DR5 are inaccessible and accessing them triggers an exception. If the DE bit is 0, DR4 and DR5 become aliases of DR6 and DR7, much like soft links. This is done to reserve DR4 and DR5 for future debugging extensions.

DR6: This register stores some state information after a hardware breakpoint has been triggered

DR7: The debug control register, which records the break condition (whether to break on reading, writing or executing the address), the data length (1/2/4 bytes) and the scope for each of the addresses stored in DR0 to DR3

After a hardware breakpoint is set through the debugger interface, the CPU automatically interrupts the code execution process if the conditions are met.

To answer the earlier question: to find out who is secretly modifying the global integer variable, all we need to do is use the debugger to set a hardware write breakpoint on the variable's address.
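
For the curious, here is roughly what the debugger has to compute for DR7 when it arms such a breakpoint. The Python sketch follows the architectural DR7 layout: a local enable bit at position 2 × slot, a 2-bit condition field at 16 + 4 × slot and a 2-bit length field at 18 + 4 × slot. The helper dr7_enable is my own, it covers only the 1/2/4-byte lengths mentioned above, and actually loading the value into DR7 needs privileged code or a debugging API.

```python
# Break conditions (the R/W field of DR7).
RW_EXECUTE, RW_WRITE, RW_IO, RW_READ_WRITE = 0b00, 0b01, 0b10, 0b11

# Length field encodings for 1, 2 and 4 bytes.
LEN_ENCODING = {1: 0b00, 2: 0b01, 4: 0b11}

def dr7_enable(dr7, slot, condition, length):
    """Return a DR7 value with breakpoint `slot` (0 to 3, i.e. DR0 to DR3) armed."""
    if slot not in range(4):
        raise ValueError("slot must be 0 to 3")
    dr7 &= ~(0b1111 << (16 + 4 * slot))               # clear old condition/length
    dr7 |= 1 << (2 * slot)                            # local enable bit
    dr7 |= condition << (16 + 4 * slot)               # when to break
    dr7 |= LEN_ENCODING[length] << (18 + 4 * slot)    # how many bytes to watch
    return dr7

# Watch 4 bytes at the address in DR0 and break on writes.
print(hex(dr7_enable(0, 0, RW_WRITE, 4)))   # 0xd0001
```

With the variable's address in DR0 and this value in DR7, the CPU itself checks every memory write, so it does not matter which thread or function is the culprit.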

Descriptor register

A descriptor is a data structure that records information and “describes” something. Grouping many descriptors together forms a descriptor table, and a register that points to such a table is a descriptor register.

x86/x64 CPUs have three very important descriptor registers, which store three addresses pointing to three very important descriptor tables.

GDTR: The global descriptor table register. As mentioned earlier, CPUs today manage memory with a combination of segmentation and paging. Which segments exist in the system? They are recorded in a table called the global descriptor table (GDT), and the GDTR register points to this table. Each entry in the table describes one memory segment.

LDTR: The local descriptor table register. Like GDTR above, it points to a table of segment descriptors, the LDT. The difference is that the GDT is globally unique, while LDTs are used locally: several can be created, and they are switched along with tasks (see the task register section below).

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p1-tt.byteimg.com/origin/pgc-image/a29690895a9944139ce31994395f4e2b.png?from=pc)

Segment descriptors are entries in GDT and LDT that describe information about a segment of memory. Their structure is as follows:

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p3-tt.byteimg.com/origin/pgc-image/f2b12caa3c224edfb026ea8e3bf17723.png?from=pc)

An entry occupies 8 bytes (on a 32-bit CPU) and stores information about a memory segment: base address, size, permissions, type, and so on.
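
The fields are scattered across the 8 bytes in a slightly awkward way, so a decoding sketch may help. The Python function below follows the standard 32-bit segment descriptor layout; decode_descriptor is an illustrative name, it extracts only a few of the fields, and the sample value is a flat 4 GB ring 0 code segment of the kind commonly placed in a GDT.

```python
def decode_descriptor(raw):
    """Decode an 8-byte segment descriptor given in memory (little-endian) order."""
    if len(raw) != 8:
        raise ValueError("a segment descriptor is exactly 8 bytes")
    # The limit is split: 16 bits in bytes 0-1, 4 more in the low half of byte 6.
    limit = raw[0] | (raw[1] << 8) | ((raw[6] & 0x0F) << 16)
    # The base is split: 24 bits in bytes 2-4, the top 8 bits in byte 7.
    base = raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[7] << 24)
    access = raw[5]
    flags = raw[6] >> 4
    return {
        "base": base,
        "limit": limit,
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 0b11,            # privilege level of the segment
        "granularity_4k": bool(flags & 0b1000), # limit counted in 4 KB units
    }

flat_code = (0x00CF9A000000FFFF).to_bytes(8, "little")
print(decode_descriptor(flat_code))
```

For this sample the base is 0 and the limit is 0xFFFFF in 4 KB units, which together cover the whole 4 GB address space.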

Besides these two segment descriptor registers, there is one more very important descriptor register:

IDTR: The interrupt descriptor table register, which points to the interrupt descriptor table (IDT). Each entry in this table is an interrupt descriptor. When a hardware interrupt, an exception or a software interrupt occurs while the CPU is executing, the CPU automatically looks up the corresponding entry in this table, which records where the handler for that interrupt or exception is located.

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p1-tt.byteimg.com/origin/pgc-image/6bf9962bb8854f5cbc7028c5afeb58da.png?from=pc)

An entry in the IDT is called a gate, because it is the main entrance through which applications enter the kernel. Although the table is named the interrupt descriptor table, not everything stored in it is an interrupt descriptor. The IDT holds three types of entry, corresponding to three types of gate:

Task gate

Trap gate

Interrupt gate

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p1-tt.byteimg.com/origin/pgc-image/44ddf1656b534952884bb907dc713f22.png?from=pc)

All three kinds of descriptor store the address of the code that handles the interrupt, exception or task. The three gates serve different purposes. Interrupt gates are for interrupts in the true sense, while the int 3 debugging instruction mentioned above and the old system call instructions int 0x2E and int 0x80 go through trap gates. Task gates are rarely used, and understanding them first requires understanding the task register.
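
The lookup the CPU performs can be imitated in Python. This sketch assumes 32-bit protected mode, where each gate is 8 bytes and the type field distinguishes task gates (0x5), interrupt gates (0xE) and trap gates (0xF). The helpers make_gate and lookup_gate, the handler address and the selector value are all invented for the example.

```python
import struct

GATE_TYPES = {0x5: "task gate", 0xE: "interrupt gate", 0xF: "trap gate"}
GATE_FORMAT = "<HHBBH"   # offset low, selector, reserved, type/attributes, offset high

def make_gate(handler, selector, type_attr):
    """Pack an 8-byte gate descriptor for a 32-bit handler address."""
    return struct.pack(GATE_FORMAT, handler & 0xFFFF, selector, 0,
                       type_attr, (handler >> 16) & 0xFFFF)

def lookup_gate(idt, vector):
    """Fetch and decode entry `vector` of the IDT, as the CPU does on int n."""
    entry = idt[vector * 8:vector * 8 + 8]
    off_lo, selector, _, type_attr, off_hi = struct.unpack(GATE_FORMAT, entry)
    return {
        "handler": (off_hi << 16) | off_lo,
        "selector": selector,
        "kind": GATE_TYPES.get(type_attr & 0x0F, "other"),
        "dpl": (type_attr >> 5) & 0b11,
    }

idt = bytearray(256 * 8)                              # 256 empty entries
idt[3 * 8:3 * 8 + 8] = make_gate(0x80101234, 0x08, 0xEF)   # vector 3: trap gate, DPL 3
print(lookup_gate(bytes(idt), 3))
```

The DPL of 3 on this entry is what would allow user-mode code to execute int 3 without being rejected by the privilege check.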

Task register

Modern operating systems all support running multiple tasks concurrently. To keep up with the times, x86 CPUs provide a dedicated hardware-level mechanism for task switching, which has two parts:

A dedicated register inside the CPU, the task register TR, points to the currently running task.

A data structure, the TSS (task state segment), describes a task and stores the task's context (a set of register values).

The x86 design is that each task has its own TSS and the TR register points to the current task. Switching tasks then just means changing where TR points, and that is the hardware-level multitasking mechanism.

It is a nice idea, but in reality mainstream operating systems, including Linux and Windows, do not use this mechanism for thread switching. They implement thread switching in their own software instead.

So most of the time the TR register points to a fixed location and does not change even when threads switch.

Note that I said most of the time, not always. Although operating systems do not rely on the TSS for task switching, that does not mean they ignore the TSS the CPU provides. There are still some special cases, such as the handling of certain exceptions, where the TSS is used.

The following figure shows the overall picture of the control register, descriptor register and task register:

![a sigh of relief after 45 registers, CPU core technology full disclosure](https://p3-tt.byteimg.com/origin/pgc-image/c014f5c2090841f996598a6cfd96b3e5.png?from=pc)

Model-specific registers

Starting with the x86 CPUs that came after the 80486, a new group of registers was added, collectively called MSRs (model-specific registers). As the name suggests, unlike the registers covered above, these are not fixed and may change from one CPU model to the next. They are mainly used to support newer features.

As x86 CPUs have been updated generation after generation, the number of MSRs has kept growing, and at the same time some MSRs have become fixed, a stable part amid the change. Intel calls these architectural MSRs, and their names all carry the prefix IA32.

Here are three representative MSRs:

IA32_SYSENTER_CS

IA32_SYSENTER_ESP

IA32_SYSENTER_EIP

These three MSRs are used to implement fast system calls.

On early x86 CPUs, system calls were implemented with software interrupts, similar to the int 3 instruction used for debugging above. Windows used int 0x2E for system calls, and Linux used int 0x80.

But software interrupts are relatively slow, because executing one requires a table lookup in memory: the CPU locates the IDT through IDTR and then fetches the handler to execute.

System calls are triggered very frequently, so this cost affects performance. In the Pentium era the three MSRs above were added. They store, respectively, the segment register value, the top-of-stack address and the function address that the kernel's system call entry function needs once a system call is made, so no table lookup in memory is required. Fast system calls also come with dedicated CPU instructions, sysenter/sysexit, for entering and leaving a system call.

On 64-bit, this pair of instructions is upgraded to syscall/sysret.
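
The difference between the two entry paths can be summed up in a toy Python model. The MSR numbers 0x174 to 0x176 are the architectural addresses of the three SYSENTER MSRs, while the dictionaries, the function names and every value stored in them are invented purely for illustration.

```python
IA32_SYSENTER_CS = 0x174
IA32_SYSENTER_ESP = 0x175
IA32_SYSENTER_EIP = 0x176

def enter_via_int(idt, vector):
    """Software interrupt path: the entry point comes from a table in memory."""
    return idt[vector]

def enter_via_sysenter(msrs):
    """sysenter path: everything needed is already held in registers (MSRs)."""
    return {
        "cs": msrs[IA32_SYSENTER_CS],     # kernel code segment
        "esp": msrs[IA32_SYSENTER_ESP],   # top of the kernel stack
        "eip": msrs[IA32_SYSENTER_EIP],   # kernel system call entry function
    }

# Hypothetical values an operating system might program at boot.
idt = {0x80: {"cs": 0x08, "eip": 0x80105000}}
msrs = {IA32_SYSENTER_CS: 0x08,
        IA32_SYSENTER_ESP: 0x80200000,
        IA32_SYSENTER_EIP: 0x80105000}

print(enter_via_int(idt, 0x80))
print(enter_via_sysenter(msrs))
```

In Python both paths are just dictionary reads, of course. On real hardware the first one is a memory access plus descriptor checks, and the second reads values the CPU already holds internally.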

Conclusion

That covers all the registers this article set out to introduce. Note that these are not all the registers of an x86 CPU: there are also the XMM and MMX registers, the FPU floating-point registers, and others.

Taking x86/x64 CPUs as its subject, this article has used the CPU's internal registers to explain the mechanism of code execution, memory addressing, interrupt and exception handling, multitasking, system calls, the principles of debugging and other fundamentals of computing.

Author: Regulus wind

Source: Programming Technology Universe