One, background
After playing with Linux for a while, you come to realize that Linux is just an operating system, much like Windows, except that Windows ships with a graphical interface while Linux is, for the most part, just a black terminal window. Windows is aimed mostly at individual users, while Linux is used mostly on servers because of its superior performance.
A server is not intelligent, and neither is Linux. Operations and maintenance engineers therefore need to tune the system so that the server hardware and Linux features work together to deliver the best possible performance. That is my understanding of Linux performance tuning. Here is an overview of a server and the Linux system together:
Two, basic knowledge
Tuning is not a matter of blindly turning knobs; it requires understanding how Linux handles tasks and how it interacts with hardware resources. Performance tuning rests on a deep understanding of the hardware, the operating system, and the applications. Below are some of the areas where performance matters most in Linux.
Three, Linux process management
- 3.1 What is a process?
A process is an instance of a program in execution, and the Linux kernel schedules resources to meet the needs of each process. My understanding is this: an application is data stored on the hard disk; when it is run, the program is loaded into memory through its entry point, producing a process (a running program). Processes use various resources, such as the CPU, memory, keyboard and mouse, and hard disks, and the Linux kernel schedules these resources to arbitrate the cooperation and competition among processes.
Linux process management is similar to UNIX process management, covering process scheduling, interrupt handling, signals, process priority, process switching, process states, process memory, and so on.
Every process running on Linux is managed through a task_struct structure, also known as the process descriptor. The process descriptor contains all the information a process needs to run, such as the process ID, process attributes, and the resources the process holds.
The following figure shows an overview of the structure of process information:
- 3.2 Process Life Cycle
Each process has its own life cycle: creation, execution, termination, and deletion. These phases repeat countless times over the lifetime of the system, which makes the process life cycle important from a performance perspective.
The following diagram shows the life cycle of a typical process:
When a process creates a new process, the creating process (the parent) uses a system call named fork(). fork() allocates a process descriptor for the newly created process (the child), assigns it a new process ID, and copies the parent's process descriptor into it. The parent's address space is not copied; instead, parent and child initially share the same address space.
The exec() system call then loads a new program into the child's address space. Because the address space is shared, writing new data raises a page-fault exception, at which point the kernel assigns a new physical page to the child. This deferred copying is called Copy On Write. The child usually executes its own program, different from the parent's. This arrangement avoids unnecessary overhead, because copying an entire address space up front is slow and inefficient and consumes a great deal of processor time and resources.
When the program finishes executing, the child terminates with the exit() system call. exit() frees most of the process's data structures and notifies the parent of the termination. At this point the child is called a zombie process. The child is not completely removed until the parent learns of its termination via the wait() system call; once the parent knows the child has died, the kernel frees all of the child's remaining data structures and its process descriptor.
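The fork()/exec()/wait() sequence above can be sketched in a few lines of Python on Linux (the echo message is illustrative; os.fork() is only available on Unix-like systems):

```python
import os

# Sketch of the process life cycle: fork() a child, exec() a new program
# in it, then wait() for it so no zombie is left behind.
pid = os.fork()                     # child initially shares pages copy-on-write
if pid == 0:
    # Child: replace the inherited image with a new program via exec().
    os.execvp("echo", ["echo", "hello from the child"])
else:
    # Parent: reap the child; until this call the dead child is a zombie.
    _, status = os.waitpid(pid, 0)
    print("child", pid, "exit code:", os.WEXITSTATUS(status))
```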
- 3.3 Threads
When it comes to processes, threads must also be mentioned. A thread is a unit of execution created within a single process. Multiple threads run concurrently in the same process, sharing memory, the address space, open files, and so on; they also access the same set of application data. Threads are also known as Light Weight Processes. Because threads share resources, two threads must not modify a shared resource at the same time; mutual exclusion, locking, serialization, and so on are left to the user application.
From a performance standpoint, creating a thread is less costly than creating a process, because no resources need to be replicated. On the other hand, processes and threads have similar behavior in terms of scheduling. The kernel handles them in a similar way.
Here is a simple comparison of processes and threads:
In current Linux implementations, threads are provided by the POSIX (Portable Operating System Interface for UNIX) compliant thread library, pthreads. Linux supports multithreading.
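A minimal sketch of the points above in Python: threads in one process see the same variables, and protecting shared data with a lock is the application's responsibility (the worker counts here are illustrative):

```python
import threading

counter = 0                      # shared by every thread in this process
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:               # mutual exclusion is done by the application
            counter += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("counter =", counter)      # all four threads updated one variable
```

Without the lock, concurrent `counter += 1` updates could be lost, which is exactly the shared-resource hazard described above.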
- 3.4 Process Priority and NICE Level
Process priority is determined by a dynamic priority and a static priority; these numbers decide the order in which processes are executed on the CPU. A higher-priority process has a greater chance of being run by the processor.
The kernel uses a heuristic algorithm to raise or lower a process's dynamic priority according to its behavior. The nice level lets you change a process's static priority directly, and a process with a higher static priority receives a longer time slice (the time slice is the amount of time the process gets to execute on the processor).
Linux supports nice levels from 19 (lowest priority) to -20 (highest priority), with a default of 0. Only the root user can change a process's nice level to a negative value (giving it a higher priority).
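This can be observed from Python on a Unix-like system; an unprivileged process may only raise its nice level (lower its priority), so the +5 below is a safe illustrative choice:

```python
import os

# Read this process's own nice level, then raise it by 5.
before = os.getpriority(os.PRIO_PROCESS, 0)  # 0 means "this process"
after = os.nice(5)                           # add 5 to our own nice level
print("nice level:", before, "->", after)
```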
- 3.5 Context Switching
During execution, a process's information is held in the registers and caches of the processor; the data a process keeps in registers while executing is called its context. When switching processes, the context of the running process is saved and the context of the next process to execute is restored into the registers. The context is typically stored in the process descriptor and the kernel stack, and this switching is called a context switch. Because the processor reloads registers and caches for the new process on every switch, context switches can cause performance problems, so excessive context switching should be avoided. The following image shows how context switching works:
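On Linux, a process's own context-switch counts are exported through the proc filesystem; here is a small sketch (guarded so it simply returns an empty result on systems without /proc):

```python
def ctxt_switches(path="/proc/self/status"):
    """Return voluntary/nonvoluntary context-switch counts for this process."""
    switches = {}
    try:
        with open(path) as f:
            for line in f:
                if line.startswith(("voluntary_ctxt_switches",
                                    "nonvoluntary_ctxt_switches")):
                    key, _, value = line.partition(":")
                    switches[key] = int(value)
    except FileNotFoundError:
        pass                      # no /proc: not a Linux system
    return switches

print(ctxt_switches())
```

Voluntary switches happen when a process blocks (waiting for I/O, for example); nonvoluntary ones happen when the scheduler preempts it.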
- 3.6 Interrupt Handling
Interrupt handling is one of the highest-priority tasks. Interrupts are typically generated by I/O devices, such as network interfaces, keyboards, and disk controllers. The interrupt handler notifies the kernel of events such as keyboard input or the arrival of a network frame. Because some devices need a quick response, the kernel is told to interrupt process execution as soon as possible, which is a challenge to system responsiveness. When an interrupt signal reaches the kernel, the kernel must switch away from the currently executing process to handle the interrupt. This means a context switch occurs, and it also means that a large number of interrupts can degrade system performance.
There are two types of interrupts in Linux. Hard interrupts are generated by devices that need an immediate response, such as disk I/O interrupts, NIC interrupts, and keyboard and mouse interrupts. Soft interrupts are used for task processing that can be deferred, such as TCP/IP operations and SCSI operations. Hard interrupt statistics can be seen in /proc/interrupts.
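The hard-interrupt counters mentioned above can be read directly; a sketch that prints the header and the first few IRQ lines (guarded for non-Linux systems):

```python
def read_interrupts(path="/proc/interrupts", limit=5):
    """Return the header plus the first `limit` IRQ lines of /proc/interrupts."""
    try:
        with open(path) as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        return []                 # no /proc: not a Linux system
    return lines[:limit + 1]      # first line is the per-CPU header

for line in read_interrupts():
    print(line)
```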
- 3.7 Process Status
TASK_RUNNING: the process is either executing on a CPU or waiting in a run queue for its turn to run.
TASK_STOPPED: the process has been suspended by a signal (such as SIGINT or SIGSTOP) and is waiting for a resume signal (such as SIGCONT).
TASK_INTERRUPTIBLE: the process is suspended, waiting for a specific condition to occur. If a process in this state receives a signal, its state changes and the wait is interrupted. A typical example is a process waiting for keyboard input.
TASK_UNINTERRUPTIBLE: similar to TASK_INTERRUPTIBLE, except that a process in this state does not respond to signals. The most typical example is a process waiting for a disk I/O operation to complete.
TASK_ZOMBIE: the process has exited via the exit() system call and is waiting for its parent to learn of the termination and release all of its data structures.
The relationship between them is shown below:
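These states appear as one-letter codes (R, S, D, T, Z) in field 3 of /proc/&lt;pid&gt;/stat on Linux; a small sketch that reports the current process's own state:

```python
def process_state(pid="self"):
    """Return the one-letter state of a process, or None without /proc."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except FileNotFoundError:
        return None               # no /proc: not a Linux system
    # Field 3 follows the parenthesised command name, which may contain spaces.
    return stat.rsplit(")", 1)[1].split()[0]

print("current state:", process_state())
```

A process reading its own stat file is, by definition, running, so it reports R.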
Zombie processes:
When a process receives a signal telling it to terminate, it normally gets a short time to finish its remaining work (such as closing open files) before it is completely gone. During that brief window, after exiting but before being reaped by its parent, the process is a zombie.
When a process has completed all of its shutdown operations, it reports its termination to its parent. A zombie cannot finish terminating by itself, and until it is reaped it shows up with a Z (zombie) state in tools such as ps.
Because such a process is already dead, it cannot be killed with the kill command. If a zombie will not go away, you can kill its parent process; the zombie is then re-parented to init, which reaps it, and it disappears. If the zombie's parent is already init and the zombie still persists, you may have to reboot to get rid of it, since killing init is not an option.
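The zombie behaviour described above can be reproduced in a short Linux-only sketch: the child exits, the parent delays its wait() call, and in that window the child shows state Z:

```python
import os, time

def state(pid):
    """One-letter state from /proc/<pid>/stat, or None without /proc."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return f.read().rsplit(")", 1)[1].split()[0]
    except FileNotFoundError:
        return None               # no /proc: not a Linux system

pid = os.fork()
if pid == 0:
    os._exit(0)                   # child dies immediately...
time.sleep(0.2)                   # ...but the parent has not reaped it yet
zombie_state = state(pid)         # "Z": dead, waiting to be reaped
os.waitpid(pid, 0)                # wait() reaps it; the zombie disappears
print("child state before reaping:", zombie_state)
```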
- 3.8 Process Memory Segments
A process uses its own memory area to carry out its work, and that work depends on the situation and the purpose of the process. Processes have different working characteristics and different data-size requirements, and they must handle data of various sizes. To meet this requirement, the Linux kernel uses a dynamic memory allocation mechanism for each process. The process memory layout is shown below:
The process memory area contains the following segments:
Text: stores the executable code.
Data: the data segment consists of the following three areas:
- Data: stores initialized data, such as static variables.
- BSS: stores zero-initialized data (the data is initialized to zero).
- Heap: memory allocated dynamically as needed, for example with malloc(). The heap grows toward higher addresses.
Stack: stores local variables, function parameters, and function return addresses. The stack grows toward lower addresses.
The address space allocation for user processes can be displayed using the pmap command. You can use the ps command to display the total segment size.
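pmap ultimately reads /proc/&lt;pid&gt;/maps; here is a small sketch that lists the named segments ([heap], [stack], and so on) of the current process, guarded for non-Linux systems:

```python
def segment_names(path="/proc/self/maps"):
    """Return named memory segments of this process, e.g. {'[heap]', '[stack]'}."""
    names = set()
    try:
        with open(path) as f:
            for line in f:
                parts = line.split()
                if parts and parts[-1].startswith("["):
                    names.add(parts[-1])
    except FileNotFoundError:
        pass                      # no /proc: not a Linux system
    return names

print(sorted(segment_names()))
```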
- 3.9 CPU Scheduling in Linux
The most basic function of a computer is computation. To deliver it, there must be a way to manage the computing resource (the processor) and the computing tasks, which are commonly known as processes and threads.
In Linux, the scheduler keeps runnable processes in two priority arrays:

1. active
2. expired
The scheduler allocates time slices based on each process's priority and its previous blocking rate, and runnable processes are first placed in the active array. When a process's time slice expires, it is assigned a new time slice and moved to the expired array. Once every process in the active array has expired, the active and expired arrays are swapped and the algorithm starts over. For ordinary interactive processes (as opposed to real-time processes), a high-priority process usually receives a larger time slice than a low-priority one, but this does not mean low-priority processes get no chance at all. (This describes the 2.6-era O(1) scheduler; kernels since 2.6.23 use the Completely Fair Scheduler instead.)
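The active/expired mechanism described above can be illustrated with a toy round-robin sketch (the task names and number of rounds are purely illustrative):

```python
from collections import deque

active = deque(["taskA", "taskB"])   # tasks that still have time slices
expired = deque()                    # tasks whose slices have run out
order = []                           # which task "ran" in each round

for _ in range(6):                   # six scheduling rounds
    if not active:                   # active array drained:
        active, expired = expired, active   # swap the two arrays
    task = active.popleft()
    order.append(task)               # the task uses up its time slice
    expired.append(task)             # it gets a fresh slice and waits in expired

print(order)
```

Each task runs once per pass, and the swap restarts the cycle without rebuilding any queues, which is what made this design O(1).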
Here’s how the Linux CPU scheduler works:
Another big advantage of the newer scheduler is its support for Non-Uniform Memory Access (NUMA) architectures and symmetric multithreading processors, such as Intel's Hyper-Threading technology. NUMA support ensures that load balancing across nodes normally happens only when a node becomes overburdened, which keeps the slower inter-node links in a NUMA system lightly loaded. Processes are load-balanced among the processors within each scheduler group, while balancing across groups occurs only when a node's load is too high and rebalancing is required.
Four, what's next?
With Linux process management behind us, the next section covers Linux's memory architecture.