I. Processes
When you write a program on Windows, you save it as a .h or .c file. It may look like some special file format, but it is just plain text.
The process tree
Since every process is forked from a parent process, there must be an ancestor of them all: the init process that the system starts at boot. As we saw when walking through the Linux startup process, process 1 is /sbin/init. On CentOS 7, this file is a symbolic link to systemd.
/sbin/init -> ../lib/systemd/systemd
After the system starts, the init process launches many daemons that provide services for the running system, and then starts getty so that users can log in and run a shell. Every process a user starts is run from the shell, and together these form a process tree.
You can run the ps -ef command to view the processes started in the current system. You can find three types of processes.
[root@deployer ~]# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 2018 ? 00:00:29 /usr/lib/systemd/systemd --system --deserialize 21
root 2 0 0 2018 ? 00:00:00 [kthreadd]
root 3 2 0 2018 ? 00:00:00 [ksoftirqd/0]
root 5 2 0 2018 ? 00:00:00 [kworker/0:0H]
root 9 2 0 2018 ? 00:00:40 [rcu_sched]
......
root 337 2 0 2018 ? 00:00:01 [kworker/3:1H]
root 380 1 0 2018 ? 00:00:00 /usr/lib/systemd/systemd-udevd
root 415 1 0 2018 ? 00:00:01 /sbin/auditd
root 498 1 0 2018 ? 00:00:03 /usr/lib/systemd/systemd-logind
......
root 852 1 0 2018 ? 00:06:25 /usr/sbin/rsyslogd -n
root 2580 1 0 2018 ? 00:00:00 /usr/sbin/sshd -D
root 29058 2 0 Jan03 ? 00:00:01 [kworker/1:2]
root 29672 2 0 Jan04 ? 00:00:09 [kworker/2:1]
root 30467 1 0 Jan06 ? 00:00:00 /usr/sbin/crond -n
root 31574 2 0 Jan08 ? 00:00:01 [kworker/u128:2]
......
root 32792 2580 0 Jan10 ? 00:00:00 sshd: root@pts/0
root 32794 32792 0 Jan10 pts/0 00:00:00 -bash
root 32901 32794 0 00:01 pts/0 00:00:00 ps -ef
You will notice that PID 1 is our init process, systemd, and PID 2 is the kernel thread kthreadd, both of which we saw during kernel startup. User-mode processes are shown without brackets; kernel threads are shown in square brackets.
As the process numbers go up, you will see that every bracketed kernel thread descends from process 2, while user-mode processes descend from process 1. A question mark in the TTY column means the process has no controlling terminal, which is typical of services running in the background.
The parent of the session process sshd: root@pts/0 is sshd, the parent of bash is that session process, and the parent of ps -ef is bash. With that, the whole chain becomes clear.
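The same chain can be followed from a program. The sketch below is only an illustration, assuming a Linux system with /proc mounted: it reads the PPid: line of /proc/<pid>/status and walks upward until it reaches PID 1. The helper names read_ppid and print_ancestry are made up for this example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the parent PID of `pid` by reading the "PPid:" line of
 * /proc/<pid>/status, or -1 if the file cannot be read or parsed. */
static pid_t read_ppid(pid_t pid)
{
    char path[64], line[256];
    int ppid = -1;

    snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "PPid: %d", &ppid) == 1)
            break;
    }
    fclose(f);
    return (pid_t)ppid;
}

/* Print each "child -> parent" hop from `pid` upward, stopping at PID 1
 * (or at 0 if the parent lies outside the current PID namespace).
 * Returns the number of hops, or -1 on error. */
static int print_ancestry(pid_t pid)
{
    int hops = 0;

    while (pid > 1) {
        pid_t parent = read_ppid(pid);
        if (parent < 0)
            return -1;
        printf("%d -> %d\n", (int)pid, (int)parent);
        pid = parent;
        hops++;
    }
    return hops;
}
```

Calling print_ancestry(getpid()) from main in a program started over SSH should print a chain similar to the ps -ef output above: the program, bash, the sshd session process, sshd, and finally process 1.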
II. Threads
Why threads?
Every process has a main thread by default, even if we never create one explicitly. Threads are what actually execute the binary instructions. Running tasks in parallel using only processes has two problems. First, creating a process consumes a lot of resources. Second, processes have separate memory spaces that cannot be shared directly, so communication between them means passing data across those spaces. In Linux, we sometimes also want to separate foreground tasks from background tasks. Some tasks need an immediate response: if you type a character, it cannot show up five minutes later. Others, such as synchronizing data from the machine to a server, can run quietly and are less urgent. The two kinds of tasks should therefore run on different threads so that neither delays the other.
Thread data
We subdivide the data accessed by threads into three categories.
- Local data on the thread stack, such as local variables used during function execution. As mentioned earlier, function calls use a stack model, and the same holds in threads, except that each thread has its own stack space. You can view the stack size with ulimit -a; the default is 8192 KB (8 MB), and it can be changed with ulimit -s. The main thread has its stack space in memory, and every other thread has a separate stack of its own. To keep threads from trampling on each other's stacks, a small guard area separates them. As soon as a thread steps into a guard area, a segmentation fault is raised.
- Global data shared across the whole process. Global variables, for example, are shared by all threads within a process, although they are isolated between processes.
- Thread-specific data, which is private to each thread.
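The three categories can be seen side by side in a small POSIX threads program. This is a minimal sketch rather than a recommended design; the names shared_counter, tsd_key, worker, and run_thread_data_demo are invented for the example, and it assumes the program is built with pthread support.

```c
#include <pthread.h>
#include <stdint.h>

static int shared_counter = 0;                /* global data: one copy per process */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_key_t tsd_key;                 /* key for thread-specific data */

static void *worker(void *arg)
{
    int local = (int)(intptr_t)arg;           /* local data on this thread's stack */

    /* Same key in every thread, but each thread stores its own value. */
    pthread_setspecific(tsd_key, &local);

    pthread_mutex_lock(&lock);
    shared_counter += local;                  /* visible to all threads */
    pthread_mutex_unlock(&lock);

    /* Reads back this thread's value, never the other thread's. */
    int *mine = pthread_getspecific(tsd_key);
    return (void *)(intptr_t)*mine;
}

/* Run two workers. Returns 0 if each one read back its own
 * thread-specific value, -1 otherwise. */
static int run_thread_data_demo(void)
{
    pthread_t t1, t2;
    void *r1 = NULL, *r2 = NULL;

    if (pthread_key_create(&tsd_key, NULL) != 0)
        return -1;
    if (pthread_create(&t1, NULL, worker, (void *)(intptr_t)1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, (void *)(intptr_t)2) != 0)
        return -1;
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);
    pthread_key_delete(tsd_key);
    return ((intptr_t)r1 == 1 && (intptr_t)r2 == 2) ? 0 : -1;
}
```

After one call to run_thread_data_demo(), shared_counter holds the sum of both workers' contributions, while each worker only ever saw its own local variable and its own thread-specific value.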
Process data structure
In Linux, whether it is a process or a thread, the kernel calls it a task and manages it with a single structure, task_struct. Just as there should be a list of all the projects being carried out, the Linux kernel keeps a linked list of all task_structs. Next, let's look at the fields each task should contain.
Task ID
Each task should have an ID that uniquely identifies it. task_struct contains the following ID-related fields:
pid_t pid;
pid_t tgid;
struct task_struct *group_leader;
You might wonder: if it is just an ID, wouldn't a single field be enough to identify a task uniquely? Why make it this complicated? The reason is that processes and threads both become tasks once they reach the kernel, which raises two problems.
- Task presentation.
- Send commands to tasks.
So although both are tasks in the kernel, they still need to be told apart. pid stands for process ID, and tgid stands for thread group ID. For a process that has only a main thread, pid is its own ID, tgid is its own ID, and group_leader points to itself. If the process creates other threads, things change: each thread has its own pid, its tgid is the pid of the process's main thread, and its group_leader points to the main thread.
By comparing pid with tgid, we can tell whether a task_struct represents a process or a thread.
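From user space the two IDs can be observed directly: getpid() reports what the kernel calls tgid, and the gettid system call reports the kernel's per-task pid. The following sketch, with the made-up helpers current_ids and ids_in_new_thread, assumes Linux with glibc and pthread support.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct ids {
    pid_t tgid;   /* what getpid() returns: the thread group ID */
    pid_t tid;    /* what gettid returns: the kernel's per-task pid */
};

/* Collect both IDs for the calling thread. */
static struct ids current_ids(void)
{
    struct ids r;

    r.tgid = getpid();
    r.tid = (pid_t)syscall(SYS_gettid);
    return r;
}

static void *thread_main(void *out)
{
    *(struct ids *)out = current_ids();
    return NULL;
}

/* Start one extra thread and record the IDs it sees.
 * Returns 0 on success, nonzero on failure. */
static int ids_in_new_thread(struct ids *out)
{
    pthread_t t;

    if (pthread_create(&t, NULL, thread_main, out) != 0)
        return -1;
    return pthread_join(t, NULL);
}
```

Called from the main thread, current_ids() returns two equal values. The IDs recorded by ids_in_new_thread() share the same tgid but carry a different tid, which matches the pid and tgid rule described above.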
Signal handling
task_struct contains the following signal-related fields:
/* Signal handlers: */
struct signal_struct *signal;
struct sighand_struct *sighand;
sigset_t blocked;
sigset_t real_blocked;
sigset_t saved_sigmask;
struct sigpending pending;
unsigned long sas_ss_sp;
size_t sas_ss_size;
unsigned int sas_ss_flags;
These fields define which signals are blocked (blocked), which are pending and waiting to be handled (pending), and which handlers the signals are processed with (sighand). The handling result can be to ignore the signal, to terminate the process, and so on. When sending signals, you also need to distinguish between processes and threads.
Task status
In task_struct, the state of the task is related to the following variables:
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
int exit_state;
unsigned int flags;
The possible values of state are defined in the include/linux/sched.h header file.
/* Used in tsk->state: */
#define TASK_RUNNING 0
#define TASK_INTERRUPTIBLE 1
#define TASK_UNINTERRUPTIBLE 2
#define __TASK_STOPPED 4
#define __TASK_TRACED 8
/* Used in tsk->exit_state: */
#define EXIT_DEAD 16
#define EXIT_ZOMBIE 32
#define EXIT_TRACE (EXIT_ZOMBIE | EXIT_DEAD)
/* Used in tsk->state again: */
#define TASK_DEAD 64
#define TASK_WAKEKILL 128
#define TASK_WAKING 256
#define TASK_PARKED 512
#define TASK_NOLOAD 1024
#define TASK_NEW 2048
#define TASK_STATE_MAX 4096
As you can see from the values, state is set as a bitmask: each state corresponds to one bit. TASK_RUNNING does not mean that the process is running, but that it is ready to run at any moment. A process in this state is running when it has a time slice; when it does not, it has been preempted by another process and is waiting to be given a time slice again.
Once a running process needs to perform an I/O operation, it has to wait for the I/O to complete. It then releases the CPU and goes to sleep. Linux has two basic sleep states, plus a newer killable variant. They are listed below together with the remaining states.
- TASK_INTERRUPTIBLE is the interruptible sleep state. This is a light sleep: while the process sleeps waiting for the I/O to complete, it still wakes up when a signal arrives. When it wakes up, though, it does not carry on with what it was doing; it handles the signal first. Programmers can write the signal handler however they like: on some signals, give up waiting for the I/O and exit directly; on others, keep waiting.
- TASK_UNINTERRUPTIBLE is the uninterruptible sleep state. This is a deep sleep that signals cannot interrupt; the process must wait for the I/O operation to complete. If the I/O can never complete for some special reason, nobody can wake the process up. You might say, what if I kill it? Remember that kill itself sends a signal, and since a process in this state cannot be woken by signals, the kill signal is ignored too. There is no alternative but to reboot the machine. This is therefore a dangerous state, and a task should not be put into TASK_UNINTERRUPTIBLE unless the programmer is absolutely sure.
- TASK_KILLABLE is a newer sleep state that can be terminated. A process in this state behaves as in TASK_UNINTERRUPTIBLE, except that it responds to fatal signals. In the kernel it is defined as TASK_WAKEKILL | TASK_UNINTERRUPTIBLE.
- TASK_STOPPED is entered when the process receives a SIGSTOP, SIGTTIN, SIGTSTP, or SIGTTOU signal.
- TASK_TRACED means the process is being monitored by a debugger, which has stopped its execution. While a process is being traced by another process, every signal puts it into this state.
- EXIT_ZOMBIE: when a process terminates, it stays in the EXIT_ZOMBIE state until its parent calls wait() to learn of its termination.
- EXIT_DEAD is the final state of a process.
- EXIT_ZOMBIE and EXIT_DEAD can also be used for exit_state.
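Because state is a bitmask, a value can be decoded bit by bit. The sketch below copies a few of the constants from the listing above into a user-space program (they are not normally visible outside the kernel) and adds TASK_KILLABLE as the kernel defines it; describe_state is a helper invented for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Values copied from include/linux/sched.h for illustration only. */
#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2
#define __TASK_STOPPED        4
#define __TASK_TRACED         8
#define TASK_WAKEKILL         128
#define TASK_KILLABLE         (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Write a '|'-separated list of the state bits set in `state` into buf
 * (len must be at least 1) and return buf. */
static const char *describe_state(long state, char *buf, size_t len)
{
    static const struct { long bit; const char *name; } bits[] = {
        { TASK_INTERRUPTIBLE,   "TASK_INTERRUPTIBLE"   },
        { TASK_UNINTERRUPTIBLE, "TASK_UNINTERRUPTIBLE" },
        { __TASK_STOPPED,       "__TASK_STOPPED"       },
        { __TASK_TRACED,        "__TASK_TRACED"        },
        { TASK_WAKEKILL,        "TASK_WAKEKILL"        },
    };

    if (state == TASK_RUNNING) {              /* no bit set at all */
        snprintf(buf, len, "TASK_RUNNING");
        return buf;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (state & bits[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, "|", len - strlen(buf) - 1);
            strncat(buf, bits[i].name, len - strlen(buf) - 1);
        }
    }
    return buf;
}
```

Decoding TASK_KILLABLE this way shows the two bits it is built from, which is why such a task sleeps like an uninterruptible one yet can still be woken by a fatal signal.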
Process scheduling
State switching of a process usually involves scheduling, and the following fields are used for scheduling.
Run statistics
u64 utime;                  // CPU time consumed in user mode
u64 stime;                  // CPU time consumed in kernel mode
unsigned long nvcsw;        // voluntary context switch count
unsigned long nivcsw;       // involuntary context switch count
u64 start_time;             // process start time, excluding sleep time
u64 real_start_time;        // process start time, including sleep time
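User space can read similar counters through getrusage(). The sketch below is only an approximation of the kernel fields: RUSAGE_SELF aggregates over all threads of the calling process, whereas the task_struct fields are kept per task. The struct run_stats and the function read_run_stats are names made up for this example.

```c
#include <sys/resource.h>
#include <sys/time.h>

struct run_stats {
    double user_sec;      /* CPU time in user mode (compare utime) */
    double sys_sec;       /* CPU time in kernel mode (compare stime) */
    long voluntary;       /* voluntary context switches (compare nvcsw) */
    long involuntary;     /* involuntary context switches (compare nivcsw) */
};

/* Fill `out` with the resource usage of the calling process.
 * Returns 0 on success, -1 on failure. */
static int read_run_stats(struct run_stats *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->user_sec = (double)ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    out->sys_sec = (double)ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
    out->voluntary = ru.ru_nvcsw;
    out->involuntary = ru.ru_nivcsw;
    return 0;
}
```

A process that spends most of its time waiting for I/O tends to show a high voluntary count, while a CPU-bound process that keeps getting preempted tends to show a high involuntary count.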
Process kinship
As you saw from the processes listed earlier, every process has a parent, so the processes as a whole form a process tree. Processes with the same parent are siblings.
struct task_struct __rcu *real_parent; /* real parent process */
struct task_struct __rcu *parent; /* recipient of SIGCHLD, wait4() reports */
struct list_head children; /* list of my children */
struct list_head sibling; /* linkage in my parent's children list */
parent points to the parent process. When the process terminates, it must send a signal to its parent.
children is the head of a linked list whose elements are all of its child processes.
sibling is used to insert the current process into its parent's list of children.
In general, real_parent is the same as parent, but there are exceptions. For example, if bash creates a process, both parent and real_parent of that process are bash. If you then use GDB to debug that process, GDB becomes its parent, the one that receives SIGCHLD and wait4() reports, while bash remains its real_parent.
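The parent and child relationship is easy to observe with fork(). In the sketch below (child_sees_parent is a name made up for this example), the child checks that getppid() returns the PID of the process that forked it, and the parent collects the child's exit status with waitpid().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that checks whether getppid() equals the forking
 * process's PID. Returns 1 if it does, 0 if not, -1 on error. */
static int child_sees_parent(void)
{
    pid_t me = getpid();
    pid_t child = fork();
    int status;

    if (child < 0)
        return -1;
    if (child == 0) {
        /* In the child: report the result through the exit status. */
        _exit(getppid() == me ? 0 : 1);
    }
    /* In the parent: wait for the child so it does not stay a zombie. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

If the parent did not call waitpid(), the terminated child would remain in the EXIT_ZOMBIE state described earlier until the parent itself exited.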
Process permissions
In Linux, process permissions are defined as follows:
/* Objective and real subjective task credentials (COW): */
const struct cred __rcu *real_cred;
/* Effective (overridable) subjective task credentials (COW): */
const struct cred __rcu *cred;
The comments describe two kinds of credentials: Objective and Subjective. Permissions really boil down to two questions: whom can I operate on, and who can operate on me? For “who can operate on me?”, I am the one being operated on, that is, the Objective, and whoever wants to operate on me is the Subjective. For “whom can I operate on?”, I am the Subjective and the one I operate on is the Objective.
An operation is one object performing some action on another object. When an action is about to be performed, permissions are checked, and the action goes ahead only when the two sides' permissions match. Here real_cred describes who can operate on my process, and cred describes whom my process can operate on. struct cred is defined as follows:
struct cred {
......
kuid_t uid; /* real UID of the task */
kgid_t gid; /* real GID of the task */
kuid_t suid; /* saved UID of the task */
kgid_t sgid; /* saved GID of the task */
kuid_t euid; /* effective UID of the task */
kgid_t egid; /* effective GID of the task */
kuid_t fsuid; /* UID for VFS ops */
kgid_t fsgid; /* GID for VFS ops */
......
kernel_cap_t cap_inheritable; /* caps our children can inherit */
kernel_cap_t cap_permitted; /* caps we're permitted */
kernel_cap_t cap_effective; /* caps we can actually use */
kernel_cap_t cap_bset; /* capability bounding set */
kernel_cap_t cap_ambient; /* Ambient capability set */
......
} __randomize_layout;
As the definition shows, most of the fields concern users and the groups they belong to.
- uid and gid, annotated as the real user/group ID. In general, a process carries the ID of whoever started it. Permission checks, however, usually do not compare against these two, so they play little role there.
- euid and egid, annotated as the effective user/group ID. As the name suggests, these are the ones that take effect. When the process operates on message queues, shared memory, semaphores, and similar objects, these are compared to decide whether the user and group have permission.
- fsuid and fsgid, the filesystem user/group ID. These are the IDs checked for file operations.
In general, fsuid, euid, and uid are the same, as are fsgid, egid, and gid, because whoever starts a process is the user whose permissions should be checked.
But there are special cases. Suppose user A wants to play a game that user B has installed. The game's program file has the permissions rwxr--r--, so user A has no permission to run it and has to ask user B for it. User B says no problem, we're all friends, sets the program's permissions to rwxr-xr-x so that every user can execute it, and says, “Have fun.” User A can now run the game. While it runs, the uid, euid, and fsuid of the game process are all user A. Everything looks fine, and user A has a good time.
When user A clears a level and tries to save the result, things go wrong. The game stores player data in a separate file whose permissions are rw-------, which grants write access only to user B. Since the euid and fsuid of the game process are both user A, the process cannot write to that file, and the progress is lost.
So how do we solve this? We can set the set-user-ID bit on the game program with chmod u+s program, which changes its permissions to rwsr-xr-x. When user A starts the game again, the uid of the new process is still user A, but the euid and fsuid are not: both become user B, so the result of the level can be saved.
In Linux, a process can change its user ID at any time through setuid, so the ID of user B, the game's owner, is also saved in one more place: suid and sgid, the saved UID and saved GID. This makes it easy to switch permissions with setuid by setting the ID to either uid or suid.
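To see these IDs for a running process, one option is the Uid: line of /proc/self/status, which lists the real, effective, saved, and filesystem UIDs in that order. The sketch below assumes a Linux system with /proc mounted; struct uids and read_uids are names invented for this illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

struct uids {
    unsigned real;        /* uid   */
    unsigned effective;   /* euid  */
    unsigned saved;       /* suid  */
    unsigned fs;          /* fsuid */
};

/* Parse the "Uid:" line of /proc/self/status into `out`.
 * Returns 0 on success, -1 if the line could not be found. */
static int read_uids(struct uids *out)
{
    char line[256];
    int found = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "Uid: %u %u %u %u", &out->real, &out->effective,
                   &out->saved, &out->fs) == 4) {
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

For an ordinary program all four values are the same. For the set-user-ID game in the story above, the real value would be user A while the effective, saved, and filesystem values would be user B.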
In addition to controlling permissions by user and user group, Linux has another mechanism called capabilities.
The root user has far too much power, while an ordinary user has too little, and this is too coarse for controlling processes. Sometimes an ordinary user who wants to do one thing that requires high privileges must be given full root privileges, which is not safe. So Linux introduced capabilities, which represent permissions as a bitmap; the bits are defined in capability.h. Here are a few of them.
#define CAP_CHOWN 0
#define CAP_KILL 5
#define CAP_NET_BIND_SERVICE 10
#define CAP_NET_RAW 13
#define CAP_SYS_MODULE 16
#define CAP_SYS_RAWIO 17
#define CAP_SYS_BOOT 22
#define CAP_SYS_TIME 25
#define CAP_AUDIT_READ 37
#define CAP_LAST_CAP CAP_AUDIT_READ
An ordinary user's process can perform one of these operations when it holds the corresponding capability, and cannot when it does not, so the granularity is much finer.
cap_permitted is the set of capabilities the process is allowed to use, but what actually takes effect is cap_effective. cap_permitted may contain capabilities that are not in cap_effective. A process can give up some of its capabilities when it does not need them, which is safer: if the process is compromised through a bug in its code but can do almost nothing, the attacker cannot get any further. cap_inheritable means that when the inheritable bit is set in the executable's extended attributes, running that program with exec inherits the caller's inheritable set and adds it to the permitted set. However, when a non-root user calls exec, the inheritable set is usually not retained, even though it is usually non-root users who want to keep capabilities.
cap_bset, the capability bounding set, is the set of capabilities that all processes in the system are allowed to keep. If a capability is not in this set, no process in the system has it, not even one running with superuser privileges. This has many advantages. For example, if you drop the capability to load kernel modules once the system has started, no process can load kernel modules any more, so even if the machine is breached, the attacker cannot do much harm. cap_ambient is a relatively new addition to the kernel. It addresses the shortcoming of cap_inheritable, namely how a non-root process can keep capabilities when it uses exec to run a program: when exec is executed, cap_ambient is added to cap_permitted and also set in cap_effective.
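Since each capability is one bit, checking for it is a shift and a mask. The sketch below reads the CapPrm and CapEff lines of /proc/self/status, which hold the permitted and effective sets as hexadecimal bitmaps, and tests individual bits. It assumes a Linux system with /proc mounted; read_cap_set and cap_is_set are helper names made up here, and the two constants are copied from the listing above.

```c
#include <stdio.h>
#include <string.h>

/* Bit numbers copied from the capability.h excerpt above. */
#define CAP_NET_BIND_SERVICE 10
#define CAP_SYS_MODULE       16

/* Return 1 if capability number `cap` is set in the bitmap `set`. */
static int cap_is_set(unsigned long long set, int cap)
{
    return (int)((set >> cap) & 1ULL);
}

/* Read one capability bitmap (for example "CapEff" or "CapPrm") from
 * /proc/self/status. Returns 0 on success, -1 if it was not found. */
static int read_cap_set(const char *field, unsigned long long *out)
{
    char line[256], key[32];
    unsigned long long value;
    int found = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Each line looks like "Key:<whitespace>value". */
        if (sscanf(line, "%31[^:]: %llx", key, &value) == 2 &&
            strcmp(key, field) == 0) {
            *out = value;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

For an ordinary user process the effective bitmap is normally zero, so cap_is_set(eff, CAP_SYS_MODULE) returns 0 and the process cannot load kernel modules. Whatever the values are, every bit set in CapEff is also set in CapPrm, as the text above describes.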
Memory management
Each process has its own independent virtual memory space, which is represented by a data structure called mm_struct.
struct mm_struct *mm;
struct mm_struct *active_mm;
Files and file systems
Each process has a data structure for the file system and a data structure for opening files.
/* Filesystem information: */
struct fs_struct *fs;
/* Open file information: */
struct files_struct *files;
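The open-file information of a process is exposed under /proc/self/fd, with one entry per open descriptor. The sketch below counts those entries before and after opening a file; it assumes a Linux system with /proc mounted, and count_open_fds and fds_added_by_open are names made up for this example.

```c
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

/* Count the entries in /proc/self/fd, i.e. the descriptors currently open
 * in this process. The count includes the descriptor that opendir() itself
 * uses while scanning. Returns -1 on error. */
static int count_open_fds(void)
{
    DIR *d = opendir("/proc/self/fd");
    struct dirent *e;
    int n = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] != '.')              /* skip "." and ".." */
            n++;
    }
    closedir(d);
    return n;
}

/* Open /dev/null and report by how much the descriptor count grew.
 * Returns -1 on error. */
static int fds_added_by_open(void)
{
    int before = count_open_fds();
    int fd, after;

    if (before < 0)
        return -1;
    fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    after = count_open_fds();
    close(fd);
    return after < 0 ? -1 : after - before;
}
```

Each successful open() adds one entry to the table that files points to, so fds_added_by_open() is expected to report a growth of one.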
A task_struct process management structure looks like this: