Author: Roast chicken prince

Source: Hang Seng LIGHT Cloud Community

Background

Recently our group had a session introducing Docker containers. The takeaway was that a Docker container is really just a running process. Docker wraps that process in a small, runnable Linux environment, which is why it feels "like" a virtual machine. So how does it achieve resource control? We all know that namespaces and cgroups are involved. I was curious about the details, so I spent some time studying them and took some notes, which I am sharing here so we can learn together.

Linux namespace

Linux Namespace is a kernel-level resource isolation mechanism used to keep processes running on the same operating system from interfering with each other.

The purpose of a namespace is isolation. If there are processes running in a namespace, they can only see information about the namespace, not anything outside the namespace.

Let’s think about it: what does a process know when it’s running?

  • The system hostname
  • Available network resources (bridges, interfaces, network ports…)
  • Process relationships (which processes exist, parent-child relationships between processes, etc.)
  • User information of the system (which users and groups exist, and what their permissions are)
  • File systems (which file systems are available and how they are used)
  • IPC (how interprocess communication is implemented)
  • …

That is, to implement isolation, you must ensure that processes in different namespaces see these things differently.

If I were designing it, my first idea would be to isolate all of the above resources together as one bundle per namespace. In practice, Linux isolates each kind of resource separately, with its own type of namespace.
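To see that each resource type really has its own namespace, one option is to look at the symbolic links under /proc/<pid>/ns. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper name `namespace_ids` is my own and not part of any library. Two processes are in the same namespace of a given type when their links for that type point to the same identifier.

```python
import os


def namespace_ids(pid="self"):
    """Return a dict mapping each namespace entry of a process to its link target.

    Assumes Linux with /proc mounted. On other systems, or when the process
    does not exist, an empty dict is returned.
    """
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    if not os.path.isdir(ns_dir):
        return ids
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target names the namespace type
            # followed by an identifier for that namespace instance.
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Skip entries we are not allowed to read.
            continue
    return ids


if __name__ == "__main__":
    for name, ident in namespace_ids().items():
        print(f"{name:20s} {ident}")
```

Running this on the host and again inside a container should show different identifiers for the namespace types the container runtime has unshared.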

Currently, the Linux kernel implements several different types of namespace. The official documentation describes them as follows:

Name     Macro definition   Isolated resources
IPC      CLONE_NEWIPC       System V IPC, POSIX message queues (since Linux 2.6.19)
Network  CLONE_NEWNET       Network device interfaces, IPv4 and IPv6 protocol stacks, IP routing tables, firewall rules, the /proc/net and /sys/class/net directory trees, sockets, etc. (since Linux 2.6.24)
Mount    CLONE_NEWNS        Mount points (since Linux 2.4.19)
PID      CLONE_NEWPID       Process IDs (since Linux 2.6.24)
User     CLONE_NEWUSER      User and group IDs (started in Linux 2.6.23 and completed in Linux 3.8)
UTS      CLONE_NEWUTS       Hostname and NIS domain name (since Linux 2.6.19)
Cgroup   CLONE_NEWCGROUP    Cgroup root directory (since Linux 4.6)
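Because each namespace type has its own flag, a caller of clone() or unshare() picks exactly the combination it wants by OR-ing the macros together. The sketch below only illustrates that bit arithmetic. The numeric values are the ones I know from the kernel's <linux/sched.h>; the helper names `combine_flags` and `decode_flags` are hypothetical, and nothing here actually creates a namespace.

```python
# Flag values as defined in <linux/sched.h>, ordered by value.
CLONE_FLAGS = {
    "mount": ("CLONE_NEWNS", 0x00020000),
    "cgroup": ("CLONE_NEWCGROUP", 0x02000000),
    "uts": ("CLONE_NEWUTS", 0x04000000),
    "ipc": ("CLONE_NEWIPC", 0x08000000),
    "user": ("CLONE_NEWUSER", 0x10000000),
    "pid": ("CLONE_NEWPID", 0x20000000),
    "network": ("CLONE_NEWNET", 0x40000000),
}


def combine_flags(names):
    """OR together the flags for the requested namespace types."""
    mask = 0
    for name in names:
        mask |= CLONE_FLAGS[name][1]
    return mask


def decode_flags(mask):
    """List the macro names whose bit is set in mask."""
    return [macro for macro, value in CLONE_FLAGS.values() if mask & value]


if __name__ == "__main__":
    mask = combine_flags(["uts", "pid"])
    print(hex(mask), decode_flags(mask))
```

A container runtime would typically pass most of these flags at once, while a tool that only wants a private hostname could pass the UTS flag alone.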

These namespaces cover most of the environment a program needs in order to run, so a program running in its own namespaces is not disturbed by programs in other namespaces. Not every system resource can be isolated, however, and time is one example. The list above has no namespace for it, so containers started on the same Linux host share the same system time. (Linux 5.6 later added a time namespace, but it only virtualizes the monotonic and boot-time clocks, not the wall-clock time.)

However, namespaces mainly solve the problem of environment isolation, which is only the most basic step in virtualization. We also need to isolate the use of computing resources. That is, even though a namespace has placed my processes in a specific environment, they can still use CPU, memory, disk, and other computing resources as they please. To limit or control the resource usage of a process, we need Linux cgroups.

Linux cgroup

Linux cgroups (short for control groups) are a feature of the Linux kernel. They are used to limit, account for, and isolate the resources (such as CPU, memory, and disk I/O) used by a group of processes.

Cgroups let you allocate resources (such as CPU time, system memory, network bandwidth, or a combination of these) to user-defined groups of tasks (processes) running on your system. You can monitor the cgroups you have configured, deny cgroups access to certain resources, and even reconfigure your cgroups dynamically on a running system.
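A quick way to see which cgroups a process belongs to is to read /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The following is a small sketch of parsing that file, assuming a Linux host; the function names are my own, and the sample path in the usage note is made up for illustration.

```python
def parse_cgroup_lines(text):
    """Parse the contents of /proc/<pid>/cgroup into a list of dicts."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only, since the path may contain ':'.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy_id),
            # The controller list is empty for the cgroup v2 unified hierarchy.
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries


def current_cgroups(path="/proc/self/cgroup"):
    """Return the cgroup membership of this process, or [] if unavailable."""
    try:
        with open(path) as f:
            return parse_cgroup_lines(f.read())
    except OSError:
        return []


if __name__ == "__main__":
    for entry in current_cgroups():
        print(entry)
```

For example, a cgroup v1 line such as `4:cpu,cpuacct:/docker/abc` (a hypothetical path) yields hierarchy 4 with the controllers cpu and cpuacct, while a cgroup v2 line such as `0::/user.slice` yields hierarchy 0 with an empty controller list.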

It mainly provides the following functions:

  • Resource limitation: limits the use of resources, such as memory limits and file system cache limits.
  • Prioritization: controls priority, such as CPU utilization and disk I/O throughput.
  • Accounting: collects audit data or statistics, mainly for billing purposes.
  • Control: suspends processes and resumes their execution.

Using cgroups, system administrators have finer-grained control over the allocation, prioritization, denial, management, and monitoring of system resources. Better allocation of hardware resources by task and user improves overall efficiency.

Cgroups are used for:

  • Resource limits: cgroups can limit the total amount of resources (such as memory, CPU, and disk) that a task can use. For example, if an upper limit is set on the memory an application may use, the OOM (out-of-memory) killer is triggered once that quota is exceeded.
  • Priority allocation: the number of CPU time slices and the disk I/O bandwidth allocated to a task effectively control its priority.
  • Resource statistics: cgroups can measure resource usage, such as CPU usage and memory usage. This function is suitable for billing.
  • Task control: cgroups allow you to suspend and resume tasks.
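To make the resource limits above a little more concrete, here is a sketch of how the limit files could be interpreted. It assumes cgroup v2 (the article itself does not say which version is in use), where cpu.max holds "<quota> <period>" in microseconds and memory.max holds a byte count, with the literal word max meaning "no limit". The function names are hypothetical.

```python
def parse_cpu_max(text):
    """Return the CPU limit as a number of CPUs, or None when unlimited.

    Expects the content of a cgroup v2 cpu.max file, e.g. "50000 100000".
    """
    quota, period = text.split()
    if quota == "max":
        return None
    # quota microseconds of CPU time are allowed per period microseconds.
    return int(quota) / int(period)


def parse_memory_max(text):
    """Return the memory limit in bytes, or None when unlimited.

    Expects the content of a cgroup v2 memory.max file.
    """
    value = text.strip()
    return None if value == "max" else int(value)


if __name__ == "__main__":
    print(parse_cpu_max("50000 100000"))    # half of one CPU
    print(parse_memory_max("536870912\n"))  # 512 MiB expressed in bytes
```

Writing values in the same format back to these files is how a limit is set, which is roughly what a container runtime does on your behalf when you ask it to cap CPU or memory.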

Conclusion

In essence, a Docker container is a process on the host. Docker implements resource isolation through namespaces and resource limits through cgroups. Namespaces modify the application process's "view" of the whole computer. That is, the operating system restricts its "view" so that it can only "see" the specified content. The amount of resources the process can use is limited by its cgroup configuration. Apart from this isolation and these limits, such processes are not much different from any other process.

Namespaces and cgroups do their job well, but they still have many imperfections, such as the lack of wall-clock time isolation and problems with the /proc file system. These can be discussed later; this chapter does not cover them in detail.