“This is the sixth day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”

What are Cgroups?

Cgroups, short for control groups, is a mechanism provided by the Linux kernel to limit, account for, and isolate the physical resources (such as CPU, memory, and I/O) used by groups of processes. Originally developed by Google engineers, it was later merged into the Linux kernel.

What can Cgroups do?

The original goal of cgroups was to provide a unified framework for resource management, both integrating existing subsystems such as cpuset and providing an interface for developing new subsystems in the future. Today cgroups are used in a wide range of scenarios, from controlling the resources of a single process to OS-level virtualization.

Cgroups Subsystems

Cgroups work by setting parameters on the various subsystems (resource controllers) and binding processes to those subsystems.

Linux provides the following subsystems:

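  • blkio: limits input/output access to block devices such as disks
  • cpu: controls how the CPU scheduler allocates CPU time to the processes in a cgroup
  • cpuacct: collects statistics on the CPU usage of the processes in a cgroup
  • cpuset: assigns individual CPUs and memory nodes to the processes in a cgroup on multi-core systems
  • devices: controls which devices the processes in a cgroup may access
  • freezer: suspends and resumes the processes in a cgroup
  • memory: limits and reports the memory usage of the processes in a cgroup
  • net_cls: tags network packets from a cgroup with a class identifier so that the traffic controller (tc) can tell them apart
  • net_prio: sets the priority of network traffic from a cgroup on each network interface
  • ns: the namespace subsystem, which modern kernels no longer provide (it is absent from the lssubsys output below)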

After installing the cgroup tools, you can list the subsystems supported by the kernel:

$ sudo apt-get install cgroup-tools
$ lssubsys -a
cpuset
cpu,cpuacct
blkio
memory
devices
freezer
net_cls,net_prio
perf_event
hugetlb
pids
rdma
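
The kernel publishes the same list in /proc/cgroups, one row per subsystem: its name, the ID of the hierarchy it is attached to, the number of cgroups, and an enabled flag. Subsystems mounted together (such as cpu and cpuacct above) share a hierarchy ID. As a sketch, the Python function below groups subsystems by that ID; the function name and the sample rows are illustrative, not captured from the author's machine.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_hierarchy(proc_cgroups: str) -> Dict[int, List[str]]:
    """Group enabled subsystems by the hierarchy ID they are attached to.

    `proc_cgroups` is the text of /proc/cgroups: a header line starting with
    '#', then one 'name hierarchy num_cgroups enabled' row per subsystem.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for line in proc_cgroups.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip the header and blank lines
        name, hierarchy, _num_cgroups, enabled = line.split()
        if enabled == "1":
            groups[int(hierarchy)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # On a live system: group_by_hierarchy(open("/proc/cgroups").read())
    sample = (
        "#subsys_name\thierarchy\tnum_cgroups\tenabled\n"
        "cpuset\t12\t1\t1\n"
        "cpu\t3\t98\t1\n"
        "cpuacct\t3\t98\t1\n"
        "memory\t6\t120\t1\n"
    )
    for hierarchy_id, names in sorted(group_by_hierarchy(sample).items()):
        print(hierarchy_id, ",".join(names))
```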

Cgroups Hierarchy

A hierarchy organizes a set of cgroups into a tree, which allows cgroups to inherit attributes from their parents.

For example, suppose cgroup1 limits the CPU usage of processes P1, P2, and P3. If P2 also needs a memory limit, you can create cgroup2 under cgroup1 and move P2 into it: cgroup2 inherits the CPU limit of cgroup1, and its memory limit applies to P2 without affecting the other processes.

The kernel uses the cgroup structure to represent the resource limits of one or more subsystems. Cgroups are organized as a tree, and this tree is called a hierarchy.
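
The inheritance in this example can be illustrated with a toy Python model. It assumes the simplification that a cgroup's effective limit on a resource is the tightest limit set on it or on any ancestor, which roughly matches hierarchical memory limits but not every subsystem (cpu shares, for instance, are relative weights rather than caps). The Cgroup class and the limit units are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Cgroup:
    """Toy model of one node in a cgroup hierarchy (hypothetical, not a kernel API)."""

    name: str
    parent: Optional["Cgroup"] = None
    # Limits set directly on this node, e.g. {"cpu": 50}; units are illustrative.
    limits: Dict[str, int] = field(default_factory=dict)

    def child(self, name: str, **limits: int) -> "Cgroup":
        """Create a child cgroup, like running mkdir inside this cgroup's directory."""
        return Cgroup(name, parent=self, limits=dict(limits))

    def effective_limit(self, resource: str) -> Optional[int]:
        """Return the tightest limit on `resource` along the path to the root."""
        found: List[int] = []
        node: Optional[Cgroup] = self
        while node is not None:
            if resource in node.limits:
                found.append(node.limits[resource])
            node = node.parent
        return min(found) if found else None


if __name__ == "__main__":
    cgroup1 = Cgroup("cgroup1", limits={"cpu": 50})  # limits P1, P2, P3
    cgroup2 = cgroup1.child("cgroup2", memory=100)   # extra limit for P2 only
    print(cgroup2.effective_limit("cpu"))     # 50, inherited from cgroup1
    print(cgroup2.effective_limit("memory"))  # 100, set on cgroup2 itself
    print(cgroup1.effective_limit("memory"))  # None, cgroup1 is unaffected
```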

Cgroups and Processes

A hierarchy on its own only expresses the inheritance relationship between cgroups (process groups). Actual resource limits come from attaching subsystems to the hierarchy and then adding processes (called tasks) to its cgroups.

It can be seen from this picture that:

  • A subsystem can only be attached to one hierarchy
  • A hierarchy can attach more than one subsystem
  • A process can be a member of multiple Cgroups, but these Cgroups must reside in different hierarchies.
  • When a process forks a child, the child starts in the same cgroups as its parent and can later be moved to other cgroups as needed.
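
These four rules can be expressed as a small Python sketch. It is a hypothetical bookkeeping model, not how the kernel stores memberships, and hierarchy names such as h1 are placeholders.

```python
from typing import Dict, List, Set, Tuple

Memberships = Dict[int, List[Tuple[str, str]]]  # pid -> [(hierarchy, cgroup path)]


def check_subsystems(hierarchies: Dict[str, Set[str]]) -> None:
    """Rules 1 and 2: a hierarchy may hold several subsystems, but each
    subsystem may be attached to only one hierarchy."""
    owner: Dict[str, str] = {}
    for hierarchy, subsystems in hierarchies.items():
        for subsystem in sorted(subsystems):
            if subsystem in owner:
                raise ValueError(f"{subsystem} is already attached to {owner[subsystem]}")
            owner[subsystem] = hierarchy


def check_membership(memberships: Memberships) -> None:
    """Rule 3: a process may be in many cgroups, but in at most one per hierarchy."""
    for pid, entries in memberships.items():
        seen: Dict[str, str] = {}
        for hierarchy, path in entries:
            if hierarchy in seen:
                raise ValueError(
                    f"pid {pid} is in both {seen[hierarchy]} and {path} of {hierarchy}"
                )
            seen[hierarchy] = path


def fork(memberships: Memberships, parent: int, child: int) -> None:
    """Rule 4: a forked child starts in the same cgroups as its parent."""
    memberships[child] = list(memberships[parent])
```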

Cgroups file system

The underlying implementation of cgroups is hidden behind the Linux kernel's Virtual File System (VFS), which exposes a uniform filesystem interface to user space. Let's look at how this filesystem works:

  1. First, create and mount a hierarchy (a cgroup tree):
$ mkdir cgroup-test
$ sudo mount -t cgroup -o none,name=cgroup-test cgroup-test ./cgroup-test
$ ls ./cgroup-test
cgroup.clone_children  cgroup.sane_behavior  release_agent
cgroup.procs           notify_on_release     tasks

These files are the configuration items of the root cgroup in this hierarchy:

cgroup.clone_children is read by the cpuset subsystem. If its value is 1, a child cgroup inherits the cpuset configuration of its parent.

notify_on_release and release_agent work together: release_agent holds the path of a program, and if notify_on_release is 1, the kernel runs that program when the last process leaves the cgroup.

tasks lists the IDs of the processes in the cgroup, which is how processes are associated with a node of the hierarchy.
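
To make these files concrete, here is a Python sketch that reads them from a cgroup directory such as ./cgroup-test. The function name is hypothetical, and it only parses the files described above; release_agent exists only in the root cgroup of a hierarchy, so it is treated as optional.

```python
from pathlib import Path
from typing import Dict, Union


def read_cgroup_files(cgroup_dir: str) -> Dict[str, Union[bool, str, list]]:
    """Read the control files described above from one cgroup directory."""
    d = Path(cgroup_dir)
    info: Dict[str, Union[bool, str, list]] = {
        # '1' means child cgroups copy the parent's cpuset configuration.
        "clone_children": (d / "cgroup.clone_children").read_text().strip() == "1",
        # '1' means the kernel runs release_agent when the cgroup empties.
        "notify_on_release": (d / "notify_on_release").read_text().strip() == "1",
        # One ID per line.
        "tasks": [int(x) for x in (d / "tasks").read_text().split()],
    }
    agent = d / "release_agent"
    if agent.exists():  # only present in the root cgroup
        info["release_agent"] = agent.read_text().strip()
    return info
```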

  2. Create two child cgroups under the root cgroup of the new hierarchy

$ cd cgroup-test
$ sudo mkdir cgroup-1
$ sudo mkdir cgroup-2
$ tree
.
├── cgroup-1
│   ├── cgroup.clone_children
│   ├── cgroup.procs
│   ├── notify_on_release
│   └── tasks
├── cgroup-2
│   ├── cgroup.clone_children
│   ├── cgroup.procs
│   ├── notify_on_release
│   └── tasks
├── cgroup.clone_children
├── cgroup.procs
├── cgroup.sane_behavior
├── notify_on_release
├── release_agent
└── tasks

2 directories, 14 files

As you can see, when you create a directory inside a cgroup directory, the kernel treats it as a child cgroup, which inherits the attributes of its parent.
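
Because every directory in a mounted hierarchy is a cgroup, the tree of cgroups can be recovered by walking the directories and skipping the control files. A minimal Python sketch (the function name is hypothetical):

```python
import os
from typing import List


def list_cgroups(root: str) -> List[str]:
    """Return every cgroup in a mounted hierarchy as a path relative to root.

    Each directory is a cgroup; control files such as 'tasks' are plain files
    and are skipped.
    """
    result = ["/"]
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # walk children in a deterministic order
        for name in dirnames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            result.append("/" + rel.replace(os.sep, "/"))
    return result


# Usage on the hierarchy created above: list_cgroups("./cgroup-test")
```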

  3. Add processes to a cgroup and move them between cgroups

Within one hierarchy, a process can belong to only one cgroup node. By default, every process in the system is in the root cgroup. To move a process to another cgroup, write its process ID into the tasks file of the target cgroup.

# In the cgroup-test directory
$ echo $$
3444
$ cat /proc/3444/cgroup 
13:name=cgroup-test:/
12:cpuset:/
11:rdma:/
10:devices:/user.slice
9:perf_event:/
8:net_cls,net_prio:/
7:pids:/user.slice/user-1000.slice/user@1000.service
6:memory:/user.slice/user-1000.slice/user@1000.service
...

The first line shows that the current shell process is in the root cgroup (/) of the cgroup-test hierarchy. Now move it into the child cgroup cgroup-1:

$ cd cgroup-1
$ sudo sh -c "echo $$ >> tasks"
$ cat /proc/3444/cgroup
13:name=cgroup-test:/cgroup-1
12:cpuset:/
11:rdma:/
10:devices:/user.slice
9:perf_event:/
8:net_cls,net_prio:/
7:pids:/user.slice/user-1000.slice/user@1000.service
6:memory:/user.slice/user-1000.slice/user@1000.service
...

The terminal process now belongs to cgroup-1. Because a process can be in only one cgroup per hierarchy, it is no longer listed in the root cgroup's tasks file:

$ cd ..
$ grep -x 3444 tasks
# prints nothing
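
The two steps above, moving a process and then checking /proc/<pid>/cgroup, can be sketched in Python. Both function names are hypothetical, and the "cgroup2" key for v2 entries is this sketch's own convention. On a live system you would, as root, call move_to_cgroup("./cgroup-test/cgroup-1", os.getpid()) and then parse_proc_cgroup(Path(f"/proc/{os.getpid()}/cgroup").read_text()).

```python
from pathlib import Path
from typing import Dict


def move_to_cgroup(cgroup_dir: str, pid: int) -> None:
    """Add `pid` to the cgroup at `cgroup_dir`, like `echo $$ >> tasks`.

    On a real cgroup filesystem this needs root, and the kernel removes the
    process from its previous cgroup in the same hierarchy.
    """
    with open(Path(cgroup_dir) / "tasks", "a") as f:
        f.write(f"{pid}\n")


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each hierarchy (by controller or name=...) to the process's cgroup path.

    Each line of /proc/<pid>/cgroup has the form
    'hierarchy-ID:controller-list:cgroup-path'.
    """
    memberships: Dict[str, str] = {}
    for line in text.splitlines():
        if line.count(":") < 2:
            continue  # skip blank or truncated lines
        _hierarchy_id, controllers, path = line.split(":", 2)
        # A cgroup v2 entry ('0::/...') has an empty controller list.
        for controller in controllers.split(",") if controllers else ["cgroup2"]:
            memberships[controller] = path
    return memberships
```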
  4. Use a subsystem to limit the resources of the processes in a cgroup

By default, the OS has already mounted a hierarchy for each subsystem under /sys/fs/cgroup/:

$ ls /sys/fs/cgroup
blkio    cpu,cpuacct  freezer  net_cls           perf_event  systemd
cpu      cpuset       hugetlb  net_cls,net_prio  pids        unified
cpuacct  devices      memory   net_prio          rdma
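
Each of these directories is a mount point of the cgroup filesystem; on systemd-based distributions, cpu and cpuacct are usually symlinks to the combined cpu,cpuacct directory (and likewise for net_cls and net_prio), while unified is a cgroup v2 mount. A sketch that recovers the mount-point-to-subsystem mapping from /proc/mounts follows; the function name is hypothetical, and the set of generic mount options it filters out is an assumption that may need extending on other systems.

```python
from typing import Dict, List

# Generic mount flags that are not subsystem names (assumed, non-exhaustive).
_GENERIC_OPTIONS = {"rw", "ro", "nosuid", "nodev", "noexec", "relatime",
                    "noatime", "xattr", "nsdelegate"}


def cgroup_v1_mounts(proc_mounts: str) -> Dict[str, List[str]]:
    """Map each cgroup v1 mount point to the subsystems attached to it.

    `proc_mounts` is the text of /proc/mounts, whose fields are
    device, mount point, filesystem type, options, dump, and pass.
    """
    mounts: Dict[str, List[str]] = {}
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "cgroup":
            continue  # not a cgroup v1 mount (cgroup v2 uses type 'cgroup2')
        mount_point, options = fields[1], fields[3]
        mounts[mount_point] = [opt for opt in options.split(",")
                               if opt not in _GENERIC_OPTIONS]
    return mounts
```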

Now create a child cgroup in the hierarchy of the memory subsystem, limit its memory, and move the current shell into it:

$ cd /sys/fs/cgroup/memory
$ sudo mkdir test-limit-memory && cd test-limit-memory
# Set the maximum memory usage to 100 MB
$ sudo sh -c "echo 100m > memory.limit_in_bytes"
# Move the current shell into this cgroup
$ sudo sh -c "echo $$ > tasks"
# Run stress, which allocates and holds 200 MB of memory
$ stress --vm-bytes 200m --vm-keep -m 1
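
The 100m written to memory.limit_in_bytes uses a binary suffix, so the kernel stores 100 × 1024 × 1024 bytes. A hypothetical Python helper performing the same conversion for the k, m, and g suffixes (the kernel also accepts larger suffixes, which this sketch omits):

```python
_MULTIPLIERS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def limit_to_bytes(value: str) -> int:
    """Convert a limit such as '100m' to bytes using 1024-based suffixes.

    Only plain integers with an optional k/m/g suffix (any case) are handled.
    """
    value = value.strip().lower()
    suffix = value[-1] if value and value[-1] in "kmg" else ""
    number = value[: len(value) - len(suffix)]
    return int(number) * _MULTIPLIERS[suffix]


if __name__ == "__main__":
    print(limit_to_bytes("100m"))  # 104857600, the limit set above
    print(limit_to_bytes("200m"))  # 209715200, what stress tries to hold
```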

Compare the system's memory before and after starting stress: free memory drops by only about 100 MB, even though stress asks for 200 MB.

# Before running
$ top
top - 12:04:12 up  6:45,  1 user,  load average: 1.87, 1.29, 1.06
Tasks: 348 total,   1 running, 346 sleeping,   0 stopped,   1 zombie
%Cpu(s):  1.3 us,  0.9 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :   5973.4 total,    210.8 free,   2820.9 used,   2941.8 buff/cache
MiB Swap:    923.3 total,    921.9 free,      1.3 used.   2746.3 avail Mem

# After running
$ top
top - 12:04:57 up  6:45,  1 user,  load average: 2.25, 1.44, 1.12
Tasks: 351 total,   3 running, 347 sleeping,   0 stopped,   1 zombie
%Cpu(s): 34.3 us, 32.8 sy,  0.0 ni, 21.1 id,  4.9 wa,  0.0 hi,  6.9 si,  0.0 st
MiB Mem :   5973.4 total,    118.6 free,   2956.7 used,   2898.1 buff/cache
MiB Swap:    923.3 total,    817.7 free,    105.5 used.   2604.5 avail Mem

The cgroup limit is in effect.
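
The comparison can be made exact by subtracting the two top snapshots above. In this Python sketch (function names hypothetical), free memory drops by about 92 MiB, close to the 100 MB limit, while swap usage rises by about 104 MiB; that is consistent with the memory subsystem pushing the part of stress's 200 MB that exceeds the limit out to swap.

```python
import re
from typing import Dict

# Matches "<number> <label>" pairs such as "210.8 free" or "2746.3 avail Mem".
_FIELD = re.compile(r"(\d+(?:\.\d+)?)\s+(avail Mem|buff/cache|total|free|used)")


def parse_top_memory(text: str) -> Dict[str, float]:
    """Extract the MiB figures from top's 'MiB Mem' and 'MiB Swap' lines."""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        if line.startswith("MiB Mem"):
            prefix = "mem"
        elif line.startswith("MiB Swap"):
            prefix = "swap"
        else:
            continue
        for number, label in _FIELD.findall(line):
            # 'avail Mem' is printed on the swap line but describes RAM.
            key = "mem_avail" if label == "avail Mem" else f"{prefix}_{label}"
            values[key] = float(number)
    return values


def deltas(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """After-minus-before for every field present in both snapshots."""
    return {k: round(after[k] - before[k], 1) for k in before if k in after}


BEFORE = """MiB Mem :   5973.4 total,    210.8 free,   2820.9 used,   2941.8 buff/cache
MiB Swap:    923.3 total,    921.9 free,      1.3 used.   2746.3 avail Mem"""
AFTER = """MiB Mem :   5973.4 total,    118.6 free,   2956.7 used,   2898.1 buff/cache
MiB Swap:    923.3 total,    817.7 free,    105.5 used.   2604.5 avail Mem"""

if __name__ == "__main__":
    d = deltas(parse_top_memory(BEFORE), parse_top_memory(AFTER))
    print(d["mem_free"], d["swap_used"])
```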

How does Docker use cgroups to limit resources?

Start by running a container with limited memory

$ sudo docker pull redis:4
$ sudo docker run -tid -m 100m redis:4
d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d

Looking at the hierarchy that the memory subsystem is attached to, you can see a new child cgroup named docker:

$ ls /sys/fs/cgroup/memory
docker ...
$ ls /sys/fs/cgroup/memory/docker
cgroup.clone_children                                             memory.max_usage_in_bytes
cgroup.event_control                                              memory.memsw.failcnt
cgroup.procs                                                      memory.memsw.limit_in_bytes
d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d  memory.memsw.max_usage_in_bytes
memory.failcnt                                                    memory.memsw.usage_in_bytes
memory.force_empty                                                memory.move_charge_at_immigrate
memory.kmem.failcnt                                               memory.numa_stat
memory.kmem.limit_in_bytes                                        memory.oom_control
memory.kmem.max_usage_in_bytes                                    memory.pressure_level
memory.kmem.slabinfo                                              memory.soft_limit_in_bytes
memory.kmem.tcp.failcnt                                           memory.stat
memory.kmem.tcp.limit_in_bytes                                    memory.swappiness
memory.kmem.tcp.max_usage_in_bytes                                memory.usage_in_bytes
memory.kmem.tcp.usage_in_bytes                                    memory.use_hierarchy
memory.kmem.usage_in_bytes                                        notify_on_release
memory.limit_in_bytes                                             tasks

Inside the docker cgroup there is a child cgroup named d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d, which is exactly the ID of the container we just created. Let's take a look inside it:

$ cd /sys/fs/cgroup/memory/docker/d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d
$ cat memory.limit_in_bytes
104857600
# 104857600 bytes = 100 * 1024 * 1024, i.e. the 100 MB limit
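
Reading this value can be wrapped in a small Python helper (hypothetical name). It assumes the layout shown above, <memory hierarchy>/docker/<full container ID>/, which matches cgroup v1 with Docker's cgroupfs driver; with the systemd cgroup driver or on cgroup v2 the path differs, and cgroup v2 exposes the limit as memory.max instead.

```python
from pathlib import Path


def container_memory_limit(container_id: str,
                           memory_root: str = "/sys/fs/cgroup/memory") -> int:
    """Return the memory limit, in bytes, of a Docker container's cgroup.

    Assumes cgroup v1 and the <memory_root>/docker/<id>/ layout shown above;
    `container_id` must be the full ID printed by `docker run`.
    """
    limit_file = Path(memory_root) / "docker" / container_id / "memory.limit_in_bytes"
    return int(limit_file.read_text().strip())
```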

Conclusion

Cgroups are built from three related concepts, cgroup, subsystem, and hierarchy, which can be viewed as a three-layer structure: processes are associated with cgroups, cgroups are organized into a hierarchy, and subsystems are attached to the hierarchy. On top of limiting process resources, this design lets limits be reused through inheritance.

We also looked at how Docker applies this mechanism, so when you limit a container's resources, you know what happens under the hood.