“This is the sixth day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”
What are Cgroups?
Cgroups, short for control groups, is a mechanism provided by the Linux kernel to limit, record, and isolate the physical resources (such as CPU, memory, and I/O) used by process groups. Originally developed by Google engineers, it was later incorporated into the Linux kernel.
What can Cgroups do?
The original goal of Cgroups was to provide a unified framework for resource management, both integrating existing subsystems such as cpuset and providing an interface for developing new subsystems in the future. Today Cgroups are used in a variety of scenarios, from resource control for a single process to OS-level virtualization.
Cgroups Subsystems
A subsystem is a resource controller. Cgroups work by setting parameters on the various subsystems and binding processes to them.
Linux provides the following subsystems:
- blkio
- cpu
- cpuacct: collects statistics on the CPU usage of processes in a cgroup
- cpuset
- devices
- freezer: suspends and resumes the processes in a cgroup
- memory: controls the memory usage of processes in a cgroup
- net_cls
- net_prio
- ns
After installing the cgroup tools, you can list the subsystems supported by the current kernel:
$ sudo apt-get install cgroup-tools
$ lssubsys -a
cpuset
cpu,cpuacct
blkio
memory
devices
freezer
net_cls,net_prio
perf_event
hugetlb
pids
rdma
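If cgroup-tools is not installed, similar information can usually be read from /proc/cgroups. The following is a minimal sketch, not a replacement for lssubsys: the function name list_enabled_subsystems is hypothetical, and the sample rows in the here-document are made up for illustration.

```shell
# Hypothetical helper: print the names of enabled subsystems from input in
# /proc/cgroups format (columns: subsys_name hierarchy num_cgroups enabled).
# Reads the files given as arguments, or standard input if there are none.
list_enabled_subsystems() {
    # Skip the "#subsys_name ..." header and keep rows whose 4th column is 1.
    awk '!/^#/ && $4 == 1 { print $1 }' "$@"
}

# Illustrative sample data (not taken from a real machine).
list_enabled_subsystems <<'EOF'
#subsys_name	hierarchy	num_cgroups	enabled
cpuset	12	1	1
memory	6	120	1
hugetlb	0	1	0
EOF
```

On a real system the same function can be called as `list_enabled_subsystems /proc/cgroups`.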
Cgroups Hierarchy
A hierarchy organizes a group of cgroups into a tree structure so that cgroups can inherit from one another.
For example, suppose cgroup1 limits the CPU usage of processes P1, P2, and P3. If you also want to limit the memory usage of P2, you can create cgroup2 under cgroup1. It inherits the CPU limit from cgroup1, and a memory limit can be set on it without affecting the other processes.
The kernel uses a cgroup structure to represent the resource limits of one or more subsystems. These structures are organized as a tree, which is called a hierarchy.
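One way to picture inheritance: for controllers that apply limits hierarchically, a process in cgroup2 is constrained by every cgroup on the path from the root down to cgroup2. The sketch below only lists that path for a given cgroup name; cgroup_ancestors is a hypothetical helper and does not touch the kernel.

```shell
# Hypothetical helper: print a cgroup path and all of its ancestors,
# starting from the root of the hierarchy.
cgroup_ancestors() {
    rest=${1#/}                  # drop the leading slash
    current=""
    echo "/"
    while [ -n "$rest" ]; do
        part=${rest%%/*}         # first remaining path component
        current="$current/$part"
        echo "$current"
        case $rest in
            */*) rest=${rest#*/} ;;   # more components remain
            *)   rest="" ;;           # that was the last one
        esac
    done
}

# Prints "/", "/cgroup1" and "/cgroup1/cgroup2", one per line.
cgroup_ancestors /cgroup1/cgroup2
```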
Cgroups and processes
A hierarchy by itself only establishes the inheritance relationship between cgroups. Actual resource limits depend on attaching subsystems to the hierarchy and then adding processes (tasks) to the cgroups in that hierarchy.
The relationships can be summarized as follows:
- A subsystem can only be attached to one hierarchy
- A hierarchy can attach more than one subsystem
- A process can be a member of multiple Cgroups, but these Cgroups must reside in different hierarchies.
- When a process forks a child process, the child starts in the same cgroup as its parent and can be moved to another cgroup as required.
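The third rule can be checked from the per-process file /proc/<pid>/cgroup, where each line has the form hierarchy-ID:subsystems:path. The sketch below assumes that format; one_cgroup_per_hierarchy is a hypothetical helper that succeeds when no hierarchy ID appears twice.

```shell
# Hypothetical helper: succeed if every hierarchy ID occurs at most once in
# input given in /proc/<pid>/cgroup format (hierarchy-ID:subsystems:path).
# Reads the files given as arguments, or standard input if there are none.
one_cgroup_per_hierarchy() {
    dup=$(cut -d: -f1 "$@" | sort | uniq -d)   # IDs that appear more than once
    [ -z "$dup" ]
}

# Sample lines in the same format.
if one_cgroup_per_hierarchy <<'EOF'
13:name=cgroup-test:/
12:cpuset:/
10:devices:/user.slice
EOF
then
    echo "each hierarchy appears once"
fi
```

On a real system, `one_cgroup_per_hierarchy /proc/self/cgroup` runs the same check for the current shell.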
Cgroups file system
The underlying implementation of Cgroups is hidden behind the Virtual File System (VFS) of the Linux kernel, which exposes a uniform file system interface to user space. Let's take a look at how this file system works:
1. Create and mount a hierarchy (cgroup tree)
$ mkdir cgroup-test
$ sudo mount -t cgroup -o none,name=cgroup-test cgroup-test ./cgroup-test
$ ls ./cgroup-test
cgroup.clone_children cgroup.sane_behavior release_agent
cgroup.procs notify_on_release tasks
These files are the configuration items of the root cgroup node in this hierarchy:
cgroup.clone_children is read by the cpuset subsystem. If its value is 1, a child cgroup inherits the cpuset configuration of its parent cgroup.
notify_on_release and release_agent control what happens when the last process in a cgroup exits: notify_on_release says whether release_agent is run, and release_agent holds the path of the program to run.
tasks lists the IDs of the processes in the cgroup. Writing a process ID to this file adds that process to the cgroup.
2. Create two child cgroups under the root cgroup of the newly created hierarchy
$ cd cgroup-test
$ sudo mkdir cgroup-1
$ sudo mkdir cgroup-2
$ tree
.
├── cgroup-1
│   ├── cgroup.clone_children
│   ├── cgroup.procs
│   ├── notify_on_release
│   └── tasks
├── cgroup-2
│   ├── cgroup.clone_children
│   ├── cgroup.procs
│   ├── notify_on_release
│   └── tasks
├── cgroup.clone_children
├── cgroup.procs
├── cgroup.sane_behavior
├── notify_on_release
├── release_agent
└── tasks

2 directories, 14 files
As you can see, when you create a directory inside a cgroup directory, the kernel treats it as a child of that cgroup, and it inherits the attributes of the parent cgroup.
3. Add processes to a cgroup and move them between cgroups
Within one hierarchy, a process can exist on only one cgroup node. By default, all processes in the system are on the root node. To move a process to another cgroup node, write its process ID into the tasks file of the target cgroup.
# In the cgroup-test directory
$ echo $$
3444
$ cat /proc/3444/cgroup
13:name=cgroup-test:/
12:cpuset:/
11:rdma:/
10:devices:/user.slice
9:perf_event:/
8:net_cls,net_prio:/
7:pids:/user.slice/user-1000.slice/user@1000.service
6:memory:/user.slice/user-1000.slice/user@1000.service
...
You can see that the current terminal's process is in the root cgroup of cgroup-test. Now move it to the child cgroup:
$ cd cgroup-1
$ sudo sh -c "echo $$ >> tasks"
$ cat /proc/3444/cgroup
13:name=cgroup-test:/cgroup-1
12:cpuset:/
11:rdma:/
10:devices:/user.slice
9:perf_event:/
8:net_cls,net_prio:/
7:pids:/user.slice/user-1000.slice/user@1000.service
6:memory:/user.slice/user-1000.slice/user@1000.service
...
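The cgroup that the terminal process belongs to has changed to cgroup-1. The tasks file of the parent cgroup confirms that the process is no longer there:
# Back in the cgroup-test directory
$ cd ..
$ cat tasks | grep "3444"
# Returns nothing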
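Each line of /proc/<pid>/cgroup has the form hierarchy-ID:subsystems:path, so the cgroup of a process in one particular hierarchy can be picked out with a small filter. This is a sketch under that assumption; cgroup_path_for is a hypothetical helper name.

```shell
# Hypothetical helper: print the cgroup path for one hierarchy, selected by
# its subsystem field (for example "memory" or "name=cgroup-test").
# Remaining arguments are files to read; with none, standard input is read.
cgroup_path_for() {
    field=$1
    shift
    # Match on the second field, then strip "ID:subsystems:" and print the rest.
    awk -F: -v want="$field" '$2 == want { sub(/^[^:]*:[^:]*:/, ""); print }' "$@"
}

# Two lines in the format shown in the transcript; prints /cgroup-1
cgroup_path_for name=cgroup-test <<'EOF'
13:name=cgroup-test:/cgroup-1
12:cpuset:/
EOF
```

For the terminal in the walkthrough, `cgroup_path_for name=cgroup-test /proc/3444/cgroup` would read the live file instead of sample data.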
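4. Use a subsystem to restrict the resources of processes in a cgroup
The hierarchy created above was mounted with -o none, so it is not attached to any subsystem and cannot limit resources by itself. The system has already created a default hierarchy for each subsystem under the /sys/fs/cgroup/ directory: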
$ ls /sys/fs/cgroup
blkio cpu,cpuacct freezer net_cls perf_event systemd
cpu cpuset hugetlb net_cls,net_prio pids unified
cpuacct devices memory net_prio rdma
Take the hierarchy of the memory subsystem as an example and create a child cgroup in it:
$ cd /sys/fs/cgroup/memory
$ sudo mkdir test-limit-memory && cd test-limit-memory
# Set the maximum memory usage to 100 MB
$ sudo sh -c "echo 100m > memory.limit_in_bytes"
# Move the current shell into this cgroup
$ sudo sh -c "echo $$ > tasks"
# Run stress, which tries to occupy 200 MB of memory
$ stress --vm-bytes 200m --vm-keep -m 1
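The m suffix in 100m is a binary multiple, so the limit is 100 * 1024 * 1024 bytes. The sketch below reproduces that arithmetic; to_bytes is a hypothetical helper, not part of any cgroup tooling, and it only handles the k, m, and g suffixes.

```shell
# Hypothetical helper: convert a size such as 100m into bytes, treating
# k, m and g as powers of 1024.
to_bytes() {
    size=$1
    number=${size%[kKmMgG]}      # strip one trailing unit letter, if any
    case $size in
        *[kK]) echo $(( $number * 1024 )) ;;
        *[mM]) echo $(( $number * 1024 * 1024 )) ;;
        *[gG]) echo $(( $number * 1024 * 1024 * 1024 )) ;;
        *)     echo "$number" ;;  # no suffix: already in bytes
    esac
}

to_bytes 100m    # prints 104857600
```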
Compare the memory figures before and after the run: although stress asks for 200 MB, free memory drops by only about 100 MB, and swap usage grows by roughly another 100 MB.
# Before running
$ top
top - 12:04:12 up 6:45, 1 user, load average: 1.87, 1.29, 1.06
Tasks: 348 total, 1 running, 346 sleeping, 0 stopped, 1 zombie
%Cpu(s): 1.3 us, 0.9 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 5973.4 total, 210.8 free, 2820.9 used, 2941.8 buff/cache
MiB Swap: 923.3 total, 921.9 free, 1.3 used. 2746.3 avail Mem
# After running
$ top
top - 12:04:57 up 6:45, 1 user, load average: 2.25, 1.44, 1.12
Tasks: 351 total, 3 running, 347 sleeping, 0 stopped, 1 zombie
%Cpu(s): 34.3 us, 32.8 sy, 0.0 ni, 21.1 id, 4.9 wa, 0.0 hi, 6.9 si, 0.0 st
MiB Mem : 5973.4 total, 118.6 free, 2956.7 used, 2898.1 buff/cache
MiB Swap: 923.3 total, 817.7 free, 105.5 used. 2604.5 avail Mem
This shows that the cgroup memory limit is in effect.
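A more direct check than top is to compare the cgroup's own counters. The sketch below assumes the cgroup v1 memory files memory.usage_in_bytes and memory.limit_in_bytes; memory_usage_percent is a hypothetical helper, and the demo uses a temporary directory with made-up values so it runs without root.

```shell
# Hypothetical helper: print memory usage as a percentage of the limit for
# a cgroup v1 memory cgroup directory.
memory_usage_percent() {
    dir=$1
    usage=$(cat "$dir/memory.usage_in_bytes")
    limit=$(cat "$dir/memory.limit_in_bytes")
    # Integer arithmetic is enough for a rough check.
    echo $(( $usage * 100 / $limit ))
}

# Demo on a stand-in directory; the numbers are illustrative only.
demo=$(mktemp -d)
echo 52428800 > "$demo/memory.usage_in_bytes"    # 50 MiB in use
echo 104857600 > "$demo/memory.limit_in_bytes"   # 100 MiB limit
memory_usage_percent "$demo"                     # prints 50
rm -r "$demo"
```

While stress is running, `memory_usage_percent /sys/fs/cgroup/memory/test-limit-memory` should report a value close to 100.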
How do cgroup restrictions work in Docker?
Start by running a container with limited memory
$ sudo docker pull redis:4
$ sudo docker run -tid -m 100m redis:4
d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d
Look at the hierarchy bound to the memory subsystem: it now contains an additional child cgroup named docker.
$ ls /sys/fs/cgroup/memory
... docker ...
$ ls /sys/fs/cgroup/memory/docker
cgroup.clone_children memory.max_usage_in_bytes
cgroup.event_control memory.memsw.failcnt
cgroup.procs memory.memsw.limit_in_bytes
d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d memory.memsw.max_usage_in_bytes
memory.failcnt memory.memsw.usage_in_bytes
memory.force_empty memory.move_charge_at_immigrate
memory.kmem.failcnt memory.numa_stat
memory.kmem.limit_in_bytes memory.oom_control
memory.kmem.max_usage_in_bytes memory.pressure_level
memory.kmem.slabinfo memory.soft_limit_in_bytes
memory.kmem.tcp.failcnt memory.stat
memory.kmem.tcp.limit_in_bytes memory.swappiness
memory.kmem.tcp.max_usage_in_bytes memory.usage_in_bytes
memory.kmem.tcp.usage_in_bytes memory.use_hierarchy
memory.kmem.usage_in_bytes notify_on_release
memory.limit_in_bytes tasks
Inside the docker cgroup there is a child cgroup named d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d, which is exactly the ID of the container we just created. Let's look inside it:
$ cd /sys/fs/cgroup/memory/docker/d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d
$ cat memory.limit_in_bytes
104857600
# Exactly 100 MB
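The lookup above can be wrapped in a small function that accepts an abbreviated container ID. This is only a sketch: container_memory_limit is a hypothetical helper, and it assumes the directory layout shown in this walkthrough (one directory per container ID under the docker cgroup). Other configurations may place container cgroups elsewhere. The demo builds a stand-in directory so it runs without Docker.

```shell
# Hypothetical helper: print memory.limit_in_bytes for the first cgroup
# directory under $1 whose name starts with the container ID prefix $2.
container_memory_limit() {
    root=$1      # e.g. /sys/fs/cgroup/memory/docker
    prefix=$2    # full or abbreviated container ID
    for dir in "$root/$prefix"*/; do
        if [ -r "${dir}memory.limit_in_bytes" ]; then
            cat "${dir}memory.limit_in_bytes"
            return 0
        fi
    done
    return 1     # no matching container cgroup found
}

# Demo on a stand-in directory tree that mimics the layout above.
demo=$(mktemp -d)
id=d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d
mkdir "$demo/$id"
echo 104857600 > "$demo/$id/memory.limit_in_bytes"
container_memory_limit "$demo" d79f22eb11d2     # prints 104857600
rm -r "$demo"
```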
Conclusion
Cgroups are organized around three concepts (cgroup, subsystem, and hierarchy), which can be understood as a three-layer structure: processes are added to cgroups, cgroups are organized into a hierarchy, and subsystems are attached to the hierarchy. This makes resource limits on processes reusable across the tree.
We also looked at how Docker applies this in practice, so when you use Docker you know how a container's resource limits are actually enforced.