LXCFS is a small FUSE filesystem written with the intention of making Linux containers feel more like a virtual machine. It started as a side project of LXC but is usable by any runtime.
To put it in plainer terms: LXCFS is an open-source FUSE (user-space filesystem) implementation that supports LXC containers and can also be used with Docker containers. It lets applications inside a container read memory and CPU information through LXCFS mappings, so that they see virtualized data derived from their own container's cgroup configuration.
What is resource view isolation?
Container technology provides environment isolation in a way that differs from traditional virtual machine technology. Ordinary Linux containers make packaging and startup fast, but they also weaken isolation. The best-known gap in Linux containers is resource view isolation.
A container can use cgroups to limit its resource usage, including memory and CPU. However, if a process in the container runs common monitoring commands such as free and top, it still sees the data of the physical machine rather than the container. This is because the container does not isolate the resource view of /proc, /sys, and other filesystems.
Why do containers need resource view isolation?
- From the container's perspective: some service developers are accustomed to using the top and free commands on traditional physical machines and virtual machines to check system resource usage. But because the container does not isolate the resource view, the data seen inside the container is still that of the physical machine.
- From the application's perspective: running a process in a container is different from running it on a physical or virtual machine, and some applications face real risks when run in containers:
Many JVM-based Java programs size the JVM heap and stack at startup based on the system's resource ceiling. Running such a Java application in a container can cause it to fail to start, because the memory figure the JVM reads is still the physical machine's, while the resource quota allocated to the container is smaller than what the JVM requires.
Programs that need host CPU information are affected in the same way. For example, a Go server may call runtime.GOMAXPROCS(runtime.NumCPU()) at startup, and nginx can set worker_processes to auto; both determine the number of CPUs from the runtime environment automatically. But a process in the container always reads the CPU core count from /proc/cpuinfo, and the /proc filesystem in the container still describes the physical machine, which affects the health of the services running in the container (a quick demonstration follows this list).
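As a quick illustration, here is a sketch (the openjdk:8 image and the 256 MB / 2-CPU limits are arbitrary choices; note that JDK 8u191+ is container-aware, so the JVM line applies to older builds) showing that a limited container without LXCFS still reports host-level values:

# Start a container limited to 256 MB of memory and 2 CPUs, without LXCFS mounts
docker run -it --rm -m 256m --cpus 2 openjdk:8 /bin/bash
# Inside the container: on JVMs without container awareness, the default
# maximum heap is derived from the host's memory, not from the 256 MB limit
java -XX:+PrintFlagsFinal -version | grep MaxHeapSize
# Inside the container: /proc/cpuinfo still lists all of the host's cores
grep -c processor /proc/cpuinfo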
How do we isolate the resource view for containers?
LXCFS was created to solve exactly this problem.
LXCFS reads resource information from the container's cgroup and exposes it through its own filesystem; these files are then bind-mounted over the corresponding /proc files inside the container via Docker volumes. Applications in the container then read them as if they were the real /proc of the host.
Here is an architecture diagram of how LXCFS works:
To explain the diagram: when we mount the host file /var/lib/lxcfs/proc/meminfo to /proc/meminfo in a Docker container, a container process that reads this file is routed through /dev/fuse to LXCFS, which looks up the correct memory limit in the container's cgroup. The application therefore obtains its actual resource constraints. CPU limits work the same way.
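Before wiring this into Docker, you can inspect the virtualized files directly on the host under the LXCFS mountpoint (the listing below is typical for LXCFS 3.x; the exact file set varies by version):

# LXCFS exposes virtualized /proc files under its mountpoint
ls /var/lib/lxcfs/proc/
# cpuinfo  diskstats  meminfo  stat  swaps  uptime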
Implementing resource view isolation with LXCFS
Install LXCFS
wget https://copr-be.cloud.fedoraproject.org/results/ganto/lxc3/epel-7-x86_64/01041891-lxcfs/lxcfs-3.1.2-0.2.el7.x86_64.rpm
rpm -ivh lxcfs-3.1.2-0.2.el7.x86_64.rpm --force --nodeps
Check that the installation is successful
[root@ifdasdfe2344 system]# lxcfs -h
Usage: lxcfs [-f|-d] [-p pidfile] mountpoint
  -f running foreground by default; -d enable debug output
  Default pidfile is /run/lxcfs.pid
Start LXCFS
Start directly in the background
lxcfs /var/lib/lxcfs &
Start via systemd (recommended)
touch /usr/lib/systemd/system/lxcfs.service
cat > /usr/lib/systemd/system/lxcfs.service <<'EOF'
[Unit]
Description=lxcfs

[Service]
ExecStart=/usr/bin/lxcfs -f /var/lib/lxcfs
Restart=on-failure
#ExecReload=/bin/kill -s SIGHUP $MAINPID

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start lxcfs.service
Check whether the startup is successful
[root@ifdasdfe2344 system]# ps aux | grep lxcfs
root      3276  0.0  0.0 112708   980 pts/2    S+   15:45   0:00 grep --color=auto lxcfs
root     18625  0.0  0.0 234628  1296 ?        Ssl  14:16   0:00 /usr/bin/lxcfs -f /var/lib/lxcfs
The startup succeeded.
Verify the LXCFS effect
Without LXCFS
We first run a container on a machine with LXCFS disabled and observe the CPU and memory information inside the container. To make the difference obvious, we used a high-spec server (32 cores / 128 GB).
# Do the following
systemctl stop lxcfs
docker run -it ubuntu /bin/bash
# Inside the container
free -h
From the output we can see that although free is executed inside the container, it displays the host's meminfo.
# Check the number of CPU cores
cat /proc/cpuinfo | grep "processor" | wc -l
This shows that without LXCFS, the cpuinfo the container sees is the host's.
With LXCFS enabled
systemctl start lxcfs
# Map the LXCFS /proc files into the container, with memory limited to 256 MB:
docker run -it -m 256m \
    -v /var/lib/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
    -v /var/lib/lxcfs/proc/diskstats:/proc/diskstats:rw \
    -v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rw \
    -v /var/lib/lxcfs/proc/stat:/proc/stat:rw \
    -v /var/lib/lxcfs/proc/swaps:/proc/swaps:rw \
    -v /var/lib/lxcfs/proc/uptime:/proc/uptime:rw \
    ubuntu:latest /bin/bash
# Inside the container
free -h
You can see that the container's own memory limit is now reported correctly: resource view isolation for memory works.
# --cpus 2 limits the container to at most two logical CPUs
docker run -it --rm -m 256m --cpus 2 \
    -v /var/lib/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
    -v /var/lib/lxcfs/proc/diskstats:/proc/diskstats:rw \
    -v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rw \
    -v /var/lib/lxcfs/proc/stat:/proc/stat:rw \
    -v /var/lib/lxcfs/proc/swaps:/proc/swaps:rw \
    -v /var/lib/lxcfs/proc/uptime:/proc/uptime:rw \
    ubuntu:latest /bin/sh
cpuinfo now also reflects the limit on the number of logical CPUs the container may use. Pinning a container to a specific set of CPUs generally does more good than harm; it just requires a little extra work to allocate the cpuset when the container is created (see the sketch below).
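A minimal sketch of such cpuset pinning with Docker (the CPU indices 0,1 are arbitrary; since LXCFS builds its virtual cpuinfo from the container's cpuset cgroup, the container should report exactly those cores):

# Pin the container to CPUs 0 and 1 and mount the LXCFS cpuinfo
docker run -it --rm --cpuset-cpus 0,1 \
    -v /var/lib/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
    ubuntu:latest /bin/bash
# Inside the container: should now report 2
grep -c processor /proc/cpuinfo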
LXCFS in Kubernetes
Using LXCFS in Kubernetes requires solving two problems:
First, LXCFS needs to be started on each node.
Second, the /proc files maintained by LXCFS need to be mounted into each container.
Run the LXCFS FUSE filesystem with a DaemonSet
For the first problem, we install LXCFS on every Kubernetes node using a DaemonSet.
Use the following YAML file directly:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: lxcfs
  labels:
    app: lxcfs
spec:
  selector:
    matchLabels:
      app: lxcfs
  template:
    metadata:
      labels:
        app: lxcfs
    spec:
      hostPID: true
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: lxcfs
        image: registry.cn-hangzhou.aliyuncs.com/denverdino/lxcfs:3.0.4
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - name: cgroup
          mountPath: /sys/fs/cgroup
        - name: lxcfs
          mountPath: /var/lib/lxcfs
          mountPropagation: Bidirectional
        - name: usr-local
          mountPath: /usr/local
      volumes:
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
      - name: usr-local
        hostPath:
          path: /usr/local
      - name: lxcfs
        hostPath:
          path: /var/lib/lxcfs
          type: DirectoryOrCreate
kubectl apply -f lxcfs-daemonset.yaml
You can see that the LXCFS DaemonSet has been deployed on each node.
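To confirm, you can check the DaemonSet and its pods (assuming it was created in the default namespace):

kubectl get daemonset lxcfs
kubectl get pods -l app=lxcfs -o wide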
Map LXCFS proc files to containers
We can solve the second problem in two ways.
The first is simply to declare the mounts of the host's /var/lib/lxcfs/proc files in the YAML of a Kubernetes Deployment, as sketched below.
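A minimal sketch of such a Deployment (the name web-demo and the nginx image are placeholders; only meminfo and cpuinfo are shown, and diskstats, stat, swaps, and uptime follow the same pattern):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-demo
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-demo
        image: nginx:latest        # placeholder image
        resources:
          limits:
            memory: "256Mi"
            cpu: "2"
        volumeMounts:
        - name: lxcfs-meminfo
          mountPath: /proc/meminfo
        - name: lxcfs-cpuinfo
          mountPath: /proc/cpuinfo
      volumes:
      - name: lxcfs-meminfo
        hostPath:
          path: /var/lib/lxcfs/proc/meminfo
          type: File
      - name: lxcfs-cpuinfo
        hostPath:
          path: /var/lib/lxcfs/proc/cpuinfo
          type: File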
The second method uses Kubernetes' Initializer extension mechanism to mount the LXCFS files automatically. However, InitializerConfiguration is no longer supported after Kubernetes 1.14, so it is not described here. Instead, we can implement an admission webhook (admission control, which can validate requests after authorization or inject default parameters; see https://kubernetes.feisky.xyz/extension/auth/admission) to achieve the same goal.
# Verify whether your Kubernetes cluster supports admission webhooks
$ kubectl api-versions | grep admissionregistration.k8s.io/v1beta1
admissionregistration.k8s.io/v1beta1
Writing an admission webhook is beyond the scope of this article; you can read more about it in the official documentation.
Here is an example implementation of an LXCFS admission webhook for reference: https://github.com/hantmac/lxcfs-admission-webhook
Conclusion
This article described how LXCFS provides resource view isolation for containers, helping containerized applications correctly identify the resource constraints they run under.
At the same time, we showed how to deploy the LXCFS FUSE filesystem with a container image and a DaemonSet. This greatly simplifies deployment, and it also takes advantage of Kubernetes' container management capabilities to automatically recover the LXCFS process when it fails and to keep node deployments consistent as the cluster scales. The same technique applies to other similar monitoring or system extensions.
In addition, we introduced how a Kubernetes admission webhook can mount the LXCFS files automatically. The entire process is transparent to the application deployer, which greatly simplifies operation and maintenance.