This is the 21st day of my participation in Gwen Challenge
Zero, Preamble (a container is a special process)
Why use containers?
Container virtualization makes building applications more efficient and easier to manage and maintain.
On the left is how the virtual machine works and on the right is Docker, as shown:
The core technologies for containers are Cgroup and Namespace, on top of which there are several other tools that make up the container technology.
A container is a process on a host. Container technology:
- implements resource isolation through Namespace
- implements resource control through Cgroup
- implements file system isolation through rootfs
- relies on the container engine's own features to manage the lifecycle of the container
Supplement: in its early stage Docker was essentially a management engine built on top of LXC. LXC is a user-space tool set that drives kernel features such as Cgroup and Namespace. Namespace is a basic kernel mechanism: the Linux kernel records the namespaces each process belongs to in that process's task_struct.
First, Namespace
Resource isolation
One of the main goals of Namespace development was to enable lightweight virtualization services.
Resource isolation brings to mind the chroot command, which provides file system isolation.
As shown in figure:
(1) Six kinds of Namespace isolation
Containers require six basic kinds of isolation:
As shown in figure:
- IPC: interprocess communication is implemented through mechanisms such as shared memory. If two processes could reach each other directly through IPC, that would not be isolation. In a plain Linux environment, different processes can communicate directly through IPC.
- PID: there are two tree structures, the process tree and the file system tree. Each user space has its own INIT process, and every other process is subordinate to some parent process. Child processes are created, terminated, and reaped by their parent. Before INIT ends, it needs to terminate all processes in its user space (PID mapping).
- User: a process runs as some user. Each user space needs its own root. That user "pretends to be root" inside the container [it cannot touch content in other user spaces, such as deleting files there], but on the host it is a normal user.
- Mount: file system mounts. Run cd /usr/bin/ and compare with the host; the container turns out to have an independent file system.
- UTS: hostname
- Network: netstat -an | grep 22
(2) Namespace operations
Operations on a Namespace are performed through the clone, setns, and unshare system calls.
- clone: creates a new process and, depending on its flags, new Namespaces for it
- unshare: moves the calling process into a new Namespace
- setns: puts the calling process into an existing Namespace
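As a minimal sketch of how these calls are parameterized, the Python snippet below combines the CLONE_NEW* flags for the six namespaces and calls unshare through libc with ctypes. The helper names namespace_flags and try_unshare are hypothetical and chosen only for illustration; the flag values are those from <linux/sched.h>. It assumes a Linux host, and unshare normally fails with a permission error unless run as root.

```python
import ctypes
import errno
import os

# Flag values as defined in <linux/sched.h>.
CLONE_NEWNS = 0x00020000    # mount points
CLONE_NEWUTS = 0x04000000   # hostname and domain name
CLONE_NEWIPC = 0x08000000   # IPC objects
CLONE_NEWUSER = 0x10000000  # user and group IDs
CLONE_NEWPID = 0x20000000   # process IDs
CLONE_NEWNET = 0x40000000   # network stack


def namespace_flags(*flags):
    """OR individual CLONE_NEW* flags into one bitmask (hypothetical helper)."""
    mask = 0
    for flag in flags:
        mask |= flag
    return mask


def try_unshare(flags):
    """Call unshare(2) through libc. Return 0 on success, else an errno value."""
    libc = ctypes.CDLL(None, use_errno=True)
    if not hasattr(libc, "unshare"):
        return errno.ENOSYS  # this libc has no unshare, so not Linux
    if libc.unshare(flags) == 0:
        return 0
    return ctypes.get_errno()


if __name__ == "__main__":
    # Moving into a new UTS namespace usually needs root (CAP_SYS_ADMIN).
    err = try_unshare(namespace_flags(CLONE_NEWUTS))
    print("unshare ok" if err == 0 else f"unshare failed: {os.strerror(err)}")
```

The same bitmask is what would be passed as the flags argument of clone when the new process should start in fresh namespaces.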
Example: query the Namespaces of the current process
donald@donald-pro:~$ ls -l /proc/$$/ns
total 0
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 net -> 'net:[4026532009]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 user -> 'user:[4026531837]'
lrwxrwxrwx 1 donald donald 0 Apr 22 00:00 uts -> 'uts:[4026531838]'
donald@donald-pro:~$
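The same information can be read programmatically. The sketch below is one possible way to do it in Python, assuming a Linux host with /proc mounted; parse_ns_link and list_namespaces are hypothetical helper names. Each link target has the form type:[inode], and two processes are in the same namespace when the inode numbers match.

```python
import os
import re

# A namespace link target looks like 'net:[4026532009]'.
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'net:[4026532009]' into (type, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def list_namespaces(pid="self"):
    """Return {entry name: (type, inode)}; empty if /proc is unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            target = os.readlink(os.path.join(ns_dir, name))
            result[name] = parse_ns_link(target)
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
    return result


if __name__ == "__main__":
    for name, (ns_type, inode) in list_namespaces().items():
        print(f"{name:18} {ns_type}:[{inode}]")
```

Comparing list_namespaces() for a host shell with the result for a process inside a container would show different inode numbers for the isolated namespace types.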
Second, Cgroup
Cgroup is a mechanism provided by the Linux kernel to limit, account for, and isolate the physical resources (such as CPU, memory, and IO) used by process groups.
Cgroup has a process grouping framework, and different resources are controlled by different subsystems. A subsystem is a resource controller, such as the CPU subsystem, which controls the allocation of CPU time.
Linux namespaces isolate a newly created process from the host in terms of file system, network, and processes, but namespaces cannot isolate physical resources such as CPU or memory. If several [containers] that know nothing about each other or about the host run on the same machine, they all compete for the physical resources of the host.
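To see which subsystems a process is attached to, one option is to read /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:path. The following is a small sketch under that assumption; parse_cgroup_line and read_cgroups are hypothetical helper names, and the sample path in the comment is made up.

```python
def parse_cgroup_line(line):
    """Parse one 'hierarchy-ID:controller-list:path' line.

    Example input: '4:cpu,cpuacct:/docker/abc' (sample value, not real output).
    """
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    names = controllers.split(",") if controllers else []
    return int(hierarchy_id), names, path


def read_cgroups(pid="self"):
    """Return the parsed cgroup membership of a process, or [] if unreadable."""
    try:
        with open(f"/proc/{pid}/cgroup") as handle:
            return [parse_cgroup_line(line) for line in handle if line.strip()]
    except OSError:
        return []


if __name__ == "__main__":
    for hierarchy_id, names, path in read_cgroups():
        print(hierarchy_id, ",".join(names) or "(none)", path)
```

A line with an empty controller list is parsed to an empty list rather than rejected.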
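When Docker is installed on Linux, you will find a directory named docker under the directory of every subsystem.
The cpu.cfs_quota_us file limits CPU usage: it sets how many microseconds of CPU time the group may use in each period of cpu.cfs_period_us microseconds, and -1 means no limit.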
donald@donald-pro:/sys/fs/cgroup$ ll
total 0
drwxr-xr-x 15 root root 380 Apr 22 18:05 ./
drwxr-xr-x 9 root root 0 Apr 22 18:05 ../
dr-xr-xr-x 4 root root 0 Apr 22 18:05 blkio/
lrwxrwxrwx 1 root root 11 Apr 22 18:05 cpu -> cpu,cpuacct/
lrwxrwxrwx 1 root root 11 Apr 22 18:05 cpuacct -> cpu,cpuacct/
dr-xr-xr-x 4 root root 0 Apr 22 18:05 cpu,cpuacct/
dr-xr-xr-x 2 root root 0 Apr 22 18:05 cpuset/
dr-xr-xr-x 5 root root 0 Apr 22 18:05 devices/
dr-xr-xr-x 3 root root 0 Apr 22 18:05 freezer/
dr-xr-xr-x 2 root root 0 Apr 22 18:05 hugetlb/
dr-xr-xr-x 4 root root 0 Apr 22 18:05 memory/
lrwxrwxrwx 1 root root 16 Apr 22 18:05 net_cls -> net_cls,net_prio/
dr-xr-xr-x 2 root root 0 Apr 22 18:05 net_cls,net_prio/
lrwxrwxrwx 1 root root 16 Apr 22 18:05 net_prio -> net_cls,net_prio/
dr-xr-xr-x 2 root root 0 Apr 22 18:05 perf_event/
dr-xr-xr-x 4 root root 0 Apr 22 18:05 pids/
dr-xr-xr-x 2 root root 0 Apr 22 18:05 rdma/
dr-xr-xr-x 5 root root 0 Apr 22 18:05 systemd/
dr-xr-xr-x 5 root root 0 Apr 22 18:05 unified/
donald@donald-pro:/sys/fs/cgroup/cpu/docker$ ll
total 0
drwxr-xr-x 3 root root 0 Apr 25 14:28 ./
dr-xr-xr-x 5 root root 0 Apr 25 14:28 ../
drwxr-xr-x 2 root root 0 Apr 25 14:28 c988e6a0567ccc350b18e3e2eb96cfe0dbff4edd202ab4132012916b019c2904/
-rw-r--r-- 1 root root 0 Apr 25 14:28 cgroup.clone_children
-rw-r--r-- 1 root root 0 Apr 25 14:28 cgroup.procs
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.stat
-rw-r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_all
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_percpu
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_percpu_sys
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_percpu_user
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_sys
-r--r--r-- 1 root root 0 Apr 25 14:28 cpuacct.usage_user
-rw-r--r-- 1 root root 0 Apr 25 14:28 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Apr 25 14:28 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Apr 25 14:28 cpu.shares
-r--r--r-- 1 root root 0 Apr 25 14:28 cpu.stat
-rw-r--r-- 1 root root 0 Apr 25 14:28 notify_on_release
-rw-r--r-- 1 root root 0 Apr 25 14:28 tasks
donald@donald-pro:/sys/fs/cgroup/cpu/docker/c988e6a0567ccc350b18e3e2eb96cfe0dbff4edd202ab4132012916b019c2904$ sudo docker ps
CONTAINER ID   IMAGE                       COMMAND                   CREATED        STATUS         PORTS                    NAMES
c988e6a0567c   mobz/elasticsearch-head:5   "/bin/sh -c 'grunt s..."   5 months ago   Up 3 minutes   0.0.0.0:9100->9100/tcp   loving_albattani
# -1 means no limit
donald@donald-pro:/sys/fs/cgroup/cpu/docker/c988e6a0567ccc350b18e3e2eb96cfe0dbff4edd202ab4132012916b019c2904$ cat cpu.cfs_quota_us
-1
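The relationship between cpu.cfs_quota_us and cpu.cfs_period_us can be made concrete with a little arithmetic. This is only a sketch: cpu_limit and read_cpu_limit are hypothetical helper names, and the directory passed to read_cpu_limit is assumed to be a cgroup v1 cpu directory like the one listed above.

```python
import os


def cpu_limit(quota_us, period_us):
    """Return the CPU limit in cores, or None when the quota is negative (-1 = no limit)."""
    if period_us <= 0:
        raise ValueError("period must be positive")
    if quota_us < 0:
        return None
    return quota_us / period_us


def read_cpu_limit(cgroup_dir):
    """Read quota and period from a cgroup v1 cpu directory and compute the limit."""
    with open(os.path.join(cgroup_dir, "cpu.cfs_quota_us")) as handle:
        quota_us = int(handle.read().strip())
    with open(os.path.join(cgroup_dir, "cpu.cfs_period_us")) as handle:
        period_us = int(handle.read().strip())
    return cpu_limit(quota_us, period_us)
```

For instance, a quota of 50000 with a period of 100000 corresponds to half of one CPU, while the -1 shown in the transcript corresponds to no limit.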
Third, the container creation process
(1) The clone system call creates a new process with its own Namespaces
This process has its own PID, mount, user, network, IPC, and UTS namespaces.
root@docker:~# pid = clone(fun, stack, flags, clone_arg);
(2) Write the pid into the cgroup subsystems so that the process is controlled by them
root@docker:~# echo $pid > /sys/fs/cgroup/cpu/tasks
root@docker:~# echo $pid > /sys/fs/cgroup/cpuset/tasks
root@docker:~# echo $pid > /sys/fs/cgroup/blkio/tasks
root@docker:~# echo $pid > /sys/fs/cgroup/memory/tasks
root@docker:~# echo $pid > /sys/fs/cgroup/devices/tasks
root@docker:~# echo $pid > /sys/fs/cgroup/freezer/tasks
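The echo commands above can be expressed as a loop. The sketch below assumes a cgroup v1 layout with one directory per subsystem under a common root; attach_pid is a hypothetical helper name, and the demo writes into a temporary directory so that it runs without root. Against the real /sys/fs/cgroup it would need root privileges.

```python
import os
import tempfile

# The subsystems used in the commands above.
SUBSYSTEMS = ("cpu", "cpuset", "blkio", "memory", "devices", "freezer")


def attach_pid(pid, cgroup_root="/sys/fs/cgroup", subsystems=SUBSYSTEMS):
    """Write pid into <cgroup_root>/<subsystem>/tasks for each subsystem.

    Returns the list of files that were written.
    """
    written = []
    for name in subsystems:
        tasks = os.path.join(cgroup_root, name, "tasks")
        with open(tasks, "w") as handle:
            handle.write(f"{pid}\n")
        written.append(tasks)
    return written


if __name__ == "__main__":
    # Demo against a fake cgroup tree in a temporary directory.
    fake_root = tempfile.mkdtemp()
    for name in SUBSYSTEMS:
        os.makedirs(os.path.join(fake_root, name))
    for path in attach_pid(os.getpid(), cgroup_root=fake_root):
        print("wrote", path)
```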
(3) Enter the new rootfs through the pivot_root system call
Through the pivot_root system call, the process enters a new rootfs, and then it executes /bin/bash in the new Namespace, Cgroup, and rootfs through the exec system call.
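fun() {
    // Pseudocode: make path_of_rootfs/ the new root; the old root is moved to path
    pivot_root("path_of_rootfs/", path);
    // Replace the process image with a shell running inside the new root
    exec("/bin/bash");
}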
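The order of the two calls in fun() can be illustrated in Python. Note the substitution: the standard library has no pivot_root wrapper, so this sketch uses os.chroot in its place. chroot only changes the root directory of the process and does not move the old root away as pivot_root does, so this shows the sequence of steps rather than what a container engine actually does. enter_rootfs is a hypothetical helper name, and running it successfully requires root and a prepared rootfs directory.

```python
import os


def enter_rootfs(new_root, command=("/bin/bash",)):
    """Switch the root directory to new_root, then replace this process with command.

    os.chroot stands in for pivot_root here (an assumption made for this sketch).
    Raises OSError if the directory is missing or the caller lacks privileges.
    """
    os.chroot(new_root)                  # needs root (CAP_SYS_CHROOT)
    os.chdir("/")                        # start at the top of the new root
    os.execv(command[0], list(command))  # does not return on success
```

In the flow described above, this function would play the role of fun, the entry point handed to clone.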