The overall architecture

Docker uses a client/server (C/S) architecture. The client and the server communicate through a REST API, over UNIX sockets or a network interface. Usually the client runs on the same machine as the Docker service: the commands we commonly use, such as docker build, docker pull and docker run, are entered in the local client, which forwards them to the Docker server. The client can also be a separately deployed tool, such as Docker Compose.

The Docker service generally runs as a daemon process that listens for client requests and builds, runs and distributes containers. The overall architecture of Docker is as follows:

  • Docker daemon (dockerd): listens for Docker API requests and manages Docker objects such as images, containers, networks, and volumes. A daemon can also communicate with other daemons to manage Docker services.
  • Docker client: sends commands to the Docker daemon via the Docker API and lets the daemon carry out the corresponding action, for example when we issue docker run.
  • Docker Registry: stores Docker images. Docker Hub is a public registry that anyone can use, and Docker looks for images on Docker Hub by default. We can also run a Docker Registry of our own.
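
The exchange between client and daemon can be sketched in a few lines of Python. This is only an illustration, not how the docker command itself is implemented. It assumes the daemon listens on the default UNIX socket /var/run/docker.sock and that the Engine API's GET /version endpoint is reachable; the names UnixHTTPConnection and docker_get are made up for this example.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that runs over a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5):
        # "localhost" is only used for the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_get(path, socket_path="/var/run/docker.sock"):
    """Send a GET request to the daemon and return the decoded JSON body.

    The default socket path is an assumption about the local installation.
    """
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"daemon returned HTTP {response.status}")
        return json.loads(body)
    finally:
        conn.close()
```

On a machine where Docker is running and the current user may access the socket, docker_get("/version") should return a dictionary describing the daemon. The docker client does conceptually the same thing for every command it forwards.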

Evolution of containers

In the beginning, Docker managed containers with the technology provided by the Linux kernel. It simplified the complex container management of Linux and formed its own command system. Later, Docker abstracted the underlying technology and defined a set of interfaces; anything that implements these interfaces can manage containers. This abstraction is called libcontainer.

As Docker grew in popularity, more and more companies joined in the development of container technology. In 2015, Google, Microsoft, Docker and other companies established the Open Container Initiative (OCI) to define a consistent container standard. The container runtime runC is based on libcontainer.

You may be curious what the Windows container architecture looks like. In fact, Windows provides its own abstractions corresponding to CGroups and Namespaces, and it also complies with the OCI container standard, as shown in the following figure:

(Image from Black Belt’s DockerCon presentation: Docker and Windows Containers Revealed)

The underlying technology

Docker is written in Go, which makes this kind of cross-platform deployment easier. Now let's take a look at the Linux container infrastructure: Namespaces (resource isolation), CGroups (resource limits), and UnionFS (image and container layering).

Namespaces

Namespaces were introduced into the Linux kernel starting with version 2.4.19. A namespace wraps a global system resource in an abstraction, so that processes in the same namespace appear to have their own instance of that resource. Linux supports several types of namespaces, including the following six:

Namespace             Isolated system resources
Mount namespaces      File system mount points
IPC namespaces        Specific interprocess communication resources
UTS namespaces        Nodename and domainname
PID namespaces        Process IDs
Network namespaces    Network-related system resources
User namespaces       User and group ID space

We can see that there is network isolation and also user isolation. When a container is created, an instance of each of the namespaces above is created, and the container's processes are placed in these namespaces to achieve isolation.
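
To see which namespaces a process belongs to, we can read the symbolic links the kernel exposes under /proc/<pid>/ns. The following is a minimal sketch that assumes a Linux host with /proc mounted; the function name list_namespaces is made up for this example.

```python
import os


def list_namespaces(ns_dir="/proc/self/ns"):
    """Return {namespace_type: identifier} for the entries in ns_dir.

    Each entry under /proc/<pid>/ns is a symbolic link whose target looks
    like 'pid:[4026531836]'. Two processes whose links for a given type have
    the same target are in the same namespace of that type.
    """
    namespaces = {}
    if not os.path.isdir(ns_dir):
        return namespaces  # not Linux, or /proc is not mounted
    for name in sorted(os.listdir(ns_dir)):
        try:
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry is not a readable symbolic link
    return namespaces


if __name__ == "__main__":
    for ns_type, identifier in list_namespaces().items():
        print(ns_type, identifier)
```

Running this once on the host and once inside a container, and comparing the identifiers, shows which namespaces differ between the two.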

CGroups (Resource Limits)

Namespaces give us environment isolation, but this is not enough, because nothing limits the resources each process uses, such as CPU and memory. Limiting resource usage is therefore important, and CGroups in the Linux kernel provide this capability. Once a limit is set, a container that exceeds it (its memory limit, for example) may be killed.

After starting a container with docker run nginx:test, we can look under /sys/fs/cgroup: the directory /sys/fs/cgroup/memory/docker/<nginx container ID> holds the files that describe the container's resource limits.

A limit can be set with a command such as docker run --memory 1024M nginx:test. Other resource limits are set in a similar way.
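
As a rough sketch of how such a limit can be inspected, the Python below converts a size such as 1024M to bytes and reads the limit back from a cgroup directory. It makes a few assumptions: the suffixes are binary units (M meaning 1024 × 1024 bytes), the limit is stored in memory.limit_in_bytes (cgroup v1) or memory.max (cgroup v2), and the caller supplies the container's cgroup directory. The names parse_size and read_memory_limit are made up for this example.

```python
import os
import re

# Binary multipliers for the size suffixes (an assumption of this sketch).
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a size such as '1024M' or '512k' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def read_memory_limit(cgroup_dir):
    """Return the memory limit in bytes for a cgroup directory, or None.

    Tries memory.limit_in_bytes (cgroup v1) first, then memory.max
    (cgroup v2, where the literal value 'max' means no limit).
    """
    for filename in ("memory.limit_in_bytes", "memory.max"):
        path = os.path.join(cgroup_dir, filename)
        if os.path.isfile(path):
            with open(path) as handle:
                value = handle.read().strip()
            return None if value == "max" else int(value)
    return None
```

For a container started with --memory 1024M, read_memory_limit on that container's cgroup directory would be expected to equal parse_size("1024M").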

UnionFS (Image and container layering)

Linux's UnionFS (union file system) technology merges directories from different physical locations into a single directory. UnionFS has different implementations on different systems; the mainstream ones are AUFS, Device Mapper and OverlayFS. AUFS was long the most commonly used in Docker, although current Docker releases default to the OverlayFS-based overlay2 driver. Let's mainly look at AUFS.

By default, AUFS makes the first directory in the union readable and writable, while the subsequent directories are read-only. For example, we merge the teacher and student directories into the MNT directory:

    student
    ├── B
    └── C

    # view ./MNT
    $ tree ./MNT
    ├── A
    ├── B
    └── C

When we modify C under MNT, the change also appears in the teacher directory, while the student directory is not modified because it is read-only. How does this work in Docker?

First, Docker divides the file system into a container layer and an image layer. The container layer corresponds to the teacher directory above and the image layer to the student directory. The container layer is readable and writable, while the image layer is read-only, which makes it easy for multiple containers to share one image.

In addition, Docker does not copy files into the container layer up front; it uses the image layer's files first. Only when a file in the container is modified is it copied into the container layer, so the image layer's files are never affected. This copy-on-write technique saves the system a lot of unnecessary storage.
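
The layering and copy-on-write behaviour can be imitated with a toy model in Python. This is not how AUFS or Docker's storage drivers are implemented; it only mimics the lookup order and the copy on first modification. The class name UnionMount and the file contents are hypothetical, and the teacher directory is assumed to contain only A.

```python
class UnionMount:
    """Toy union mount: one writable layer on top of read-only layers.

    Each layer is a dict mapping file names to contents.
    """

    def __init__(self, writable, *read_only):
        self.writable = writable          # container layer (teacher)
        self.read_only = list(read_only)  # image layers (student)

    def listdir(self):
        names = set(self.writable)
        for layer in self.read_only:
            names.update(layer)
        return sorted(names)

    def read(self, name):
        # The writable layer shadows the layers beneath it.
        for layer in [self.writable, *self.read_only]:
            if name in layer:
                return layer[name]
        raise FileNotFoundError(name)

    def append(self, name, data):
        # Copy-on-write: copy the file up on its first modification,
        # then change only the copy in the writable layer.
        if name not in self.writable:
            self.writable[name] = self.read(name)
        self.writable[name] += data


teacher = {"A": "a from teacher"}                            # read-write
student = {"B": "b from student", "C": "c from student"}     # read-only
mnt = UnionMount(teacher, student)
mnt.append("C", " (edited)")  # C is copied into teacher, then modified
```

After the last line, teacher holds the edited copy of C while student still holds the original, which is the behaviour described for the container layer and the image layer.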

Docker security

When reviewing the security of Docker, the following four aspects should be considered:

(1) Security of Namespaces and CGroups

Docker containers are very similar to LXC containers and share the same security features. Namespaces provide the first and most direct form of isolation: processes running in a container cannot see processes running in another container or on the host system. Each container also has its own network stack, which means that one container cannot gain privileged access to another container's sockets or interfaces.

CGroups are another key component of Linux containers. They implement resource accounting and limiting, and provide many metrics that help ensure each container gets its fair share of resources (for example, memory, CPU, disk I/O), so that a single container cannot exhaust system resources. This is especially important on multi-tenant platforms, such as PaaS, to ensure consistent uptime and performance for users.

(2) Docker daemon security

Running the Docker daemon requires root privileges, so only trusted users should be allowed to control it. Because Docker allows the host and containers to share folders, a container that has host system files mapped into it can modify them and so get around the host's protections. How serious this is depends largely on which host files we share, so it is generally manageable.

Docker also needs to prevent illegitimate requests from creating destructive containers. Since version 0.5.2, to protect against cross-site request forgery by malicious users, Docker uses a local UNIX socket instead of a TCP socket bound to 127.0.0.1, so that ordinary UNIX permission checks can be used to secure access.
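
Because the control endpoint is a file on disk, its protection can be inspected like that of any other file. The sketch below reports the ownership and permission bits of a UNIX socket. The default path /var/run/docker.sock is an assumption about the local installation, and the function name describe_socket is made up for this example.

```python
import os
import stat


def describe_socket(path="/var/run/docker.sock"):
    """Return basic ownership and permission facts about a UNIX socket file."""
    info = os.stat(path)
    return {
        "is_socket": stat.S_ISSOCK(info.st_mode),
        "mode": stat.filemode(info.st_mode),  # symbolic form, as shown by ls -l
        "owner_uid": info.st_uid,
        "group_gid": info.st_gid,
        # True if users outside the owner and group can read or write it.
        "world_accessible": bool(info.st_mode & (stat.S_IROTH | stat.S_IWOTH)),
        # True if the current user has both read and write permission.
        "current_user_can_read_write": os.access(path, os.R_OK | os.W_OK),
    }
```

If world_accessible is true for the daemon's socket, any local user can send it commands, which amounts to giving every local user root on the host.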

(3) Security of the Linux kernel

By default, Docker starts containers with a restricted set of capabilities, which gives the “root” in the container fewer privileges than the real “root”. For example:

  • Deny all mount operations;
  • Deny access to raw sockets (to prevent packet spoofing);
  • Deny certain file system operations, such as changing file owners or attributes;
  • Deny module loading.

This makes it difficult for an intruder who escalates to root inside a container to cause serious damage to the host.
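
The kernel exposes a process's effective capabilities as a hexadecimal bit mask on the CapEff line of /proc/<pid>/status. The sketch below decodes such a mask for a handful of capabilities. The bit numbers come from the kernel header linux/capability.h; which capability governs which of the operations listed above is this example's own reading, and the function names are made up.

```python
# Capability bit numbers from linux/capability.h (only a small subset).
CAPABILITY_BITS = {
    "CAP_CHOWN": 0,        # change file owners
    "CAP_NET_RAW": 13,     # open raw sockets
    "CAP_SYS_MODULE": 16,  # load and unload kernel modules
    "CAP_SYS_ADMIN": 21,   # mount and many other administrative operations
    "CAP_MKNOD": 27,       # create device nodes
}


def decode_capabilities(mask):
    """Return the names (from the subset above) whose bits are set in mask."""
    return sorted(
        name for name, bit in CAPABILITY_BITS.items() if (mask >> bit) & 1
    )


def effective_capabilities(status_text):
    """Extract and decode the CapEff line from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_capabilities(int(line.split()[1], 16))
    raise ValueError("no CapEff line found")
```

Passing the contents of /proc/self/status to effective_capabilities, once as root on the host and once as root inside a container, gives a quick way to compare the two sets.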

(4) Other kernel security features

  • Configure the image registry so that only images signed with a specified key can be pulled
  • Run the kernel with GRSEC and PaX, which add many security checks at compile time and run time
  • Use container templates with security features
  • User-defined access control policies
