Background

Recently, I came across containerd source code in some community issues. Like most people, I have mainly used Docker, Podman, and similar tools over the years, so I did not have a deep understanding of how a real container runtime is structured. The Kubernetes community recently decided to remove the dockershim logic from the kubelet in 1.21, and the kubelet is likely to talk to containerd (and containerd-shim) instead, so we need to start understanding container runtimes better.

Let's start with a question:

Which component behind the Docker daemon actually runs containers?

1. Background knowledge

First, we need to understand how the Docker daemon creates containers.

The current Docker call chain can be summarized in the following figure:

Since Docker 1.11, the Docker daemon has been split into separate modules to conform to the OCI standard. After the split, the architecture consists of the following parts:

In December 2016, Docker announced that containerd would be separated from Docker Engine and donated to the open source community for independent development and operation.

containerd describes itself as "an industry-standard container runtime with an emphasis on simplicity, robustness, and portability."

In fact, Docker itself has been stripped down to little more than the CLI; actual container control is implemented in containerd.

In 2017, Docker's open-source project was renamed Moby, gradually separating the open-source components from the Docker product. Moby is more like a box of "Lego" bricks: a hodgepodge of build, logging, volume management, networking, image management, containerd, SwarmKit, and more.

In February 2019, containerd officially graduated from the CNCF, becoming the fifth graduated project after Kubernetes, Prometheus, Envoy, and CoreDNS.

2. Ecosystem architecture

The Docker ecosystem alone is very large, so today we focus on the ecosystem around containerd at runtime. The surrounding components are:

  • OCI
  • CRI
  • kubelet
  • dockerd
    • docker.sock (/var/run/docker.sock)
  • dockershim
    • dockershim.sock (/var/run/dockershim.sock)
  • containerd-cri
    • containerd.sock (/var/run/docker/containerd/containerd.sock|/run/containerd/containerd.sock)
  • containerd-shim
  • Runc
  • RunV
  • Kata
  • gVisor

containerd's role in the container ecosystem: it is the carrier that actually runs containers.

Containerd architecture

In brief, it can be divided into:

  • containerd-shim
  • runC
  • LXC call encapsulation

A more detailed breakdown:

The gRPC module provides service interfaces to the upper layer, while the metrics module provides monitoring data (cgroup-related data); both serve the upper layer. containerd runs as a daemon that exposes its gRPC interface over a local UNIX socket.

  • Storage: manages the metadata of containers and images and persists it to disk through BoltDB
  • Tasks: manage the logical structure of containers and interact with the low-level runtime
  • Events: events emitted by operations on containers
  • Runtimes: the low-level runtime (interworking with runC)

containerd-shim

containerd-shim is a component of containerd. It separates the containerd daemon from container processes: containerd uses the shim to invoke runc, which starts the container.

Each component is registered as a plugin:

  • cri: /run/containerd/containerd.sock
  • containers/tasks/events/snapshots/namespaces/images: /run/containerd/containerd.sock
  • low-memory ttrpc: /run/containerd/containerd.sock.ttrpc
  • debug: /run/containerd/debug.sock (/debug/pprof)
  • metrics: metrics.sock (/v1/metrics)
  • Bolt metadata store: uses the same underlying storage engine as etcd (bbolt)

How Moby (dockerd) starts containerd

  • Check whether /run/containerd/containerd.sock exists to decide whether dockerd needs to start containerd itself
  • If not, start containerd as a supervised child process by calling its binary directly

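Containerd namespaces

containerd namespaces logically partition resources such as containers, images, and snapshots so that several clients can share one containerd daemon; despite the name, they are not Linux kernel namespaces. There are three mainstream namespaces: default (used by ctr), moby (used by Docker), and k8s.io (used by the CRI plugin).

Within a namespace, the two most common labels used to distinguish container types are: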

  • io.kubernetes.cri.container-type

  • io.kubernetes.docker.type

Containerd-shim manages the container lifecycle

containerd-shim is similar to dockershim in that it is also an RPC interaction layer that provides standardized APIs on top of the underlying runtimes. Its features are:

  • It allows runC to exit after creating and starting the container
  • The shim, rather than containerd, is the parent of the container process, so if containerd dies or restarts, the container keeps running and its open file descriptors (such as stdio) are not closed
  • The shim collects and reports the container's exit status, so containerd does not need to wait on child processes

The main purpose of using shim is to decouple containerd from the real container.

Returning to the first feature, why should runc be allowed to exit? Go binaries are statically linked by default, so if a runc process stayed resident for every container, a machine with N containers would consume M*N memory, where M is the memory used by one runc process. At the same time, for the reasons above, containerd should not be the direct parent of the container, so something smaller than runc is needed as the parent: the shim.
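The memory argument can be written out explicitly. Let M be the memory used by one resident runc process, as above, and let S be the memory used by one shim (a symbol introduced here, assuming S < M):

```latex
\underbrace{N \cdot M}_{\text{runc stays resident}}
\quad\text{vs.}\quad
\underbrace{N \cdot S}_{\text{shim stays resident}},
\qquad
N \cdot M - N \cdot S = N\,(M - S) > 0 \quad \text{whenever } S < M
```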

Interaction with OCI components

runC is a product of standardization. To prevent any single commercial company from dominating container standards, the Open Container Initiative (OCI) was formed.

Together with runC, the OCI runtime specification defines the states a container can be in and the operations a runtime must provide.

runC has the following features:

  • Built as a standalone binary that is called directly
  • Used to update dockerd's config.v2.json and hostconfig.json files
  • Selectable as a runtime handler through a Kubernetes RuntimeClass
  • Standardized interaction through OCI

docker-init

On UNIX systems, process 1 is the init process and the parent of all orphaned processes. When Docker runs a container without the --init argument, process 1 in the container is the given ENTRYPOINT, such as sh in the following example. With --init, process 1 is tini:

Therefore, when a container is started with --init, Docker uses the docker-init process (tini) as process 1 to:

  • Avoid zombie processes
  • Provide default signal handling (forwarding signals to the child process)