Today, Zouyee's friend Duan brings us an explanation of image lazy pulling in containerd. The next installment, "Kubernetes Scheduling from Shallow to Deep: Framework", is in the works, so please look forward to it.

I. Background

As we know, a container starts very quickly, but if its image does not exist on the node, the image must be pulled before the container can run. Pulling the image takes up most of the container startup time, because every layer of the image has to be downloaded to the local disk. According to statistics, pulling the image accounts for 76% of container startup time. This is not a problem with a small number of containers, but cold-starting a large number of containers can be very slow.

How can we solve the problem of slow image pulls during a container cold start? One solution is to avoid pulling all layers of the image at container startup and instead read the files the container needs from the image repository on demand over a high-speed network. Stargz-snapshotter is a proxy plugin that extends containerd's functionality. It is an implementation of a containerd remote snapshotter.

Version changes

II. Usage

Stargz-snapshotter is straightforward to use in Kubernetes: use containerd as the Kubernetes CRI runtime, and start a stargz-snapshotter service locally as a remote snapshotter for containerd.

Image conversion

Before using it, we need to convert an ordinary image into one that stargz-snapshotter can recognize, using the ctr-remote tool. The following example converts a local centos image, which is then pushed to the image repository once the conversion is complete:

$ ctr-remote image optimize --plain-http --entrypoint='[ "sleep" ]' --args='[ "3000" ]' centos:7 centos:7-eg

Comparison

Pulling the image with crictl before and after the conversion shows that the lazy pull is much faster:

$ time crictl pull centos:7
Image is up to date for sha256:ef8f4eaacef5da519df36f0462e280c77f8125870dbb77e85c898c87fdbbea27
real 0m5.967s
user 0m0.009s
sys 0m0.012s
# pull the optimized image
$ time crictl pull centos:7-eg
Image is up to date for sha256:36edf0c0bb4daca572ba284057960f1441b14e0d21d5e497cb47646a22f653d6
real 0m0.624s
user 0m0.012s
sys 0m0.010s

Viewing the image layers

The files under /.stargz-snapshotter/ show how much of each image layer has been fetched so far:

$ cat /.stargz-snapshotter/*
{"digest":"sha256:857949cb596c96cc9e91156bf7587a105a2e1bc1e6db1b9507596a24a80f351a","size":80005845,"fetchedSize":3055845,"fetchedPercent":3.819527185794988}
{"digest":"sha256:8c57b1a6bef1480562bc27d145f6d371955e1f1901ebdea590f38bfedd6e17d0","size":33614550,"fetchedSize":64550,"fetchedPercent":0.19202993941611593}
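To make the numbers in the status output above easier to read, here is a small Python sketch that recomputes the fetched percentage for each layer and for the two layers together. The helper names are illustrative only, and the records are copied from the output shown in this article.

```python
import json

# Status records copied from the /.stargz-snapshotter/ output in this article.
STATUS_LINES = [
    '{"digest":"sha256:857949cb596c96cc9e91156bf7587a105a2e1bc1e6db1b9507596a24a80f351a",'
    '"size":80005845,"fetchedSize":3055845,"fetchedPercent":3.819527185794988}',
    '{"digest":"sha256:8c57b1a6bef1480562bc27d145f6d371955e1f1901ebdea590f38bfedd6e17d0",'
    '"size":33614550,"fetchedSize":64550,"fetchedPercent":0.19202993941611593}',
]


def fetched_percent(record):
    """Recompute the share of one layer that has been fetched so far."""
    return record["fetchedSize"] / record["size"] * 100


def overall_percent(records):
    """Share of all listed layers, taken together, that has been fetched."""
    total = sum(r["size"] for r in records)
    fetched = sum(r["fetchedSize"] for r in records)
    return fetched / total * 100


records = [json.loads(line) for line in STATUS_LINES]
for record in records:
    print(record["digest"][:19], f"{fetched_percent(record):.2f}%")
print("overall", f"{overall_percent(records):.2f}%")
```

Only a few percent of each layer had been read when the status was captured, which is why the lazy pull returns so quickly.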

III. Principle

This is an overview of the stargz-snapshotter implementation. Normally, pulling an image means downloading every layer of the image. With stargz-snapshotter, containerd no longer downloads the image layers. Instead, for each image layer stored in the image repository, a directory is created on the node that runs the container and the layer is mounted remotely on that directory. Before the container starts, these directories are combined with an overlay mount to provide the rootfs for the container. When a file needs to be read, it is read over the network from the image layer in the image repository.
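The overlay step described above can be sketched in Python as follows. This is only an illustration of how overlayfs mount options are assembled from per-layer directories. The function name and all paths are hypothetical, and the sketch does not show how stargz-snapshotter or containerd build their mounts internally.

```python
def overlay_mount_options(layer_dirs, upper_dir, work_dir):
    """Build the option string for an overlayfs mount.

    layer_dirs is ordered from the base layer to the topmost layer, which is
    the order in which an image manifest lists its layers. overlayfs expects
    the lowerdir entries from top to bottom, so the list is reversed.
    """
    lower = ":".join(reversed(layer_dirs))
    return f"lowerdir={lower},upperdir={upper_dir},workdir={work_dir}"


# Hypothetical directories, one per remotely mounted image layer.
layers = [
    "/stargz-root/snapshotter/snapshots/1/fs",
    "/stargz-root/snapshotter/snapshots/2/fs",
]
options = overlay_mount_options(
    layers,
    "/stargz-root/snapshotter/snapshots/3/fs",    # writable layer of the container
    "/stargz-root/snapshotter/snapshots/3/work",  # overlayfs scratch directory
)
print(f"mount -t overlay overlay -o {options} /path/to/rootfs")
```

Because the lower directories are only mount points for remote layers, building the rootfs this way does not require any layer contents to be present locally.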

Let's take a look at how an image layer is mounted remotely and how files are read from it on demand.

User-mode file system

Stargz-snapshotter implements a user-mode file system using FUSE. FUSE (Filesystem in Userspace) is a framework, backed by a kernel module, that lets users implement a file system in user space and mount it on a directory just like a kernel file system.

As shown above, stargz-snapshotter is an application that implements a user-mode file system (written in Go, with go-fuse as a dependency). When an image is pulled, stargz-snapshotter creates a directory under ${stargz-root}/snapshotter/snapshots/ for each layer of the image, runs its file system implementation, and mounts that file system on the newly created directory, for example /dcos/snapshotter/snapshots/1 in the figure. When a user reads a file in that directory, the request flows like this:

① The operation request is routed from VFS to FUSE

② The FUSE kernel module invokes the stargz-snapshotter logic that matches the request type, and stargz-snapshotter reads the file in this layer from the image repository

③ Stargz-snapshotter returns the file contents to the system call through the VFS

stargz format

A. stargz format

Usually the image layers stored in the image repository are compressed using gzip, and we cannot extract individual files from this compressed file. How does stargz-Snapshotter read a single file from a single image layer?

Stargz-snapshotter uses a different compressed layer format, stargz, which is still a gzip archive but a seekable one. Figure 3 compares tar.gz with stargz. In stargz, each file in the image layer is compressed into its own gzip member, and these members are concatenated into one large gzip archive, so individual files can be located and extracted. The archive contains a TOC (table of contents) file that records the offset of every file in the archive, and a footer that occupies the last 47 bytes and records the offset of the TOC within the archive.

This makes it possible to find the offset of the TOC by reading the footer, the last 47 bytes of the image layer, and then to read the TOC to learn which files the layer contains and where each one is located. Stargz-snapshotter uses this TOC to retrieve files from the image layer on demand.
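The footer-then-TOC lookup can be illustrated with a toy archive in Python. This is a simplified sketch and not the real stargz layout: the footer here is 16 hexadecimal digits instead of the 47-byte footer, the TOC is a plain JSON dictionary, and all function names are made up for the example. The point is that reading one file touches only three byte ranges: the footer, the TOC, and that file's gzip member.

```python
import gzip
import io
import json

FOOTER_SIZE = 16  # toy value: 16 hex digits holding the TOC offset


def build_archive(files):
    """Write one gzip member per file, then a compressed TOC, then the footer."""
    out = io.BytesIO()
    toc = {}
    for name, data in files.items():
        member = gzip.compress(data)
        toc[name] = {"offset": out.tell(), "length": len(member)}
        out.write(member)
    toc_offset = out.tell()
    out.write(gzip.compress(json.dumps(toc).encode()))
    out.write(f"{toc_offset:016x}".encode())
    return out.getvalue()


def read_range(blob, start, end):
    """Stand-in for a ranged read against the registry; returns bytes [start, end)."""
    return blob[start:end]


def read_file(blob, name):
    """Fetch a single file by reading the footer, the TOC, and one member."""
    size = len(blob)
    footer = read_range(blob, size - FOOTER_SIZE, size)
    toc_offset = int(footer.decode(), 16)
    toc = json.loads(gzip.decompress(read_range(blob, toc_offset, size - FOOTER_SIZE)))
    entry = toc[name]
    member = read_range(blob, entry["offset"], entry["offset"] + entry["length"])
    return gzip.decompress(member)


blob = build_archive({"bin/sh": b"shell", "etc/hosts": b"127.0.0.1 localhost\n"})
print(read_file(blob, "etc/hosts"))
```

In a real deployment each `read_range` call would be a partial read of the layer blob over the network rather than a slice of an in-memory byte string.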

B. eStargz format

By default, after a layer of the image has been remotely mounted on the target host, stargz-snapshotter creates a background task that caches the image layer locally. During container startup, if the files the container needs are not yet cached locally, stargz-snapshotter has to read them from the image repository over the network, which slows down container startup.

eStargz is an optimization of the stargz format, as shown in the figure above. It adds a landmark file that divides the files in the image layer into two groups: those most likely to be used when the container runs, and those that are probably not needed. The background task can then cache the files the container needs first, which raises the local cache hit ratio and speeds up container startup.
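The role of the landmark can be sketched in Python as a split of an ordered entry list. The marker name below is an assumption used for illustration, as are the function name and the sample file names. The sketch only shows the idea of "entries before the landmark are cached first".

```python
# Assumed marker name, used here for illustration only.
LANDMARK = ".prefetch.landmark"


def split_at_landmark(entries):
    """Split an ordered list of layer entries at the landmark.

    Entries before the landmark are expected to be needed at container
    startup and are cached first; the remaining entries are cached later.
    If there is no landmark, nothing is prioritized.
    """
    if LANDMARK not in entries:
        return [], list(entries)
    index = entries.index(LANDMARK)
    return entries[:index], entries[index + 1:]


layer_entries = ["bin/sleep", "lib/libc.so", LANDMARK, "usr/share/doc/README"]
prioritized, background = split_at_landmark(layer_entries)
print("cache first:", prioritized)
print("cache later:", background)
```

This ordering is what the `--entrypoint` and `--args` options of the conversion command are for: they tell the optimizer which workload to use when deciding which files are likely to be needed at startup.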

The following figure compares the three kinds of image layer. Legacy is an ordinary image layer, stargz is a stargz image layer, and estargz is an optimized stargz image layer.

C. Pulling image layers in fragments

With the eStargz format, individual files can be located inside the compressed image layer. But how does stargz-snapshotter fetch only part of a layer from the image repository, rather than the whole blob?

The OCI specification describes how to retrieve part of a blob from the repository, and the Docker Registry implements this interface as well.

The Registry interface for fetching part of an image layer blob is a GET request to /v2/<name>/blobs/<digest> that carries a Host header and a Range: bytes=<start>-<end> header.

Here name is the name of the target repository, digest is the digest value of the image layer blob, Host is the address of the registry, and Range describes the blob fragment to fetch.

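The registry answers with a 206 Partial Content response whose headers include Content-Length: <length> and Content-Range: bytes <start>-<end>/<size>, followed by the requested bytes.

Here start is the first byte, end is the last byte, size is the size of the whole layer, and length is the size of the requested layer fragment.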
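As a small illustration of the request and response fields described above, the Python sketch below builds the headers of a ranged blob request and parses a Content-Range value. It follows the blob endpoint of the Docker Registry HTTP API V2. The function names, the host, and the digest are hypothetical, and no network call is made.

```python
import re


def blob_range_request(host, name, digest, start, end):
    """Return the request line and headers for bytes start..end (inclusive) of a blob."""
    request_line = f"GET /v2/{name}/blobs/{digest} HTTP/1.1"
    headers = {"Host": host, "Range": f"bytes={start}-{end}"}
    return request_line, headers


def parse_content_range(value):
    """Parse 'bytes <start>-<end>/<size>' into (start, end, size, length)."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value)
    if match is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    start, end, size = (int(group) for group in match.groups())
    # Both ends of the range are inclusive, hence the +1.
    return start, end, size, end - start + 1


# Hypothetical registry host and digest, for illustration only.
line, headers = blob_range_request("registry.example.com", "library/centos", "sha256:abc", 0, 1023)
print(line)
print(headers)
print(parse_content_range("bytes 0-1023/33614550"))
```

Note that the range is inclusive on both ends, so a request for bytes 0-1023 returns 1024 bytes.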

Lazy-pulling process

When containerd uses stargz-snapshotter, it pulls an image as follows:

① Resolve the manifest digest value from the image name and tag

② Download the manifest from the image repository according to the manifest digest value and save it in the Content Store

③ Obtain the digest value of the image config according to the contents of the manifest, download the config from the image repository, and save it in the Content Store

④ For each image layer, containerd asks the snapshotter to prepare a snapshot. If containerd is using stargz-snapshotter, the call returns an error saying that the snapshot already exists, so containerd does not download that layer. The PrepareSnapshot logic in stargz-snapshotter is the process of setting up a file system for the current layer and mounting it locally.

⑤ After all image layers have been processed, the metadata of the image is saved
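The snapshot-preparation step in the flow above can be modeled with a few lines of Python. This is a toy simulation under stated assumptions, not containerd code: the class names, the `prepare` method, and the manifest dictionary are all hypothetical. It only shows how an "already exists" answer from the snapshotter lets the puller skip downloading a layer.

```python
class AlreadyExists(Exception):
    """Raised by a snapshotter when a layer's snapshot is already available."""


class LazySnapshotter:
    """Hypothetical remote snapshotter: mounts layers instead of unpacking them."""

    def __init__(self):
        self.mounted = []

    def prepare(self, layer_digest):
        # Pretend the layer was mounted remotely, then report it as existing.
        self.mounted.append(layer_digest)
        raise AlreadyExists(layer_digest)


class UnpackingSnapshotter:
    """Hypothetical ordinary snapshotter that needs the layer contents locally."""

    def prepare(self, layer_digest):
        return f"/snapshots/{layer_digest}"


def pull(manifest, snapshotter):
    """Walk the layers of a manifest and return the digests that must be downloaded."""
    downloaded = []
    for layer in manifest["layers"]:
        try:
            snapshotter.prepare(layer)
        except AlreadyExists:
            continue  # snapshot is already there, so the download is skipped
        downloaded.append(layer)
    return downloaded


manifest = {"config": "sha256:cfg", "layers": ["sha256:l1", "sha256:l2"]}
lazy = LazySnapshotter()
print("lazy pull downloads:", pull(manifest, lazy))
print("ordinary pull downloads:", pull(manifest, UnpackingSnapshotter()))
```

With the lazy snapshotter no layer is downloaded during the pull, which matches the short pull time measured earlier for the converted image.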

IV. Summary

When a container is created, pulling the image accounts for a large share of the container startup time. Usually we try various methods to make the image as small as possible, or distribute the image over a P2P network. Lazy pulling is another way to speed up image distribution.

With stargz-snapshotter, only the manifest and config of the image are downloaded, and each layer of the image is remotely mounted on the current host. The container then reads files on demand while it runs. The traditional approach downloads every layer of the image to the local disk and decompresses it. Compared with that, lazy pulling speeds up both the image pull and the cold start of the container. Note, however, that because files are loaded on demand, this approach depends on a good network environment.


References

1. www.usenix.org/conference/…

2. github.com/google/crfs

3. github.com/opencontain…

4. docs.docker.com/registry/sp…