OverlayFS introduction
OverlayFS is a stacked (union) file system. It relies on and is built on top of other file systems (such as ext4 and XFS) and does not take part in laying out disk space itself. It only "merges" different directories of the underlying file system and presents the result to the user as a single view; this technique is called union mount. Compared with AUFS, OverlayFS is faster and simpler to implement. Docker provides two storage drivers based on the Linux kernel's OverlayFS: overlay and overlay2. overlay2 is an improvement over overlay and uses inodes more efficiently. However, overlay2 has environment requirements: Docker version 17.06.02+, and the host file system must be ext4 or XFS.
Union mount
OverlayFS is implemented with three directories: the lower directory, the upper directory, and the work directory. There can be multiple lower directories. The work directory is a scratch directory used internally by OverlayFS: its content is cleared at mount time and it is not visible to users during use.
demo
The syntax for mounting overlayfs using the mount command is as follows:
mount -t overlay overlay -o lowerdir=lower1:lower2:lower3,upperdir=upper,workdir=work merged_dir
mkdir -p /tmp/test A A/aa B C worker
echo "from A" > A/a.txt
echo "from B" > B/b.txt
echo "from C" > C/c.txt
mount -t overlay overlay -o lowerdir=A:B,upperdir=C,workdir=worker /tmp/test
ll /tmp/test
total 16
drwxr-xr-x 2 root root 4096 Oct 21 20:24 aa
-rw-r--r-- 1 root root 7 Oct 21 20:23 a.txt
-rw-r--r-- 1 root root 7 Oct 21 20:24 b.txt
-rw-r--r-- 1 root root 7 Oct 21 20:24 c.txt
mount |grep test
overlay on /tmp/test type overlay (rw,relatime,lowerdir=A:B,upperdir=C,workdir=worker)
Overlay driver in Docker
Having covered how OverlayFS works, let's look at Docker's overlay storage driver. The following is a schematic diagram from Docker's official website.
The figure shows a three-part structure: lowerdir, upperdir, and merged.
- lowerdir is the image layer, that is, the image's rootfs. It is read-only.
- upperdir is the layer above lowerdir. It is the read/write layer created when a container starts, and all changes to the container's data happen in this layer.
- merged is the container's mount point. It exposes the unified view to the user and corresponds to /tmp/test in the example above.
These layer directories are stored under /var/lib/docker/overlay2/, or under /var/lib/docker/overlay/ if the overlay driver is used.
demo
docker run -it --name busybox busybox:latest sh
mount |grep overlay
ls
bin dev etc home proc root sys tmp usr var
mount |grep overlay
overlay on / type overlay (rw,relatime,lowerdir=/export/App/docker/overlay2/l/ZADXGS7J7SCVNZFOPBVA5DGDKA:/export/App/docker/overlay2/l/2NFGMN2NYKTZAIGDJOHC3RB64C,upperdir=/export/App/docker/overlay2/a8866f1095d5d24b04cfa39a600a68a2e027590090381e661561b40b043c56be/diff,workdir=/export/App/docker/overlay2/a8866f1095d5d24b04cfa39a600a68a2e027590090381e661561b40b043c56be/work)
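A mount line like the one above is long and easy to misread. As a minimal sketch, the option string can be split into one layer per line. The helper name print_overlay_layers is made up for this illustration and is not part of Docker or util-linux, and the sketch assumes that no path contains a comma, a colon, or glob characters.

```shell
# print_overlay_layers OPTIONS
# OPTIONS is the comma-separated option string shown in parentheses by mount.
# Prints one line per directory, in the order the options appear.
# Hypothetical helper for illustration only.
print_overlay_layers() {
    old_ifs=$IFS
    IFS=,
    for opt in $1; do
        case $opt in
            upperdir=*) printf 'upper  %s\n' "${opt#upperdir=}" ;;
            workdir=*)  printf 'work   %s\n' "${opt#workdir=}" ;;
            lowerdir=*)
                # lowerdir entries are colon-separated; the leftmost one is
                # the topmost lower layer.
                IFS=:
                for dir in ${opt#lowerdir=}; do
                    printf 'lower  %s\n' "$dir"
                done
                IFS=,
                ;;
        esac
    done
    IFS=$old_ifs
}
```

For the /tmp/test mount in the first demo, print_overlay_layers "rw,relatime,lowerdir=A:B,upperdir=C,workdir=worker" lists A and B as lower directories, then C as upper and worker as work.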
How it works
How does the OverlayFS storage driver behave when data changes inside a container? The read and write paths are described below.
Read:
- If the file is in the container layer (upperdir), it is read directly from there.
- If the file is not in the container layer (upperdir), it is read from the image layer (lowerdir).
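The read path can be modeled with plain directories, without mounting anything. The sketch below is an illustration only: overlay_lookup is a hypothetical helper, not something OverlayFS or Docker provides, and it ignores whiteouts and opaque directories. It checks upperdir first and then each lowerdir from left to right, because the leftmost lowerdir is the topmost lower layer.

```shell
# overlay_lookup UPPER LOWERS RELPATH
# Prints the directory that would serve RELPATH: UPPER if the file exists
# there, otherwise the first colon-separated directory in LOWERS that has it.
# Returns 1 if no layer contains the file.
# Simplified model: whiteouts and opaque directories are not considered.
overlay_lookup() {
    upper=$1 lowers=$2 rel=$3
    if [ -e "$upper/$rel" ]; then
        printf '%s\n' "$upper"
        return 0
    fi
    old_ifs=$IFS
    IFS=:
    for dir in $lowers; do
        if [ -e "$dir/$rel" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

With the directories from the first demo, overlay_lookup C A:B a.txt prints A, while a file that also exists in C is reported as coming from C.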
Modification:
- First write: if the file does not yet exist in upperdir, overlay and overlay2 perform a copy_up operation that copies it from lowerdir to upperdir. OverlayFS works at the file level, so copy_up copies the whole file even if only a small part of it is modified. Subsequent writes to the same file go to the copy that already exists in the container layer. This is commonly called copy-on-write.
- Deleting files and directories: when a file is deleted in the container, a whiteout file is created in the container layer (upperdir). The file in the image layer (lowerdir) is not deleted, because that layer is read-only, but the whiteout hides it from view. When a directory is deleted in the container, an opaque directory is created in the container layer (upperdir); like a whiteout, it blocks access to the directory even though it still exists in the image layer.
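The first-write behaviour can be imitated with two ordinary directories. This is a rough model under stated assumptions, not how the kernel implements it: copy_up_write and is_whiteout are hypothetical helper names, the write is modelled as an append, and is_whiteout relies on GNU stat and on the kernel convention that a whiteout is a character device with device number 0/0.

```shell
# copy_up_write UPPER LOWER RELPATH DATA
# Model of the first-write path: if RELPATH exists only in LOWER, copy the
# whole file into UPPER first, then append DATA to the copy in UPPER.
# LOWER is never modified.
copy_up_write() {
    upper=$1 lower=$2 rel=$3 data=$4
    mkdir -p "$(dirname "$upper/$rel")"
    if [ ! -e "$upper/$rel" ] && [ -e "$lower/$rel" ]; then
        # File-level copy: the entire file is copied, however small the write.
        cp -p "$lower/$rel" "$upper/$rel"
    fi
    printf '%s\n' "$data" >> "$upper/$rel"
}

# is_whiteout PATH
# Succeeds if PATH is a character device with major/minor 0/0.
# Uses GNU stat: %t and %T print the major and minor numbers in hex.
is_whiteout() {
    [ -c "$1" ] && [ "$(stat -c '%t:%T' "$1")" = "0:0" ]
}
```

On a real overlay mount, deleting a lower-layer file through the merged directory is expected to leave such a device node in upperdir; creating one by hand with mknod requires root, so the sketch only shows the check.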
Points to note
- The copy_up operation happens only on the first write to a file; after that, only the copy is modified.
- OverlayFS presents only two levels, a lower level and an upper level, even when the lower level is made up of several directories, so file lookups are faster than with AUFS.
- A whiteout file does not delete anything from the image layer. This is why an image keeps growing with each docker commit: no matter how much data is deleted in the container layer, the image layers underneath do not change.
overlay2 image storage structure
demo
docker pull ubuntu:latest
latest: Pulling from library/ubuntu
22e816666fd6: Pull complete
079b6d2a1e53: Pull complete
11048ebae908: Pull complete
c58094023a2e: Pull complete
Digest: sha256:a7b8b7b33e44b123d7f997bd4d3d0a59fafc63e203d17efedf09ff3f6f516152
ll /export/docker/overlay2/
total 20
drwx------ 3 root root 4096 Oct 24 17:33 838141f4c7c9149154205bac27d4ae0ae086dfdc4f413407c353a97f3d43ae02
drwx------ 4 root root 4096 Oct 24 17:33 9b82d3a2c058b274dd1051fc0e2c4c9ca1f2793cbdf297aff0fcd10716e991ef
drwx------ 4 root root 4096 Oct 24 17:33 a907b9c7b7bfd96c1441b1a9edc3bea3bc61f2b33fc58e86e3ab68cde9e77e2f
drwx------ 4 root root 4096 Oct 24 17:33 ba3c23adebb270971f52a2a2d21c6893f11526e57528364bf45313d6278719dd
drwxr-xr-x 2 root root 4096 Oct 24 17:33 l
There is also an "l" directory that contains symbolic links to all layers. The links use short names so that the mount arguments do not hit the page size limit:
ll /export/docker/overlay2/l/
total 16
lrwxrwxrwx 1 root root 72 Oct 24 17:33 AO2U6ITJIKIGXU7M2IJYXDD6N6 -> ../838141f4c7c9149154205bac27d4ae0ae086dfdc4f413407c353a97f3d43ae02/diff
lrwxrwxrwx 1 root root 72 Oct 24 17:33 JKGV2VMSBGJHYRN5XLAB5NVKOH -> ../ba3c23adebb270971f52a2a2d21c6893f11526e57528364bf45313d6278719dd/diff
lrwxrwxrwx 1 root root 72 Oct 24 17:33 VYRTWUOWCPSALLE5VBODX5GJO4 -> ../a907b9c7b7bfd96c1441b1a9edc3bea3bc61f2b33fc58e86e3ab68cde9e77e2f/diff
lrwxrwxrwx 1 root root 72 Oct 24 17:33 YIAHNEKOMA24I4LWRIDEODBZ2H -> ../9b82d3a2c058b274dd1051fc0e2c4c9ca1f2793cbdf297aff0fcd10716e991ef/diff
Inside each layer directory, the lower file records the short names of the parent layers, and the work directory is the working directory (workdir) specified for the union mount.
ll ba3c23adebb270971f52a2a2d21c6893f11526e57528364bf45313d6278719dd/
total 16
drwxr-xr-x 3 root root 4096 Oct 24 17:33 diff
-rw-r--r-- 1 root root 26 Oct 24 17:33 link
-rw-r--r-- 1 root root 86 Oct 24 17:33 lower
drwx------ 2 root root 4096 Oct 24 17:33 work
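To see how the short names tie layers together, the lower file of a layer can be resolved through the l directory. The sketch below rests on an assumption that is not shown in the listings above: that the lower file holds colon-separated entries of the form l/SHORTNAME. The function name print_lower_chain is hypothetical, and readlink -f is the GNU coreutils form.

```shell
# print_lower_chain OVERLAY2_ROOT LAYER_ID
# Prints the resolved diff directory of every lower layer recorded in
# OVERLAY2_ROOT/LAYER_ID/lower, topmost first. Prints nothing for a base
# layer, which has no lower file.
# Assumes entries look like "l/<short name>" separated by colons.
print_lower_chain() {
    root=$1 layer=$2
    [ -f "$root/$layer/lower" ] || return 0
    old_ifs=$IFS
    IFS=:
    for short in $(cat "$root/$layer/lower"); do
        # Each l/<short name> is a symlink to ../<layer id>/diff.
        ( cd "$root" && readlink -f "$short" )
    done
    IFS=$old_ifs
}
```

Run as root against a real storage directory, for example print_lower_chain /export/docker/overlay2 ba3c23adebb270971f52a2a2d21c6893f11526e57528364bf45313d6278719dd, it should list the diff directories that sit below that layer.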
How are these directories tied to images? The answer is metadata. Metadata falls into two kinds: image metadata and layer metadata.
Image metadata
Image metadata is stored in the /var/lib/docker/image/<storage_driver>/imagedb/content/sha256/ directory. Each file is named after an image ID, which can be viewed with docker images. The files store, as JSON, the image's rootfs information, its creation time, its build history, the container used to build it, the Entrypoint and Cmd to launch, and so on. For example, the ubuntu image ID here is cf0f3ca922e0:
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu latest cf0f3ca922e0 5 days ago 64.2MB
cat /export/docker/image/overlay2/imagedb/content/sha256/cf0f3ca922e08045795f67138b394c7287fbc0f4842ee39244a1a1aaca8c5e1c
{
  "architecture": "amd64",
  "config": {
    "Hostname": "", "Domainname": "", "User": "",
    "AttachStdin": false, "AttachStdout": false, "AttachStderr": false,
    "Tty": false, "OpenStdin": false, "StdinOnce": false,
    "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
    "Cmd": ["/bin/bash"],
    "ArgsEscaped": true,
    "Image": "sha256:79762efc126691a84ceec9ff37eb51494597a4be3dfb55bb28319edf7d029f04",
    "Volumes": null,
    "WorkingDir": "",
    "Entrypoint": null,
    "OnBuild": null,
    "Labels": null
  },
  "container": "9cd5aac672039c24c1fc7578b06045a02fdeb97141f53851909df81f5bdf2627",
  "container_config": {
    "Hostname": "9cd5aac67203", "Domainname": "", "User": "",
    "AttachStdin": false, "AttachStdout": false, "AttachStderr": false,
    "Tty": false, "OpenStdin": false, "StdinOnce": false,
    "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
    "Cmd": ["/bin/sh", "-c", "#(nop) ", "CMD [\"/bin/bash\"]"],
    "ArgsEscaped": true,
    "Image": "sha256:79762efc126691a84ceec9ff37eb51494597a4be3dfb55bb28319edf7d029f04",
    "Volumes": null,
    "WorkingDir": "",
    "Entrypoint": null,
    "OnBuild": null,
    "Labels": {}
  },
  "created": "2019-10-18T18:48:51.632346407Z",
  "docker_version": "18.06.1-ce",
  "history": [
    {
      "created": "2019-10-18T18:48:49.35320434Z",
      "created_by": "/bin/sh -c #(nop) ADD file:d13b09e8b3cc98bf0868e2af7a49b14622d2111e2a4e10341859902e43bd872a in / "
    },
    {
      "created": "2019-10-18T18:48:50.103034647Z",
      "created_by": "/bin/sh -c [ -z \"$(apt-get indextargets)\" ]"
    },
    {
      "created": "2019-10-18T18:48:50.787856146Z",
      "created_by": "/bin/sh -c set -xe \t\t\u0026\u0026 echo '#! /bin/sh' \u003e /usr/sbin/policy-rc.d \t\u0026\u0026 echo 'exit 101' \u003e\u003e /usr/sbin/policy-rc.d \t\u0026\u0026 chmod +x /usr/sbin/policy-rc.d \t\t\u0026\u0026 dpkg-divert --local --rename --add /sbin/initctl \t\u0026\u0026 cp -a /usr/sbin/policy-rc.d /sbin/initctl \t\u0026\u0026 sed -i 's/^exit.*/exit 0/' /sbin/initctl \t\t\u0026\u0026 echo 'force-unsafe-io' \u003e /etc/dpkg/dpkg.cfg.d/docker-apt-speedup \t\t\u0026\u0026 echo 'DPkg::Post-Invoke { \"rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true\"; }; ' \u003e /etc/apt/apt.conf.d/docker-clean \t\u0026\u0026 echo 'APT::Update::Post-Invoke { \"rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true\"; }; ' \u003e\u003e /etc/apt/apt.conf.d/docker-clean \t\u0026\u0026 echo 'Dir::Cache::pkgcache \"\"; Dir::Cache::srcpkgcache \"\"; ' \u003e\u003e /etc/apt/apt.conf.d/docker-clean \t\t\u0026\u0026 echo 'Acquire::Languages \"none\"; ' \u003e /etc/apt/apt.conf.d/docker-no-languages \t\t\u0026\u0026 echo 'Acquire::GzipIndexes \"true\"; Acquire::CompressionTypes::Order:: \"gz\"; ' \u003e /etc/apt/apt.conf.d/docker-gzip-indexes \t\t\u0026\u0026 echo 'Apt::AutoRemove::SuggestsImportant \"false\"; ' \u003e /etc/apt/apt.conf.d/docker-autoremove-suggests"
    },
    {
      "created": "2019-10-18T18:48:51.466908877Z",
      "created_by": "/bin/sh -c mkdir -p /run/systemd \u0026\u0026 echo 'docker' \u003e /run/systemd/container"
    },
    {
      "created": "2019-10-18T18:48:51.632346407Z",
      "created_by": "/bin/sh -c #(nop) CMD [\"/bin/bash\"]",
      "empty_layer": true
    }
  ],
  "os": "linux",
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:a090697502b8d19fbc83afb24d8fb59b01e48bf87763a00ca55cfff42423ad36",
      "sha256:97e6b67a30f1efeb050ada13c2afa1afd748e175ae744027dd0cce1f2931a594",
      "sha256:100ef12ce3a46c3242d186dbbadedff1638dc1f69cab4e1fbf73489049c01c25",
      "sha256:19331eff40f01dd084a3f966cc6939e828d617d777163706b8a13d0f972704d1"
    ]
  }
}
Each diff_id above corresponds to one image layer. They are listed from top to bottom in order from the lowest image layer to the highest. How does a diff_id relate to a layer? Docker uses the diff_ids and the history information in rootfs to compute a content-addressable index (chainID) for each layer. The chainID is associated with the layer metadata, which in turn is associated with the files of each image layer.
Layer metadata
An image layer itself contains only the layer's file package. After a user downloads an image layer to the Docker host, Docker builds local layer metadata on the host from the layer's file package and the image metadata, including diff, parent, size, and so on. When Docker uploads a new image layer generated on the host to a registry, the host's metadata for that layer is not packaged and uploaded with it.
Docker defines two interfaces, Layer and RWLayer, which define the operations of the read-only layer and the read/write layer respectively. It also defines roLayer and mountedLayer, which implement these two interfaces: roLayer describes an immutable image layer, and mountedLayer describes a read/write container layer.
roLayer
Specifically, roLayer stores the chainID, diffID, parent, storage_driver, cacheID, size, and other attributes of the image layer. The metadata is stored under /var/lib/docker/image/<storage_driver>/layerdb/sha256/, in one folder per chainID, as follows:
ll /export/docker/image/overlay2/layerdb/sha256
total 16
drwx------ 2 root root 4096 Oct 24 17:33 a090697502b8d19fbc83afb24d8fb59b01e48bf87763a00ca55cfff42423ad36
drwx------ 2 root root 4096 Oct 24 17:33 b9997ded97a1c277d55be0d803cf76ee6e7b2e8235d610de0020a7c84c837b93
drwx------ 2 root root 4096 Oct 24 17:33 c808877c0adcf4ff8dcd2917c5c517dcfc76e9e8a035728fd8f0eae195d11908
drwx------ 2 root root 4096 Oct 24 17:33 cdf75cc6b4d28e72a9931be2a88c6c421ad03cbf984b099916a74f107e6708ff
ll /export/docker/overlay2/
total 20
drwx------ 3 root root 4096 Oct 24 17:33 838141f4c7c9149154205bac27d4ae0ae086dfdc4f413407c353a97f3d43ae02
drwx------ 4 root root 4096 Oct 24 17:33 9b82d3a2c058b274dd1051fc0e2c4c9ca1f2793cbdf297aff0fcd10716e991ef
drwx------ 4 root root 4096 Oct 24 17:33 a907b9c7b7bfd96c1441b1a9edc3bea3bc61f2b33fc58e86e3ab68cde9e77e2f
drwx------ 4 root root 4096 Oct 24 17:33 ba3c23adebb270971f52a2a2d21c6893f11526e57528364bf45313d6278719dd
Each chainID directory contains three files: cache-id, diff, and size.
cache-id: a UUID randomly generated by Docker. It is the index of the directory that holds the image layer's content, that is, the name of the layer's directory under /var/lib/docker/overlay2/. This is how the layer's directory can be found from the chainID directory.
For example, if the cache-id of chainID d801a12f6af7beff367268f99607376584d8b2da656dcd8656973b7ad9779ab4 is 130ea10d6f0ebfafc8ca260992c8d0bef63a1b5ca3a7d51a5cd1b1031d23efd5, the layer is stored in /var/lib/docker/overlay2/130ea10d6f0ebfafc8ca260992c8d0bef63a1b5ca3a7d51a5cd1b1031d23efd5.
# cat a090697502b8d19fbc83afb24d8fb59b01e48bf87763a00ca55cfff42423ad36/cache-id
838141f4c7c9149154205bac27d4ae0ae086dfdc4f413407c353a97f3d43ae02
# cat b9997ded97a1c277d55be0d803cf76ee6e7b2e8235d610de0020a7c84c837b93/cache-id
9b82d3a2c058b274dd1051fc0e2c4c9ca1f2793cbdf297aff0fcd10716e991ef
# cat c808877c0adcf4ff8dcd2917c5c517dcfc76e9e8a035728fd8f0eae195d11908/cache-id
ba3c23adebb270971f52a2a2d21c6893f11526e57528364bf45313d6278719dd
# cat cdf75cc6b4d28e72a9931be2a88c6c421ad03cbf984b099916a74f107e6708ff/cache-id
a907b9c7b7bfd96c1441b1a9edc3bea3bc61f2b33fc58e86e3ab68cde9e77e2f
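The lookup from chainID to layer directory is a single file read. As a small sketch (layer_dir is a hypothetical helper name, and the two directory arguments are whatever paths your installation uses), it can be written as:

```shell
# layer_dir LAYERDB_DIR OVERLAY2_ROOT CHAIN_ID
# Prints the overlay2 directory that holds the content of the layer with the
# given chainID, by reading LAYERDB_DIR/sha256/<chainID>/cache-id.
# CHAIN_ID may be given with or without the "sha256:" prefix.
# Returns 1 if the chainID directory or its cache-id file is missing.
layer_dir() {
    layerdb=$1 overlay2=$2 chain=${3#sha256:}
    cache_id=$(cat "$layerdb/sha256/$chain/cache-id") || return 1
    printf '%s/%s\n' "$overlay2" "$cache_id"
}
```

With the paths from this demo, the call would look like layer_dir /export/docker/image/overlay2/layerdb /export/docker/overlay2 a090697502b8d19fbc83afb24d8fb59b01e48bf87763a00ca55cfff42423ad36.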
diff file: stores the layer's diffID, which matches one of the entries of diff_ids in the image metadata JSON. The diffID is the SHA256 digest of the contents of the image layer's package. The chainID is a content-addressable index computed from the diffID of the current layer and of all its ancestor layers, as follows: if the layer is the lowest one (it has no parent layer), its chainID is its diffID. Otherwise chainID(n) = SHA256(chainID(n-1) + " " + diffID(n)), that is, the SHA256 digest of the parent layer's chainID, a space, and the current layer's diffID.
# cat a090697502b8d19fbc83afb24d8fb59b01e48bf87763a00ca55cfff42423ad36/diff
sha256:a090697502b8d19fbc83afb24d8fb59b01e48bf87763a00ca55cfff42423ad36
# cat b9997ded97a1c277d55be0d803cf76ee6e7b2e8235d610de0020a7c84c837b93/diff
sha256:97e6b67a30f1efeb050ada13c2afa1afd748e175ae744027dd0cce1f2931a594
# cat c808877c0adcf4ff8dcd2917c5c517dcfc76e9e8a035728fd8f0eae195d11908/diff
sha256:19331eff40f01dd084a3f966cc6939e828d617d777163706b8a13d0f972704d1
# cat cdf75cc6b4d28e72a9931be2a88c6c421ad03cbf984b099916a74f107e6708ff/diff
sha256:100ef12ce3a46c3242d186dbbadedff1638dc1f69cab4e1fbf73489049c01c25
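The chainID rule can be tried out in a few lines of shell. This is an illustration under assumptions: chain_ids is a hypothetical helper name, it expects diffIDs in sha256:HEX form ordered from the lowest layer to the highest (the order of diff_ids in the image metadata), and it relies on sha256sum from GNU coreutils.

```shell
# chain_ids DIFFID...
# Prints one chainID per diffID given, lowest layer first.
# The lowest layer's chainID is its diffID; every other chainID is the
# SHA256 digest of "<parent chainID> <diffID>" (joined by a single space).
chain_ids() {
    chain=
    for diff in "$@"; do
        if [ -z "$chain" ]; then
            chain=$diff
        else
            hex=$(printf '%s %s' "$chain" "$diff" | sha256sum | cut -d' ' -f1)
            chain="sha256:$hex"
        fi
        printf '%s\n' "$chain"
    done
}
```

Feeding it the diff_ids of an image prints one chainID per layer, and the results can be compared with the directory names under layerdb/sha256/.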
size file: stores the size of the image layer.
mountedLayer
mountedLayer stores the container's init layer and mount point information: the ID of the container's init layer (init-id), the ID used for the union mount (mount-id), and the chainID of the image layer that is the parent of the container layer (parent). The files are stored in the /var/lib/docker/image/<storage_driver>/layerdb/mounts/<container_id>/ directory. The init-id is the mount-id with -init appended, and it is also the name of a directory under /var/lib/docker/overlay2/.
The mount-id can also be seen directly in the output of the mount command. It corresponds to a directory under /var/lib/docker/overlay2/, which contains the merged directory presented by OverlayFS.
The init layer is named with a UUID plus -init and sits between the read-only image layers and the read/write layer. It holds container-specific files such as /etc/hosts and /etc/resolv.conf. It is needed because these files and directories belong to the image layer, yet they have to be adjusted when a container starts: for example, the user may need to change the hostname, but the image layer cannot be modified. So the init layer is mounted separately at startup, and the changes are made to the files in it. Such changes usually apply only to the current container, and the init layer is not included when docker commit creates an image. The layer's files are stored in /var/lib/docker/overlay2/<init_id>/diff.
docker run -it --name test ubuntu:latest /bin/bash
root@079d1e912224:/# echo "test" > /tmp/test
root@079d1e912224:/# mount |grep overlay
overlay on / type overlay (rw,relatime,lowerdir=/export/docker/overlay2/l/W5IG7GJ5GPVE2KH6U6QF227O6C:/export/docker/overlay2/l/JKGV2VMSBGJHYRN5XLAB5NVKOH:/export/docker/overlay2/l/VYRTWUOWCPSALLE5VBODX5GJO4:/export/docker/overlay2/l/YIAHNEKOMA24I4LWRIDEODBZ2H:/export/docker/overlay2/l/AO2U6ITJIKIGXU7M2IJYXDD6N6,upperdir=/export/docker/overlay2/b65d22f08f3a34bc7fd2afb4f2e534980f17f549bfeed6a40933396e9de665f1/diff,workdir=/export/docker/overlay2/b65d22f08f3a34bc7fd2afb4f2e534980f17f549bfeed6a40933396e9de665f1/work)
ll /export/docker/image/overlay2/layerdb/mounts/079d1e912224e0a00f77612d4f5731027055d3d02e8f6912298d67a852900d21/
init-id mount-id parent
cat /export/docker/image/overlay2/layerdb/mounts/079d1e912224e0a00f77612d4f5731027055d3d02e8f6912298d67a852900d21/init-id
b65d22f08f3a34bc7fd2afb4f2e534980f17f549bfeed6a40933396e9de665f1-init
cat /export/docker/image/overlay2/layerdb/mounts/079d1e912224e0a00f77612d4f5731027055d3d02e8f6912298d67a852900d21/mount-id
b65d22f08f3a34bc7fd2afb4f2e534980f17f549bfeed6a40933396e9de665f1
cat /export/docker/image/overlay2/layerdb/mounts/079d1e912224e0a00f77612d4f5731027055d3d02e8f6912298d67a852900d21/parent
sha256:c808877c0adcf4ff8dcd2917c5c517dcfc76e9e8a035728fd8f0eae195d11908
ll /export/docker/overlay2/
total 28
drwx------ 3 root root 4096 Oct 24 17:33 838141f4c7c9149154205bac27d4ae0ae086dfdc4f413407c353a97f3d43ae02
drwx------ 4 root root 4096 Oct 24 17:33 9b82d3a2c058b274dd1051fc0e2c4c9ca1f2793cbdf297aff0fcd10716e991ef
drwx------ 4 root root 4096 Oct 24 17:33 a907b9c7b7bfd96c1441b1a9edc3bea3bc61f2b33fc58e86e3ab68cde9e77e2f
drwx------ 5 root root 4096 Oct 24 18:06 b65d22f08f3a34bc7fd2afb4f2e534980f17f549bfeed6a40933396e9de665f1
drwx------ 4 root root 4096 Oct 24 18:06 b65d22f08f3a34bc7fd2afb4f2e534980f17f549bfeed6a40933396e9de665f1-init
drwx------ 4 root root 4096 Oct 24 17:33 ba3c23adebb270971f52a2a2d21c6893f11526e57528364bf45313d6278719dd
drwxr-xr-x 2 root root 4096 Oct 24 18:06 l
cat b65d22f08f3a34bc7fd2afb4f2e534980f17f549bfeed6a40933396e9de665f1/diff/tmp/test
test
cat b65d22f08f3a34bc7fd2afb4f2e534980f17f549bfeed6a40933396e9de665f1/merged/tmp/test
test
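The three files under mounts/ are enough to locate a container's directories. The following sketch gathers them in one place; container_layers is a hypothetical helper name, and the diff and merged subdirectory names are taken from the listings in this demo rather than from any documented guarantee.

```shell
# container_layers LAYERDB_DIR OVERLAY2_ROOT CONTAINER_ID
# Reads init-id, mount-id and parent from LAYERDB_DIR/mounts/CONTAINER_ID
# and prints where the init layer, the read/write layer (upperdir) and the
# merged view live, plus the chainID of the parent image layer.
# Returns 1 if the container has no mounts directory.
container_layers() {
    dir=$1/mounts/$3
    overlay2=$2
    [ -d "$dir" ] || { echo "no such container: $3" >&2; return 1; }
    init_id=$(cat "$dir/init-id")
    mount_id=$(cat "$dir/mount-id")
    parent=$(cat "$dir/parent")
    printf 'init   %s/%s/diff\n'   "$overlay2" "$init_id"
    printf 'upper  %s/%s/diff\n'   "$overlay2" "$mount_id"
    printf 'merged %s/%s/merged\n' "$overlay2" "$mount_id"
    printf 'parent %s\n'           "$parent"
}
```

For the container in this demo the arguments would be /export/docker/image/overlay2/layerdb, /export/docker/overlay2 and the full container ID.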
Init layer: used to modify container-specific files such as /etc/hostname and /etc/resolv.conf. The union mount then provides the user with a readable and writable directory.
mount
FROM ubuntu:14.04: sets the base image. Here the base image layers of ubuntu:14.04 are used; for simplicity they are shown as a whole.
ADD run.sh /: adds the file run.sh from the directory where the Dockerfile resides to the root directory of the image. The new image layer contains only one item, run.sh in the root directory.
VOLUME /data: sets a volume for the image; its path in the container is /data. Note that no files are added to the new image layer at this point, but the image's JSON file is updated so that this information is available when a container is started from the image.
CMD ["./run.sh"]: sets the default entry point of the image. This command does not add any files to the new image either; it only updates the new image's JSON file, based on the JSON file of the previous image.
The top two layers in the figure are created by Docker for the container and belong to the container rather than the image. They are the container's init layer and read/write layer.
Init layer: this layer initializes environment information related to the container, such as the container host name, the hosts file, and the domain name service file.
Read/write layer: processes in the container have write permission only to this layer; all other layers are read-only to them. In addition, volumes and the container's hosts, hostname, and resolv.conf files are mounted here.