This article covers Docker images, including the following topics:

  • Using the docker image commands
  • What the pull command does when Docker and the Registry interact
  • Docker storage drivers
  • The format and on-disk layout of AUFS
  • The relationship between Dockerfile instructions and Docker images

Author: cizixs Time: 2016-04-06 Link: cizixs.com/2016/04/06/…

Introduction

  • A Docker image represents the contents of a container's file system and is the basis of the container. Images are generally built from a Dockerfile
  • Docker images are layered: every image (except base images) is created by adding its own layer of content on top of an earlier image
  • The metadata for each layer is stored in a JSON file, so in addition to the static file system content an image also carries dynamic (runtime) data

Using images: the docker image commands

The Docker client provides a variety of commands that talk to the daemon to carry out tasks. The image-related ones include:

  • docker images: lists the images available on the Docker host; the -f option filters the list
  • docker build: builds an image from a Dockerfile
  • docker history: shows the history of an image
  • docker import: creates a new file system image from a tarball
  • docker pull: pulls an image from a Docker Registry
  • docker push: pushes a local image to a Registry
  • docker rmi: removes an image
  • docker save: saves an image as a tar file
  • docker search: searches for images on Docker Hub
  • docker tag: tags an image

The sheer number of commands shows how important images are to the whole system.

Downloading images: what exactly do pull and push do?

If you know Docker's architecture, you know it follows a typical client/server (C/S) model. Commands the client uses often, such as docker pull and docker run, are sent to the server for processing (the Docker daemon starts the Docker server when it starts up). Downloading an image also involves the Registry. Let's look at what Docker actually does when you run docker pull.

The Docker client assembles the configuration and parameters and sends the pull request to the Docker server. The server passes the request to the corresponding handler, which creates and runs a new CmdPull job. This job was registered when the Docker daemon started, so control now passes to the Docker daemon. How does the daemon find the image to download from the registry address, repo name, image name, and tag it was given? The process is as follows:

  1. Get all image IDs under the repo: GET /repositories/{repo}/images
  2. Get all tags under the repo: GET /repositories/{repo}/tags
  3. Find the image UUID from the tag and download the image
    • Get the image's history and download each image layer in turn: GET /images/{image_id}/ancestry
    • If a layer already exists locally, skip it; otherwise continue
    • Get the layer's JSON metadata: GET /images/{image_id}/json
    • Download the layer content: GET /images/{image_id}/layer
    • After downloading, save the content to the local UnionFS storage
    • Add the information of the image just downloaded to the TagStore
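
The steps above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Docker's code: the registry is replaced by a made-up in-memory dictionary (FAKE_REGISTRY), the get helper stands in for an HTTP GET, all layer IDs are invented, and the order of the ancestry list is an assumption.

```python
# Hypothetical in-memory stand-in for a registry, keyed by the endpoint
# paths from the list above. All IDs and contents are made up.
FAKE_REGISTRY = {
    "/repositories/ubuntu/images": ["aaa", "bbb", "ccc"],
    "/repositories/ubuntu/tags": {"14.04": "ccc"},
    "/images/ccc/ancestry": ["ccc", "bbb", "aaa"],
    "/images/aaa/json": {"id": "aaa"},
    "/images/bbb/json": {"id": "bbb", "parent": "aaa"},
    "/images/ccc/json": {"id": "ccc", "parent": "bbb"},
    "/images/aaa/layer": b"base files",
    "/images/bbb/layer": b"more files",
    "/images/ccc/layer": b"top files",
}


def get(path):
    """Stand-in for an HTTP GET against the registry."""
    return FAKE_REGISTRY[path]


def pull(repo, tag, local_layers, tag_store):
    """Return the IDs of the layers that had to be downloaded."""
    get(f"/repositories/{repo}/images")        # 1. all image IDs in the repo
    tags = get(f"/repositories/{repo}/tags")   # 2. tag -> image ID
    image_id = tags[tag]                       # 3. resolve the tag
    downloaded = []
    # Assumption: ancestry lists the image first and the base layer last,
    # and layers are fetched base-first.
    for layer_id in reversed(get(f"/images/{image_id}/ancestry")):
        if layer_id in local_layers:           # already present: skip
            continue
        meta = get(f"/images/{layer_id}/json")
        data = get(f"/images/{layer_id}/layer")
        local_layers[layer_id] = (meta, data)  # "save to local storage"
        downloaded.append(layer_id)
    tag_store[f"{repo}:{tag}"] = image_id      # record in the tag store
    return downloaded


local_layers = {"aaa": ({"id": "aaa"}, b"base files")}
tag_store = {}
print(pull("ubuntu", "14.04", local_layers, tag_store))  # ['bbb', 'ccc']
```

Because the base layer "aaa" is already present locally, only the two missing layers are fetched, which is the skip behaviour described in step 3.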

Storing images: an introduction to Docker storage

As mentioned in the previous section, downloaded images are saved. This section explains how.

UnionFS and AUFS

If you know anything about Docker, you have heard of UnionFS, the concept that makes Docker's layered images possible. Wikipedia explains it this way:

Unionfs is a filesystem service for Linux, FreeBSD and NetBSD which implements a union mount for other file systems. It allows files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system. Contents of directories which have the same path within the merged branches will be seen together in a single merged directory, within the new, virtual filesystem.

In simple terms, UnionFS stores content in multiple directories and files on the underlying file system and presents a single virtual file system to the upper (application) layer. Docker images are an example: the application layer simply sees files that it can read and delete, while the content of each image layer and the relationships between layers are managed by UnionFS underneath.
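
To make the idea concrete, here is a toy model of a union view in Python. It is a sketch under simplifying assumptions, not how AUFS works internally: each layer is a plain dictionary mapping a path to its content, and the UnionView class and its WHITEOUT marker are names invented for this example.

```python
class UnionView:
    """Toy read-only union of layers; upper layers shadow lower ones."""

    # Illustrative marker meaning "this path was deleted in this layer".
    WHITEOUT = object()

    def __init__(self, layers):
        self.layers = layers  # list of dicts, base layer first

    def read(self, path):
        # Search from the top layer down; the first hit wins.
        for layer in reversed(self.layers):
            if path in layer:
                content = layer[path]
                if content is self.WHITEOUT:
                    raise FileNotFoundError(path)
                return content
        raise FileNotFoundError(path)

    def listdir(self):
        # Replay the layers bottom-up to compute the merged view.
        visible = {}
        for layer in self.layers:
            for path, content in layer.items():
                if content is self.WHITEOUT:
                    visible.pop(path, None)
                else:
                    visible[path] = content
        return sorted(visible)


base = {"/etc/hosts": "127.0.0.1 localhost", "/bin/sh": "<binary>"}
upper = {
    "/etc/hosts": "10.0.0.1 web",     # overrides the base copy
    "/bin/sh": UnionView.WHITEOUT,    # hides the base copy
    "/run.sh": "#!/bin/sh",           # only exists in the upper layer
}
view = UnionView([base, upper])
print(view.read("/etc/hosts"))  # 10.0.0.1 web
print(view.listdir())           # ['/etc/hosts', '/run.sh']
```

The base layer is never modified; the application only ever sees the merged result.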

The module responsible for managing Docker's images is Graph. It exposes a consistent, convenient interface and is implemented underneath by calling different drivers. Common drivers include aufs and devicemapper. Users can choose a driver or even implement their own.

How AUFS stores images on a machine

NOTE:

  • Only the ubuntu:14.04 image was downloaded
  • Docker version 1.6.3
  • Image driver: aufs

Use docker history to view the image history:

root@cizixs-ThinkPad-T450:~# docker images
REPOSITORY                 TAG      IMAGE ID        CREATED          VIRTUAL SIZE
172.16.1.41:5000/ubuntu    14.04    2d24f826cb16    13 months ago    188.3 MB
root@cizixs-ThinkPad-T450:~# docker history 2d24
IMAGE               CREATED              CREATED BY                                      SIZE
2d24f826cb16        13 months ago        /bin/sh -c #(nop) CMD [/bin/bash]               0 B
117ee323aaa9        13 months ago        /bin/sh -c sed -i …
1c8294cc5160        13 months ago        /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB
fa4fd76b09ce        13 months ago        /bin/sh -c #(nop) ADD file:0018ff77d038472f52   188.1 MB
511136ea3c5a        2.811686 years ago                                                   0 B

As you can see, ubuntu:14.04 consists of five layers. Now look at the contents of /var/lib/docker/aufs:

root@cizixs-ThinkPad-T450:/var/lib/docker/aufs# tree -L 1
.
├── diff
├── layers
└── mnt

There are three directories in total. Each contains entries named after image IDs, which hold the information for each image layer. Here is what the three directories are for:

  • layers: records which layers each image is made up of
  • diff: holds the content of each layer, that is, the difference between an image and the one below it
  • mnt: the mount points that UnionFS exposes. UnionFS combines multiple directories and files underneath and presents them to the upper layer as one file system, and it does this through a mount. Each running container has a directory here

For example, the diff folder looks like this:

root@cizixs-ThinkPad-T450:/var/lib/docker/aufs# ls diff/2d24f826cb16146e2016ff349a8a33ed5830f3b938d45c0f82943f4ab8c097e7/
root@cizixs-ThinkPad-T450:/var/lib/docker/aufs# ls diff/117ee323aaa9d1b136ea55e4421f4ce413dfc6c0cc6b2186dea6c88d93e1ad7c/
etc
root@cizixs-ThinkPad-T450:/var/lib/docker/aufs# ls diff/1c8294cc516082dfbb731f062806b76b82679ce38864dd87635f08869c993e45/
etc  sbin  usr  var
root@cizixs-ThinkPad-T450:/var/lib/docker/aufs# ls diff/fa4fd76b09ce9b87bfdc96515f9a5dd5121c01cc996cf5379050d8e13d4a864b/
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
root@cizixs-ThinkPad-T450:/var/lib/docker/aufs# ls diff/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158/

In addition to the actual data, Docker also keeps JSON-format metadata for each image layer, stored in /var/lib/docker/graph/<image_id>/json, for example:

root@cizixs-ThinkPad-T450:/var/lib/docker# cat graph/2d24f826cb16146e2016ff349a8a33ed5830f3b938d45c0f82943f4ab8c097e7/json | jq '.'
{
  "id": "2d24f826cb16146e2016ff349a8a33ed5830f3b938d45c0f82943f4ab8c097e7",
  "parent": "117ee323aaa9d1b136ea55e4421f4ce413dfc6c0cc6b2186dea6c88d93e1ad7c",
  "created": "2015-02-21T02:11:06.735146646Z",
  "container": "c9a3eda5951d28aa8dbe5933be94c523790721e4f80886d0a8e7a710132a38ec",
  "container_config": {
    "Hostname": "43bd710ec89a", "Domainname": "", "User": "",
    "Memory": 0, "MemorySwap": 0, "CpuShares": 0, "Cpuset": "",
    "AttachStdin": false, "AttachStdout": false, "AttachStderr": false,
    "PortSpecs": null, "ExposedPorts": null,
    "Tty": false, "OpenStdin": false, "StdinOnce": false,
    "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
    "Cmd": ["/bin/sh", "-c", "#(nop) CMD [/bin/bash]"],
    "Image": "117ee323aaa9d1b136ea55e4421f4ce413dfc6c0cc6b2186dea6c88d93e1ad7c",
    "Volumes": null, "WorkingDir": "", "Entrypoint": null,
    "NetworkDisabled": false, "MacAddress": "", "OnBuild": [], "Labels": null
  },
  "docker_version": "Garbage",
  "config": {
    "Hostname": "43bd710ec89a", "Domainname": "", "User": "",
    "Memory": 0, "MemorySwap": 0, "CpuShares": 0, "Cpuset": "",
    "AttachStdin": false, "AttachStdout": false, "AttachStderr": false,
    "PortSpecs": null, "ExposedPorts": null,
    "Tty": false, "OpenStdin": false, "StdinOnce": false,
    "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
    "Cmd": ["/bin/bash"],
    "Image": "117ee323aaa9d1b136ea55e4421f4ce413dfc6c0cc6b2186dea6c88d93e1ad7c",
    "Volumes": null, "WorkingDir": "", "Entrypoint": null,
    "NetworkDisabled": false, "MacAddress": "", "OnBuild": [], "Labels": null
  },
  "architecture": "amd64",
  "os": "linux",
  "Size": 0
}

Besides json, there is also a file /var/lib/docker/graph/<image_id>/layersize that records the size of the image layer.
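
Since every layer's JSON records its parent, the full layer stack of an image can be recovered by following the parent links. The Python sketch below shows the idea on an in-memory dictionary instead of the real files. The layer_chain function is a name made up for this example, the IDs are shortened to 12 characters, and only the first parent link (2d24f826cb16 to 117ee323aaa9) comes from the JSON shown above; the remaining links are assumed from the order of the docker history listing.

```python
def layer_chain(graph, image_id):
    """Follow "parent" links from image_id down to the base layer."""
    chain = []
    current = image_id
    while current:
        chain.append(current)
        current = graph[current].get("parent")
    return chain


# Abbreviated metadata: one entry per layer, keyed by shortened ID.
# Links below 117ee323aaa9 are assumed from the history listing order.
graph = {
    "2d24f826cb16": {"parent": "117ee323aaa9"},
    "117ee323aaa9": {"parent": "1c8294cc5160"},
    "1c8294cc5160": {"parent": "fa4fd76b09ce"},
    "fa4fd76b09ce": {"parent": "511136ea3c5a"},
    "511136ea3c5a": {},  # base layer: no parent
}

for layer_id in layer_chain(graph, "2d24f826cb16"):
    print(layer_id)
```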

Building images: the image cache mechanism

When building new images, Docker uses a cache mechanism to speed up the build. To understand it, let's first look at what the build command does.

Here's a simple Dockerfile:

FROM ubuntu:14.04
RUN apt-get update
ADD run.sh /
VOLUME /data
CMD ["./run.sh"]

The file is simple, but it contains several instructions (RUN, ADD, VOLUME, CMD) that cover a lot of concepts.

Typically, Docker generates one image layer for each instruction. It is easy to guess what the cache does: if a layer that is about to be built already exists, Docker reuses it instead of rebuilding it. The key question is how Docker knows that the layer it is about to build already exists. The rest of this section focuses on that question.

When the Docker daemon reads the FROM instruction, it looks for the image locally. If the image is not found, it retrieves it from the Registry, along with the JSON file containing its metadata. Next comes the RUN instruction. What does it do if there is no cache?

We already know that each image layer is made up of file system content and metadata.

The file system content, that is, the file changes caused by running apt-get update, is saved under /var/lib/docker/aufs/diff/<image_id>/. Here the command mainly modifies apt-related content under /var/lib and /var/cache:

root@cizixs-ThinkPad-T450:/var/lib/docker# tree -L 2 aufs/diff/e7ae26691ff649c55296adf7c0e51b746e22abefa6b30310b94bbb9cfa6fce63/
aufs/diff/e7ae26691ff649c55296adf7c0e51b746e22abefa6b30310b94bbb9cfa6fce63/
├── tmp
└── var
    ├── cache
    └── lib

If we look at the contents of the JSON file, the most important change is that container_config.Cmd has become:

"Cmd": [
  "/bin/sh", "-c", "apt-get update"
],

In other words, the next time we build the image, if Docker finds an existing layer whose parent is also ubuntu:14.04 and whose Cmd in the JSON file is the same, the two layers are considered identical and nothing needs to be rebuilt. This means that during a build the daemon has to go through all local images and reuse an already-built layer whenever one matches.
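
The lookup described above can be sketched as follows. This is a simplified illustration rather than Docker's implementation: find_cached_layer is a hypothetical helper, the metadata is held in a dictionary instead of being read from the graph directory, and the IDs are shortened to 12 characters.

```python
def find_cached_layer(local_layers, parent_id, cmd):
    """Return the ID of a local layer with the same parent and Cmd, or None."""
    for layer_id, meta in local_layers.items():
        same_parent = meta.get("parent") == parent_id
        same_cmd = meta.get("container_config", {}).get("Cmd") == cmd
        if same_parent and same_cmd:
            return layer_id
    return None


# Metadata of layers already on the host (abbreviated to the two
# fields the lookup needs).
local_layers = {
    "e7ae26691ff6": {
        "parent": "2d24f826cb16",  # ubuntu:14.04
        "container_config": {"Cmd": ["/bin/sh", "-c", "apt-get update"]},
    },
}

hit = find_cached_layer(
    local_layers, "2d24f826cb16", ["/bin/sh", "-c", "apt-get update"]
)
miss = find_cached_layer(
    local_layers, "2d24f826cb16", ["/bin/sh", "-c", "apt-get upgrade"]
)
print(hit)   # e7ae26691ff6
print(miss)  # None
```

A hit means the build can reuse the existing layer and continue with it as the parent of the next instruction; a miss means the instruction has to be executed.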

ADD and COPY files

If the Dockerfile contains ADD or COPY, how does Docker decide whether a layer is the same? The first thing that comes to mind is the file name, but a file can change while its name stays the same. Adding the file size helps, but two files with the same name and size do not necessarily have the same content. The safest approach is to hash the content, and that is what Docker does. Look at how the JSON file changes for a layer created by ADD:

"Cmd": [
  "/bin/sh", "-c", "#(nop) ADD file:9fb96e5dd9ce3e03665523c164bbe775d64cc5d8cc8623fbcf5a01a63e9223ab in /"
],

As you can see, the file after ADD is identified by a hash string. If you are interested in how the hash is computed, you can look into the implementation yourself.
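
A minimal sketch of content-based identification, assuming a plain SHA-256 over the file bytes. Docker's own hash covers more than the raw content and is not reproduced here; add_cmd is a hypothetical helper that only mimics the shape of the Cmd entry shown above.

```python
import hashlib


def add_cmd(data, dest):
    """Build a Cmd entry that identifies the added file by its content.

    Assumption: SHA-256 of the raw bytes stands in for Docker's real hash.
    """
    digest = hashlib.sha256(data).hexdigest()
    return ["/bin/sh", "-c", f"#(nop) ADD file:{digest} in {dest}"]


original = add_cmd(b"#!/bin/sh\necho hello\n", "/")
unchanged = add_cmd(b"#!/bin/sh\necho hello\n", "/")
# Same length as the original content, but one character differs.
edited = add_cmd(b"#!/bin/sh\necho hallo\n", "/")

print(original == unchanged)  # True: same content, cache can be reused
print(original == edited)     # False: content changed, layer is rebuilt
```

Because the hash ends up inside Cmd, the same parent-plus-Cmd comparison used for RUN also works for ADD and COPY.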

Wait, is that really safe?

After reading the above, most readers will think the cache mechanism is a great way to save time and space. There is a catch, though. Some commands depend on external content, such as apt-get update or curl http://some.url.com/. If that external content changes, Docker cannot detect it and react accordingly. Docker therefore provides the --no-cache option to force a build without the cache, which leaves this part for the user to manage.
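
The problem is easy to reproduce in a toy model. In the sketch below the cache key is just the parent and the command string, so a change in the outside world goes unnoticed until the cache is bypassed. Everything here (build_step, remote_index, apt_get_update) is invented for illustration and only mimics the effect of --no-cache.

```python
def build_step(cache, parent_id, cmd, run, no_cache=False):
    """Run one build step, reusing a cached result unless no_cache is set."""
    key = (parent_id, cmd)
    if not no_cache and key in cache:
        return cache[key]
    result = run()
    cache[key] = result
    return result


# Hypothetical external state that the command depends on.
remote_index = {"version": 1}


def apt_get_update():
    return f"package index v{remote_index['version']}"


cache = {}
first = build_step(cache, "ubuntu:14.04", "apt-get update", apt_get_update)

remote_index["version"] = 2  # the outside world changes
stale = build_step(cache, "ubuntu:14.04", "apt-get update", apt_get_update)
fresh = build_step(
    cache, "ubuntu:14.04", "apt-get update", apt_get_update, no_cache=True
)

print(first)  # package index v1
print(stale)  # package index v1 (cache hit, change not noticed)
print(fresh)  # package index v2
```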

You also need to keep the cache in mind when writing Dockerfiles. The official Dockerfile best practices mention this as well.

Running images: Docker images and Docker containers

We all know that a Docker container is a running Docker image. But this raises a question: an image stores static content while a container is dynamic, so how is the dynamic part managed? For example:

  • Which processes should run in the Docker container?
  • How is a Docker image turned into a Docker container?
  • How are the IP address and hostname generated dynamically in the container?

This is what the JSON files are for. What do they store? Everything except the file system content, for example:

  • ENV FOO=BAR: an environment variable
  • VOLUME /some/path: a volume used by the container. At first glance it looks like part of the file system, but its content is not known until run time, so it cannot be stored in the image layer's files
  • EXPOSE 80: records the ports the container exposes when it runs. This is also runtime state and not part of the file system
  • CMD ["./myscript.sh"]: records the container's entry command, which is not part of the file system either

Now that we know how this information is stored, how is it actually loaded when we run a container? The answer is the Docker daemon, which is what actually manages containers.

We know that each running container is a child process of the Docker daemon:

root      3249  0.1  6.6 985212 33288 ?  Ssl  04:53  0:19 /usr/bin/docker daemon --insecure-registry 172.16.1.41:5000 --exec-opt native.cgroupdriver=cgroupfs --bip=10.12.240.1/20 --mtu=1500 --ip-masq=false
root      3597  0.0  0.1   3816   632 ?  Ssl  04:55  0:00  \_ /pause
root      3633  0.0  0.1   3816   504 ?  Ssl  04:55  0:00  \_ /pause
root      3695  0.0  0.1   3816   516 ?  Ssl  04:55  0:00  \_ /pause
root      3710  0.0  0.1   3816   528 ?  Ssl  04:55  0:00  \_ /pause
root      3745  0.0  0.1   3816   504 ?  Ssl  04:55  0:00  \_ /pause
polkitd   3793  0.0  0.2  36524  1280 ?  Ssl  04:55  0:07  \_ redis-server *:6379
root      3847  0.0  0.0   4184   184 ?  Ss   04:55  0:00  \_ /bin/sh -c /run.sh
root      3872  0.0  0.0  17668   360 ?  S    04:55  0:00  |   \_ /bin/bash /run.sh
root      3873  0.0  0.3  42824  1752 ?  Sl   04:55  0:01  |       \_ redis-server *:6379
root      3865  0.0  1.5 166256  8024 ?  Ss   04:55  0:00  \_ apache2 -DFOREGROUND
33        3881  0.0  1.0 16628452     ?  S    04:55  0:00  |   \_ apache2 -DFOREGROUND
33        3882  0.0  1.0 166280  5140 ?  S    04:55  0:00  |   \_ apache2 -DFOREGROUND
33        3883  0.0  1.0 166280  5140 ?  S    04:55  0:00  |   \_ apache2 -DFOREGROUND
33        3884  0.0  1.0 166280  5140 ?  S    04:55  0:00  |   \_ apache2 -DFOREGROUND
33        3885  0.0  1.0 166280  5140 ?  S    04:55  0:00  |   \_ apache2 -DFOREGROUND
root      3939  0.0  0.7  90264  4016 ?  Ss   04:55  0:00  \_ nginx: master process nginx
33        3947  0.0  0.3  90632  1660 ?  S    04:55  0:00      \_ nginx: worker process
33        3948  0.0  0.3  90632  1660 ?  S    04:55  0:00      \_ nginx: worker process
33        3949  0.0  0.3  90632  1660 ?  S    04:55  0:00      \_ nginx: worker process
33        3950  0.0  0.3  90632  1660 ?  S    04:55  0:00      \_ nginx: worker process

That is, the Docker daemon uses the image's file system content as the container's rootfs and then reads the dynamic information in the JSON file to set up the runtime state.
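
As a rough illustration of that last step, the sketch below turns the config section of an image's JSON into the parameters needed to start a process. It is a simplified model, not the daemon's code: container_start_spec and its return structure are invented here, and the rootfs path is a placeholder. The field names (Env, Cmd, Entrypoint, WorkingDir) are the ones that appear in the JSON shown earlier.

```python
def container_start_spec(image_json, rootfs, overrides=None):
    """Combine static rootfs and JSON config into start parameters."""
    config = dict(image_json["config"])
    config.update(overrides or {})  # e.g. a command given on the CLI
    # The process to run is Entrypoint followed by Cmd.
    argv = list(config.get("Entrypoint") or []) + list(config.get("Cmd") or [])
    # Env is stored as a list of "KEY=value" strings.
    env = dict(item.split("=", 1) for item in (config.get("Env") or []))
    return {
        "rootfs": rootfs,
        "argv": argv,
        "env": env,
        "cwd": config.get("WorkingDir") or "/",
    }


image_json = {
    "config": {
        "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
        "Cmd": ["/bin/bash"],
        "Entrypoint": None,
        "WorkingDir": "",
    }
}

# Placeholder path; the real mount point is named after the container ID.
spec = container_start_spec(image_json, "/var/lib/docker/aufs/mnt/<container_id>")
print(spec["argv"])  # ['/bin/bash']
print(spec["cwd"])   # /
```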

Deleting images: cleaning up images

Images are stored locally in UnionFS format, so deletion is easy to understand: the local files (directories) of the corresponding image layers are removed. Docker provides the docker rmi command for this.

Note, however, that images have "references", and an image layer can only be deleted when nothing references it. A reference is created by tagging: the same image UUID can carry several different tags. Here is an example from the official documentation:

$ docker images
REPOSITORY                TAG                 IMAGE ID            CREATED             SIZE
test1                     latest              fd484f19954f        23 seconds ago      7 B (virtual 4.964 MB)
test                      latest              fd484f19954f        23 seconds ago      7 B (virtual 4.964 MB)
test2                     latest              fd484f19954f        23 seconds ago      7 B (virtual 4.964 MB)

$ docker rmi fd484f19954f
Error: Conflict, cannot delete image fd484f19954f because it is tagged in multiple repositories, use -f to force
2013/12/11 05:47:16 Error: failed to remove one or more images

$ docker rmi test1
Untagged: test1:latest
$ docker rmi test2
Untagged: test2:latest

$ docker images
REPOSITORY                TAG                 IMAGE ID            CREATED             SIZE
test                      latest              fd484f19954f        23 seconds ago      7 B (virtual 4.964 MB)
$ docker rmi test
Untagged: test:latest
Deleted: fd484f19954f4920da7ff372b5067f5b7ddb2fd3830cecd17b96ea9e286ba5b8

When you delete an image by tag, Docker first performs an untag operation. If the image still has other tags, all of them must be removed before the image itself can be deleted. You can also use the -f option to force the deletion.

Also note that if an image has many layers and its intermediate layers are not referenced elsewhere, all of those unreferenced layers are deleted along with the image.
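
The untag-then-delete behaviour can be modelled with two dictionaries. The sketch below is a simplified illustration under stated assumptions, not Docker's logic: the rmi function, the layer names, and the data layout are made up, a layer counts as referenced if a tag points at it or another layer uses it as its parent, and references from containers are ignored.

```python
def rmi(tags, parents, name):
    """Untag name, then delete every layer that is no longer referenced.

    tags:    tag name -> image ID
    parents: layer ID -> parent layer ID (None for a base layer)
    """
    image_id = tags.pop(name)
    events = [f"Untagged: {name}"]
    current = image_id
    while current is not None:
        tagged = current in tags.values()
        has_child = current in parents.values()
        if tagged or has_child:
            break  # still referenced: stop here
        events.append(f"Deleted: {current}")
        current = parents.pop(current)  # move on to the parent layer
    return events


# Hypothetical layer graph: "top" -> "mid" -> "base" <- "other"
parents = {"base": None, "mid": "base", "top": "mid", "other": "base"}
tags = {"test:latest": "top", "test1:latest": "top", "other:latest": "other"}

print(rmi(tags, parents, "test1:latest"))
# ['Untagged: test1:latest']
print(rmi(tags, parents, "test:latest"))
# ['Untagged: test:latest', 'Deleted: top', 'Deleted: mid']
```

The first call only removes a tag because test:latest still points at the same image. The second call removes the last tag, so the image and its unreferenced intermediate layer go away, while the base layer survives because another image still builds on it.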