Although a Dockerfile simplifies the image-building process and makes it version-controlled, many people feel the urge to package everything they might possibly need into an image. This improper use of Dockerfiles can cause a number of problems:
- The Docker image is too large. If you use or build a lot of images, you're bound to run into very large ones, some exceeding 2 GB
- The image takes too long to build. Each build can take a long time, which is a big problem anywhere images must be built frequently (such as in unit tests)
- Repeated work. Most of the content is identical across multiple image builds, yet redoing it all from scratch wastes time and resources
This article will address some of these issues.
Author: cizixs
Date: 2017-03-28
Original link: cizixs.com/2017/03/28/…
This article assumes some familiarity with Docker images; reading it requires at least the following prerequisite knowledge:
- Understand the basic concepts of Docker and have run containers
- Familiar with the basic knowledge of Docker images and know the hierarchical structure of images
- Ideally, have built a Docker image yourself (using the `docker build` command to create your own image)
Dockerfile and image build
A Dockerfile is made up of instructions, each corresponding to a layer of the final image. The first word of each line is the instruction, and everything that follows is its arguments. For details about the instructions supported by Dockerfiles and their usage, please refer to the official documentation.
When running the docker build command, the entire build process looks like this:
- Read the Dockerfile and send it to the Docker daemon
- Read all the context files in the current directory and send them to the Docker daemon
- The Dockerfile is parsed and processed into the structure of commands and corresponding parameters
- Loop through all commands in sequence, calling the corresponding handler for each command
- Each command (except FROM) is executed in a container, and a new image is generated as a result
- Label the resulting image
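To make the steps above concrete, here is a minimal sketch of a Dockerfile (the `app.sh` script and image names are illustrative, not from the original article); each instruction below becomes one layer of the resulting image:

```dockerfile
# Base layer: produced by FROM, the only instruction not run in a container
FROM ubuntu:16.04

# Each of the following instructions runs in a temporary container,
# and the result is committed as a new image layer
RUN apt-get update && apt-get install -y --no-install-recommends curl

COPY app.sh /usr/local/bin/app.sh

CMD ["/usr/local/bin/app.sh"]
```

Running `docker build -t myapp:latest .` sends this Dockerfile and the build context (the files in the current directory) to the daemon, which loops through the instructions and finally labels the resulting image `myapp:latest`.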
Some best practices for writing Dockerfiles
1. Use a unified Base image
Some articles advocate using the smallest possible base image, such as BusyBox or Alpine. I prefer to use familiar base images, such as Ubuntu and CentOS: a base image only needs to be downloaded once and is then shared by every image built on it, so it does not waste much storage space. The advantage is that the ecosystems of these images are relatively complete, which makes it easy for us to install software and to debug problems.
2. Separate changing content from stable content
Separate content that changes frequently from content that rarely changes, put the stable content in the lower layers, and build different base images from it for the upper layers to use. For example, you can create base images for a variety of languages: Python 2.7, Python 3.4, Go 1.7, Java 7, and so on. These images contain only the most basic language runtimes and libraries, on which each team can build its own application-level images.
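A sketch of this layering, using two hypothetical Dockerfiles (the `internal/python:2.7` image name is a placeholder for whatever registry path your organization uses):

```dockerfile
# Dockerfile.python-base — shared language base image,
# built once and pushed as e.g. internal/python:2.7
FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y --no-install-recommends python2.7 python-pip \
    && apt-get clean
```

```dockerfile
# Dockerfile — application image; only the frequently
# changing application code sits in the upper layers
FROM internal/python:2.7
COPY . /app/
CMD ["python2.7", "/app/main.py"]
```

Every application image built this way shares the language base layers, so they are downloaded and stored only once per host.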
3. Rule of minimum: Only install what is essential
When building images, many people have the urge to pack in everything they might possibly use. To curb this, an image should contain only what is necessary; anything that merely might be needed should be left out. Images are easy to extend, and containers are easy to modify at runtime. Following this rule keeps the image as small as possible, makes builds as fast as possible, and ensures faster transfers and lower network usage later.
4. Single-purpose rule: each image has only one function
Do not run multiple processes with different functions in one container. Install only one application and its files in each image. Applications that need to interact should communicate through a Pod (a feature provided by Kubernetes) or over the network between containers. This ensures modularity, allows different applications to be maintained and upgraded separately, and reduces the size of each image.
5. Use fewer layers
It may seem that splitting work into as many separate instructions as possible makes a Dockerfile easier to read and understand. However, this results in too many image layers, which are hard to manage and analyze, and the number of layers in an image is limited. Try to put related operations in the same layer, separated by line continuations (`\`); this further reduces the size of the image and makes the image history easier to inspect. For example:
```dockerfile
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        bzr \
        cvs \
        git \
        mercurial \
        subversion \
    && apt-get clean
```
6. Reduce the content of each layer
Even when only the required content is installed, extra or temporary files may be generated in the process, so we want to keep what each layer leaves behind to a minimum:
- Use the `--no-install-recommends` flag to tell `apt-get` not to install recommended packages
- After installing packages, clear the `/var/lib/apt/lists/` cache
- Delete intermediate files, such as downloaded zip packages
- Delete temporary files: if a command produces temporary files, delete them as soon as possible
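These cleanups only shrink the image if they happen in the same layer as the installation; a deletion in a later RUN instruction just adds a new layer on top of the old content. A sketch combining installation and cleanup in one layer (the `git` package is just an example):

```dockerfile
RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    # remove the package lists cached by `apt-get update`
    && rm -rf /var/lib/apt/lists/* \
    # remove any temporary files, in the same layer that created them
    && rm -rf /tmp/*
```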
7. Do not change file permissions in Dockerfile alone
Because Docker images are layered, any changes create a new layer, as do changes to file or directory permissions. If a single command changes the permissions of large files or directories, it will make a copy of those files, which can easily lead to a large image.
The solution is simple: either set the files' permissions and owners before adding them in the Dockerfile, make these changes in the container's startup script (ENTRYPOINT), or copy the files and change their permissions in the same step (so that only one additional layer results).
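A sketch of the ENTRYPOINT approach (the file names and `appuser` account are illustrative): the ownership change happens at container start, so it never creates an extra copy of the files in an image layer:

```dockerfile
COPY app/ /app/
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/bin/sh", "/entrypoint.sh"]
```

```shell
#!/bin/sh
# entrypoint.sh — runs when the container starts, not at build time,
# so the chown adds no image layer
chown -R appuser:appuser /app
exec "$@"
```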
8. Use cache to speed up builds
If Docker finds that a layer already exists, it simply uses the existing layer without re-running it. If you run the Docker build multiple times in a row, you’ll find that the second run ends quickly.
However, starting with version 1.10, the introduction of Content Addressable Storage made this caching less effective, so the `--cache-from` parameter was later introduced to manually specify an image to use as the cache source.
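To benefit from the cache, order instructions from least to most frequently changing, so that a code-only change does not invalidate the layers above it. A sketch for a Python application (file names are illustrative):

```dockerfile
FROM python:2.7
# Dependencies change rarely: copy only the manifest first,
# so this expensive layer stays cached across code-only changes
COPY requirements.txt /app/
RUN pip install -r /app/requirements.txt
# Application code changes often: copy it last
COPY . /app/
```

On a fresh build machine with an empty cache, you can seed it from a previously pushed image, e.g. `docker pull myapp:latest && docker build --cache-from myapp:latest .` (the image name here is a placeholder).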
9. Version control and automated builds
It is better to put the Dockerfile in version control with the corresponding application code, and then automatically build the image. The advantage of this is that you can trace the contents of each version of the image, so that you can easily understand the differences between different images, which is good for debugging and rollback.
In addition, if the image requires many parameters or environment variables to run, there should be a corresponding document explaining them, updated whenever the Dockerfile changes, so that anyone can use the image easily by referring to the document rather than downloading an image without knowing how to run it.
References
- Best Practices for writing Dockerfiles
- Refactoring a Dockerfile for Image Size
- How to Not Be the Engineer Running 3.5GB Docker Images