Docker image slimming

Docker is a platform for developing, shipping and running applications. It separates applications from the infrastructure so that the development, testing and deployment environments are identical, which enables rapid delivery. In real projects, however, modules and services tend to be split up further, which leads to too many images (50+) that are too large (50 GB+ after packaging and compression). This creates serious risks for deployment, especially private deployment, where images are copied over on removable media. Starting from several articles on image slimming, this article verifies their advice in practice and, combined with the official Dockerfile best practices, summarizes four image slimming methods along with a number of tips from daily practice.

Image building

Build methods

There are two ways to build an image. One is docker build, which executes the instructions in a Dockerfile; the other is docker commit, which packages an existing container into an image. The first method is the usual way to build images, and the difference between the two is like batch processing versus single-stepping.

Size analysis

A Docker image is made up of multiple layers (at most 127). Each instruction in a Dockerfile creates an image layer, but only RUN, COPY and ADD increase the image size. You can use the docker history image_id command to see the size of each layer. Take the official alpine:3.12 image as an example; its Dockerfile is:

FROM scratch
ADD alpine-minirootfs-3.12.0-x86_64.tar.gz /
CMD ["/bin/sh"]

Comparing the Dockerfile with the image history shows that the ADD layer occupies 5.57 MB, whereas the CMD layer takes up no space.
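As a rough check on your own Dockerfiles, the sketch below counts the instructions that add filesystem layers (RUN, COPY and ADD). It is a simplified, line-based parser written for illustration: the helper name count_fs_layers is hypothetical, and it assumes each instruction starts at the beginning of a line, ignoring heredocs, ONBUILD and parser directives.

```shell
# count_fs_layers FILE (hypothetical helper)
# Counts Dockerfile instructions that add a filesystem layer: RUN, COPY, ADD.
# Only lines whose first word is the instruction are counted, so comment
# lines and indented continuation lines (starting with "&&", "apt", ...) are skipped.
count_fs_layers() {
    awk '
        { word = toupper($1) }
        word == "RUN" || word == "COPY" || word == "ADD" { n++ }
        END { print n + 0 }
    ' "$1"
}
```

For the alpine:3.12 Dockerfile above it reports 1 (the ADD line); FROM and CMD are not counted.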

An image layer is like a Git commit: it saves the differences between the previous version of the image and the current one. So when we pull an image from a public or private registry with docker pull, only the layers we don't already have are downloaded. This is a very efficient way to share images, but it is sometimes misused, for example through repeated commits. In the figure above, the base image alpine:3.12 takes up 5.57 MB and the idps_sm.tar.gz file takes up 4.52 MB. However, the instruction RUN rm -f ./idps_sm.tar.gz does not reduce the image size: the image still consists of the base image and the two ADD layers.
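To see why the RUN rm layer does not shrink the image, the sketch below models layers roughly the way the OCI image format stores them: each layer is a tar archive, and a deletion is recorded in a later layer as an empty "whiteout" file named .wh.<filename>, while the earlier archive is left untouched. This is a toy model built with tar and dd, not Docker itself; the function name simulate_rm_layer and the 4 MiB file size are illustrative assumptions.

```shell
# simulate_rm_layer (hypothetical toy model, not Docker itself)
# Layer 1 adds a file; layer 2 "deletes" it by adding only a whiteout marker.
# Prints: <layer1 bytes> <layer2 bytes> <total bytes>
simulate_rm_layer() {
    work=$(mktemp -d)
    mkdir -p "$work/layer1" "$work/layer2"

    # Layer 1: like "ADD idps_sm.tar.gz", a 4 MiB file
    dd if=/dev/zero of="$work/layer1/idps_sm.tar.gz" bs=1024 count=4096 2>/dev/null
    tar -cf "$work/layer1.tar" -C "$work/layer1" .

    # Layer 2: like "RUN rm -f idps_sm.tar.gz", only an empty whiteout entry
    : > "$work/layer2/.wh.idps_sm.tar.gz"
    tar -cf "$work/layer2.tar" -C "$work/layer2" .

    l1=$(wc -c < "$work/layer1.tar" | tr -d ' ')
    l2=$(wc -c < "$work/layer2.tar" | tr -d ' ')
    echo "$l1 $l2 $((l1 + l2))"
    rm -rf "$work"
}
```

The first number stays above 4 MiB even though the file is gone from the final filesystem view, and the total only grows; this mirrors the docker history output described above.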

Slimming methods

Now that you know why an image grows during a build, the remedies follow: reduce the number of layers, or reduce the size of each layer.

There are two ways to reduce the number of layers:
1. Merge RUN instructions
2. Use a multi-stage build
There are two ways to shrink each layer:
1. Use an appropriate base image (preferably Alpine)
2. Delete the cache files created by RUN

Slimming an image in practice

For the hands-on part, we take packaging a Redis image as an example. Before packaging, we pulled the official Redis images: the image tagged 6 is 104 MB, and the image tagged 6-alpine is 31.5 MB. The packaging process is as follows:

Select the base image, update the software source, and install the packaging tool
Download the source code and package it for installation
Clean up unnecessary installation files

Following this process, we wrote the Dockerfile below and built it with the following command (the final . is the build context):

docker build --no-cache -t optimize/redis:multiline -f redis_multiline .

The resulting image is 441 MB.

FROM ubuntu:focal

ENV REDIS_VERSION=6.0.5
ENV REDIS_URL=http://download.redis.io/releases/redis-$REDIS_VERSION.tar.gz

# update source and install tools
RUN sed -i "s/archive.ubuntu.com/mirrors.aliyun.com/g; s/security.ubuntu.com/mirrors.aliyun.com/g" /etc/apt/sources.list 
RUN apt update 
RUN apt install -y curl make gcc

# download source code and install redis
RUN curl -L $REDIS_URL | tar xzv
WORKDIR redis-$REDIS_VERSION
RUN make
RUN make install
 
# clean up
RUN rm -rf /var/lib/apt/lists/*

CMD ["redis-server"]

RUN instruction merge

Merging instructions is the simplest and most convenient way to reduce the number of layers. It saves space because the cache and the build tools are removed in the same layer that created them. The merged Dockerfile is shown below; the resulting image (optimize/redis:singleline) is 292 MB.

FROM ubuntu:focal

ENV REDIS_VERSION=6.0.5
ENV REDIS_URL=http://download.redis.io/releases/redis-$REDIS_VERSION.tar.gz

# update source and install tools
RUN sed -i "s/archive.ubuntu.com/mirrors.aliyun.com/g; s/security.ubuntu.com/mirrors.aliyun.com/g" /etc/apt/sources.list &&\
    apt update &&\
    apt install -y curl make gcc &&\
# download source code and install redis
    curl -L $REDIS_URL | tar xzv &&\
    cd redis-$REDIS_VERSION &&\
    make &&\
    make install &&\
# clean up
    apt remove -y --auto-remove curl make gcc &&\
    apt clean &&\
    rm -rf /var/lib/apt/lists/*

CMD ["redis-server"]

Analyzing the optimize/redis:multiline and optimize/redis:singleline images with docker history shows that, in optimize/redis:multiline, the layer that cleans up data does not reduce the image size. This is the shared-layer problem described above: files deleted in a later layer still exist in the earlier layers. Instruction merging therefore shrinks the image by removing the cache and the unneeded tools in the same layer that created them.

Multistage build

The multi-stage build is the officially recommended practice for packaging images, and it reduces the number of layers as much as possible. Broadly speaking, it splits image packaging into two stages. One is for building: it contains everything needed to build the application. The other is for running in production: it contains only the application and what it needs to run. This is called the builder pattern. The relationship between the two stages is a bit like that between the JDK and the JRE. A multi-stage build will certainly reduce the image size, but how much it saves depends on the language. It works best for compiled languages, because the build dependencies are dropped and only the compiled binaries or JARs are copied over; the effect is less pronounced for interpreted languages.

Applying a multi-stage build to the Redis image above gives the Dockerfile below; the resulting image is 135 MB.

FROM ubuntu:focal AS build

ENV REDIS_VERSION=6.0.5
ENV REDIS_URL=http://download.redis.io/releases/redis-$REDIS_VERSION.tar.gz

# update source and install tools
RUN sed -i "s/archive.ubuntu.com/mirrors.aliyun.com/g; s/security.ubuntu.com/mirrors.aliyun.com/g" /etc/apt/sources.list &&\
    apt update &&\
    apt install -y curl make gcc &&\
# download source code and install redis
    curl -L $REDIS_URL | tar xzv &&\
    cd redis-$REDIS_VERSION &&\
    make &&\
    make install

FROM ubuntu:focal
# copy
ENV REDIS_VERSION=6.0.5
COPY --from=build /usr/local/bin/redis* /usr/local/bin/

CMD ["redis-server"]

Compared with optimize/redis:singleline, there are three changes:

AS build is added to the first line so that the later COPY --from=build can refer to this stage
Stage 1 does no cleanup, because only its compiled output (binaries or JARs) is used and everything else is discarded
Stage 2 copies the build output directly from stage 1
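Putting the measured sizes side by side, the sketch below computes the percentage saved relative to the 441 MB multiline image, using only the numbers reported above. The helper name pct_saved is hypothetical.

```shell
# pct_saved BEFORE AFTER (hypothetical helper): percentage of space saved,
# rounded to one decimal place.
pct_saved() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) * 100 / b }'
}

pct_saved 441 292   # RUN instruction merge: prints 33.8
pct_saved 441 135   # multi-stage build: prints 69.4
```

Merging RUN instructions saves about a third of the space, while the multi-stage build saves more than two thirds.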

Again, use docker history to view the image size:

Comparing our multi-stage image with the official redis:6 image (we do not compare with redis:6-alpine, because both redis:6 and ubuntu:focal are Debian-based), we found that they differ by about 30 MB. The reason lies in this part of the redis:6 Dockerfile:

serverMd5="$(md5sum /usr/local/bin/redis-server | cut -d' ' -f1)"; export serverMd5; \
find /usr/local/bin/redis* -maxdepth 0 \
		-type f -not -name redis-server \
		-exec sh -eux -c ' \
			md5="$(md5sum "$1" | cut -d" " -f1)"; \
			test "$md5" = "$serverMd5"; \
		' -- '{}' ';' \
		-exec ln -svfT 'redis-server' '{}' ';' \

The binaries redis-server, redis-check-aof (AOF persistence), redis-check-rdb (RDB persistence) and redis-sentinel (Redis Sentinel) are identical files, each 11 MB in size. The official image uses the script above to replace the last three with symbolic links (ln) to redis-server, which saves roughly 3 × 11 MB = 33 MB and accounts for the difference.
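The same trick can be applied to any directory of duplicated binaries. The sketch below is a simplified variant of the official script: it keeps one reference file and replaces every byte-identical sibling with a relative symlink to it. It swaps the MD5 comparison for a byte-for-byte cmp -s; the function name dedupe_to_symlinks and its arguments are hypothetical.

```shell
# dedupe_to_symlinks DIR REF (hypothetical helper)
# Replaces every regular file in DIR that is byte-identical to DIR/REF
# with a symlink named after the file and pointing at REF.
dedupe_to_symlinks() {
    dir=$1
    ref=$2
    for f in "$dir"/*; do
        name=$(basename "$f")
        # Skip the reference itself, existing symlinks and non-regular files.
        [ "$name" = "$ref" ] && continue
        [ -L "$f" ] && continue
        [ -f "$f" ] || continue
        # cmp -s is silent and exits 0 only when the contents are identical.
        if cmp -s "$dir/$ref" "$f"; then
            # -f removes the existing file before creating the link.
            ln -sf "$ref" "$f"
        fi
    done
}
```

Keep in mind that the links only save space if they are created in the same RUN instruction that produced the copies; otherwise the original bytes remain in the earlier layer.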

Use the appropriate base image

For the base image, Alpine is recommended. Alpine is a lightweight, highly streamlined Linux distribution that still includes the basic tools, and its base image is only 4.41 MB. Most languages and frameworks offer an Alpine-based image, and using it is highly recommended. To go further, you can try building on the scratch or busybox images. The official images redis:6 (104 MB) and redis:6-alpine (35.5 MB) show that the Alpine image is only about one third the size of the Debian-based one.

One thing to note about Alpine images is that they are based on musl libc, an alternative to the glibc standard library; both libraries implement the same kernel interface. glibc is more common and faster, while musl uses less space and focuses on security. Most applications are compiled against a particular libc, and using them with another libc requires recompiling them. In other words, containers built on Alpine base images can behave unexpectedly because the standard C library is different. In practice such problems are rare, and when they do occur there are workarounds (see the Q&A at the end).
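When debugging such issues, it helps to know which C library a container actually uses. The sketch below is a heuristic, assuming that musl systems such as Alpine ship their dynamic loader as /lib/ld-musl-<arch>.so.1 and that getconf GNU_LIBC_VERSION succeeds only on glibc. The function name detect_libc is hypothetical.

```shell
# detect_libc (hypothetical helper): prints "musl", "glibc" or "unknown".
detect_libc() {
    # musl systems (e.g. Alpine) ship the loader as /lib/ld-musl-<arch>.so.1
    for f in /lib/ld-musl-*.so.1; do
        if [ -e "$f" ]; then
            echo musl
            return 0
        fi
    done
    # GNU_LIBC_VERSION is a glibc-specific getconf variable
    if getconf GNU_LIBC_VERSION >/dev/null 2>&1; then
        echo glibc
    else
        echo unknown
    fi
}
```

Run inside an alpine container it should print musl, and inside ubuntu:focal it should print glibc.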

Delete the cache files created by RUN

Most package managers on Linux need to update their package index, which leaves cache files behind. The common cleanup methods are listed here.

Debian-based image

# Change the source and update
sed -i "s/deb.debian.org/mirrors.aliyun.com/g" /etc/apt/sources.list && apt update
# --no-install-recommends skips recommended (optional) packages
apt install -y --no-install-recommends a b c && rm -rf /var/lib/apt/lists/*

Alpine image

# Change the source and update
sed -i 's/dl-cdn.alpinelinux.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apk/repositories
# --no-cache fetches the package index on the fly instead of keeping a local cache
apk add --no-cache a b c && rm -rf /var/cache/apk/*

CentOS image

# Change the source and update
curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo && yum makecache
yum install -y a b c && yum clean all
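To confirm that a cleanup step actually worked, one option is to measure the package manager cache directories at the end of the same RUN instruction. The sketch below sums du -sk over the directories used above, plus /var/cache/apt and /var/cache/yum; the function name pkg_cache_kb is hypothetical, and directories that do not exist count as 0.

```shell
# pkg_cache_kb (hypothetical helper): total size in KiB of common
# package manager cache directories that exist on this system.
pkg_cache_kb() {
    total=0
    for d in /var/lib/apt/lists /var/cache/apt /var/cache/apk /var/cache/yum; do
        [ -d "$d" ] || continue
        # du -sk prints "<KiB><TAB><path>"; keep only the number.
        kb=$(du -sk "$d" 2>/dev/null | cut -f1)
        total=$((total + ${kb:-0}))
    done
    echo "$total"
}
```

After a successful cleanup the result should be close to 0, since only the (nearly empty) directories themselves remain.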

Dockerfile practice

Best practices

Write a .dockerignore file
Run a single application per container
Do not use the latest tag for base and production images
Set WORKDIR and CMD
Use ENTRYPOINT, optionally starting the command with exec
Prefer COPY over ADD
Set default environment variables, exposed ports and data volumes
Use LABEL to set image metadata
Add HEALTHCHECK
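For the ENTRYPOINT and exec item, a common pattern is an entrypoint script that performs setup and then execs the command passed as CMD, so that the application replaces the shell as PID 1 and receives signals such as SIGTERM from docker stop directly. The sketch below writes such a script; the file name docker-entrypoint.sh and the helper write_entrypoint are assumptions for illustration.

```shell
# write_entrypoint PATH (hypothetical helper): writes a minimal entrypoint script.
write_entrypoint() {
    cat > "$1" <<'EOF'
#!/bin/sh
set -e
# One-time setup would go here (rendering config files, fixing permissions, ...).
# exec replaces this shell with the CMD, so the application becomes PID 1
# and receives signals from "docker stop" directly.
exec "$@"
EOF
    chmod +x "$1"
}
```

In a Dockerfile, the script would then be copied into the image and wired up with ENTRYPOINT ["/docker-entrypoint.sh"] and, for example, CMD ["redis-server"].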

Multi-stage build example

FROM golang:1.11-alpine AS build

# Install the tools required for the project
# Run `docker build --no-cache .` to update dependencies
RUN apk add --no-cache git
RUN go get github.com/golang/dep/cmd/dep

# Install project dependencies (dep uses Gopkg.toml and Gopkg.lock)
# These layers are only re-built when Gopkg files are updated
COPY Gopkg.lock Gopkg.toml /go/src/project/
WORKDIR /go/src/project/
# Install library dependencies
RUN dep ensure -vendor-only

# Copy the project and build it
# This layer is rebuilt when a file changes in the project directory
COPY . /go/src/project/
# CGO_ENABLED=0 produces a statically linked binary that can run on scratch
RUN CGO_ENABLED=0 go build -o /bin/project

# Minimal runtime image
FROM scratch
COPY --from=build /bin/project /bin/project
ENTRYPOINT ["/bin/project"]
CMD ["--help"]

Q&A

Using the Alpine base image

Solving the glibc problem

ENV ALPINE_GLIBC_VERSION="2.31-r0"
ENV LANG=C.UTF-8

RUN set -x \
    && sed -i 's/dl-cdn.alpinelinux.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apk/repositories \
    && apk add --no-cache wget \
    && wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://alpine-pkgs.sgerrand.com/sgerrand.rsa.pub \
    && wget -q https://github.com/sgerrand/alpine-pkg-glibc/releases/download/$ALPINE_GLIBC_VERSION/glibc-$ALPINE_GLIBC_VERSION.apk \
    && wget -q https://github.com/sgerrand/alpine-pkg-glibc/releases/download/$ALPINE_GLIBC_VERSION/glibc-bin-$ALPINE_GLIBC_VERSION.apk \
    && wget -q https://github.com/sgerrand/alpine-pkg-glibc/releases/download/$ALPINE_GLIBC_VERSION/glibc-i18n-$ALPINE_GLIBC_VERSION.apk \
    && apk add --no-cache glibc-$ALPINE_GLIBC_VERSION.apk  \
                    glibc-bin-$ALPINE_GLIBC_VERSION.apk \
                    glibc-i18n-$ALPINE_GLIBC_VERSION.apk \
    && /usr/glibc-compat/bin/localedef --force --inputfile POSIX --charmap UTF-8 "$LANG" || true \
    && echo "export LANG=$LANG" > /etc/profile.d/locale.sh \
    && apk del glibc-i18n \
    && rm glibc-$ALPINE_GLIBC_VERSION.apk glibc-bin-$ALPINE_GLIBC_VERSION.apk glibc-i18n-$ALPINE_GLIBC_VERSION.apk

References

Dockerfile best practices
Docker multi-stage builds
Three tips to reduce Docker image size by 90%
Five general ways to streamline Docker images
Best practices for optimizing Dockerfiles
alpine:3.12 image

If this article helped you, or if you are interested in technical articles, follow the WeChat official account Technical Tea Party to receive new technical articles as soon as they are published. Thank you!

This article was automatically published by ArtiPub, an article publishing platform