This article series will be divided into three parts:
The first part focuses on multi-stage builds, since they are an essential part of slimming down images. It explains the difference between static and dynamic linking, how each affects images, and how to avoid the downsides, with a brief introduction to the Alpine image along the way. Link: Docker Image Making Tutorial: Reduce image volume
Part 2 focuses on choosing an appropriate slimming strategy for each language, mainly Go, but also Java, Node, Python, Ruby, and Rust. It also includes a guide to avoiding the pitfalls of the Alpine image. What? You don’t know what pitfalls the Alpine image has? I’ll tell you. Link: Docker Image Making Tutorial: Streamlining Strategies for different languages
Part 3 explores slimming strategies that apply to most languages and frameworks, such as using common base images, extracting executables, and reducing the size of each layer. It also introduces more exotic or radical tools like Bazel, Distroless, DockerSlim, and UPX. These tools work wonders in certain scenarios, but most of the time they do not.
This article is the second part.
1. Slimming down Go images
A Go program compiles all of its dependencies into the binary, but that does not guarantee static linking, because some Go packages depend on system libraries, for example the packages that perform DNS resolution. As soon as such a package is imported, the compiled binary needs to call into system libraries. For this purpose Go provides a mechanism called cgo, which lets Go call C code, so that the compiled binary can call the system libraries.
That is, if a Go program uses the net package, the build produces a dynamically linked binary, and for the image to work you must either copy the required library files into the image or use the busybox:glibc image directly.
Of course, you can also disable cgo, so that Go uses its built-in implementations (such as the built-in DNS resolver) instead of the system libraries, in which case the resulting binary is static. cgo is disabled by setting the environment variable CGO_ENABLED=0, for example:
FROM golang
COPY whatsmyip.go .
ENV CGO_ENABLED=0
RUN go build whatsmyip.go
FROM scratch
COPY --from=0 /go/whatsmyip .
CMD ["./whatsmyip"]
Since the build produces a static binary, it can run directly in the scratch image 🎉
You can also use the -tags parameter to choose which built-in implementations to use. For example, -tags netgo selects Go’s built-in net implementation, which does not depend on system libraries:
$ go build -tags netgo whatsmyip.go
With this option, the result is a static binary as long as no other imported package uses system libraries. If any other package does use them, cgo is turned on and you end up with a dynamic binary. To settle the matter once and for all, set the environment variable CGO_ENABLED=0.
2. Exploring the Alpine image
The previous article touched on the Alpine image briefly, and I promised to discuss it at length in a later article. Now is the time!
Alpine is one of many Linux distributions, a distribution name just like CentOS, Ubuntu, and Arch Linux. It prides itself on being small and secure, and it has its own package manager, apk.
Unlike CentOS and Ubuntu, Alpine is not backed by a major company such as Red Hat or Canonical, and it has far fewer packages than those distributions. (Looking only at the default repositories, Alpine has about 10,000 packages, compared with more than 50,000 for Ubuntu, Debian, and Fedora.)
Before containers took off, Alpine was relatively unknown, perhaps because people didn’t care much about the size of the operating system itself. After all, they cared only about business data and documents. The size of programs, libraries, and the system itself was usually negligible.
As container technology swept through the software industry, people noticed that container images were too big: they wasted disk space and took too long to pull. So they started looking for smaller images. With familiar distributions (such as Ubuntu, Debian, and Fedora), the only way to get the image size below 100 MB was to remove tools such as ifconfig and netstat. With Alpine, nothing needs to be removed and the image is only about 5 MB.
Another advantage of the Alpine image is that its package manager is very fast, which makes installing packages a smooth experience. On a traditional virtual machine, installation speed doesn’t matter much, because each package is installed once and never again. Containers are different. You may rebuild images regularly, or temporarily install debugging tools in a running container, and a slow package installation quickly wears out your patience.
To make this concrete, let’s run a simple comparison and see how long it takes to install tcpdump on different distributions:
🐳 → time docker run <image> <packagemanager> install tcpdump
The test results are as follows:
Base image           Size      Time to install tcpdump
------------------------------------------------------
alpine:3.11          5.6 MB    1-2s
archlinux:20200106   409 MB    7-9s
centos:8             237 MB    5-6s
debian:10            114 MB    5-7s
fedora:31            194 MB    35-60s
ubuntu:18.04         64 MB     6-8s
If you want to learn more about Alpine, check out Natanael Copa’s talk.
Well, if Alpine is so great, why not use it as the base image for everything? Not so fast. To cover all the pitfalls, we need to consider two scenarios:
- Using Alpine only in the second build stage (the run stage)
- Using Alpine in all build stages (the run stage and the build stage)
Using Alpine in the run stage
Full of excitement, we add the Alpine image to the Dockerfile:
FROM gcc AS mybuildstage
COPY hello.c .
RUN gcc -o hello hello.c
FROM alpine
COPY --from=mybuildstage hello .
CMD ["./hello"]
The first pitfall shows up immediately: the container fails to start with an error:
standard_init_linux.go:211: exec user process caused "no such file or directory"
We saw this error in the previous article, when the scratch image was used as the base image for a C program. There, the cause was that the scratch image lacked the required dynamic libraries. But why does the Alpine image fail too? Does it also lack dynamic libraries?
Not exactly. Alpine uses dynamic libraries too, since one of its design goals is to take up less space. But Alpine’s standard C library differs from that of most distributions: it uses musl libc, which is smaller, simpler, and more secure than glibc, but is not compatible with glibc, the C library most distributions use.
Again, you may be asking, “If musl libc is smaller, simpler, and f**king more secure, why do other distributions still use glibc?”
Hmm... Because glibc has many extensions that a lot of programs rely on, and musl libc does not include them. See the musl documentation for details.
That is, if you want your program to run in an Alpine image, it must be compiled against musl libc as its dynamic library.
Using Alpine for all stages
To produce a binary linked against musl libc, there are two paths:
- Some official images offer an Alpine variant that you can use directly.
- Some official images do not offer an Alpine variant, so we have to build one ourselves.
The golang image falls into the first category: golang:alpine provides a Go toolchain built on Alpine.
To build the Go program, use the following Dockerfile:
FROM golang:alpine
COPY hello.go .
RUN go build hello.go
FROM alpine
COPY --from=0 /go/hello .
CMD ["./hello"]
The resulting image is 7.5 MB, which is a bit large for a program that only prints “Hello World”, but look at it another way:
- Even if the program is complex, the resulting image will not be large.
- The image contains many useful debugging tools.
- If a particular debugging tool is missing at run time, it can be installed quickly.
Go is done. What about C? There is no gcc:alpine image, so the only option is to use the alpine image as the base and install a C compiler yourself. The Dockerfile looks like this:
FROM alpine
RUN apk add build-base
COPY hello.c .
RUN gcc -o hello hello.c
FROM alpine
COPY --from=0 hello .
CMD ["./hello"]
You must install build-base. If you install only gcc, you get the compiler but not the standard library. build-base is the equivalent of Ubuntu’s build-essential: it pulls in the compiler, the standard library, and tools such as make.
Finally, compare the sizes of the “Hello World” images produced by the different build methods:
- Built with the golang base image alone: 805 MB
- Multi-stage build, with golang in the build stage and ubuntu in the run stage: 66.2 MB
- Multi-stage build, with golang:alpine in the build stage and alpine in the run stage: 7.6 MB
- Multi-stage build, with golang in the build stage and scratch in the run stage: 2 MB
In the end, the image size was reduced by 99.75%, which is pretty amazing. For a more practical example, here is the same comparison for the application from the previous section that uses the net package:
- Built with the golang base image alone: 810 MB
- Multi-stage build, with golang in the build stage and ubuntu in the run stage: 71.2 MB
- Multi-stage build, with golang:alpine in the build stage and alpine in the run stage: 12.6 MB
- Multi-stage build, with golang in the build stage and busybox:glibc in the run stage: 12.2 MB
- Multi-stage build, with golang and CGO_ENABLED=0 in the build stage and scratch in the run stage: 7 MB
The image size is still reduced by about 99%.
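The percentages quoted for the two comparisons can be reproduced with a few lines of arithmetic. This is only a sketch: the function name reduction is made up for the example, and the sizes are the largest and smallest figures from each list, in MB.

```go
package main

import "fmt"

// reduction returns how much smaller `after` is than `before`, in percent.
func reduction(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// Largest and smallest sizes (MB) from the two comparisons in this section.
	fmt.Printf("hello world: %.2f%%\n", reduction(805, 2)) // prints "hello world: 99.75%"
	fmt.Printf("net example: %.2f%%\n", reduction(810, 7)) // prints "net example: 99.14%"
}
```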
3. Slimming down Java images
Java is a compiled language, but the compiled program still runs on the JVM. So how do we use multi-stage builds with Java?
Static or dynamic?
Conceptually, Java uses dynamic linking, because Java code calls Java APIs provided by the JVM, and the code for those APIs lives outside the Java “executable” (usually a JAR or WAR file).
However, these Java libraries are not completely independent of the system libraries. Some Java functions eventually call system libraries: to open a file, for example, the JVM ends up calling open(), fopen(), or one of their variants. So the JVM itself may be dynamically linked against system libraries.
This means that, in theory, a Java program can run on any JVM, regardless of whether the system’s standard library is musl libc or glibc. Therefore any image that ships a JVM can be used to build Java programs, and any image that ships a JVM can serve as the base image for running them.
Class file format
The format of Java class files (the bytecode generated by the Java compiler) has changed from version to version. Most of the changes between versions are in the Java API, but some concern the language itself, such as the addition of generics in Java 5, and those can change the class file format in ways that break compatibility with older versions.
So by default, classes compiled with a given version of the Java compiler do not work on older JVMs. You can, however, ask the compiler for an older class file format with the -target option (Java 8 and below) or the --release option (Java 9 and above). The --release option also selects the matching class path, so that code meant to run on a given JVM version (for example, Java 11) does not accidentally call Java 12 APIs.
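The format version is recorded in the first eight bytes of every class file, so you can check which JVM a compiled class targets before choosing a run image. The sketch below is an illustration, not something from the original article: the names classVersion and javaRelease are made up, and the rule "major version minus 44" is the usual mapping for Java 5 and later (for example 52 for Java 8 and 55 for Java 11).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"os"
)

// classVersion returns the major version stored in a class file header.
// The header is: 4 bytes magic (0xCAFEBABE), 2 bytes minor, 2 bytes major,
// all big-endian.
func classVersion(data []byte) (uint16, error) {
	if len(data) < 8 {
		return 0, errors.New("too short to be a class file")
	}
	if binary.BigEndian.Uint32(data[0:4]) != 0xCAFEBABE {
		return 0, errors.New("missing 0xCAFEBABE magic number")
	}
	return binary.BigEndian.Uint16(data[6:8]), nil
}

// javaRelease maps a class file major version to the Java release that
// introduced it. The mapping is meant for major versions 49 (Java 5) and up.
func javaRelease(major uint16) int {
	return int(major) - 44
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: classver <file.class>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	major, err := classVersion(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("class file major version %d (Java %d)\n", major, javaRelease(major))
}
```

A class compiled with --release 8 should report major version 52, which means it can run on a Java 8 JRE image as well as on newer ones.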
JDK vs JRE
If you’re familiar with how Java is packaged on most platforms, you probably know about the JDK and the JRE.
The JRE, or Java Runtime Environment, contains what is needed to run Java programs, namely the JVM.
The JDK, or Java Development Kit, contains both the JRE and the tools needed to develop Java programs, namely the Java compiler.
Most Java images provide both JDK and JRE tags, so you can use the JDK as the base image in the build stage and the JRE as the base image in the run stage.
Java vs OpenJDK
OpenJDK is recommended because it is open source and updated frequently.
You can also use Amazon Corretto, Amazon’s fork of OpenJDK with additional patches, which claims to be enterprise-grade.
Starting the build
With all that said, which image should we use? Here are a few candidates:
- openjdk:8-jre-alpine (85 MB)
- openjdk:11-jre (267 MB) or openjdk:11-jre-slim (204 MB)
- openjdk:14-alpine (338 MB)
If you want more concrete data, look at my example, which uses the tried-and-true “Hello World” again, only this time in Java:
class hello {
  public static void main(String [] args) {
    System.out.println("Hello, world!");
  }
}
Image sizes obtained with the different build methods:
- Built with the java base image alone: 643 MB
- Built with the openjdk base image alone: 490 MB
- Multi-stage build, with openjdk in the build stage and openjdk:jre in the run stage: 479 MB
- Built with the amazoncorretto base image alone: 390 MB
- Multi-stage build, with openjdk:11 in the build stage and openjdk:11-jre in the run stage: 267 MB
- Multi-stage build, with openjdk:8 in the build stage and openjdk:8-jre-alpine in the run stage: 85 MB
All Dockerfiles can be found in this repository.
4. Slimming down images for interpreted languages
For interpreted languages such as Node, Python, and Ruby, the situation is a little more complicated. Let’s start with the Alpine image.
Alpine image
For interpreted languages, using Alpine as the base image is generally fine if the program uses only the standard library, or if its dependencies are written in the same language as the program itself and do not call C libraries or other external dependencies. Once your program needs external dependencies, things get complicated, because you have to install them before you can keep using the Alpine image. There are three levels of difficulty:
- Easy: The dependency has installation instructions for Alpine, which generally explain which packages to install and how to set things up. But this is very rare, for the reason mentioned earlier: Alpine has far fewer packages than most popular distributions.
- Medium: The dependency has no installation instructions for Alpine, but it has them for other distributions. You have to work out which Alpine packages (if they exist) correspond to the packages named for the other distributions.
- Hard: The dependency has no installation instructions for Alpine, it has them for other distributions, and Alpine has no corresponding package. In this case, you must build from source!
In the last case, Alpine is the least advisable choice of base image. Not only does it fail to reduce the size, it can be counterproductive, because you need to install compilers, libraries, headers, and so on. What’s more, the build becomes long and inefficient. Bringing multi-stage builds into the picture makes it even more complicated: you have to figure out how to compile all of your dependencies into binaries, which is a headache just to think about. Therefore, multi-stage builds are generally not recommended for interpreted languages.
One special case runs into most of Alpine’s problems: using Python for data science. Packages such as NumPy and pandas are shipped precompiled as wheels. The wheel is Python’s newer packaging format: it contains compiled binaries, replaces Python’s traditional egg files, and can be installed directly with pip. But these wheels are tied to a specific C library, which means they work on most glibc-based images but not on Alpine images, for reasons you have already seen. Installing such packages on Alpine requires installing a lot of dependencies and rebuilding everything from source, which takes a lot of time and effort. There’s an article that explains this: Building Python images in Alpine slows your build up to 50 times!
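The tie between a wheel and a C library is visible in the wheel’s file name, whose last dash-separated field is the platform tag. Tags that start with manylinux indicate binaries built against glibc. The sketch below only parses file names and is not how pip itself makes the decision. The helper names platformTags and glibcOnly are made up, and the two file names are illustrative examples.

```go
package main

import (
	"fmt"
	"strings"
)

// platformTags extracts the platform tags from a wheel file name. The
// platform tag is the last dash-separated field, and several tags may be
// joined with dots.
func platformTags(wheel string) []string {
	name := strings.TrimSuffix(wheel, ".whl")
	parts := strings.Split(name, "-")
	return strings.Split(parts[len(parts)-1], ".")
}

// glibcOnly reports whether every platform tag is a manylinux tag, meaning
// the wheel only targets glibc-based Linux systems.
func glibcOnly(wheel string) bool {
	for _, tag := range platformTags(wheel) {
		if !strings.HasPrefix(tag, "manylinux") {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative file names, used here as assumptions for the example.
	for _, w := range []string{
		"numpy-1.18.1-cp38-cp38-manylinux1_x86_64.whl",
		"requests-2.22.0-py2.py3-none-any.whl",
	} {
		fmt.Printf("%s glibc-only=%v\n", w, glibcOnly(w))
	}
}
```

A pure-Python wheel tagged any is not tied to a C library at all, which matches the earlier point that Alpine is generally fine when dependencies are written in the same language as the program.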
Since the Alpine image is so troublesome here, should it be avoided for every program written in Python? Not entirely. Alpine is not recommended for Python data science, at least. Otherwise it has to be judged case by case, and it is worth trying Alpine where you can.
:slim images
If you really don’t want the hassle, you can choose a compromise: the xxx:slim images. Slim images are generally based on Debian and glibc, with many non-essential packages removed to reduce size. If a compiler is needed during the build, slim images are not a good fit, but otherwise slim works as a base image in most cases.
Here is a size comparison of the Alpine and slim images for the mainstream interpreted languages:
Image Size
---------------------------
node 939 MB
node:alpine 113 MB
node:slim 163 MB
python 932 MB
python:alpine 110 MB
python:slim 193 MB
ruby 842 MB
ruby:alpine 54 MB
ruby:slim 149 MB
Now consider a special case where extra packages are installed on top of the base image. The resulting image sizes are as follows:
Image and technique         Size
--------------------------------------
python                      1.26 GB
python:slim                 407 MB
python:alpine               523 MB
python:alpine multi-stage   517 MB
You can see that using Alpine doesn’t help in this case, even with a multi-stage build.
But don’t write Alpine off entirely. Take, for example, a Django application with a lot of dependencies:
Image and technique         Size
--------------------------------------
python                      1.23 GB
python:alpine               636 MB
python:alpine multi-stage   391 MB
In conclusion, there is no single answer to which base image to use. Sometimes Alpine works better, and sometimes slim does. If you’re after the smallest possible image, try both. With time, you’ll have enough experience to know which cases call for Alpine and which call for slim without having to try again and again.
5. Slimming down Rust images
Rust is a modern programming language originally designed by Mozilla and increasingly popular in the web and infrastructure world. Rust binaries are dynamically linked against the C library. They run well in images like Ubuntu, Debian, and Fedora, but not in busybox:glibc, because Rust binaries need the libdl library and busybox:glibc does not include it.
There is also a rust:alpine image, and binaries compiled with it run properly on Alpine.
If you are considering static linking, refer to the official Rust documentation. On Linux, this requires building a special version of the Rust compiler, and the library it depends on is musl libc, the same musl libc used by Alpine. If you want a smaller image, follow the instructions in the documentation and put the resulting binary into the scratch image.
6. Summary
The first two articles in this series covered common methods for reducing the size of Docker images and how to apply them to different kinds of languages. The final part will show how to reduce I/O and memory usage while reducing image size, along with some techniques that are not specific to containers but help optimize images.