[toc]

It’s 2020, and the concept of containers, or Docker containers, should be familiar to any developer working in the Internet industry. Companies large and small now prefer to deploy their applications in Docker containers.

But Docker, while good, is not a panacea. Docker itself simply provides a sandbox mechanism for isolating different applications. The image is one of its best features, allowing developers to deploy applications quickly. But this is not enough for managing applications at scale. When developers realized this, they introduced the concept of orchestration, which set off a new controversy…

Swarm and K8S are the most popular of these orchestration systems, and the best-known contenders in that controversy.

PS: Container technology is broader than Docker, but this article glosses over the differences; in most cases the two terms are used interchangeably here.

Start with containers

background

When virtual machines (VMs) and cloud computing matured, companies often deployed applications on cloud servers with scripts or by hand, just as they had done on physical machines. However, all sorts of small problems arose from inconsistencies between the local and cloud environments.

A class of projects called PaaS focused on solving this inconsistency between the local and cloud environments while also providing application hosting. Put simply, you deploy the PaaS server side on the cloud machines, and then your local machine can push a local application to the cloud with one click. Since a PaaS server on a cloud machine receives applications submitted by many users, it provides an isolation mechanism underneath: it creates a sandbox for each submitted application, and the sandboxes are isolated from one another and do not interfere.

Does this sandbox not sound a lot like Docker? In fact, container technology is not exclusive to Docker; Docker is just one of many container implementations. So why did Docker become so popular? Let’s start with PaaS.

The essence of PaaS is a packaging (local) and distribution (cloud) mechanism that helps users deliver applications to large-scale clusters, and container technology is just a relatively low-level part of it. It sounds perfect, but the problem lies in the packaging feature. Packaging is quite tedious: we need to build a package for each application, language, and version, and things frequently go wrong along the way. In other words, PaaS gives you the thrill of one-click deployment, but only after you have suffered through the pain of packaging.

Packaging made users miserable, and Docker solved it with one small innovation: the image. The image is itself a packaging mechanism, but it usually contains a complete operating-system file system, so it can reproduce the local environment as faithfully as possible while also containing your application.

With an image, you can develop comfortably on your local machine and then upload the image to a cloud server for deployment; with no changes, or only minor ones, the cloud server gets the same application environment as your local machine. The server then builds an isolated sandbox environment from the image and runs your application inside it.

However, although Docker solved the painful packaging problem of PaaS, the large-scale cluster deployment capability that PaaS was built around is exactly Docker’s weakness; Docker itself simply does not have that functionality.

Swarm and K8S were born to fill exactly this gap, but that is a story for a little later.

Docker implementation principle

Now let’s see how Docker implements its sandbox isolation mechanism.

When it comes to Docker, many people compare it to virtual machines, and they almost always reach for a diagram like the following:

On the left is the structure of a virtual machine and on the right is the structure of a Docker container, but the comparison is not entirely accurate. In a VM, the hypervisor virtualizes hardware resources and an operating system is installed on top of them, isolating the upper-layer VMs from the lower-layer host. Docker has no such layer. The sandbox environment of a Docker container (file system, resources, process environment, and so on), isolated from the host, is essentially built from the Linux Namespace mechanism, cgroups (control groups), and chroot. In fact, a Docker container is still just a process (or process group) running on the host; it merely believes it is an independent environment thanks to a few tricks. Let’s take a brief look at how this works.

If you run the ps command inside a Docker container, you might see only the following output:

/ # ps
PID  USER   TIME COMMAND
  1 root   0:00 /bin/bash
  10 root   0:00 ps

When you execute ps inside the container, you see only two processes: /bin/bash with PID 1 and ps with PID 10. On the host, however, the PID of that same /bin/bash might be 100 or 1000, so why does it see itself as PID 1? The answer is the Namespace mechanism provided by Linux, which gives /bin/bash an isolated view of the process space.

When the host creates the process, it can pass an extra flag to clone(), like this:

/* CLONE_NEWPID: create the child in a brand-new PID namespace */
int pid = clone(main_function, child_stack, CLONE_NEWPID | SIGCHLD, NULL);

The process created this way lives in a new PID namespace, where its PID is 1, even though in the real world of the host its PID is still an ordinary number. This example uses only the PID namespace; there are also the network namespace, the mount namespace (which mounts the container’s root directory onto a new directory and puts an operating-system file tree there so that it looks like a brand-new system), and several others, which together isolate the container from the actual host. This really is the basic implementation of a container.
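To make this concrete, here is a minimal, self-contained sketch (my own illustration, not code from the article) of creating a process in a new PID namespace with clone(). It assumes a Linux host and must be run as root, since creating a PID namespace requires CAP_SYS_ADMIN; the child reports PID 1 to itself while the parent prints the ordinary PID the host assigned.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Entry point of the cloned child: inside its new PID namespace
 * it sees itself as PID 1, regardless of its PID on the host. */
static int child_main(void *arg) {
    printf("PID inside the new namespace: %d\n", (int)getpid());
    return 0;
}

int main(void) {
    char *stack = malloc(STACK_SIZE);
    if (!stack) { perror("malloc"); return 1; }

    /* CLONE_NEWPID puts the child into its own PID namespace.
     * clone() expects a pointer to the top of the child's stack. */
    int pid = clone(child_main, stack + STACK_SIZE,
                    CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1) { perror("clone"); return 1; }

    printf("PID seen by the host: %d\n", pid);
    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}
```

Adding CLONE_NEWNS, CLONE_NEWNET, and the other namespace flags to the same call is, in essence, how a container runtime assembles the rest of the sandbox.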

However, namespaces alone are not enough. There is another big problem: the isolation of system resources. We need to control a container’s CPU and memory usage, otherwise one container could eat up the system’s resources and leave nothing for the other containers.

Linux implements this resource limiting with cgroups; I will not cover their usage in depth here. Cgroups expose a file interface: by modifying the files under /sys/fs/cgroup/ you can limit the resources a container may use, such as the number of PIDs, CPU time, and so on.
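As a rough illustration of that file interface (again a sketch of my own, assuming cgroup v1 with the cpu controller mounted at /sys/fs/cgroup/cpu, root privileges, and a made-up group name "demo"), the following program caps itself at roughly 20% of one CPU core:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Helper: write a single value into a cgroup control file. */
static void write_file(const char *path, const char *value) {
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return; }
    fputs(value, f);
    fclose(f);
}

int main(void) {
    /* Creating a directory under the cpu controller creates a new cgroup. */
    mkdir("/sys/fs/cgroup/cpu/demo", 0755);

    /* Allow 20 ms of CPU time per 100 ms period, i.e. at most 20% of one core. */
    write_file("/sys/fs/cgroup/cpu/demo/cpu.cfs_period_us", "100000");
    write_file("/sys/fs/cgroup/cpu/demo/cpu.cfs_quota_us", "20000");

    /* Writing our PID into the tasks file moves this process into the cgroup. */
    char pid[32];
    snprintf(pid, sizeof(pid), "%d", (int)getpid());
    write_file("/sys/fs/cgroup/cpu/demo/tasks", pid);

    /* Busy-loop so the throttling is visible in top: CPU usage stays near 20%. */
    for (;;) { }
}
```

Docker does essentially the same thing for every container it starts: it creates a per-container cgroup and translates flags such as --cpus and --memory into writes to these control files.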

So a Docker container is really just a Linux process (or process group), carved into its own sandbox by namespaces and cgroups. Once you understand this, some of Docker’s characteristics become obvious. For example, programs that depend heavily on the kernel may run into problems inside Docker, and a container cannot use a newer kernel than its host provides, because in essence it still executes on the host’s kernel.

What about Docker on macOS and Windows? How does that work? Quite simply: there, Docker runs on top of a virtualized Linux, so underneath it is still Linux.

With containers out of the way, it’s time to introduce orchestration.

The battle for orchestration

This section mainly introduces the concept of orchestration and the dispute it sparked between Docker Swarm and K8S.

Docker itself only provides packaging and deployment. It does not provide management of distributed (large-scale) clusters, which was the home turf of the original PaaS projects. Orchestration is the core appeal of container technology; without orchestration, a container is just a sandbox. As Docker matured, it poured its effort into orchestration with its own project, Swarm, but Docker’s own child, Swarm, was ultimately outdone by K8S.

Why is that?

First, let’s talk about what container orchestration is. Simply put, it is the management of container configuration and runtime behavior.

So how does Docker Swarm do orchestration? This involves another project, Docker Compose, which together with Swarm and Docker Machine is known as the Docker "Three Musketeers".

Orchestration means managing the configuration and behavior of containers, so the natural idea is to write those configurations and behaviors into a configuration file. Say a user needs to run containers A, B, and C: we can describe all the container-related configuration and relationships (networks, volumes, number of replicas, behavior on failure, how the containers cooperate, such as start-up order) in one configuration file, and then load and execute that file with a single command to orchestrate the containers.

What Swarm does is simple, and simplicity is not always a good thing, because it makes it hard to meet the industry’s more complex needs: for example, Swarm struggles with stateful services and with complex relationships between multiple services (starting services in a fixed order is not enough). At this point Kubernetes (K8S), born out of Google’s Borg, appeared. It distills Google’s years of experience running clusters, and you could say it is a favorite child standing on the shoulders of giants. So what is its advantage over Swarm?

The answer lies in its design, which is worth looking at in a bit more detail. K8S is designed around APIs as a whole: every component in the overall architecture is pluggable. Take the container runtime, for example: in K8S it can be swapped out as long as it implements the corresponding interface standard, and the same goes for network plug-ins, volume plug-ins, and so on. I will not go into K8S details here; interested readers can refer to the following documents:

  • Kubernetes design concept
  • Kubernetes design architecture

In other words, the whole design takes cluster management as its core, and the overall architecture is loose and pluggable. Swarm, by contrast, is built around Docker itself, so the two designs differ fundamentally.

On top of containers, K8S adds another layer of packaging: the Pod. A Pod is a task group composed of containers with the same or closely related functions. Why have a Pod at all? Remember that a container is essentially a set of processes in an operating system, and managing things purely at the process level is somewhat cumbersome. Linux, for example, has the concept of a process group for managing related processes (several processes that cooperate to perform one function can be treated as a group).

In container orchestration there is often a similar process-and-group relationship between containers (not to mention between the many distributed components), so a higher level of abstraction is needed to help us manage containers. In K8S, the Pod plays the role of the process group. A Pod is a logical concept, and it is also the smallest scheduling unit in K8S; containers in the same Pod share volumes.

With Pods, or groups of containers, it becomes easier to manage different services. But Pods have a more important implication: they enable container-based design patterns for systems.

Container-based approach to distributed system design

Let’s go back to 1980. Say you were a programmer used to writing C, and you were introduced to a concept called object-oriented programming. What would you make of it? Could you imagine it taking over half of the programming world within 30 years?

Today, the Docker container (or the Pod) is in a similar position to OOP back then: its core idea is to isolate different concerns from each other through modular encapsulation, so that they can work together to accomplish something larger.

From this perspective, you can see why the container itself is not the valuable part; orchestration is. A single Java object is not worth much either; the essence lies in the idea of OOP and the design patterns derived from it.

So what categories of container-based design patterns are there for distributed systems? According to the paper referenced below, there are three.

  • Single-container management patterns
  • Single-node patterns of closely cooperating containers
  • Multi-node collaboration patterns

PS: This section mostly refers to Google’s Design Patterns for Container-based Distributed Systems. Please click the link at the bottom to see the original paper.

The single-container pattern and the single-node collaboration pattern look similar, but are completely different things.

The single-container pattern, simply put, takes the traditional container interface (whose behavior is quite narrow: roughly run(), pause(), and stop()) and extends it with richer functionality and lifecycle management. Put even more simply, it is about managing a single container service, for example with K8S.

We mainly introduce single-node collaboration and multi-node collaboration.

Single node collaboration mode

The single-node collaboration patterns, simply put, have auxiliary containers assist a main service on the same node in a distributed, containerized environment. These patterns rely on the K8S abstraction of the Pod, which represents a task group: a collection of containers serving the same or closely related purposes.

There are mainly the following design patterns.

Sidecar Pattern

Sidecar is a term many people have never heard of (myself included, before writing this). Let’s start with a picture of the real thing:

A sidecar is the small carriage attached to the side of a motorcycle; under certain circumstances (racing, I guess), the person in the sidecar can hand the rider water, food, and so on.

The sidecar pattern is similar: a secondary container is placed next to the main service (its container, the main container) to help the main service with some of the dirty work.

For example, if a web application writes its log information to disk, we can add a log-collection sidecar to handle log collection for the web service. Like this:

This is easy enough to understand, and readers familiar with design patterns can probably list the benefits themselves, but here they are in detail from the container’s perspective:

  1. A container is a unit of resource allocation. After splitting the sidecar out as a separate service, you can configure resources more flexibly through cgroups, or adjust them dynamically (for example, giving the web service more resources when it is busy and the sidecar fewer).
  2. A container is the smallest unit of packaging, which makes it easier to divide responsibilities among services and to test them separately.
  3. A container is a reusable unit; the logging sidecar, for example, can be reused alongside other services.
  4. Containers provide failure boundaries, which allows the system to degrade gracefully: if the logging service fails, the web service keeps running.
  5. A container is the smallest deployment unit, so each service can be upgraded and rolled back independently. This can also be a disadvantage, because having more services makes them harder to manage.

These benefits of separation sound quite tempting.

Ambassador Pattern

The ambassador pattern provides a container that acts as a proxy for the main container’s communication. This adds an extra proxy layer behind the communication interface: the main service may think it is talking to a local Redis, when in fact the proxy is interacting with a Redis cluster.

The benefit of the ambassador pattern is that it isolates the main service from external components. Because the main service only ever talks to the proxy, external components can be replaced seamlessly without the main service noticing. There are also the usual benefits of easier testing and reuse, which really come from the decoupling between services, just as in the sidecar pattern above.

Adapter Pattern

The two patterns above are designed to help the main container focus on its own responsibilities. The adapter pattern, by contrast, exists for the convenience of other components.

For example, suppose you have multiple services (web, database, cache, and so on) and need a monitoring system to check whether these components are working properly. Normally, the monitoring system has to collect metrics from each of the different services before it can monitor them.

The problem is that whenever services are added or removed, the monitoring system has to be adjusted accordingly, which is cumbersome. The adapter pattern solves this.

If each service (web, database, cache, and so on) is given an adapter container that exposes its metrics through a unified interface, the monitoring system can collect all metrics in one uniform way through these adapter containers. As shown in the figure below.

OK, so those are the three design patterns for single-node collaboration. Let’s look at multi-node collaboration.

Multi-node collaboration mode

This part will be a little easier, so I won’t spend too much time here.

Beyond collaborating on a single node, modular containers also make it easier to build collaborative multi-node distributed applications. This part sounds fancy, but it is actually easy to understand.

For example, everyone in the distributed world knows ZooKeeper; the paper calls the corresponding pattern, in which multiple nodes elect a leader, the leader election pattern. Similarly, the pattern behind message queues such as Kafka is called the work queue pattern. The last one resembles Spark’s master/worker computing model and is called the scatter/gather pattern, in which a computing task is distributed to many other computing nodes and the results are gathered back.

The patterns listed here all collaborate across multiple nodes and provide services to the outside world through exposed interfaces; they are essentially the common ways of building distributed services with containers. If you have ever used Docker to set up a Hadoop cluster, the so-called multi-node collaboration patterns will feel familiar.

If you look at distributed systems as an engineering exercise, these multi-node deployment patterns hardly need much theoretical justification; this is a typical case of practice coming first and theory summarizing it afterwards. (Though I still think this part of the paper is a bit padded.)

That wraps up the container-based approach to distributed system design; if you are interested in the original paper, see the link at the bottom.

summary

OK. This article introduced the history of Docker containers, explained why container orchestration matters, and briefly described why Swarm lost the orchestration war to K8S. Finally, it moved from the concept of container orchestration to design patterns based on container technology. Of the three categories, the single-node collaboration patterns are relatively novel and offer some real inspiration.

That’s all.

Reference article:

Design patterns for container-based distributed systems

Kubernetes design concept

Docker core technology and implementation principle

An Introduction to Docker and Analysis of its Performance