One target: container operations. Two sites and three centers. Layer-4 service discovery. Five kinds of Pod shared resources. Six common CNI plugins. Layer-7 load balancing. Eight isolation dimensions. Nine network model principles. Ten types of IP addresses. Hundreds of product lines, thousands of physical machines, tens of thousands of containers; and K8s handles hundreds of millions of service requests per day.

One target: container operations

Kubernetes (K8s) is an open-source platform for automating container operations, including deployment, scheduling, and scaling across node clusters.

Specific functions:

Automates container deployment and replication.

Elastically scales container instances in real time.

Organizes containers into groups and provides load balancing between them.

Scheduling: decides which machine each container runs on.

Composition:

Kubectl: a client-side command line tool that acts as an entry point for the entire system.

Kube-apiserver: provides a REST API service as the control entry point for the entire system.

Kube-controller-manager: performs background tasks for the entire system, such as tracking node status, the number of Pods, and the association between Pods and Services.

Kube-scheduler: manages node resources, receives Pod creation tasks from kube-apiserver, and assigns them to nodes.

Etcd: Service discovery and configuration sharing between nodes.

Kube-proxy: runs on each compute node and is responsible for Pod network proxying. It periodically obtains Service information from etcd and sets up the corresponding forwarding policies.

Kubelet: runs on each compute node. As an agent, it receives the Pod tasks assigned to its node, manages the containers, and periodically collects container status and reports it back to kube-apiserver.

DNS: An optional DNS Service that creates DNS records for each Service object so that all pods can access the Service through DNS.
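All of the components above talk to each other through kube-apiserver's REST API. As a rough illustration of what that interface looks like, here is a small Python sketch that composes the standard Kubernetes resource paths a client such as kubectl ultimately requests (the namespace and resource names are placeholders, and no real cluster is contacted):

```python
# Sketch: composing Kubernetes REST API paths the way a client such as
# kubectl would. Nothing is sent over the network; this only shows the
# path conventions of the API exposed by kube-apiserver.

def pod_list_path(namespace: str) -> str:
    """Core-group (v1) path for listing Pods in a namespace."""
    return f"/api/v1/namespaces/{namespace}/pods"

def deployment_path(namespace: str, name: str) -> str:
    """apps/v1 path for a single Deployment."""
    return f"/apis/apps/v1/namespaces/{namespace}/deployments/{name}"

if __name__ == "__main__":
    print(pod_list_path("default"))
    print(deployment_path("default", "redis-master"))
```

Every other component (kubelet, kube-scheduler, kube-controller-manager) reads and writes cluster state through paths of exactly this shape, which is why kube-apiserver is called the control entry point of the whole system.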

Here is the architecture topology of K8s:

Two sites and three centers

A two-site, three-center deployment consists of a local production center, a local disaster recovery center, and a remote disaster recovery center.

Data consistency is an important problem that the two-site, three-center architecture must solve. K8s uses the etcd component as a highly available, strongly consistent repository for configuration sharing and service discovery.

etcd started as a project inspired by ZooKeeper and Doozer. In addition to having all of their features, it has the following four characteristics:

Simple: the HTTP + JSON API makes it easy to use with curl commands.

Secure: optional SSL client-certificate authentication.

Fast: each instance supports one thousand writes per second.

Reliable: distribution is implemented using the Raft consensus algorithm.
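The "simple" point above is worth making concrete: because the etcd v2 keys API is plain HTTP + JSON, a request is nothing more than a URL and a small JSON body. The sketch below (Python) only builds those request pieces; the endpoint is the conventional local default client port, not a live server:

```python
# Sketch: the etcd v2 keys API is plain HTTP + JSON, which is why a bare
# curl command is enough to use it. This builds the request pieces only;
# nothing is sent over the network.

import json

ETCD_ENDPOINT = "http://127.0.0.1:2379"  # conventional default client port

def put_request(key: str, value: str):
    """Equivalent of: curl -X PUT <endpoint>/v2/keys/<key> -d value=<value>"""
    url = f"{ETCD_ENDPOINT}/v2/keys/{key.lstrip('/')}"
    body = {"value": value}
    return url, body

url, body = put_request("/services/redis-master", "10.0.0.11:6379")
print(url)
print(json.dumps(body))
```

Registering a service is just a PUT of its address under a well-known key, and discovery is a GET of the same key, which is what makes etcd a natural service discovery store.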

Layer-4 service discovery

Let’s first explain the seven-layer protocol model in one diagram:

K8s provides two ways to do service discovery:

Environment variables: when a Pod is created, Kubelet injects environment variables for all Services in the cluster into the Pod. Note that for a Service’s environment variables to be injected into a Pod, the Service must have been created before the Pod. This ordering requirement makes the approach almost unusable for service discovery.

For example, if the ServiceName of a Service is redis-master and its ClusterIP:Port is 10.0.0.11:6379, the corresponding environment variables include:

REDIS_MASTER_SERVICE_HOST=10.0.0.11
REDIS_MASTER_SERVICE_PORT=6379

DNS: you can easily create a KubeDNS cluster add-on to discover Services in the cluster.

Of these two methods, one is based on TCP and, as we all know, DNS is based on UDP; both are built on top of layer-4 protocols.
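The two discovery paths can be sketched in a few lines of Python: first look for the injected Service environment variables, then fall back to the cluster DNS name. The service name redis-master and its address come from the example in the text; the environment is simulated here rather than taken from a real Pod:

```python
# Sketch of the two discovery paths: injected Service environment
# variables first, cluster DNS name as the fallback. The env vars are
# simulated below; in a real Pod, kubelet injects them.

import os

def discover(service_name: str, namespace: str = "default"):
    """Return (host, port) from env vars, else the cluster DNS name."""
    prefix = service_name.upper().replace("-", "_")
    host = os.environ.get(f"{prefix}_SERVICE_HOST")
    port = os.environ.get(f"{prefix}_SERVICE_PORT")
    if host and port:
        return host, int(port)
    # KubeDNS convention: <service>.<namespace>.svc.cluster.local
    return f"{service_name}.{namespace}.svc.cluster.local", None

# Simulate the variables kubelet would inject for redis-master:
os.environ["REDIS_MASTER_SERVICE_HOST"] = "10.0.0.11"
os.environ["REDIS_MASTER_SERVICE_PORT"] = "6379"

print(discover("redis-master"))  # -> ('10.0.0.11', 6379)
print(discover("mysql"))         # -> ('mysql.default.svc.cluster.local', None)
```

The DNS fallback is why the ordering problem disappears with KubeDNS: the name resolves whenever the Service exists, regardless of when the Pod was created.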

Five kinds of Pod shared resources

A Pod is the most basic operational unit of K8s. It contains one or more closely related containers and can be regarded as the “logical host” of the containerized environment at the application layer. Multiple container applications within a Pod are usually tightly coupled, and Pods are created, started, or destroyed on a Node. Each Pod also runs a special container called Pause; the other, business containers share the Pause container’s network stack and Volume mounts, which makes communication and data exchange between them more efficient. At design time we can take advantage of this by putting a group of closely related service processes into the same Pod.

Containers in the same Pod can communicate with each other simply by using localhost.

The application containers in a Pod share five resources:

PID namespace: different applications in the Pod can see each other’s process IDs.

Network namespace: Multiple containers in a Pod can access the same IP and port range.

IPC namespace: multiple containers in a Pod can communicate using System V IPC or POSIX message queues.

UTS namespace: multiple containers in a Pod share the same hostname.

Volumes: Each container in a Pod can access Volumes defined at the Pod level.

The Pod lifecycle is managed through a Replication Controller; the Pod is defined by a template and then assigned to run on a Node. The Pod ends when the containers it contains finish running.

Kubernetes designed a unique network configuration for Pods: each Pod is assigned its own IP address, and the Pod name is used as the hostname during communication.

Six common CNI plug-ins

Container Network Interface (CNI) is a set of standards and libraries for configuring Linux container networks. Users develop their own container network plug-ins based on these standards and libraries. CNI focuses only on network connectivity when a container is created and resource release when the container is destroyed; because it provides only this framework, CNI can support a large number of different network modes and is easy to implement.

Here is a diagram of six commonly used CNI plug-ins:
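Whatever the plugin (Flannel, Calico, and so on), each one implements the same narrow contract: the container runtime passes the operation via environment variables such as CNI_COMMAND and the network configuration as JSON on stdin, and the plugin prints a JSON result. The Python sketch below illustrates that dispatch shape only; the IP address it returns is invented, and a real plugin would actually create interfaces and allocate addresses:

```python
# Sketch of the contract a CNI plugin implements: an operation name
# (ADD/DEL, normally delivered in the CNI_COMMAND environment variable)
# plus a JSON network configuration in, a JSON result out. The returned
# address is illustrative, not a real allocation.

import json

def handle(command: str, netconf: dict) -> dict:
    if command == "ADD":
        # A real plugin would create the veth/bridge and allocate an IP here.
        return {"cniVersion": netconf.get("cniVersion", "0.4.0"),
                "ips": [{"address": "10.244.0.5/24"}]}  # made-up address
    if command == "DEL":
        # Release whatever resources ADD acquired.
        return {}
    raise ValueError(f"unsupported CNI_COMMAND: {command}")

conf = {"cniVersion": "0.4.0", "name": "mynet", "type": "bridge"}
print(json.dumps(handle("ADD", conf)))
```

Because the contract is this small, the six plugins in the diagram can implement radically different network modes (overlays, BGP routing, simple bridges) behind the same interface.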

Layer 7 load balancing

To talk about load balancing, you have to talk about communication between servers.

An Internet Data Center (IDC), also known as a data center or server room, is where servers are housed. The IDC network is the communication bridge between servers.

There are many network devices in the picture above. What are they used for?

Routers, switches, and MGW/NAT are all network devices; they play different roles depending on performance and on intranet/extranet division.

Intranet access switch: also known as Top of Rack (TOR), the device that connects servers to the network. Each intranet access switch connects 40-48 servers and uses a subnet with a /24 mask as the servers’ internal network segment.

Intranet core switch: forwards traffic from the intranet access switches as well as inter-IDC traffic.

MGW/NAT: MGW (LVS) is used for load balancing, and NAT is used for address translation when Intranet devices access the Internet.

Extranet core router: connects to Meituan’s unified extranet platform through static interconnection with carriers or via BGP.

Let’s talk about load balancing at each layer:

Layer-2 load balancing: based on MAC addresses.

Layer-3 load balancing: based on IP addresses.

Layer-4 load balancing: based on IP + port.

Layer-7 load balancing: based on application-layer information such as the URL.

Here is a diagram to illustrate the difference between layer 4 and layer 7 load balancing:
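The difference can also be shown in code: a layer-4 balancer must pick a backend from the connection tuple (IP and port) alone, before any application data is read, while a layer-7 balancer can inspect the URL. The following Python sketch uses invented backend addresses and a hash-based layer-4 choice for illustration:

```python
# Sketch contrasting the two layers. Backend addresses are made up.

import hashlib

BACKENDS = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]

def l4_pick(client_ip: str, client_port: int) -> str:
    """Layer 4: choose a backend from the connection tuple only."""
    key = f"{client_ip}:{client_port}".encode()
    return BACKENDS[int(hashlib.md5(key).hexdigest(), 16) % len(BACKENDS)]

def l7_pick(url_path: str) -> str:
    """Layer 7: route on application-layer content (the URL)."""
    if url_path.startswith("/static/"):
        return "10.0.2.1"   # e.g. a static/cache tier
    return "10.0.1.1"       # application tier

print(l4_pick("203.0.113.7", 54321))
print(l7_pick("/static/logo.png"))  # -> 10.0.2.1
```

Layer 4 is cheaper (no need to parse HTTP), which is why LVS/MGW sits at that layer; layer 7 is more expressive, which is what Nginx and Ingress exploit.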

The layer-4 service discovery above mainly covers K8s’ native kube-proxy mode. K8s exposes services mainly through NodePort: a port is bound on the minion host, and requests are then forwarded and load-balanced to the Pods. This method has the following defects:

There may be many Services. If each Service binds a port on the Node host, the host has to open a large number of ports for external service invocation, which makes management chaotic.

The firewall rules that many companies require cannot be applied.

Ideally, an external load balancer would bind to a fixed port, such as 80, and forward requests to the Service IPs behind it based on the domain name or Service name. Nginx solves this requirement well, but the problem is how to modify and reload the Nginx configuration whenever a new Service is added. Kubernetes’ answer is Ingress, which is a layer-7 scheme.
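What an Ingress rule set expresses is essentially a routing table from (host, path) to Service, evaluated at a single external entry point. The Python sketch below illustrates that idea; the hostnames and service names are invented for illustration, not real Ingress syntax:

```python
# Sketch of the routing an Ingress controller performs: one external
# entry point fanning out to Services by host and path prefix. All
# hosts and service names below are invented.

RULES = [
    # (host, path_prefix, service)
    ("shop.example.com", "/api", "shop-api-svc"),
    ("shop.example.com", "/",    "shop-web-svc"),
    ("blog.example.com", "/",    "blog-svc"),
]

def route(host: str, path: str) -> str:
    """Return the target Service for a request; first matching rule wins."""
    for rule_host, prefix, service in RULES:
        if host == rule_host and path.startswith(prefix):
            return service
    return "default-backend"

print(route("shop.example.com", "/api/cart"))  # -> shop-api-svc
print(route("blog.example.com", "/post/1"))    # -> blog-svc
```

Because the controller watches the API server, adding a Service just means adding a rule; no host ports are consumed and no manual Nginx reload is needed, which is exactly the NodePort defect list solved.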

Eight isolation dimensions

The K8s cluster scheduler needs to apply corresponding scheduling policies for each isolation dimension, from coarse-grained to fine-grained, top to bottom.

Nine network model principles

The K8s network model should conform to four basic principles, three network requirements principles, one architecture principle and one IP principle.

Each Pod has its own IP address, and all Pods are assumed to live in a directly connected, flat network space, so a Pod can be reached through its IP regardless of whether it runs on the same Node.

The Pod IP in K8s is the minimum-granularity IP. All containers in a Pod share a single network stack; this is called the IP-per-Pod model.

The Pod’s IP is actually assigned by the Docker0 bridge.

The IP address and port seen inside a Pod are the same as those seen from outside the Pod.

Different containers in the same Pod share a network and can access each other’s ports through localhost, similar to different processes in the same VM.

In the IP-per-Pod model, a Pod can be regarded as an independent VM or physical machine in terms of port allocation, domain name resolution, service discovery, load balancing, and application configuration.

All containers can communicate with other containers without NAT.

All nodes can communicate with all containers without NAT, and vice versa.

The address a container sees for itself is the same address that others see for it.

To conform to the following structure:

Ten types of IP addresses

As everyone knows, IP addresses are divided into classes A through E, and there are five other kinds of special-purpose IP addresses.

The first type

Class A: 1.0.0.0-126.255.255.255. The default subnet mask is /8, i.e., 255.0.0.0.

Class B: 128.0.0.0-191.255.255.255. The default subnet mask is /16, i.e., 255.255.0.0.

Class C: 192.0.0.0-223.255.255.255. The default subnet mask is /24, i.e., 255.255.255.0.

Class D: 224.0.0.0-239.255.255.255. This class is used for multicast.

Class E: 240.0.0.0-255.255.255.255 (255.255.255.255 is the all-networks broadcast address). Class E addresses are generally reserved for research purposes.
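The classful ranges above are determined entirely by the first octet, so classifying an address is a few comparisons. A minimal Python sketch:

```python
# Sketch: classify an IPv4 address into classes A-E by its first octet,
# following the ranges listed above.

def ip_class(addr: str) -> str:
    first = int(addr.split(".")[0])
    if 1 <= first <= 126:
        return "A"
    if 128 <= first <= 191:
        return "B"
    if 192 <= first <= 223:
        return "C"
    if 224 <= first <= 239:
        return "D"
    if 240 <= first <= 255:
        return "E"
    return "special"  # 0.x.x.x and 127.x.x.x fall outside the five classes

print(ip_class("10.0.0.1"))      # -> A
print(ip_class("192.168.1.1"))   # -> C
print(ip_class("224.0.0.1"))     # -> D
```

Note that 0.x.x.x and the 127.x.x.x loopback block are deliberately left out of the A-E ranges; they are among the special-purpose addresses described next.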

The second type

0.0.0.0: strictly speaking, 0.0.0.0 is not a real IP address. It represents the set of all unknown hosts and destination networks, where unknown means there is no specific entry in the local routing table saying how to reach them; it serves as the default route. 127.0.0.1: the local loopback address.

The third type

224.0.0.1: a multicast address. If your host has IRDP enabled (Internet Router Discovery Protocol, which uses multicast), you should see such a route in your host routing table.

The fourth type

169.254.x.x: if the DHCP server fails or responds too slowly, a host that cannot automatically obtain an IP address assigns itself an address in this range, which indicates that the network cannot work normally.

The fifth type

10.x.x.x, 172.16.x.x to 172.31.x.x, and 192.168.x.x: private addresses, used heavily inside enterprises. These addresses are reserved to avoid address confusion when hosts access the public network.
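Python’s standard ipaddress module can check these private ranges directly with the is_private property (which also covers a few other reserved blocks, such as loopback and link-local), a quick way to verify an address before assuming it is routable on the public Internet:

```python
# Checking the private ranges above with the standard library.
# is_private also returns True for other reserved blocks (loopback,
# link-local 169.254.x.x), not only the three RFC 1918 ranges.

import ipaddress

for addr in ["10.1.2.3", "172.16.0.1", "192.168.1.1", "8.8.8.8"]:
    print(addr, ipaddress.ip_address(addr).is_private)
```

The first three addresses print True and the public 8.8.8.8 prints False.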