Author: Simon, Senior Technical Expert at Alibaba
I. The Kubernetes basic network model
This article introduces Kubernetes' ideas about the network model. As you may know, Kubernetes places no restrictions on the specific implementation of the network, nor does it ship a reference implementation. What it does impose are the conditions a container network must meet to qualify, namely the Kubernetes container network model. It boils down to three rules and four goals.
- When evaluating or designing a container network, ask about its admission conditions: which three rules must it satisfy to count as a qualified network solution?
- The four goals mean that when designing the network topology and the network's concrete functions, we should think through whether connectivity and the other key requirements can actually be achieved.
The three rules
Let's first look at the three rules:
- First: any two pods can communicate directly with each other, without explicit use of NAT to receive data or translate addresses;
- Second: nodes and pods can communicate directly, without explicit address translation;
- Third: the IP a pod sees itself as is the same IP that others see it as, with no translation in between.
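As a toy illustration of the third rule, here is a small sketch, not Kubernetes code: a client compares the address it believes it uses with the peer address the server actually observes. On loopback the two trivially match; in a compliant pod network the same check would hold across nodes, because no NAT sits on the path.

```python
import socket
import threading

# Toy no-NAT check: the address the client thinks it uses should be
# exactly the address the server observes. Both ends here are on
# loopback, which is an illustrative stand-in for two pods.

def serve(server_sock, seen):
    conn, peer = server_sock.accept()
    seen.append(peer)          # (ip, port) as observed by the server
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))  # ephemeral port
server.listen(1)
seen = []
t = threading.Thread(target=serve, args=(server, seen))
t.start()

client = socket.socket()
client.connect(server.getsockname())
local = client.getsockname()   # (ip, port) as the client believes
t.join()
client.close()
server.close()

print(local == seen[0])  # True: no address translation in between
```

If a NAT device sat between the two sockets, the comparison would fail, which is exactly what the third rule forbids.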
Next I will explain my own understanding of why Kubernetes imposes these seemingly arbitrary models and requirements on container networks.
The four goals
The four goals really mean this: when designing a K8s system that serves the outside world, we should think through, from the network's perspective, how the outside world connects, step by step, to the application inside the container:
- How does the outside world communicate with a Service? If there is a user on the Internet or outside the company, how do they consume a Service? (Service here refers to the Service concept in K8s.)
- How does a Service communicate with its back-end pods?
- How do pods communicate with each other?
- Finally, how do the containers inside a pod communicate?
The ultimate goal is that the outside world can connect all the way to the innermost layer and be served by the container.
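The second hop above, a Service forwarding to its back-end pods, can be sketched as follows. This is a hedged model, not kube-proxy's actual mechanism (which uses iptables or IPVS rules rather than user-space selection); the cluster IP and endpoint addresses are illustrative assumptions.

```python
import itertools

# Simplified model of a Service: a stable virtual IP in front of a
# rotating set of back-end pod IPs.

class Service:
    def __init__(self, cluster_ip, endpoints):
        self.cluster_ip = cluster_ip           # virtual IP clients dial
        self._rr = itertools.cycle(endpoints)  # back-end pod IPs

    def pick_backend(self):
        # Real kube-proxy programs the kernel instead of picking here;
        # round-robin is just one possible balancing behavior.
        return next(self._rr)

svc = Service("10.96.0.10", ["172.16.1.2", "172.16.2.3"])
picks = [svc.pick_backend() for _ in range(4)]
print(picks)  # ['172.16.1.2', '172.16.2.3', '172.16.1.2', '172.16.2.3']
```

The point of the sketch is the indirection: clients only ever know the cluster IP, while the set of pod IPs behind it can change freely.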
An explanation of the basic constraints
One way to interpret the basic constraints is this: the complexity of container network development lies in the fact that it is parasitic on the host network. From this perspective, container network solutions can be broadly divided into **Underlay** and **Overlay**:
- Underlay's criterion is that it sits in the same layer as the host network. One visible sign is whether it uses the same network segment and input/output infrastructure as the host, and whether the container's IP address must be coordinated with the host network (allocated from the same central authority, or from a unified partition). If so, it is an Underlay;
- An Overlay differs in that it does not need to request IP addresses from the host network's IPAM component. Generally it only requires that its addresses not conflict with the host network, and they can be assigned freely.
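The Overlay style of address management can be sketched with the standard library's `ipaddress` module. This is a hedged illustration, not any particular plugin's IPAM code; the CIDR sizes are illustrative assumptions (a /16 cluster range carved into /24 per-node slices is a common Flannel-style default).

```python
import ipaddress

# Overlay-style IPAM sketch: pod subnets are carved out of a
# cluster-wide CIDR with no coordination with the host network,
# beyond not overlapping it.

cluster_cidr = ipaddress.ip_network("10.244.0.0/16")
# hand each node a /24 slice for its pods
node_subnets = cluster_cidr.subnets(new_prefix=24)

node_a = next(node_subnets)
node_b = next(node_subnets)
print(node_a, node_b)  # 10.244.0.0/24 10.244.1.0/24

host_net = ipaddress.ip_network("192.168.1.0/24")
# the only real constraint: don't collide with the host network
print(cluster_cidr.overlaps(host_net))  # False
```

An Underlay solution, by contrast, would have to draw the pod addresses from the host network's own ranges, which is exactly the coordination the Overlay avoids.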
II. Exploring Netns
What exactly does Netns achieve
The following is a brief introduction to the kernel foundation of the Network Namespace. Narrowly speaking, runC container technology does not depend on any hardware; its execution basis is the kernel. The kernel's representation of a process is the task, and a task that needs no special isolation uses the host's default spaces through a namespace-proxy data structure (nsproxy).
Conversely, if a task has an independent network or mount namespace, the nsproxy is filled with its own private data. The data structures it can see are as shown above.
In the sense of an isolated network space, it has its own network interfaces or network devices (a NIC can be virtual or physical), its own IP addresses, routing table, and protocol-stack state. The TCP/IP stack has its own state, and so do iptables and IPVS.
In effect, this is equivalent to having a completely separate network, isolated from the host network. Of course, the protocol-stack code is still shared; only the data structures are distinct.
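On Linux you can observe which network namespace a task lives in through `/proc`: each entry under `/proc/<pid>/ns` names a namespace by inode, and two processes printing the same `net:[...]` inode share one network stack. A minimal sketch (Linux-only; on other systems it simply skips):

```python
import os
import platform

# Each task's namespaces appear under /proc/<pid>/ns; the "net" entry
# identifies the network namespace the process belongs to.
if platform.system() == "Linux":
    ns = os.readlink("/proc/self/ns/net")
    print(ns)  # something like net:[4026531992]; same inode means same netns
else:
    ns = None  # /proc namespace links are a Linux kernel feature
```

Tools like `ip netns` and container runtimes ultimately manipulate exactly these kernel objects.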
Relationship between Pod and Netns
This diagram clearly shows the relationship between pods and Netns. Each pod has its own independent network space, which all containers in the pod share. Generally, K8s recommends that containers within a pod communicate over the loopback interface, while all containers expose services externally through the pod's IP. In addition, the root Netns on the host can be treated as a special network space, the one the host's pid-1 process lives in.
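The shared-netns behavior can be modeled in a few lines: because two containers in one pod see the same loopback interface, one can reach the other on `127.0.0.1` directly. The "app" and "sidecar" roles below are illustrative assumptions, with two threads standing in for the two containers.

```python
import socket
import threading

# Model of two containers in one pod: they share the pod's network
# namespace, so the sidecar reaches the app over localhost, no pod IP
# or service discovery needed.

def app_container(server_sock):
    conn, _ = server_sock.accept()
    conn.sendall(b"hello from app")
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))   # the pod's shared loopback interface
srv.listen(1)
threading.Thread(target=app_container, args=(srv,)).start()

# "sidecar" container: same netns, so localhost reaches the app
sidecar = socket.socket()
sidecar.connect(srv.getsockname())
reply = sidecar.recv(64)
sidecar.close()
srv.close()
print(reply.decode())  # hello from app
```

This is also why two containers in the same pod cannot bind the same port: they are binding into one shared stack.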
III. Introduction to mainstream network solutions
Typical container network implementations
The following is a brief introduction to typical container network implementations. Container networking is probably the part of K8s where competing solutions have blossomed the most, with a wide variety of implementations. The complexity of container networking lies in the need to coordinate with the underlying IaaS-layer network, and in the trade-offs between performance and flexibility of IP allocation; hence the many different solutions.
Here is a brief look at several major solutions: Flannel, Calico, Canal, and finally WeaveNet. Most other solutions adopt a policy-routing approach similar to Calico's.
- **Flannel** is a relatively well-rounded solution that provides multiple network backends. Different backends implement different topologies, and together they cover many scenarios;
- **Calico** uses policy routing, with BGP synchronizing routes between nodes. It is feature-rich, with especially good support for Network Policy. Calico does place requirements on the underlying network: it generally requires that MAC addresses be directly reachable, i.e. that traffic not cross a layer-2 domain boundary;
- Of course, some people in the community have tried to combine Flannel's strengths with Calico's; this graft-style project is called Canal;
- Finally, WeaveNet: if you need to encrypt data in transit, you can choose WeaveNet. Its dynamic key scheme achieves good encryption.
The Flannel solution
The Flannel solution is currently the most widely used. As shown in the figure above, it depicts a typical container network scenario. Flannel first solves how the container's packet reaches the host, here by adding a bridge. Its backend is pluggable: how the packet then leaves the host, and what encapsulation (if any) it uses, is selectable.
There are three main backends:
- The first is user-space UDP, which is the earliest implementation;
- Then there is kernel VXLAN; both of these are overlay solutions. VXLAN performs better, but it requires a kernel version that supports VXLAN's features;
- If your cluster is not large and sits within the same layer-2 domain, you can also use the host-gw mode. In this mode the backend essentially just installs direct routing rules, and performance is high.
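What host-gw amounts to can be sketched as a plain routing table: each node installs routes of the form "pod CIDR of node X, via node X's host IP", with no encapsulation at all. This is a hedged model, not Flannel's code; the CIDRs and node IPs are illustrative assumptions.

```python
import ipaddress

# host-gw sketch: routes from remote pod subnets to the owning node's
# host IP. This only works when the next hop is directly reachable,
# i.e. all nodes share one layer-2 domain.

routes = {
    ipaddress.ip_network("10.244.1.0/24"): "192.168.1.11",  # node B
    ipaddress.ip_network("10.244.2.0/24"): "192.168.1.12",  # node C
}

def next_hop(dst_ip):
    dst = ipaddress.ip_address(dst_ip)
    for cidr, gateway in routes.items():
        if dst in cidr:
            return gateway
    return None  # no route: destination is not a known pod subnet

print(next_hop("10.244.2.7"))  # 192.168.1.12
```

Because packets are forwarded as-is, this mode avoids the encapsulation overhead of UDP or VXLAN, which is where its performance advantage comes from.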
IV. Using Network Policy
Basic concepts of Network Policy
The following describes the concept of Network Policy.
As mentioned earlier, the basic Kubernetes network model requires full interconnection between pods. That creates a problem: in a K8s cluster, some call chains should not be directly reachable. For example, between two departments, we may want Department A not to access Department B's services; this is where the concept of a policy comes in.
The basic idea is this: it uses various selectors (labels or namespaces) to find a group of pods, i.e. the two ends of a communication, and then uses a description of the traffic's characteristics to determine whether those ends may connect. It can be understood as a whitelist mechanism.
Before using Network Policy, note that the apiserver needs the relevant switches turned on, as shown above. Another, more important point is that the network plugin you choose must support enforcing Network Policy. Network Policy is only an object that K8s provides; there is no built-in component to enforce it. Whether your policies take effect, and how completely, depends on the container network solution you choose. If you choose Flannel, for example, it does not actually implement Network Policy, so writing policies gets you nowhere.
A configuration example
Next, let's look at a configuration example: what do you actually do when designing a Network Policy? I personally feel three things need to be decided:
- First, the objects to control, like the spec part of this example. In the spec, a podSelector or namespaceSelector selects the particular group of pods to be controlled;
- Second, the direction of the traffic to control: ingress, egress, or both;
- Third, and most important: for the chosen direction, describe against the controlled objects which specific flows may be let in or out. Analogous to a five-tuple of flow characteristics: selectors decide which pods may be the remote end (object selection); the ipBlock mechanism decides which IP addresses are allowed; and finally, which protocols or ports. Together, these flow characteristics form something like a five-tuple that selects the particular traffic we accept.
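The three decisions above can be sketched as a simplified whitelist check. This is a hedged model, not the real NetworkPolicy evaluator (which the network plugin implements in the data plane); the labels and port are illustrative assumptions.

```python
# Simplified NetworkPolicy model: selected pods, a direction, and a
# whitelist of allowed peers and ports.

policy = {
    "pod_selector": {"app": "db"},      # decision 1: who is protected
    "direction": "ingress",             # decision 2: which flows
    "allow": {
        "from_labels": {"app": "web"},  # decision 3a: peer selection
        "ports": [5432],                # decision 3b: protocol/port part
    },
}

def allowed(policy, dst_labels, src_labels, port):
    # Pods not selected by the policy are untouched (default allow).
    selector = policy["pod_selector"]
    if not all(dst_labels.get(k) == v for k, v in selector.items()):
        return True
    rule = policy["allow"]
    peer_ok = all(src_labels.get(k) == v
                  for k, v in rule["from_labels"].items())
    return peer_ok and port in rule["ports"]

print(allowed(policy, {"app": "db"}, {"app": "web"}, 5432))    # True
print(allowed(policy, {"app": "db"}, {"app": "batch"}, 5432))  # False
```

Note the whitelist semantics: once a pod is selected by any policy, only traffic explicitly matched by some rule is allowed through.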
Summary
That's the end of the main content. Let's briefly summarize:
- In the pod container network, the core concept is the IP. The IP is the basic address with which each pod communicates externally; it must be consistent inside and outside the pod and conform to the K8s model's requirements;
- Topology is the most important factor affecting container network performance. You should understand how your packets travel end to end: how they get from the container to the host, whether the host encapsulates and decapsulates them or uses policy routing, and how they reach the far end;
- On container network selection and design: if you don't know enough about the external network, or need the most universal solution (for example, you cannot tell whether MACs are directly reachable, or whether the external routers' routing tables can be controlled), choose Flannel with the VXLAN backend. If you are sure your network is layer-2 reachable, you can choose Calico or Flannel with the host-gw backend;
- Finally, Network Policy is a powerful tool in operations and applications: it enables precise control of ingress and egress flows. The way to use it is to figure out which objects you want to control and how the allowed flows are defined.
V. Food for thought
A few final thoughts for you to consider:
- Why is the interface standardized as CNI, yet there is no standard container network implementation built into K8s?
- Why is Network Policy not backed by a standard controller or standard implementation, but instead left to the provider of the container network?
- Is it possible to implement a container network without any network devices at all? Consider RDMA and other alternatives to TCP/IP.
- Many network problems arise during operations, and they are difficult to troubleshoot. Would it be worth developing an open-source tool that clearly displays the network state at each stage (container to host, host to host, before and after encapsulation) so problems can be located quickly? As far as I know, no such tool exists yet.
The above is my introduction to the basic concepts of the K8s container network and Network Policy.
Alibaba Cloud Native WeChat official account (ID: Alicloudnative) focuses on technical fields such as microservices, Serverless, containers, and Service Mesh, follows popular cloud-native technology trends and large-scale cloud-native production practices, and aims to be the official account that best understands cloud-native developers.