Whether you’ve already deployed and benefited from a number of services on Kubernetes, or you’re just about to start using it, there are plenty of tools available to start and manage Kubernetes clusters — but I suspect you’re still curious about the theory behind Kubernetes. What do you do when the cluster fails?
Using Kubernetes is simple, but we must be clear that the principles behind it are complex: it is composed of a large number of modules and plug-ins. If you want to be comfortable handling the problems that come your way, it’s important to understand what these modules are responsible for and what the plug-ins do. The most complex and important part of Kubernetes is the network.
To understand exactly how Kubernetes’ network works, I read a lot of documentation, participated in offline discussions, and even read a lot of source code. Here are some of my summaries of the Kubernetes network:
The Kubernetes network model
The Kubernetes network has an important design principle: each Pod has a unique IP address.
This IP address is shared by all the containers inside the Pod and is routable from other Pods. Have you noticed the pause container on your Kubernetes cluster nodes? Its job is to create and hold the shared network namespace (netns) for the containers inside the Pod, which is why it is also called the “sandbox container.” This means the Pod’s IP address does not change when a container dies and a replacement container is created. Another advantage of this IP-per-pod model is that we don’t have to worry about IP address or port conflicts between Pods on the same host, and applications don’t have to coordinate which ports they use.
To recap, Kubernetes only requires that a Pod’s IP address be routable and reachable from other Pods, regardless of which node the Pod is running on.
Intra-node communication
The first step is to ensure that Pods on the same node can communicate with each other. This is the basis both for inter-Pod communication across nodes and for containers inside a Pod reaching external networks.
In a sense, we can think of a Kubernetes node as a Linux machine with a root network namespace (“root” here means the host’s base netns, not the root user).
The node’s main network interface, eth0, lives in this root netns.
Similarly, each Pod has its own netns and communicates with the host’s root netns through a virtual Ethernet pair (veth pair). One end of the pair sits in the host’s root netns and the other end in the Pod’s netns, and they talk to each other like the two ends of a pipe.
The end of the veth pair inside the Pod’s netns is named eth0; the Pod knows nothing about the host side. The host-side end of the pair, in the root netns, is named something like vethxxx. You can run the ifconfig or ip link commands to view all network devices on the host.
The communication between a Pod and its host has been illustrated above. For communication between different Pods on the same host, a bridge device, cbr0, is used. Docker uses a similar bridge device called docker0; you can list all bridges with the brctl show command.
Suppose a network packet travels from pod1 to pod2:
- The packet leaves pod1 through pod1’s eth0 and enters the host’s root netns via vethxxx
- The packet reaches the cbr0 bridge, which discovers the destination by broadcasting an ARP request (“who has this destination IP address?”)
- vethyyy replies to the ARP request saying it knows the destination address, so cbr0 forwards the packet to vethyyy
- The packet arrives at vethyyy and travels through the veth pair into pod2’s netns
This is how containers in different Pods on the same node communicate with each other in Kubernetes. There are other communication schemes, but this is by far the simplest.
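The same-node delivery steps above can be sketched as a toy model. This is purely illustrative — a real Linux bridge forwards Ethernet frames by MAC address, and the names cbr0, vethxxx/vethyyy, and the 10.244.x.x addresses are just the examples used in this article:

```python
# Toy model of same-node pod-to-pod delivery via a bridge.
# A real bridge works on MAC addresses and frames; here we shortcut
# straight from pod IPs to veth ports for clarity.

class Bridge:
    def __init__(self, name):
        self.name = name
        self.ports = {}  # veth device name -> IP of the pod behind it

    def attach(self, veth, pod_ip):
        self.ports[veth] = pod_ip

    def arp_lookup(self, dst_ip):
        # "Who has dst_ip?" -- the port whose pod owns that IP answers.
        for veth, ip in self.ports.items():
            if ip == dst_ip:
                return veth
        return None  # no answer: the destination is not on this node

    def forward(self, src_ip, dst_ip):
        veth = self.arp_lookup(dst_ip)
        if veth is None:
            return f"{dst_ip} not on this node; send out via host eth0"
        return f"deliver {src_ip} -> {dst_ip} via {veth}"

cbr0 = Bridge("cbr0")
cbr0.attach("vethxxx", "10.244.1.2")  # pod1
cbr0.attach("vethyyy", "10.244.1.3")  # pod2

print(cbr0.forward("10.244.1.2", "10.244.1.3"))
# deliver 10.244.1.2 -> 10.244.1.3 via vethyyy
```

Note how an unanswered ARP lookup naturally leads to the cross-node case discussed next: the packet has to leave through the host’s eth0.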
Communication across nodes
As I mentioned earlier, Pods also need to communicate across nodes. Kubernetes doesn’t care how: L2 (cross-node ARP), L3 (cross-node IP routing, such as the route tables provided by cloud service providers), an overlay network, or even carrier pigeons — as long as network packets can reach the Pod on the target node. Each node is assigned its own CIDR block for Pod IPs, so every Pod gets a unique IP address and we don’t have to worry about address conflicts between nodes.
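With Python’s ipaddress module we can see why per-node CIDR blocks guarantee unique Pod IPs cluster-wide. The 10.244.0.0/16 range below is just a commonly used default; the actual ranges depend on how your cluster is configured:

```python
import ipaddress

# A cluster-wide Pod CIDR, carved into one /24 block per node.
cluster_cidr = ipaddress.ip_network("10.244.0.0/16")
node_cidrs = list(cluster_cidr.subnets(new_prefix=24))  # 256 node blocks

node1, node2 = node_cidrs[0], node_cidrs[1]
print(node1)  # 10.244.0.0/24
print(node2)  # 10.244.1.0/24

# Per-node blocks never overlap, so Pod IPs are unique across the cluster.
assert not node1.overlaps(node2)

# A Pod IP allocated on node2 falls inside node2's block only.
pod4_ip = ipaddress.ip_address("10.244.1.9")
assert pod4_ip in node2 and pod4_ip not in node1
```

Because an IP uniquely identifies one node’s block, a router only needs one route per node — not one per Pod — to deliver traffic.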
In most cases (especially in cloud environments), the cloud provider supplies route tables that ensure packets reach their destination. In addition, the community provides a large number of network plug-ins that achieve the same result by configuring the corresponding routes on each node.
As in the example above, each node has a number of network namespaces, network interfaces, and a bridge.
Assume a network packet travels from pod1 to pod4 (on a different node):
1. The packet leaves pod1 through pod1’s eth0 device and enters node1’s root netns via vethxxx
2. The packet reaches the cbr0 bridge, which broadcasts an ARP request for the destination address
3. No device on this node answers the ARP request (the destination IP is not local), so the packet leaves through the host’s eth0 interface onto the wire
4. The packet leaves node1 and reaches the cloud service provider’s router (the default gateway)
5. The route table has an entry mapping each node’s Pod CIDR to that node, so the router forwards the packet to node2, whose CIDR block contains pod4’s IP address
6. The packet enters node2 through eth0. Although pod4’s IP address is not eth0’s address, node2 is configured with a route for its Pod CIDR whose next hop is the bridge cbr0, so the packet is forwarded to cbr0. You can run the route -n command to list cbr0’s routing entry
7. cbr0 receives the packet and sends an ARP request, discovering that the destination IP address belongs to vethyyy
8. Finally, the packet reaches pod4 🏠
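The routing decision on the destination node is a longest-prefix match: the route for the node’s Pod CIDR is more specific than the default route, so packets addressed to local Pods go to cbr0 while everything else goes out eth0. A minimal sketch with Python’s ipaddress — the routes and device names mirror this article’s example and are illustrative, not a real node’s table:

```python
import ipaddress

# A simplified routing table for node2: (destination network, outgoing device).
routes = [
    ("0.0.0.0/0",     "eth0"),  # default route
    ("10.244.1.0/24", "cbr0"),  # this node's Pod CIDR -> bridge
]

def lookup(dst_ip, routes):
    """Return the device of the most specific (longest-prefix) matching route."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [(ipaddress.ip_network(net), dev)
               for net, dev in routes
               if dst in ipaddress.ip_network(net)]
    net, dev = max(matches, key=lambda m: m[0].prefixlen)
    return dev

print(lookup("10.244.1.9", routes))  # pod4's IP -> cbr0
print(lookup("8.8.8.8", routes))     # any other destination -> eth0
```

This is exactly what the kernel’s forwarding table does when route -n shows an entry for the Pod CIDR pointing at the bridge.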
That’s the veth pair, the foundation of the Kubernetes network. Next we’ll look at how overlay networks work, how Services are abstracted over Pods, and the workflows of outbound (egress) and inbound (ingress) traffic.
This article is a lightly abridged translation of “An illustrated guide to Kubernetes Networking [Part 1]”.