Author | East, after-sales technical expert at Alibaba Cloud

Introduction: Alibaba Cloud K8S clusters currently offer two network schemes. One is Flannel; the other is Terway, which is based on Calico and ENI (elastic network interfaces). Terway is similar to Flannel, except that Terway also supports assigning elastic network interfaces to Pods and supports NetworkPolicy. Based on the current version, 1.12.6, this article takes Flannel as an example and analyzes in depth how the Alibaba Cloud K8S cluster network is implemented.

A bird's-eye view

Generally speaking, once the network of an Alibaba Cloud K8S cluster has been configured, it looks like the following figure, which includes the cluster CIDR, the VPC routing table, the node network, each node's podCIDR, the virtual bridge cni0 on each node, and the veth pairs that connect Pods to the bridge.

You have probably seen this diagram in many articles, but the configuration is complex enough that it is hard to understand. Let us look at the logic behind these configurations.

Basically, we can view these configurations at three levels: cluster configuration, node configuration, and Pod configuration. These three levels correspond to three successive partitions of the cluster network's IP range: first the cluster CIDR, then the podCIDR (a subnet of the cluster CIDR) assigned to each node, and finally the individual IP address assigned to each Pod within its node's podCIDR.
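The three-level partition can be checked with a few lines of Python. This is only a sketch: the cluster CIDR 172.16.0.0/16 and the Pod address are assumptions made for illustration, while 172.16.0.128/25 is the podCIDR the article gives for Node A.

```python
import ipaddress

# Assumed cluster CIDR; the article does not state the actual value.
cluster_cidr = ipaddress.ip_network("172.16.0.0/16")
# podCIDR of Node A, as given in the article.
pod_cidr = ipaddress.ip_network("172.16.0.128/25")
# Hypothetical Pod address inside that podCIDR.
pod_ip = ipaddress.ip_address("172.16.0.130")


def describe(cluster, node_net, ip):
    """Report how the three levels nest inside one another."""
    return {
        "pod_cidr_in_cluster": node_net.subnet_of(cluster),
        "pod_ip_in_pod_cidr": ip in node_net,
        "addresses_in_pod_cidr": node_net.num_addresses,
    }


levels = describe(cluster_cidr, pod_cidr, pod_ip)
print(levels)
```

Each level is a strict subset of the one above it, which is why a single destination IP is enough to tell which node, and then which Pod, a packet belongs to.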

Cluster Network Construction

Initial stage

A cluster is created on top of cloud resources, namely a VPC and ECS instances. After creating them, we have roughly the resource configuration shown in the following figure: a VPC whose network segment is 192.168.0.0/16, and several ECS instances that are assigned IP addresses from the VPC network segment.

Cluster stage

On top of these initial resources, we obtain the cluster CIDR through the cluster creation console. This value is passed as a parameter to the node provisioning script and then to kubeadm, the cluster node configuration tool. Kubeadm finally writes the parameter into kube-controller-manager.yaml, the static Pod manifest of the cluster controller.

When kubelet registers a node with the cluster, the cluster controller assigns a podCIDR to that node. As shown in the figure above, the subnet of Node B is 172.16.8.1/25 and that of Node A is 172.16.0.128/25. This assignment is recorded in the podCIDR field of the node object.
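The assignment step can be pictured as carving fixed-size subnets out of the cluster CIDR. The sketch below is not the controller's actual algorithm. The class name `NodeCidrAllocator`, the node names, the cluster CIDR 172.16.0.0/16 and the sequential allocation order are all assumptions. Only the /25 subnet size comes from the article.

```python
import ipaddress


class NodeCidrAllocator:
    """Hands out fixed-size podCIDR subnets from a cluster CIDR (illustrative only)."""

    def __init__(self, cluster_cidr, node_mask_size):
        network = ipaddress.ip_network(cluster_cidr)
        # Lazily enumerate every subnet of the requested size.
        self._free = network.subnets(new_prefix=node_mask_size)
        self.assigned = {}

    def register(self, node_name):
        # A node that registers again keeps the podCIDR it already has.
        if node_name not in self.assigned:
            self.assigned[node_name] = next(self._free)
        return self.assigned[node_name]


allocator = NodeCidrAllocator("172.16.0.0/16", 25)  # assumed cluster CIDR
first = allocator.register("node-1")   # hypothetical node names
second = allocator.register("node-2")
print(first, second)  # 172.16.0.0/25 172.16.0.128/25
```

In this sketch the allocator is the only place that knows the cluster CIDR; each node only ever sees its own subnet.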

Node stage

After the cluster stage above, K8S has the cluster CIDR and a podCIDR for each node. On this basis, the cluster deploys flanneld to each node to build out the network framework that Pods on the node will use. There are two main operations:

  • First, the cluster uses the Cloud Controller Manager to configure routing entries in the VPC, one entry per node. Each entry means that if the destination address of a packet falls within a node's podCIDR, the VPC route forwards the packet to the corresponding ECS instance.
  • Second, the virtual bridge cni0 and the routes associated with cni0 are created on the node. The effect is that packets arriving from outside the node whose destination IP falls within the podCIDR are forwarded by the node to the cni0 virtual LAN.

Note: in practice, cni0 is created by flannel CNI (described in the Pod stage) when the first Pod that uses the Pod network is scheduled to the node. Logically, however, cni0 belongs to the node network rather than the Pod network, so it is described here.
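The first operation above amounts to a table lookup in the VPC: destination IP in, ECS instance out. A minimal sketch follows. The instance names and the second subnet are hypothetical, and the longest-prefix rule is the usual routing convention rather than something the article states.

```python
import ipaddress

# Hypothetical VPC route entries: podCIDR -> ECS instance used as next hop.
vpc_routes = {
    ipaddress.ip_network("172.16.0.128/25"): "ecs-node-a",  # Node A, from the article
    ipaddress.ip_network("172.16.1.0/25"): "ecs-node-b",    # assumed subnet
}


def next_hop(dest_ip, routes):
    """Return the ECS instance whose podCIDR contains dest_ip, or None."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [net for net in routes if dest in net]
    if not matches:
        return None
    # Prefer the most specific (longest-prefix) route.
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


print(next_hop("172.16.0.200", vpc_routes))
print(next_hop("10.0.0.1", vpc_routes))
```

The second operation is the same idea one level down: once the packet reaches the ECS instance, the node's own route for its podCIDR hands the packet to cni0.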

Pod stage

In the first three stages, the cluster has in effect built the trunk road for network communication between Pods. When the cluster schedules a Pod to a node, kubelet uses flannel CNI to create a network namespace and a veth pair for the Pod. One end of the veth pair is attached to the cni0 virtual bridge, and the end inside the Pod is configured with the Pod's IP address. The Pod is then connected to the trunk road of network communication.

It should be emphasized that flanneld in the previous section and flannel CNI in this section are two entirely separate components. Flanneld is a Pod delivered to each node by a DaemonSet, and its job is to build the network (the trunk road). Flannel CNI is a CNI plug-in installed from the kubernetes-cni RPM package when the node is created; kubelet calls it to create the network (a branch road) for a specific Pod. Understanding this difference helps us understand the purpose of the configuration files associated with each of them. For example, /run/flannel/subnet.env is an environment variable file created by flanneld to provide input to flannel CNI. Similarly, /etc/cni/net.d/10-flannel.conf is the CNI configuration file that flannel CNI uses.
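To make the hand-off through /run/flannel/subnet.env concrete, here is a rough sketch of reading such a file. The key names are the ones flanneld writes as far as I know. The sample values are invented for illustration, and the helper `parse_subnet_env` is hypothetical, not part of flannel.

```python
import ipaddress


def parse_subnet_env(text):
    """Parse KEY=VALUE lines, ignoring blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Invented sample content; real values depend on the cluster.
sample = """\
FLANNEL_NETWORK=172.16.0.0/16
FLANNEL_SUBNET=172.16.0.129/25
FLANNEL_MTU=1500
FLANNEL_IPMASQ=true
"""

config = parse_subnet_env(sample)
# FLANNEL_SUBNET carries the bridge gateway address plus the prefix length.
gateway = ipaddress.ip_interface(config["FLANNEL_SUBNET"])
print(gateway.ip, gateway.network)
```

Under these assumptions, the gateway address becomes the address of cni0, and Pod IPs are drawn from the rest of the node's subnet.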

Communication

The sections above built the Pod network environment. On top of it, a Pod can take part in four kinds of communication: local communication, same-node Pod communication, cross-node Pod communication, and communication between a Pod and entities outside the Pod network.

Local communication means communication between different containers within one Pod. Because the containers in a Pod share a network protocol stack, they can communicate with each other through the loopback device.

Same-node Pod communication takes place inside the cni0 virtual bridge, which is equivalent to communication among devices inside a Layer 2 LAN.

Cross-node Pod communication is a little more complicated, but still intuitive. Packets from the sending Pod reach the node through the gateway of the cni0 bridge and are then sent to the VPC route through the node's eth0. No encapsulation is performed on the packet along the way. When the VPC route receives the packet, it looks up the routing table to determine the destination and sends the packet to the corresponding ECS node. Once the packet enters that node, the cni0 route created by flanneld sends it to the destination's cni0 LAN, and from there to the destination Pod.

Finally, communication between a Pod and entities outside the Pod network requires SNAT through iptables rules on the node. Flanneld configures these rules according to its --ip-masq command-line option.
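The four cases can be told apart from the source and destination addresses alone. The function below is a simplified sketch of that decision, not code from flannel or Kubernetes. The name `classify`, the cluster CIDR 172.16.0.0/16 and all Pod addresses are assumptions.

```python
import ipaddress


def classify(src_pod_ip, dst_ip, local_pod_cidr, cluster_cidr):
    """Return which of the four communication cases applies."""
    src = ipaddress.ip_address(src_pod_ip)
    dst = ipaddress.ip_address(dst_ip)
    local = ipaddress.ip_network(local_pod_cidr)
    cluster = ipaddress.ip_network(cluster_cidr)
    if src not in local:
        raise ValueError("source Pod is not on this node")
    if dst == src:
        return "local"          # containers in one Pod, via loopback
    if dst in local:
        return "same-node"      # Layer 2 forwarding on the cni0 bridge
    if dst in cluster:
        return "cross-node"     # cni0 -> eth0 -> VPC route -> remote cni0
    return "external-snat"      # leaves the Pod network, SNAT on the node


node_cidr = "172.16.0.128/25"   # Node A, from the article
cluster = "172.16.0.0/16"       # assumed cluster CIDR
for dst in ("172.16.0.130", "172.16.0.140", "172.16.1.10", "192.168.0.10"):
    print(dst, classify("172.16.0.130", dst, node_cidr, cluster))
```

Only the last case rewrites the packet. In the other three the Pod IP is preserved end to end, which is what the VPC route entries make possible.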

Conclusion

The above covers how the Alibaba Cloud K8S cluster network is built and how communication over it works. We analyzed the K8S cluster network from two perspectives: network construction and communication. Network construction is divided into the initial, cluster, node, and Pod stages, a division that makes these complex configurations easier to understand. Once each configuration is understood, the principle of cluster communication becomes easier to follow.
