1 Introduction

In Kubernetes, networking is essential so that containers can communicate with each other. Kubernetes does not implement the container network itself; instead, it leaves the implementation to pluggable network plug-ins. Any container network implementation must satisfy the following basic principles:

  • Pods can communicate with each other directly, no matter which node they run on, without NAT address translation.
  • Nodes and Pods can communicate with each other, and a Pod can reach any network without restrictions.
  • A Pod has its own network stack; the address a Pod sees for itself is the same address the outside world sees, and all containers within a Pod share that network stack.

2 Container network basics

2.1 Host mode

The container shares a network namespace with the host.

2.2 Container mode

The container shares a network namespace with another container. A Pod in Kubernetes is exactly this: multiple containers sharing one network namespace.

2.3 None mode

The container has its own network namespace, but no network configuration is performed on it: no veth pair or bridge is attached and no IP address is assigned.

2.4 Bridge mode

When the Docker daemon starts, it creates a virtual bridge named docker0 on the host, and containers started on that host are connected to this bridge. A virtual bridge works like a physical switch, so all containers on the host are attached to a Layer 2 network through it.

Docker assigns each container an IP address from the docker0 subnet and sets docker0's IP address as the container's default gateway.
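For reference, Docker exposes these four modes through the --network flag of docker run. A minimal sketch (the alpine image and the container names are only examples):

$ docker run -it --network host alpine /bin/sh           # host mode: share the host's network namespace
$ docker run -d -it --name app alpine /bin/sh            # an ordinary bridge-mode container
$ docker run -it --network container:app alpine /bin/sh  # container mode: share app's network namespace
$ docker run -it --network none alpine /bin/sh           # none mode: empty network namespace
$ docker run -it --network bridge alpine /bin/sh         # bridge mode (the default): attached to docker0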

3 Communication within a single host

We can think of two network namespaces as two hosts connected by a network cable. To let multiple hosts communicate, we connect them through a switch. In Linux, a bridge plays the role of the switch and forwards the traffic.

For containers, this is implemented by the docker0 bridge: any container attached to docker0 can communicate through it. To attach a container to the docker0 bridge we also need a virtual device called a veth pair, which acts like the network cable connecting the container to the bridge.
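The bridge-plus-veth wiring can be reproduced by hand with the ip command. The sketch below only illustrates the idea; the names br0, veth0 and veth1 are made up:

$ ip link add br0 type bridge                  # create a bridge (the role docker0 plays)
$ ip link set br0 up
$ ip link add veth0 type veth peer name veth1  # create a veth pair: two connected endpoints
$ ip link set veth0 master br0                 # plug one end into the bridge
$ ip link set veth0 up
# veth1 would then be moved into the container's network namespace and renamed eth0,
# e.g. ip link set veth1 netns <container-pid>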

We start a container:

$ docker run -d -it --name net_test alpine:latest /bin/sh

Then view the network interfaces inside the container:

$ docker exec -it net_test /bin/sh
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02
          inet addr:172.17.0.2  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1172 (1.1 KiB)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.0.1      0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0

You can see an eth0 interface, which is one end of a veth pair inside the container. Running route -n shows the container's routing table: eth0 is also the default route's outgoing interface, and all traffic to 172.17.0.0/16 goes out through eth0. Now look at the other end of the veth pair by inspecting the host's network devices:

# ifconfig
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:6aff:fe46:93d2  prefixlen 64  scopeid 0x20<link>
        ether 02:42:6a:46:93:d2  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 656 (656.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.100.0.2  netmask 255.255.255.0  broadcast 10.100.0.255
        inet6 fe80::5400:2ff:fea3:4b44  prefixlen 64  scopeid 0x20<link>
        ether 56:00:02:a3:4b:44  txqueuelen 1000  (Ethernet)
        RX packets 7788093  bytes 9899954680 (9.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5512037  bytes 9512685850 (8.8 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 32  bytes 2592 (2.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 32  bytes 2592 (2.5 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

veth20b3dac: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::30e2:9cff:fe45:329  prefixlen 64  scopeid 0x20<link>
        ether 32:e2:9c:45:03:29  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 656 (656.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

We can see that the other end of the container's veth pair is a virtual interface on the host named veth20b3dac, and brctl shows that this interface is attached to the docker0 bridge.

# brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.02426a4693d2	no		veth20b3dac
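If a host has many veth interfaces, a common way to confirm which one belongs to which container is to compare interface indexes; a sketch (the index 7 is only illustrative):

$ docker exec net_test cat /sys/class/net/eth0/iflink   # index of eth0's peer on the host
7
$ ip link | grep '^7:'                                  # the host interface with that index
7: veth20b3dac@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...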

Then we start a second container and check whether it can be pinged from the first container.

$ docker run -d -it --name net_test_2 alpine:latest /bin/sh
$ docker exec -it net_test /bin/sh
/ # ping 172.17.0.3
PING 172.17.0.3 (172.17.0.3): 56 data bytes
64 bytes from 172.17.0.3: seq=0 ttl=64 time=0.291 ms
64 bytes from 172.17.0.3: seq=1 ttl=64 time=0.129 ms
64 bytes from 172.17.0.3: seq=2 ttl=64 time=0.142 ms
64 bytes from 172.17.0.3: seq=3 ttl=64 time=0.169 ms
64 bytes from 172.17.0.3: seq=4 ttl=64 time=0.139 ms
^C
--- 172.17.0.3 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.129/0.185/0.291 ms

As can be seen, the ping works. When we ping the target IP 172.17.0.3, the request matches the second rule of the container's routing table. Its gateway is 0.0.0.0, meaning it is a directly connected route: the packet is forwarded to the destination at Layer 2. To reach 172.17.0.3 over Layer 2 we need its MAC address, so the first container sends an ARP broadcast to resolve the MAC from the IP address. The other end of its veth pair sits on the docker0 bridge, which floods the broadcast to all the veth interfaces attached to it; the right container replies to the ARP request, and the bridge passes the reply back to the first container.
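This ARP exchange can be observed on the bridge itself; a sketch of what tcpdump might show while pinging from the first container (timestamps omitted, MAC address illustrative):

$ tcpdump -i docker0 -nn arp
ARP, Request who-has 172.17.0.3 tell 172.17.0.2, length 28
ARP, Reply 172.17.0.3 is-at 02:42:ac:11:00:03, length 28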

The above describes how different containers on the same host communicate through docker0. In essence, container processes confined to their own network namespaces exchange data with other network namespaces by default through veth pair devices and the host bridge.

Similarly, when you access a container's IP address from its host, the request packet first reaches the docker0 bridge according to the routing rules, is then forwarded to the corresponding veth pair device, and finally arrives inside the container.

4 Cross-host network communication

Under Docker's default configuration, containers on different hosts cannot reach each other by IP address. To solve this problem, a number of networking solutions have emerged in the community. At the same time, to better standardize network access, Kubernetes introduced CNI, the Container Network Interface.

In fact, the container network communication flow with CNI is the same as the basic network described above, except that CNI maintains its own bridge instead of docker0. This bridge is called the CNI bridge, and its default device name on the host is cni0. The design idea of CNI is that after Kubernetes starts the Infra container, it can directly call the CNI network plug-in to configure the expected network stack for the Infra container's network namespace.
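For reference, a CNI network configuration for the standard bridge plug-in looks roughly like the sketch below; the file name and subnet here are illustrative, and real clusters usually ship a flannel- or calico-specific configuration instead:

$ cat /etc/cni/net.d/10-mynet.conf
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.1.0/24",
    "routes": [ { "dst": "0.0.0.0/0" } ]
  }
}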

CNI plug-ins implement networking in three modes:

  • Overlay mode is based on tunneling: the container network is independent of the host network. When containers communicate across hosts, the container packets are encapsulated inside packets of the underlying network and decapsulated on the target machine before being delivered to the target container. This mode does not depend on the underlying network implementation. Examples: Flannel (UDP, VXLAN), Calico (IPIP), and so on.
  • Layer 3 routing mode: containers and hosts are also on different network segments, and container-to-container communication follows the routing tables, without tunnel encapsulation between hosts. The limitation is that the hosts must share a Layer 2 network within the same LAN. Examples: Flannel (host-gw), Calico (BGP), and so on.
  • Underlay mode: the underlying network is responsible for connectivity. The container network and the host network are still different segments, but they sit in the same network layer in an equal position; the whole network is routable at Layer 3 with no Layer 2 restriction, but this requires strong support from the underlying network. Examples: Calico (BGP), and so on.

4.1 Flannel

4.1.1 Flannel introduction

The Pod network in a Kubernetes cluster is implemented by third-party plug-ins, and Flannel, driven mainly by CoreOS, is one of the mainstream container network solutions. Flannel is simple, which is also why it does not support network policies. The Flannel project itself is just a framework; it is the backends that actually provide the network functions. Flannel supports three backends:

  • UDP
  • VXLAN
  • host-gw

UDP was the earliest backend supported by the Flannel project, and also the worst-performing one; it is now deprecated.

The VXLAN and host-gw backends are the most commonly used.

4.1.2 VXLAN

After flannel runs, a network interface is added to each Node host:

[root@master ~]# ifconfig flannel.1
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.0  netmask 255.255.255.255  broadcast 10.244.0.0
        ether 7e:df:88:39:f8:41  txqueuelen 0  (Ethernet)
        RX packets 4999534  bytes 1644495535 (1.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5003648  bytes 438332880 (418.0 MiB)
        TX errors 0  dropped 1  overruns 0  carrier 0  collisions 0

[root@node-02 ~]# ifconfig flannel.1
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.1.0  netmask 255.255.255.255  broadcast 10.244.1.0
        ether 26:7c:5d:33:b5:cc  txqueuelen 0  (Ethernet)
        RX packets 561013  bytes 58177715 (55.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 513179  bytes 106201855 (101.2 MiB)
        TX errors 0  dropped 1  overruns 0  carrier 0  collisions 0

From the output above we can see:

  1. By default flannel runs in VXLAN mode, i.e. as an overlay network.
  2. Flanneld creates a flannel.1 interface, which encapsulates and decapsulates the tunnel protocol; the default Pod network assigned to the cluster is 10.244.0.0/16.
  3. Flannel assigned the subnet 10.244.0.0 to the master node and 10.244.1.0 to node-02; additional nodes follow the same pattern (see the subnet file sketch after this list).
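On each node flannel records its allocation in a subnet file, which the CNI plug-in reads when setting up Pods. A sketch of what it typically contains on node-02 (the path /run/flannel/subnet.env and exact values may vary per installation):

[root@node-02 ~]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true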

Start an nginx Deployment with 3 replicas:

[root@master ~]# kubectl create deployment nginx --image=nginx --replicas=3
[root@master ~]# kubectl get po -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP           NODE      NOMINATED NODE   READINESS GATES
nginx-5b946576d4-6kftk   1/1     Running   0          35s   10.244.2.8   node-03   <none>           <none>
nginx-5b946576d4-b8bqc   1/1     Running   0          35s   10.244.1.5   node-02   <none>           <none>
nginx-5b946576d4-cwmmp   1/1     Running   0          35s   10.244.1.6   node-02   <none>           <none>

Two Pods run on node-02; one of them has the IP 10.244.1.5.

Now look at the network interfaces on this node:

[root@node-02 ~]# ifconfig cni0
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.1.1  netmask 255.255.255.0  broadcast 10.244.1.255
        inet6 fe80::a816:67ff:fee2:6e8f  prefixlen 64  scopeid 0x20<link>
        ether aa:16:67:e2:6e:8f  txqueuelen 1000  (Ethernet)
        RX packets 2  bytes 56 (56.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5  bytes 446 (446.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

Once a container is running, the node has a virtual interface cni0 with IP 10.244.1.1. It is a virtual bridge named cni0, used for local Pod communication on the node.

For each Pod, a veth pair is created: one end becomes the container's network interface and the other end is attached to the cni0 bridge.

Use brctl to view the bridge:

[root@node-02 ~]# brctl show cni0
bridge name	bridge id		STP enabled	interfaces
cni0		8000.aa1667e26e8f	no		vethc7b43a1d
						vethf6777127
# The network interfaces of the two containers are attached to the cni0 bridge.

Test that the Pod can be reached from the master node:

[root@master ~]# ping 10.244.1.5
PING 10.244.1.5 (10.244.1.5) 56(84) bytes of data.
64 bytes from 10.244.1.5: icmp_seq=1 ttl=63 time=1.67 ms
64 bytes from 10.244.1.5: icmp_seq=2 ttl=63 time=1.04 ms
64 bytes from 10.244.1.5: icmp_seq=3 ttl=63 time=1.21 ms

In the flannel VXLAN network, Pods on different hosts can also communicate with each other. Ping the Pod above from a Pod running on another node:

/ # ping 10.244.1.5
PING 10.244.1.5 (10.244.1.5): 56 data bytes
64 bytes from 10.244.1.5: icmp_seq=0 ttl=62 time=2.379 ms
64 bytes from 10.244.1.5: icmp_seq=1 ttl=62 time=1.379 ms
64 bytes from 10.244.1.5: icmp_seq=2 ttl=62 time=1.064 ms
64 bytes from 10.244.1.5: icmp_seq=3 ttl=62 time=1.483 ms

So how do containers communicate across hosts? Look at the routing information:

[root@master ~]# ip route show
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 

Packets destined for 10.244.1.0/24 or 10.244.2.0/24 are handed to the local flannel.1 device; that is, they enter the Layer 2 tunnel and are encapsulated as VXLAN packets. After arriving at the target node, flannel.1 on that node decapsulates them.

Whenever a node starts and joins the flannel network, flanneld on every other node adds a routing rule like this. This is the default VXLAN mode.
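Besides the route, flanneld also programs the ARP and FDB entries that tell the kernel which remote VTEP sits behind each flannel.1 address. A sketch of what this can look like on the master, using node-02's flannel.1 MAC and host IP from the outputs above (exact output varies):

[root@master ~]# ip neigh show dev flannel.1
10.244.1.0 lladdr 26:7c:5d:33:b5:cc PERMANENT
[root@master ~]# bridge fdb show dev flannel.1
26:7c:5d:33:b5:cc dst 10.234.2.12 self permanent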

Ping from a Pod on another node and capture packets on node-02; you can see that the ICMP traffic is encapsulated in VXLAN (UDP port 8472) between the hosts:

[root@node-02 ~]# tcpdump -i eth0 -nn host 10.234.2.13
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:40:40.502259 IP 10.234.2.13.60088 > 10.234.2.12.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.244.2.8 > 10.244.1.5: ICMP echo request, id 24, seq 19, length 64
15:40:40.502543 IP 10.234.2.12.37793 > 10.234.2.13.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.244.1.5 > 10.244.2.8: ICMP echo reply, id 24, seq 19, length 64
15:40:41.503471 IP 10.234.2.13.60088 > 10.234.2.12.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.244.2.8 > 10.244.1.5: ICMP echo request, id 24, seq 20, length 64

VXLAN is a network virtualization technology supported by the Linux kernel. As a kernel module, VXLAN encapsulates and decapsulates packets in kernel space to build an overlay network. In effect, the flannel.1 devices on all hosts form one virtual Layer 2 network.
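Because VXLAN is just a kernel device type, a flannel.1-like interface can be created by hand. A minimal sketch with an illustrative name, VNI and address (the UDP port 8472 matches what the capture above shows flannel using):

$ ip link add vxlan100 type vxlan id 100 dev eth0 dstport 8472  # VXLAN device on top of eth0
$ ip addr add 10.244.9.0/32 dev vxlan100                        # VTEP address (illustrative)
$ ip link set vxlan100 up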

Because the extra encapsulation and decapsulation hurt VXLAN's performance, Flannel also provides the host-gw mode, which uses the host as the gateway. Apart from local routes there is no extra overhead, and performance is comparable to Calico. Since packets are not encapsulated, forwarding relies entirely on the routing table, which grows with the cluster: each node corresponds to one Pod subnet and therefore one routing entry.

Although host-gw performs much better than VXLAN, it has a drawback: all physical nodes must be on the same Layer 2 network, i.e. in the same network segment. If too many hosts share one segment, a single broadcast packet disturbs all of them. In private cloud scenarios it is common for hosts to sit on different network segments, in which case host-gw cannot be used.

Ideally, two nodes in the same network segment communicate via host-gw, and nodes in different network segments (that is, with a router between the node running the current Pod and the node running the target Pod) fall back to the VXLAN overlay.

Combining host-gw and VXLAN in this way is the DirectRouting option of the VXLAN backend.

Therefore, Flannel's VXLAN backend has two modes:

  1. VXLAN: native VXLAN, i.e. virtual extensible LAN
  2. DirectRouting: direct routing between nodes in the same segment, falling back to VXLAN otherwise

4.1.3 DirectRouting

Download kube-flannel.yml and modify the backend configuration in its net-conf.json:

net-conf.json: |
  {
    "Network": "10.244.0.0/16",      # the default Pod network
    "Backend": {
      "Type": "VXLAN",               # note the format
      "DirectRouting": true
    }
  }

Redeploy:

[root@master flannel]# kubectl delete -f kube-flannel.yaml 
podsecuritypolicy.policy "psp.flannel.unprivileged" deleted
clusterrole.rbac.authorization.k8s.io "flannel" deleted
clusterrolebinding.rbac.authorization.k8s.io "flannel" deleted
serviceaccount "flannel" deleted
configmap "kube-flannel-cfg" deleted
daemonset.apps "kube-flannel-ds" deleted


[root@master flannel]# kubectl create -f kube-flannel.yaml 
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

Redeploying requires deleting the original flannel first, which disrupts the Pod network, so you should plan the network backend at the very beginning.

Look at the routes again:

[root@master flannel]# ip route show
default via 10.234.2.254 dev eth0  proto static  metric 100 
10.234.2.0/24 dev eth0  proto kernel  scope link  src 10.234.2.11  metric 100 
10.244.1.0/24 via 10.234.2.12 dev eth0 
10.244.2.0/24 via 10.234.2.13 dev eth0 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 

The next hop to 10.244.1.0/24 is 10.234.2.12, leaving through the physical interface eth0 on this machine. This is DirectRouting. If two nodes are in different network segments, flannel automatically degrades to VXLAN mode for that traffic.

[root@node-02 ~]# tcpdump -i eth0 -nn icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:51:19.324976 IP 10.244.2.8 > 10.244.1.5: ICMP echo request, id 25, seq 98, length 64
15:51:19.325209 IP 10.244.1.5 > 10.244.2.8: ICMP echo reply, id 25, seq 98, length 64
15:51:20.326665 IP 10.244.2.8 > 10.244.1.5: ICMP echo request, id 25, seq 99, length 64
15:51:20.326926 IP 10.244.1.5 > 10.244.2.8: ICMP echo reply, id 25, seq 99, length 64
15:51:21.327844 IP 10.244.2.8 > 10.244.1.5: ICMP echo request, id 25, seq 100, length 64
15:51:21.328020 IP 10.244.1.5 > 10.244.2.8: ICMP echo reply, id 25, seq 100, length 64
15:51:22.334799 IP 10.244.2.8 > 10.244.1.5: ICMP echo request, id 25, seq 101, length 64
15:51:22.335042 IP 10.244.1.5 > 10.244.2.8: ICMP echo reply, id 25, seq 101, length 64

The packet capture shows that Pod-to-Pod pings now travel directly over eth0 without VXLAN encapsulation; this is DirectRouting, and it performs better than the default VXLAN mode.

4.1.4 host-gw

When container-1 on Node1 sends data to container-2 on Node2, the routing table on Node1 matches a rule like the following:

10.244.1.0/24 via 10.168.0.3 dev eth0

It means that IP packets destined for the network segment 10.244.1.0/24 have 10.168.0.3 (Node2) as the next hop, leaving through eth0. When the packet arrives at 10.168.0.3, the local routing table forwards it to the cni0 bridge and then to container-2.

This is how host-gw works: the next hop for each Pod subnet is simply the IP address of the node hosting that subnet. The mapping between Pod subnets and node IPs is stored in etcd (or the Kubernetes API); flannel only has to watch for changes to this data and update the routing tables dynamically, as sketched below.
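In other words, what flanneld effectively does in host-gw mode boils down to installing plain routes on every node; a sketch using the addresses from the rule above:

# On Node1: reach Node2's Pod subnet via Node2's host IP, no encapsulation involved
$ ip route add 10.244.1.0/24 via 10.168.0.3 dev eth0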

The biggest benefit of this mode is avoiding the performance loss of extra encapsulation and decapsulation. Its main drawback is that when a container's IP packet is sent to the next hop, it must be wrapped in a Layer 2 frame and delivered to that next hop directly. If the nodes are not on the same Layer 2 LAN, the packet would have to go through a Layer 3 gateway, which knows nothing about the target container network (unless Pod subnet routes are statically configured on every gateway). Therefore, flannel host-gw requires Layer 2 connectivity between the cluster hosts.

To overcome the Layer 2 limitation, Calico's networking is a better fit. Calico's Layer 3 mode is similar to Flannel's host-gw and adds routing rules of the following form on each host:

<target Pod subnet> via <gateway IP> dev eth0

The gateway IP has different meanings: if the hosts are reachable at Layer 2, it is the IP address of the host where the destination container runs; if they are only reachable at Layer 3, it is the gateway of the local host (a switch or router address).

Unlike Flannel, which maintains local routing information from Kubernetes or etcd data, Calico uses the BGP dynamic routing protocol to distribute routing information across the entire cluster.

BGP is the Border Gateway Protocol, natively supported by Linux and used to exchange routing information between autonomous systems in large-scale data centers. Simply put, BGP is a protocol for synchronizing and sharing routing information between nodes in large networks, and it can replace flannel's mechanism for maintaining the host routing tables.

4.2 Calico

4.2.1 Calico introduction

Calico is a pure Layer 3 virtual network. It does not reuse Docker's docker0 bridge but implements its own data plane; it adds no extra encapsulation to packets and needs no NAT or port mapping, which gives it good scalability and performance. The Calico network also provides Docker DNS service, so containers can access each other by hostname. Calico uses the Linux kernel to implement an efficient vRouter (virtual router) on every compute node for data forwarding, and it allocates an IP to every container; each node acts as a router connecting the containers of different hosts, enabling cross-host container communication. Each vRouter advertises the routing information of its own node to the whole Calico network through BGP (Border Gateway Protocol); small deployments can use direct node-to-node peering, while large deployments can use designated BGP Route Reflectors. Calico also provides rich and flexible network policies based on iptables, enforcing multi-tenant isolation, security groups, and other reachability restrictions through ACLs on each node.

Calico consists mainly of the following components:

  • Calico CNI plug-in: responsible for interfacing with Kubernetes, invoked by kubelet.
  • Felix: maintains routing rules and the FIB forwarding information base on the host.
  • BIRD: distributes routing rules, like a router.
  • confd: configuration management component.

4.2.2 IPIP

IPIP encapsulates an IP packet inside another IP packet, i.e. a tunnel from the IP layer to the IP layer. It essentially acts like a bridge built on the IP layer: an ordinary bridge works at the MAC layer and needs no IP, whereas IPIP uses the routers at both ends to build a tunnel that connects two otherwise unreachable networks point to point.

After Calico is deployed in IPIP mode, each node has a tunl0 interface, which IPIP uses for tunnel encapsulation; this is also an overlay network. After a node is taken offline and the Calico containers are stopped, the device remains; it can be removed by running rmmod ipip.
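The tunl0 device and the ipip kernel module can be inspected directly; a sketch (output details vary):

$ lsmod | grep ipip          # confirm the ipip module is loaded
$ ip -d link show tunl0      # show the tunnel device and its ipip details
$ rmmod ipip                 # remove the module after the node has left the cluster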

4.2.3 BGP

The BGP mode works almost the same as Flannel's host-gw mode.

BIRD is the BGP client; it talks to the BIRD instances on the other nodes in the cluster to exchange routing information, as the status output below illustrates.
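The BGP sessions that BIRD establishes can be checked with calicoctl; a sketch of the kind of output you would see (the peer address is taken from the example environment, the rest is illustrative):

$ calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.234.2.12  | node-to-node mesh | up    | 08:12:33 | Established |
+--------------+-------------------+-------+----------+-------------+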

As the number of nodes N grows, the number of BGP sessions in the default node-to-node mesh grows roughly as N², which puts considerable pressure on the cluster network; the official recommendation is to keep the mesh below about 100 nodes and use Route Reflectors beyond that.

Restriction: the same as Flannel host-gw: the physical machines must be connected at Layer 2 and cannot span network segments.

5 Conclusion

The above are several networking solutions commonly used with Kubernetes. In public cloud scenarios it is easier to use the cloud vendor's solution or flannel host-gw, while in a private data center Calico is a better fit. Choose the appropriate network solution based on your actual scenario.