Flannel is a simple, easy-to-use container network solution designed for Kubernetes. It organizes all Pods into one large virtual Layer 2 network on the same subnet. Flannel supports several backend forwarding modes; this article introduces two of them, VXLAN and host-gw.

Introduction to VXLAN

Virtual Extensible LAN (VXLAN) is a network virtualization technology. It uses a tunneling protocol to encapsulate Layer 2 Ethernet frames in Layer 4 UDP packets and transmits them over a Layer 3 network, forming a large virtual Layer 2 network. A VXLAN packet consists of an outer Ethernet header, an outer IP header, an outer UDP header, and an 8-byte VXLAN header, followed by the original Layer 2 frame.

VXLAN uses the VXLAN Tunnel Endpoint (VTEP) to encapsulate and decapsulate packets. A VTEP is the start or end point of a VXLAN tunnel.

  • At the sending end, the source VTEP encapsulates original packets into VXLAN packets and sends them to the peer VTEP through UDP.
  • At the receiving end, the VTEP decapsulates the VXLAN packets and forwards the original Layer 2 frames to the destination recipient.
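
The two VTEP steps above can be sketched in a few lines of Python. This is only an illustration of the 8-byte VXLAN header layout from RFC 7348 (flags, reserved bits, 24-bit VNI); the function names are hypothetical, and the outer UDP/IP/Ethernet headers that the kernel adds are left out.

```python
import struct
from typing import Tuple

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value


def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the 8-byte VXLAN header to an inner Ethernet frame."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits)
    # Word 2: VNI (24 bits) + reserved (8 bits)
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(payload: bytes) -> Tuple[int, bytes]:
    """Strip the VXLAN header and return (vni, inner_frame)."""
    if len(payload) < 8:
        raise ValueError("payload is shorter than a VXLAN header")
    word1, word2 = struct.unpack("!II", payload[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word2 >> 8, payload[8:]


# Placeholder inner frame: 14 zero bytes standing in for an Ethernet header.
frame = b"\x00" * 14 + b"payload"
packet = vxlan_encapsulate(frame, vni=1)
vni, recovered = vxlan_decapsulate(packet)
```

In a real deployment the resulting bytes would be the payload of a UDP datagram sent to the peer VTEP; here the round trip simply returns the original frame and the VNI.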

A VTEP can be an independent network device, such as a switch, or a virtual device deployed on a server. For example, when a top-of-rack (ToR) switch is used as the VTEP, the VXLAN network model is as follows:

Figure source: support.huawei.com/enterprise/…

In Flannel, however, the VTEP is implemented by a Linux virtual network device. In VXLAN mode, the flannel.1 virtual network adapter plays the role of the VTEP.

VXLAN mode

VXLAN is the default and recommended Flannel mode. When you install Flannel with the default configuration, it assigns each node a /24 subnet and creates two virtual NICs on each node: cni0 and flannel.1. cni0 is a bridge device, similar to docker0, and all Pods on the node are connected to cni0 through veth pairs. flannel.1 is a VXLAN device that acts as the VTEP, encapsulating and decapsulating VXLAN packets.
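
To make the per-node /24 subnets concrete, the sketch below carves the cluster network 10.244.0.0/16 (the value used in the net-conf.json shown later in this article) into /24 blocks with Python's standard ipaddress module. The helper name is hypothetical, and the sequential order is only for illustration; the order in which flanneld actually hands out subnets is not assumed here.

```python
import ipaddress


def node_subnets(cluster_cidr: str, prefix_len: int = 24):
    """Return every per-node subnet of the given size inside the cluster CIDR."""
    network = ipaddress.ip_network(cluster_cidr)
    return list(network.subnets(new_prefix=prefix_len))


subnets = node_subnets("10.244.0.0/16")
print(len(subnets))            # 256 possible node subnets
print(subnets[0], subnets[1])  # 10.244.0.0/24 10.244.1.0/24
```

Each /24 holds 256 addresses, which bounds the number of Pod IPs available on a single node.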

Linux has supported VXLAN since kernel version 3.7, and it is fully supported by kernel version 3.12.

Intra-node communication

Communication between containers on the same node goes through the cni0 bridge and involves no VXLAN encapsulation or decapsulation. For example, if the subnet of Node1 is 10.244.0.0/24, PodA (10.244.0.20) and PodB (10.244.0.21) communicate with each other directly over the cni0 bridge.

Cross-node communication

Now let's focus on container communication across nodes. Suppose there are two nodes, Node1 and Node2, and PodA on Node1 communicates with PodB on Node2. The communication process between them is as follows:

To recap the process:

  • Sender: PodA runs ping 10.244.1.21. The ICMP packet passes through the cni0 bridge and is handed to the flannel.1 device for processing. flannel.1 is a VXLAN device and is responsible for encapsulating and decapsulating VXLAN packets. So, at the sending end, flannel.1 encapsulates the original L2 frame into a VXLAN UDP packet and sends it out through eth0.
  • Receiver: Node2 receives the UDP packet, finds that it is a VXLAN packet, and delivers it to its own flannel.1 for decapsulation. Based on the destination IP address of the original packet obtained after decapsulation, the original packet is sent to PodB through the cni0 bridge.

Which IP addresses are handed to flannel.1

flanneld obtains the subnet information of all nodes from etcd and then configures routes on each node. Addresses in other nodes' subnets are routed to flannel.1, and addresses in the local node's subnet are routed to cni0.

[root@Node1 ~]# ip r
...
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1  # Node1 subnet is 10.244.0.0/24; local Pod IPs are handled by cni0
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink  # Node2 subnet is 10.244.1.0/24; Pod IPs on Node2 are handled by flannel.1
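
The effect of these two routes can be modelled as a longest-prefix match. The sketch below is a simplified stand-in for the kernel's route lookup, not how the kernel implements it; the ROUTES list just mirrors the two entries from the output above, and the function name is hypothetical.

```python
import ipaddress

# (prefix, outgoing device, next hop) copied from the `ip r` output on Node1
ROUTES = [
    ("10.244.0.0/24", "cni0", None),
    ("10.244.1.0/24", "flannel.1", "10.244.1.0"),
]


def lookup_route(dst_ip: str):
    """Return (device, next_hop) for the most specific matching route, or None."""
    addr = ipaddress.ip_address(dst_ip)
    matches = []
    for prefix, dev, via in ROUTES:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, dev, via))
    if not matches:
        return None
    _, dev, via = max(matches, key=lambda m: m[0])
    return dev, via


print(lookup_route("10.244.0.21"))  # local Pod: stays on the cni0 bridge
print(lookup_route("10.244.1.21"))  # remote Pod: handed to flannel.1
```

A destination on the local subnet resolves to cni0 with no next hop, while a destination on Node2's subnet resolves to flannel.1 with next hop 10.244.1.0.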

If the node information changes, flanneld synchronously changes the routing information.

Packet encapsulation by flannel.1

VXLAN packet encapsulation encapsulates Layer 2 Ethernet frames into Layer 4 UDP packets.

The original L2 frame

To generate the original L2 frame, flannel.1 needs to know:

  • Inner source/destination IP addresses
  • Inner source/destination MAC addresses

The inner source and destination IP addresses are already known: they are the Pod IPs of PodA and PodB, shown in the figure as 10.244.0.20 and 10.244.1.20 respectively. The inner source and destination MAC addresses must be derived from the routing table and the ARP table. According to routing table ①:

  1. The next-hop address is 10.244.1.0. Looking up this next hop in ARP table ② gives the destination MAC address, Node2_flannel.1_MAC;
  2. The packet is sent out from flannel.1, so the source MAC address is the MAC address of flannel.1.

It is important to note that ARP entry ② is not learned through the ARP protocol. flanneld presets it on each node and maintains it, and it has no expiration time.

# Check the ARP table
[root@Node1 ~]# ip n | grep flannel.1
10.244.1.0 dev flannel.1 lladdr ba:74:f9:db:69:c1 PERMANENT  # PERMANENT indicates that it will never expire
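
As a rough illustration of the MAC resolution just described, the sketch below looks up the next hop in a dictionary that stands in for the ARP table and assembles the 14-byte inner Ethernet header. The table contents are copied from the command output in this article; the helper names are hypothetical, and the EtherType 0x0800 assumes an IPv4 payload.

```python
import struct

# Static neighbour entry set by flanneld: next hop -> MAC of Node2's flannel.1
ARP_TABLE = {"10.244.1.0": "ba:74:f9:db:69:c1"}
# MAC of the local flannel.1, as reported by `ip -d a show flannel.1` on Node1
FLANNEL1_MAC = "32:02:78:2f:02:cb"
ETHERTYPE_IPV4 = 0x0800


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def inner_ethernet_header(next_hop: str, src_mac: str = FLANNEL1_MAC) -> bytes:
    """Build destination MAC + source MAC + EtherType for the inner frame."""
    dst_mac = ARP_TABLE[next_hop]  # KeyError if flanneld has not set the entry
    return (
        mac_to_bytes(dst_mac)
        + mac_to_bytes(src_mac)
        + struct.pack("!H", ETHERTYPE_IPV4)
    )


header = inner_ethernet_header("10.244.1.0")
print(header.hex())  # ba74f9db69c13202782f02cb0800
```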

With this information, flannel.1 can construct the inner Layer 2 Ethernet frame.

Outer VXLAN UDP packets

To encapsulate the original L2 frame into a VXLAN UDP packet, flannel.1 needs to fill in the source and destination IP addresses. As mentioned earlier, a VTEP is the start or end of a VXLAN tunnel. Therefore, the destination IP address is the IP address of the peer VTEP, which can be obtained from the FDB table. In FDB table ③, the dst field gives the IP address of the destination endpoint (peer VTEP) of the VXLAN tunnel, that is, the destination IP address of the VXLAN UDP packet. The FDB entries are also preset and maintained by flanneld on each node.

The Forwarding Database (FDB) table stores the association between MAC addresses and ports on Layer 2 devices, just like the MAC address table on a switch. When a Layer 2 device forwards an Ethernet frame, it finds the corresponding port from the FDB entry. For example, the cni0 bridge is connected to many veth pair network adapters. When the bridge forwards an Ethernet frame to a Pod, it looks up the MAC address of the Pod's network adapter in the FDB table to find the corresponding veth adapter and delivers the frame through it.

To view the FDB table, run the bridge fdb show command:

[root@Node1 ~]# bridge fdb show | grep flannel.1
ba:74:f9:db:69:c1 dev flannel.1 dst 192.168.50.3 self permanent

The source IP address comes from the configuration of the flannel.1 network adapter. The local 192.168.50.2 field in the output below shows that the source IP address is 192.168.50.2.

[root@Node1 ~]# ip -d a show flannel.1
6: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 32:02:78:2f:02:cb brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 1 local 192.168.50.2 dev eth0 srcport 0 0 dstport 8472 nolearning ageing 300 noudpcsum noudp6zerocsumtx noudp6zerocsumrx numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    inet 10.244.0.0/32 brd 10.244.0.0 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::3002:78ff:fe2f:2cb/64 scope link
       valid_lft forever preferred_lft forever
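
The outer addressing can be summarised in a small sketch: the destination IP comes from the FDB entry keyed by the inner destination MAC, while the source IP and the UDP destination port come from the flannel.1 settings shown above (local 192.168.50.2, dstport 8472). The dictionary and function names are hypothetical; the values are copied from the command output in this article.

```python
# FDB entry set by flanneld: MAC of the peer flannel.1 -> IP of the peer VTEP
FDB = {"ba:74:f9:db:69:c1": "192.168.50.3"}
LOCAL_VTEP_IP = "192.168.50.2"  # the `local` field of flannel.1
VXLAN_UDP_PORT = 8472           # the `dstport` field of flannel.1


def outer_addresses(inner_dst_mac: str) -> dict:
    """Return the outer IP/UDP addressing for a frame sent to inner_dst_mac."""
    return {
        "src_ip": LOCAL_VTEP_IP,
        "dst_ip": FDB[inner_dst_mac.lower()],  # KeyError if no FDB entry exists
        "dst_port": VXLAN_UDP_PORT,
    }


outer = outer_addresses("BA:74:F9:DB:69:C1")
print(outer)
```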

At this point, flannel.1 has all the information needed to build the VXLAN packet, and it finally sends the VXLAN UDP packet out through eth0.

In summary, the VXLAN mode of Flannel uses static routing table, ARP table, and FDB table entries, together with the VXLAN virtual network adapter flannel.1, to implement a VXLAN network model in which all Pods belong to one large Layer 2 network.
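
The mtu 1450 reported for flannel.1 earlier is consistent with the size of the headers that VXLAN adds. The arithmetic below is a sketch under stated assumptions that do not come from the article: eth0 has an MTU of 1500, the outer header is IPv4, and no VLAN tag is present.

```python
INNER_ETHERNET_HEADER = 14  # inner frame's Ethernet header, carried as payload
VXLAN_HEADER = 8
OUTER_UDP_HEADER = 8
OUTER_IPV4_HEADER = 20


def vxlan_overhead() -> int:
    """Bytes consumed inside the underlay IP packet by encapsulation."""
    return (
        INNER_ETHERNET_HEADER
        + VXLAN_HEADER
        + OUTER_UDP_HEADER
        + OUTER_IPV4_HEADER
    )


def overlay_mtu(underlay_mtu: int = 1500) -> int:
    """Largest inner IP packet that still fits in one underlay packet."""
    return underlay_mtu - vxlan_overhead()


print(vxlan_overhead())  # 50
print(overlay_mtu())     # 1450
```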

host-gw mode

In the VXLAN example above, Node1 and Node2 are two virtual machines on the same host, attached in bridge mode, so they are on the same Layer 2 network. When nodes have Layer 2 connectivity, they can reach each other's Pods through plain Layer 3 routes, without a VXLAN tunnel. To use host-gw mode, change Backend.Type from vxlan to host-gw and restart all kube-flannel Pods.

...
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "host-gw"  // <- host-gw
      }
    }
...

The communication process in host-gw mode is as follows:

In host-gw mode, the flannel.1 virtual NIC is not needed because no VXLAN encapsulation or decapsulation is involved. flanneld sets the next hop for each remote node's Pod subnet to the IP address of that node, as shown in routing table ① in the figure.

[root@Node1 ~]# ip r
...
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 192.168.50.3 dev eth0  # The next hop of the Node2 subnet points to the public IP address of Node2
...
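
The difference between the two backends shows up in the route installed for a remote node's Pod subnet. The sketch below renders that route as an ip route add command string for each mode, using the addresses from this article. It is an illustration only: the function name is hypothetical, the commands are never executed, and flanneld itself programs routes directly rather than through such strings.

```python
import ipaddress


def route_for_peer(pod_subnet: str, node_ip: str, backend: str) -> str:
    """Render the route to a remote node's Pod subnet for the given backend."""
    if backend == "host-gw":
        # Next hop is the remote node itself, reached directly over eth0.
        return f"ip route add {pod_subnet} via {node_ip} dev eth0"
    if backend == "vxlan":
        # Next hop is the .0 address of the remote subnet, reached via flannel.1.
        gateway = ipaddress.ip_network(pod_subnet).network_address
        return f"ip route add {pod_subnet} via {gateway} dev flannel.1 onlink"
    raise ValueError(f"unsupported backend: {backend}")


print(route_for_peer("10.244.1.0/24", "192.168.50.3", "host-gw"))
print(route_for_peer("10.244.1.0/24", "192.168.50.3", "vxlan"))
```

In host-gw mode the packet leaves eth0 unmodified, which is why the remote node must be reachable as a direct next hop.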

Because there is no encapsulation or decapsulation overhead, host-gw has the best performance. However, host-gw mode is generally not supported in cloud environments, so it is mainly worth considering for private deployments.
