As mentioned at the end of the last article, Linux supports VXLAN. We can use Linux to build a VXLAN-based overlay network and deepen our understanding of VXLAN. After all, talk is cheap without practice.
1. Point-to-point VXLAN
This section describes the simplest case: point-to-point VXLAN. A point-to-point VXLAN is built between two hosts. Each host has one VTEP, and the VTEPs communicate with each other directly using their IP addresses. The figure shows the point-to-point VXLAN network topology:
To avoid disturbing the host's network environment, we can use Linux VRF to isolate routes from the root network namespace. Virtual Routing and Forwarding (VRF) is a routing instance consisting of a routing table and a group of network devices. You can think of it as a lightweight network namespace that virtualizes only the layer 3 network stack, whereas a network namespace virtualizes the entire network stack. For details, see the principles and implementation of Linux Virtual Routing and Forwarding (VRF).
VRF requires Linux kernel 4.3 or later. If your kernel is older, upgrade it first.
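If you are unsure whether your kernel is recent enough, the check can be scripted. This is a minimal POSIX sh sketch: kernel_at_least is a hypothetical helper name, and it assumes a release string of the usual MAJOR.MINOR... form reported by uname -r.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel release string is at least MAJOR.MINOR.
# usage: kernel_at_least MAJOR.MINOR [RELEASE]   (RELEASE defaults to uname -r)
kernel_at_least() {
    want_major=${1%%.*}            # "4.3" -> "4"
    want_minor=${1#*.}             # "4.3" -> "3"
    release=${2:-$(uname -r)}      # e.g. "5.15.0-91-generic"
    have_major=${release%%.*}
    rest=${release#*.}
    have_minor=${rest%%[!0-9]*}    # keep only the digits of the minor number
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

if kernel_at_least 4.3; then
    echo "kernel $(uname -r) is new enough for VRF"
else
    echo "kernel $(uname -r) is too old for VRF, upgrade first"
fi
```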
Of course, if you have a clean host dedicated to experiments, you can skip VRF isolation.
The following uses VRF to create a point-to-point VXLAN network.
Create a VXLAN interface on 192.168.57.50.
$ ip link add vxlan0 type vxlan \
id 42 \
dstport 4789 \
remote 192.168.57.54 \
local 192.168.57.50 \
dev eth0
Important parameters:
- id 42: specifies the VNI. Valid values lie between 1 and 16777215 (2^24 - 1), because the VNI is a 24-bit field.
- dstport 4789: the UDP port used for communication between VTEPs. IANA assigned port 4789 to VXLAN; if this parameter is omitted, Linux uses 8472 by default.
- remote 192.168.57.54: the IP address of the peer VTEP.
- local 192.168.57.50: the IP address used by the current node's VTEP, that is, the IP address of this node's tunnel endpoint.
- dev eth0: the device the current node's VTEP uses for communication, from which the VTEP's IP address is obtained. This parameter serves the same purpose as local.
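Before passing a VNI to ip link, you may want to reject values outside the range given above. A minimal sketch, assuming POSIX sh; valid_vni is a hypothetical helper, not part of iproute2.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for a decimal VNI in 1..16777215 (24-bit field).
valid_vni() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;                            # reject empty or non-numeric input
    esac
    [ "$1" -ge 1 ] && [ "$1" -le $(( (1 << 24) - 1 )) ]    # 2^24 - 1 = 16777215
}

for vni in 42 0 16777216; do
    if valid_vni "$vni"; then echo "VNI $vni: ok"; else echo "VNI $vni: out of range"; fi
done
```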
View the details of vxlan0:
$ ip -d link show vxlan0
11: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf-test state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 82:f3:76:95:ab:e1 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 42 remote 192.168.57.54 local 192.168.57.50 srcport 0 0 dstport 4789 ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx
Then create a VRF and bind vxlan0 to the VRF:
$ ip link add vrf0 type vrf table 10
$ ip link set vrf0 up
$ ip link set vxlan0 master vrf0
View the information about vxlan0 again:
$ ip -d link show vxlan0
13: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf0 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether aa:4d:80:e3:75:e0 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 42 remote 192.168.57.54 local 192.168.57.50 srcport 0 0 dstport 4789 ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx
    vrf_slave table 10 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
The output now includes VRF-related information: master vrf0 and vrf_slave table 10.
Then configure the IP address for vxlan0 and enable it:
$ ip addr add 172.18.1.2/24 dev vxlan0
$ ip link set vxlan0 up
Once the commands succeed, the VRF routing table contains the following entry: all packets whose destination IP address is in 172.18.1.0/24 are forwarded through vxlan0:
$ ip route show vrf vrf0
172.18.1.0/24 dev vxlan0 proto kernel scope link src 172.18.1.2
An FDB (forwarding database) entry is also added:
$ bridge fdb show
00:00:00:00:00:00 dev vxlan0 dst 192.168.57.54 self permanent
This all-zero MAC entry makes 192.168.57.54 the default peer VTEP. In other words, after an original packet passes through vxlan0, the kernel adds the VXLAN header, and the destination IP address of the outer packet is 192.168.57.54.
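The lookup this FDB implies can be sketched in a few lines: use the entry for the exact destination MAC if one exists, otherwise fall back to the all-zero default entry. This is a simplified model, not the kernel's code: fdb_dst is a hypothetical helper that reads bridge fdb show style lines on stdin and keeps only the last match it sees.

```shell
#!/bin/sh
# Hypothetical helper: print the remote VTEP IP for a destination MAC,
# falling back to the 00:00:00:00:00:00 default entry. Reads FDB lines on stdin.
fdb_dst() {
    awk -v mac="$1" '
        {
            ip = ""
            for (i = 2; i < NF; i++) if ($i == "dst") ip = $(i + 1)  # VTEP IP follows "dst"
            if (ip == "") next                                       # entry has no remote VTEP
            if ($1 == mac) exact = ip
            else if ($1 == "00:00:00:00:00:00") dflt = ip
        }
        END { if (exact != "") print exact; else if (dflt != "") print dflt }
    '
}

fdb='00:00:00:00:00:00 dev vxlan0 dst 192.168.57.54 self permanent'
# A broadcast ARP request has no exact entry, so it goes to the default peer VTEP.
printf '%s\n' "$fdb" | fdb_dst ff:ff:ff:ff:ff:ff   # prints 192.168.57.54
```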
Perform the same configuration on the other host (192.168.57.54):
$ ip link add vxlan0 type vxlan id 42 dstport 4789 remote 192.168.57.50
$ ip link add vrf0 type vrf table 10
$ ip link set vrf0 up
$ ip link set vxlan0 master vrf0
$ ip addr add 172.18.1.3/24 dev vxlan0
$ ip link set vxlan0 up
Ping 172.18.1.3 on 192.168.57.50:
$ ping 172.18.1.3 -I vrf0
Capture the packets remotely and view them in Wireshark:
$ ssh <user>@<host> 'tcpdump -i any -s0 -c 10 -nn -w - port 4789' | /Applications/Wireshark.app/Contents/MacOS/Wireshark -k -i -
I won't explain the options here; refer to the Tcpdump example tutorial for details.
The captured VXLAN packets consist of three parts:
- The innermost layer is the frame seen by the entities actually communicating in the overlay network (here, an ARP request). It is no different from an ordinary network frame, except that some packets are smaller because of the MTU.
- The middle layer is the VXLAN header. Its most important field, the VNI, is indeed 42.
- The outermost layer is the header used for communication between the VTEP hosts. Its destination IP address is that of the peer, 192.168.57.54.
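To make the middle layer concrete, the 8-byte VXLAN header defined in RFC 7348 can be spelled out in hex: one flags byte (0x08, meaning the VNI is valid), 3 reserved bytes, the 24-bit VNI, and 1 reserved byte. The same sketch computes the overlay MTU, which explains why the inner packets are smaller. The helper names are hypothetical, and the 50-byte overhead assumes an IPv4 underlay without VLAN tags.

```shell
#!/bin/sh
# Hypothetical helper: print the VXLAN header for a VNI as 16 hex digits.
vxlan_header_hex() {
    printf '08000000%06x00\n' "$1"   # flags, 3 reserved bytes, 24-bit VNI, 1 reserved byte
}

# Hypothetical helper: largest inner IP packet that fits in one underlay packet.
# Overhead inside the underlay MTU: outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14 = 50.
overlay_mtu() {
    echo $(( $1 - 50 ))
}

vxlan_header_hex 42   # prints 0800000000002a00
overlay_mtu 1500      # prints 1450
```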
The following describes the process of VXLAN communication in the simplest mode:
- After the ping packet to 172.18.1.3 is issued, a routing table lookup shows that it must be sent out through vxlan0.
- The kernel finds that vxlan0's address, 172.18.1.2/24, is on the same subnet as the destination IP address, so the peer is on the same LAN. The kernel needs the peer's MAC address and sends an ARP request to look it up.
- The source MAC address of the ARP request is the MAC address of vxlan0, and the destination MAC address is the all-1s broadcast address (ff:ff:ff:ff:ff:ff).
- vxlan0 adds the VXLAN header according to its configuration (VNI 42).
- The peer VTEP address is 192.168.57.54, so the encapsulated packet is sent to this address.
- When the peer host receives the packet, its kernel recognizes it as a VXLAN packet and delivers it to the VTEP matching the VNI.
- The VTEP strips the VXLAN header, extracts the actual ARP request, and records the source MAC address together with the sender's VTEP IP address in its FDB. This is the learning process. It then generates an ARP reply.
  $ bridge fdb show
  00:00:00:00:00:00 dev vxlan0 dst 192.168.57.50 self permanent
  aa:4d:80:e3:75:e0 dev vxlan0 dst 192.168.57.50 self
- The destination MAC address of the reply is the MAC address of the sender's VTEP, and its destination IP address is the IP address of the sender's VTEP.
- The underlay host delivers the reply to the VTEP based on the VNI. The VTEP decapsulates the ARP reply, adds the entry to the kernel's ARP cache, and learns the MAC address and VTEP IP address of the destination from the packet, adding them to the FDB.
  $ ip neigh show vrf vrf0
  172.18.1.3 dev vxlan0 lladdr 76:06:5c:15:d9:78 STALE
  $ bridge fdb show
  00:00:00:00:00:00 dev vxlan0 dst 192.168.57.54 self permanent
  fe:4a:7e:a2:b5:5d dev vxlan0 dst 192.168.57.54 self
- At this point the VTEP knows everything it needs to communicate. Subsequent ICMP ping packets are unicast through the logical tunnel, and no further ARP requests are needed.
To sum up, a ping over VXLAN goes through two phases: ARP address resolution and the ICMP exchange. Once the VTEP has learned the peer's MAC address, subsequent communication no longer needs ARP resolution.
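The learning step in the list above can be mimicked with a plain text file standing in for the FDB. This is only a toy model under stated assumptions: fdb_learn is a hypothetical helper, it touches no real interface, and it merely rewrites lines in bridge fdb show format, updating an entry if the MAC reappears behind a different VTEP.

```shell
#!/bin/sh
# Hypothetical helper: record that SRC_MAC lives behind SRC_VTEP_IP in a text FDB.
# usage: fdb_learn FDB_FILE SRC_MAC SRC_VTEP_IP
fdb_learn() {
    if grep -q "^$2 " "$1"; then
        # Already known: rewrite the entry in case the MAC moved to another VTEP.
        sed "s|^$2 .*|$2 dev vxlan0 dst $3 self|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    else
        echo "$2 dev vxlan0 dst $3 self" >> "$1"
    fi
}

fdb=$(mktemp)
echo "00:00:00:00:00:00 dev vxlan0 dst 192.168.57.50 self permanent" > "$fdb"
# The ARP request from 172.18.1.2 arrives from VTEP 192.168.57.50.
fdb_learn "$fdb" aa:4d:80:e3:75:e0 192.168.57.50
cat "$fdb"
```

Learning the same MAC a second time does not add a duplicate line, which matches the idea that ARP resolution is only needed once.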
2. VXLAN + Bridge
In the point-to-point VXLAN network above, each host has only one VTEP and only one communicating entity. In production, each host may run dozens or even hundreds of VMs or containers that need to communicate with each other. A mechanism is therefore needed to organize these entities and forward their traffic through the VTEP tunnel.
A Linux bridge can connect multiple virtual NICs, so you can use a bridge to attach multiple VMs or containers to the same VXLAN network. The network topology is shown in the following figure:
Compared with the previous mode, the only addition is a bridge that connects the veth pairs of different network namespaces; the VXLAN NIC also attaches to this bridge.
Create a VXLAN interface on 192.168.57.50.
$ ip link add vxlan0 type vxlan \
id 42 \
dstport 4789 \
local 192.168.57.50 \
remote 192.168.57.54
Then create the bridge br0, attach vxlan0 to it, attach br0 to the VRF, and bring them all up:
$ ip link add br0 type bridge
$ ip link set vxlan0 master br0
$ ip link add vrf0 type vrf table 10
$ ip link set br0 master vrf0
$ ip link set vxlan0 up
$ ip link set br0 up
$ ip link set vrf0 up
Create a network namespace and a veth pair. Attach one end of the veth pair to the bridge, move the other end into the namespace, and assign it the IP address 172.18.1.2:
$ ip netns add ns0
$ ip link add veth0 type veth peer name eth0 netns ns0
$ ip link set veth0 master br0
$ ip link set veth0 up
$ ip -n ns0 link set lo up
$ ip -n ns0 addr add 172.18.1.2/24 dev eth0
$ ip -n ns0 link set eth0 up
Configure the VXLAN network on the other host in the same way, and assign 172.18.1.3 to eth0 in that host's network namespace:
$ ip link add vxlan0 type vxlan \
id 42 \
dstport 4789 \
local 192.168.57.54 \
remote 192.168.57.50
$ ip link add br0 type bridge
$ ip link set vxlan0 master br0
$ ip link add vrf0 type vrf table 10
$ ip link set br0 master vrf0
$ ip link set vxlan0 up
$ ip link set br0 up
$ ip link set vrf0 up
$ ip netns add ns0
$ ip link add veth0 type veth peer name eth0 netns ns0
$ ip link set veth0 master br0
$ ip link set veth0 up
$ ip -n ns0 link set lo up
$ ip -n ns0 addr add 172.18.1.3/24 dev eth0
$ ip -n ns0 link set eth0 up
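Because both hosts run the same bridge-mode commands and differ only in the local VTEP address, the remote VTEP address, and the namespace IP, the setup can be wrapped in one parameterized function. This is a hedged sketch: setup_vxlan_bridge and run are hypothetical names, and by default (DRY_RUN=1) it only prints the commands. Applying them for real (DRY_RUN=0) requires root and a kernel with VRF support.

```shell
#!/bin/sh
# Print the commands by default; set DRY_RUN=0 to execute them (as root).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical wrapper. usage: setup_vxlan_bridge LOCAL_IP REMOTE_IP NS_ADDR
setup_vxlan_bridge() {
    run ip link add vxlan0 type vxlan id 42 dstport 4789 local "$1" remote "$2"
    run ip link add br0 type bridge
    run ip link set vxlan0 master br0
    run ip link add vrf0 type vrf table 10
    run ip link set br0 master vrf0
    run ip link set vxlan0 up
    run ip link set br0 up
    run ip link set vrf0 up
    run ip netns add ns0
    run ip link add veth0 type veth peer name eth0 netns ns0
    run ip link set veth0 master br0
    run ip link set veth0 up
    run ip -n ns0 link set lo up
    run ip -n ns0 addr add "$3" dev eth0
    run ip -n ns0 link set eth0 up
}

# On 192.168.57.50:
setup_vxlan_bridge 192.168.57.50 192.168.57.54 172.18.1.2/24
# On 192.168.57.54 you would run instead:
# setup_vxlan_bridge 192.168.57.54 192.168.57.50 172.18.1.3/24
```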
Pinging 172.18.1.2 from 172.18.1.3 shows that the communication process is similar to the previous experiment, except that the ARP packets sent by the container first pass through the bridge and are then forwarded to vxlan0. The Linux kernel adds the VXLAN header at vxlan0, and the packet is finally sent to the peer.
Logically, the NICs of the network namespaces on different hosts in the VXLAN network are connected to the same bridge. In this way, multiple containers in the same VXLAN network can be created on the same host and communicate with each other.
3. VXLAN in multicast mode
The two modes above can only connect two nodes point-to-point; that is, such a VXLAN network can contain only two nodes. Is there a way to accommodate more nodes in the same VXLAN network? Let's first review the two key pieces of information needed for VXLAN communication:
- The MAC address of the peer VM (or container)
- The IP address of the VTEP on the peer's host
When two hosts communicate for the first time, each needs to know the other's MAC address, so ARP requests are sent to look it up. With multiple nodes, the ARP requests must reach all nodes. However, traditional ARP broadcast does not work here, because the underlay and the overlay are not on the same layer 2 network, and by default an ARP broadcast cannot leave the host. To broadcast in the overlay network, packets must be sent to every VTEP node. There are two ways to solve this problem:
- Use multicast to group some nodes of the network into one virtual whole.
- Learn the MAC address and VTEP IP information in advance and hand the ARP and FDB entries directly to the sending VTEP. An external distributed control center collects this information and distributes it to all nodes in the same VXLAN network.
Let’s take a look at how multicast is implemented first, and save the distributed control center for the next article.
To use multicast mode, the underlying network must support multicast. IPv4 multicast addresses range from 224.0.0.0 to 239.255.255.255.
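A quick way to confirm that a group address such as 224.1.1.1 lies in this range is to check its first octet (224.0.0.0/4). A minimal sketch; is_multicast_ipv4 is a hypothetical helper that assumes a dotted-quad input.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a dotted-quad IPv4 address is in 224.0.0.0/4.
is_multicast_ipv4() {
    first_octet=${1%%.*}
    case $first_octet in
        ''|*[!0-9]*) return 1 ;;   # not a dotted-quad address
    esac
    [ "$first_octet" -ge 224 ] && [ "$first_octet" -le 239 ]
}

for addr in 224.1.1.1 192.168.57.54; do
    if is_multicast_ipv4 "$addr"; then echo "$addr: multicast"; else echo "$addr: not multicast"; fi
done
```

Note that 224.0.0.0/24 is reserved for local network control protocols, so a group outside that block, such as 224.1.1.1 or one from the administratively scoped 239.0.0.0/8, is the safer choice.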
Compared with the point-to-point VXLAN + bridge mode above, the only change is that the remote parameter is replaced by group; all other parameters stay the same.
Run on host 192.168.57.50:
$ ip link add vxlan0 type vxlan \
id 42 \
dstport 4789 \
local 192.168.57.50 \
group 224.1.1.1
$ ip link add br0 type bridge
$ ip link set vxlan0 master br0
$ ip link add vrf0 type vrf table 10
$ ip link set br0 master vrf0
$ ip link set vxlan0 up
$ ip link set br0 up
$ ip link set vrf0 up
$ ip netns add ns0
$ ip link add veth0 type veth peer name eth0 netns ns0
$ ip link set veth0 master br0
$ ip link set veth0 up
$ ip -n ns0 link set lo up
$ ip -n ns0 addr add 172.18.1.2/24 dev eth0
$ ip -n ns0 link set eth0 up
Run on host 192.168.57.54:
$ ip link add vxlan0 type vxlan \
id 42 \
dstport 4789 \
local 192.168.57.54 \
group 224.1.1.1
$ ip link add br0 type bridge
$ ip link set vxlan0 master br0
$ ip link add vrf0 type vrf table 10
$ ip link set br0 master vrf0
$ ip link set vxlan0 up
$ ip link set br0 up
$ ip link set vrf0 up
$ ip netns add ns0
$ ip link add veth0 type veth peer name eth0 netns ns0
$ ip link set veth0 master br0
$ ip link set veth0 up
$ ip -n ns0 link set lo up
$ ip -n ns0 addr add 172.18.1.3/24 dev eth0
$ ip -n ns0 link set eth0 up
The obvious difference from the previous experiment is the content of the FDB entry:
$ bridge fdb show
00:00:00:00:00:00 dev vxlan0 dst 224.1.1.1 self permanent
The dst field is now the multicast address 224.1.1.1 instead of the peer's VTEP address. The VTEPs join the multicast group 224.1.1.1 through IGMP.
Let’s analyze the whole process of VXLAN communication in multicast mode:
- The ping packet is destined for 172.18.1.3; according to the routing table, it is sent out through vxlan0.
- The kernel finds that vxlan0's IP address, 172.18.1.2/24, is on the same subnet as the destination IP address, so both are on the same LAN. The kernel needs the peer's MAC address and sends an ARP request.
- The source MAC address of the ARP request is the MAC address of vxlan0, and the destination MAC address is the all-1s broadcast address (ff:ff:ff:ff:ff:ff).
- VXLAN adds the header according to the configuration (VNI 42).
- Here things differ: the VTEP does not know which host the peer is on, so, following the multicast configuration, it sends the packet to the multicast address 224.1.1.1.
- All hosts in the multicast group receive the packet. Each kernel recognizes it as a VXLAN packet and delivers it to the VTEP matching the VNI.
- Every VTEP that receives the packet strips the VXLAN header and extracts the actual ARP request. At the same time, the VTEP records the source MAC address and the sending VTEP's IP address in its FDB; this is the learning process. If the ARP request is not addressed to itself, it is dropped; if it is, an ARP reply is generated.
- The remaining steps are the same as in the previous experiment.
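The flood-and-learn behaviour described above can be modelled with a toy script. It sends no packets; it only tracks which VTEP learns which MAC and which one answers. Assumptions: the third group member 192.168.57.60 is hypothetical (added to show a VTEP that receives the request but drops it), the sender is simply skipped, and owner_of and deliver are hypothetical helpers.

```shell
#!/bin/sh
# Toy model of multicast flood-and-learn; no real interfaces are touched.
members="192.168.57.50 192.168.57.54 192.168.57.60"   # .60 is a hypothetical third VTEP
dir=$(mktemp -d)                                        # one text FDB file per VTEP

# Overlay IP that lives behind each VTEP (none behind the hypothetical .60).
owner_of() {
    case $1 in
        192.168.57.50) echo 172.18.1.2 ;;
        192.168.57.54) echo 172.18.1.3 ;;
        *) echo "" ;;
    esac
}

# usage: deliver SENDER_VTEP SRC_MAC TARGET_IP
# Models an ARP request sent to group 224.1.1.1 reaching the other members.
deliver() {
    for vtep in $members; do
        [ "$vtep" = "$1" ] && continue                        # this model skips the sender
        echo "$2 dev vxlan0 dst $1 self" >> "$dir/$vtep.fdb"  # learn: SRC_MAC is behind SENDER_VTEP
        if [ "$(owner_of "$vtep")" = "$3" ]; then
            echo "$vtep: ARP request is for me, sending reply"
        else
            echo "$vtep: not for me, dropping"
        fi
    done
}

deliver 192.168.57.50 aa:4d:80:e3:75:e0 172.18.1.3
```

Every receiving VTEP learns the sender's MAC, even the one that drops the request, which is part of the packet waste discussed below.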
The overall communication process is similar to before, except that the underlay sends packets via multicast, which is relatively simple and efficient for a multi-node VXLAN network. However, multicast has its own problems: not all network devices support it (public clouds, for example), and it sends redundant packets, so it is rarely used in production. The next article focuses on how to discover VTEP and MAC addresses automatically through a distributed control center.