VXLAN Basic Tutorial: Configure the VXLAN network on Linux

As mentioned at the end of the last article, Linux supports VXLAN natively. We can therefore use Linux to build a VXLAN-based overlay network and deepen our understanding of VXLAN. After all, theory without practice does not get you far.

1. Point-to-point VXLAN

This section describes the simplest case, a point-to-point VXLAN, which is a VXLAN built between two hosts. Each host has one VTEP, and the VTEPs communicate with each other using their IP addresses. The following figure shows the point-to-point VXLAN network topology:

To avoid affecting the host's network environment, we can use a Linux VRF to isolate the routes from the root network namespace. A VRF (Virtual Routing and Forwarding) is a routing instance made up of a routing table and a group of network devices. You can think of it as a lightweight network namespace: it virtualizes only the Layer 3 part of the network stack, whereas a network namespace virtualizes the entire stack. For details, see the principles and implementation of the Linux Virtual Routing Forwarding (VRF).

VRF is supported only in Linux kernel 4.3 and later, so upgrade the kernel first if yours is older.
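A quick way to check this requirement is to compare the running kernel release with 4.3. The helper below is a minimal sketch in POSIX shell; the function name kernel_at_least is made up for this illustration, and it only looks at the major and minor numbers of the string printed by uname -r.

```shell
# kernel_at_least RELEASE MAJOR MINOR
# Succeeds if RELEASE (a string such as "5.15.0-91-generic") is at least
# MAJOR.MINOR. Hypothetical helper for this tutorial, not a standard tool.
kernel_at_least() {
    release=$1
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of what follows
    if [ "$major" -gt "$2" ]; then
        return 0
    fi
    [ "$major" -eq "$2" ] && [ "${minor:-0}" -ge "$3" ]
}

if kernel_at_least "$(uname -r)" 4 3; then
    echo "kernel version is new enough for VRF"
else
    echo "kernel is older than 4.3; upgrade before using VRF"
fi
```

A sufficiently new version number does not by itself guarantee that VRF support was enabled when the kernel was built, so treat this only as a first check.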

Of course, if you have a clean host dedicated to experiments, you can skip the VRF isolation.

The following uses VRF to create a point-to-point VXLAN network.

Create a VXLAN interface on 192.168.57.50.

$ ip link add vxlan0 type vxlan \
  id 42 \
  dstport 4789 \
  remote 192.168.57.54 \
  local 192.168.57.50 \
  dev eth0

Important parameters:

id 42: specifies the VNI. The VNI field is 24 bits wide, so valid values range from 1 to 2^24 - 1.
dstport: the UDP port the VTEPs use to communicate. IANA assigned port 4789; if this parameter is not specified, Linux defaults to 8472.
remote: the IP address of the peer VTEP.
local: the IP address the local VTEP uses, that is, the address of the tunnel endpoint on this node.
dev eth0: the device the local VTEP uses for communication, from which the VTEP's IP address is obtained. This parameter serves the same purpose as the local parameter.
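The VNI range above can be checked before a value is passed to ip link add. This is a small sketch under the assumption that the VNI arrives as a decimal string; vni_is_valid is a hypothetical helper name, and the lower bound of 1 follows the range given in the parameter list.

```shell
# vni_is_valid VALUE
# Succeeds if VALUE is a decimal integer in the range 1..16777215 (2^24 - 1).
# Hypothetical helper for illustration only.
vni_is_valid() {
    case $1 in
        '' | *[!0-9]*) return 1 ;;      # empty or not purely digits
    esac
    [ "${#1}" -le 8 ] || return 1       # more than 8 digits cannot be in range
    [ "$1" -ge 1 ] && [ "$1" -le 16777215 ]
}

for vni in 42 0 16777215 16777216; do
    if vni_is_valid "$vni"; then
        echo "$vni: valid"
    else
        echo "$vni: invalid"
    fi
done
```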

View the details about vxlan0:

$ ip -d link show vxlan0
11: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf-test state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 82:f3:76:95:ab:e1 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 42 remote 192.168.57.54 local 192.168.57.50 srcport 0 0 dstport 4789 ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx

Then create a VRF and bind vxlan0 to the VRF:

$ ip link add vrf0 type vrf table 10
$ ip link set vrf0 up
$ ip link set vxlan0 master vrf0

View the information about vxlan0 again:

$ ip -d link show vxlan0
13: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf0 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether aa:4d:80:e3:75:e0 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 42 remote 192.168.57.54 local 192.168.57.50 srcport 0 0 dstport 4789 ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx
    vrf_slave table 10 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

The output now includes the VRF information: vxlan0 is a VRF slave attached to routing table 10.

Then configure the IP address for vxlan0 and enable it:

$ ip addr add 172.18.1.2/24 dev vxlan0
$ ip link set vxlan0 up

Once these commands succeed, the VRF routing table contains the following entry, which sends every packet destined for 172.18.1.0/24 out through vxlan0:

$ ip route show vrf vrf0
172.18.1.0/24 dev vxlan0 proto kernel scope link src 172.18.1.2

An FDB forwarding entry is also added:

$ bridge fdb show
00:00:00:00:00:00 dev vxlan0 dst 192.168.57.54 self permanent

This entry means that the default peer VTEP is 192.168.57.54. In other words, when an original packet passes through vxlan0, the kernel adds the VXLAN header to it, and the destination IP address of the outer UDP packet is 192.168.57.54.

Perform the same configuration on the other host (192.168.57.54):

$ ip link add vxlan0 type vxlan id 42 dstport 4789 remote 192.168.57.50
$ ip link add vrf0 type vrf table 10
$ ip link set vrf0 up
$ ip link set vxlan0 master vrf0
$ ip addr add 172.18.1.3/24 dev vxlan0
$ ip link set vxlan0 up

Ping 172.18.1.3 on 192.168.57.50:

$ ping 172.18.1.3 -I vrf0

Use Wireshark to capture the packets remotely:

$ ssh [email protected] 'tcpdump -i any -s0 -c 10 -nn -w - port 4789' | /Applications/Wireshark.app/Contents/MacOS/Wireshark -k -i -

I won’t explain each option here; refer to the Tcpdump example tutorial.

The capture shows that a VXLAN packet consists of three parts:

The innermost layer is the packet seen by the entities that actually communicate in the overlay network (here, an ARP request). It is no different from a packet on a classic network, except that some packets are smaller because of the MTU.
The middle layer is the VXLAN header. The field we care about most, the VNI, is indeed 42.
The outermost layer is the header of the packet exchanged between the hosts where the VTEPs reside. Its destination IP address is the peer, 192.168.57.54.
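To make the middle layer concrete, the sketch below builds the 8-byte VXLAN header for a given VNI and reads the VNI back out of it. The layout follows RFC 7348: one flags byte (0x08, meaning the VNI is present), three reserved bytes, the 24-bit VNI, and one reserved byte. The last function adds up the encapsulation overhead on an IPv4 underlay, which is why inner packets have to be smaller. All three function names are hypothetical and exist only for this illustration.

```shell
# vxlan_header_hex VNI
# Print the 8-byte VXLAN header as 16 hex digits: flags byte 0x08,
# 3 reserved bytes, the 24-bit VNI, and 1 reserved byte.
vxlan_header_hex() {
    printf '08000000%06x00\n' "$1"
}

# vni_from_header_hex HEADER
# Extract the VNI (bytes 5 to 7) from a header printed by vxlan_header_hex.
vni_from_header_hex() {
    tail_part=${1#????????}         # drop the flags byte and 3 reserved bytes
    vni_hex=${tail_part%??}         # drop the final reserved byte
    echo "$((0x$vni_hex))"
}

# vxlan_inner_mtu UNDERLAY_MTU
# Subtract the overhead added on an IPv4 underlay:
# outer IPv4 (20) + outer UDP (8) + VXLAN (8) + inner Ethernet (14) = 50 bytes.
vxlan_inner_mtu() {
    echo "$(( $1 - 20 - 8 - 8 - 14 ))"
}

vxlan_header_hex 42                      # prints 0800000000002a00
vni_from_header_hex 0800000000002a00     # prints 42
vxlan_inner_mtu 1500                     # prints 1450
```

The 1450 figure is the largest inner IP packet that still fits in a 1500-byte underlay packet once it has been encapsulated.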

The following describes the VXLAN communication process in this simplest mode:

The ping packet is addressed to 172.18.1.3. According to the routing table, it is sent out through vxlan0.
The kernel sees that the IP address of vxlan0 is 172.18.1.2/24, which is in the same network segment as the destination IP address. The two are therefore on the same LAN, so the kernel needs the MAC address of the peer and sends an ARP request to query it.
The source MAC address of the ARP request is the MAC address of vxlan0, and the destination MAC address is the all-ones broadcast address (ff:ff:ff:ff:ff:ff).
The VXLAN device adds the VXLAN header according to its configuration (VNI 42).
The VTEP address of the peer is 192.168.57.54, so the encapsulated packet is sent to that address.
When the peer host receives the packet, the kernel recognizes it as a VXLAN packet and hands it to the corresponding VTEP based on the VNI.
The VTEP removes the VXLAN header to recover the original ARP request, and records the source MAC address together with the sender's VTEP IP address in the FDB table. This is the learning process. It then generates an ARP reply.

$ bridge fdb show
00:00:00:00:00:00 dev vxlan0 dst 192.168.57.50 self permanent
aa:4d:80:e3:75:e0 dev vxlan0 dst 192.168.57.50 self

The destination MAC address of the ARP reply is the MAC address of the sender's VTEP, and after encapsulation the outer destination IP address is the IP address of the sender's VTEP.
The sending host receives the reply and hands it to the VTEP based on the VNI. The VTEP decapsulates it to recover the ARP reply, adds an entry to the kernel ARP cache, and learns the IP address of the destination VTEP and the destination MAC address from the packet, adding them to the FDB table.

$ ip neigh show vrf vrf0
172.18.1.3 dev vxlan0 lladdr 76:06:5c:15:d9:78 STALE

$ bridge fdb show
00:00:00:00:00:00 dev vxlan0 dst 192.168.57.54 self permanent
fe:4a:7e:a2:b5:5d dev vxlan0 dst 192.168.57.54 self

At this point, each VTEP has all the information it needs to communicate. Subsequent ICMP ping packets are sent as unicast over the logical tunnel, and no further ARP requests are needed.

To sum up, a ping on a VXLAN network goes through two phases: ARP resolution and the ICMP exchange. Once the VTEP has learned the MAC address of the peer, later traffic can skip the ARP resolution phase.
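The learning step can be modeled in a few lines of shell. This is a simplified sketch, not how the kernel stores its table: it keeps the FDB as text in the bridge fdb show format and appends a "MAC to remote VTEP" entry the first time a frame from that MAC arrives. The function name fdb_learn is hypothetical; the addresses are the ones from the output above, as seen on host 192.168.57.54.

```shell
# FDB on 192.168.57.54 before any traffic, in `bridge fdb show` format.
FDB='00:00:00:00:00:00 dev vxlan0 dst 192.168.57.50 self permanent'

# fdb_learn INNER_SRC_MAC OUTER_SRC_IP
# Add "MAC -> remote VTEP" to the table unless the exact entry already exists.
# Simplified model: the real kernel also ages entries out and updates moved MACs.
fdb_learn() {
    entry="$1 dev vxlan0 dst $2 self"
    if ! printf '%s\n' "$FDB" | grep -qxF -- "$entry"; then
        FDB="$FDB
$entry"
    fi
}

# The ARP request arrives from VTEP 192.168.57.50 with inner source MAC
# aa:4d:80:e3:75:e0; a second frame from the same sender changes nothing.
fdb_learn aa:4d:80:e3:75:e0 192.168.57.50
fdb_learn aa:4d:80:e3:75:e0 192.168.57.50
printf '%s\n' "$FDB"
```

After the two calls the table holds the default all-zero entry plus one learned entry, which matches the shape of the bridge fdb show output on the peer.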

2. VXLAN + Bridge

In the point-to-point VXLAN network above, each host has only one VTEP and only one communicating entity. In real production environments, dozens or even hundreds of VMs or containers on each host need to communicate. A mechanism is therefore required to organize these entities and forward their traffic through the tunnel endpoint, the VTEP.

A Linux bridge can connect multiple virtual NICs, so you can use a bridge to attach multiple VMs or containers to the same VXLAN network. The network topology is shown in the following figure:

Compared with the previous mode, the only additions are a bridge that connects the veth pairs of the different network namespaces, and the VXLAN interface must also be attached to this bridge.

Create a VXLAN interface on 192.168.57.50.

$ ip link add vxlan0 type vxlan \
  id 42 \
  dstport 4789 \
  local 192.168.57.50 \
  remote 192.168.57.54

Then create the bridge br0, attach vxlan0 to it, attach br0 to the VRF, and bring them all up:

$ ip link add br0 type bridge
$ ip link set vxlan0 master br0
$ ip link add vrf0 type vrf table 10
$ ip link set br0 master vrf0
$ ip link set vxlan0 up
$ ip link set br0 up
$ ip link set vrf0 up

Create a network namespace and a veth pair. Attach one end of the veth pair to the bridge, move the other end into the network namespace, and assign it the IP address 172.18.1.2.

$ ip netns add ns0

$ ip link add veth0 type veth peer name eth0 netns ns0
$ ip link set veth0 master br0
$ ip link set veth0 up

$ ip -n ns0 link set lo up
$ ip -n ns0 addr add 172.18.1.2/24 dev eth0
$ ip -n ns0 link set eth0 up

Configure the VXLAN network on the other host in the same way, and assign 172.18.1.3 to eth0 in the network namespace on that host.

$ ip link add vxlan0 type vxlan \
  id 42 \
  dstport 4789 \
  local 192.168.57.54 \
  remote 192.168.57.50
$ ip link add br0 type bridge
$ ip link set vxlan0 master br0
$ ip link add vrf0 type vrf table 10
$ ip link set br0 master vrf0
$ ip link set vxlan0 up
$ ip link set br0 up
$ ip link set vrf0 up

$ ip netns add ns0

$ ip link add veth0 type veth peer name eth0 netns ns0
$ ip link set veth0 master br0
$ ip link set veth0 up

$ ip -n ns0 link set lo up
$ ip -n ns0 addr add 172.18.1.3/24 dev eth0
$ ip -n ns0 link set eth0 up

Pinging 172.18.1.2 from 172.18.1.3 shows that the communication process is similar to the previous experiment. The difference is that the ARP packets sent by the container first pass through the bridge and are then forwarded to vxlan0, where the Linux kernel adds the VXLAN header before sending them to the peer.

Logically, the NICs of the network namespaces on the different hosts in the VXLAN network are all connected to the same bridge. In this way, multiple containers on the same host can be placed in the same VXLAN network and communicate with each other.
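Since the two hosts run the same commands with only the local address, the remote address, and the overlay address swapped, the sequence can be generated from those three values. The sketch below only prints the commands from this section so they can be reviewed first; it does not run them. The function name print_vxlan_bridge_setup is made up for this illustration.

```shell
# print_vxlan_bridge_setup LOCAL_VTEP REMOTE_VTEP OVERLAY_ADDR
# Print (do not execute) the bridge-mode commands for one host.
# Hypothetical helper; the commands are the ones listed in this section.
print_vxlan_bridge_setup() {
    cat <<EOF
ip link add vxlan0 type vxlan id 42 dstport 4789 local $1 remote $2
ip link add br0 type bridge
ip link set vxlan0 master br0
ip link add vrf0 type vrf table 10
ip link set br0 master vrf0
ip link set vxlan0 up
ip link set br0 up
ip link set vrf0 up
ip netns add ns0
ip link add veth0 type veth peer name eth0 netns ns0
ip link set veth0 master br0
ip link set veth0 up
ip -n ns0 link set lo up
ip -n ns0 addr add $3 dev eth0
ip -n ns0 link set eth0 up
EOF
}

# Commands for the first host; swap the two VTEP addresses and use
# 172.18.1.3/24 for the second host.
print_vxlan_bridge_setup 192.168.57.50 192.168.57.54 172.18.1.2/24
```

Printing instead of executing keeps the sketch safe to try on any machine; the commands themselves still need root privileges when you run them.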

3. VXLAN in multicast mode

The two modes above only support point-to-point connections, which means a VXLAN network can have only two nodes. Is there a way to fit more nodes into the same VXLAN network? Let’s review the two key pieces of information needed for VXLAN communication:

The MAC address of the peer VM (or container)
The IP address of the VTEP on the host where the peer resides

When two entities communicate for the first time, the sender needs the MAC address of the peer and sends an ARP request to find it. With multiple nodes, the ARP request has to reach all of them. Traditional ARP broadcast does not work here, because the underlay and the overlay are not on the same Layer 2 network, and by default an ARP broadcast cannot leave the host. To broadcast in the overlay network, the packet must be delivered to every node where a VTEP resides. There are two ways to solve this:

Use multicast to group certain nodes in the network into one virtual whole.
Obtain the MAC address and VTEP IP information in advance and provide the ARP and FDB information directly to the sending VTEP. This information is usually collected by an external distributed control center and distributed to all nodes in the same VXLAN network.

Let’s look at how multicast works first, and leave the distributed control center for the next article.

To use the multicast mode, the underlying network must support the multicast function. The multicast address ranges from 224.0.0.0 to 239.255.255.255.
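A group address can be checked against this range before it is used. The sketch below only inspects the first octet, which is enough to tell whether a dotted-quad address falls in 224.0.0.0 to 239.255.255.255; is_multicast_ipv4 is a hypothetical helper name and it does not validate the remaining octets.

```shell
# is_multicast_ipv4 ADDRESS
# Succeeds if the first octet of a dotted-quad address is in 224..239.
# Hypothetical helper; the other three octets are not validated.
is_multicast_ipv4() {
    first_octet=${1%%.*}
    case $first_octet in
        '' | *[!0-9]*) return 1 ;;      # empty or not a number
    esac
    [ "$first_octet" -ge 224 ] && [ "$first_octet" -le 239 ]
}

for addr in 224.1.1.1 239.255.255.255 192.168.57.54; do
    if is_multicast_ipv4 "$addr"; then
        echo "$addr: multicast"
    else
        echo "$addr: not multicast"
    fi
done
```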

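Compared with the VXLAN + Bridge mode above, the only change to the parameters is that remote is replaced with group; the other parameters stay the same. Because the VTEP joins the multicast group on a specific underlay interface, the command below also names that interface with dev (eth0, as in the first experiment).

# Run on host 192.168.57.50
$ ip link add vxlan0 type vxlan \
  id 42 \
  dstport 4789 \
  local 192.168.57.50 \
  group 224.1.1.1 \
  dev eth0
$ ip link add br0 type bridge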
$ ip link set vxlan0 master br0
$ ip link add vrf0 type vrf table 10
$ ip link set br0 master vrf0
$ ip link set vxlan0 up
$ ip link set br0 up
$ ip link set vrf0 up

$ ip netns add ns0

$ ip link add veth0 type veth peer name eth0 netns ns0
$ ip link set veth0 master br0
$ ip link set veth0 up

$ ip -n ns0 link set lo up
$ ip -n ns0 addr add 172.18.1.2/24 dev eth0
$ ip -n ns0 link set eth0 up

# Run on host 192.168.57.54
$ ip link add vxlan0 type vxlan \
  id 42 \
  dstport 4789 \
  local 192.168.57.54 \
  group 224.1.1.1 \
  dev eth0
$ ip link add br0 type bridge
$ ip link set vxlan0 master br0
$ ip link add vrf0 type vrf table 10
$ ip link set br0 master vrf0
$ ip link set vxlan0 up
$ ip link set br0 up
$ ip link set vrf0 up

$ ip netns add ns0

$ ip link add veth0 type veth peer name eth0 netns ns0
$ ip link set veth0 master br0
$ ip link set veth0 up

$ ip -n ns0 link set lo up
$ ip -n ns0 addr add 172.18.1.3/24 dev eth0
$ ip -n ns0 link set eth0 up

The obvious difference from the previous experiments is the content of the FDB entry:

$ bridge fdb show
00:00:00:00:00:00 dev vxlan0 dst 224.1.1.1 self permanent

The dst field is now the multicast address 224.1.1.1 instead of the address of the peer VTEP. The VTEPs join the same multicast group, 224.1.1.1, through IGMP.

Let’s walk through the whole VXLAN communication process in multicast mode:

The ping packet is addressed to 172.18.1.3. According to the routing table, it is sent out through vxlan0.
The kernel sees that the IP address of vxlan0 is 172.18.1.2/24, which is in the same network segment as the destination IP address. The two are therefore on the same LAN, so the kernel needs the MAC address of the peer and sends an ARP request to query it.
The source MAC address of the ARP request is the MAC address of vxlan0, and the destination MAC address is the all-ones broadcast address (ff:ff:ff:ff:ff:ff).
The VXLAN device adds the VXLAN header according to its configuration (VNI 42).
This is where things differ. Because the sender does not know which host the peer VTEP is on, the VTEP follows its multicast configuration and sends the packet to the multicast address 224.1.1.1.
Every host in the multicast group receives the packet. The kernel recognizes it as a VXLAN packet and hands it to the corresponding VTEP based on the VNI.
The VTEP on each receiving host removes the VXLAN header to recover the original ARP request, and records the source MAC address and IP address information in the FDB table. This is the learning process. If the ARP request is not addressed to this host, it is dropped; if it is, an ARP reply is generated.
The remaining steps are the same as in the previous experiment.
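The forwarding decision behind these steps can be sketched as a table lookup. This is a simplified model rather than the kernel's implementation: if the inner destination MAC has a learned entry, the packet goes to that VTEP as unicast; otherwise it goes to the destination of the all-zero default entry, which in this mode is the multicast group. The function name fdb_lookup is hypothetical, and the learned entry in the sample table is illustrative.

```shell
# Sample table in `bridge fdb show` format. The first line is the default
# entry from this section; the second is an illustrative learned entry.
FDB='00:00:00:00:00:00 dev vxlan0 dst 224.1.1.1 self permanent
fe:4a:7e:a2:b5:5d dev vxlan0 dst 192.168.57.54 self'

# fdb_lookup INNER_DST_MAC
# Print the outer destination IP: the learned VTEP if the MAC is known,
# otherwise the destination of the all-zero default entry.
fdb_lookup() {
    exact=
    fallback=
    while read -r mac _dev _name _dst ip _rest; do
        if [ "$mac" = "$1" ]; then
            exact=$ip
        elif [ "$mac" = "00:00:00:00:00:00" ]; then
            fallback=$ip
        fi
    done <<EOF
$FDB
EOF
    echo "${exact:-$fallback}"
}

fdb_lookup ff:ff:ff:ff:ff:ff     # broadcast ARP request: prints 224.1.1.1
fdb_lookup fe:4a:7e:a2:b5:5d     # learned MAC: prints 192.168.57.54
```

This is why only the first ARP request is multicast: once the reply has been learned, later packets to that MAC match an exact entry and are sent straight to the peer VTEP.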

The overall process is similar to the earlier ones, except that the underlay delivers the packets by multicast, which is relatively simple and efficient for a multi-node VXLAN network. Multicast has its own problems, however. Not every network supports it (public clouds, for example, often do not), and it wastes bandwidth by delivering packets to hosts that do not need them, so it is rarely used in production. The next article focuses on how to discover VTEP and MAC address information automatically through a distributed control center.
