Linux virtual network model

To support multiple instances of the network protocol stack, Linux introduces network namespaces. Each namespace holds an independent protocol stack, fully isolated from the others, with no communication between them; this is how Docker achieves isolation between containers. A veth device pair can connect two different namespaces; to connect several namespaces, a bridge is used; and Netfilter/iptables filters, modifies, and drops packets. This article walks through the Linux virtual network model from these building blocks.

Start with netns and veth

A netns is a Linux network namespace; the network inside each namespace is isolated from the others, while a veth pair can connect two namespaces.

ip netns add ns1
ip netns ls
ip netns add ns2
ip link add veth0 type veth peer name veth1   # veth always comes in pairs
ip link show

The namespaces and veth pair are now created; verify them with ip netns ls and ip link show. ip netns ls output:

ns2
ns1 (id: 0)

ip link show output:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 06:99:32:53:81:1e brd ff:ff:ff:ff:ff:ff
3: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:8d:5f:08:37:21 brd ff:ff:ff:ff:ff:ff
4: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 4e:22:2b:5e:8e:1f brd ff:ff:ff:ff:ff:ff

Next, move each veth end into a namespace:

ip link set veth1 netns ns1
ip link set veth0 netns ns2
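As a side note, modern iproute2 offers a shorthand for running its subcommands inside a namespace; this is an equivalent sketch (the -n flag is an alias for -netns):

```shell
# "ip -n <ns> ..." runs the ip subcommand inside the namespace,
# equivalent to "ip netns exec <ns> ip ..."
ip -n ns1 link show
ip -n ns2 link show
```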

Then ip link show in the default namespace shows only:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 06:99:32:53:81:1e brd ff:ff:ff:ff:ff:ff

The veth ends are now in ns1 and ns2; inspect them from inside a namespace with ip netns exec ns1 ip link show:

1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: veth1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:8d:5f:08:37:21 brd ff:ff:ff:ff:ff:ff link-netnsid 1

A veth that has been moved into a namespace can also be moved again from inside that namespace via ip netns exec:

ip netns exec ns1 ip link set veth1 netns ns2
ip netns exec ns2 ip link set veth0 netns ns1

The two veth ends now sit in the two namespaces, but they cannot communicate yet because no IP addresses are assigned. Assign them:

ip netns exec ns1 ip addr add 10.1.1.1/24 dev veth0
ip netns exec ns2 ip addr add 10.1.1.2/24 dev veth1

Bring both interfaces up:

ip netns exec ns1 ip link set dev veth0 up
ip netns exec ns2 ip link set dev veth1 up

Try a ping: ip netns exec ns1 ping 10.1.1.2

PING 10.1.1.2 (10.1.1.2) 56(84) bytes of data.
64 bytes from 10.1.1.2: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 10.1.1.2: icmp_seq=2 ttl=64 time=0.028 ms
64 bytes from 10.1.1.2: icmp_seq=3 ttl=64 time=0.028 ms
64 bytes from 10.1.1.2: icmp_seq=4 ttl=64 time=0.030 ms
^C
--- 10.1.1.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.028/0.030/0.035/0.004 ms

Ohhhhhhhhhh, it worked.

Enriching communication with a bridge

The Linux kernel implements bridging through a virtual bridge device, which can bind several interfaces and bridge them together; crucially, the bridge itself can hold an IP address.

ip netns delete ns1
ip netns delete ns2

First delete the two namespaces above, then configure ns1 from scratch:

# Create network namespace 1
sudo ip netns add ns1

# Create the veth pair
sudo ip link add veth0 type veth peer name veth_ns_1

# Move one veth end into ns1
sudo ip link set veth0 netns ns1

# Configure the IP for this veth and enable the interfaces in ns1
sudo ip netns exec ns1 ifconfig veth0 175.18.0.2/24 up
sudo ip netns exec ns1 ifconfig lo up

# Enable the veth end left in the default network namespace
sudo ifconfig veth_ns_1 up

ip netns exec ns1 ip addr output:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
6: veth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a6:04:d5:bb:ec:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 175.18.0.2/24 brd 175.18.0.255 scope global veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a404:d5ff:febb:ec30/64 scope link
       valid_lft forever preferred_lft forever

Do the same for ns2:

sudo ip netns add ns2
sudo ip link add veth0 type veth peer name veth_ns_2
sudo ip link set veth0 netns ns2
sudo ip netns exec ns2 ifconfig veth0 175.18.0.3/24 up
sudo ip netns exec ns2 ifconfig lo up
sudo ifconfig veth_ns_2 up

Next, configure the bridge, mainly by attaching the two veth ends left in the default namespace to it:

# Create a bridge
# brctl comes from the bridge-utils package; install it with apt-get if missing
sudo brctl addbr ns_br

# Configure the bridge IP address and bring it up
sudo ifconfig ns_br 175.18.0.1/24 up

# Configure the route
sudo route add -net 175.18.0.0/24 dev ns_br

# Attach the two host-side virtual interfaces to the bridge
sudo brctl addif ns_br veth_ns_1
sudo brctl addif ns_br veth_ns_2

# Configure default routes for both network namespaces
sudo ip netns exec ns1 ip route add default via 175.18.0.1 dev veth0
sudo ip netns exec ns2 ip route add default via 175.18.0.1 dev veth0
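For reference, the same bridge setup can be done with plain iproute2 instead of brctl and ifconfig; a sketch using the interface names from this article:

```shell
# Create the bridge with iproute2 (no bridge-utils needed)
sudo ip link add ns_br type bridge
sudo ip addr add 175.18.0.1/24 dev ns_br
sudo ip link set ns_br up
# Attach the host-side veth ends to the bridge
sudo ip link set veth_ns_1 master ns_br
sudo ip link set veth_ns_2 master ns_br
```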

The bridge and both namespaces can now reach each other:

sudo ip netns exec ns2 ping 175.18.0.1
sudo ip netns exec ns1 ping 175.18.0.1
sudo ping -I ns_br 175.18.0.2
sudo ping -I ns_br 175.18.0.3

However, the two namespaces may still not be able to talk to each other; that is where iptables comes in.

Linux iptables

If the bridge acts like a switch, iptables filters the packets forwarded across it.

cat /proc/sys/net/ipv4/ip_forward

First check whether IP forwarding is enabled; if this returns 0, enable it:

sudo sysctl -w net.ipv4.conf.all.forwarding=1
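The sysctl -w setting is lost on reboot. A sketch of making it persistent, assuming the conventional /etc/sysctl.conf location used by most distributions:

```shell
# Persist IPv4 forwarding across reboots, then reload the config
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```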

iptables has five tables; query the filter table with:

iptables -t filter -n --list

Focus on the FORWARD chain, which handles forwarded packets:

Chain FORWARD (policy ACCEPT)

If your FORWARD policy shows DROP instead, switch it to ACCEPT:

iptables -t filter --policy FORWARD ACCEPT

Forwarded packets are now allowed through; ping again:

ip netns exec ns1 ping -c 3 175.18.0.3
PING 175.18.0.3 (175.18.0.3) 56(84) bytes of data.
64 bytes from 175.18.0.3: icmp_seq=1 ttl=64 time=0.038 ms
64 bytes from 175.18.0.3: icmp_seq=2 ttl=64 time=0.041 ms
64 bytes from 175.18.0.3: icmp_seq=3 ttl=64 time=0.041 ms
--- 175.18.0.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.038/0.040/0.041/0.001 ms

Pinging the host's IP address also works:

ip netns exec ns1 ping 172.31.30.66
PING 172.31.30.66 (172.31.30.66) 56(84) bytes of data.
64 bytes from 172.31.30.66: icmp_seq=1 ttl=64 time=0.029 ms
64 bytes from 172.31.30.66: icmp_seq=2 ttl=64 time=0.036 ms
64 bytes from 172.31.30.66: icmp_seq=3 ttl=64 time=0.034 ms
^C
--- 172.31.30.66 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.029/0.033/0.036/0.007 ms

However, pinging any other external IP address fails. Take Baidu as an example:

root@ip-172-31-30-66:~# ping www.baidu.com
PING www.a.shifen.com (220.181.38.149) 56(84) bytes of data.
64 bytes from 220.181.38.149: icmp_seq=1 ttl=46 time=21.6 ms
64 bytes from 220.181.38.149: icmp_seq=2 ttl=46 time=21.7 ms
^C
--- www.a.shifen.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 21.625/21.670/21.716/0.154 ms
root@ip-172-31-30-66:~# ip netns exec ns1 ping 220.181.38.149
PING 220.181.38.149 (220.181.38.149) 56(84) bytes of data.
^C
--- 220.181.38.149 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms

This is because sending the ICMP packet from ns1 to 220.181.38.149 succeeds: it leaves with source address 175.18.0.2. But when 220.181.38.149 replies, the destination of its ICMP packet is 175.18.0.2, a private address that cannot be routed back, so the reply is inevitably lost.

The fix is to rewrite the source address to 172.31.30.66 before the packet leaves the host; the nat table of iptables does exactly this:

iptables -t nat -A POSTROUTING -s 175.18.0.0/24 -j MASQUERADE
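To confirm the rule was added, or to remove it later, these standard iptables invocations can be used (a sketch):

```shell
# List the nat table's POSTROUTING chain with packet counters
iptables -t nat -L POSTROUTING -n -v
# Delete the rule: same specification as -A, but with -D
iptables -t nat -D POSTROUTING -s 175.18.0.0/24 -j MASQUERADE
```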

The POSTROUTING chain, one of the five iptables chains, performs SNAT (source address translation). The MASQUERADE target replaces a packet's source address with the address of the interface the packet leaves from; here, 175.18.0.2 is replaced with 172.31.30.66. -A stands for append: the rule is appended to the POSTROUTING chain of the nat table (change -A to -D to delete the rule). -t selects the table, -s matches the source address, and -j specifies the jump target. Ping Baidu again:

ip netns exec ns1 ping -c 3 www.baidu.com
# DNS names can now be resolved from inside the namespace as well

PING www.a.shifen.com (220.181.38.150) 56(84) bytes of data.
64 bytes from 220.181.38.150: icmp_seq=1 ttl=45 time=20.6 ms
64 bytes from 220.181.38.150: icmp_seq=2 ttl=45 time=20.7 ms
64 bytes from 220.181.38.150: icmp_seq=3 ttl=45 time=20.7 ms
--- www.a.shifen.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 20.630/20.706/20.751/0.174 ms

Both namespaces can now reach the outside world, but how can the outside world reach them? Start a Python web server in each namespace:

ip netns exec ns1 nohup python3 -m http.server 80 &
ip netns exec ns2 nohup python3 -m http.server 80 & 

You can curl directly to both servers

root@ip-172-31-30-66:~# curl -I http://175.18.0.2
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.5.2
Date: Fri, 06 Nov 2020 06:30:38 GMT
Content-type: text/html; charset=utf-8
Content-Length: 629

root@ip-172-31-30-66:~# curl -I http://175.18.0.3
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.5.2
Date: Fri, 06 Nov 2020 06:30:38 GMT
Content-type: text/html; charset=utf-8
Content-Length: 629

Docker's -p/-P options map container ports to host ports, and that mapping is implemented with iptables. We can build the same mapping ourselves:

iptables -t nat -A PREROUTING -p tcp --dport 8088 -j DNAT --to 175.18.0.2:80
iptables -t nat -A PREROUTING -p tcp --dport 8089 -j DNAT --to 175.18.0.3:80

The two python3 file servers are now exposed on ports 8088 and 8089.
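From another machine that can reach the host, the mapped ports should now serve the namespaces' content. A sketch, assuming 172.31.30.66 is routable from that machine:

```shell
curl -I http://172.31.30.66:8088   # DNAT forwards to 175.18.0.2:80 in ns1
curl -I http://172.31.30.66:8089   # DNAT forwards to 175.18.0.3:80 in ns2
```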

Conclusion

The Linux virtual network model covered here is very basic and differs in many details from the network model Docker actually implements, but the principles are the same. Future articles will analyze the network models of Docker, Kubernetes, and Istio; stay tuned 👏
