Linux virtual network model
To support multiple instances of the network protocol stack, Linux introduces network namespaces. Each namespace holds its own independent protocol stack, and stacks in different namespaces are completely isolated and cannot communicate with each other; this is how Docker isolates containers from one another. A veth device pair can connect two different namespaces, a bridge connects more than two, and Netfilter with iptables filters, modifies, and discards packets along the way. This article describes how to build a Linux virtual network model from these pieces.
Start with Netns and Veth
A netns is a Linux network namespace; the networks in different namespaces are isolated from each other, while a veth pair can connect two namespaces.
ip netns add ns1
ip netns ls
ip netns add ns2
ip link add veth0 type veth peer name veth1   # veth devices always come in pairs
ip link show
The namespaces and the veth pair are now created; verify them with ip netns ls and ip link show respectively. The ip netns ls output:
ns2
ns1 (id: 0)
The ip link show output:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 06:99:32:53:81:1e brd ff:ff:ff:ff:ff:ff
3: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 7e:8d:5f:08:37:21 brd ff:ff:ff:ff:ff:ff
4: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 4e:22:2b:5e:8e:1f brd ff:ff:ff:ff:ff:ff
Next, move each veth endpoint into a namespace:
ip link set veth1 netns ns1
ip link set veth0 netns ns2
Running ip link show in the default namespace now shows only:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 06:99:32:53:81:1e brd ff:ff:ff:ff:ff:ff
The veth endpoints are now inside ns1 and ns2, so to see them you must look inside a namespace, for example with ip netns exec ns1 ip link show:
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: veth1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 7e:8d:5f:08:37:21 brd ff:ff:ff:ff:ff:ff link-netnsid 1
A veth endpoint that is already inside a namespace can also be moved from there, by running ip link set inside that namespace via ip netns exec:
ip netns exec ns1 ip link set veth1 netns ns2
ip netns exec ns2 ip link set veth0 netns ns1
After this swap, veth0 lives in ns1 and veth1 in ns2, but the two namespaces still cannot communicate because no IP addresses are bound. Let's bind them:
ip netns exec ns1 ip addr add 10.1.1.1/24 dev veth0
ip netns exec ns2 ip addr add 10.1.1.2/24 dev veth1
Bring both interfaces up:
ip netns exec ns1 ip link set dev veth0 up
ip netns exec ns2 ip link set dev veth1 up
Try a ping with ip netns exec ns1 ping 10.1.1.2:
PING 10.1.1.2 (10.1.1.2) 56(84) bytes of data.
64 bytes from 10.1.1.2: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 10.1.1.2: icmp_seq=2 ttl=64 time=0.028 ms
64 bytes from 10.1.1.2: icmp_seq=3 ttl=64 time=0.028 ms
64 bytes from 10.1.1.2: icmp_seq=4 ttl=64 time=0.030 ms
^C
--- 10.1.1.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.028/0.030/0.035/0.004 ms
Ohhhhhhhhhh, it worked.
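The reverse direction should work just as well; a quick check:

ip netns exec ns2 ping -c 3 10.1.1.1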
Enrich communication with a bridge
The Linux kernel implements bridging through a virtual bridge device, which can bind several interface devices, bridge them together, and, crucially, can itself hold an IP address.
ip netns delete ns1
ip netns delete ns2
So let's delete those two namespaces and rebuild from scratch, starting with ns1:
# Create network namespace ns1
sudo ip netns add ns1
# Create the veth pair
sudo ip link add veth0 type veth peer name veth_ns_1
# Move one veth endpoint into ns1
sudo ip link set veth0 netns ns1
# Configure an IP for this veth and enable the interfaces inside ns1
sudo ip netns exec ns1 ifconfig veth0 175.18.0.2/24 up
sudo ip netns exec ns1 ifconfig lo up
# Enable the veth endpoint left in the default network namespace
sudo ifconfig veth_ns_1 up
Check the result with ip netns exec ns1 ip addr:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
6: veth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a6:04:d5:bb:ec:30 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 175.18.0.2/24 brd 175.18.0.255 scope global veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a404:d5ff:febb:ec30/64 scope link
       valid_lft forever preferred_lft forever
Do the same for ns2:
sudo ip netns add ns2
sudo ip link add veth0 type veth peer name veth_ns_2
sudo ip link set veth0 netns ns2
sudo ip netns exec ns2 ifconfig veth0 175.18.0.3/24 up
sudo ip netns exec ns2 ifconfig lo up
sudo ifconfig veth_ns_2 up
Next, configure the bridge; the main step is attaching the two veth endpoints that remained in the default namespace to it:
# Create a bridge
# brctl comes from the bridge-utils package; install it with apt-get if it is missing
sudo brctl addbr ns_br
# Configure the bridge's IP address and bring it up
sudo ifconfig ns_br 175.18.0.1/24 up
# Configure the route to the bridge's subnet
sudo route add -net 175.18.0.0/24 dev ns_br
# Attach the two veth endpoints to the bridge
sudo brctl addif ns_br veth_ns_1
sudo brctl addif ns_br veth_ns_2
# Configure default routes inside both network namespaces
sudo ip netns exec ns1 ip route add default via 175.18.0.1 dev veth0
sudo ip netns exec ns2 ip route add default via 175.18.0.1 dev veth0
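As an aside, modern iproute2 can build the same bridge without bridge-utils at all; a minimal sketch using the same device names as above:

sudo ip link add ns_br type bridge
sudo ip addr add 175.18.0.1/24 dev ns_br
sudo ip link set ns_br up
sudo ip link set veth_ns_1 master ns_br
sudo ip link set veth_ns_2 master ns_br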
The bridge and both namespaces can now reach each other:
sudo ip netns exec ns2 ping 175.18.0.1
sudo ip netns exec ns1 ping 175.18.0.1
sudo ping -I ns_br 175.18.0.2
sudo ping -I ns_br 175.18.0.3
However, you may find that ns1 and ns2 still cannot talk to each other; this is where iptables comes in.
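To check for yourself whether the two namespaces can reach each other:

sudo ip netns exec ns1 ping -c 2 175.18.0.3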
Linux iptables
If the bridge acts like a switch, then iptables filters the packets forwarded through that switch.
cat /proc/sys/net/ipv4/ip_forward
First check whether IP forwarding is enabled; if the command returns 0, turn it on:
sudo sysctl -w net.ipv4.conf.all.forwarding=1
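Note that sysctl -w only lasts until the next reboot; to persist the setting, the usual approach (assuming your distribution reads /etc/sysctl.conf) is:

echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p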
iptables has five tables. Run the following command to inspect the filter table:
iptables -t filter -n --list
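For reference, the five tables are filter, nat, mangle, raw, and security; each is inspected the same way by switching the -t argument, for example:

iptables -t nat -n --list
iptables -t mangle -n --list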
Focus on the FORWARD chain, which handles forwarded packets:
Chain FORWARD (policy ACCEPT)
If your policy shows DROP instead (Docker, for example, sets the FORWARD policy to DROP), switch it to ACCEPT:
iptables -t filter --policy FORWARD ACCEPT
Forwarded packets are now allowed, so ping again:
ip netns exec ns1 ping -c 3 175.18.0.3
PING 175.18.0.3 (175.18.0.3) 56(84) bytes of data.
64 bytes from 175.18.0.3: icmp_seq=1 ttl=64 time=0.038 ms
64 bytes from 175.18.0.3: icmp_seq=2 ttl=64 time=0.041 ms
64 bytes from 175.18.0.3: icmp_seq=3 ttl=64 time=0.041 ms

--- 175.18.0.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.038/0.040/0.041/0.001 ms
Pinging the host's own IP address also works:
ip netns exec ns1 ping 172.31.30.66
PING 172.31.30.66 (172.31.30.66) 56(84) bytes of data.
64 bytes from 172.31.30.66: icmp_seq=1 ttl=64 time=0.029 ms
64 bytes from 172.31.30.66: icmp_seq=2 ttl=64 time=0.036 ms
64 bytes from 172.31.30.66: icmp_seq=3 ttl=64 time=0.034 ms
^C
--- 172.31.30.66 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.029/0.033/0.036/0.007 ms
However, pinging other external IP addresses fails. Take Baidu as an example:
root@ip-172-31-30-66:~# ping www.baidu.com
PING www.a.shifen.com (220.181.38.149) 56(84) bytes of data.
64 bytes from 220.181.38.149: icmp_seq=1 ttl=46 time=21.6 ms
64 bytes from 220.181.38.149: icmp_seq=2 ttl=46 time=21.7 ms
^C
--- www.a.shifen.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 21.625/21.670/21.716/0.154 ms
root@ip-172-31-30-66:~# ip netns exec ns1 ping 220.181.38.149
PING 220.181.38.149 (220.181.38.149) 56(84) bytes of data.
^C
--- 220.181.38.149 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
This is because the ICMP request leaves ns1 for 220.181.38.149 with a source address of 175.18.0.2. When 220.181.38.149 replies, the destination of the reply is 175.18.0.2, a private address that is not routable on the public network, so the reply is inevitably lost.
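You can watch this happen with tcpdump on the host's external interface (assuming eth0, as in the earlier ip link show output): the ICMP requests leave with the private source 175.18.0.2 and no replies come back.

sudo tcpdump -ni eth0 icmp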
The fix is to rewrite the source IP to 172.31.30.66 before the packet leaves the host; the nat table of iptables does exactly this:
iptables -t nat -A POSTROUTING -s 175.18.0.0/24 -j MASQUERADE
POSTROUTING, one of the five iptables chains, is where SNAT (source network address translation) happens. The MASQUERADE target replaces a packet's source address with the IP address of the interface the packet leaves through; here, 175.18.0.2 is replaced with 172.31.30.66. In the command, -t selects the table, -A appends the rule to the POSTROUTING chain of the nat table (change -A to -D to delete it again), -s matches the source address, and -j specifies the target to jump to.
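To make the -A/-D relationship concrete, this is how you might inspect the rule and remove it again (a sketch; --line-numbers prints rule numbers that can also be passed to -D):

iptables -t nat -L POSTROUTING -n -v --line-numbers   # show the nat POSTROUTING rules, numbered
iptables -t nat -D POSTROUTING -s 175.18.0.0/24 -j MASQUERADE   # delete by repeating the exact spec

With the MASQUERADE rule in place, ping Baidu again: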
ip netns exec ns1 ping -c 3 www.baidu.com
# DNS resolution now works from inside the namespace as well
PING www.a.shifen.com (220.181.38.150) 56(84) bytes of data.
64 bytes from 220.181.38.150: icmp_seq=1 ttl=45 time=20.6 ms
64 bytes from 220.181.38.150: icmp_seq=2 ttl=45 time=20.7 ms
64 bytes from 220.181.38.150: icmp_seq=3 ttl=45 time=20.8 ms

--- www.a.shifen.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 20.630/20.706/20.751/0.174 ms
Now that both namespaces can reach the outside world, how can the outside world reach them? Start a Python web server in each namespace:
ip netns exec ns1 nohup python3 -m http.server 80 &
ip netns exec ns2 nohup python3 -m http.server 80 &
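You can confirm the servers are running inside their namespaces; ip netns pids lists the processes attached to a namespace:

ip netns pids ns1
ip netns pids ns2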
You can curl both servers directly:
root@ip-172-31-30-66:~# curl -I http://175.18.0.2
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.5.2
Date: Fri, 06 Nov 2020 06:30:38 GMT
Content-type: text/html; charset=utf-8
Content-Length: 629
root@ip-172-31-30-66:~# curl -I http://175.18.0.3
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.5.2
Date: Fri, 06 Nov 2020 06:30:38 GMT
Content-type: text/html; charset=utf-8
Content-Length: 629
Docker's -p/-P flags map container ports to host ports, and that mapping is implemented with iptables. We can build the same mapping ourselves:
iptables -t nat -A PREROUTING -p tcp --dport 8088 -j DNAT --to 175.18.0.2:80
iptables -t nat -A PREROUTING -p tcp --dport 8089 -j DNAT --to 175.18.0.3:80
The two python3 file servers are now exposed on host ports 8088 and 8089.
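Note that the PREROUTING chain only sees packets arriving from outside the host, so test the mapping from another machine that can reach the host (a hypothetical client); a curl issued on the host itself traverses the OUTPUT chain instead and is not translated by these rules.

curl -I http://172.31.30.66:8088
curl -I http://172.31.30.66:8089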
Conclusion
The Linux virtual network model explained here is very basic, and it differs in many details from the network model Docker actually implements, but the principles are the same. In future posts we will run the same kind of analysis on the network models of Docker, Kubernetes, and Istio. Stay tuned.