The article “How to Implement a Kubernetes networking plug-in” was first published at: blog.hdls.me/16131164540…
At present, there are more and more network solutions for containers. Requiring every new solution to be adapted to every container runtime (and every runtime to every solution) is obviously unreasonable, and CNI was created to solve this problem.
While maintaining my “family-level Kubernetes cluster” at home during the Spring Festival holiday, I came up with the idea of writing a network plug-in, so, building on the existing CNI and plugins repositories, I wrote Village Net (github.com/zwwhdls/vil…). Using this network plug-in as an example, this article focuses on how to implement a CNI plug-in.
How CNI works
To understand how to implement a CNI plug-in, you first need to understand how CNI works. CNI is an interface protocol used to configure container networks: after the container management system creates the network namespace in which the container lives, CNI is responsible for inserting network interfaces into that namespace and configuring the corresponding IP addresses and routes.
CNI is essentially the bridge between the container runtime and the CNI plug-ins: it passes the container's runtime information and the network configuration to the plug-ins, and each plug-in carries out the actual work. The CNI plug-in is therefore the concrete implementation of the container network. This can be summed up in the following diagram:
What is the CNI Plugin
We now know that the CNI Plugin is a concrete implementation of a container network. In the cluster, each Plugin exists in binary form and is invoked by Kubelet via the CNI interface. The specific process is as follows:
CNI plugins can be divided into three categories: Main, IPAM, and Meta. The Main and IPAM plug-ins complement each other to complete the basic work of creating a network environment for the container.
IPAM plug-ins
The IP Address Management (IPAM) plug-in is used to assign IP addresses. The officially available plug-ins include the following:
- DHCP: A daemon running on the host that makes DHCP requests on behalf of the container
- Host-local: allocates IP addresses from a pre-configured address range and records the allocations in files on the host
- Static: Assigns static IP addresses to containers for debugging purposes
Main plug-ins
Main plug-ins are the binaries that create specific network devices. The officially available plug-ins include the following:
- Bridge: Create a bridge on the host and connect to the container via a Veth pair
- Macvlan: virtualizes multiple macvlan interfaces on top of a host interface, each with its own MAC address
- Ipvlan: similar to macvlan, it virtualizes multiple interfaces on a single host interface, except that they share the host interface's MAC address and are distinguished by IP address
- Loopback: sets the loopback interface (lo) to UP
- PTP: creates a veth pair
- Vlan: allocates a VLAN device
- Host-device: Moves an existing device from the host to the container
Meta plug-ins
Meta plug-ins, maintained by the CNI community, currently include:
- Flannel: A plug-in specifically for the Flannel project
- Tuning: a binary that adjusts network device parameters via sysctl
- Portmap: a binary that configures port mappings via iptables
- Bandwidth: a binary that limits traffic using the Token Bucket Filter (TBF)
- Firewall: adds rules via iptables or firewalld to control traffic into and out of the container
The implementation of CNI Plugin
The CNI Plugin repository is at: github.com/containerne… . You can see the concrete implementation of each type of Plugin. Each Plugin needs to implement the following three methods, which are registered in main.
func cmdCheck(args *skel.CmdArgs) error { ... }
func cmdAdd(args *skel.CmdArgs) error { ... }
func cmdDel(args *skel.CmdArgs) error { ... }
Using host-local as an example, the registration method is as follows. You need to specify the three methods implemented above, supported versions, and the name of the Plugin.
func main() {
skel.PluginMain(cmdAdd, cmdCheck, cmdDel, version.All, bv.BuildString("host-local"))
}
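To make the structure concrete, below is a minimal plug-in skeleton. It is a sketch rather than the real host-local code; the plug-in name "demo-net" and the empty cmdCheck/cmdDel bodies are placeholders. The network configuration arrives on standard input (args.StdinData), while the container runtime information arrives in the other fields of skel.CmdArgs.

package main

import (
	"encoding/json"
	"fmt"

	"github.com/containernetworking/cni/pkg/skel"
	"github.com/containernetworking/cni/pkg/types"
	current "github.com/containernetworking/cni/pkg/types/current"
	"github.com/containernetworking/cni/pkg/version"
	bv "github.com/containernetworking/plugins/pkg/utils/buildversion"
)

// NetConf mirrors the JSON network configuration delivered on stdin.
type NetConf struct {
	types.NetConf
}

func cmdAdd(args *skel.CmdArgs) error {
	conf := NetConf{}
	if err := json.Unmarshal(args.StdinData, &conf); err != nil {
		return fmt.Errorf("failed to parse network config: %v", err)
	}
	// args.ContainerID, args.Netns and args.IfName carry the container runtime information.
	result := &current.Result{CNIVersion: conf.CNIVersion}
	// ... create and configure network devices inside args.Netns here ...
	return types.PrintResult(result, conf.CNIVersion)
}

func cmdCheck(args *skel.CmdArgs) error { return nil }

func cmdDel(args *skel.CmdArgs) error { return nil }

func main() {
	skel.PluginMain(cmdAdd, cmdCheck, cmdDel, version.All, bv.BuildString("demo-net"))
}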
What is the CNI
After understanding how the plug-ins work, let's look at how CNI itself works. The CNI repository is at: github.com/containerne… . The code analyzed in this article is based on the current version, v0.8.1.
The community provides a tool, cnitool, that simulates a call to the CNI interface and can add network devices to, or remove them from, an existing network namespace.
Let’s take a look at cnitool’s execution logic:
func main() {
...
netconf, err := libcni.LoadConfList(netdir, os.Args[2])
...
netns := os.Args[3]
netns, err = filepath.Abs(netns)
...
// Generate the containerid by hashing the netns path
s := sha512.Sum512([]byte(netns))
containerID := fmt.Sprintf("cnitool-%x", s[:10])
cninet := libcni.NewCNIConfig(filepath.SplitList(os.Getenv(EnvCNIPath)), nil)
rt := &libcni.RuntimeConf{
ContainerID: containerID,
NetNS: netns,
IfName: ifName,
Args: cniArgs,
CapabilityArgs: capabilityArgs,
}
switch os.Args[1] {
case CmdAdd:
result, err := cninet.AddNetworkList(context.TODO(), netconf, rt)
if result != nil {
_ = result.Print()
}
exit(err)
case CmdCheck:
err := cninet.CheckNetworkList(context.TODO(), netconf, rt)
exit(err)
case CmdDel:
exit(cninet.DelNetworkList(context.TODO(), netconf, rt))
}
}
As the code above shows, the configuration netconf is parsed from the CNI configuration file, and the netns path and container ID are passed to the interface cninet.AddNetworkList as the container's runtime information.
Let’s look at the implementation of AddNetworkList:
// AddNetworkList executes a sequence of plugins with the ADD command
func (c *CNIConfig) AddNetworkList(ctx context.Context, list *NetworkConfigList, rt *RuntimeConf) (types.Result, error) {
var err error
var result types.Result
for _, net := range list.Plugins {
result, err = c.addNetwork(ctx, list.Name, list.CNIVersion, net, result, rt)
if err != nil {
return nil, err
}
}
...
return result, nil
}
Obviously, this function performs the addNetwork operations of each Plugin in sequence. Look again at the addNetwork function:
func (c *CNIConfig) addNetwork(ctx context.Context, name, cniVersion string, net *NetworkConfig, prevResult types.Result, rt *RuntimeConf) (types.Result, error) {
c.ensureExec()
pluginPath, err := c.exec.FindInPath(net.Network.Type, c.Path)
...
newConf, err := buildOneConfig(name, cniVersion, net, prevResult, rt)
...
return invoke.ExecPluginWithResult(ctx, pluginPath, newConf.Bytes, c.args("ADD", rt), c.exec)
}
The addNetwork operation for each plug-in is divided into three parts:
- First, the FindInPath function is called to find the absolute path of the plug-in based on its type.
- Next, the buildOneConfig function is called to extract the current plug-in's NetworkConfig from the NetworkList; prevResult here is the result of the previous plug-in's execution.
- Finally, invoke.ExecPluginWithResult is called to actually perform the plug-in's ADD operation. newConf.Bytes holds the byte stream of the NetworkConfig encoded together with the previous plug-in's result, and the c.args function builds an Args instance that carries the container's runtime information and the CNI operation to execute.
In fact, invoke.ExecPluginWithResult is only a thin wrapper: it calls exec.ExecPlugin and returns its result. Let's look at the implementation of exec.ExecPlugin:
func (e *RawExec) ExecPlugin(ctx context.Context, pluginPath string, stdinData []byte, environ []string) ([]byte, error) {
stdout := &bytes.Buffer{}
stderr := &bytes.Buffer{}
c := exec.CommandContext(ctx, pluginPath)
c.Env = environ
c.Stdin = bytes.NewBuffer(stdinData)
c.Stdout = stdout
c.Stderr = stderr
// Retry the command on "text file busy" errors
for i := 0; i <= 5; i++ {
err := c.Run()
...
// All other errors except than the busy text file
return nil, e.pluginErr(err, stdout.Bytes(), stderr.Bytes())
}
...
}
At this point we have seen the core logic of the whole of CNI, and it is surprisingly simple: exec the plug-in binary, retrying up to five times if a "text file busy" error occurs.
The entire CNI execution flow should now be clear. In short, a CNI plug-in is an executable: it gets its network configuration from the configuration file and its container information from the container runtime, the former via standard input and the latter via environment variables. The plug-ins are invoked in the order defined in the configuration file, and each plug-in's execution result is passed to the next plug-in inside its configuration.
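To see the "previous result is passed in the configuration" part in code, here is a hedged sketch (not taken from any official plug-in) of how a chained plug-in could recover the previous plug-in's result from its stdin configuration, where it shows up as the prevResult field:

package chained

import (
	"encoding/json"
	"fmt"

	"github.com/containernetworking/cni/pkg/skel"
	"github.com/containernetworking/cni/pkg/types"
	current "github.com/containernetworking/cni/pkg/types/current"
	"github.com/containernetworking/cni/pkg/version"
)

// NetConf embeds types.NetConf, whose RawPrevResult field holds the previous
// plug-in's output when plug-ins are chained.
type NetConf struct {
	types.NetConf
}

// parsePrevResult decodes the stdin config and extracts the previous plug-in's result.
func parsePrevResult(args *skel.CmdArgs) (*current.Result, error) {
	conf := NetConf{}
	if err := json.Unmarshal(args.StdinData, &conf); err != nil {
		return nil, fmt.Errorf("failed to parse network config: %v", err)
	}
	if conf.RawPrevResult == nil {
		return nil, fmt.Errorf("no prevResult in config: is this plug-in first in the chain?")
	}
	// Re-encode the raw prevResult and parse it as a Result of the declared CNI version.
	resultBytes, err := json.Marshal(conf.RawPrevResult)
	if err != nil {
		return nil, err
	}
	res, err := version.NewResult(conf.CNIVersion, resultBytes)
	if err != nil {
		return nil, err
	}
	return current.NewResultFromResult(res)
}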
However, the mature network plug-ins we are familiar with today, such as Calico, usually register only the main plug-in; the main plug-in calls the IPAM plug-in itself and consumes its result on the spot.
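As a sketch of that pattern, assuming the ipam helper package from the plugins repository (the allocateIP wrapper itself is made up for illustration): the main plug-in forwards its own stdin configuration to the IPAM binary and converts the returned result.

package mainplugin

import (
	"github.com/containernetworking/cni/pkg/skel"
	current "github.com/containernetworking/cni/pkg/types/current"
	"github.com/containernetworking/plugins/pkg/ipam"
)

// allocateIP asks the configured IPAM plug-in for addresses and returns them
// as a current.Result that the main plug-in can apply to its own devices.
func allocateIP(args *skel.CmdArgs, ipamType string) (*current.Result, error) {
	// ipam.ExecAdd locates the IPAM binary via CNI_PATH and passes the same stdin config on.
	r, err := ipam.ExecAdd(ipamType, args.StdinData)
	if err != nil {
		return nil, err
	}
	// If anything fails later, the caller should roll back with ipam.ExecDel(ipamType, args.StdinData).
	return current.NewResultFromResult(r)
}

With this pattern the configuration file only lists the main plug-in, and the IPAM plug-in is referenced from its ipam section.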
How does Kubelet use CNI
Now that you know how the CNI plug-in works, take a look at how Kubelet uses the CNI plug-in.
When kubelet creates a pod, it calls the CNI plug-in to create the pod's network environment. The source is as follows; you can see that kubelet's SetUpPod function (pkg/kubelet/dockershim/network/cni/cni.go) calls the plugin.addToNetwork function:
func (plugin *cniNetworkPlugin) SetUpPod(namespace string, name string, id kubecontainer.ContainerID, annotations, options map[string]string) error {
if err := plugin.checkInitialized(); err != nil {
return err
}
netnsPath, err := plugin.host.GetNetNS(id.ID)
...
if plugin.loNetwork != nil {
if _, err = plugin.addToNetwork(cniTimeoutCtx, plugin.loNetwork, name, namespace, id, netnsPath, annotations, options); err != nil {
return err
}
}
_, err = plugin.addToNetwork(cniTimeoutCtx, plugin.getDefaultNetwork(), name, namespace, id, netnsPath, annotations, options)
return err
}
Consider the addToNetwork function: it first builds the pod's runtime information, then reads the CNI plug-in's network configuration, i.e. the configuration file under /etc/cni/net.d. After assembling the parameters the plug-in needs, it calls the CNI interface cninet.AddNetworkList. The source is as follows:
func (plugin *cniNetworkPlugin) addToNetwork(ctx context.Context, network *cniNetwork, podName string, podNamespace string, podSandboxID kubecontainer.ContainerID, podNetnsPath string, annotations, options map[string]string) (cnitypes.Result, error) {
rt, err := plugin.buildCNIRuntimeConf(podName, podNamespace, podSandboxID, podNetnsPath, annotations, options)
...
pdesc := podDesc(podNamespace, podName, podSandboxID)
netConf, cniNet := network.NetworkConfig, network.CNIConfig
...
res, err := cniNet.AddNetworkList(ctx, netConf, rt)
...
return res, nil
}
Simulate the CNI execution process
Having walked through the entire CNI execution flow, let's now simulate it ourselves: we'll configure a container network with cnitool, using bridge as the main plug-in and host-local as the IPAM plug-in.
Compile plugins
First, compile the CNI plug-ins into executables by running the build_linux.sh script in the official repository:
$ mkdir -p $GOPATH/src/github.com/containernetworking/plugins
$ git clone https://github.com/containernetworking/plugins.git $GOPATH/src/github.com/containernetworking/plugins
$ cd $GOPATH/src/github.com/containernetworking/plugins
$ ./build_linux.sh
$ ls
bandwidth dhcp flannel host-local loopback portmap sbr tuning village-ipam vrf
bridge firewall host-device ipvlan macvlan ptp static village vlan
Create a network configuration file
Next, create our own network configuration file, choosing bridge as the main plug-in and host-local for IPAM, and specifying the available IP range.
$ mkdir -p /etc/cni/net.d
$ cat >/etc/cni/net.d/10-hdlsnet.conf <<EOF
{
    "cniVersion": "0.2.0",
    "name": "hdls-net",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16",
        "routes": [
            {"dst": "0.0.0.0/0"}
        ]
    }
}
EOF
$ cat >/etc/cni/net.d/99-loopback.conf <<EOF
{
"cniVersion": "0.2.0",
"name": "lo",
"type": "loopback"
}
EOF
Create a network namespace
$ ip netns add hdls
Run cnitool's add command
Finally, set CNI_PATH to the directory of the plug-in binaries compiled above and run cnitool from the official repository:
$ mkdir -p $GOPATH/src/github.com/containernetworking/cni
$ git clone https://github.com/containernetworking/cni.git $GOPATH/src/github.com/containernetworking/cni
$ export CNI_PATH=$GOPATH/src/github.com/containernetworking/plugins/bin
$ go run cnitool.go add hdls-net /var/run/netns/hdls
{
    "cniVersion": "0.2.0",
    "ip4": {
        "ip": "10.22.0.2/16",
        "gateway": "10.22.0.1",
        "routes": [
            {"dst": "0.0.0.0/0"}
        ]
    },
    "dns": {}
}
We can see that the IP address assigned on the hdls-net network is 10.22.0.2.
Validation
After obtaining the IP address of the container, verify that it can be pinged. Run the nsenter command to enter the namespace of the container, and you can find that eth0, the default network device of the container, is also created:
$ ping 10.22.0.2
PING 10.22.0.2 (10.22.0.2) 56(84) bytes of data.
64 bytes from 10.22.0.2: icmp_seq=1 ttl=64 time=0.039 ms
64 bytes from 10.22.0.2: icmp_seq=2 ttl=64 time=0.046 ms
64 bytes from 10.22.0.2: icmp_seq=3 ttl=64 time=0.042 ms
64 bytes from 10.22.0.2: icmp_seq=4 ttl=64 time=0.073 ms
^C
--- 10.22.0.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.039/0.050/0.073/0.013 ms
$ nsenter --net=/var/run/netns/hdls bash
[root@node-3 ~]# ip l
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether be:6b:0c:93:3a:75 brd ff:ff:ff:ff:ff:ff link-netnsid 0
[root@node-3 ~]#
On the host, the veth peer of the container's eth0 has also been created and attached to the cni0 bridge:
$ ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:9a:04:8d brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:22:86:98:d9 brd ff:ff:ff:ff:ff:ff
4: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 76:32:56:61:e4:f5 brd ff:ff:ff:ff:ff:ff
5: veth3e674876@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cni0 state UP mode DEFAULT group default
link/ether 62:b3:06:15:f9:39 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Village Net
I chose the name Village Net because the plug-in implements a layer-2 network based on macvlan. In a layer-2 network, internal communication works like a small village: everyone communicates by shouting (ARP). The name also carries the sense of a "village network": simple, but good enough.
How it works
The reason for choosing macvlan is that a "family-level Kubernetes cluster" has only a few nodes and not many services, which so far could only be exposed through NodePort mappings; and since all the machines sit on the same switch and IP addresses are relatively plentiful, macvlan and ipvlan are both simple, easy-to-implement options. Considering that a DHCP service can hand out addresses based on MAC, and that a pod's IP could even be pinned by its MAC, I decided to try implementing the network plug-in with macvlan.
However, macvlan has quite a few problems across network namespaces. For example, when the pod lives in its own network namespace, its traffic bypasses the host's protocol stack, so cluster IPs implemented with iptables/IPVS cannot work properly.
For the same reason, when macvlan is used the host and the container cannot reach each other directly, although this can be solved by creating an additional macvlan interface on the host as a bridge.
To keep cluster IPs working properly, I abandoned the idea of using macvlan alone and switched to networking each pod with multiple interfaces.
Each pod gets two network interfaces: eth0 is attached to a bridge and serves as the default gateway, with routes added on the host to ensure cross-node communication; the second interface is a macvlan device in bridge mode, which is assigned an IP from the host's network segment.
Workflow
Consistent with the CNI workflow described above, Village Net is split into a main plug-in and an IPAM plug-in.
The IPAM plug-in's main task is to allocate one available IP address from each of the two network segments; the main plug-in then creates the bridge, veth, and macvlan devices based on those two IPs and configures them.
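As an illustration of the macvlan half of that work, here is a simplified sketch using the netlink library and the ns helper from the plugins repository. It is not Village Net's actual code; the parent interface name, the temporary device name, and the address format are assumptions.

package demo

import (
	"fmt"

	"github.com/containernetworking/plugins/pkg/ns"
	"github.com/vishvananda/netlink"
)

// addMacvlan creates a bridge-mode macvlan on top of the host NIC, moves it into
// the container's network namespace, renames it and assigns the host-segment IP.
func addMacvlan(netnsPath, parentName, ifName, addr string) error {
	parent, err := netlink.LinkByName(parentName) // e.g. "ens33"
	if err != nil {
		return fmt.Errorf("failed to find parent link: %v", err)
	}
	mv := &netlink.Macvlan{
		LinkAttrs: netlink.LinkAttrs{
			Name:        "tmp-mv0",
			ParentIndex: parent.Attrs().Index,
		},
		Mode: netlink.MACVLAN_MODE_BRIDGE,
	}
	if err := netlink.LinkAdd(mv); err != nil {
		return fmt.Errorf("failed to create macvlan: %v", err)
	}
	podNS, err := ns.GetNS(netnsPath)
	if err != nil {
		return err
	}
	defer podNS.Close()
	// Move the device into the pod's namespace, then finish configuration from inside it.
	if err := netlink.LinkSetNsFd(mv, int(podNS.Fd())); err != nil {
		return err
	}
	return podNS.Do(func(_ ns.NetNS) error {
		link, err := netlink.LinkByName("tmp-mv0")
		if err != nil {
			return err
		}
		if err := netlink.LinkSetName(link, ifName); err != nil {
			return err
		}
		ipAddr, err := netlink.ParseAddr(addr) // e.g. "192.168.1.42/24"
		if err != nil {
			return err
		}
		if err := netlink.AddrAdd(link, ipAddr); err != nil {
			return err
		}
		return netlink.LinkSetUp(link)
	})
}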
Final thoughts
Village Net's implementation is still fairly rough and even requires some manual work, such as the routing for the bridge. But it basically does what I expected, and building it meant walking through all of CNI's pitfalls once more. CNI itself is not complicated, but there were many details I had not considered at the start, and I ended up working around several of them. If I have the time and energy to keep working on the network plug-in, I'll think about how to optimize it. <( ̄▽ ̄)/