Author | Sheng Dong, Alibaba Cloud after-sales technical expert

In my experience, the concept of Kubernetes cluster services is not easy to understand. In particular, troubleshooting service-related problems based on a vague or mistaken understanding of services can be very difficult.

Newcomers find even basic issues, such as pinging a service IP address, hard to understand; and even experienced engineers can find the iptables configuration behind services quite challenging.

In this article, I will explain the principle and implementation of Kubernetes cluster services in depth, in a way that is easy to understand.

What is the nature of a Kubernetes cluster service

Conceptually, a Kubernetes cluster service is a load balancer, or reverse proxy. It has a lot in common with Alibaba Cloud's Server Load Balancer products. Like a load balancer, a service has its own IP address and front-end port; behind the service, pods (container groups) are mounted as its "back-end servers", each with its own IP address and listening port.
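To make this concrete, here is a minimal sketch, assuming a set of pods labeled app=nginx is already running in the cluster; the service name and labels are hypothetical:

```bash
# Create a service in front of pods labeled app=nginx (assumed to exist).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc            # hypothetical service name
spec:
  selector:
    app: nginx               # the pods mounted as "back-end servers"
  ports:
  - port: 80                 # the service's own front-end port
    targetPort: 80           # the listening port on each pod
EOF

# The service gets its own IP address and port ...
kubectl get service nginx-svc
# ... while the pod IP addresses and ports appear as its endpoints.
kubectl get endpoints nginx-svc
```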

When this load-balancer-plus-back-end architecture is combined with a Kubernetes cluster, the most intuitive implementation we can think of is to dedicate one node in the cluster to the load-balancing role (similar to LVS), while the other nodes run the back-end container groups.

This approach has a huge flaw: it creates a single point of failure. Kubernetes is the culmination of Google's years of automated operations practice, and a dedicated load-balancing node is clearly at odds with that philosophy of intelligent, automated operations.

Bring your own messenger

Sidecar is a core concept in the field of microservices. The Sidecar pattern, put more colloquially, means bringing your own messenger. Those of you who are familiar with service meshes will recognize this. But fewer people may have noticed that the original service implementation in Kubernetes clusters is also based on the Sidecar pattern.

In a Kubernetes cluster, a service is implemented by deploying a reverse-proxy Sidecar on every cluster node. Any access to the cluster service is translated by the reverse proxy on that node into access to the service's back-end container groups. The relationship between the nodes and these Sidecars is basically as shown below.

Bringing the service into reality

In the previous two sections, we saw that a Kubernetes cluster service is essentially a load balancer, that is, a reverse proxy; we also saw that in the actual implementation, this reverse proxy is not deployed on a single cluster node, but rather on every node, as a Sidecar of each node.

What turns the service into a concrete reverse proxy is a Kubernetes cluster controller, namely kube-proxy. For the principles behind Kubernetes cluster controllers, please refer to my other article on controllers. In simple terms, kube-proxy runs as a controller on each cluster node and watches for cluster state changes through the cluster API Server. When a new service is created, kube-proxy translates the state and properties of that service into the configuration of the local reverse proxy.
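As a quick way to see this on a real cluster (the namespace, label, and configmap name below are kubeadm defaults and may differ with other installers):

```bash
# kube-proxy typically runs as a DaemonSet, i.e. one controller per node.
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide

# The proxy mode it uses (userspace / iptables / ipvs) is part of its config:
kubectl -n kube-system get configmap kube-proxy -o yaml | grep mode:
```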

All that remains is the reverse proxy itself, that is, the implementation of the Proxy component in the figure above.

An implementation

At present, there are three main ways for Kubernetes cluster nodes to implement the service reverse proxy: userspace, iptables, and IPVS. In this article we only analyze the iptables approach in depth, and the underlying network is based on the Alibaba Cloud Flannel cluster network.

Filter framework

Now let's imagine a scenario: we have a house with a water inlet pipe and an outlet pipe. The water that comes in through the inlet pipe is not drinkable because it contains impurities, but we expect the water that flows out of the outlet pipe to be drinkable. To achieve this, we cut the pipe open and install an impurity filter in the middle.

If requirements keep changing, for example if we also need to heat the water, cutting a new opening in the pipe for every new function quickly becomes unmanageable. So we need to redesign. First, we can no longer cut the pipe arbitrarily; the cut positions in the pipe need to be fixed. In the scenario above, for example, we make sure the pipe has only one cut position. Second, we abstract out two ways of treating the water: physical change and chemical change.

Based on this design, if we need to filter impurities, we add an impurity-filtering rule to the chemical-change functional module; if we need to raise the temperature, we add a heating rule to the physical-change functional module.

This filter framework is obviously much better than cutting the pipe every time. In designing it, we did two main things: we fixed the cut positions in the pipe, and we abstracted out two methods of water treatment.

With these two things in mind, let's look at how iptables, or more accurately netfilter, works. Netfilter is essentially a filter framework. It cuts five openings, PREROUTING, FORWARD, POSTROUTING, INPUT, and OUTPUT, into the pipeline that receives and routes network packets, and it defines several packet-processing modes, including nat and filter.
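On any Linux host with iptables installed, these five openings show up as the built-in chains of the individual tables; a quick check:

```bash
# The netfilter hooks appear as the built-in chains of each table.
# The nat table hooks into PREROUTING, INPUT, OUTPUT and POSTROUTING;
# the filter table hooks into INPUT, FORWARD and OUTPUT.
sudo iptables -t nat -L -n | grep '^Chain'
sudo iptables -t filter -L -n | grep '^Chain'
```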

A big picture of the node network

Now let's look at an overview of the network on a Kubernetes cluster node. Horizontally, the network environment on a node is divided into different network namespaces, including the host network namespace and the Pod network namespaces. Vertically, each network namespace contains a complete network stack, from applications, through the protocol stack, down to the network devices.

At the network-device layer, a virtual LAN is built inside the system through the cni0 virtual bridge. Pod networks are connected to this virtual LAN through veth pairs. The cni0 virtual LAN communicates with the outside world through host routing and the eth0 network interface.
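A few commands can make this wiring visible on a node; the device names below follow the Flannel/bridge conventions described above and may differ with other network plugins:

```bash
ip addr show cni0               # the virtual bridge and the pod subnet on this node
bridge link show | grep cni0    # veth interfaces attached to the bridge
ip route                        # host routes that lead out via eth0 to other nodes
```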

At the network protocol stack layer, we can implement the reverse proxy of the cluster nodes by programming the netfilter filter framework.

The reverse proxy, in the final analysis, performs DNAT, that is, it rewrites packets destined for the IP address and port of a cluster service so that they are destined for the IP address and port of a specific container group.

As we can see from the diagram of the netfilter filter framework, nat rules can be added at PREROUTING, OUTPUT, and POSTROUTING to change the source or destination address of packets.

Because what needs to be done here is DNAT, that is, changing the destination address, and this must happen before routing so that the packet can be routed correctly, the rules that implement the reverse proxy need to be added at PREROUTING and OUTPUT.

The PREROUTING rules handle traffic from pods accessing services. A packet sent from a pod's veth into cni0 enters the host protocol stack and is first processed by netfilter at PREROUTING, so this is where packets destined for a service are DNATed. After DNAT, the packet's destination address becomes the address of a specific pod; the host then routes it and forwards it out through eth0 to the correct cluster node.

The DNAT rules added at OUTPUT handle packets sent from the host network namespace to a service. The principle is the same: the destination address is modified before routing so that the packet can be routed and forwarded correctly.
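The following is a minimal hand-written sketch of this DNAT idea, with made-up addresses. Note that kube-proxy does not add rules directly to the built-in chains like this; it goes through its own custom chains, which the next sections describe:

```bash
SVC_IP=10.96.0.100     # hypothetical service IP
SVC_PORT=80
POD_IP=172.16.1.5      # hypothetical pod IP
POD_PORT=80

# Pod-originated traffic enters the host stack via cni0 and hits PREROUTING:
sudo iptables -t nat -A PREROUTING -d "$SVC_IP" -p tcp --dport "$SVC_PORT" \
  -j DNAT --to-destination "$POD_IP:$POD_PORT"

# Host-originated traffic does not pass PREROUTING, so the same rewrite
# is installed at OUTPUT:
sudo iptables -t nat -A OUTPUT -d "$SVC_IP" -p tcp --dport "$SVC_PORT" \
  -j DNAT --to-destination "$POD_IP:$POD_PORT"
```

Splitting the rewrite across PREROUTING and OUTPUT is what makes the service address work both for traffic coming from pods and for processes running in the host network namespace.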

Upgrading the filter framework

In the Filter framework section, we saw that netfilter is a filter framework that cuts five openings in the data pipe and processes packets at those five positions. Although fixing the cut positions and classifying the packet-handling methods greatly improved the filter framework, there is still a key problem: we have to modify the pipe itself to accommodate new functionality. In other words, the framework has not fully decoupled the pipe from the filtering functions.

To decouple the pipe from the filtering, netfilter introduces the concept of tables. Tables are netfilter's filtering center; their core function is to classify the filtering modes (the tables themselves) and to organize the filtering rules of each mode into chains.

With the filtering function decoupled from the pipe, all packet processing becomes configuration of the tables. The five openings in the pipe simply become entrances and exits for the traffic: they send packets into the filtering center and send the processed packets back down the pipe.

Inside a table, netfilter organizes rules into chains. There are default chains corresponding to each cut in the pipe, as well as custom chains that we add ourselves. The default chains are the entry points for the data, and they can implement complex functions by jumping to custom chains. The benefit of allowing custom chains is obvious: to accomplish a complex filtering function, such as the reverse proxy on a Kubernetes cluster node, we can use custom chains to modularize our rules.
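As a toy illustration of this modularization (the chain name MY-PROXY and the addresses are made up), the function gets its own custom chain, and the default chains simply jump to it:

```bash
# Create a custom chain in the nat table and hook it up to the default chains.
sudo iptables -t nat -N MY-PROXY
sudo iptables -t nat -A PREROUTING -j MY-PROXY
sudo iptables -t nat -A OUTPUT -j MY-PROXY

# All rules belonging to this function now live in one place:
sudo iptables -t nat -A MY-PROXY -d 10.96.0.100 -p tcp --dport 80 \
  -j DNAT --to-destination 172.16.1.5:80
```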

Implementing the service reverse proxy with custom chains

In fact, the reverse proxy for cluster services uses custom chains to modularize the DNAT of packets. KUBE-SERVICES is the entry chain of the whole reverse proxy and corresponds to the common entry point of all services; each KUBE-SVC-XXXX chain is the entry chain of one specific service, and the KUBE-SERVICES chain jumps to the KUBE-SVC-XXXX chain of a specific service according to the service IP; each KUBE-SEP-XXXX chain represents the address and port of one specific pod, that is, an endpoint, and the per-service KUBE-SVC-XXXX chain jumps to an endpoint chain according to a certain algorithm (generally random).

As mentioned above, because DNAT is required, that is, the destination address must be changed, and this must happen before routing so that the packet can be routed correctly, the KUBE-SERVICES chain is called from the default chains PREROUTING and OUTPUT.
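On a real node, this chain hierarchy can be traced directly; the XXXX suffixes are per-cluster hashes, so we only filter by prefix:

```bash
# KUBE-SERVICES is hooked from the default PREROUTING and OUTPUT chains:
sudo iptables-save -t nat | grep KUBE-SERVICES | head

# Per-service entry chains and per-endpoint (pod) chains:
sudo iptables-save -t nat | grep KUBE-SVC | head
sudo iptables-save -t nat | grep KUBE-SEP | head

# Inside a KUBE-SVC-XXXX chain, endpoints are picked with
# "-m statistic --mode random --probability ...", i.e. random load balancing.
```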

Conclusion

Through this article, you should have a deeper understanding of the concept and implementation of Kubernetes cluster services. Essentially, we need to grasp three key points:

  • A service is essentially a load balancer;
  • service load balancing is implemented with a Sidecar pattern, similar to a service mesh, rather than with a dedicated LVS-style load balancer;
  • kube-proxy is essentially a cluster controller.

In addition, we thought through the design of a filter framework and, on that basis, understood the principle behind the iptables implementation of service load balancing.
