Author | Zheng Chao
Abstract: OpenYurt is Alibaba's open source cloud-edge collaboration architecture. Compared with similar open source solutions, OpenYurt is able to cover the full range of edge computing scenarios. In a previous article, we showed how OpenYurt implements edge autonomy in weak-network and disconnected scenarios. In this fourth article in the OpenYurt series, we focus on cloud-edge communication, another core capability of OpenYurt, and the related component Yurttunnel.
Usage scenarios
During application deployment and operation and maintenance (O&M), users often need to obtain application logs or log in directly to an application's running environment for debugging. In a Kubernetes environment, we usually use commands such as kubectl logs and kubectl exec to meet these needs. As shown in the figure below, kubelet acts as the server on the kubectl request path and is responsible for processing the requests forwarded by kube-apiserver (KAS). This requires a network path between KAS and kubelet that allows KAS to proactively access kubelet.
Figure 1: Kubectl execution flow
However, in edge computing scenarios, edge nodes are often located in local private networks. This protects the edge nodes, but it also means that KAS, located on the cloud control nodes, cannot directly access kubelet on the edge nodes. Therefore, to support the O&M of edge applications from cloud nodes, we must establish a reverse O&M channel between cloud and edge.
Reverse channel
Yurttunnel is an important component of OpenYurt that solves the cloud-edge communication problem. A reverse channel is a common way to solve cross-network communication, and Yurttunnel is in essence a reverse channel. The nodes of an edge cluster often reside in different network regions, while nodes in the same region can communicate with each other. Therefore, when setting up the reverse channel, we only need to ensure that each region has an agent connected to the proxy server (as shown in the following figure). This involves the following steps:
- Deploy the proxy server in the network where the control components reside.
- Give the proxy server an IP address that is reachable from the public network.
- Deploy an agent in each region, and have it establish a long-lived connection to the server through the server's public IP address.
- Forward the control components' requests to edge nodes to the proxy server.
- The proxy server then sends each request to the destination node over the corresponding long-lived connection.
Figure 2
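The steps above can be modeled in a few lines. The following Python sketch is an in-memory illustration only, not Yurttunnel or ANP code: the class names `ProxyServer` and `Agent`, the idea of describing each region by a CIDR block, and the sample addresses are all hypothetical. It shows the key property of a reverse channel: the agent initiates the connection, and the server only reuses connections that agents have already opened.

```python
import ipaddress


class Agent:
    """Hypothetical agent that lives inside one edge network region."""

    def __init__(self, region_cidr):
        # Assumption: a region is described by a single CIDR block.
        self.region = ipaddress.ip_network(region_cidr)

    def connect(self, server):
        # The agent dials out to the server's public address; the server
        # never opens a connection into the private network.
        server.register(self)

    def deliver(self, node_ip, request):
        # Inside its own region the agent can reach the node directly.
        return f"{node_ip} handled {request!r}"


class ProxyServer:
    """Hypothetical proxy server deployed next to the control components."""

    def __init__(self):
        self.agents = []  # long-lived connections opened by agents

    def register(self, agent):
        self.agents.append(agent)

    def forward(self, node_ip, request):
        # Reuse the connection of the agent whose region contains the node.
        addr = ipaddress.ip_address(node_ip)
        for agent in self.agents:
            if addr in agent.region:
                return agent.deliver(node_ip, request)
        raise LookupError(f"no agent connected for {node_ip}")


server = ProxyServer()
for cidr in ("10.1.0.0/16", "10.2.0.0/16"):
    Agent(cidr).connect(server)

reply = server.forward("10.2.0.7", "fetch logs")
print(reply)
```

A request for a node in a region without a connected agent raises `LookupError`, which mirrors why every region needs at least one agent.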
In Yurttunnel, we chose the upstream project apiserver-network-proxy (ANP) to implement the communication between server and agent. ANP builds on EgressSelector, a feature introduced as alpha in Kubernetes 1.16, which is intended to let Kubernetes cluster components communicate across networks (for example, the master resides in a managed VPC, while other components such as kubelet reside in the user's VPC).
Readers may wonder: since OpenYurt is open-sourced from ACK@Edge, and in production ACK@Edge's cloud-edge O&M channel uses a homegrown component, Tunnellib, why does the open source version use a new component? The answer again lies in OpenYurt's core design concept, “Extend Upstream Kubernetes to Edge”.
True, Tunnellib has been tested in complex production environments and performs stably, but we want to maintain the greatest common ground with upstream so that the OpenYurt user experience stays close to native Kubernetes. At the same time, during the development and operation of ACK@Edge, we found that many requirements of edge clusters also exist in other scenarios (for example, most cloud vendors also need nodes to communicate across networks), and that the transmission efficiency of the O&M channel can be further optimized (the section “5. Compress Tunnel bandwidth to save costs” details the optimization). Therefore, in the open source spirit of sharing, equality, and mutual benefit, we hope to share the experience accumulated during development and operation with more developers in the upstream community.
ANP does not work out of the box
However, the ANP project is still at an early stage: its functionality is incomplete, and many problems remain to be solved. The main issues we found in practice include:
- How to forward requests from cloud nodes – A prerequisite for the reverse channel to work is that requests from the control nodes to edge nodes first pass through the proxy server. For Kubernetes 1.16 and later, KAS can use EgressSelector to send requests bound for a node to a specified proxy server first. However, in versions earlier than 1.16, KAS and other control components (Prometheus and Metrics Server) can only access nodes directly and cannot be routed through the proxy server. We expect some users to keep using pre-1.16 versions in the short term, and control components such as Prometheus and Metrics Server have no short-term plans to support EgressSelector. Therefore, the first problem we have to solve is how to forward the requests that control components send to nodes to the proxy server.
- How to ensure that the server replicas cover all regions – In a production environment, an edge cluster often contains tens of thousands of nodes and serves hundreds of users simultaneously. If a cluster has only one proxy server and that server fails, no user can operate the Pods on edge nodes. Therefore, we must deploy multiple proxy server replicas to keep the cluster highly available. Meanwhile, the proxy server's workload grows with access traffic, and users' access latency inevitably grows with it. Therefore, when deploying the proxy server, we also need to consider how to scale it horizontally to cope with high-concurrency scenarios. A classic solution to both the single point of failure and high concurrency is to deploy multiple proxy server replicas and distribute traffic with a load balancer (LB). However, in the OpenYurt scenario, we cannot control which server replica the LB forwards a given KAS request to. Therefore, the second problem to solve is how to ensure that every server replica can connect to all agents.
- How to forward a request to the right agent – After receiving a request, the proxy server should forward it to the agent in the corresponding network region based on the request's destination IP address. However, the current ANP implementation assumes that all nodes are in the same network space, so the server randomly selects an agent to forward the request. Therefore, the third problem we need to solve is how to correctly forward a request to the specified agent.
- How to remove the components' dependency on node certificates – At run time, we need to provide the server with a set of TLS certificates to secure the communication between the server and KAS and between the server and the agent. We also need to prepare a TLS client certificate for the agent to establish the gRPC channel between the agent and the server. The current ANP implementation requires that the server be deployed on the same node as KAS and mount a node volume at startup to share the KAS TLS certificate. Similarly, the agent needs to mount a volume at startup to share the kubelet TLS certificate. This reduces deployment flexibility and makes the components strongly dependent on node certificates, yet in some cases users may want to deploy the server on a node other than the one running KAS. Therefore, another concern is how to remove the components' dependency on node certificates.
- How to reduce tunnel bandwidth – A core design idea of ANP is to use gRPC to encapsulate all of KAS's outbound HTTP requests. gRPC is chosen for its stream support and clear interface specification; in addition, strongly typed clients and servers effectively reduce runtime errors and improve system stability. However, we also found that, compared with using TCP directly, ANP introduces extra overhead and increases bandwidth usage. At the product level, tunnel traffic goes through the public network, so higher bandwidth usage means higher user costs. So a very important question is: can we improve the stability of the system while also reducing bandwidth?
Yurttunnel design analysis
1. Formulate DNAT rules to forward requests from cloud nodes
As mentioned earlier, ANP is built on a new upstream feature, EgressSelector, which lets users make KAS forward egress requests to a specified proxy server by passing an egress configuration when starting KAS. However, we need to support both old and new versions of Kubernetes, and other control components (Prometheus and Metrics Server) do not support EgressSelector, so we must ensure that KAS egress requests can be forwarded to the proxy server even when EgressSelector cannot be used. To this end, we deploy a Yurttunnel Server replica on each cloud control node and embed a new component, Iptable Manager, in the server. The Iptable Manager adds DNAT rules to the OUTPUT chain of the host's iptables to forward the control components' requests to the Yurttunnel Server.
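As a rough illustration of the kind of rule involved, the sketch below builds an `iptables` DNAT command line for the OUTPUT chain of the `nat` table. It is not the Iptable Manager's actual code: the helper name `dnat_rule`, the tunnel server address `192.168.0.10:10263`, and the sample node IP are assumptions made for this example; port 10250 is the kubelet's default serving port. The code only assembles the command and does not execute it.

```python
def dnat_rule(node_ip, kubelet_port, server_ip, server_port):
    """Return an iptables command (as an argument list) that redirects
    locally generated traffic for one node's kubelet port to the tunnel
    server. Hypothetical helper for illustration only."""
    return [
        "iptables", "-t", "nat", "-A", "OUTPUT",
        "-p", "tcp",
        "-d", node_ip, "--dport", str(kubelet_port),
        "-j", "DNAT",
        "--to-destination", f"{server_ip}:{server_port}",
    ]


# Assumed values: an edge node at 10.2.0.7 and a tunnel server that
# listens on 192.168.0.10:10263.
rule = dnat_rule("10.2.0.7", 10250, "192.168.0.10", 10263)
print(" ".join(rule))
```

Because the rule sits in the OUTPUT chain, it applies to packets generated on the host itself, which is why the server replica has to run on the same node as the control components whose traffic it intercepts.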
In addition, when EgressSelector is enabled, KAS sends its outbound requests in a uniform format, so we add another component, the ANP Interceptor. The ANP Interceptor intercepts HTTP requests from the master and encapsulates them in the EgressSelector format. See Figure 3 for the detailed Yurttunnel request forwarding process.
Figure 3: Yurttunnel request forwarding process
2. Dynamically obtain the number of server replicas
As mentioned earlier, we put the Yurttunnel Server replicas behind a load balancer (LB), which distributes every ingress request to one server replica. Since we cannot predict which replica the LB will pick, we must ensure that every replica is connected to all agents. We use a built-in ANP feature to meet this requirement. The process is as follows:
- When starting the Yurttunnel Server, we pass the number of replicas (serverCount) to each server replica and assign each replica a server ID (serverID).
- When an agent connects to the LB, the LB randomly selects a server replica, which establishes a long-lived connection with the agent.
- At the same time, the server returns an ACK packet containing serverCount and serverID to the agent over this connection.
- By parsing the ACK packet, the agent learns the number of server replicas and records the connected serverID locally.
- If the agent finds that it is connected to fewer server replicas than serverCount, it sends another connection request to the LB, and repeats this until the number of locally recorded serverIDs equals serverCount.
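The reconnect loop described in these steps can be simulated as follows. This is a simplified sketch, not ANP code: the helper names `dial_lb` and `connect_to_all`, the dictionary-shaped ACK, and the `max_attempts` safety limit are assumptions made for illustration, and the LB is modeled as a uniformly random choice.

```python
import random


def dial_lb(server_count, rng):
    """Hypothetical LB: pick a replica at random and return its ACK."""
    return {"serverCount": server_count,
            "serverID": rng.randrange(server_count)}


def connect_to_all(dial, max_attempts=1000):
    """Keep dialing the LB until a connection to every replica is held.

    Returns the set of connected server IDs and the number of dials."""
    connected = set()
    expected = 1  # unknown until the first ACK arrives
    attempts = 0
    while len(connected) < expected:
        if attempts >= max_attempts:
            raise RuntimeError("could not reach all replicas")
        attempts += 1
        ack = dial()
        # The ACK tells the agent how many replicas exist in total.
        expected = ack["serverCount"]
        # A repeated serverID means this replica is already connected,
        # so the set does not grow and the agent dials again.
        connected.add(ack["serverID"])
    return connected, attempts


rng = random.Random(0)
connected, attempts = connect_to_all(lambda: dial_lb(3, rng))
print(sorted(connected), attempts)
```

Since the agent cannot choose a replica, the number of dials is at least serverCount and is usually larger, which is the drawback discussed next.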
This mechanism helps us ensure that the server replicas cover every network region. However, it also has a drawback that cannot be ignored. Because the agent cannot choose which server replica to connect to, it must access the LB repeatedly in order to connect to all replicas. During this process, a server that has not yet established connections with all agents may be unable to forward requests from KAS to the corresponding nodes. A potential solution is to create a separate LB for each server replica, dedicated to agent connections, and to record the LB information of all server replicas on the agent. This would let agents quickly establish connections to all server replicas. The implementation details of this solution are still being discussed with developers in the upstream community.
3. Add a proxy policy for ANP
In the network model of OpenYurt, edge nodes are distributed across different network regions, and a randomly selected agent may not be able to forward requests to nodes in other regions. So we had to modify the underlying proxy forwarding logic of the ANP server. Moreover, based on our long experience, we believe that support for different proxy policies in the proxy server, such as forwarding requests to a designated data center, region, or host, is a common requirement. After discussion with developers in the ANP community, we decided to refactor ANP's interface for managing agent connections so that users can implement new proxy policies according to their needs, and we plan to contribute this feature to the upstream code base. The refactoring is still in progress. In the first open source version of Yurttunnel, we temporarily use the following configuration:
- Deploy an agent on each edge node.
- When an agent registers with the server, it uses the IP address of the node where it resides as its agentID.
- When forwarding a request, the server matches the request's target IP address against the agentIDs and forwards the request to the matching agent.
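The temporary policy above amounts to an exact-match lookup keyed by node IP. The sketch below illustrates the idea under stated assumptions: the class `TunnelServer`, its method names, and the string connection handles are hypothetical and do not reflect the ANP interface.

```python
class TunnelServer:
    """Hypothetical server that routes by agentID == node IP."""

    def __init__(self):
        self.agents = {}  # agentID (node IP) -> connection handle

    def register(self, agent_id, conn):
        # Each agent registers under the IP of the node it runs on.
        self.agents[agent_id] = conn

    def pick_agent(self, target_ip):
        # Exact match replaces the random selection assumed by upstream.
        conn = self.agents.get(target_ip)
        if conn is None:
            raise LookupError(f"no agent registered for {target_ip}")
        return conn


tunnel = TunnelServer()
tunnel.register("10.1.0.5", "conn-a")
tunnel.register("10.2.0.7", "conn-b")

chosen = tunnel.pick_agent("10.2.0.7")
print(chosen)
```

The trade-off is that every edge node needs its own agent, since a request can only be routed to a node whose IP has been registered as an agentID.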
After OpenYurt releases Yurt Unit (edge node partition management and control), we plan to use the newly added ANP proxy forwarding policy to deploy agents per partition and forward requests per partition.
4. Dynamically apply for a security certificate
To remove the Yurttunnel components' dependency on node certificates, we add a Cert Manager to Yurttunnel. After the server and agent start, the Cert Manager submits a certificate signing request (CSR) to KAS. The server uses the obtained certificate to secure its communication with KAS and with the agent, and the agent uses the obtained certificate to secure the gRPC channel between itself and the server. Since the agent connects to kubelet over plain TCP, there is no need to prepare a certificate for the connection between the agent and kubelet.
5. Compress Tunnel bandwidth to save costs
In the earlier discussion of ANP's limitations, we mentioned that using gRPC to encapsulate the tunnel can improve transmission stability but also increases public network traffic. Does that mean we have to choose between stability and performance? By analyzing different user scenarios, we found that in most cases users use the O&M channel to obtain container logs (i.e. kubectl logs), and traditional log files contain a large amount of similar text, so we inferred that compression algorithms such as gzip could effectively reduce bandwidth. To verify this assumption, we added a gzip compressor to the gRPC library underlying ANP and compared the amount of data transferred with that of a native TCP connection.
The first experimental scenario we considered was fetching logs from the same kube-proxy container over a native TCP connection and over ANP. We captured the total number of packets and bytes sent in both directions on the tunnel during this process.
Table 1: Native TCP V.S. ANP (kubectl logs kube-proxy)
As shown in Table 1, by using ANP, the total amount of data transferred was reduced by 29.93%.
After a container has run for a long time, its log text can often reach tens of megabytes. To simulate fetching a large text log, we created an Ubuntu container holding 10.5 MB of systemd logs (i.e. journalctl output), transmitted them over a native TCP connection and over ANP, and measured the total amount of data on the tunnel.
Table 2: Native TCP V.S. ANP (Large log file)
As shown in Table 2, in the case of large log text, by using ANP, the total transmitted data volume decreased by 40.85%.
Therefore, compared with the native TCP connection, ANP not only provides higher transmission stability, but also greatly reduces the traffic on the public network. Considering the scale of edge clusters with tens of thousands of nodes, the new solution can help users save a lot of costs in terms of public network traffic.
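The intuition behind these results, that log text is repetitive and therefore compresses well, can be checked with a small experiment. The sketch below uses Python's standard `gzip` module on synthetic log lines; the log format is invented for illustration, and the resulting ratio is not comparable to the measurements in Tables 1 and 2, which were taken on real tunnel traffic.

```python
import gzip

# Synthetic, repetitive log lines standing in for real container logs.
lines = [
    f"2020-09-01T12:00:{i % 60:02d}Z INFO worker synced rules, attempt {i}\n"
    for i in range(2000)
]
raw = "".join(lines).encode("utf-8")

# Compress the whole payload, as a stream compressor would over time.
compressed = gzip.compress(raw)

ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2%}")
```

Payloads that are already compressed or encrypted would not shrink this way, so the benefit depends on traffic being dominated by text such as logs.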
Yurttunnel System architecture
Figure 4: Yurttunnel system architecture
To sum up, Yurttunnel mainly contains the following components:
- Yurttunnel Server – Forwards requests sent to nodes by apiserver, Prometheus, and Metrics Server to the corresponding agents. It contains the following sub-components:
  - ANP Proxy Server – Encapsulates the ANP gRPC server, manages long-lived connections with Yurttunnel Agents, and forwards requests.
  - Iptable Manager – Modifies the DNAT rules of the control nodes to ensure that requests from control components are forwarded to the Yurttunnel Server.
  - Cert Manager – Generates TLS certificates for the Yurttunnel Server.
  - Request Interceptor – Encapsulates KAS HTTP requests to nodes into gRPC packets that comply with the ANP format.
- Yurttunnel Agent – Actively establishes a connection with the Yurttunnel Server and forwards requests from the Yurttunnel Server to kubelet. It contains two sub-components:
  - ANP Proxy Agent – Encapsulates the ANP gRPC agent. Compared with upstream, a gzip compressor is added to compress data.
  - Cert Manager – Generates TLS certificates for the Yurttunnel Agent.
- Yurttunnel Server Service – Usually an SLB, which distributes requests from the control components to an appropriate Yurttunnel Server replica to ensure high availability and load balancing of Yurttunnel.
Summary and outlook
As an important open source component of OpenYurt, Yurttunnel opens up the cloud-edge channel of an OpenYurt cluster and provides a unified entry point for container O&M on edge clusters. By adapting the upstream solution, Yurttunnel not only provides greater transmission stability but also significantly reduces the amount of data transferred.
OpenYurt was officially open-sourced on May 29, 2020. With the help of the community and developers, it grew rapidly and became a CNCF sandbox edge computing cloud native project only three months later. Going forward, OpenYurt will continue to follow its core design philosophy, “Extend Upstream Kubernetes to Edge”, keeping the greatest common ground with upstream while carrying forward the open source spirit of sharing, and will work with developers to advance the Kubernetes community.