Hello everyone. Let me briefly introduce myself first. My name is Li Hui, and I am from Tencent Cloud. I enjoy working on open source in my spare time, and I am currently an Apache APISIX PPMC member. Today I will share my experience selecting a K8S Ingress controller. The content assumes some understanding of K8S. Here is what I have to share.
Terminology
You need to be familiar with the following basic concepts to read this article:
- Cluster: refers to the collection of cloud resources required for container operation, including several cloud servers, load balancers and other cloud resources.
- Instance (Pod): one or more related containers that share the same storage and network space.
- Workload: a Kubernetes resource object that manages the creation, scheduling, and automated control of Pod replicas throughout their life cycle.
- Service: a microservice made up of multiple instances (Pods) with the same configuration, plus the rules for accessing those instances.
- Ingress: a set of rules used to route external HTTP(S) traffic to a Service.
Current state of K8S access
In K8S, Service IPs and Pod IPs are mainly used for access within the cluster and are invisible to applications outside it. How can we solve this problem? To let external applications reach services in a K8S cluster, the usual solutions are NodePort and LoadBalancer.
Both solutions have their own disadvantages. With NodePort, a port can expose only one Service, and additional load balancing is required for higher availability. With LoadBalancer, every Service needs its own IP, whether internal or external. In most cases, LoadBalancer also has to rely on the cloud service provider to deliver its capabilities.
In K8S practice and deployment, Ingress emerged to solve problems such as Pod migration, Node and Pod port management, dynamic domain name allocation, and dynamic updates of back-end Pod addresses.
Ingress selection
Disadvantages of Nginx Ingress
Ingress is a very important entry point for external traffic in K8S, and the director of the cloud also talked about the default K8S Nginx Ingress. It is the default Ingress recommended by K8S. To distinguish it from the commercial Ingress later provided by NGINX, I will simply call it K8S Ingress. As the name Nginx Ingress suggests, it is based on NGINX, which is now the most popular HTTP server in the world. I believe everyone here is familiar with NGINX, and that is its first advantage. The second advantage is that Nginx Ingress needs very little configuration to plug into a K8S cluster, and there is plenty of documentation on how to use it. For most people who are new to K8S, or for startups, Nginx Ingress is really a good choice.
But Nginx Ingress runs into many problems in some environments. First, although Nginx Ingress uses some OpenResty features, loading the final configuration still relies on the native NGINX configuration reload. When the routing configuration is very large, an NGINX reload takes a long time, from several seconds to more than ten seconds. Such a reload seriously affects the business and can even cause an interruption.
The second problem is that plug-ins for Nginx Ingress are very difficult to develop. If the built-in plug-ins are enough for you, you can keep using it. However, if you need customized plug-ins, such as IM authentication of Ali Cloud or KM authentication of Tencent Cloud, additional development is needed, and that development is painful. So the plug-in capability and extensibility of Nginx Ingress are relatively poor.
Ingress selection principles
Since Nginx Ingress has so many problems, should we consider a better open source Ingress? There are at least a dozen Ingresses on the market that are said to work better than K8S Ingress. It can be confusing to choose from so many.
An Ingress is ultimately built on an HTTP gateway, and there are several kinds to choose from, such as Nginx, native Golang gateways, and the newcomer Envoy. But every developer has a different technology stack. For example, I am familiar with Nginx, while others are more familiar with HAProxy or with the newer Envoy. Because the underlying gateway people know differs, the Ingress that suits them differs too.
So the question is, how do we choose a better Ingress? Or, to narrow it down a bit: which Ingress should developers familiar with Nginx or OpenResty choose?
Here are some of my experiences with Ingress controller selection.
Basic features
These are the basic features that I think an Ingress must have. If a candidate lacks them, you can pass on it.
- It must be open source; a closed-source one cannot be used.
- Pods change very frequently in K8S, so service discovery is very important.
- Now that HTTPS is commonplace, TLS/SSL capabilities, such as certificate management, are also important.
- Support for common protocols such as WebSocket and, in some cases, HTTP2, QUIC, and so on may be required.
Underlying software
As mentioned earlier, not everyone is on the same technology platform, so it is important to choose an HTTP gateway you know well, such as Nginx, HAProxy, Envoy, or a native Golang gateway. Because you are familiar with how it works, you will get it into production faster.
In a production environment, high performance matters, but high availability matters even more. The gateway you choose must be highly available and stable; only then can your service be stable.
Functional requirements
Beyond the two points above, there are the special requirements your company's business places on the gateway. If you choose an open source product, it should ideally work right out of the box. For example, if you need gRPC protocol conversion, you will want a gateway that supports it directly, not one that still needs development work. Here is a brief list of the points that influence your choice:
- Protocols: whether HTTP2 and HTTP3 are supported;
- Load balancing algorithms: whether the basic WRR or consistent hashing is sufficient, or whether you need a more complex algorithm such as EWMA;
- Authentication and rate limiting: whether simple authentication is enough, or whether you need a more advanced method, need to integrate with an existing one, or need to easily develop something like the IM authentication of Ali Cloud or Tencent Cloud.
We mentioned earlier that K8S Ingress has some major drawbacks, such as the NGINX reload problem and weak plug-in extensibility. In fact, its ability to adjust the weights of back-end nodes is not very good either.
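To make the load balancing options in the list above more concrete, here is a minimal Python sketch of two of them: smooth weighted round-robin (a common WRR variant) and a latency-based EWMA picker. The `Backend` class, the function names, and the `alpha` value are all made up for illustration. Real gateways implement these differently and with more care, for example by adding time decay or by sampling candidates at random.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    name: str
    weight: int                      # static weight used by WRR
    current: int = 0                 # running counter used by smooth WRR
    ewma_ms: Optional[float] = None  # smoothed latency, None until measured


def pick_smooth_wrr(backends):
    """Smooth WRR: raise every counter by its weight, pick the largest
    counter, then lower the winner by the total weight."""
    total = sum(b.weight for b in backends)
    for b in backends:
        b.current += b.weight
    chosen = max(backends, key=lambda b: b.current)
    chosen.current -= total
    return chosen


def record_latency(backend, latency_ms, alpha=0.3):
    """Fold one latency sample into the exponentially weighted moving average."""
    if backend.ewma_ms is None:
        backend.ewma_ms = latency_ms
    else:
        backend.ewma_ms = alpha * latency_ms + (1 - alpha) * backend.ewma_ms


def pick_ewma(backends):
    """Pick the backend with the lowest smoothed latency.
    Backends that were never measured count as 0 so they get tried first."""
    return min(backends, key=lambda b: b.ewma_ms if b.ewma_ms is not None else 0.0)
```

With weights 5, 1, and 1, seven consecutive calls to `pick_smooth_wrr` spread the five picks of the heavy backend between the other two instead of sending them in one burst, which is why this variant is usually preferred over naive WRR.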
Selecting APISIX as the Ingress Controller
I recommend APISIX here because it has very powerful routing and very flexible plug-in capabilities. Although it has far fewer features than Kong, APISIX's excellent routing, flexible plug-ins, and high performance make up for some of that gap when selecting an Ingress. If you are a developer working with Nginx or OpenResty and are not happy with your current Ingress, I recommend APISIX as your Ingress.
How does APISIX work as an Ingress? The first distinction to make is that Ingress is the name of a K8S resource, a rule definition, while the Ingress Controller is the component that synchronizes the state of the K8S cluster to the gateway. But APISIX itself is just an API gateway, so how can it be turned into an Ingress Controller? Let's start with a brief look at how an Ingress is implemented.
Implementing Ingress essentially comes down to two things. First, the configuration, or state, of the K8S cluster has to be synchronized to the APISIX cluster. Second, some APISIX concepts, such as services and upstreams, need to be defined as CRDs in K8S. Once the second part is in place, the APISIX Ingress Controller can quickly generate the APISIX configuration from the K8S Ingress configuration. To let APISIX quickly support K8S as an Ingress, we created an open source project called the Ingress Controller.
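As a rough illustration of the first step, the sketch below translates a K8S Ingress object (given as a plain Python dict in the `networking.k8s.io/v1` shape) into simplified APISIX-style route definitions. This is not the code of the actual controller. The function name is hypothetical, the route dicts only use a few fields (`uris`, `host`, `upstream`), and it assumes numeric Service ports and addresses each Service through its cluster DNS name, whereas a real controller would typically resolve the Pod endpoints.

```python
def ingress_to_routes(ingress):
    """Translate an Ingress dict into a list of simplified APISIX-style routes."""
    namespace = ingress["metadata"].get("namespace", "default")
    routes = []
    for rule in ingress["spec"].get("rules", []):
        host = rule.get("host")
        for path in rule.get("http", {}).get("paths", []):
            service = path["backend"]["service"]
            port = service["port"]["number"]  # assumption: numeric port only
            # Assumption: reach the Service through its cluster DNS name.
            node = f'{service["name"]}.{namespace}.svc.cluster.local:{port}'
            uri = path.get("path", "/")
            if path.get("pathType") == "Exact":
                uris = [uri]
            else:
                # Treat everything else as a prefix match: the path itself
                # plus anything below it.
                prefix = uri.rstrip("/")
                uris = [prefix or "/", prefix + "/*"]
            route = {
                "uris": uris,
                "upstream": {"type": "roundrobin", "nodes": {node: 1}},
            }
            if host:
                route["host"] = host
            routes.append(route)
    return routes
```

Each resulting dict could then be written to the gateway through its admin interface. The second step, defining CRDs, lets users express APISIX-specific concepts that a plain Ingress cannot carry.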
The architecture of the project looks roughly like this. On the left is the K8S cluster, where you can apply YAML files to change the K8S configuration. On the right is the APISIX cluster, with its control plane and data plane. APISIX Ingress acts as the connector between the K8S cluster and the APISIX cluster. It mainly watches for changes to nodes in the K8S cluster and synchronizes the cluster state to the APISIX cluster. In addition, K8S advocates high availability for all components, so APISIX Ingress was designed with high availability in mind: the APISIX Ingress Controller runs in a dual-node or multi-node mode.
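The "watch for changes and synchronize" step can be pictured as a small reconcile function: compare the node list K8S currently reports with what the gateway holds, and apply only the difference. The sketch below is a simplified illustration with hypothetical names, using plain dicts that map `ip:port` to a weight in place of real K8S and APISIX clients.

```python
def diff_nodes(desired, actual):
    """Return (upsert, remove): nodes to add or update, and nodes to delete.
    Both arguments map 'ip:port' -> weight."""
    upsert = {addr: w for addr, w in desired.items() if actual.get(addr) != w}
    remove = sorted(addr for addr in actual if addr not in desired)
    return upsert, remove


def sync(desired, gateway_nodes):
    """Bring the gateway's node table in line with the desired state."""
    upsert, remove = diff_nodes(desired, gateway_nodes)
    gateway_nodes.update(upsert)
    for addr in remove:
        del gateway_nodes[addr]
    return upsert, remove
```

Because the function is idempotent (running it twice with the same input changes nothing the second time), several controller replicas can run the same loop, which fits the multi-node design mentioned above.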
Horizontal comparison of Ingress controllers
Let us briefly compare APISIX Ingress with the popular Ingress controllers on the market to see their advantages and disadvantages. Above is a table that a developer abroad made for K8S Ingress selection. Starting from the original table, I added APISIX Ingress based on my own understanding. APISIX is on the far left, followed by K8S Ingress and Kong Ingress, and then Traefik, a Golang-based Ingress. HAProxy is fairly common and used to be a popular load balancer. Istio and Ambassador are two Ingresses that are very popular abroad.
Let me go through these Ingresses briefly. First, APISIX Ingress. Its advantages, as mentioned earlier, are very strong routing, very strong performance, and very flexible plug-in extension. APISIX has only been open source for a few months and already has a lot of functionality. However, its disadvantage is also obvious: although it has many features, it lacks practical case studies and articles that teach people how to use them.
The second is K8S Ingress, which I criticized a lot earlier; it is the Nginx Ingress that K8S recommends. Its main advantages are simplicity and easy adoption, as mentioned above. Its disadvantages are just as obvious: the NGINX reload problem is not solved at all, and although it ships many plug-ins, its ability to extend them is very weak.
The main advantage of Nginx Ingress, the commercial version provided by NGINX, is that it fully supports TCP and UDP, but other features, such as authentication and traffic scheduling, are missing.
Kong is itself an API gateway, and it pioneered bringing an API gateway into K8S as an Ingress. An edge gateway also has many requirements, such as authentication, rate limiting, and grayscale deployment, and Kong handles these very well. Kong Ingress has another big advantage: it abstracts some API and service definitions into K8S CRDs, so they can be configured easily through K8S Ingress and synchronized to the Kong cluster. While Kong has many advantages, it also has a very big disadvantage: it is extremely difficult to deploy, and its high availability pales in comparison to APISIX.
Traefik is a Golang-based Ingress. It is itself a microservice gateway but is used more often in Ingress scenarios. It is built on Golang and supports many protocols, and overall it has no major drawbacks. It is also recommended if you are familiar with Golang.
HAProxy is a well-known load balancer. Its main advantage is very strong load balancing; in other respects it does not stand out.
Istio Ingress and Ambassador Ingress are both based on the recently popular Envoy. To be honest, I don't think these two Ingresses have any real downside. Probably the only one is that they are based on Envoy, so if you are not familiar with Envoy, there are many hurdles to getting started.
Tencent Cloud CLB Ingress
So far I have mainly talked about open source Ingresses. Now let's talk about how Ingress has landed on Tencent Cloud. As mentioned earlier, K8S Ingress and APISIX are open source. An Ingress always goes together with a K8S platform, so to talk about Ingress on Tencent Cloud, we first need to understand what K8S looks like on Tencent Cloud. I will first briefly introduce TKE, the K8S platform of Tencent Cloud, and then describe Tencent Cloud Ingress, which integrates with CLB to implement the Ingress function.
The figure above shows an overview of the current TKE platform of Tencent Cloud, which is mainly composed of the user access layer, core functions, and integrated products. The integrated products cover the IaaS layer and the PaaS layer.
The full name of TKE is Tencent Kubernetes Engine. TKE is a highly scalable, high-performance managed container service. At its core, TKE solves the multi-tenancy problem. K8S itself is single-tenant, so how can it serve multi-tenant scenarios on Tencent Cloud? It took us a long time to transform it. Beyond that, TKE solves some other problems at the K8S node level. We adopted Tencent Cloud VPC to solve communication between Services and Pods. In addition, the internal network integrates VPC, the external network integrates CLB load balancing, disk storage integrates CBS, and so on, which together make up Tencent Cloud's public cloud version of K8S. The current TKE cloud in Tencent almost 2 million status bar.
What does CLB Ingress look like? The diagram above shows the overall architecture of Tencent Cloud CLB Ingress. Because I want to discuss our Ingress cluster from the perspective of high performance and high availability, I simplified the K8S part, leaving only the user operations, the API Server, and the controller.
To integrate Ingress, TKE only needs to abstract the existing load balancing concepts into CRD resources in K8S and then map between them. For example, when an Ingress is created or nodes are scheduled, we call the CLB API to update the state and complete the whole Ingress link.
Next, let's talk about the high performance and high availability of Tencent Cloud CLB, because these are the two points back-end services care about most.
High performance
High performance for a gateway has two parts: the data plane and the control plane. Let's talk about the data plane first. Our layer-7 CLB is mainly based on NGINX. To ensure high performance, the first step is to tune NGINX. The second step concerns load balancing itself, where the most important thing is HTTPS capability. HTTPS is CPU intensive, and in the open source world there is a lot of room to optimize it. For example, open source NGINX on a machine with, I can't remember whether it was eight cores or four, can easily reach 100,000 QPS. But once HTTPS is used, it may not even reach 10,000 QPS. So HTTPS has a lot of room for optimization, and we spent a lot of time on it while working on layer-7 performance. How do you optimize it? If you search Baidu for common NGINX optimizations, you can apply basically everything in the results; of course, we also optimized some other details.
The second part is optimization at the protocol layer, mainly of the HTTPS protocol itself. This covers a lot, including the encryption protocols and the OpenSSL library, all of which we optimized. There are also enhancements to the HTTPS protocol, which enables TLS encryption by default, so there is no HTTPS optimization.
Second, we did a lot of optimization on the control plane. As I have mentioned many times, you cannot avoid the NGINX reload problem if you use NGINX. With only a few routes it may not matter. However, with thousands or hundreds of thousands of route configurations, an NGINX reload takes at least a few seconds, and the resulting business interruption is serious and unacceptable. So what should we do? For us as a cloud vendor, it is not only the upstream part, that is, the back-end nodes, that changes very fast. Customers also share a cluster, and many of them may be operating on rules at the same time, for example on NGINX server blocks. So we also built dynamic server optimization. After optimizing for dynamic upstreams and dynamic servers, 99.9% of rule changes can be applied through them instead of going through an NGINX reload. This is our control plane optimization.
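The idea behind a dynamic upstream can be shown with a small conceptual sketch: the node list lives in an in-memory table that request handling reads directly, and an update swaps the table instead of rewriting a configuration file and reloading the process. The class below is purely illustrative, with hypothetical names, and says nothing about how CLB's NGINX modules are actually written.

```python
import itertools
import threading


class DynamicUpstreams:
    """In-memory upstream table that can be changed without a restart."""

    def __init__(self):
        self._lock = threading.Lock()
        self._table = {}     # upstream name -> tuple of "ip:port" strings
        self._counters = {}  # upstream name -> round-robin counter

    def set_nodes(self, name, nodes):
        """Replace the nodes of one upstream by swapping in a new table."""
        with self._lock:
            new_table = dict(self._table)
            new_table[name] = tuple(nodes)
            self._table = new_table  # readers see the old or the new table

    def pick(self, name):
        """Pick the next node of an upstream in round-robin order."""
        nodes = self._table[name]  # snapshot of the current node tuple
        counter = self._counters.setdefault(name, itertools.count())
        return nodes[next(counter) % len(nodes)]
```

A call such as `set_nodes("web", ["10.0.0.3:80"])` takes effect on the very next `pick("web")`, with no pause for in-flight requests. That is the property a full reload lacks when the configuration is large.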
High availability
High availability also has two aspects: the high availability of the control cluster, that is, the control plane, and the high availability of the data plane. Let's start with the data plane.
High availability of the data plane mainly concerns the links shown above. All of them have dedicated heartbeat detection, circuit breaking, and timeout mechanisms to ensure high availability between the layer-4 and layer-7 gateways, and between the layer-7 gateways and the back-end nodes. For example, if a node is found to be faulty, it is removed; once it recovers, it is added back. This is the first aspect: data plane high availability relies mainly on heartbeat detection.
The second aspect is disaster tolerance across availability zones. We built cross-availability-zone disaster tolerance for both the layer-7 and layer-4 gateways. For example, when the gateways in one data center go down completely, we can still provide a highly available, high-performance service. For the control plane, high availability is ensured mainly through a master-agent cluster mode.
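The "remove a faulty node, add it back once it recovers" behavior can be sketched as a small state tracker driven by heartbeat results. The class name and the thresholds (three consecutive failures to eject, two consecutive successes to restore) are assumptions chosen for illustration, not the values CLB uses.

```python
class HealthTracker:
    """Track node health from heartbeat results with fall/rise thresholds."""

    def __init__(self, nodes, fall=3, rise=2):
        self.fall = fall  # consecutive failures before a node is removed
        self.rise = rise  # consecutive successes before it is added back
        self.healthy = {node: True for node in nodes}
        self.streak = {node: 0 for node in nodes}

    def report(self, node, ok):
        """Record one heartbeat result for a node."""
        if ok == self.healthy[node]:
            # Result agrees with the current state: reset the streak.
            self.streak[node] = 0
            return
        self.streak[node] += 1
        needed = self.rise if ok else self.fall
        if self.streak[node] >= needed:
            self.healthy[node] = ok
            self.streak[node] = 0

    def healthy_nodes(self):
        """Nodes that should currently receive traffic."""
        return [node for node, up in self.healthy.items() if up]
```

Requiring several consecutive results in each direction keeps a single lost heartbeat from ejecting a node, and keeps a node that is still flapping from being added back too early.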
The future Ingress of Tencent Cloud: APISIX
Having said that, the CLB Ingress architecture does look pretty good, with high availability and excellent performance. What are the downsides? In fact, it also has some disadvantages.
First, I mentioned earlier that 99.9% of configuration changes can be handled with dynamic upstreams and dynamic servers without an NGINX reload. However, that does not solve every configuration change problem, especially with bursts of changes or very large numbers of back-end nodes. In the past there were at most dozens of back-end nodes, but with containerization they can easily reach thousands or even tens of thousands. It then becomes very easy to hit the dynamic upstream threshold and fall back to an NGINX reload instead of the dynamic upstream, which can cause serious performance problems.
Second, all the logic and additional functions of CLB Ingress are built on NGINX. For example, ACL and rate limiting are developed as NGINX modules. As a result, the development threshold is very high and development efficiency is relatively low.
The requirements of plain load balancing are not demanding. There are three main ones: the layer-7 gateway must perform well, HTTPS support must be good, and more protocols must be supported. The requirements of a K8S Ingress go further, because K8S has many nodes and customers want good grayscale capability. The current CLB cannot meet the need for customized grayscale releases. So CLB load balancing, integrated through TKE as an Ingress, reaches only a usable level and is not fully suited to the needs of a K8S platform.
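For readers unfamiliar with grayscale (canary) release, the core of it is a stable traffic split: a fixed share of users is routed to the new version, and the same user always lands on the same side. The sketch below shows one common way to do this with a hash of a user identifier. The function name and the choice of SHA-256 are illustrative assumptions and are not tied to CLB or APISIX.

```python
import hashlib


def choose_version(user_id, canary_percent):
    """Route a user to 'canary' or 'stable' based on a hash of the user ID.

    The same user_id always maps to the same bucket (0-99), so raising
    canary_percent only moves additional users over to the canary.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

Customized grayscale release usually means being able to change the split key (a header, a cookie, a source IP) and the percentage per route at run time, which is the kind of flexibility the paragraph above finds missing in CLB.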
These are the main problems with CLB Ingress at the moment: its grayscale capability is weak, and it easily triggers an NGINX reload, which affects the business. Beyond these two points, isolation is also very poor. With the way CLB is deployed, many customers still share one group of Ingresses, so customers affect one another. The design philosophy of K8S is for each customer to have an exclusive Ingress so that customers do not affect one another. We wanted to solve these problems, and that is how we came across APISIX.
The advantage of APISIX is its high performance, which I won't go into here. In addition, APISIX plug-ins are flexible enough to be inserted in more places. Also, APISIX is designed from a cloud-native perspective, which means it is well suited to deployment in containers. The CLB of the past was fine on physical machines, but its control plane architecture is poorly suited to containers. With APISIX's control plane architecture, it is very easy to choose between having customers share a set of Ingresses and giving each customer an exclusive Ingress. APISIX does very well in these three aspects, and we eventually plan to launch APISIX Ingress to replace the current Ingress on the TKE platform.
In conclusion, this talk was mainly about Ingress selection. I first covered the positioning of Ingress, how to choose one, and what to consider when choosing. Then I compared APISIX Ingress horizontally with current open source Ingresses so that you understand the advantages and disadvantages of each and can quickly pick the right one for yourself. Finally, I briefly introduced Tencent Cloud's CLB Ingress, its current problems, and our next steps.