This article is based on the content shared by Wang Wenhu, a speaker at the KubeSphere Meetup in Shanghai.
Hello, KubeSphere community. I am Wang Wenhu, an R&D engineer working on the ZTO Express container cloud platform. I am mainly responsible for developing the ZTO Express container cloud platform, promoting application containerization, and operating and maintaining the container platform. I would like to thank the KubeSphere community for inviting me to share ZTO Express's Kubernetes cluster service exposure practices under the challenges of critical business and complex architecture.
ZKE container management platform
First, let me introduce ZKE, ZTO's container cloud management platform. The ZKE platform is developed on top of KubeSphere and now manages more than ten internal ZTO clusters across development, test, pre-release, production, and other environments. All users manage their container applications through the ZKE platform.
Kubernetes cluster service exposure solutions
Based on ZTO's actual business needs and our own exploration, we have sorted out several schemes for exposing services from ZTO's Kubernetes clusters.
Access between Dubbo services
Most of ZTO's applications are developed in Java and use the Dubbo microservice framework. At the start of containerization, we expected virtual machines and containers to coexist for a long time, so when planning the Kubernetes clusters we connected the container network directly to the physical network, allowing Dubbo services in containers and on virtual machines to call each other.
- How do we connect the Kubernetes container network with the physical network?
In our internal environment, the Kubernetes cluster network component is Calico in BGP mode, and the data center's physical network also runs the BGP routing protocol. Enabling BGP Route Reflectors (RRs) on the physical network avoids the excessive number of BGP route entries that a large cluster would otherwise generate. The BGP RRs establish eBGP neighbor relationships with the Kubernetes cluster nodes, and routes are learned in both directions.
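As a rough illustration of this kind of setup (not our exact production configuration; the AS numbers and peer addresses below are placeholders), Calico can be told to skip its node-to-node mesh and peer each node with the data center route reflectors instead:

```yaml
# Sketch: peer Calico nodes with physical BGP route reflectors.
# All values here are placeholders, not ZTO's production settings.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false  # routes are exchanged via the RRs
  asNumber: 64512               # placeholder AS for the cluster nodes
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: dc-route-reflector-1
spec:
  peerIP: 10.0.0.1              # placeholder RR address
  asNumber: 64513               # different AS, so the session is eBGP
```

With the pod CIDR advertised over these sessions, pod IPs become directly routable from virtual machines, which is what lets Dubbo consumers and providers call each other across the two environments.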
Access using a wildcard domain name
During the initial rollout of containerization for development and testing, one of the biggest issues we encountered was how users would access an application once it was deployed to the container environment.
After a user created an Ingress on the ZKE platform, the domain name was not immediately accessible: the operations team had to point the domain name at the cluster's Ingress Controller, and applying for a domain name in the first place required going through the company's OA process. As a result, adoption of the container environment was very slow in the initial promotion stage.
We collected feedback from users, combined it with our own thinking, and finally arrived at a more efficient way to use Ingress in development/test environments:
We assign a third-level wildcard domain to each cluster and configure the company's DNS to point that wildcard domain at the cluster's Ingress Controller. When users need a service domain name, they can create an Ingress directly in the ZKE interface and the domain name takes effect immediately. This saves a great deal of time in test and development environments. Because of the company's security requirements, Ingress only exposes the HTTP protocol, but this approach still greatly accelerated the adoption of containerization for development and testing.
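As a minimal sketch of how this looks from the user's side (the domain and resource names below are placeholders): once a wildcard record such as *.dev-cluster1.example.com resolves to the cluster's Ingress Controller, any Ingress host under that suffix works as soon as it is created:

```yaml
# Hypothetical Ingress under the cluster's wildcard domain. No per-domain
# DNS change or OA approval is needed, because *.dev-cluster1.example.com
# (placeholder) already points at this cluster's Ingress Controller.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-app
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo-app.dev-cluster1.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: demo-app
            port:
              number: 8080
```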
Custom domain name access
The wildcard domain covers the domain name requirements of most development/test environments. In production, however, project domains must use the HTTPS protocol and usually need to be customized, so users still have to create an Ingress and go through the OA process for approval.
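For comparison with the sketch above, a custom production domain pairs the Ingress host with a TLS certificate Secret, roughly like this (all names here are placeholders):

```yaml
# Hypothetical production Ingress with a custom HTTPS domain. The
# referenced Secret should be of type kubernetes.io/tls; as the pitfall
# described below shows, referencing a non-TLS Secret could crash the
# Ingress Nginx Controller version we were running.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: order-service
  namespace: prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - order.example.com
    secretName: order-example-com-tls
  rules:
  - host: order.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: order-service
            port:
              number: 8080
```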
Service exposure pitfalls in practice
The following are service exposure and networking pitfalls we stepped on while using Kubernetes, shared here for your reference so you can avoid them.
An Ingress Nginx Controller pitfall
The following is the startup flowchart I drew based on the Ingress Nginx Controller code. (For the startup process and a more detailed analysis of this problem, see: mp.weixin.qq.com/s/Pw9-_cPXx… )
The Ingress Nginx Controller startup process is similar to that of a generic Kubernetes controller, but the actual business logic that synchronizes Kubernetes Ingress objects and their associated resources into the Nginx configuration file starts from the n.syncIngress function, which I will discuss below.
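For readers unfamiliar with the pattern, the generic shape is roughly the following (a sketch only, not the actual ingress-nginx code: the event type, queue, and syncIngress stub here are stand-ins):

```go
package main

import "fmt"

// event is a stand-in for the keys a Kubernetes informer would enqueue.
type event struct{ key string }

// syncIngress stands in for the controller's real sync function, which
// renders the current Ingress state into nginx.conf and reloads Nginx.
func syncIngress(key string) {
	fmt.Println("syncing", key)
}

func main() {
	queue := make(chan event, 16)

	// Informer callbacks would enqueue changed objects here.
	queue <- event{key: "demo/demo-app"}
	close(queue)

	// Worker loop: drain the queue and run the sync logic for each item.
	for ev := range queue {
		syncIngress(ev.key)
	}
}
```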
This is a pitfall we hit in the test environment: a cluster-wide Ingress Controller failure was triggered when a user created an Ingress through the ZKE management platform. During fault analysis we worked out the conditions for reproducing the failure, as shown in the figure below:
The n.syncIngress function is the entry point of the Ingress Nginx Controller's business logic. The call chain from this entry point down to the function where the fault finally occurs is also provided in the slides; the problem ultimately falls on the extractTLSSecretName function.
Let's look at the code logic to see why:
- The createServers function iterates over ing.Spec.Rules; when the ing.Spec.TLS field is not empty, rule.Host and the Ingress are passed as arguments to the extractTLSSecretName function.
- The extractTLSSecretName function first iterates over ing.Spec.TLS and checks whether the host is included in tls.Hosts; if it is, the function returns tls.SecretName.
- When the host is not included in tls.Hosts, the Secret resource referenced by each tls.SecretName is converted to the *ingress.SSLCert type, and the function verifies whether the host matches the SAN or CN attribute of the certificate. However, if the Secret's type is not TLS, cert.Certificate is nil, so the call to cert.Certificate.VerifyHostname dereferences a nil pointer and the Nginx controller crashes, as sketched in the example after this list.
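To show the failure mode and the guard in isolation, here is a minimal, self-contained Go sketch; the SSLCert struct below is a stand-in for ingress-nginx's *ingress.SSLCert, not the real implementation:

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// SSLCert mimics the relevant part of ingress-nginx's *ingress.SSLCert:
// when the referenced Secret is not of type kubernetes.io/tls, the
// parsed Certificate field stays nil.
type SSLCert struct {
	Certificate *x509.Certificate
}

// hostMatchesCert shows the faulty pattern plus the guard that fixes it.
func hostMatchesCert(cert *SSLCert, host string) bool {
	// The fix: bail out when no certificate was parsed. Without this
	// check, cert.Certificate.VerifyHostname dereferences a nil pointer
	// and panics, taking the controller down.
	if cert.Certificate == nil {
		return false
	}
	return cert.Certificate.VerifyHostname(host) == nil
}

func main() {
	// A non-TLS Secret yields an SSLCert whose Certificate is nil.
	cert := &SSLCert{}
	fmt.Println(hostMatchesCert(cert, "example.com")) // prints: false
}
```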
The remediation measures were divided into two parts:
- Avoid the situation through platform operations: when users create an Ingress, the certificate selector on the platform filters out Secrets that are not of the TLS type, ensuring that the vast majority of users going through the platform cannot trigger this problem.
- Fix the code logic: we added a check for cert.Certificate being nil, so a non-TLS Secret can no longer crash the controller.
Disabling Calico's natOutgoing configuration
The background of this configuration: when containerizing Dubbo applications in production, the production ZooKeeper limits the number of connections from a single IP address, and this limit is lower than the number of pods on a node. With outgoing NAT enabled, all pods on a node reach ZooKeeper with the node's IP, so Dubbo applications running in containers were frequently refused connections by ZooKeeper. Since the container network is already connected to the physical network, we set Calico's natOutgoing parameter to false, and the problem was solved.
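A minimal sketch of the corresponding IPPool change (the pool name and CIDR below are placeholders, not our production values):

```yaml
# Hypothetical Calico IPPool with outgoing NAT disabled. With
# natOutgoing: false, each pod connects to ZooKeeper with its own pod IP
# rather than the node IP, so ZooKeeper's per-IP connection limit applies
# per pod instead of per node. This is safe only because the pod CIDR is
# routable on the physical network.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.244.0.0/16   # placeholder pod CIDR
  natOutgoing: false
  ipipMode: Never       # pure BGP routing, no encapsulation
```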