This article is compiled from the content shared by Li Yu, a speaker at the Shanghai Meetup.

Hi, everybody. It's great to be here at this afternoon's Meetup. My name is Li Yu, and I am currently a researcher at KubeSphere, mainly responsible for multi-cluster work. Today I will talk about multi-cluster management and application deployment with Kubernetes in hybrid cloud environments. Before starting work on V3.0, KubeSphere ran a community user survey and found that the most requested capabilities were multi-cluster management and cross-cloud application deployment, so KubeSphere 3.0 focuses on multi-cluster management.

Kubernetes architecture in a single cluster

Kubernetes divides cluster roles into Master and Worker. On the Master, the API Server exposes the declarative APIs, the Controller Manager runs the various controllers that reconcile those APIs from spec to status, the Scheduler is responsible for scheduling Pods, and etcd stores the cluster data. Workers are the worker nodes and are mainly responsible for running Pods.

There are many scenarios in which a single cluster cannot meet enterprise requirements, including the following.

  1. Physical isolation. Kubernetes does provide namespace-level isolation: you can set CPU and memory quotas for each namespace and even control network connectivity between namespaces with NetworkPolicy (see the sketch after this list). Even so, enterprises often need a more completely isolated physical environment so that workloads cannot affect one another.

  2. Hybrid cloud. In hybrid cloud scenarios, enterprises want to combine multiple public cloud vendors and private cloud solutions to avoid being locked in to a single vendor and to reduce costs.

  3. Multi-region active-active applications. Multiple replicas of a service are deployed to clusters in different regions so that a power failure in one region does not take the application down; in other words, don't put all your eggs in one basket.

  4. Development/test/production environments. To keep development, test, and production separate, each environment is deployed to its own cluster.

  5. Scalability. Multiple clusters improve overall scalability and break through the node limit of a single cluster.
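
To make the first point concrete, here is a minimal sketch of namespace-level isolation inside a single cluster, combining a ResourceQuota with a NetworkPolicy; the namespace name and the quota values are illustrative placeholders, not recommendations.

```yaml
# Soft isolation within one cluster: cap CPU/memory usage of a namespace
# and block ingress traffic coming from other namespaces.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a            # illustrative namespace
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-other-namespaces
  namespace: team-a
spec:
  podSelector: {}              # applies to every Pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}      # only Pods in the same namespace may connect
```

Even with quotas and policies like these, workloads still share the same nodes and control plane, which is exactly why some enterprises want physically separate clusters.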

In fact, the simplest approach is to use multiple kubeconfig files to manage the different clusters and have the front end call each cluster's API separately, which is what some existing products do. However, KubeSphere wanted to manage multiple clusters in a more cloud-native way, so we first investigated the existing solutions.
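
As a rough illustration of the kubeconfig-based approach, a single kubeconfig can hold one context per cluster, and tools switch between them with `kubectl config use-context`; all names, server addresses, and the token below are made up.

```yaml
# ~/.kube/config with one context per cluster (illustrative values only).
apiVersion: v1
kind: Config
clusters:
  - name: prod-beijing
    cluster:
      server: https://prod-beijing.example.com:6443
  - name: test-shanghai
    cluster:
      server: https://test-shanghai.example.com:6443
users:
  - name: admin
    user:
      token: REDACTED
contexts:
  - name: prod-beijing
    context: {cluster: prod-beijing, user: admin}
  - name: test-shanghai
    context: {cluster: test-shanghai, user: admin}
current-context: prod-beijing
```

This works, but every tool and dashboard has to be pointed at each cluster separately, which is the gap the solutions below try to close.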

Generally speaking, the existing solutions fall into two directions. The first focuses on distributing resources at the control plane, for example Federation V1 and Federation V2 from the Kubernetes community, or Argo CD and Flux CD, which distribute applications through pipelines. The second is committed to making Pod networks reachable across clusters, for example Cilium Mesh, Istio Multi-Cluster, and Linkerd Service Mirroring; because those projects are tied to specific CNI or service-governance components, I will next look at the Federation V1 and Federation V2 projects in more detail.

Federation v1

Above is the Federation V1 architecture diagram. You can see an additional API Server (based on kube-apiserver) and a Controller Manager (similar to kube-controller-manager), with the managed clusters below them. Resources to be distributed to multiple clusters are created in the upper cluster and eventually propagated to the clusters below.

Here is an example of creating a ReplicaSet in Federation V1. Unlike a regular ReplicaSet, it carries annotations that hold the logic for distributing the resource (a sketch follows the list below). From this we can also see some of the disadvantages of V1.

  1. The separately developed API Server introduces additional maintenance costs.
  2. In Kubernetes an API is identified by Group/Version/Kind, but Federation V1 pins the native K8s APIs to fixed GVKs, resulting in poor API compatibility between clusters running different versions.
  3. RBAC was not designed in, so there is no permission control across clusters.
  4. The biggest criticism is that the annotation-based resource distribution makes the whole API bloated and inelegant.
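
To show what the annotation-based API looked like in practice, here is a rough sketch of a V1-era federated ReplicaSet; the annotation key and its JSON schema are reproduced from memory of the Federation V1 documentation and should be treated as illustrative rather than authoritative.

```yaml
# Federation V1 style: the per-cluster distribution preferences are squeezed
# into an annotation as a JSON string (schema from memory; illustrative only).
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  name: nginx
  annotations:
    federation.kubernetes.io/replica-set-preferences: |
      {
        "rebalance": true,
        "clusters": {
          "cluster-beijing":  {"minReplicas": 2, "weight": 1},
          "cluster-shanghai": {"minReplicas": 1, "weight": 2}
        }
      }
spec:
  replicas: 9
  selector:
    matchLabels: {app: nginx}
  template:
    metadata:
      labels: {app: nginx}
    spec:
      containers:
        - name: nginx
          image: nginx:1.19
```

Everything that controls the multi-cluster behavior lives inside an opaque string, which is exactly the "bloated and inelegant" part.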

Federation v2

Because of these shortcomings, the Kubernetes community gradually abandoned the V1 design, drew some lessons from it, and launched V2, the Kubefed project. The biggest feature of Kubefed is that a CRD-and-controller-based approach replaces V1's annotation-based resource distribution, without modifying the native K8s APIs or introducing an additional API Server.

Above is the architecture diagram of V2. As you can see, a federated CRD resource is mainly composed of a Template, Overrides, and a Placement. Combined with a Type Configuration, this supports multiple API versions, greatly improving version compatibility between clusters. Kubefed can federate any resource, including CRDs themselves, and it was designed with multi-cluster service discovery and scheduling in mind.
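
The Type Configuration mentioned above is itself a CRD that maps a target API to its federated counterpart. A hedged sketch of the FederatedTypeConfig for Deployments, with field names as I recall them from the Kubefed v1beta1 API, looks roughly like this:

```yaml
# Maps the native apps/v1 Deployment to its federated type and enables
# propagation (field names per the Kubefed v1beta1 API, from memory).
apiVersion: core.kubefed.io/v1beta1
kind: FederatedTypeConfig
metadata:
  name: deployments.apps
  namespace: kube-federation-system
spec:
  propagation: Enabled
  targetType:
    group: apps
    kind: Deployment
    pluralName: deployments
    scope: Namespaced
    version: v1
  federatedType:
    group: types.kubefed.io
    kind: FederatedDeployment
    pluralName: federateddeployments
    scope: Namespaced
    version: v1beta1
```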

Here is an example of a federated resource. A Deployment in Kubefed corresponds to a FederatedDeployment: the template in the spec is the original Deployment resource, the placement indicates which clusters the federated resource should be deployed to, and the overrides configure per-cluster fields, such as the number of replicas in each cluster or the tag of the Deployment image.
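
A hedged sketch of such a FederatedDeployment, with the structure following the Kubefed documentation and made-up cluster names:

```yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: demo
  namespace: default
spec:
  template:                        # the original Deployment spec
    metadata:
      labels: {app: demo}
    spec:
      replicas: 3
      selector:
        matchLabels: {app: demo}
      template:
        metadata:
          labels: {app: demo}
        spec:
          containers:
            - name: demo
              image: nginx:1.19
  placement:                       # which clusters receive the resource
    clusters:
      - name: cluster-beijing
      - name: cluster-shanghai
  overrides:                       # per-cluster differences
    - clusterName: cluster-shanghai
      clusterOverrides:
        - path: /spec/replicas
          value: 5
```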

Of course, Kubefed is not a silver bullet, and it has its limitations. As you can see, the API definitions are complex and error-prone, clusters can only be joined and unjoined with kubefedctl, and no separate SDK is provided. Moreover, it requires that the network from the control-plane cluster to the managed clusters be reachable, existing APIs have to be reworked when moving from a single cluster to multiple clusters, and older versions did not collect the status of federated resources.

KubeSphere on Kubefed

Let’s take a look at how KubeSphere implements and simplifies multi-cluster management based on Kubefed.

The picture defines two concepts. The Host cluster is the cluster where Kubefed is installed and serves as the control plane; a Member cluster is a managed cluster, and the relationship between the Host cluster and its Member clusters is a federation.

As shown in the picture, users can manage multiple clusters in a unified way. KubeSphere defines its own Cluster object, which extends Kubefed's cluster object with information such as region, zone, and provider.

KubeSphere provides two ways to import a cluster:

  • Direct connection. In this case, the network from the Host cluster to the Member cluster is reachable; you only need to provide a kubeconfig file to join the cluster directly, avoiding the complexity of kubefedctl mentioned earlier.

  • Proxy connection.

Kubefed has no answer for the case where the network from the Host cluster to a Member cluster is not reachable. For this private-cloud scenario, KubeSphere open-sourced Tower, which is based on Chisel, to enable federated management of such clusters: users only need to create an agent in the private cluster to complete the federation.
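
To tie the two import methods together, here is a rough sketch of what an imported Cluster object might look like; the API group, labels, and field names are recalled from the KubeSphere 3.0 Cluster CRD and may differ between versions, so treat them as illustrative rather than authoritative.

```yaml
# A Member cluster imported with a direct connection (illustrative field names).
apiVersion: cluster.kubesphere.io/v1alpha1
kind: Cluster
metadata:
  name: member-beijing
  labels:
    cluster.kubesphere.io/region: beijing   # hypothetical region label
spec:
  joinFederation: true
  provider: example-cloud                   # made-up provider name
  connection:
    type: direct            # "proxy" would route through a Tower agent instead
    kubeconfig: REDACTED    # the Member cluster's kubeconfig goes here
```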

Tower's workflow is shown here. After an agent is created in the Member cluster, it connects to the Tower server in the Host cluster. On receiving the connection request, the server listens on a port pre-allocated by the controller and establishes a tunnel, through which resources can be distributed from the Host cluster to the Member cluster.

Multi-tenant support in multi-cluster scenarios

In KubeSphere, a tenant is a Workspace, and tenant authentication and authorization are implemented through CRDs. To reduce the dependency on Kubefed's control plane, KubeSphere pushes these CRDs through the federation layer to the Member clusters as soon as the Host cluster receives the API request. The tenant information therefore also exists in the Member clusters, so users can still log in to a Member cluster's console and deploy services there.

Application deployment in multiple clusters

In KubeSphere you can directly select which clusters a workload should be deployed to and how many replicas each cluster runs. You can also configure per-cluster image addresses and environment variables in the differential configuration. For example, if cluster A is located in China and cannot pull images from gcr.io, its image can be switched to one hosted on Docker Hub, as sketched below.
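
A hedged sketch of how such a differential configuration maps onto Kubefed-style overrides; the cluster name and image are made up.

```yaml
# Fragment of a FederatedDeployment spec: cluster-a cannot pull from gcr.io,
# so its image is overridden to one hosted on Docker Hub.
spec:
  overrides:
    - clusterName: cluster-a
      clusterOverrides:
        - path: /spec/replicas
          value: 2
        - path: /spec/template/spec/containers/0/image
          value: docker.io/library/nginx:1.19
```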

State collection of federated resources

As mentioned earlier, Kubefed did not implement status collection for federated resources, so KubeSphere implemented it itself: when a Pod fails to be created, you can easily check the corresponding event information. In addition, KubeSphere provides monitoring of federated resources to improve observability.

TODO

Although KubeSphere simplifies federation between multiple clusters based on Kubefed, there are still some improvements to be made in the future.

  1. At present, the centralized control plane means resources can only be pushed, which places high-availability requirements on the Host cluster. The Kubefed community is actively working on letting Member clusters pull resources from the Host cluster instead.
  2. KubeSphere is a very open community and we would like more community users to join in, but the barrier to multi-cluster development is currently high: developers need to define a whole series of federated type CRDs, which is not very friendly.
  3. There is still no good solution for multi-cluster service discovery; the community worked on it initially but later dropped it in order to release the beta faster.
  4. For replica scheduling, the community provides RSP (ReplicaSchedulingPreference), which KubeSphere is expected to add in the next version (a sketch follows this list).
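
For reference, here is a sketch of an RSP as described in the Kubefed scheduling documentation; the cluster names and replica counts are illustrative.

```yaml
# Weighted replica scheduling across clusters; the name must match the
# FederatedDeployment it controls.
apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: demo
  namespace: default
spec:
  targetKind: FederatedDeployment
  totalReplicas: 9
  clusters:
    cluster-beijing:
      weight: 2            # receives roughly two thirds of the replicas
    cluster-shanghai:
      weight: 1
```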

So, is there a way to manage multiple clusters without introducing a centralized control plane and without introducing too many new APIs? The answer is Liqo. Before introducing it, let's first look at Virtual Kubelet.

Virtual Kubelet lets your own service masquerade as a Kubernetes node: it simulates the kubelet and joins the cluster as a node, which makes it possible to extend a Kubernetes cluster horizontally.

In Liqo there is no federation relationship between clusters. In the picture on the left, under the Kubefed architecture, clusters K2 and K3 are Member clusters of K1, and K1 has to push resources down to them. In the picture on the right, K2 and K3 are simply nodes of K1, so no new APIs are needed when deploying applications: because K2 and K3 appear to be nodes of K1, workloads can be deployed to different clusters transparently, which greatly reduces the complexity of moving from a single cluster to multiple clusters. Liqo is still in its infancy and currently does not support topologies of more than two clusters. KubeSphere will continue to watch other multi-cluster management solutions in the open-source space.
