Author | Zhang Zhen (Calvin), Senior Technical Expert, Alibaba Cloud Native Application Platform
In 2019, Alibaba's core systems went 100% cloud native, smoothly supporting the Double 11 shopping festival. Alibaba's posture toward the cloud was unusual: it not only embraced Kubernetes, but also used the embrace of Kubernetes as an opportunity to carry out a series of deep transformations of its operations and maintenance system.
As a cornerstone of cloud native best practice, Kubernetes has become the de facto standard container orchestration engine. Kubernetes adoption within Alibaba Group went through four main stages:
- R&D and exploration: In the second half of 2017, Alibaba Group began experimenting with the Kubernetes API to transform its internal R&D platform, and started adapting the application delivery pipeline to Kubernetes;
- Initial canary rollout: In the second half of 2018, Alibaba Group and Ant Financial jointly invested in the Kubernetes technology ecosystem, working to replace the internal platform with Kubernetes. This achieved small-scale validation and carried part of that year's Double 11 traffic;
- Canary rollout on the cloud: At the beginning of 2019, the Alibaba economy began a comprehensive migration to the cloud. Alibaba Group redesigned its Kubernetes adoption plan to fit the cloud environment and reform outdated operations habits, completing small-scale validation in cloud data centers before the June 18 (618) promotion;
- Large-scale adoption: After June 18, 2019, Alibaba Group began rolling out Kubernetes across the board. Before the Double 11 promotion it achieved the goal of running all core applications on Kubernetes, and smoothly supported the Double 11 test.
Throughout these years of practice, one question has haunted every architect: in a business as large and complex as Alibaba's, with a legacy of entrenched operations habits and the systems built to support them, what should a Kubernetes rollout insist on? Where should it compromise? What must change?
This article shares Alibaba's thinking on these questions over recent years. The answer is clear: embracing Kubernetes is not itself the goal. Rather, embracing Kubernetes is the lever for the cloud native transformation of the business: using Kubernetes' capabilities to cure the chronic ailments of the traditional operations system, unlock the elasticity of the cloud, and remove the speed limits on application delivery.
In Alibaba's implementation of Kubernetes, we focused on the following key cloud native transformations:
Transforming to final-state-oriented operations
Under Alibaba's traditional operations system, application changes were driven by PaaS: create a work order, launch a workflow, and then issue changes to the container platform one by one.
When an application is released, PaaS looks up all containers belonging to the application in a database and, for each one, sends a change request to the container platform to modify the container image. Each change is itself a workflow involving pulling the image, stopping the old container, and creating a new one. Whenever a step errors or times out, PaaS must retry. Generally, to ensure the work order finishes on time, retries are attempted only a few times; after several failures, manual intervention is required.
If a container deletion fails or times out because of a host exception, PaaS can only keep retrying. To close the work order, after a certain number of retries PaaS simply declares the container deleted. If the host later returns to normal, the "deleted" container may well still be running.
This traditional, process-oriented approach to container changes cannot solve the following problems:
- A single failed change is never guaranteed to eventually succeed. For example, if a container image change fails, PaaS cannot guarantee eventual consistency of the container image; if a container deletion fails, there is no guarantee the container is actually gone. Both cases require inspection jobs to reconcile inconsistent containers, but because such jobs run infrequently, their accuracy and timeliness are hard to guarantee.
- Concurrent changes can conflict. For example, an application must be locked during a release so that it cannot be scaled out at the same time; otherwise, newly added containers would run the old image. Once changes are serialized with locks, change throughput drops dramatically.
Kubernetes' capabilities provide an opportunity to solve these problems. A Kubernetes workload exposes a declarative API for modifying the number and version of application instances. The workload controller watches the actual state of the Pods, drives the number and version of application Pods toward the declared final state, and avoids conflicts between concurrent scaling and releases. Likewise, the kubelet repeatedly tries to start a Pod based on its spec until the Pod matches the final state the spec describes. Retries are handled inside the container platform and are no longer tied to the state of an application work order.
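As a minimal illustration (the application name and image registry here are hypothetical), a release in this model is nothing more than an edit to the declared state; the controller reconciles the actual Pods toward it, retrying internally on failure:

```yaml
# Declared final state: 3 replicas running demo-app:v2.
# A release is simply changing the image field; the Deployment
# controller rolls Pods toward this state, and the kubelet keeps
# retrying each Pod until it matches its spec.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                # hypothetical name
spec:
  replicas: 3                   # desired instance count
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: main
        image: registry.example.com/demo-app:v2   # bump this to release
```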
Transforming self-healing capabilities
Under Alibaba's traditional operations system, the container platform only produced resources; after a container started, application startup and service discovery were performed by the PaaS systems. This layering gave the PaaS systems maximum freedom and fueled Alibaba's first wave of container ecosystem growth after containerization. But it has a serious drawback: the container platform cannot scale containers out or in on its own, so it needs complex coordination with each PaaS, and the upper-layer PaaS systems each duplicate a great deal of work. As a result, the container platform could not self-heal effectively when a host failed or restarted, or when a process in a container hung or misbehaved, and elastic scaling was very complicated.
In Kubernetes, container commands and lifecycle hooks let the PaaS logic for starting an application and checking its startup status be built into the Pod itself. By additionally creating a Service object, the containers are wired into the corresponding service discovery mechanism, unifying the lifecycles of container, application, and service. The container platform no longer merely produces resources; it delivers services the business can use directly. This greatly simplifies fault self-healing and automatic elastic scaling after moving to the cloud, and truly unleashes the elasticity of the cloud.
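A sketch of what this looks like (all names, paths, and ports here are hypothetical): the startup command, post-start logic, and readiness check that PaaS used to drive externally are declared on the Pod, and a Service ties the Pods into service discovery:

```yaml
# Startup and health-check logic declared on the Pod itself,
# so the platform can self-heal and scale without PaaS coordination.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
  labels:
    app: demo-app
spec:
  containers:
  - name: main
    image: registry.example.com/demo-app:v2   # hypothetical image
    command: ["/app/start.sh"]                # application startup built into the Pod
    lifecycle:
      postStart:
        exec:
          command: ["/app/post-start.sh"]     # e.g. warm-up or registration logic
    readinessProbe:                           # replaces PaaS-side startup checking
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 10
---
# The Service associates ready Pods with service discovery.
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  selector:
    app: demo-app
  ports:
  - port: 80
    targetPort: 8080
```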
In addition, when a host failed, the traditional PaaS flow first scaled the application out and only then removed the containers on that host. In large clusters, however, we found the flow often got stuck at the scale-out step: the application's resource quota might be exhausted, or the cluster might lack idle resources satisfying the application's scheduling constraints. Without scaling out, the containers on the host could not be evicted, and the faulty host could not be sent for repair. Over time, the cluster easily accumulated a large pool of faulty machines that could neither be repaired nor drained, a constant source of trouble.
In Kubernetes, handling a failed machine is much more "simple and crude": instead of scaling the application out first, the containers on the failed machine are deleted directly, and the workload controller scales back up after the deletion. This scheme sounds audacious at first glance. When we rolled out Kubernetes, many PaaS engineers rejected the approach, believing it would seriously harm business stability. In practice, most core services maintain a certain amount of redundant capacity to support global traffic switchover or absorb unexpected traffic, so temporarily deleting a bounded number of containers does not cause a capacity shortfall.
The key question becomes how much capacity the business can spare. Estimating capacity precisely is hard, but a self-healing scenario does not need a precise estimate; a pessimistic one that still allows self-healing to proceed is enough. In Kubernetes, a PodDisruptionBudget quantitatively describes how much of an application may be disrupted, as the number or proportion of instances that can be evicted concurrently. The value can be set by analogy with the per-batch percentage used at release time: if an application normally releases in 10 batches, maxUnavailable can be set to 10% in the PodDisruptionBudget (for small applications with fewer than 10 instances, Kubernetes still considers evicting one instance acceptable). What if an application truly allows no eviction at all? Then, sorry, such an application must be transformed before it can enjoy the dividends of the cloud: it can change its architecture, or use an operator to automate its O&M operations, so that instance migration becomes possible.
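The 10%-per-batch heuristic above translates directly into a manifest like the following (application name hypothetical; shown with the current `policy/v1` API):

```yaml
# Allow at most 10% of demo-app's Pods to be evicted concurrently
# during self-healing or node drain, mirroring a 10-batch release cadence.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: demo-app-pdb
spec:
  maxUnavailable: "10%"
  selector:
    matchLabels:
      app: demo-app
```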
Transforming toward immutable infrastructure
The emergence of Docker provided a unified form of application delivery: an application's binary, configuration, and dependencies are baked into an image at build time, and changes to the application are made by creating containers from a new image and deleting the old containers. The significant difference between Docker-based delivery and traditional software or script delivery is that containers are forcibly immutable: a container can only be changed by creating a new one, and every new container is created from the same application image. This guarantees consistency and avoids configuration drift, the "snowflake server" problem.
Kubernetes further reinforces immutable infrastructure: in the default rolling upgrade, not just the container but the whole Pod is replaced. Each release creates new Pods and deletes old ones, which guarantees not only image consistency but also that data volumes, resource specifications, and system parameters match the spec in the application template.
In addition, many applications have complex structures; a single application instance may contain components developed independently by multiple teams. For example, an instance might include the business application server, a log collection process developed by the infrastructure team, and even third-party middleware components. If these processes and components are to be released independently, they cannot all live in one application image. For this, Kubernetes provides multi-container Pods: several containers can be orchestrated in one Pod, and to release a single component, only the image of the corresponding container needs to change.
However, Alibaba's traditional container form was the "rich container": the application server, log collection process, and other related components were all deployed in one large container with a full system. As a result, the resource consumption of the log collector and other components could not be limited individually, nor could they be upgraded independently. Therefore, during this move to the cloud, Alibaba split everything other than the business application out of the system container into independent sidecar containers, a change we call the lightweight container transformation. Afterward, a Pod contains a main container running the business, an operations container running various infrastructure agents, and sidecar containers such as the service mesh proxy. With lightweight containers, the business main container can run the business at a lower cost, which also makes serverless adoption easier.
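After the lightweight container transformation, a Pod might look roughly like this sketch (container names and images are hypothetical): one business main container plus independently built and resource-limited sidecars, each releasable by changing only its own image:

```yaml
# A "lightweight container" Pod: the rich container is split into a
# business main container plus operations and mesh sidecars, each with
# its own image and its own resource limits.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
  - name: main                  # business application server
    image: registry.example.com/demo-app:v2
    resources:
      limits:
        cpu: "4"
        memory: 8Gi
  - name: log-agent             # infrastructure team's log collector
    image: registry.example.com/log-agent:v1.3
    resources:
      limits:                   # now limited independently of the business
        cpu: "200m"
        memory: 256Mi
  - name: mesh-proxy            # service mesh sidecar
    image: registry.example.com/mesh-proxy:v1.1
    resources:
      limits:
        cpu: "500m"
        memory: 512Mi
```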
However, Kubernetes' default rolling upgrade rigidly enforces immutable infrastructure, leaving multi-container Pods seriously under-supported. Although multiple containers can be orchestrated in one Pod, releasing a single container actually deletes the whole Pod and reschedules and rebuilds it, rather than rebuilding only the container being released. This means that upgrading, say, the infrastructure team's log collection component deletes and restarts the other containers, including the application server, interrupting normal service. Changes to different components are therefore still not decoupled.
For the business, if a Pod contains a local-cache component and the cache process restarts on every release, the cache hit rate plummets during releases, hurting performance and even user experience. Moreover, if upgrades of infrastructure and middleware components are tied to the business's own releases, technology iteration is seriously obstructed. If the middleware team ships a new version of the service mesh but must beg each business team to release in order to update the mesh component, the middleware's technical upgrades slow to a crawl.
We therefore believe the technical advantages of Kubernetes multi-container Pods are better realized by insisting on container-level rather than Pod-level immutability. To this end, we built the ability to modify only some of a Pod's containers in place at release time: a workload controller that supports in-place container upgrade, which replaced Kubernetes' default Deployment and StatefulSet controllers as the main internal workload types.
In addition, we built SidecarSet to support upgrading sidecar containers across applications, which eases upgrades of infrastructure and middleware components. Support for in-place upgrade also brings extra advantages such as deterministic cluster distribution and accelerated image downloads. These capabilities have been open-sourced through the OpenKruise project. "Kruise" is a homophone of "cruise", with the "K" standing for Kubernetes: automated cruising of applications on Kubernetes. It distills Alibaba's years of experience in application deployment and management and the best practices of Alibaba's cloud native journey. OpenKruise plans to release more controllers to cover more scenarios and functions, such as richer release strategies: canary release, blue-green release, batched release, and so on.
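As a sketch of what these workloads look like in current OpenKruise releases (names and images here are hypothetical; field names follow the `apps.kruise.io/v1alpha1` API as published by the project), a CloneSet with in-place upgrade and a SidecarSet might be written as:

```yaml
# CloneSet: in-place upgrade replaces only the changed container's
# image, keeping the Pod, its IP, and its sibling containers intact.
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: demo-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-app
  updateStrategy:
    type: InPlaceIfPossible    # recreate the Pod only when the change requires it
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: main
        image: registry.example.com/demo-app:v2
---
# SidecarSet: injects and upgrades a sidecar across all matching Pods,
# decoupling middleware upgrades from business releases.
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: mesh-proxy
spec:
  selector:
    matchLabels:
      needs-mesh: "true"
  containers:
  - name: mesh-proxy
    image: registry.example.com/mesh-proxy:v1.1
```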
Conclusion
This year we achieved large-scale adoption of Kubernetes, which withstood the real-world test of the Double 11 promotion. There is no shortcut to landing Kubernetes in a scenario with as many applications as Alibaba's. We resisted the temptation of a quick, superficial rollout: rather than choosing compatibility and compromise with outdated operations habits, we chose to lay a solid foundation and dig deep into the native value of the cloud. Next, we will continue to push more applications through cloud native transformation, especially stateful applications, to make their deployment and operations more efficient; and we will drive the cloud native transformation of the entire application delivery pipeline, making delivery more efficient and standardized.
This article is original content of the Yunqi Community and may not be reproduced without permission.