Brief introduction: Unified scheduling went into large-scale production for the first time during this year's Double 11 (November 11) and set a new mark for the industry. A single scheduling protocol and a single system architecture now manage the underlying compute, storage and network resources together, at very large scale, with high efficiency and automatic resource elasticity. Through colocation of online and offline workloads and the new rapid scale-up and scale-down technology, it avoided the purchase of tens of thousands of servers, saved hundreds of millions in resource costs and significantly improved efficiency.
01 Background
Unified Scheduling 1.0 successfully supported the 2021 Double 11 promotion. The unified scheduling solution upgraded and optimized the entire pipeline, from container scheduling to rapid scale-up and scale-down. More than 100 core project members carried it through project approval, POC, solution review and design, closed-door development and testing, and the final sprint for the promotion.
As a core Alibaba project, it brought together Alibaba Cloud (the container team and the big data team), the Alibaba resource efficiency team and the Ant Group container orchestration team. After more than a year of development and technical breakthroughs, they completed the upgrade from the earlier “colocation technology” to today’s “unified scheduling technology”.
Today, unified scheduling covers Alibaba e-commerce, search and promotion, MaxCompute big data and Ant Group workloads under one scheduler. It unifies Pod scheduling with high-performance task scheduling, provides a single complete resource view with coordinated scheduling, and raises colocation density and utilization across many complex workload types. It supports large-scale resource scheduling across dozens of data centers, millions of containers and tens of millions of cores worldwide.
02 Comprehensive Upgrade of Unified Scheduling Technology
The essence of cloud computing is to pool small fragments of computing capacity into larger resource pools, shave peaks and fill valleys across them, and deliver the best possible energy efficiency. In pursuit of low-carbon, energy-saving and greener operations, continued technical progress and more efficient data centers, Alibaba keeps pushing its technology forward. Its engineers envision data center computing power as infrastructure like water, electricity and gas: available out of the box.
To make the most of the complementary peaks and valleys of different businesses, we previously built colocation technology. It broke down the fragmentation of multiple resource pools and let the separate schedulers of different computing domains cooperate and share resources. That earlier generation of colocation greatly improved resource unification and utilization, but its multi-scheduler nature limited how much further we could go.
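The benefit of peak-valley complementarity can be illustrated with a little arithmetic. The following is a minimal sketch, not Alibaba's method: the function names and the load curves are made up for illustration, and each curve is assumed to be sampled at the same points in time. It compares the capacity needed when every workload has its own pool (sum of individual peaks) with the capacity needed for one shared pool (peak of the combined load).

```python
def separate_pool_capacity(workloads):
    """Capacity needed when each workload is sized for its own peak."""
    return sum(max(curve) for curve in workloads)


def shared_pool_capacity(workloads):
    """Capacity needed when all workloads share one pool.

    The pool only has to cover the peak of the combined load.
    """
    return max(sum(point) for point in zip(*workloads))


# Hypothetical load curves in cores: online traffic peaks in the middle
# of the period, offline batch jobs peak at the edges.
online = [20, 30, 80, 100, 60, 30]
offline = [90, 80, 30, 10, 40, 85]

separate = separate_pool_capacity([online, offline])
shared = shared_pool_capacity([online, offline])
saved = separate - shared
```

With these made-up curves, the separate pools need 190 cores while the shared pool needs 115, because the two peaks never coincide.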
Alibaba has therefore kept working toward a new generation of scheduling technology: one that supports more complex workloads, colocates them without distinction, exploits elasticity and complementarity to the full, and aims at globally optimal scheduling and higher-quality computing power. This year marked a turning point: Container Service ACK led the effort and worked with many teams to launch a new generation of unified scheduling based on ACK.
In its first large-scale rollout for Double 11 this year, unified scheduling manages the underlying compute, storage and network resources through a single scheduling protocol and a single system architecture, at very large scale, with high efficiency and automatic resource elasticity. Through colocation of online and offline workloads and the new rapid scale-up and scale-down technology, it avoided the purchase of tens of thousands of servers, saved hundreds of millions in resource costs and significantly improved efficiency.
This year also introduced data-driven intelligent scheduling at scale for the first time. The added capabilities include real-time load awareness, automatic specification recommendation (VPA), differentiated-SLO workload scheduling, CPU normalization, HPA with periodic prediction, and time-shared resource reuse. Together they offer cost optimization along more dimensions, on top of a highly reliable and secure container runtime.
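To make "real-time load awareness" concrete, here is a minimal sketch of load-aware placement. It is an illustration under stated assumptions, not the ACK scheduler's actual algorithm: the `Node` type, the `pick_node` function and the 0.8 utilization ceiling are all hypothetical. The idea is to score nodes by their measured CPU usage rather than by declared requests, skip nodes that would become hotspots, and place the Pod on the least loaded candidate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_capacity: float  # total cores on the node
    cpu_used: float      # cores in use right now, from real-time metrics


def pick_node(nodes: List[Node], pod_cpu: float,
              max_utilization: float = 0.8) -> Optional[Node]:
    """Return the node with the lowest projected utilization, or None.

    Nodes whose projected utilization would exceed max_utilization
    (an assumed ceiling) are filtered out to avoid creating hotspots.
    """
    best = None
    best_util = None
    for node in nodes:
        projected = (node.cpu_used + pod_cpu) / node.cpu_capacity
        if projected > max_utilization:
            continue  # placing the pod here would create a hotspot
        if best_util is None or projected < best_util:
            best, best_util = node, projected
    return best


nodes = [
    Node("node-a", cpu_capacity=64, cpu_used=40),
    Node("node-b", cpu_capacity=64, cpu_used=20),
    Node("node-c", cpu_capacity=32, cpu_used=4),
]
small = pick_node(nodes, pod_cpu=4)    # node-c has the lowest projected load
large = pick_node(nodes, pod_cpu=30)   # only node-b stays under the ceiling
too_big = pick_node(nodes, pod_cpu=40) # no node can take it
```

A production scheduler would weigh memory, network, SLO class and predicted load as well; this sketch only shows the filter-then-score shape on a single dimension.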
With the new unified scheduling at the center, Alibaba’s e-commerce, search, big data and other platforms request all of their different kinds of computing resources in one consistent way. With coordinated quota management and resource planning, hundreds of thousands of cores can be borrowed in seconds. Alibaba Cloud and Ant Group have also merged their scheduling technology on this basis, and the Ant ecosystem has been fully upgraded to unified scheduling. The platform opens up more possibilities for the future. For example, pricing and other economic levers could steer Alibaba’s internal businesses toward more rational use of each data center’s resources, keep overall utilization across data centers as balanced as possible, and improve data center energy efficiency.
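Quota borrowing can be pictured as follows. This is a minimal sketch under assumed semantics, not the real quota system: the `QuotaPool` class, its method names and the tenant names are hypothetical. Each tenant has a guaranteed quota, and a tenant may exceed its own guarantee as long as the shared pool still has idle cores left by others.

```python
from typing import Dict


class QuotaPool:
    """Shared pool in which tenants may borrow each other's idle quota."""

    def __init__(self, guaranteed: Dict[str, int]):
        self.guaranteed = dict(guaranteed)           # cores promised per tenant
        self.used = {name: 0 for name in guaranteed}  # cores currently granted

    def idle(self) -> int:
        """Cores in the pool that nobody is using right now."""
        return sum(self.guaranteed.values()) - sum(self.used.values())

    def request(self, tenant: str, cores: int) -> bool:
        """Grant the request if the pool as a whole has enough idle cores."""
        if cores > self.idle():
            return False
        self.used[tenant] += cores
        return True

    def borrowed(self, tenant: str) -> int:
        """Cores the tenant holds beyond its own guarantee."""
        return max(0, self.used[tenant] - self.guaranteed[tenant])


pool = QuotaPool({"ecommerce": 100, "bigdata": 50})
granted = pool.request("bigdata", 120)   # 70 cores borrowed from ecommerce
denied = pool.request("ecommerce", 40)   # only 30 idle cores remain
```

The second request is refused even though it is within the tenant's guarantee, which is why a real system also needs a way to reclaim borrowed cores from the borrower; that part is left out here.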
Alibaba Cloud Container Service ACK further enhances standard Kubernetes, with higher throughput and lower response latency, to provide stable and reliable very large single clusters. It stably supports a single cluster of 12,000 nodes and more than 1 million cores, giving unified scheduling a solid foundation for running large resource pools in production. Alibaba’s many kinds of complex resources have also been consolidated and upgraded on the ACK base.
Beyond classic Alibaba scenarios such as e-commerce, search and big data, unified scheduling also powers newer technologies. Take live-streaming e-commerce: decisions there depend heavily on real-time computing, for example second-level analysis of the browsing and transaction data generated by more than 90 million online viewers in Viya’s Double 11 livestream room. This year Alibaba upgraded the Blink real-time computing engine to a new generation built on unified scheduling, with large gains in cost, performance, stability and user experience. Compared with Yarn, large jobs start up 40% faster and error recovery is 100% more efficient. Unified scheduling saved hundreds of thousands of CPUs during Double 11. Even with cluster CPU utilization above 65%, there were no hotspots anywhere in the cluster, which kept every livestream timely.
On the Serverless side, function services were deployed at scale within the group for the first time and used during Double 11, supporting more than 10 business scenarios such as Taobao search and recommendation, data processing and front-end SSR. With unified scheduling, Function Compute runs colocated at scale with the Alibaba resource pool, makes full use of fragmented cluster resources, and eliminates the cost of idle resources during low-traffic periods in Serverless scenarios. With ACK on-demand image loading and network stack optimization, function instances cold-start in under 150 ms; combined with pooling, the cold start rate of Function Compute containers stays below 5%. These were key to the success of Double 11.
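How pooling lowers the cold start rate can be shown with a toy model. This is a minimal sketch, not how Function Compute is implemented: the `WarmPool` class and its methods are hypothetical. Requests are served from pre-initialized instances while any remain, and only the overflow pays for a cold start; the cold start rate is the share of requests that overflowed.

```python
class WarmPool:
    """Toy model of a pool of pre-initialized function instances."""

    def __init__(self, size: int):
        self.warm = size        # instances ready to serve immediately
        self.requests = 0
        self.cold_starts = 0

    def acquire(self) -> str:
        """Serve one request, from the warm pool if possible."""
        self.requests += 1
        if self.warm > 0:
            self.warm -= 1
            return "warm"
        self.cold_starts += 1   # pool exhausted: start a fresh instance
        return "cold"

    def refill(self, count: int) -> None:
        """Add pre-initialized instances back, e.g. from a background task."""
        self.warm += count

    def cold_start_rate(self) -> float:
        if self.requests == 0:
            return 0.0
        return self.cold_starts / self.requests


pool = WarmPool(size=3)
results = [pool.acquire() for _ in range(4)]  # the fourth request overflows
rate = pool.cold_start_rate()
```

In this run three of four requests hit a warm instance, so the rate is 0.25; keeping the rate low in practice depends on sizing and refilling the pool ahead of demand.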
03 Future Outlook
Going forward, Container Service ACK will bring Alibaba’s unified scheduling experience to the whole industry, support more new kinds of computing workloads and the architectural evolution of new technologies, make cloud computing available everywhere, empower more enterprises and unlock greater low-carbon value.
This article is original content from Alibaba Cloud and may not be reproduced without permission.