200 million students take classes online.

As the epidemic raged, teachers and students who would normally have returned to campus moved online instead, and the sudden surge of traffic posed major challenges for the online education industry.

Baijiayun, which has long served education companies, was no exception. As a company that provides one-stop cloud classroom solutions for educational institutions, Baijiayun received requests from many institutions to build online cloud classrooms during the epidemic, while traffic from institutions that had operated mainly offline switched online almost overnight.

In response to the education department's call to "suspend classes without suspending learning", all Baijiayun staff cut their holidays short and began working from home on the second and seventh days of the Lunar New Year.

Such an explosion of demand in so short a time caught every education company off guard. According to Li Gangjiang, CEO of Baijiayun, the company's business volume grew dozens of times over within a short period. Scaling up that quickly, and doing so without customers noticing, is harder than delivering a new system.

The silver lining is that the Baijiayun team's earlier work on an agile architecture had prepared it for this kind of high-concurrency scenario. Before this battle, Baijiayun had already optimized its container cluster architecture and planning with help from the Alibaba Cloud team, and it handled dynamic scale-out and efficient management calmly, with Alibaba Cloud Container Service for Kubernetes (ACK) and Elastic Bare Metal (Shenlong) instances at the core of its solution.

Explore containerization to handle traffic peaks with agility

Baijiayun was fortunate to have finished containerizing its business before this surge. Other online education companies that did not use containers had to multiply their machines as users poured in, which lengthened deployment times and sharply raised business costs.

The story begins with how Baijiayun's business developed. Since its founding in 2017, Baijiayun has been a cloud video company with the purest education focus in the industry. In 2018, it earned more than 100 million yuan in revenue and served more than 1,000 education companies. Rapid business growth also pushed the Baijiayun technical team to optimize its own technical architecture.

In 2019, Baijiayun began rolling out small-class products. Unlike large-class products, these need to record courses for playback by capturing audio, video, and the screen, and in this process audio and video processing must also be isolated. Isolation at the virtual machine level is too costly, while running everything in a single virtual machine lets the processes interfere with one another. The team therefore turned to containers, a more lightweight virtualization technology.

Starting in the first half of 2019, Baijiayun containerized parts of its business on a small scale and got the basic workflow running. As its container fleet grew, however, scheduling and management became a new problem, and Alibaba Cloud Container Service for Kubernetes (ACK) greatly reduced that workload. The Baijiayun technical team says containers reduce the workload of operations and testing and make it easy to version-control the application runtime environment. Containers also have lower computing overhead than virtual machines, which reduces IT costs. At the time, container-based cloud native technology was sweeping the industry.

This container-based cloud native architecture gave Baijiayun the agile, elastic technical foundation it would need for any business peak.

But this was only the first step.

The sudden arrival of the traffic peak still put Baijiayun to the test.

With Alibaba Cloud's "Container + Shenlong", capacity scaled dozens of times over within three days

When the surge hit, the problem was simple to state: capacity expansion.

The epidemic was a common enemy for the whole country. With its business growing steadily, Baijiayun had not expected to face such a "battle" over the New Year. Many settings of its original container cluster had not been planned for a large-scale cluster, which limited the number of nodes a single cluster could hold. The small instance types it had been using also limited the capacity of each node.

To address Baijiayun's expansion needs, the Alibaba Cloud team recommended large-specification Elastic Bare Metal (Shenlong) servers. Pairing the container service with bare metal instances sized to Baijiayun's application load profile optimizes cost, avoids waste, and makes elastic supply more reliable.

First, Alibaba Cloud Elastic Bare Metal (Shenlong) servers come in large specifications, which let Baijiayun significantly increase the capacity of each node.
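The article gives no concrete node or pod limits, but the constraint it describes is simple arithmetic: a cluster's capacity is bounded by its node limit times the pods each node can hold, so once the node count is capped, only larger nodes raise the ceiling. The Python sketch below illustrates this with entirely hypothetical numbers; `MAX_NODES`, `SMALL_NODE_PODS`, `LARGE_NODE_PODS`, and the 3,000-pod target are assumptions, not Baijiayun's figures.

```python
import math

# Hypothetical figures for illustration only; the article publishes none.
MAX_NODES = 100        # assumed per-cluster node limit
SMALL_NODE_PODS = 8    # assumed pods that fit on a small VM instance
LARGE_NODE_PODS = 64   # assumed pods that fit on a large bare metal instance


def cluster_pod_capacity(max_nodes: int, pods_per_node: int) -> int:
    """Upper bound on pods one cluster can host: node limit x pods per node."""
    return max_nodes * pods_per_node


def nodes_needed(target_pods: int, pods_per_node: int) -> int:
    """Smallest node count that can host target_pods (rounded up)."""
    return math.ceil(target_pods / pods_per_node)


def fits_in_one_cluster(target_pods: int, pods_per_node: int,
                        max_nodes: int = MAX_NODES) -> bool:
    """True if the workload fits without exceeding the node limit."""
    return nodes_needed(target_pods, pods_per_node) <= max_nodes


if __name__ == "__main__":
    target = 3000  # hypothetical pod count after a traffic surge
    for label, per_node in (("small", SMALL_NODE_PODS), ("large", LARGE_NODE_PODS)):
        print(label,
              cluster_pod_capacity(MAX_NODES, per_node),
              nodes_needed(target, per_node),
              fits_in_one_cluster(target, per_node))
```

With these assumed numbers, small instances would need 375 nodes, far past the 100-node cap, while large instances need only 47, which is the effect the larger Shenlong specifications are meant to achieve.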

Second, and more importantly, Baijiayun's K8s cluster has very demanding performance requirements, and Shenlong servers have clear performance advantages. The "container + Elastic Bare Metal (Shenlong)" solution fits Baijiayun's high-traffic, high-concurrency scenario well.

The container-based architecture meets the need for rapid service provisioning and elasticity. Shenlong servers eliminate virtualization overhead entirely, improving computing performance by 8%, and they behave like physical machines while still supporting nested virtualization.

Shenlong's characteristics combined with container elasticity make a natural match. According to the data, containers running in the cloud perform 10% to 15% better than those on off-cloud physical machines. Shenlong's CPU and memory carry no virtualization overhead because virtualization is offloaded to the MOC card, and each container running on Shenlong gets its own elastic network interface (ENI), which increases network throughput by 13%.

Third, Shenlong servers separate storage bandwidth from computing bandwidth, which meets the heavy read and write demands of Baijiayun's business scenarios. With the extra computing power of Shenlong servers, however, storage I/O became a bottleneck. Baijiayun resolved it by using Alibaba Cloud's high-performance NAS service and scaling out horizontally into four clusters.

Building on this approach and drawing on its own experience managing large-scale clusters, the Alibaba Cloud team upgraded Baijiayun's original architecture in just a few days, scaling capacity dozens of times over, greatly improving performance and stability, and preparing the platform for explosive growth.

Optimize architecture and cluster planning to significantly reduce O&M costs

Faced with a sudden surge in traffic, scaling out quickly and elastically while keeping operations and maintenance (O&M) under control became an urgent problem.

Moving away from its original nested-virtualization setup, Baijiayun used Shenlong to deploy containers at high density. Combined with the agile management that containers allow, this cut costs by at least 25% and reduced the O&M workload by 80%. At the same time, the K8s cluster was planned more rationally, and the overall architecture, including the network, the storage scheme, and the scaling principles, was optimized to keep later O&M stable and lower usage costs.
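The article does not say which mechanism drove Baijiayun's dynamic scale-out. As one illustration of how elastic scaling is commonly automated on Kubernetes, and therefore on ACK, the Horizontal Pod Autoscaler documents the rule desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping the change when the ratio is within a tolerance of 1.0 (0.1 by default). The function below is a simplified sketch of that rule with min/max clamping added; it is not the real controller and ignores stabilization windows, missing metrics, and pod readiness. The function name and defaults are hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         min_replicas: int = 1,
                         max_replicas: int = 100,
                         tolerance: float = 0.1) -> int:
    """Simplified Kubernetes HPA scaling rule (sketch, not the real controller)."""
    ratio = current_metric / target_metric
    # Inside the tolerance band around 1.0, leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds, as minReplicas/maxReplicas do in an HPA spec.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 10 pods averaging 200m CPU against a 100m target: scale to 20.
    print(hpa_desired_replicas(10, 200, 100))  # → 20
    # 5% over target is within the default tolerance: stay at 10.
    print(hpa_desired_replicas(10, 105, 100))  # → 10
```

The rule scales in proportion to how far the observed metric is from its target, which is why a traffic surge that doubles per-pod load roughly doubles the replica count, up to the configured ceiling.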

Baijiayun also uses Alibaba Cloud's efficient O&M management tools, which significantly reduce its O&M workload.

Because Baijiayun's containerization schedule was very tight, there was little time to spend on O&M monitoring. Using ARMS Prometheus, Baijiayun set up monitoring of its container node environment in just half an hour. Compared with open source Prometheus, ARMS Prometheus has no data volume limit and integrates seamlessly with ACK, letting Baijiayun locate problems in its containers quickly and understand how to improve its products.

In the Log Service (SLS) of the Alibaba Cloud container platform, the event center application displays cluster state changes and component exception events in detail, helping Baijiayun aggregate abnormal log information from its nodes into a dashboard and raise timely alerts.

Li Gangjiang summed up Alibaba Cloud's value to Baijiayun in three points:

1. Agile, secure room for elastic computing and scale-out: Alibaba Cloud preheats application images, among other measures, so that containers can start immediately during scale-out. Container Registry (ACR) securely hosts large volumes of container images and, through fine-grained image access control, manages the full lifecycle of application images quickly and securely.

2. Stable service and excellent performance: Built on Alibaba Cloud's in-house Shenlong integrated hardware and software architecture, Elastic Bare Metal (Shenlong) servers offer the performance of a physical machine with the experience of a virtual machine. With Shenlong, Baijiayun schedules its K8s clusters better, and together with high-performance NAS it resolved its I/O bottlenecks.

3. A responsive technical support team that helped optimize the architecture: Baijiayun's later scaling problems stemmed partly from an original business architecture plan that was not prepared for managing large-scale clusters. Alibaba Cloud helped Baijiayun optimize its business architecture and its cluster management capabilities in a short time.

As a top cloud service provider in China and worldwide, Alibaba Cloud has strong capabilities at the IaaS and PaaS layers. Baijiayun's experience at the education SaaS layer complements Alibaba Cloud's strengths, allowing the two to serve the market with a complete online education solution, and the partnership is steadily deepening. Baijiayun will soon launch on Alibaba Cloud Marketplace, the commercial platform of the Alibaba Cloud SaaS Accelerator known as "Software Tmall", where users will be able to purchase Baijiayun's services directly.

On February 26 at 10 a.m., join Alibaba Cloud's "Fighting the Epidemic" digital classroom course series. It covers Alibaba Cloud's complete online education solution, how Alibaba Cloud technology helps online education platforms handle heavy traffic and high concurrency while keeping them running safely, and Alibaba Cloud's special support policies for online education during the outbreak. "Cloud + education" holds unlimited possibilities.




Read more: https://yqh.aliyun.com/detail/6417?utm_content=g_1000106257

For more cloud news, customer cases, best practices, and product introductions, visit Yunqi: https://yqh.aliyun.com/