Alibaba Cloud's Xiong Ying: Evolution and Practice of an Edge Cloud Native Architecture Based on a Converged, Collaborative System

Cloud native and edge computing have been two of the hottest technical topics of the past two years. At the 10th Cloud Computing Standards and Applications Conference, Xiong Ying, a senior technical expert at Alibaba Cloud, presented the evolution and practice of an edge cloud native architecture based on a converged, collaborative system. By describing how Alibaba Cloud's system architecture has evolved across edge computing and cloud native, the talk aimed to share some thinking on how businesses can land in scenarios that combine the two.


Overview

Edge computing has developed rapidly in recent years, and definitions of it appear in many standards and documents. Here is a summary of several basic concepts.

【Origin】

The concept of edge computing dates back a long way, but its real popularity is mainly due to the development of 5G. Because 4G led to the explosion of the mobile Internet, great expectations have been placed on edge computing in the 5G era, in the hope that it will become a new industry track. In addition, 3GPP defined three scenarios for 5G: high bandwidth, low latency, and massive connectivity. Each of them strengthens the case for edge computing. The resulting upgrades to telecom infrastructure also allow computing to sink further, from the Internet down to the core network and the access network, so that computing moves closer and closer to the user.

【Definition】

Edge computing is defined differently in different fields and from different perspectives: operators, cloud service vendors, and hardware manufacturers each have their own definition. In Alibaba Cloud's edge cloud standard, edge cloud is defined as a computing platform and service, provided on network nodes close to terminals (people and things), that is distributed, definable, schedulable, standard, open, and secure. The goal is to push out the boundary of the cloud, bring computing and connectivity closer to things, and make edge cloud the cornerstone of the Internet of Everything.

【Characteristics】

Compared with the central cloud, edge nodes are decentralized and multi-level: there are many of them and each is small. They exist not only at the regional and provincial level but also at the prefecture, city, and campus level, and in 5G scenarios they need to sink all the way to the access network. The links between cloud and edge, and between one edge node and another, may run over the public Internet.

【Challenges】

Massive, distributed, and heterogeneous edge node resources pose huge challenges to services. Multiple network entry points mean that unified traffic monitoring and elastic scaling policies are not available. Many small nodes mean that the elasticity of any single cluster is weak even though the overall elasticity is strong. Managing a massive number of nodes over an Internet-grade network also has a major impact on high availability, disaster recovery, and migration.

In general, 5G, the Internet of Things, and the industrial Internet have received wide attention since the "new infrastructure" initiative was proposed and rolled out. The accelerating commercialization and industrialization of 5G has made the underlying infrastructure more mature. This year has seen a large number of new industries emerge, such as cloud applications, cloud gaming, interactive entertainment, and industrial Internet 2.0, and these in turn are driving rapid change in the overall technology architecture.

Infrastructure Evolution

First, the evolution of edge infrastructure. Based on the form the business takes, Alibaba Cloud defines three stages:

The first stage is edge cloud ready. At this stage, users simply migrate applications that ran on physical machines into a virtualized environment. Cost reduction is the main driver: users no longer build their own nodes, and operation of the underlying physical facilities is handed over to the edge cloud. The way applications are developed and operated does not change much.

The second stage is edge cloud native. Users want to further reduce total cost of ownership and improve system capability and development efficiency by managing resources, delivering applications, and operating systems in a standardized, automated way. They do in-depth development and customization on top of K8S, integrate edge resources and features, and build their own edge-adapted PaaS platform for internal business use.

These first two stages should be familiar, since they mirror the evolution of the central cloud.

The third stage is edge converged cloud native, a relatively new concept. It is a stage that Alibaba Cloud has been exploring and defining in practice, informed by how users think about their business.

To expand on this: edge resources are distributed, small, and numerous, and network conditions are complex, so users must watch infrastructure stability at all times and be ready to switch and migrate services and data. Elasticity is also weak, so it is hard for a user's business to consume resources on demand. In addition, integrating the various edge capabilities into the technical architecture requires users to work deeply with K8S and to have custom development skills. In short, users need to be aware of the underlying resources and infrastructure, and even of inventory, utilization levels, and capacity planning, which makes sinking a business to the edge technically difficult.

With edge converged cloud native, users can enjoy elastic, highly available, on-demand capabilities without having to care about the underlying edge infrastructure. Edge converged cloud native should hide edge characteristics such as heterogeneous resources, multiple clusters, and inventory levels. It should consolidate capabilities such as resource scheduling, elastic scaling, and multi-level coordination into the platform and expose them. It should use the extensibility of cloud native to abstract and integrate resources and capabilities. Finally, it should provide unified, standard interfaces for both common and emerging business scenarios, and release these capabilities to users.

System Architecture Evolution

In evolving the technical architecture, Alibaba Cloud follows the same idea and designs the system in layers.

Infrastructure layer: provides heterogeneous resource management, a multi-level network architecture, and converged storage. It solves the problems of unified management, converged production, and abstraction of the underlying resources.

Cloud-edge collaboration layer: provides the ability to move computing, storage, and network traffic, with cloud-edge, edge-edge, and multi-cloud collaboration. It solves the problem of coordinating the various capabilities and systems.

Platform engine layer: provides edge cloud native abstraction and integration. It handles the integration, scheduling, and orchestration of resources, components, and applications.

Business scenario layer: provides unified interfaces, accumulated business capabilities, and scenario-specific depth. It closes the loop for the developer ecosystem.

It can be expected that with the continuous evolution and improvement of 5G technology and infrastructure, as well as the development of innovative businesses, the system architecture will continue to evolve and change.

As the old saying goes, what is learned on paper always feels shallow; to truly understand something, you have to practice it. Next, Xiong Ying used cases from Alibaba Cloud's actual business practice to explain the capabilities and design of each layer.

Application case: Stateless applications

This scenario covers task-style services (such as stress testing, dial testing, and offline transcoding) and peer-to-peer networks (P2P transmission networks). These services demand a great deal of elasticity and scalability and are highly cost-sensitive, but they have low requirements for location and high availability. It is a typical scenario for testing the capabilities of edge computing infrastructure, because a single edge node has weak elasticity while the global resource pool has strong elasticity. Architecturally, the platform needs a unified inventory, converged scheduling, and collaborative scheduling of global resources. In terms of computing form, it supports several converged forms, such as virtual machines, containers, and secure containers, to meet the needs of different scenarios. In terms of inventory, there should be a converged resource pool. In terms of scheduling and orchestration, there should be coordinated, unified scheduling. With these in place, the platform can scale flexibly and serve resources on demand in event-triggered and traffic-burst scenarios, while greatly reducing user costs.
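The point that a single node is weak while the global pool is strong can be illustrated with a small sketch. It is not Alibaba Cloud's scheduler: the `EdgeNode` type, the `free_slots` capacity unit, the node names, and the greedy largest-first policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    free_slots: int  # hypothetical unit of spare capacity on this node


def allocate(nodes, demand):
    """Spread `demand` instances over a unified inventory of edge nodes.

    Greedy policy (an assumption): fill the nodes with the most free
    capacity first. Returns (placement, unmet), where placement maps
    node name to instance count and unmet is the demand left unserved.
    """
    placement = {}
    remaining = demand
    for node in sorted(nodes, key=lambda n: n.free_slots, reverse=True):
        if remaining == 0:
            break
        take = min(node.free_slots, remaining)
        if take > 0:
            placement[node.name] = take
            remaining -= take
    return placement, remaining


# No single node can hold 8 instances, but the global pool can.
nodes = [EdgeNode("hangzhou-1", 3), EdgeNode("chengdu-2", 5), EdgeNode("wuhan-1", 2)]
placement, unmet = allocate(nodes, 8)
print(placement, unmet)
```

A real scheduler would also weigh cost, location, and the computing form (virtual machine, container, secure container); the sketch only shows why a converged inventory is a precondition for elastic scaling at the edge.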

Application case: Stateful applications

In this scenario, in addition to computing and elasticity, the business also hands domain names and traffic scheduling over to the platform. Because the business is more complex, so is the architecture. First, within a single cluster, the system needs to be split into multiple microservices that work independently. Second, those microservices must be orchestrated and depend on one another. Third, cloud-to-edge communication (for both control and business traffic) and edge-to-edge communication (cluster to cluster) are required. Finally, there are requirements to integrate general capabilities and components such as domain name and traffic scheduling, SLB, databases, and middleware. Seen this way, an application at the edge is no less complex than one in the central cloud, and it additionally takes on the edge's distributed, multi-cluster, wide-area scheduling characteristics. Distributed cloud computing is a good description of this scenario.

How does the architecture meet these needs? At the infrastructure layer, distributed SLB and distributed DB are introduced as product capabilities, and a programmable, configurable cloud-edge and edge-edge overlay network is added as a network capability. At the collaboration layer, cloud-edge collaboration, edge-edge collaboration, and dynamic balancing of traffic and resources are core capabilities. At the engine layer, cloud native capabilities need deep development to adapt them to the edge: for example, K8S multi-cluster federation for managing massive numbers of nodes, Virtual Cluster for isolating multi-tenant services, a Service Mesh component for service discovery and communication in a microservice architecture, and CNI and CSI components for edge virtual networking and virtual storage.

Xiong Ying: "There are not yet many standards and specifications for distributed cloud computing. To make a complex application distributed and sink it from the center to the edge, a great deal of work is needed to transform the system architecture. This is the direction Alibaba Cloud is working in: we hope to build up more platform capabilities and form a closed-loop developer ecosystem, so that distributed cloud computing can land easily at the edge."

Application case: Moving terminals to the cloud

This business scenario has been very popular this year, typically in cloud gaming and cloud applications. The business moves the systems or applications that used to run on terminals into the cloud, which reduces terminal costs and lowers the entry threshold for high-quality services. In edge converged cloud native, this brings a fundamental conceptual shift from resource hosting and application hosting to device hosting and location-agnostic hosting. At the infrastructure and engine layers, heterogeneous resources are given a first level of encapsulation and abstracted into unified, standard virtualized resources that provide security and isolation. At the business layer, a second level of encapsulation hides the resource attributes, so that the concept of a resource is replaced by the concept of a device. At the same time, collaborative computing, collaborative storage, and collaborative networking are added at the collaboration layer so that virtual devices can move between locations. The user no longer sees the traditional concepts of applications and resources, only the management and control capabilities of a virtual device, such as device data, device applications, and device scheduling.
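The shift from "resource" to "device" can be sketched as a thin wrapper that hides what backs the device and where it runs. This is a minimal illustration under stated assumptions, not an actual Alibaba Cloud API: the `VirtualDevice` class, its method names, and the resource kinds and node names are all hypothetical.

```python
class VirtualDevice:
    """A user-facing device whose backing resource and location are hidden."""

    def __init__(self, device_id, resource_kind, node):
        self.device_id = device_id
        # Backing details are private: the user never selects or sees them.
        self._resource_kind = resource_kind  # e.g. "vm", "container", "arm-board"
        self._node = node                    # edge node currently hosting the device
        self.apps = []
        self.data = {}

    def install(self, app):
        self.apps.append(app)

    def migrate(self, resource_kind, node):
        """Move the device to another backing resource and node.

        Device identity, applications, and data are unchanged, which is
        what lets the platform move a device without the user noticing.
        """
        self._resource_kind = resource_kind
        self._node = node

    def describe(self):
        """Return the user-visible view: device-level concepts only."""
        return {
            "device_id": self.device_id,
            "apps": list(self.apps),
            "data_keys": sorted(self.data),
        }


phone = VirtualDevice("cloud-phone-001", "arm-board", "edge-node-a")
phone.install("game-client")
phone.data["save_slot"] = "level-3"

before = phone.describe()
phone.migrate("container", "edge-node-b")  # platform-side decision
after = phone.describe()
print(before == after)
```

The user-visible description is identical before and after migration, which is the sense in which hosting becomes location-agnostic.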

Here Xiong Ying emphasized the concept of the digital twin: in the Internet of Everything, behind every physical terminal there will eventually be a shadow terminal on the edge cloud, serving either as a carrier of its data or as an extension of its system.

Application case: Ultra-high-definition video

This scenario is still at the technology exploration stage. It is one that truly sinks into 5G MEC nodes, and the hope is to create a common technical architecture model that can be replicated across the 5G field. The key is to connect the collaboration layer's resource coordination, traffic scheduling, and network traffic steering capabilities with the carrier's MEC system. In the 5G/MEC era, computing power keeps sinking toward access networks and MEC nodes, and common protocols such as DNS can no longer meet the need for precise scheduling. On one hand, scheduling decisions must be based on accurate information about the terminal's location; on the other hand, they must also reflect the requirements of the business scenario. For example, highly real-time services such as positioning and AR/VR are placed in access equipment rooms to meet their real-time requirements. Services that need to save transmission bandwidth, such as video analysis, and services with high real-time demands, such as cloud gaming, are placed in aggregation equipment rooms, balancing functional and real-time requirements. Compute-heavy or storage-heavy services are placed in re-aggregation or core equipment rooms. Designing for multi-level computing and multi-level networking makes the whole system more powerful and more capable.
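The tiered placement described above can be sketched as a simple decision rule. Everything here is an illustrative assumption rather than a measured value or a carrier API: the tier names, the round-trip and capacity figures, the `Service` type, and the policy of preferring the deepest tier that still satisfies the service.

```python
from dataclasses import dataclass

# Hypothetical tiers, ordered from closest to the user to farthest:
# (name, typical round-trip time in ms, relative compute capacity).
TIERS = [
    ("access", 5, 1),
    ("aggregation", 15, 4),
    ("core", 40, 16),
]


@dataclass
class Service:
    name: str
    max_latency_ms: int  # real-time requirement of the business scenario
    compute_units: int   # compute the service needs, in the same relative unit


def place(service):
    """Return the deepest tier that meets both latency and compute needs.

    Preferring deeper tiers keeps scarce capacity near the user free for
    services that truly need it. Returns None if no tier qualifies.
    """
    for name, rtt_ms, capacity in reversed(TIERS):
        if rtt_ms <= service.max_latency_ms and capacity >= service.compute_units:
            return name
    return None


for svc in [
    Service("ar-positioning", 10, 1),
    Service("cloud-game", 20, 3),
    Service("batch-training", 100, 12),
]:
    print(svc.name, "->", place(svc))
```

A production scheduler would combine a rule like this with the terminal's precise location and live resource levels from the carrier's MEC system, which is exactly the information DNS-based scheduling lacks.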

Edge converged cloud native lets users flexibly choose where to deploy a service for each scenario while balancing its latency and computing power requirements. These capabilities should of course be offered to the upper layers as encapsulated abstractions, so that users and businesses do not need to be aware of the complexity of the underlying infrastructure.

Conclusion

In the 5G era, application scenarios such as terminals on the cloud, VR/AR, edge AI, the industrial Internet, and smart agriculture will gradually take off. In some specialized fields there are already heavyweight applications, but in general-purpose Internet technology the true killer application of 5G has yet to emerge, and an architecture that truly combines 5G technology with the infrastructure has yet to evolve. Xiong Ying looks forward to building, together with partners, an edge computing platform that genuinely connects multi-level network scheduling with cloud-edge and edge-edge resource coordination. Built on cloud native technology, such a platform would offer the industry open, standard cloud-edge collaboration and cloud-network convergence capabilities, so that more applications can easily sink to the edge and the Internet of Everything can become a reality.