On November 13, 2021, Ruan Zhaoyin, a container R&D technical expert at China Telecom Tianyi Cloud, delivered a speech titled "Tianyi Cloud's Large-Scale CDN Practice Based on KubeEdge" at the Cloud Native Edge Computing Forum. This article describes how, in the course of its cloud-native transformation, Tianyi Cloud uses KubeEdge to manage CDN edge nodes, automatically deploy and upgrade CDN edge services, and provide disaster recovery for edge services.

▲ Ruan Zhaoyin / Container R&D technical expert, China Telecom Tianyi Cloud

The speech mainly includes the following four aspects:

1) Background of Tianyi Cloud CDN Cloud project

2) Edge node management based on KubeEdge

3) Edge application service deployment

4) Thinking about the evolution direction of future architecture

01 Background of Tianyi Cloud CDN Cloud project

Tianyi Cloud CDN business background

China Telecom accelerates cloud-network integration with a "2+4+31+X" resource layout, where "X" is the access layer: content and storage are placed as close to users as possible, so that the network follows the cloud, access to the cloud is convenient, and clouds communicate smoothly, meeting users' demands for choice and low latency. Although Tianyi Cloud's CDN started late, it provides all basic CDN functions, has rich resource reserves, supports precise scheduling, and puts quality first; overall business development is entering the fast lane.

Edge service containerized background

Unlike other cloud vendors and traditional CDN vendors, Tianyi Cloud's CDN started late, but this coincided with the rise of cloud-native concepts. We therefore chose to build the CDN PaaS platform on container and K8s orchestration technology, while the CDN edge services had not yet completed their cloud-native transformation.

Existing problems:

  • How to manage, at scale, the CDN nodes distributed across the edge?
  • How to deploy and upgrade CDN edge services reliably?
  • How to build a unified, extensible resource scheduling platform?

02 Edge node management based on KubeEdge

CDN physical node architecture

CDN provides cache acceleration services to solve the last-mile acceleration problem. To meet the requirements of nearby access and rapid response, most CDN nodes must be deployed close to users, and the CDN global traffic scheduling system directs users to the nearest node. CDN nodes are generally distributed discretely, mostly based on regional IDC and city-level IDC room resources; each edge room builds multiple CDN service clusters according to its planned egress bandwidth and server resources.

Selection of edge service container technology

When considering containerization, we carried out technical selection and research in three main directions:

Standard K8s: edge nodes join the master as ordinary worker nodes. The problem with this approach is that with too many connections, relist traffic puts heavy load on the K8s master, and network fluctuations trigger pod evictions, causing unnecessary rebuilds.

Node-by-node access to CDN edge nodes: deploy K8s or K3s cluster by cluster at the edge. This creates too many control planes and clusters, making it impossible to build a unified scheduling platform; moreover, if each K8s cluster is deployed in high-availability mode, at least three master nodes are needed, consuming excessive machine resources.

Cloud-edge access: access through KubeEdge. This converges edge node connections into a unified K8s cluster and provides cloud-edge collaboration, edge autonomy, and other capabilities, while retaining most of K8s's native capabilities.
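
With KubeEdge, edge nodes register as ordinary Node objects in the unified K8s cluster, so standard tooling and orchestration still apply to them. Below is a minimal client-go sketch that lists the edge nodes; it assumes the common KubeEdge convention of labeling edge nodes with node-role.kubernetes.io/edge, and the kubeconfig path is a placeholder.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path for the unified cloud-side K8s cluster.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Edge nodes joined via keadm conventionally carry this label
	// (assumption); they appear as ordinary Node objects.
	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{
		LabelSelector: "node-role.kubernetes.io/edge",
	})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		fmt.Println(n.Name)
	}
}
```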

Edge node management scheme based on KubeEdge

The figure above is a schematic of our current architecture. We build several K8s clusters in regional centers and data centers; under each K8s cluster, the edge clusters of that region are connected according to proximity planning.

To avoid a single access point and excessive CloudCore load in any one K8s cluster, we built K8s clusters in each region for edge node access and application orchestration management. However, in the community's early 1.3 version, the only high-availability option was an active-standby deployment, which could not meet the performance requirements of our large-scale management, so we later moved to a multi-replica CloudCore deployment.

This deployment worked well initially, but as the number of connected edge nodes and deployed containers grew, problems gradually surfaced:

CloudCore multi-replica deployment imbalance

The diagram above shows the path from the hub module through the Upstream module to the final submission to the apiserver. The Upstream module in the middle runs as a single coroutine; because there is only one, message submission is too slow, messages from some edge nodes cannot reach the apiserver in time, and this eventually caused anomalies in our deployments.

Later we deployed CloudCore with multiple replicas, and found that connections became unbalanced when CloudCore was upgraded or restarted unexpectedly. To solve this, we placed a layer-4 LB in front of the replicas and configured load-balancing policies such as least-connection on it. In practice, however, layer-4 LBs such as LVS have filtering and caching mechanisms and cannot guarantee a balanced distribution of connections.

Against this background, we carried out the following optimization:

CloudCore multi-replica balancing optimization

  • After startup, each CloudCore instance reports real-time information, such as its connection count, through a ConfigMap.

  • Each CloudCore instance calculates its expected connection count from its own connection count combined with those of the other instances.

  • Each instance computes the deviation ratio between its actual and expected connection counts; if the ratio exceeds the maximum allowed difference, the instance enters a connection-release phase and then a 30s observation period.

  • After the observation period, the next detection cycle begins, repeating until connections are balanced (a simplified sketch follows).
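
The following is a simplified sketch of the detection logic described above; the ConfigMap plumbing is omitted, and the threshold value is illustrative rather than CloudCore's actual implementation.

```go
package main

import "fmt"

// Max allowed deviation from the expected share (illustrative value).
const maxDiffRatio = 0.1

// decide returns how many connections the local instance should release,
// given the connection counts every instance reports through the ConfigMap.
func decide(local int, peers []int) int {
	total := local
	for _, c := range peers {
		total += c
	}
	expected := total / (len(peers) + 1) // even share across all instances
	if expected == 0 {
		return 0
	}
	// Release connections only when the deviation ratio exceeds the allowed
	// maximum; the released edge hubs reconnect through the L4 LB to
	// less-loaded instances. A 30s observation period then precedes the
	// next detection cycle (omitted here).
	if float64(local-expected)/float64(expected) > maxDiffRatio {
		return local - expected
	}
	return 0
}

func main() {
	// Local instance holds 1500 connections; two peers hold 900 and 600.
	// Expected share is 1000, so the instance releases 500 connections.
	fmt.Println("release:", decide(1500, []int{900, 600}))
}
```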

Schematic diagram of connection count changes

Equilibrium after restart

03 Deploying Edge Application Services

Overall CDN acceleration service flow

A CDN consists of two core systems: scheduling and caching. The scheduling system collects, in real time, the link status of the whole network, node status, and node bandwidth cost, determines the optimal coverage data for scheduling, and pushes this data to the schedulers (Local DNS, 302, HTTPDNS).

Once the Local DNS obtains the optimal data, it answers DNS resolution requests accordingly, and the client accesses the nearby edge cluster through the response. Because the edge cluster is a cache, a miss can occur; if the intermediate (parent) cache also misses, the request goes back to the origin. This is the overall CDN service flow.
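
To make the 302 scheduling path concrete, here is a minimal sketch of a redirect-based scheduler: an HTTP service that answers each request with a 302 pointing at the optimal edge node. The pickNearestNode lookup and the node address are hypothetical stand-ins for the real coverage-data decision.

```go
package main

import "net/http"

// pickNearestNode is a stand-in for the real decision: the scheduling
// system's coverage data maps the client address to the optimal edge node.
func pickNearestNode(clientAddr string) string {
	return "https://edge-node-01.example-cdn.com" // hypothetical node address
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Answer with a 302 that sends the client to the nearest edge cluster.
		http.Redirect(w, r, pickNearestNode(r.RemoteAddr)+r.URL.Path, http.StatusFound)
	})
	http.ListenAndServe(":8080", nil)
}
```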

In the cache system, different products generally use different cache services; for example, acceleration for live streaming differs somewhat from static-content acceleration. This adds development and maintenance cost, so converging these services may be a trend.

CDN cache service features

  • Resource exclusivity: the cache service tries to make maximal use of the machine's storage and bandwidth resources, so it needs to run exclusively

  • Large scale: wide coverage; the cache service on a single machine may serve tens of thousands, or even 100,000, domain names

  • Partition-level fault tolerance: losing the cache on a small number of nodes in a group, or cache failures on a small share of nodes overall, is tolerable; large-scale cache failure causes cache breakdown, increased access latency, and service exceptions

  • High availability: L4/L7 LBs with real-time health checks, traffic cut-over, and drainage; the L4 LB balances traffic among hosts in the group, while the L7 LB uses consistent hashing to keep, as far as possible, only one copy of each URL in the group (see the sketch after this list)
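
A minimal sketch of how an L7 LB can use consistent hashing so that each URL maps to a single cache host in the group; host names and the virtual-point count are illustrative, not the actual implementation.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// ring is a minimal consistent-hash ring: each cache host gets several
// virtual points, and a URL is served by the first host clockwise from
// its hash, so each URL normally has exactly one copy in the group.
type ring struct {
	points []uint32
	hosts  map[uint32]string
}

func newRing(hosts []string, replicas int) *ring {
	r := &ring{hosts: map[uint32]string{}}
	for _, h := range hosts {
		for i := 0; i < replicas; i++ {
			p := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", h, i)))
			r.points = append(r.points, p)
			r.hosts[p] = h
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

func (r *ring) pick(url string) string {
	h := crc32.ChecksumIEEE([]byte(url))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.hosts[r.points[i]]
}

func main() {
	r := newRing([]string{"cache-01", "cache-02", "cache-03"}, 100)
	// The same URL always maps to the same host, so only one copy is cached.
	fmt.Println(r.pick("http://example.com/video/1.ts"))
}
```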

These features mean that CDN deployment must solve the following problems:

  • How to upgrade node containers in an orderly, controlled way?
  • How to do version A/B testing?
  • How to verify the upgrade process?

Our upgrade deployment plan includes:

Batch upgrade and intra-group upgrade concurrency control:

  • Create batch upgrade tasks
  • The controller performs the upgrade on the specified machines, batch by batch

Fine-grained version settings:

  • Create a host-granularity version mapping
  • The controller adds pod version selection logic

Graceful upgrade:

  • Lifecycle preStop/postStart hooks drain traffic before shutdown and restore it afterwards (a sketch follows this list)
  • In special scenarios, coordinate with the GSLB to cut over traffic
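
A sketch of the graceful-upgrade hooks expressed with the Kubernetes API types: preStop drains traffic from the LB before the old pod stops, and postStart restores it once the new version is up. The image name and script paths are hypothetical placeholders.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Container spec for the cache service with lifecycle hooks; the
	// drain/restore scripts stand in for the real traffic-cutting logic.
	c := corev1.Container{
		Name:  "cdn-cache",
		Image: "registry.example.com/cdn/cache:v2", // hypothetical image
		Lifecycle: &corev1.Lifecycle{
			PreStop: &corev1.LifecycleHandler{
				Exec: &corev1.ExecAction{Command: []string{"/scripts/drain-traffic.sh"}},
			},
			PostStart: &corev1.LifecycleHandler{
				Exec: &corev1.ExecAction{Command: []string{"/scripts/restore-traffic.sh"}},
			},
		},
	}
	out, _ := yaml.Marshal(c) // print the equivalent YAML manifest fragment
	fmt.Println(string(out))
}
```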

Upgrade verification: the controller is linked to the monitoring system; if service exceptions are detected during an upgrade, it is terminated or rolled back in time

Orchestration safety: an Admission Webhook validates modifications and deletions at workload and pod granularity to verify that they meet expectations (a minimal sketch follows)
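
A minimal sketch of such a validating Admission Webhook: it rejects pod deletions that lack an explicit approval annotation. The annotation key, endpoint, and certificate paths are hypothetical.

```go
package main

import (
	"encoding/json"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func validate(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	resp := &admissionv1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
	if review.Request.Operation == admissionv1.Delete {
		// For DELETE, the object being removed arrives in OldObject.
		var pod corev1.Pod
		_ = json.Unmarshal(review.Request.OldObject.Raw, &pod)
		if pod.Annotations["cdn.example.com/delete-approved"] != "true" { // hypothetical annotation
			resp.Allowed = false
			resp.Result = &metav1.Status{Message: "deletion does not meet expectations; blocked"}
		}
	}
	review.Response = resp
	_ = json.NewEncoder(w).Encode(review)
}

func main() {
	http.HandleFunc("/validate", validate)
	// The apiserver requires TLS for webhook endpoints; cert paths are placeholders.
	_ = http.ListenAndServeTLS(":8443", "tls.crt", "tls.key", nil)
}
```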

CDN edge container disaster recovery and migration based on KubeEdge

Migration steps:

1) Back up etcd and restore it in the new cluster (sketched after the steps);

2) Switch access DNS;

3) Restart CloudCore to disconnect the existing cloud-edge hub connections, so that edge nodes reconnect to the new cluster;
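
Step 1 can be scripted around etcdctl's snapshot subcommands; the following is a minimal sketch. Endpoints and paths are placeholders, and note that newer etcd releases move the restore operation into etcdutl.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// run executes a command with ETCDCTL_API=3 set and fails loudly on error.
func run(name string, args ...string) {
	cmd := exec.Command(name, args...)
	cmd.Env = append(os.Environ(), "ETCDCTL_API=3")
	if out, err := cmd.CombinedOutput(); err != nil {
		panic(fmt.Sprintf("%s %v: %v\n%s", name, args, err, out))
	}
}

func main() {
	// 1) Snapshot the old cluster's etcd.
	run("etcdctl", "--endpoints=https://old-etcd:2379", "snapshot", "save", "/backup/etcd.db")
	// 2) Restore the snapshot into a fresh data dir for the new cluster.
	run("etcdctl", "snapshot", "restore", "/backup/etcd.db", "--data-dir=/var/lib/etcd-new")
	// Then switch the access DNS and restart CloudCore; thanks to edge
	// autonomy, edge containers keep serving throughout the migration.
}
```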

Advantages:

  • Low cost: thanks to KubeEdge's edge autonomy, edge containers are not rebuilt and service is not interrupted;

  • Simple, controllable process with high service safety;

CDN mass file distribution

Requirement scenarios:

  • CDN Edge service configuration
  • GSLB scheduling decision data
  • Container image pre-warming tasks

04 Thinking about future architecture evolution

Edge computing challenges

Resource management: wide distribution; diverse types; inconsistent architectures and specifications

Network latency and reliability: heterogeneous networks, mobile networks, and other weak-network environments; limited bandwidth; insufficient stability

Security: it is harder to build a unified security protection system for edge services

Diversified services: a wide variety of scenarios and service types

Basic capabilities of a CDN-based edge computing platform

Resources:

  • Extensive coverage of CDN nodes, with redundant resources showing tidal characteristics;
  • Cloud-edge collaboration provided through KubeEdge;
  • The ability to deploy and manage heterogeneous resources.

Scheduling and networking:

  • Dedicated EDNS supports precise scheduling at prefecture and city level, achieving genuinely nearby access;
  • Unified scheduling of CDN services and edge computing services;
  • A cloud-edge private network makes the management channel, data transmission, and dynamic acceleration network more reliable;
  • Large-scale IPv6 support

Security capabilities:

  • CDN WAF, anti-DDoS, traffic scrubbing, near-source interception, etc.
  • Certificate acceleration and security: SSL hardware offload and a Keyless scheme that keeps the private key off the edge provide secure edge access;

Gateway:

  • Edge scheduling and rich load-balancing capabilities;
  • General protocol handling, including conventional streaming-media protocols, covering most Internet acceleration scenarios

Evolution of CDN edge computing

Edge infrastructure construction

  • Extend nodes into hybrid edge-computing/CDN nodes
  • Node-granularity service mesh
  • Improved container isolation and security
  • Make the CDN gateway a general-purpose ingress
  • CDN edge resource virtualization
  • Build an edge serverless container platform
  • Build a unified resource scheduling platform for CDN scheduling and containers

Business exploration

  • Offline computing: video encoding/transcoding, video rendering
  • Batch jobs
  • Dial testing and stress testing

Finally, you are welcome to join the KubeEdge community and help make the edge computing ecosystem more prosperous!

Appendix: KubeEdge community contribution and technical exchange addresses

Website: kubeedge.io

GitHub: github.com/kubeedge/ku…

Slack: kubeedge.slack.com

Mailing list: groups.google.com/forum/#!forum/…

Weekly community meeting: zoom.us/j/416723730…

Twitter: twitter.com/KubeEdge

Documentation: docs.kubeedge.io/en/latest/