SuperEdge adds support for a domestic intelligent accelerator card, speeding up edge AI inference by 10 times

Authors

Cambricon AE team, Tencent Cloud Container Center edge computing team, SuperEdge developers

SuperEdge supports the domestic intelligent accelerator card Cambricon MLU220

TKE Edge, the commercial product corresponding to SuperEdge, has been working continuously on hardware acceleration: it supports not only NVIDIA GPU acceleration but also GPU virtualization and qGPU. Together with Cambricon, it now supports a domestic edge intelligent accelerator card, making it easier for users to run model training at the edge and improving the performance of edge AI inference. The following is a joint compatibility statement for the Cambricon edge computing accelerator card, issued after joint testing by the Cambricon AE team and the SuperEdge open source team.

The MLU220-M.2 edge intelligent accelerator card and the SuperEdge distributed edge container management system are compatible with each other. Together they can accelerate video, image, voice and other applications on edge devices fitted with the M.2 card by more than ten times.

The throughput of two classification networks on the CPU and on the M.2 card is compared below.

Network model	M.2 (fps)	CPU (fps)
vgg16	184	13
resnet50	417	29

On the M.2 card, vgg16 runs about 14 times faster than on a stock i7-8700K, and resnet50 is likewise about 14 times faster.

The CPU is an Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz.
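
The speed-up factors can be checked directly from the throughput table. The short Python sketch below only divides the published fps figures; the variable and function names are illustrative, not part of any SuperEdge or Cambricon tooling.

```python
# Throughput figures (frames per second) copied from the table above.
throughput_fps = {
    "vgg16": {"m2": 184, "cpu": 13},
    "resnet50": {"m2": 417, "cpu": 29},
}


def speedup(m2_fps: float, cpu_fps: float) -> float:
    """Return how many times faster the M.2 card is than the CPU."""
    return m2_fps / cpu_fps


speedups = {
    name: speedup(fps["m2"], fps["cpu"]) for name, fps in throughput_fps.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
# vgg16: 14.2x
# resnet50: 14.4x
```

Both ratios land slightly above 14, which matches the "about 14 times" figure quoted for each network.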

SuperEdge edge container solution

SuperEdge is an edge distributed container management system based on native Kubernetes. It was released in December 2020 by Tencent Cloud together with Intel, VMware, Huya, Cambricon, Meituan and Capital Online. The system extends cloud native capabilities to the edge and gives the cloud effective management and control over the edge, greatly simplifying application deployment from the cloud to the edge. In September 2021 it was accepted by the CNCF as a Sandbox project and is now governed and operated under the CNCF.

SuperEdge offers the following capabilities:

Edge autonomy

The network between the cloud and the edge is often weak. The link may be wired, wireless, Wi-Fi, 4G or 5G, and cloud-edge disconnection is the norm. An outage may last three to five minutes, or several hours or days. How can edge services avoid eviction and keep serving normally in that case? The edge autonomy capability of SuperEdge keeps edge services running stably while the cloud is disconnected. Even if an edge node is powered off and restarted, the edge services deployed on it are restored automatically and continue running.

Distributed health check

The edge distributed health check capabilities provided by SuperEdge serve two purposes:

SuperEdge deploys an edge-health DaemonSet on every edge node. Nodes in the same edge Kubernetes cluster periodically check each other, vote on each other's health, and report the result to the cloud. Even if a node in the edge Kubernetes cluster is disconnected from the cloud, as long as the node itself is healthy the other nodes report its health to the cloud, and the node is not evicted.
Distributed health checks can be grouped: the edge nodes of a Kubernetes cluster are divided into groups (for example, the same machine room or the same region), and the nodes in each group check each other. This avoids the situation where, as the cluster grows, check traffic between nodes increases, consumes node bandwidth and makes it hard to reach consensus on the vote.

The design of edge-health avoids mass Pod migration and rebuilding caused by cloud-edge network instability, ensuring the stability of edge services.
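
The grouped voting idea can be illustrated with a minimal Python sketch. It assumes a simple majority rule and an in-memory table of probe results; the function name, the data layout and the majority threshold are assumptions made for illustration and are not taken from the edge-health implementation.

```python
from collections import defaultdict
from typing import Dict


def tally_votes(
    probes: Dict[str, Dict[str, bool]], groups: Dict[str, str]
) -> Dict[str, bool]:
    """Decide node health by majority vote among peers of the same group.

    probes maps checker -> {target: reachable}; groups maps node -> group name.
    Nodes that received no votes are omitted from the result.
    """
    votes = defaultdict(list)
    for checker, results in probes.items():
        for target, reachable in results.items():
            # Only peers in the same group vote on each other.
            if checker != target and groups[checker] == groups[target]:
                votes[target].append(reachable)
    # A node is healthy when more than half of its peers could reach it.
    return {node: sum(v) * 2 > len(v) for node, v in votes.items()}


# Hypothetical cluster: two machine rooms with three nodes each.
groups = {"a": "room-1", "b": "room-1", "c": "room-1",
          "d": "room-2", "e": "room-2", "f": "room-2"}
probes = {
    "a": {"b": True, "c": True},
    "b": {"a": True, "c": True},
    "c": {"a": True, "b": True},   # c may have lost the cloud, but peers still reach it
    "d": {"e": True, "f": False},
    "e": {"d": True, "f": False},  # f is really down: no peer can reach it
}

health = tally_votes(probes, groups)
print(health)
```

In this sketch node c stays healthy because its peers can still reach it, while node f is voted unhealthy by both of its peers. Each node only probes its own group, so the number of checks per node stays bounded as the cluster grows.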

Service access control

ServiceGroup, developed by SuperEdge, implements service access control for edge computing. It has three main functions:

One-click deployment of edge services to multiple sites: the same set of services can be deployed to different sites within one edge Kubernetes cluster, and the services at every site are identical. The feature currently supports DeploymentGrid and ServiceGrid, two custom resources that make it easy to deploy a set of services to multiple machine rooms or regions in a cluster.
Traffic closed loop within a site: although every site runs the same set of services, requests from a site are kept inside that site and never reach the same services at other sites.
Automatic deployment for new sites: a newly added site automatically gets the same set of site services by specifying service labels, which supports automatic deployment when sites are expanded.
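
The traffic closed loop can be pictured as filtering a service's endpoints by site before routing. The Python sketch below illustrates only that idea; the endpoint records, the `site` field and the function name are hypothetical and do not reflect how ServiceGroup is implemented.

```python
from typing import Dict, List


def local_backends(endpoints: List[Dict[str, str]], client_site: str) -> List[str]:
    """Keep only the endpoints that live in the caller's own site."""
    return [ep["ip"] for ep in endpoints if ep["site"] == client_site]


# Hypothetical endpoints of one service deployed identically at two sites.
endpoints = [
    {"ip": "10.0.1.10", "site": "site-a"},
    {"ip": "10.0.1.11", "site": "site-a"},
    {"ip": "10.0.2.10", "site": "site-b"},
]

print(local_backends(endpoints, "site-a"))  # ['10.0.1.10', '10.0.1.11']
print(local_backends(endpoints, "site-b"))  # ['10.0.2.10']
```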

Cloud-edge tunnel

Commands such as kubectl logs and kubectl exec, which access edge nodes from the cloud, fail completely when the cloud cannot reach the edge nodes. SuperEdge's self-developed cloud-edge tunnel (currently supporting TCP, HTTP, HTTPS and SSH) solves the problem of connecting from the cloud to the edge across different network environments, enabling unified operation and maintenance from the cloud for edge nodes without public IP addresses.

Adding LAN edge nodes in batches and operating them remotely

To solve the problem of adding large numbers of edge nodes in a production environment, the SuperEdge team provides the Penetrator-Controller component. It can add thousands of edge nodes inside a LAN in batches and allows remote login to those nodes from the cloud for operation and maintenance.

More features can be found on SuperEdge's official website, superedge.io. For cooperation and discussion, open an issue in the community repository at github.com/superedge/s…

What is the MLU220?

The MLU220-M.2 is an accelerator card designed by Cambricon specifically for edge computing. On a standard M.2 card about the size of a finger, it delivers a theoretical peak performance of 8 TOPS while consuming only 8.25 W. It supports a variety of artificial intelligence applications such as vision, speech, natural language processing and traditional machine learning, enabling edge AI solutions for many kinds of business.

The MLU220 has the following features:

Small size, big intelligence: the Siyuan 220 (MLU220) chip is a solution tailored to the intelligent edge. In a form factor the size of a USB flash drive, it provides real-time intelligent analysis of 8 HD video channels and supports highly diverse AI applications, including vision, speech, natural language processing and traditional machine learning, giving edge computing nodes an intelligent brain.
New MLUv02 architecture: the MLUv02 architecture is not a simple upgrade of the previous generation. It is built on a network on chip (NoC) to raise the parallel efficiency of multiple NPU clusters. Hardware-based on-chip data compression increases effective cache capacity and bandwidth. The architecture supports INT16, INT8, INT4, FP32 and FP16 precision, meeting the computing requirements of diverse neural networks with both generality and performance.
Elastic and programmable computing: the Siyuan 220 chip supports many types of neural networks, and the Neuware software stack makes it easy to deploy inference environments. The BANG Lang programming environment lets computing resources be customized directly to meet diverse AI customization needs: specialized, but not single-purpose.

Acceleration card hardware specifications

The hardware specifications of the accelerator card can be summarized as follows:

Parameter	Specification
Model	MLU220-M.2
Memory	4 GB LPDDR4, 3200 MHz
AI compute	8 TOPS (INT8)
Video codec	Supports H.264, H.265, VP8, VP9; decode 8 x 1080p @ 30 Hz; encode 4 x 1080p @ 30 Hz
Image codec	Supports JPEG up to 8K resolution; decode 410 fps @ 1080p; encode 400 fps @ 1080p
Interface	M.2 2280, B+M Key (PCIe 3.0 x2)
Power consumption	8.25 W (3.3 V, 2.5 A)
Dimensions	80 mm x 22 mm x 7.3 mm (without heat sink) / 21.3 mm (with heat sink)
Heat dissipation	Passive cooling
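
A few derived figures follow from the table by simple arithmetic. The Python sketch below only recombines the published numbers (theoretical peak values, not measurements); the variable names are illustrative.

```python
# Figures copied from the specification table above.
peak_tops_int8 = 8.0     # AI compute, INT8, theoretical peak
supply_voltage_v = 3.3   # volts
supply_current_a = 2.5   # amperes
decode_streams = 8       # 1080p streams decoded concurrently
stream_rate_hz = 30      # frames per second per stream

# Electrical power drawn at the rated voltage and current.
power_w = supply_voltage_v * supply_current_a
# Peak compute delivered per watt of rated power.
tops_per_watt = peak_tops_int8 / power_w
# Total number of 1080p frames decoded per second across all streams.
decoded_frames_per_s = decode_streams * stream_rate_hz

print(f"power: {power_w:.2f} W")                   # power: 8.25 W
print(f"efficiency: {tops_per_watt:.2f} TOPS/W")   # efficiency: 0.97 TOPS/W
print(f"decode: {decoded_frames_per_s} frames/s")  # decode: 240 frames/s
```

The rated 3.3 V and 2.5 A reproduce the 8.25 W power figure, which works out to just under 1 TOPS per watt at peak.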

What can the MLU220 be used for?

Thanks to its small size and strong computing power, the MLU220 can be widely used in intelligent transportation, smart grid, intelligent manufacturing, intelligent finance and other edge computing scenarios. Some typical application scenarios follow:

Intelligent transportation

To keep roads safe and orderly, multi-channel cameras and MLU220 edge accelerator cards are deployed at intersections and on key roads in a city. The MLU220 decodes the input from multiple cameras. Using deep learning, it detects, tracks and extracts structured information about pedestrians, motor vehicles and non-motor vehicles in the monitored sections. This in turn enables intelligent traffic flow statistics, capture of violations with evidence collection, and identification and capture of key persons and vehicles, greatly improving the efficiency of traffic departments.

Smart factory

To build a smart factory for modern intelligent manufacturing, multi-channel cameras and MLU220 edge accelerator cards are deployed at factory workstations. With its independent codec unit, the MLU220 decodes the images from multiple cameras. Backed by its computing power, it performs worker detection and identification, posture recognition, and workpiece detection and identification. The factory can then be managed intelligently by checking whether workers are at their posts, whether they operate in compliance, and whether workpieces are placed as required.

Smart animal husbandry

To support livestock production management, safety and health monitoring, and intelligent monitoring of the breeding environment, multi-channel cameras and MLU220 edge accelerator cards are deployed at the breeding base. Using deep learning, the system performs identification, detection, instance segmentation and tracking of pigs. This enables intelligent farming functions such as pig counting, health detection, feeding statistics and slaughter assistance, reducing labor costs and improving breeding efficiency.

How do I use the Cambricon MLU220 on SuperEdge?

The steps below show how to use the Cambricon edge intelligent accelerator card on SuperEdge:

Create a SuperEdge edge Kubernetes cluster with edgeadm

Download the edgeadm installation package

arch=amd64 version=v0.6.0 && rm -rf edgeadm-linux-* && wget https://superedge-1253687700.cos.ap-guangzhou.myqcloud.com/$version/$arch/edgeadm-linux-$arch-$version.tgz && tar -xzvf edgeadm-linux-* && cd edgeadm-linux-$arch-$version && ./edgeadm

Initialize the edge Kubernetes Master node

./edgeadm init --kubernetes-version=1.18.2 --image-repository superedge.tencentcloudcr.com/superedge --service-cidr=10.96.0.0/12 --pod-network-cidr=192.168.0.0/16 --install-pkg-path ./kube-linux-*.tar.gz --apiserver-cert-extra-sans=<Master Public IP> --apiserver-advertise-address=<Master Intranet IP> --enable-edge=true

Add an edge node fitted with a Cambricon edge intelligent accelerator card

./edgeadm join <Master Public/Intranet IP Or Domain>:Port --token xxxx --discovery-token-ca-cert-hash sha256:xxxxxxxxxx --install-pkg-path <edgeadm kube-* install package address path> --enable-edge=true

edgeadm can be used to install both edge K8s clusters and native K8s clusters.

Install the plug-in for the Cambricon edge intelligent accelerator card

Install the edge intelligent accelerator card plug-in

kubectl create -f https://raw.githubusercontent.com/Cambricon/cambricon-k8s-device-plugin/master/device-plugin/examples/cambricon-device-plugin-daemonset.yaml

Check whether the plug-in is installed successfully
```
kubectl get node <NodeName> --output="jsonpath={.status.allocatable}"
```
If the edge node's status.allocatable contains the cambricon.com/mlu resource with a resource value, the edge intelligent accelerator card and its plug-in have been installed successfully.
```
"allocatable": {
    "cambricon.com/mlu": "1",   ## MLU card resources
    "cpu": "12",
    "memory": "16164684Ki",
    "pods": "110"
}
```
If cambricon.com/mlu is found in allocatable and its resource value is greater than 0, the Cambricon edge intelligent accelerator card and its plug-in have been installed successfully.
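
This check can also be scripted. The Python sketch below assumes the allocatable map has already been obtained as a JSON string (the sample mirrors the example values shown earlier); the helper name `mlu_count` is hypothetical. It treats a missing key or a value of 0 as "no usable card advertised", which is an interpretation rather than documented plug-in behavior.

```python
import json


def mlu_count(allocatable_json: str, resource: str = "cambricon.com/mlu") -> int:
    """Return the number of allocatable MLU cards, or 0 if the resource is absent."""
    allocatable = json.loads(allocatable_json)
    # Kubernetes reports resource quantities as strings, e.g. "1".
    return int(allocatable.get(resource, "0"))


# Sample values taken from the example output in this article.
sample = (
    '{"cambricon.com/mlu": "1", "cpu": "12", '
    '"memory": "16164684Ki", "pods": "110"}'
)

count = mlu_count(sample)
if count > 0:
    print(f"MLU plug-in ready: {count} card(s) allocatable")
else:
    print("no allocatable MLU cards found")
```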

The MLU device plug-in can be downloaded from github.com/Cambricon/c…

MLU monitoring component: github.com/Cambricon/m…

Use the edge intelligent accelerator card to accelerate edge applications

When submitting an edge workload, specify cambricon.com/mlu in its resources to request a Cambricon edge intelligent accelerator card for acceleration, for example:

apiVersion: v1
kind: Pod
...
spec:
  containers:
  - image: 10.133.0.52:5000/yolov4:latest
    name: yolov4-ctr
    resources:
      limits:
        cambricon.com/mlu: 1   ## Specifies acceleration card limits
      requests:
        cambricon.com/mlu: 1   ## Specifies acceleration card requests
...
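
The same Pod can be generated programmatically. The Python sketch below builds an equivalent manifest as a dictionary and checks that the MLU request equals the limit, since Kubernetes requires request and limit to match for extended resources when both are set. The helper names, the Pod name and the image address `registry.example.com/yolov4:latest` are placeholders for illustration, not values from the original setup.

```python
import json

MLU = "cambricon.com/mlu"


def mlu_pod(name: str, image: str, cards: int = 1) -> dict:
    """Build a Pod manifest whose single container asks for `cards` MLU cards."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": f"{name}-ctr",
                "image": image,
                "resources": {
                    "limits": {MLU: cards},
                    "requests": {MLU: cards},
                },
            }]
        },
    }


def check_mlu_resources(pod: dict) -> None:
    """Raise ValueError if a container's MLU request and limit disagree."""
    for ctr in pod["spec"]["containers"]:
        res = ctr.get("resources", {})
        limit = res.get("limits", {}).get(MLU)
        request = res.get("requests", {}).get(MLU)
        if limit is not None and request is not None and request != limit:
            raise ValueError(
                f"{ctr['name']}: {MLU} request {request} != limit {limit}"
            )


# Placeholder name and image; substitute your own registry address.
pod = mlu_pod("yolov4", "registry.example.com/yolov4:latest")
check_mlu_resources(pod)
manifest_json = json.dumps(pod, indent=2)
print(manifest_json)
```

Because kubectl accepts JSON manifests as well as YAML, the printed output could be saved to a file and submitted with kubectl create -f.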

Looking ahead

In the future, Cambricon and Tencent Cloud will cooperate further on edge hardware and edge cloud services, enabling edge AI, edge IoT, digitalization and artificial intelligence across both software and hardware, and offering these capabilities to users in their commercial products. You are welcome to follow Tencent Cloud's edge computing cloud platform TKE Edge and Cambricon's commercial acceleration products, and to try more accelerated products at the edge.

About us

For more cloud native case studies and knowledge, follow the WeChat official account of the same name, [Tencent Cloud Native].

Benefits:

① Reply [Manual] to the official account to get the “Tencent Cloud Native Roadmap Manual” and “Tencent Cloud Native Best Practices”.

② Reply [series] to the official account to get the “15 series, 100+ highly practical cloud native articles” collection, covering Kubernetes cost reduction and efficiency, K8s performance optimization practices, best practices and other series.