Preface:

Edge AI is a very exciting area right now, with a lot of development and innovation still to come. For years there has been a clear trend of moving machine learning inference down to embedded hardware that sits closer to the user, does not require a network connection, and can solve complex problems in real time (such as autonomous driving). New frameworks and inference engines produce models with a much smaller footprint, designed specifically to run on edge devices. In addition, the very important issues of user privacy and security are much easier to address when personal data never leaves the edge device: the complex algorithms that analyze the inference results run on the device itself, and only the final, condensed information is sent to the cloud (for example, an alert that something unusual has happened).

This article builds on our work at Darwin Edge, where we focus on Edge AI applications for healthcare and manufacturing. To demonstrate board performance we used the well-known open-source NCNN framework, a high-performance neural network inference engine optimized for mobile and embedded platforms (github.com/Tencent/ncn…). It is configured and built with CMake, using a sysroot and cross-compilation toolchain when targeting another architecture, and for simple board-to-board comparisons we ran NCNN's default benchmark (benchncnn). The NCNN benchmark, like most of its dependencies, is built statically, so it is generally not difficult to get it running on a standard embedded Linux system.
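
As an illustration of what such a benchmark run boils down to, here is a minimal sketch in the spirit of benchncnn, written against the public ncnn C++ API. The model files ("squeezenet.param"/"squeezenet.bin"), the 227x227 input size, and the blob names "data"/"prob" are placeholders borrowed from ncnn's examples; substitute whatever model you actually want to time.

```cpp
// Minimal benchncnn-style timing loop (sketch only, not the official tool).
#include <chrono>
#include <cstdio>

#include "net.h" // ncnn

int main()
{
    ncnn::Net net;
    net.opt.num_threads = 4; // benchncnn sweeps several thread counts

    // Placeholder model files -- replace with the network you want to benchmark.
    if (net.load_param("squeezenet.param") || net.load_model("squeezenet.bin"))
    {
        fprintf(stderr, "failed to load model\n");
        return -1;
    }

    ncnn::Mat in(227, 227, 3); // input resolution expected by the model
    in.fill(0.5f);             // dummy data is enough for timing

    const int loops = 20;
    double total_ms = 0.0;
    for (int i = 0; i < loops; i++)
    {
        const auto t0 = std::chrono::high_resolution_clock::now();

        ncnn::Extractor ex = net.create_extractor(); // fresh extractor per run
        ex.input("data", in);
        ncnn::Mat out;
        ex.extract("prob", out);

        const auto t1 = std::chrono::high_resolution_clock::now();
        total_ms += std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
    printf("avg = %.2f ms over %d runs\n", total_ms / loops, loops);
    return 0;
}
```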

In our development we often use the Bonseyes developer platform (www.bonseyes.com/), which can use the same work… The platform comes with tools and Docker images that provide cross-compilation environments for many embedded platforms. It supports building platform images, setting up target boards, and cross-compiling custom applications, all in a platform-independent way by building the applications inside containers.


Raspberry Pi 4 Model B

The well-known platform that makes embedded hobby applications accessible and popular can also be used for moderately complex machine learning applications. At an affordable price of around $50, it’s a great tool for enthusiasts of edge computing.

Raspberry Pi 4

There are two Linux-based OS distributions available: the official Raspberry Pi OS (formerly Raspbian) and the Ubuntu port for the Raspberry Pi.

Raspbian is more user-friendly (and easier for beginners) when it comes to configuring and managing the target board, but Ubuntu has broader support for third-party applications and libraries thanks to access to the standard Ubuntu ARM package repositories, which can make porting and cross-compiling AI applications easier. Both distributions are available in 32-bit and 64-bit versions.

In our experience, 64-bit systems run machine learning application benchmarks about 50% faster than 32-bit systems.

Raspberry Pi 4 Hardware specifications

An open-source project developing Vulkan drivers for the Raspberry Pi 4 GPU has been officially released, having passed more than 100,000 tests in the Khronos Vulkan 1.0 conformance test suite. Unfortunately, Vulkan driver development has focused on the 32-bit platform, with very little support for 64-bit operating systems.

In general, Ubuntu (or another well-supported Linux distribution) makes a good cross-compilation host for the Raspberry Pi 4. The GNU cross-compilation toolchain (gcc-aarch64-linux-gnu and g++-aarch64-linux-gnu) can easily be installed from the official repositories.

A good way to implement a cross-compilation environment is to create a Docker image with all of the host development tools installed and to use Ubuntu's multi-platform repository support to create the target system root, letting the Ubuntu APT tools resolve and download all of the target board's library dependencies needed for this sysroot to be fully functional (more on this topic in a future article).
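
A quick way to sanity-check such an environment is to cross-compile a trivial C++ program on the host with the toolchain packages mentioned above and copy it to the board. The sysroot path in the comment below is a placeholder for wherever the target root was assembled (for a plain hello-world the flag is not strictly needed).

```cpp
// hello_pi.cpp -- verifies that the aarch64 cross-toolchain and sysroot work.
//
// Cross-compile on the Ubuntu host (paths are placeholders):
//   aarch64-linux-gnu-g++ --sysroot=/path/to/pi-sysroot -O2 -o hello_pi hello_pi.cpp
// then copy the binary to the Raspberry Pi (e.g. with scp) and run it there.
#include <cstdio>

int main()
{
    printf("Hello from an aarch64 cross-compiled binary\n");
    return 0;
}
```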

The following table shows the NCNN benchmark results for Raspberry Pi 4B:

Jetson AGX Xavier

Aimed primarily at edge machine learning applications, NVIDIA’s flagship product is based on the NVIDIA® Jetson AGX Xavier module. It is a powerful platform with a rich ecosystem of NVIDIA tools and libraries for developing AI applications.

The easy-to-use NVIDIA SDK Manager, which runs on a PC workstation, supports downloading and installing the latest available operating system (currently Ubuntu 18.04) as well as additional libraries and drivers for the on-board hardware. Many of the pre-installed libraries come with extra support for the NVIDIA hardware (such as GStreamer hardware encoding/decoding).

Jetson AGX Xavier Hardware specifications

The Volta GPU with 512 CUDA cores provides decent computing and processing power, and supports CUDA compute capability 7.2. By comparison, the well-known NVIDIA GTX 1060 PC GPU supports compute capability 6.1 and has 1280 CUDA cores.
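
As a small illustration (not part of the NCNN benchmark), the compute capability and SM count of the on-board GPU can be queried with the standard CUDA runtime API; the program below, compiled with nvcc, simply reads the properties of device 0 and prints them.

```cpp
// Query the on-board GPU with the CUDA runtime API (compile with nvcc).
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
    {
        fprintf(stderr, "no CUDA device found\n");
        return -1;
    }
    printf("device      : %s\n", prop.name);
    printf("compute cap : %d.%d\n", prop.major, prop.minor);
    printf("SM count    : %d\n", prop.multiProcessorCount);
    return 0;
}
```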

In addition, because CUDA was the first general-purpose GPU programming platform and has dominated machine learning over the past decade, the board has the unique advantage that existing AI frameworks and libraries can be used on it directly or with simple cross-compilation. The Ubuntu system running on the target makes it easy to install, via the apt tool, many of the packages available in the Ubuntu ARM repositories.

In general, because of its power, ease of setup, and package availability, the target board itself can be used to compile custom applications natively, without setting up a cross-compilation environment on a PC workstation. The process typically involves cloning the GitHub repository directly on the board, running cmake to check for missing dependencies, installing those dependencies with apt-get and re-running the configuration, and then running make to build the application. Compiling on the AGX Xavier is an order of magnitude slower than on a high-end modern PC workstation, but for mid-sized projects it is an easy way to test and try new things.

The AGX Xavier also supports TensorRT, NVIDIA's engine for running optimized AI inference. It compiles standard model formats (such as ONNX) into highly optimized code that can be executed on the GPU or on the deep learning accelerator. More information about TensorRT can be found on the NVIDIA website.
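
To give a feel for the workflow, the sketch below shows roughly what compiling an ONNX model into an engine looks like with the TensorRT C++ API; it assumes the TensorRT 7.x/8.x headers shipped with JetPack, and "model.onnx" is a placeholder. In practice, the bundled trtexec command-line tool performs the same conversion without writing any code.

```cpp
// Build a TensorRT engine from an ONNX model (sketch, TensorRT 7.x/8.x API).
#include <cstdio>

#include "NvInfer.h"
#include "NvOnnxParser.h"

// TensorRT requires the caller to supply a logger implementation.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            printf("[TRT] %s\n", msg);
    }
};

int main()
{
    Logger logger;
    auto builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto parser = nvonnxparser::createParser(*network, logger);

    // "model.onnx" is a placeholder for your exported model.
    if (!parser->parseFromFile("model.onnx",
            static_cast<int>(nvinfer1::ILogger::Severity::kWARNING)))
    {
        fprintf(stderr, "failed to parse ONNX model\n");
        return -1;
    }

    auto config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1ULL << 28);        // 256 MB of scratch space
    config->setFlag(nvinfer1::BuilderFlag::kFP16);  // use FP16 where supported

    // buildEngineWithConfig is deprecated in newer TensorRT releases in favour
    // of buildSerializedNetwork, but is still available on JetPack's TensorRT.
    nvinfer1::ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    if (!engine)
    {
        fprintf(stderr, "engine build failed\n");
        return -1;
    }
    printf("engine built with %d bindings\n", engine->getNbBindings());
    return 0;
}
```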

The AGX Xavier Developer Kit costs about $850, which almost puts it beyond hobby territory for Edge AI. Its cheaper ($100) cousin, the NVIDIA Jetson Nano, lacks such a capable GPU (it is based on the Maxwell architecture, with 128 CUDA cores and compute capability 5.3) and the TensorRT-capable deep learning accelerators, but the development environment, tools, and OS are the same. A good strategy is therefore to start simple with a Nano board and switch to the AGX Xavier platform when the complexity of the machine learning application starts to hurt performance.

The following table shows the NCNN benchmark results for the AGX Xavier Developer Kit:

NXP i.MX 8 Multisensory Enablement Kit (MEK)

NXP has a range of i.MX 8 application processors for advanced graphics, imaging, machine vision, and safety-critical applications, as well as a range of demonstration/target platforms with a variety of peripherals based on these CPUs. The NXP i.MX 8 Multisensory Enablement Kit (MEK) is a convenient platform for machine learning applications.

It is very powerful and comes with many built-in hardware features and expansion options:

Priced at over $1,000, it is intended to be used as a professional platform.

There is extensive documentation for the board on the NXP website, as well as ready-made Linux images that can easily be burned to an SD card on a PC using tools such as Etcher. The operating system used on the board is an NXP customization based on the default Poky distribution of Yocto Linux. This customization adds many of the libraries common in Edge AI applications (OpenCV, GStreamer, Qt, working Vulkan drivers).

Unboxing and booting the MEK board from the pre-burned SD card is easy, but setting up a workstation PC as a cross-compilation environment requires more work: the system root and SDK have to be built with Yocto, and custom applications are then cross-compiled to run on the board.

Yocto is a very complex build system. It offers great flexibility for customizing board images and virtually every aspect of the system libraries, but it has a steep learning curve that takes a lot of time to understand and practice. Customizing the default NXP i.MX 8 build (for example, when some additional open-source library is missing) is tedious: Yocto downloads the source code and builds everything from scratch (including a GNU cross-compilation toolchain tailored to the build environment), and this initial build can take a full day on a high-end modern PC.

NCNN benchmarks performed on the platform gave the following results:

Conclusion

For those who want to jump into the exciting Edge AI space, there are a number of easy-to-use target platforms to choose from, and we have compared several of them. For hobby/demo purposes there are the cheap but machine-learning-capable Raspberry Pi 4 and NVIDIA Jetson Nano. The NVIDIA AGX Xavier is NVIDIA's flagship product, with convenient tools, an Ubuntu operating system, a good GPU, and CUDA support. The NXP i.MX 8 is a more complex platform to handle, targeting specialized industrial and business applications.

Author: Atanasievski

CV Technical Guide

Original link: Medium.com/darwin-edge…

This article comes from the technical summary series of the CV Technical Guide public account.

