AI scientist Jia Yangqing commented: “Compared with general frameworks such as TensorFlow and Caffe2, which cover both training and inference, MNN focuses on acceleration and optimization during inference, solving the efficiency problem of model deployment so that the business behind a model can run more efficiently on mobile. This is in line with the thinking behind server-side inference engines such as TensorRT. In large-scale machine learning applications, once models are deployed at scale, the inference side often involves more than ten times the computation of the training side, so optimizing the inference side is especially important.”


How is the technical framework behind MNN designed? What are the team’s future plans? Let’s take a closer look.

1. What is MNN?



MNN is a lightweight deep learning inference engine for on-device deployment. Its core task is to run deep neural network models efficiently on the device, covering model optimization, conversion, and inference. MNN is currently used in more than 20 apps such as Taobao, Tmall, Youku, Juhuasuan, UC, Fliggy, and Qianniu, covering scenarios such as live streaming, short video, search recommendation, product image search, interactive marketing, coupon distribution, and security risk control, and it runs stably more than 100 million times a day. It is also applied in IoT devices such as Cainiao self-service lockers. During the 2018 Double 11 shopping festival, MNN powered scenarios such as the Smile Red Envelope at the Tmall Gala, the Scan feature, and the celebrity rock-paper-scissors game.

The project is now open source on GitHub. Follow the Alibaba Tech official account and reply “MNN” in the chat to get the GitHub download link and learn more.

2. Advantages of MNN

MNN is responsible for loading the network model and performing inference to return the results. The whole inference process consists of loading and parsing the model, scheduling the computation graph, and running it efficiently on heterogeneous backends. MNN is characterized by versatility, light weight, high performance, and ease of use:

Versatility:

  • Supports TensorFlow, Caffe, ONNX, and other mainstream model formats; supports CNN, RNN, GAN, and other common network architectures;
  • Supports 86 TensorFlow ops and 34 Caffe ops; MNN ops supported per compute device: 71 on CPU, 55 on Metal, 40 on OpenCL, 35 on Vulkan;
  • Supports iOS 8.0+, Android 4.3+, and embedded devices with POSIX interfaces;
  • Supports hybrid computing across heterogeneous devices, currently CPU and GPU; GPU op plug-ins can be loaded dynamically to replace the CPU op implementations.

Lightweight:

  • Deeply customized and trimmed for the characteristics of edge devices, with no dependencies; can easily be deployed to mobile devices and all kinds of embedded devices;
  • On iOS, the armv7+arm64 static library is about 5 MB, it adds about 620 KB to the linked executable, and the metallib file is about 600 KB;
  • On Android, the core .so is about 400 KB, the OpenCL library about 400 KB, and the Vulkan library about 400 KB;

High performance:

  • Relies on no third-party compute library; core computation is implemented with extensive hand-written assembly to make full use of ARM CPU compute power;
  • GPU acceleration via Metal can be enabled on iOS devices (iOS 8.0 and above), and on common models it is faster than Apple’s native Core ML;
  • On Android, OpenCL, Vulkan, and OpenGL backends are provided to cover as many devices as possible, with deep tuning for mainstream GPUs (Adreno and Mali);
  • The convolution and deconvolution algorithms are efficient and stable and run efficiently for convolutions of any shape; the Winograd algorithm is widely used to speed up symmetric convolutions from 3×3 up to 7×7 (see the note after this list);
  • Additional optimizations target the new ARMv8.2 architecture, so newer devices can exploit half-precision (FP16) computation for a further speedup;
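For background on why Winograd helps (a standard result for fast convolution, not a claim about MNN’s exact implementation): the 2D Winograd transform F(2×2, 3×3) computes a 2×2 output tile of a 3×3 convolution with far fewer multiplications than the direct method.

```latex
% Direct computation of a 2x2 output tile with a 3x3 kernel needs
% 2*2*3*3 = 36 multiplications; the Winograd transform F(2x2, 3x3)
% needs only (2+3-1)^2 = 16, a 2.25x reduction in multiplications.
\[
  \mu\big(F(2\times 2,\ 3\times 3)\big) = (2+3-1)^2 = 16,
  \qquad
  \frac{2 \cdot 2 \cdot 3 \cdot 3}{16} = \frac{36}{16} = 2.25
\]
```

The input and output transforms cost additions and constant multiplications of their own, so the practical speedup is smaller than the multiplication count alone suggests, but the saving grows for larger kernels.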

Ease of use:

  • Complete documentation and examples;
  • An efficient image-processing module covers common warping, format conversion, and similar needs, so in general there is no need to pull in libyuv or OpenCV for image processing;
  • Supports a callback mechanism, making it convenient to extract intermediate data or steer the execution of the network;
  • Supports running only part of the paths in a model, or running different parts in parallel on the CPU and GPU (see the sketch after this list);
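As a minimal sketch of the callback mechanism (based on the public C++ session API in MNN’s open-source release; the model file name here is a placeholder), the two callbacks receive each op’s tensors and name, which makes it easy to inspect intermediate results as the graph executes:

```cpp
#include <MNN/Interpreter.hpp>

#include <iostream>
#include <memory>
#include <string>
#include <vector>

int main() {
    using namespace MNN;
    // "model.mnn" is a placeholder path for an already-converted model.
    std::shared_ptr<Interpreter> net(Interpreter::createFromFile("model.mnn"));
    ScheduleConfig config;
    config.type = MNN_FORWARD_CPU;
    Session* session = net->createSession(config);

    // Called before and after each op; returning true continues execution.
    TensorCallBack before = [](const std::vector<Tensor*>& tensors,
                               const std::string& opName) {
        return true;
    };
    TensorCallBack after = [](const std::vector<Tensor*>& tensors,
                              const std::string& opName) {
        std::cout << "finished op: " << opName << std::endl;
        return true;
    };
    net->runSessionWithCallBack(session, before, after);
    return 0;
}
```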

3. MNN core introduction

3.1 Module Design



As shown in the figure above, MNN can be divided into two parts: the Converter and the Interpreter.

The Converter consists of Frontends and Graph Optimize. The former is responsible for supporting different training frameworks; MNN currently supports TensorFlow (Lite), Caffe, and ONNX. The latter optimizes the graph through operator fusion, operator substitution, and layout adjustment.
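As a concrete illustration of operator fusion (a standard graph optimization, shown generically here rather than as MNN’s exact rewrite rule): at inference time the statistics of a batch-normalization layer are constants, so a convolution followed by batch normalization can be folded into a single convolution with adjusted weights and bias.

```latex
% Conv: y = w * x + b
% BN:   z = gamma * (y - mu) / sqrt(sigma^2 + eps) + beta
% Folding BN into the convolution gives a single equivalent op:
\[
  w' = \frac{\gamma\, w}{\sqrt{\sigma^{2} + \varepsilon}},
  \qquad
  b' = \frac{\gamma\,(b - \mu)}{\sqrt{\sigma^{2} + \varepsilon}} + \beta,
  \qquad
  z = w' * x + b'
\]
```

This removes an op from the graph and one pass over memory, which matters on bandwidth-limited mobile hardware.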

The Interpreter consists of the Engine and the Backends. The former is responsible for loading the model and scheduling the computation graph; the latter contains the memory allocation and op implementations for each compute device. In the Engine and Backends, MNN applies a variety of optimizations, including the Winograd algorithm for convolution and deconvolution, the Strassen algorithm for matrix multiplication, low-precision computation, Neon optimization, hand-written assembly, multithreading, memory reuse, and heterogeneous computing.
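To make the Engine/Backend split concrete, here is a minimal inference sketch against the same public C++ session API (the model path and preprocessing are placeholders, and error handling is omitted):

```cpp
#include <MNN/Interpreter.hpp>

#include <memory>

int main() {
    using namespace MNN;
    // Engine: load and parse the model, then build a session, which
    // schedules the computation graph onto the chosen backend.
    std::shared_ptr<Interpreter> net(Interpreter::createFromFile("model.mnn"));
    ScheduleConfig config;
    config.type      = MNN_FORWARD_CPU;  // select the CPU backend
    config.numThread = 4;                // multithreaded execution
    Session* session = net->createSession(config);

    // Backend: tensors live in backend-managed memory; passing nullptr
    // selects the model's default input and output tensors.
    Tensor* input = net->getSessionInput(session, nullptr);
    // ... fill input->host<float>() with preprocessed data here ...

    net->runSession(session);

    Tensor* output = net->getSessionOutput(session, nullptr);
    // ... read the results from output->host<float>() ...
    return 0;
}
```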

3.2 Performance Comparison

We compared MNN with mainstream open-source frameworks on MobileNet, SqueezeNet, and models commonly used in our business; the results are shown below:



MNN leads NCNN, MACE, TensorFlow Lite, and Caffe2 by more than 20%. In practice, we focus even more on optimizing the models used internally in our business: with deep optimization for face detection and similar models, an iPhone 6 can complete single-frame detection in about 5 ms.

Note: MACE, TensorFlow Lite, and Caffe2 were built from the master branches of their GitHub repositories as of March 1, 2019; due to compilation problems, NCNN used the 20181228 release precompiled library.

4. Open source history of MNN

4.1 Why on-device inference?

With the continuous improvement of mobile compute power and the rapid development of deep learning, especially the growing maturity of small network models, inference that was originally performed in the cloud can now be moved to the device. Compared with server-side intelligence, on-device intelligence offers advantages such as low latency, data privacy, and savings in cloud resources. On-device intelligence is gradually becoming a trend; from an industry perspective, it has already proved its value in scenarios such as AI cameras and visual effects.

As an e-commerce super app, Mobile Taobao has rich business forms. Business scenarios such as Pailitao, live streaming and short video, interactive marketing, virtual makeup try-on, and personalized search and recommendation all call for on-device intelligence. Combined with on-device capabilities, they can bring users new interactive experiences and help the business innovate and break new ground.

Generally speaking, the application of on-device deep learning can be divided into the following stages:



  • Model training: train the model on annotated data to produce a model file. When designing a model for on-device use, model size and computation cost need to be considered;
  • Model compression: optimize mainly the model size, shrinking it through pruning, quantization, and other means for use on the device;
  • Model deployment: deploy the model, covering model management and distribution as well as operations and monitoring;
  • On-device inference: load the model and complete all inference-related computation.

As the above shows, the on-device inference engine is the core module of any on-device intelligent application: it must use resources efficiently and complete inference quickly within the limits of constrained compute power and memory. It is fair to say that the performance of the inference engine directly determines whether an algorithm model can run on the device at all, and therefore whether the business can launch. So we needed an on-device inference engine, and a good one.

4.2 Why open source MNN?

At the beginning of 2017, before starting engine development, we surveyed both system solutions and open-source solutions from the perspectives of versatility, light weight, high performance, and security. Core ML is Apple’s system framework; ML Kit and NNAPI are the system frameworks on Android. The biggest advantage of a system framework is its light weight: the impact on package size is relatively small. The biggest disadvantage is versatility: Core ML requires iOS 11+, and ML Kit and NNAPI require Android 8.1+, so the devices they cover are very limited, and they can hardly support embedded use cases. In addition, system frameworks support fewer network and op types, extend poorly, fail to fully exploit the device’s compute power, and raise model security concerns. In summary, a system framework was not a good choice. Among open-source solutions, TensorFlow Lite had not yet been announced, Caffe was mature but not designed and developed for on-device scenarios, and NCNN had just been released and was not yet mature. In short, we could not find a simple, efficient, and secure on-device inference engine that covered different training frameworks and deployment environments.

Therefore, we set out to build MNN, a simple, efficient, and secure on-device inference engine for different business algorithm scenarios, different training frameworks, and different deployment environments. It smooths over the differences between Android and iOS, between fragmented devices, and between training frameworks to enable fast on-device deployment, and it allows ops to be added flexibly and the performance of heterogeneous devices such as CPUs and GPUs to be deeply optimized according to the business model.

Over time, NCNN, TensorFlow Lite, MACE, Anakin, and others have been upgraded and open sourced, giving us good input and references. We have also kept iterating and optimizing along with business needs, and having passed the test of Double 11, MNN is now relatively mature and complete, so we are open sourcing it to the community, hoping to contribute our part to mobile-app and IoT developers.

5. Application scenarios

At present, MNN has been integrated into more than 20 Group apps, including Mobile Taobao, the Tmall app, Youku, Juhuasuan, UC, Fliggy, and Qianniu. It is used in scenarios such as Pailitao, live streaming and short video, interactive marketing, real-person authentication, virtual makeup try-on, and search and recommendation, running stably more than 100 million times a day. During the 2018 Double 11 shopping festival, MNN was also used in scenarios such as the Tmall Gala Smile Red Envelope and the scan-to-guess-the-celebrity game.



Pailitao is the image search and recognition product in Mobile Taobao. Since its first launch in 2014, it has grown through continuous iteration into an application with more than 10 million unique visitors. Its technology has kept evolving: from the earliest flow of taking a photo and uploading the picture for cloud-side recognition, to today’s on-device object detection and cropping before uploading, which improves the user experience while cutting server-side compute costs. For some simple object classification and logo recognition, it now supports real-time recognition directly on the device.

The Smile Red Envelope game opened the 2018 Tmall Double 11 Gala. Built on real-time face detection and facial recognition, the activity used a camera-based, real-time on-device face detection algorithm to move the game from traditional touch-screen interaction to natural interaction, bringing users a new experience.

Collecting the Five Blessings was the 2019 Spring Festival activity, and the first time Mobile Taobao joined it, with a shopping twist: by scanning and recognizing products, users could identify “red” New Year goods and, in addition to Fu cards, win physical prizes such as down quilts, Wuliangye, Moutai, and king crab, plus no-minimum coupons for Tmall Supermarket and Tmall Genie, turning the family’s New Year shopping into a golden-egg-laying “hen”.

6. Roadmap

We plan to publish a stable release every two months. The current plan is as follows:

Model optimization:

  1. Improve Converter graph optimization
  2. Improve quantization support and add sparsity support

Scheduling optimization:

  1. Add model FLOPS statistics
  2. Dynamically schedule execution strategies based on device hardware characteristics

Computational optimization:

  1. Continuously optimize existing backends (CPU/OpenGL/OpenCL/Vulkan/Metal)
  2. Optimize the ARMv8.2 backend and support quantized models
  3. Add an NPU backend via NNAPI
  4. Apply fast matrix multiplication and the Winograd algorithm to optimize performance

Other:

  1. Improve documentation and examples
  2. Improve testing and benchmark tools
  3. Support more ops



This article is original content from the Alibaba Cloud Yunqi Community and may not be reproduced without permission.