Editor’s note: This article is from the WeChat official account InfoQ (ID: InfoQChina), written by Li Yonghui. Distributed by 36Kr under license.

Foreword

Deep learning is already shaping many aspects of the Internet, and discussions of deep learning and neural networks appear in technology news every day. With the technology’s rapid development over the past two years, Internet products of all kinds are racing to adopt it, and its adoption will further affect people’s lives. Given the ubiquity of mobile devices, applying deep learning and neural network technology in mobile Internet products has become an inevitable trend.

Image technology, closely tied to deep learning, is also widely used in industry. The combination of traditional computer vision and deep learning has driven the rapid progress of image technology.

GitHub address: https://github.com/baidu/mobile-deep-learning

Application of deep learning technology in mobile terminal

Baidu Application Case

The convolutional neural network (CNN) is the typical deep learning technique applied on mobile devices. mobile-deep-learning (MDL) is a mobile framework built around convolutional neural networks.

What are MDL’s main application scenarios on mobile? The most common are classification (determining what objects appear in a picture) and subject recognition (determining where an object is in a picture and how large it is).

The following app, Shixiang, can be found in Android app stores. It automatically classifies photos for users, a feature that appeals to users with large photo libraries.

In addition, image search can be opened from the right side of the Baidu search box on mobile. The interface after image search is opened is shown below. When the user turns on the automatic-shooting switch under the general vertical category (marked at the bottom of the picture), the app automatically finds and boxes the object once the hand is held steady, then launches an image search directly, without the user taking a photo. The whole process is smooth and requires no manual shot. The box in the picture applies typical deep learning subject-recognition technology, built on the mobile-deep-learning (MDL) framework. MDL has been running stably across several versions of Mobile Baidu, and its reliability has improved greatly over several iterations.

Other Cases in the industry

There are more and more cases of the Internet industry using neural networks on mobile.

At present, there are two main approaches. The first runs the neural network entirely on the client. Its advantage is obvious: no network round-trip is needed, and if speed can be guaranteed, the user experience is very smooth. If the mobile device can run the neural network efficiently, users never notice a loading step. Examples of apps that run neural networks on the device, entirely independent of the network, include the aforementioned Shixiang and image search in Mobile Baidu.

The second approach runs the neural network over the network, with the client responsible only for UI presentation. Before neural networks landed on the client, most apps worked this way: compute on the server, display on the client. The advantage of this approach is that it is relatively easy to implement and cheaper to develop.

To better understand the two approaches described above, here are two plant-and-flower-recognition examples, using the Flower Recognition and Shape & Color apps respectively. Both use a typical classification method and can be found in the iOS App Store. The picture below is a lotus, and both apps classify it well. You can try installing both and see which of the two approaches each one uses.

Flower Recognition

Many flower-recognition apps have emerged in the past year. Microsoft’s “Flower Recognition” is an app launched by Microsoft Research Asia for identifying flowers. The user selects the flower after taking a photo, and the app returns information about it. Accurate flower classification is its advertised highlight.

Shape & Color

The “Shape & Color” app only needs a photo of a plant (flower, grass, or tree) to quickly give its name. It also offers plenty of interesting plant knowledge: the plant’s other nicknames, its flower language, related classical poems, plant culture, stories, and care tips. Users can learn a lot from it.

Difficulties in applying deep learning on mobile

Due to the technical threshold and hardware constraints, there are still few successful cases of deep learning on mobile, and traditional mobile UI engineers have little exposure to mobile deep learning material when writing neural network code. On the other hand, Internet competition is fierce: as the saying goes, the first to enter Xianyang becomes king. Whoever applies deep learning technology on mobile first can seize the opportunity earlier.

The computing power of mobile devices is tiny compared with PCs. A mobile CPU must keep power consumption very low, which limits performance. Running neural network computation in an app drives CPU usage up sharply, so balancing the power-consumption and performance targets is critical.

Baidu’s image search client team began tackling the application of deep learning on mobile at the end of 2015. The challenging problems were solved one by one, and the code now runs in many apps, from products with over 100 million page views per day to products still in the startup stage.

Applying deep learning on mobile is difficult in itself, but it also has to meet the demands of a wide range of device models and hardware, as well as Mobile Baidu’s performance targets. Making neural networks run stably and efficiently is the biggest test, and breaking that problem down was the team’s first task. We found that the problems and difficulties stand out most clearly when mobile is compared with the server side, so we made the following comparison of deep learning on the server versus the client.

Difficulties compared with the server side:

  • Memory: weakly limited on the server side vs. limited on mobile

  • Power consumption: unrestricted on the server side vs. strictly limited on mobile

  • Dependent library size: unrestricted on the server side vs. strongly limited on mobile

  • Model size: around 200 MB for a conventional server model vs. about 10 MB on mobile

  • Computing power: powerful GPU boxes on the server vs. mobile CPU and GPU

During development, the team gradually solved these difficulties and arrived at the current MDL deep learning framework. To let more mobile engineers quickly reuse this work and focus on their business, Baidu has open-sourced all the relevant code, and anyone is welcome to join in its development.

MDL framework design

Design ideas

As a mobile deep learning framework, MDL fully considers the characteristics of mobile apps and their operating environment, and imposes strict requirements on speed, binary size, resource usage, and other aspects, because every one of these indicators significantly affects user experience.

At the same time, scalability, robustness, and compatibility were considered from the start of the design.

To ensure scalability, we abstracted a Layer interface so that framework users can implement custom layer types as their models require. We expect MDL to support more network models simply by adding new layer types, without changing code elsewhere.

To ensure robustness, MDL surfaces low-level C++ exceptions to the application layer via a reflection mechanism; the application layer catches and handles them, for example collecting exception information through logs so the software can keep being improved.

There are many deep learning training frameworks in the industry, and MDL does not support training. To ensure compatibility, we provide tool scripts that convert Caffe models into MDL models; users can complete conversion and quantization with a single command. In the future we will also support converting PaddlePaddle, TensorFlow, and other models into MDL, to be compatible with more model types.
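The Layer abstraction described above can be sketched as follows. This is an illustrative example only; the class and method names are hypothetical and do not reflect MDL’s actual API:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the Layer extension point described above.
// Each concrete layer implements its own forward pass; adding a new
// network operation means adding a subclass, nothing else changes.
class Layer {
public:
    explicit Layer(std::string name) : name_(std::move(name)) {}
    virtual ~Layer() = default;
    virtual void forward(const std::vector<float>& in,
                         std::vector<float>& out) = 0;
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// A user-defined layer type: ReLU activation.
class ReluLayer : public Layer {
public:
    ReluLayer() : Layer("relu") {}
    void forward(const std::vector<float>& in,
                 std::vector<float>& out) override {
        out.resize(in.size());
        for (size_t i = 0; i < in.size(); ++i)
            out[i] = in[i] > 0.0f ? in[i] : 0.0f;
    }
};
```

A network is then just an ordered collection of such layers, each invoked in turn during prediction.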

The overall architecture

The overall architecture of the MDL framework is shown below:

The MDL framework consists of the MDL Converter, Loader, Net, and Gemmers modules, plus JNI interfaces for Android. The model conversion module converts Caffe models into MDL models and supports quantizing 32-bit floating-point parameters to 8-bit parameters, greatly reducing model size. The model loading module handles inverse quantization, load verification, network registration, and so on. The network management module initializes and manages each Layer in the network. MDL also provides a JNI interface layer that the Android side can invoke; developers can complete loading and prediction simply by calling the JNI interface.
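The 32-bit-to-8-bit quantization mentioned above, and the inverse quantization performed at load time, can be illustrated with simple linear (min/max) quantization. This is a sketch of the general technique; the exact scheme MDL’s converter uses may differ:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative linear quantization: map each float weight onto the
// [0, 255] range spanned by the tensor's min and max values.
std::vector<uint8_t> quantize(const std::vector<float>& w,
                              float& min_v, float& max_v) {
    min_v = *std::min_element(w.begin(), w.end());
    max_v = *std::max_element(w.begin(), w.end());
    const float scale = (max_v - min_v) / 255.0f;
    std::vector<uint8_t> q(w.size());
    for (size_t i = 0; i < w.size(); ++i)
        q[i] = static_cast<uint8_t>(std::round((w[i] - min_v) / scale));
    return q;
}

// Inverse quantization, as performed by the loader before inference.
std::vector<float> dequantize(const std::vector<uint8_t>& q,
                              float min_v, float max_v) {
    const float scale = (max_v - min_v) / 255.0f;
    std::vector<float> w(q.size());
    for (size_t i = 0; i < q.size(); ++i)
        w[i] = min_v + q[i] * scale;
    return w;
}
```

Storing one byte per parameter instead of four is what shrinks a model to roughly a quarter of its float size, at the cost of a small, bounded rounding error per weight.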

MDL’s positioning: ease of use

The MDL open source project had a clear goal from the beginning. On mobile platforms, with their diverse devices and limited performance, finding suitable scenarios for novel deep learning technology and applying it to one’s own products is very attractive. But if every mobile engineer had to reimplement the whole neural network to apply deep learning, the cost would be high. MDL is designed so that, for basic use, neural networks can be deployed without much configuration or modification, and even without compiling the machine learning library; engineers need only focus on their specific business logic and how to use the framework.

At the same time, MDL’s simple, clear code structure can serve as learning material for engineers new to deep learning. Because we support cross-compilation for mobile platforms as well as x86 compilation on Linux and Mac, deep learning code can be compiled and run directly on a workstation while it is being tuned, without deploying to an ARM device. It takes only a few lines; see MDL’s GitHub README.

# https://github.com/baidu/mobile-deep-learning

# mac or linux:

./build.sh mac

 cd build/release/x86/build

./mdlTest

A complicated build process often takes longer than the development itself. With MDL, Android can build the .so and test it with a single line of ./build.sh:

./build.sh android

MDL performance and compatibility

  • Size: armv7 library around 300 KB

  • Speed: on the iOS GPU, MobileNet runs in as little as 40 ms, SqueezeNet in as little as 30 ms

MDL iterated for more than a year from inception to open source. The indicators mobile cares about, such as size, power consumption, and speed, all perform well. Baidu’s internal product lines ran many comparisons before adopting it. Compared with related open source projects, MDL guarantees speed and energy consumption while supporting a variety of deep learning models, such as MobileNet, GoogLeNet v1, and SqueezeNet; with the iOS GPU version, SqueezeNet can complete one run in as little as 30-40 ms.

Comparison with similar frameworks

Framework      | Caffe2        | TensorFlow    | NCNN          | MDL (CPU)     | MDL (GPU)
Hardware       | CPU           | CPU           | CPU           | CPU           | GPU
Speed          | slow          | slow          | fast          | fast          | extremely fast
Size           | large         | large         | small         | small         | small
Compatibility  | Android & iOS | Android & iOS | Android & iOS | Android & iOS | iOS

Compared with other mobile frameworks supporting CNNs, MDL offers high speed, stable performance, good compatibility, and complete demos.

Compatibility

MDL runs stably on both iOS and Android. On iOS 10 and above there is a GPU-based implementation with excellent performance, while the Android version runs purely on the CPU. Its stability across high- and low-end device models, and its coverage in Mobile Baidu and other apps, are clear advantages.

MDL also supports direct conversion of Caffe models to MDL models.

Overview of MDL features

When Baidu’s image search team began its mobile AI research, it compared most of the open-source CNN frameworks available, which exposed the problems in this area. Some frameworks post excellent experimental numbers but perform poorly or unstably in actual products, fail to cover all models, or cannot meet the size requirements for shipping. To avoid these problems, MDL adds the following features:

  • One-click deployment: switch between iOS and Android via a script parameter

  • Supports automatic conversion of Caffe models to MDL models

  • GPU support

  • MobileNet, GoogLeNet v1, and SqueezeNet models tested for stable operation

  • Small size, no third-party dependencies, hand-crafted throughout

  • Provides a quantization script that directly converts 32-bit float to 8-bit uint, quantizing model size to around 4 MB

  • Communicated with ARM’s algorithm team many times, online and offline, and continues to optimize for the ARM platform

  • NEON is used for convolution, normalization, pooling, and other operations

  • Loop unrolling: to improve performance and cut unnecessary CPU cost, repeated conditional checks are fully unrolled

  • Moves a large number of heavy computation tasks forward into the initialization (overhead) phase
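The loop-unrolling technique listed above can be illustrated with a simple dot product, the core of convolution and matrix multiplication. This is a generic sketch of the optimization, not MDL’s actual code:

```cpp
// Illustrative 4-way loop unrolling: four independent accumulators
// reduce loop-condition checks and let the CPU pipeline the
// multiply-adds instead of serializing on one running sum.
float dot_unrolled(const float* a, const float* b, int n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float s = s0 + s1 + s2 + s3;
    // Tail loop handles the leftover elements when n % 4 != 0.
    for (; i < n; ++i)
        s += a[i] * b[i];
    return s;
}
```

On ARM, the same four-at-a-time structure maps naturally onto NEON’s 128-bit vector registers, which is why unrolling and NEON vectorization usually go hand in hand.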

Subsequent planning

  • To further reduce its size, MDL uses JSON rather than protobuf as its model configuration format. MDL currently supports converting Caffe models to MDL models, and will support converting all mainstream models in the future.

  • As the computing performance of mobile devices improves, the GPU will play a very important role in mobile computing, and MDL attaches great importance to its GPU implementation. MDL currently supports GPU execution on iOS, available on all device models running iOS 10 or later; according to statistics so far, iOS 10 already covers most iOS devices, and earlier versions can fall back to CPU execution. On Android, although GPU computing power is generally weak relative to the CPU, the GPUs in newer device models are becoming increasingly powerful, and MDL will follow up with a GPU feature there: OpenCL-based GPU computing on Android will improve performance on high-end models.

Developers are welcome to contribute code

The stable and efficient operation of neural networks on mobile depends on the code contributions of many developers. MDL has always followed the principle of reliable operation and practicality over polish, hoping to contribute to mobile deep learning technology. We warmly welcome interested developers to join us in making deep learning technology widely applied on mobile and widely adopted in China.

Finally, here is the GitHub directory of MDL:

https://github.com/baidu/mobile-deep-learning