PocketFlow is an automated framework for deep learning model compression and acceleration. It integrates multiple compression and acceleration algorithms and uses reinforcement learning to automatically search for appropriate compression hyperparameters. It addresses a long-standing pain point: traditional deep learning models are hard to deploy on mobile devices because of their large size and heavy demand on computing resources. By greatly lowering the technical barrier to model compression, PocketFlow enables on-device mobile AI application development.

PocketFlow is a model compression framework suitable for developers of all skill levels. Built on TensorFlow, it integrates mainstream model compression and training algorithms along with ones developed by Tencent AI Lab, and uses a hyperparameter optimization component to fully automate the compression pipeline. Developers can quickly bring AI to mobile products and process user data efficiently on-device without understanding the details of any specific compression algorithm.

At present, the framework has been used to compress and accelerate the models behind several of Tencent's mobile AI applications, with satisfactory results that contribute significantly to those applications' overall online performance.

Project Address:

Github.com/Tencent/Poc…

Since first releasing open source projects on GitHub in 2016, Tencent (https://github.com/Tencent) has accumulated nearly 60 open-source projects covering areas such as artificial intelligence, mobile development, and mini programs.

With the spread of the mobile Internet and the growing maturity of AI algorithms, the demand for mobile AI and its application scenarios keep expanding: intelligent beautification and body slimming, gesture recognition, scene recognition, game AI, video detection, in-vehicle voice interaction, smart home, real-time translation, and so on. Deploying the model on the device rather than on a server enables real-time interaction, keeps the application working without a network connection, protects data privacy and security, and reduces operations and maintenance costs.

For on-device deployment, "small and fast" has become one of the most sought-after properties of model inference: a smaller model reduces the storage footprint of the whole application, and faster inference shortens response times across application scenarios and delivers good results even on low-end devices. As a result, inference speed has gradually become one of the key indicators of competitiveness in the major AI application markets. However, models straight out of training are often too large or too slow, and most developers find cutting-edge model compression and acceleration methods hard to approach.

PocketFlow was born to solve this problem, which has plagued many AI app developers.

What the open-source framework includes:

The open-sourced PocketFlow framework consists mainly of two components: the model compression/acceleration algorithms and the hyperparameter optimization module, as shown in the figure below:


The model compression/acceleration part includes several deep learning compression and acceleration algorithms:

- Channel pruning: in CNNs, pruning channels of the feature maps reduces model size and computational complexity at the same time, and the compressed model can be deployed directly on existing deep learning frameworks. PocketFlow also supports group fine-tuning/retraining for channel pruning, which in our experiments significantly improves the accuracy of the compressed model.

- Weight sparsification: by imposing sparsity constraints on the network weights, the number of non-zero weights can be greatly reduced; the compressed model's weights can then be stored and transmitted as sparse matrices, shrinking the model.

- Weight quantization: by imposing quantization constraints on the network weights, the number of bits needed to represent each weight can be reduced. Both uniform and non-uniform quantization algorithms are supported; they can take full advantage of the hardware optimizations of ARM and FPGA devices to improve on-device computing efficiency, and provide software support for future neural-network chip designs.

- Network distillation: for each of the compression components above, we train the compressed model using the output of the uncompressed original model as additional supervision, which yields accuracy improvements of 0.5% to 2.0% at the same compression/acceleration ratio.

- Multi-GPU training: training deep learning models is computationally demanding, and a single GPU can rarely finish in a short time, so we provide full support for distributed training across multiple machines and GPUs to speed up development. Both a ResNet-50 image classification model on ImageNet and a Transformer machine translation model on WMT14 can be trained within an hour.
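To make the two weight-level transforms above concrete, here is a minimal NumPy sketch of magnitude-based weight sparsification and uniform weight quantization. This is an illustration only, not PocketFlow's API: the function names are invented, the constraints are applied once rather than during training, and the fine-tuning step that PocketFlow performs is omitted.

```python
import numpy as np

def sparsify(weights, sparsity):
    """Zero out the smallest-magnitude entries so that roughly a
    `sparsity` fraction of the weights become zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value across the whole tensor.
    threshold = np.partition(np.abs(weights), k - 1, axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def uniform_quantize(weights, num_bits):
    """Uniformly quantize weights to 2**num_bits levels over the
    observed [min, max] range, then de-quantize back to floats."""
    lo, hi = float(weights.min()), float(weights.max())
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((weights - lo) / scale)   # integer level per weight
    return q * scale + lo                  # reconstructed float value

w = np.random.randn(64, 64).astype(np.float32)
w_sparse = sparsify(w, sparsity=0.9)       # ~90% of entries zeroed
w_quant = uniform_quantize(w, num_bits=8)  # 256 representable values
print((w_sparse == 0).mean())              # fraction of zeroed weights
print(np.abs(w_quant - w).max())           # bounded by scale / 2
```

Sparse storage then only needs the non-zero values plus their indices, and the quantized tensor only needs `num_bits` per weight plus `lo` and `scale`, which is where the size reduction comes from.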

Given a fixed overall compression ratio, the hyperparameter optimization component uses reinforcement learning or AutoML to search for the most appropriate compression ratio for each layer so that overall accuracy is maximized. Most developers know little about compression algorithms, and tuning their parameters takes long study and experimentation, yet the hyperparameter values often have a huge influence on the final result. PocketFlow's hyperparameter optimization solves this problem and, in our experiments, outperforms manual tuning by an experienced model compression engineer. Its structure is shown below:
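The loop this component runs can be sketched as follows. PocketFlow's actual search is driven by reinforcement learning; the sketch below substitutes a much simpler random search to show the shape of the procedure: sample per-layer ratios under an overall-ratio constraint, evaluate the compressed model, keep the best. Everything here is hypothetical, and `evaluate` is a toy surrogate rather than a real accuracy measurement.

```python
import random

def evaluate(ratios):
    """Stand-in for compressing a model with the given per-layer
    ratios and measuring validation accuracy. In reality this step
    compresses and fine-tunes the model; the toy surrogate here
    simply rewards moderate, balanced ratios."""
    return 1.0 - sum((r - 0.5) ** 2 for r in ratios) / len(ratios)

def search(num_layers, target_ratio, iterations=100, seed=0):
    """Randomly sample per-layer compression ratios whose mean
    matches the target overall ratio, and keep the best-scoring set."""
    rng = random.Random(seed)
    best_ratios, best_score = None, float("-inf")
    for _ in range(iterations):
        raw = [rng.random() for _ in range(num_layers)]
        mean = sum(raw) / num_layers
        # Rescale so the layer ratios average to the overall target.
        ratios = [min(r * target_ratio / mean, 1.0) for r in raw]
        score = evaluate(ratios)
        if score > best_score:
            best_ratios, best_score = ratios, score
    return best_ratios, best_score

best, acc = search(num_layers=10, target_ratio=0.5, iterations=100)
```

A reinforcement learning agent replaces the blind sampling with a policy that is updated from each evaluation, which is why, as reported below, it reaches good configurations in far fewer iterations than exhaustive trial and error.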


PocketFlow performance

The hyperparameter optimization component not only removes the high barrier of tedious manual tuning, it also allows PocketFlow to outperform manual tuning across all of the compression algorithms. Taking image classification as an example, PocketFlow effectively compresses and accelerates a variety of CNN architectures such as ResNet and MobileNet on the CIFAR-10 and ImageNet datasets.

On CIFAR-10, PocketFlow used ResNet-56 as the baseline model for channel pruning; with training strategies such as hyperparameter optimization and network distillation, it achieved 2.5x acceleration with only 0.4% classification accuracy loss and 3.3x acceleration with 0.7% loss, significantly better than the uncompressed ResNet-44 model. On ImageNet, PocketFlow applied further weight sparsification to the already compact MobileNet model and obtained similar classification accuracy at a smaller model size: compared with the Inception-V1 and ResNet-18 models, the compressed model is only about 20-40% of their size while its classification accuracy is essentially the same (or even higher).




The AutoML-based hyperparameter optimization component in PocketFlow can match manual tuning in as few as about 10 iterations, and the hyperparameter combination found after 100 iterations reduces accuracy loss by a further 0.6% or so. PocketFlow also achieves consistent improvements when compressing MobileNet-v1 for ImageNet classification by using the hyperparameter optimization component to automatically determine the number of quantization bits for each layer's weights: with an average of 8 quantization bits, accuracy rose from 70.89% before quantization to 71.29% after.


PocketFlow helps bring AI to Tencent's internal mobile apps

Within Tencent, PocketFlow already supports model compression and acceleration for a variety of mobile services. For example, in a mobile photo app, the facial landmark localization model is a common pre-processing module: by detecting and locating more than 100 facial feature points (eye corners, nose tip, etc.), it supplies the features needed by downstream tasks such as face recognition and intelligent beautification. Using PocketFlow, we compressed this model, greatly reducing its computational cost while keeping localization accuracy unchanged, and achieved a 1.3-2x speedup on an already highly streamlined network. The compressed model has been deployed in the actual product.


In a human body recognition project, PocketFlow accelerated model inference by more than 3x while meeting the online accuracy requirements, which was decisive in shipping the project on mobile.