It's the annual Golden Week for tourism: are you squeezing through crowds at a scenic spot, or stuck on a jammed highway watching an old lady practice tai chi? I generally try not to travel during Golden Week, and this October holiday is no exception. On October 1, I ran a half marathon to welcome National Day. On October 2, I visited a less crowded attraction: the Zhang Zhidong and Wuhan Museum. Today I'm at home, eating and drinking, and thinking about a WeChat mini program to identify dogs.

Since the idea was a dog-recognition app, my first instinct was to build it as a WeChat mini program. Compared with a native mobile app, a WeChat mini program is simple to develop and deploy, and above all needs no installation and is ready to use, which makes it especially suitable for this kind of single-purpose, occasionally used app.

The first thing that came to mind was TensorFlow.js, which runs deep learning on the client side without needing a server. However, TensorFlow.js does not support WeChat mini programs, so I had no choice but to adopt the mini program + server model. Since I am not good at web + server development, I first implemented an Android app using TensorFlow Lite, as described in "This Mid-Autumn Festival, I Developed an App to Identify Dogs". That Android app was more of an experimental project. This National Day I have plenty of free time, so I decided to complete the whole WeChat mini program.

In the client + server model, image recognition is done on the server, so that is where the main functionality lives. Let's start with the server-side implementation.

TensorFlow Serving

There are many options for the server-side implementation, including C++/Java/Python. At one point I even considered using Node.js. While going through last week's Google Developer conference material, I noticed that TensorFlow already provides a server deployment solution: TensorFlow Serving.

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It makes it easy to deploy new algorithms and experiments while keeping the same server architecture and APIs. TensorFlow Serving provides out-of-the-box integration with TensorFlow models and can easily be extended to serve other types of models.

For detailed information, visit: tensorflow.google.cn/serving/

TensorFlow Serving is still being improved. Following the examples directly did not give me the functionality I wanted, but after piecing together information from various sources, I finally got the whole flow working.

SavedModel

TensorFlow provides two model formats:

  • Checkpoints, a format that depends on the code that created the model.
  • SavedModel, a format that is independent of the code that created the model.

SavedModel is a language-neutral, recoverable, hermetic serialization format. TensorFlow provides several mechanisms for interacting with SavedModels, such as the tf.saved_model API, the Estimator API, and the CLI. TensorFlow Serving requires model files in the SavedModel format.
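
For illustration only, here is a minimal TF 1.x sketch of exporting a model in SavedModel format with tf.saved_model.simple_save. The placeholder name and the tiny dense layer are made up for this example; retrain.py handles the export internally via its --saved_model_dir argument.

import tensorflow as tf

# Minimal sketch (TF 1.x): export a trained graph in SavedModel format.
# The input placeholder and the dense layer are made up for illustration.
x = tf.placeholder(tf.float32, shape=[None, 299, 299, 3], name="image")
logits = tf.layers.dense(tf.layers.flatten(x), 120, name="logits")

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  # simple_save writes saved_model.pb plus a variables/ directory and
  # builds a default "serving_default" signature from inputs/outputs.
  tf.saved_model.simple_save(
      sess,
      export_dir="models/example/1",  # versioned subdirectory, like models/inception_v3/1
      inputs={"image": x},
      outputs={"prediction": logits})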

Retrain and save as SavedModel

As explained in "This Mid-Autumn Festival, I Developed an App to Identify Dogs", we don't need to train a deep learning model for dog recognition from scratch. Instead, we use transfer learning to retrain an existing model. Since the model is deployed on the server side, I chose the Inception V3 model, which has better recognition accuracy.

The Stanford Dogs dataset is used as the labeled dog dataset. Please download and extract it yourself, then run the following command for training:

python retrain.py --image_dir=./Images --saved_model_dir=models/inception_v3

The trained model is stored in models/inception_v3/1, where 1 is the version number and can be specified by the command line argument of the retrain.py script.
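
If you want to check what was exported (the signature and its input and output tensors), TensorFlow's saved_model_cli tool can inspect the directory. For example:

saved_model_cli show --dir models/inception_v3/1 --all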

Install the TensorFlow Model Server

This is very easy on Ubuntu; just use the following command:

sudo apt install tensorflow-model-server

For ease of development, you need to install the TensorFlow Serving Python API:

pip install tensorflow-serving-api
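
The tensorflow-serving-api package provides the gRPC client stubs for talking to the model server. As a rough sketch only (the input tensor name "image", the "serving_default" signature, and the default gRPC port 8500 are assumptions; check them against the exported model), a gRPC predict call looks something like this:

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Rough sketch of a gRPC predict request against a TensorFlow Model Server.
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "default"
request.model_spec.signature_name = "serving_default"
# "image" is an assumed input name; verify it with saved_model_cli.
dummy = np.zeros((1, 299, 299, 3), dtype=np.float32)
request.inputs["image"].CopyFrom(tf.make_tensor_proto(dummy))

result = stub.Predict(request, 10.0)  # 10-second timeout
print(result)

In the rest of this article, though, I use the RESTful API, which is easier to call from a WeChat mini program.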

Start the TensorFlow Model Server

According to the documentation, starting the TensorFlow Model Server is very simple. Here the --rest_api_port parameter is added when starting the server so that it also exposes a RESTful API, which makes it convenient for the WeChat mini program to communicate with it.

tensorflow_model_server --rest_api_port=8501 --model_base_path=$PWD/models/inception_v3
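
For reference, the TensorFlow Serving documentation describes the REST predict endpoint as /v1/models/<model_name>:predict (the model name defaults to "default" when --model_name is not passed), taking a JSON body with an "instances" list. A minimal, untested sketch of such a call:

import requests

# Sketch of TensorFlow Serving's documented REST predict call.
# The input shape must match the exported model's serving signature;
# a dummy 299x299x3 image filled with 1.0 is used here.
dummy_image = [[[1.0, 1.0, 1.0]] * 299] * 299
resp = requests.post("http://localhost:8501/v1/models/default:predict",
                     json={"instances": [dummy_image]})
print(resp.text)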

However, after starting the TensorFlow Model Server this way, I spent half a day without getting the client to talk to it. Just as I was at a loss, I came across a project on GitHub: github.com/tobegit3hub…

In short, Simple TensorFlow Serving is an encapsulation of TensorFlow Serving: a generic and easy-to-use service for machine learning models.

It’s also very ambitious, boasting features like:

  • Support for distributed TensorFlow models
  • Support for general RESTful/HTTP APIs
  • Support for GPU-accelerated inference
  • Support for curl and other command-line tools
  • Support for clients in any programming language
  • Automatic generation of client code, without writing any code yourself
  • Support for inference with raw image files for image models
  • Support for detailed request statistics
  • Support for serving multiple models simultaneously
  • Support for bringing model versions online and offline dynamically
  • Support for loading new custom ops for TensorFlow models
  • Support for secure authentication via configurable basic auth
  • Support for TensorFlow/MXNet/PyTorch/Caffe2/CNTK/ONNX/H2O/scikit-learn/XGBoost/PMML and other models

What attracted me most is its automatic client code generation; before finding it, I had looked up a lot of material but still couldn't get the client and server to communicate. It also provides a web interface where you can view the model's structure and signature, which I had struggled with for a long time.

Open http://127.0.0.1:8500 in a browser to see this web interface.

Installing Simple TensorFlow Serving is very simple:

pip install simple_tensorflow_serving

Next, start the server:

simple_tensorflow_serving --model_base_path="./models/inception_v3" &

The client

I haven't started learning WeChat mini program development yet, so I first wrote a test client in Python, taking advantage of the automatic client code generation:

curl http://localhost:8500/v1/models/default/gen_client?language=python > test_client.py

The automatically generated code is as follows:

#!/usr/bin/env python

import requests

def main():
  endpoint = "http://ilego.club:8500"
  json_data = {"model_name": "default", "data": {"image": [[[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]]]}}
  result = requests.post(endpoint, json=json_data)
  print(result.text)

if __name__ == "__main__":
  main()

As you can see, the client posts a piece of JSON data to the server and gets back the result. Building on this code, I added image reading, image scaling, and conversion to JSON data to complete the test client; for the code, see: github.com/mogoweb/aie…
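
A rough sketch of those modifications is shown below. It assumes, as in the generated code, that the server expects the data under an "image" key, and that the retrained Inception V3 model takes a 299x299 RGB image scaled to [0, 1]; the actual script in the repository may differ.

import argparse

import numpy as np
import requests
from PIL import Image

# Sketch of a test client: load an image, resize it, and post it as JSON.
def main():
  parser = argparse.ArgumentParser()
  parser.add_argument("--image", required=True, help="path to the image file")
  args = parser.parse_args()

  # Resize to the model's assumed input size and scale pixel values to [0, 1].
  img = Image.open(args.image).convert("RGB").resize((299, 299))
  img_data = (np.asarray(img, dtype=np.float32) / 255.0).tolist()

  # Same JSON layout as the generated client code; adjust the endpoint
  # to wherever simple_tensorflow_serving is running.
  json_data = {"model_name": "default", "data": {"image": [img_data]}}
  result = requests.post("http://127.0.0.1:8500", json=json_data)
  print(result.text)

if __name__ == "__main__":
  main()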

Try testing a picture of a dog:

python test_client.py --image=./Images/n02116738-African_hunting_dog/n02116738_1105.jpg

The results are as follows:

N02116738 African Hunting Dog 0.780203342438
N02115913 Dhole 0.0102733308449
N02092002 Scottish Deerhound 0.00600153999403

Each category label is followed by the probability that the image belongs to that category; in the result above, the Top-1 probability is 0.78.

Conclusion

The server side is far from perfect, and there are some problems:

  1. Images are sent between the client and server in JSON format, with the binary image data converted to a JSON string, which is space-inefficient. I'm considering base64-encoding the image data later (a rough sketch follows this list).
  2. Prediction is relatively slow: from sending the request to receiving the response takes tens of seconds, and I haven't found where the bottleneck is.
  3. Concurrency support: this is only a simple test for now; at the product stage, with many phones running the WeChat mini program and requesting recognition at the same time, there will be a lot of work to do.
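
For point 1, a possible direction (just a sketch, not something implemented yet; the "image_b64" key is made up for illustration) is to base64-encode the raw image bytes on the client and let the server decode and preprocess them:

import base64

# Client side: send the raw JPEG bytes base64-encoded instead of a nested float array.
with open("dog.jpg", "rb") as f:
  encoded = base64.b64encode(f.read()).decode("utf-8")
json_data = {"model_name": "default", "data": {"image_b64": encoded}}

# The server side would then do roughly the reverse before running inference:
# raw_bytes = base64.b64decode(json_data["data"]["image_b64"])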

Well, that's it for server-side development and deployment. In the next article I'll talk about developing the WeChat mini program and its communication with the server. Stay tuned!

For the complete code of this article, see: github.com/mogoweb/aie…