The following content is the author's original work, first published on the Juejin platform. It may not be reproduced in any form by any person or organization without the original author's consent and permission. Original work is not easy; if this has helped with your problem, I hope to get your support.

0. PaddleOCR overview

PaddleOCR is an OCR toolkit built on the open-source deep learning platform PaddlePaddle. It does what the name suggests: it detects and recognizes text in images. PaddleOCR has officially released recognition models for more than 80 languages, which is sufficient for everyday use. This tutorial takes the official general-purpose Chinese and English OCR model as an example and walks step by step through downloading, installing, testing, and deploying it as a service on CentOS 7. The tutorial is detailed and suitable for readers with no prior experience.

1. Before we begin

Why write a tutorial like this?

PaddleOCR already has a detailed tutorial on GitHub, and readers with the necessary background can follow the official instructions and do everything themselves. For readers who are less familiar with it, however, the official tutorial is not so friendly: even after searching online for related material, they may get stuck on one or two small problems and the deployment fails.

All references and links are listed at the end of the tutorial. Let's jump right in.

2. Prepare Docker on CentOS

As the saying goes, to do a good job you must first sharpen your tools. We install directly from the officially prepared Docker environment, which avoids most problems. A few small pitfalls remain, and they are explained one by one below.

Find the Docker packages available on CentOS and install one

  • yum list docker-ce --showduplicates | sort -r

You can pick a stable version to install, or leave the version unspecified and install the latest one directly

  • yum install docker-ce (install Docker with yum)

Start the Docker service

  • service docker start (start the Docker service)

  • docker --version (check the Docker version to confirm that Docker is available)

  • systemctl enable docker (configure the Docker service to start automatically at boot)

3. Download the official PaddleOCR Docker image

Official GitHub repository: https://github.com/PaddlePaddle/PaddleOCR
Official Gitee repository: https://gitee.com/paddlepaddle/PaddleOCR

The official recommendation is to use GitHub, but GitHub access can be slow, as everyone knows. The examples below use the GitHub address. Readers who cannot reach GitHub can use the corresponding Gitee address instead.

Create the PaddleOCR directory

This directory is the working directory for the PaddleOCR container; the docker command below mounts it into the container. The official suggestion is to create a project directory at /home/projects: run mkdir /home/projects to create it, then run cd /home/projects to enter it.

Download the official image

docker run --name ppocr -v $PWD:/paddle --network=host -it paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82 /bin/bash

Note: the official docker command does not map any ports. It starts the container with --network=host, so the ports used inside the container are the host's ports.

Docker then starts downloading the image automatically. This takes a while, about 10 minutes.

When the download finishes, you are dropped into a shell inside the container.

Leave it alone and simply type exit.

docker ps -a (list all containers; you can see that the ppocr container you just created has stopped)

docker start ppocr (start the ppocr container again)

4. Install PaddlePaddle 2.0

As mentioned in section 0, PaddleOCR is built on the PaddlePaddle platform, so it cannot run without PaddlePaddle.

[Important] Check the python3 and pip3 versions in the container

After entering the Docker container, be sure to check the python3 version and the Python version that pip3 uses. Both must be 3.7 or above, which is the officially required version. However, python3 in the official Docker image is 3.5.1, so you must upgrade it manually.

docker exec -it ppocr /bin/bash (enter the Docker container)

python3 --version (check the python3 version; it reports 3.5.1, which must be upgraded)

pip3 --version (check which Python pip3 uses; if it is also 3.5.1, it is upgraded together with Python below)

Upgrade python3

The Python 3 source trees, 3.7.0 and 3.8.0, are already in the /home directory.

cd /home/Python-3.8.0 (go to the Python-3.8.0 directory)

./configure (runs some pre-build checks, which finish after a few moments)

make && make install (build and install from source; wait a few minutes for it to complete)

When the installation is complete, check python3 and pip3 again to make sure both now report 3.8.0.
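The version check can also be scripted. The following is a minimal sketch, not part of the official instructions: the helper name meets_minimum and the constant MINIMUM are made up for illustration, and the 3.7 threshold is simply the requirement quoted earlier in this section.

```python
import sys

# Minimum Python version quoted in this tutorial (assumption: taken from the text above).
MINIMUM = (3, 7)


def meets_minimum(version_info, minimum=MINIMUM):
    """Return True if the (major, minor, ...) tuple is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = sys.version_info
    print("python3 is %d.%d.%d" % (current[0], current[1], current[2]))
    print("OK" if meets_minimum(current) else "too old, upgrade needed")
```

Saved under any file name and run with python3 inside the container, it reports whether the interpreter that python3 resolves to is new enough.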

[Important] Update the user's environment variables

Install Vim to make text editing inside the container easier

apt-get update
apt-get install vim

vi ~/.bashrc (edit the Python environment variables set in .bashrc)

In the file, delete every setting that refers to Python 3.5.1, then save.

source ~/.bashrc (reload the configuration file so the changes take effect)

Upgrade pip3

With python3 sorted out, the next step is to install PaddlePaddle 2.0, the environment PaddleOCR requires. First upgrade pip3.

pip3 install --upgrade pip (upgrade pip3, as in the official installation guide)

This takes a moment.

Install PaddlePaddle 2.0

The official guide for this step distinguishes between the GPU and CPU versions. The example below installs the CPU version. (If you need the GPU version, see the official documentation for its installation instructions.)

python3 -m pip install paddlepaddle==2.0.0 -i https://mirror.baidu.com/pypi/simple

Wait a few minutes for the download and installation to finish.
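To confirm that the package landed in the upgraded interpreter rather than the old one, a short check like the one below can help. It is a sketch under assumptions: installed_version is a hypothetical helper, importlib.metadata requires Python 3.8 or later (which matches the version installed above), and paddle.utils.run_check() is PaddlePaddle's own self-test.

```python
from importlib import metadata  # standard library from Python 3.8


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("paddlepaddle")
    if version is None:
        print("paddlepaddle is not installed for this interpreter")
    else:
        print("paddlepaddle", version)
        import paddle
        paddle.utils.run_check()  # PaddlePaddle's built-in installation check
```

If the script reports that paddlepaddle is missing even though pip finished without errors, pip3 and python3 are probably still pointing at different Python versions.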

cd /home (switch back to the /home directory)

Clone the PaddleOCR repository

[Recommended] git clone https://github.com/PaddlePaddle/PaddleOCR

If you cannot access GitHub, you can clone the source code from the Gitee repository instead: git clone https://gitee.com/paddlepaddle/PaddleOCR

Install third-party libraries

cd /home/PaddleOCR (switch to the PaddleOCR directory)

pip3 install -r requirements.txt (install the third-party libraries)

This is a somewhat longer download. In my own installation this step failed once because of network problems, so please be patient. If you see an error such as HTTPSConnectionPool ... Read timed out, rerun the command until the installation completes.
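Rerunning the install by hand can be automated. The sketch below is one possible approach, not something from the official guide: retry and pip_install_requirements are hypothetical helpers, and the 100-second value passed to pip's --default-timeout option is an arbitrary choice.

```python
import subprocess
import sys
import time


def retry(action, attempts=5, delay=0.0):
    """Call action() until it returns True; return the number of tries used.

    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # pause before the next try
    raise RuntimeError("still failing after %d attempts" % attempts)


def pip_install_requirements(path="requirements.txt", timeout=100):
    """Run pip once with a longer socket timeout; True when pip exits with 0."""
    cmd = [sys.executable, "-m", "pip", "install", "-r", path,
           "--default-timeout", str(timeout)]
    return subprocess.call(cmd) == 0
```

From /home/PaddleOCR, calling retry(pip_install_requirements, attempts=5, delay=10) would rerun pip up to five times with a ten-second pause between tries.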

Download the official models

The official server-side models are used as the example here. (There is also a smaller official model package suited to mobile devices; interested readers can consult the official documentation: github.com/PaddlePaddl…)

Create the model directory inside the Docker container

mkdir /home/PaddleOCR/inference && cd /home/PaddleOCR/inference (create the inference model directory under PaddleOCR and enter it)

Download and unpack the models

The official models are split into a detection model, a direction classifier, and a recognition model. Download and unpack each one.

Download the detection model
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar

Download the direction classifier
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar

Download the recognition model
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar

Unpack the archives
tar xf ch_ppocr_mobile_v2.0_cls_infer.tar
tar xf ch_ppocr_server_v2.0_det_infer.tar
tar xf ch_ppocr_server_v2.0_rec_infer.tar
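If wget is unavailable, or you want the step to be repeatable, the same download and unpack can be done with Python's standard library. This is a rough sketch: model_url and fetch_model are hypothetical helpers, and it assumes each archive unpacks into a directory with the same name as the archive, which matches the paths used later in this tutorial.

```python
import os
import tarfile
import urllib.request

BASE_URL = "https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/"
MODELS = [
    "ch_ppocr_server_v2.0_det_infer",  # detection model
    "ch_ppocr_mobile_v2.0_cls_infer",  # direction classifier
    "ch_ppocr_server_v2.0_rec_infer",  # recognition model
]


def model_url(name):
    """Build the download URL of one model archive."""
    return BASE_URL + name + ".tar"


def fetch_model(name, dest_dir):
    """Download and unpack one model unless its directory already exists."""
    target = os.path.join(dest_dir, name)
    if os.path.isdir(target):
        return target  # already unpacked, nothing to do
    archive = os.path.join(dest_dir, name + ".tar")
    urllib.request.urlretrieve(model_url(name), archive)
    with tarfile.open(archive) as tar:
        tar.extractall(dest_dir)
    return target
```

Looping over MODELS and calling fetch_model(name, "/home/PaddleOCR/inference") for each one fetches all three; models that are already unpacked are skipped.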

The decompressed directory is as follows

6. Single-image recognition test

Go back to /home/PaddleOCR

cd /home/PaddleOCR

Image test

Use one of the official images to test recognition. The official image directory is /home/PaddleOCR/doc/imgs.

The test command is:

python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_server_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_server_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True --use_gpu=False
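Because the command is long and easy to mistype, it can help to assemble it in a small script. The sketch below only wraps the flags shown in this section; build_predict_command and run_predict are hypothetical helper names, and sys.executable stands in for python3.

```python
import subprocess
import sys


def build_predict_command(image_path, inference_dir="./inference"):
    """Assemble the predict_system.py command line used in this section."""
    return [
        sys.executable, "tools/infer/predict_system.py",
        "--image_dir=" + image_path,
        "--det_model_dir=%s/ch_ppocr_server_v2.0_det_infer/" % inference_dir,
        "--rec_model_dir=%s/ch_ppocr_server_v2.0_rec_infer/" % inference_dir,
        "--cls_model_dir=%s/ch_ppocr_mobile_v2.0_cls_infer/" % inference_dir,
        "--use_angle_cls=True",
        "--use_space_char=True",
        "--use_gpu=False",
    ]


def run_predict(image_path, repo_dir="/home/PaddleOCR"):
    """Run the test from the repository root and return the exit code."""
    return subprocess.call(build_predict_command(image_path), cwd=repo_dir)
```

Calling run_predict("./doc/imgs/11.jpg") runs the same test as the command above; passing another path tests a different image.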

The original image, the annotated image produced by recognition, and the recognition result are shown below.

7. Service deployment

Once the single-image test passes, we deploy the model as a web service so that other services can call it through an interface.

Deploy the PaddleHub Serving service

This is one of the officially recommended deployment methods.

Install the PaddleHub environment

Run the following command inside the Docker container

pip3 install paddlehub==1.8.3 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple

Wait a moment for the download to complete.

Modify the deployment parameter file

Path of the deployment parameter file inside the Docker container: /home/PaddleOCR/deploy/hubserving/ocr_system/params.py

Open params.py with vi and change the three model directory paths to the following:

/home/PaddleOCR/inference/ch_ppocr_server_v2.0_det_infer/
/home/PaddleOCR/inference/ch_ppocr_server_v2.0_rec_infer/
/home/PaddleOCR/inference/ch_ppocr_mobile_v2.0_cls_infer/

The other parameters do not need to be modified.
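A typo in any of these three paths only shows up later, when the service starts. As a precaution, a small check like the following can confirm that the directories exist before going further. It is a sketch, and missing_model_dirs is a hypothetical helper, not part of PaddleOCR.

```python
import os

INFERENCE_DIR = "/home/PaddleOCR/inference"
MODEL_DIRS = [
    "ch_ppocr_server_v2.0_det_infer",
    "ch_ppocr_server_v2.0_rec_infer",
    "ch_ppocr_mobile_v2.0_cls_infer",
]


def missing_model_dirs(base=INFERENCE_DIR, names=MODEL_DIRS):
    """Return the model directories that do not exist under `base`."""
    return [os.path.join(base, name) for name in names
            if not os.path.isdir(os.path.join(base, name))]


if __name__ == "__main__":
    missing = missing_model_dirs()
    if missing:
        print("missing model directories:")
        for path in missing:
            print("  " + path)
    else:
        print("all three model directories are present")
```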

Install the service module

Install the detection + recognition pipeline service module:

hub install deploy/hubserving/ocr_system/

Install Flask

Flask is the web framework used for the deployment

pip3 install flask

Install flask-cors

pip3 install flask-cors

Create a new web service application

In /home/PaddleOCR/tools, create a new Python file named test_tr.py with permission 775. Its contents are as follows.

# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))

from ppocr.utils.logging import get_logger
logger = get_logger()

import cv2
import numpy as np
import time
from PIL import Image
from ppocr.utils.utility import get_image_file_list
from tools.infer.utility import draw_ocr, draw_boxes

import requests
import json
import base64

from flask import Flask, request
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # Solve cross-domain problems


def cv2_to_base64(image):
    # Encode raw image bytes as a base64 string for the JSON payload
    return base64.b64encode(image).decode('utf8')


def draw_server_result(image_file, res):
    img = cv2.imread(image_file)
    image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    if len(res) == 0:
        return np.array(image)
    keys = res[0].keys()
    if 'text_region' not in keys:  # for ocr_rec, draw function is invalid 
        logger.info("draw function is invalid for ocr_rec!")
        return None
    elif 'text' not in keys:  # for ocr_det
        logger.info("draw text boxes only!")
        boxes = []
        for dno in range(len(res)):
            boxes.append(res[dno]['text_region'])
        boxes = np.array(boxes)
        draw_img = draw_boxes(image, boxes)
        return draw_img
    else:  # for ocr_system
        logger.info("draw boxes and texts!")
        boxes = []
        texts = []
        scores = []
        for dno in range(len(res)):
            boxes.append(res[dno]['text_region'])
            texts.append(res[dno]['text'])
            scores.append(res[dno]['confidence'])
        boxes = np.array(boxes)
        scores = np.array(scores)
        draw_img = draw_ocr(
            image, boxes, texts, scores, draw_txt=True, drop_score=0.5)
        return draw_img

@app.route("/test")
def test():
    return 'Hello World!'


@app.route("/myocr", methods=["POST"])
def myocr():
    # Input parameter: the uploaded image, sent as form field 'file'
    image_file = request.files['file']
    basepath = os.path.dirname(os.path.abspath(__file__))

    logger.info("{} basepath".format(basepath))

    # Save the upload next to this script, then read it back as raw bytes
    savepath = os.path.join(basepath, os.path.basename(image_file.filename))
    image_file.save(savepath)
    with open(savepath, 'rb') as f:
        img = f.read()
    if not img:
        logger.info("error in loading image:{}".format(savepath))
        return json.dumps({"error": "empty image file"}), 400

    # Convert to base64
    data = {'images': [cv2_to_base64(img)]}
    # Send the request to the Hub service
    url = "http://127.0.0.1:8866/predict/ocr_system"
    headers = {"Content-type": "application/json"}
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Return the result
    res = r.json()["results"][0]
    logger.info(res)
    return json.dumps(res)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Start the services

There are two services: the Hub service and the web service.

Start the Hub service

[Very important] export PYTHONPATH=.

If you do not set this environment variable, an error is reported saying that the tools module cannot be found.

hub serving start -m ocr_system & (start the Hub service in the background; startup information is printed on success)
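Before starting the web service, the Hub service can be exercised directly. The sketch below builds the same kind of request that the Flask application above sends (a JSON body with a base64-encoded image under the images key, posted to port 8866), but uses only the standard library. The names build_hub_request and recognize are hypothetical helpers, and the response layout is assumed from the r.json()["results"][0] access in the application code.

```python
import base64
import json
import urllib.request

HUB_URL = "http://127.0.0.1:8866/predict/ocr_system"


def build_hub_request(image_bytes, url=HUB_URL):
    """Wrap raw image bytes in the JSON payload the Hub service expects."""
    payload = {"images": [base64.b64encode(image_bytes).decode("utf8")]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf8"),
        headers={"Content-Type": "application/json"},
    )


def recognize(image_path, url=HUB_URL):
    """Send one image file to the Hub service and return its first result."""
    with open(image_path, "rb") as f:
        req = build_hub_request(f.read(), url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf8"))["results"][0]
```

Calling recognize("/home/PaddleOCR/doc/imgs/11.jpg") inside the container should return the recognition result for the test image if the Hub service is running.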

Start the web service

cd /home/PaddleOCR/tools (switch to the tools directory)

python3 test_tr.py & (start the web service in the background)

8. Service testing

Testing with Postman

Use Postman to send a request to port 5000; you can see that the service returns the recognition result.
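The response body is a JSON list. Judging from the draw_server_result function in the application code, each entry carries text, confidence, and text_region fields. A caller that only needs the text could filter it as sketched below; extract_lines is a hypothetical helper, and the sample data in the usage note is made up for illustration.

```python
import json


def extract_lines(response_body, min_confidence=0.5):
    """Return the recognized text lines at or above the confidence threshold."""
    results = json.loads(response_body)
    return [item["text"] for item in results
            if item.get("confidence", 0.0) >= min_confidence]
```

For a response body such as [{"text": "hello", "confidence": 0.98, "text_region": []}, {"text": "x", "confidence": 0.2, "text_region": []}], the call extract_lines(body) keeps only "hello".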

Simple Vue page test

Write a simple image upload page with Vue; the back-end interface forwards the data and the request to the service on port 5000. A separate article will cover this when time allows.

9. Performance analysis

Test machine configuration

The test machine is a virtual machine running on a physical host.

Physical machine:
  • CPU: AMD Ryzen 5 2600X Six-Core Processor
  • Memory: 32 GB
  • Operating system: Windows 10 x64

CentOS virtual machine:
  • Processor: 4 cores, single thread
  • Memory: 8 GB
  • Operating system: CentOS 7.8

The parsing times below are for reference only: nothing was optimized and network transfer time was not taken into account. For the single image from section 6, one of the official built-in test images, the parsing times are:

  • Docker: 11 s
  • Postman: 17 s
  • Vue: 18 s
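To reproduce measurements like these on your own machine, a simple timing wrapper is enough. This is a generic sketch and not the method used to obtain the figures above; time_call is a hypothetical helper that can wrap any callable, for example a function that sends one request to the service.

```python
import time


def time_call(func, repeats=3):
    """Call func() `repeats` times; return (fastest, average) in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return min(durations), sum(durations) / len(durations)
```

Running several repeats and looking at both the fastest and the average time helps separate one-off costs, such as a first request after start-up, from steady-state latency.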

10. References

  • PaddleOCR GitHub
  • PaddleOCR Installation and Practice (CPU Edition)