Article by Tianke, Alibaba Taobao F(x) Team
This article shows how to use Pipcook to quickly train a form recognition model and use it to improve the efficiency of form development.
Pain points
One of the pain points of restoring a page on the front end is forms: the designer draws a form in the design draft, and you have to go to AntD or Fusion, find a similar form, and copy its code. This is inefficient and cumbersome.
What if you could just take a screenshot and generate the form code from it? The answer is yes, you can.
Solution
We can train an object detection model whose input is a screenshot of a form and whose output is the type and coordinates of every form item. With it, you only need to take a screenshot of the form in the design draft to get all the form items in it, and combined with the labels recognized from the text, you can generate the form code. For example, I have previously implemented the ability to generate form code from screenshots.
The red boxes in the figure are the form items detected by the object detection model, and the green boxes are the text recognized by the text recognition API. Combining the two with some calculation, you can generate a form protocol or the form code itself.
Text recognition is a general-purpose capability, so we won't go into it here. So where does the form item detection capability come from? Here are the general steps:
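To make "with some calculation" a little more concrete, here is a minimal sketch of one way to merge the two results: for each detected form item, pick the nearest recognized text as its label. The input shapes ({ type, box } for form items, { text, box } for OCR results) and the helper itself are assumptions for illustration, not the output of any specific API.
```
// Hypothetical sketch: pair each detected form item with its nearest text label.
// Boxes are [xmin, ymin, xmax, ymax]; input/output shapes are made up for illustration.
function centerOf([xmin, ymin, xmax, ymax]) {
  return { x: (xmin + xmax) / 2, y: (ymin + ymax) / 2 };
}

function buildFormSchema(formItems, ocrResults) {
  return formItems.map(item => {
    const c = centerOf(item.box);
    let label = '';
    let bestDist = Infinity;
    for (const t of ocrResults) {
      const tc = centerOf(t.box);
      const dist = Math.hypot(c.x - tc.x, c.y - tc.y);
      if (dist < bestDist) {
        bestDist = dist;
        label = t.text;
      }
    }
    return { type: item.type, label };
  });
}

// Example:
// buildFormSchema(
//   [{ type: 'input', box: [83, 31, 146, 71] }],
//   [{ text: 'Name', box: [20, 35, 70, 60] }]
// ) // => [{ type: 'input', label: 'Name' }]
```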
- Sample: collect thousands of form images and annotate the form items in them.
- Training: feed the samples to the machine so it can learn from them.
- Prediction: after training, pass a new form image to the model, and it predicts the form items in it.
Here’s how to do each step in detail.
Sample
The form recognition samples here are general object detection samples; refer to the previous article for how to annotate them. For convenience, a ready-made sample dataset for form recognition is provided here:
http://ai-sample.oss-cn-hangzhou.aliyuncs.com/pipcook/datasets/mid/mid_base.zip
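The data-collect plugin used later (@pipcook/plugins-object-detection-pascalvoc-data-collect) expects annotations in the standard Pascal VOC XML format, so the dataset above is organized that way. For reference, a single annotation file looks roughly like this; the file name, class name, and coordinates are made up for illustration:
```
<annotation>
  <filename>form_0001.jpg</filename>
  <size>
    <width>1570</width>
    <height>522</height>
    <depth>3</depth>
  </size>
  <object>
    <name>input</name>
    <bndbox>
      <xmin>83</xmin>
      <ymin>31</ymin>
      <xmax>146</xmax>
      <ymax>71</ymax>
    </bndbox>
  </object>
</annotation>
```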
Training
Next, I will demonstrate how to use Pipcook to train the object detection model on the sample dataset above.
Introducing Pipcook
Pipcook is a machine learning application framework for front-end developers, developed by the D2C team in Alibaba's Taobao technology department. We hope Pipcook can become the platform where front-end engineers learn and practice machine learning, and thus help advance front-end intelligence. Pipcook (github.com/alibaba/pip…) is an open-source framework, and everyone is welcome to contribute.
Installation
Make sure your Node.js version is 12 or later, then execute:
```
# install cnpm at the same time, mainly to speed up installation
npm i @pipcook/pipcook-cli cnpm -g --registry=https://registry.npm.taobao.org
```
Next, initialize:
```
pipcook init --tuna -c cnpm
pipcook daemon start
```
Configuration
Form recognition is an object detection task, so we create a configuration file in JSON format as shown below. Don't worry: most of the parameters in this configuration file don't need to be changed; only a few do.
form.json
```
{
  "plugins": {
    "dataCollect": {
      "package": "@pipcook/plugins-object-detection-pascalvoc-data-collect",
      "params": {
        "url": "http://ai-sample.oss-cn-hangzhou.aliyuncs.com/pipcook/datasets/mid/mid_base.zip"
      }
    },
    "dataAccess": {
      "package": "@pipcook/plugins-coco-data-access"
    },
    "modelDefine": {
      "package": "@pipcook/plugins-detectron-fasterrcnn-model-define"
    },
    "modelTrain": {
      "package": "@pipcook/plugins-detectron-model-train",
      "params": {
        "steps": 20000
      }
    },
    "modelEvaluate": {
      "package": "@pipcook/plugins-detectron-model-evaluate"
    }
  }
}
```
You only need to set one parameter under dataCollect.params:
- url: the address of your sample dataset.
Of course, you can also run this configuration file as-is to train a form detection model.
Run
Because the object detection model is computationally heavy, you will probably need a GPU machine; otherwise, training may take several weeks…
```
pipcook run form.json --tuna
```
The training might be a bit long, so go have lunch or write some business code.
When the training is complete, a model is generated in the output directory.
Prediction
After training, an output directory will be generated in the current directory. It is a brand-new npm package, so we first install its dependencies:
```
cd output
BOA_TUNA=1 npm install
```
Once the environment is installed, go back to the root directory and download a test image as test.jpg:
```
cd ..
curl https://img.alicdn.com/tfs/TB1bWO6b7Y2gK0jSZFgXXc5OFXa-1570-522.jpg --output test.jpg
```
Finally, we can start to predict:
```
const predict = require('./output');

(async () => {
  const v1 = await predict('./test.jpg');
  console.log(v1);
  // {
  //   boxes: [
  //     [83, 31, 146, 71],   // xmin, ymin, xmax, ymax
  //     [210, 48, 256, 78],
  //     [403, 30, 653, 72],
  //     [717, 41, 966, 83]
  //   ],
  //   classes: [
  //     0, 1, 2, 2           // class index
  //   ],
  //   scores: [
  //     0.95, 0.93, 0.96, 0.99
  //   ]
  // }
})();
```
Note that the returned result consists of three parts (a small post-processing sketch follows this list):
- boxes: an array where each element is itself an array of four numbers: xmin, ymin, xmax, ymax
- scores: an array where each element is the confidence of the corresponding prediction
- classes: an array where each element is the predicted class index of the corresponding box
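Before generating code from these results, you would typically drop low-confidence boxes and map each class index to a component name. A minimal sketch follows; the class-name list and the 0.9 threshold are assumptions for illustration, not part of the model output:
```
// Hypothetical post-processing of the prediction result shown above.
const CLASS_NAMES = ['input', 'select', 'datepicker']; // assumed index -> component mapping

function toFormItems(result, minScore = 0.9) {
  const items = [];
  for (let i = 0; i < result.boxes.length; i++) {
    if (result.scores[i] < minScore) continue; // skip low-confidence detections
    const [xmin, ymin, xmax, ymax] = result.boxes[i];
    items.push({
      type: CLASS_NAMES[result.classes[i]] || `class_${result.classes[i]}`,
      box: [xmin, ymin, xmax, ymax],
    });
  }
  // Sort top-to-bottom, then left-to-right, so items follow the visual layout.
  return items.sort((a, b) => a.box[1] - b.box[1] || a.box[0] - b.box[0]);
}
```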
Visualizing the boxes, scores, and classes:
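One way to produce such a visualization yourself is to draw the predicted boxes onto the test image. Below is a minimal sketch using the canvas npm package, which is an extra dependency not installed by the steps above; the output file name and styling are arbitrary choices.
```
const fs = require('fs');
const { createCanvas, loadImage } = require('canvas'); // assumed extra dependency

async function drawPrediction(imagePath, result, outPath = 'prediction.png') {
  const image = await loadImage(imagePath);
  const canvas = createCanvas(image.width, image.height);
  const ctx = canvas.getContext('2d');
  ctx.drawImage(image, 0, 0);

  ctx.lineWidth = 3;
  ctx.strokeStyle = 'red';
  ctx.fillStyle = 'red';
  ctx.font = '16px sans-serif';

  result.boxes.forEach((box, i) => {
    const [xmin, ymin, xmax, ymax] = box;
    ctx.strokeRect(xmin, ymin, xmax - xmin, ymax - ymin);
    // label each box with its class index and score
    ctx.fillText(`${result.classes[i]} (${result.scores[i].toFixed(2)})`, xmin, ymin - 4);
  });

  fs.writeFileSync(outPath, canvas.toBuffer('image/png'));
}

// Usage, reusing the prediction result from the snippet above:
// await drawPrediction('./test.jpg', v1);
```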