Article by Tianke, Alibaba Taobao F(x) Team
This article shows how to use Pipcook to quickly train a form recognition model and use it to improve the efficiency of form development.
Pain points
One of the pain points of restoring a page on the front end is forms: the designer draws a form in the design draft, and you have to go to AntD or Fusion, find a similar form, and copy its code. This is inefficient and cumbersome.
What if you could just take a screenshot and generate the form code from it? The answer is yes, you can.
Solution
We can train an object detection model whose input is a screenshot of a form and whose output is the type and coordinates of every form item. With it, you only need to take a screenshot of the form in the design draft to get all the form items in it, and combined with the labels recognized from the text, you can generate the form code. For example, I have previously implemented the ability to generate form code from screenshots.
The red boxes in the figure are the form items detected by the object detection model, and the green boxes are the text recognized by the text recognition API. Combining the two with some calculation, you can generate a form protocol or the form code itself.
Text recognition is a general-purpose capability, so we won't go into it here. So where does the form item detection capability come from? Here are the general steps:
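To make "with some calculation" a little more concrete, here is a minimal sketch of one way to merge the two results: for each detected form item, pick the nearest recognized text as its label. The input shapes ({ type, box } for form items, { text, box } for OCR results) and the helper itself are assumptions for illustration, not the output of any specific API.
```
// Hypothetical sketch: pair each detected form item with its nearest text label.
// Boxes are [xmin, ymin, xmax, ymax]; input/output shapes are made up for illustration.
function centerOf([xmin, ymin, xmax, ymax]) {
  return { x: (xmin + xmax) / 2, y: (ymin + ymax) / 2 };
}

function buildFormSchema(formItems, ocrResults) {
  return formItems.map(item => {
    const c = centerOf(item.box);
    let label = '';
    let bestDist = Infinity;
    for (const t of ocrResults) {
      const tc = centerOf(t.box);
      const dist = Math.hypot(c.x - tc.x, c.y - tc.y);
      if (dist < bestDist) {
        bestDist = dist;
        label = t.text;
      }
    }
    return { type: item.type, label };
  });
}

// Example:
// buildFormSchema(
//   [{ type: 'input', box: [83, 31, 146, 71] }],
//   [{ text: 'Name', box: [20, 35, 70, 60] }]
// ) // => [{ type: 'input', label: 'Name' }]
```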
- Sample: collect thousands of form images and annotate the form items in them.
- Training: feed the samples to the machine so it can learn from them.
- Prediction: after training, pass a new form image to the model, and it predicts the form items in it.
Here’s how to do each step in detail.
Sample
The form recognition samples here are general object detection samples; refer to the previous article for how to annotate them. For convenience, a ready-made sample dataset for form recognition is provided here:
http://ai-sample.oss-cn-hangzhou.aliyuncs.com/pipcook/datasets/mid/mid_base.zip
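The data-collect plugin used later (@pipcook/plugins-object-detection-pascalvoc-data-collect) expects annotations in the standard Pascal VOC XML format, so the dataset above is organized that way. For reference, a single annotation file looks roughly like this; the file name, class name, and coordinates are made up for illustration:
```
<annotation>
  <filename>form_0001.jpg</filename>
  <size>
    <width>1570</width>
    <height>522</height>
    <depth>3</depth>
  </size>
  <object>
    <name>input</name>
    <bndbox>
      <xmin>83</xmin>
      <ymin>31</ymin>
      <xmax>146</xmax>
      <ymax>71</ymax>
    </bndbox>
  </object>
</annotation>
```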
Training
Next, I will demonstrate how to use Pipcook to train the object detection model on the sample dataset above.
Introducing Pipcook
Pipcook is a machine learning application framework for front-end developers, developed by the D2C team in Alibaba's Taobao technology department. We hope Pipcook can become the platform where front-end engineers learn and practice machine learning, and thus help advance front-end intelligence. Pipcook (github.com/alibaba/pip…) is an open-source framework, and everyone is welcome to contribute.
Installation
Make sure your Node.js version is 12 or later, then execute:
```
# install cnpm at the same time, mainly to speed up installation
npm i @pipcook/pipcook-cli cnpm -g --registry=https://registry.npm.taobao.org
```
Next, initialize:
```
pipcook init --tuna -c cnpm
pipcook daemon start
```
Configuration
Form recognition is an object detection task, so we create a configuration file in JSON format as shown below. Don't worry: most of the parameters in this configuration file don't need to be changed; only a few do.
form.json
```
{
  "plugins": {
    "dataCollect": {
      "package": "@pipcook/plugins-object-detection-pascalvoc-data-collect",
      "params": {
        "url": "http://ai-sample.oss-cn-hangzhou.aliyuncs.com/pipcook/datasets/mid/mid_base.zip"
      }
    },
    "dataAccess": {
      "package": "@pipcook/plugins-coco-data-access"
    },
    "modelDefine": {
      "package": "@pipcook/plugins-detectron-fasterrcnn-model-define"
    },
    "modelTrain": {
      "package": "@pipcook/plugins-detectron-model-train",
      "params": {
        "steps": 20000
      }
    },
    "modelEvaluate": {
      "package": "@pipcook/plugins-detectron-model-evaluate"
    }
  }
}
```
You only need to set one parameter under dataCollect.params:
- url: the address of your sample dataset.
Of course, you can also run this configuration file as-is to train a form detection model.
Run
Because the object detection model is computationally heavy, you will probably need a GPU machine; otherwise, training may take several weeks…
```
pipcook run form.json --tuna
```
The training might be a bit long, so go have lunch or write some business code.
When the training is complete, a model is generated in the output directory.
Prediction
After training, an output directory will be generated in the current directory. It is a brand-new npm package, so we first install its dependencies:
```
cd output
BOA_TUNA=1 npm install
```
Once the environment is installed, go back to the root directory and download a test image as test.jpg:
```
cd ..
curl https://img.alicdn.com/tfs/TB1bWO6b7Y2gK0jSZFgXXc5OFXa-1570-522.jpg --output test.jpg
```
Finally, we can start to predict:
```
const predict = require('./output');

(async () => {
  const v1 = await predict('./test.jpg');
  console.log(v1);
  // {
  //   boxes: [
  //     [83, 31, 146, 71],   // xmin, ymin, xmax, ymax
  //     [210, 48, 256, 78],
  //     [403, 30, 653, 72],
  //     [717, 41, 966, 83]
  //   ],
  //   classes: [
  //     0, 1, 2, 2           // class index
  //   ],
  //   scores: [
  //     0.95, 0.93, 0.96, 0.99
  //   ]
  // }
})();
```
Note that the returned result consists of three parts (a small post-processing sketch follows this list):
- boxes: an array where each element is itself an array of four numbers: xmin, ymin, xmax, ymax
- scores: an array where each element is the confidence of the corresponding prediction
- classes: an array where each element is the predicted class index of the corresponding box
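Before generating code from these results, you would typically drop low-confidence boxes and map each class index to a component name. A minimal sketch follows; the class-name list and the 0.9 threshold are assumptions for illustration, not part of the model output:
```
// Hypothetical post-processing of the prediction result shown above.
const CLASS_NAMES = ['input', 'select', 'datepicker']; // assumed index -> component mapping

function toFormItems(result, minScore = 0.9) {
  const items = [];
  for (let i = 0; i < result.boxes.length; i++) {
    if (result.scores[i] < minScore) continue; // skip low-confidence detections
    const [xmin, ymin, xmax, ymax] = result.boxes[i];
    items.push({
      type: CLASS_NAMES[result.classes[i]] || `class_${result.classes[i]}`,
      box: [xmin, ymin, xmax, ymax],
    });
  }
  // Sort top-to-bottom, then left-to-right, so items follow the visual layout.
  return items.sort((a, b) => a.box[1] - b.box[1] || a.box[0] - b.box[0]);
}
```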
Visualizing the boxes, scores, and classes:
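One way to produce such a visualization yourself is to draw the predicted boxes onto the test image. Below is a minimal sketch using the canvas npm package, which is an extra dependency not installed by the steps above; the output file name and styling are arbitrary choices.
```
const fs = require('fs');
const { createCanvas, loadImage } = require('canvas'); // assumed extra dependency

async function drawPrediction(imagePath, result, outPath = 'prediction.png') {
  const image = await loadImage(imagePath);
  const canvas = createCanvas(image.width, image.height);
  const ctx = canvas.getContext('2d');
  ctx.drawImage(image, 0, 0);

  ctx.lineWidth = 3;
  ctx.strokeStyle = 'red';
  ctx.fillStyle = 'red';
  ctx.font = '16px sans-serif';

  result.boxes.forEach((box, i) => {
    const [xmin, ymin, xmax, ymax] = box;
    ctx.strokeRect(xmin, ymin, xmax - xmin, ymax - ymin);
    // label each box with its class index and score
    ctx.fillText(`${result.classes[i]} (${result.scores[i].toFixed(2)})`, xmin, ymin - 4);
  });

  fs.writeFileSync(outPath, canvas.toBuffer('image/png'));
}

// Usage, reusing the prediction result from the snippet above:
// await drawPrediction('./test.jpg', v1);
```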