Author: Su Chuan

Besides identifying UI components, this article also walks through the proper workflow for solving a problem with machine learning:

1. Analysis of the current situation and problems
2. Algorithm selection
3. Sample preparation
4. Model training
5. Model evaluation

Application background

imgcook takes design drafts (Sketch files, PSD files, or static images) as input and uses intelligent techniques to generate maintainable front-end code with one click. Generating code from a Sketch or Photoshop draft requires a plugin: inside the design tool, the imgcook plugin exports a JSON description of the draft (the D2C Schema), which you then paste into the imgcook visual editor. In the editor you can edit views and logic, which modifies the JSON description.

We then choose a DSL specification to generate the corresponding code. For example, generating code that follows the React specification means converting the JSON tree into React code (a custom DSL).

The image below shows the Sketch design on the left and the button code generated under the React development specification on the right.


The generated code consists only of tags such as div, img, and span, which causes problems in real application development:

  • To improve reusability, web pages are built from components such as Searchbar, Button, Tab, Switch, and Stepper.
  • Some native components, such as Statusbar, Navbar, and Keyboard, should not be generated as code at all.

If we use a component library such as Ant Design, we want the generated code to look like this:

// Antd Mobile React
import { Button } from "antd-mobile";

<div style={styles.ft}>
  <Button style={styles.col1}>Go into the store and grab a red envelope</Button>
  <Button style={styles.col2}>Add a shopping cart</Button>
</div>

To achieve this, we added a smart field to the JSON description that records the type of the node:

"smart": {
  "layerProtocol": {
    "component": {
      "type": "Button"
    }
  }
}

All we need to do is find the elements in the design draft that should be componentized and describe them in the JSON. When the DSL transforms the JSON into code, it reads the smart field and generates componentized code.

The problem then becomes: how do we find the elements in the design draft that need to be componentized, which component each one is, and where it sits in the DOM tree or in the design draft?
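The following is a minimal Python sketch, not imgcook's actual DSL, of how a code generator might honor the smart field while walking the JSON tree. Only the smart.layerProtocol.component.type path comes from the JSON above; the componentName, text, and children fields, the TAGS mapping, and the helper names are hypothetical stand-ins for the real D2C Schema.

```python
# Hypothetical mapping from D2C-style node names to plain HTML tags
TAGS = {"View": "div", "Text": "span"}


def component_type(node):
    """Return smart.layerProtocol.component.type if the node was marked as a component."""
    return node.get("smart", {}).get("layerProtocol", {}).get("component", {}).get("type")


def first_text(node):
    """Depth-first search for the first text value, used as the component's label."""
    if node.get("text"):
        return node["text"]
    for child in node.get("children", []):
        found = first_text(child)
        if found:
            return found
    return ""


def to_jsx(node, indent=0):
    pad = "  " * indent
    comp = component_type(node)
    if comp:
        # Componentized node: emit the library component instead of its raw div/span subtree
        return f"{pad}<{comp}>{first_text(node)}</{comp}>"
    tag = TAGS.get(node.get("componentName"), "div")
    children = node.get("children", [])
    if not children:
        return f"{pad}<{tag}>{node.get('text', '')}</{tag}>"
    inner = "\n".join(to_jsx(child, indent + 1) for child in children)
    return f"{pad}<{tag}>\n{inner}\n{pad}</{tag}>"


tree = {
    "componentName": "View",
    "children": [{
        "componentName": "View",
        "smart": {"layerProtocol": {"component": {"type": "Button"}}},
        "children": [{"componentName": "Text", "text": "Add a shopping cart"}],
    }],
}
print(to_jsx(tree))
```

A node marked as a Button collapses into a single Button element whose label is taken from its first descendant text; unmarked nodes fall back to plain div and span tags.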

The solution

Generating by convention rules

One approach is to define a design specification that intervenes in the generated JSON description and thereby controls the structure of the generated code. For example, our advanced intervention specification for design drafts includes a layer-naming convention for components: the component and its properties are marked explicitly in the layer name.

#component:ComponentName?property=value#
#component:Button? id=btn#

When the imgcook plugin exports the JSON description, it parses these conventions out of the layer names according to the specification.

Learning to recognize components
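A minimal sketch of parsing this naming convention in Python. The regular expression, the assumption that multiple properties are joined with & like a URL query string, and the function names are all hypothetical; the imgcook plugin may parse layer names differently.

```python
import re
from urllib.parse import parse_qsl

# "#component:Type?key=value&key2=value2#" (the "&" separator is an assumption)
LAYER_PATTERN = re.compile(r"#component:(?P<type>[A-Za-z][\w-]*)(?:\?(?P<props>[^#]*))?#")


def parse_layer_name(layer_name):
    """Return (component_type, props) for a conforming layer name, else None."""
    match = LAYER_PATTERN.search(layer_name)
    if match is None:
        return None
    props = dict(parse_qsl(match.group("props") or ""))
    return match.group("type"), props


def smart_field(component_type):
    """Wrap a component type in the smart structure used by the JSON description."""
    return {"layerProtocol": {"component": {"type": component_type}}}


print(parse_layer_name("#component:Button?id=btn#"))
print(smart_field("Button"))
```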

Manually editing design drafts to follow the agreed protocol is laborious. A page may contain many components, and the manual convention adds a lot of extra work for developers, which runs counter to imgcook's goal of improving development efficiency. We would rather use intelligent techniques to automatically identify the elements in a design draft that can be componentized, then transform the recognition results and fill them into the smart field, producing the same content as the smart field in the JSON generated through the manual component protocol.

Two things need to be done here:

  • Find component information: category, location, size, and so on.
  • Find a component's properties, such as the text “Submit” in a button.

The second task can be done by parsing the component's child elements in the JSON tree. The first task can be automated with intelligent techniques: it is a typical object detection problem in artificial intelligence, so we can try deep learning object detection to automate the manual convention process.

Learning to identify UI components

The industry status quo

There is already some research on, and application of, deep learning for identifying UI elements in web pages, along with discussions such as:

  • Use R-CNN to detect UI elements in a webpage?
  • Is machine learning suitable for detecting screen elements?
  • Any thoughts on a good way to detect UI elements in a webpage?
  • How can I detect elements of GUI using opencv?
  • How to recognize UI elements in image?

The discussions center on two main use cases:

  • Automating web page testing by identifying UI elements.
  • Generating code automatically by identifying UI elements.

Solving UI element identification with deep learning requires UI datasets annotated with element information. The open datasets most widely used in the industry are Rico and ReDraw.

ReDraw

A collection of Android screenshots, GUI metadata, and annotated GUI component images covering RadioButton, ProgressBar, Switch, Button, CheckBox, and more. It contains 14,382 UI images and 191,300 labeled GUI components, and the dataset was processed to bring each component class up to 5,000 samples. For a detailed description, see The ReDraw Dataset.

This dataset was used to train and evaluate the CNN and KNN techniques described in the ReDraw paper, published in IEEE Transactions on Software Engineering in 2018. The paper proposes a three-step approach to automating the transition from UI to code:

1. Detection

First, extract UI metadata such as bounding boxes (position and size) from the design draft, or detect it with computer vision techniques.

2. Classification

Next, mine large software repositories and use automated dynamic analysis to collect the components that appear in UIs, then use this data as the training set for a CNN that learns to classify the extracted elements into specific types such as Radio, Progress Bar, and Button.

3. Assembly

Finally, use KNN to infer the UI hierarchy, such as vertical lists and horizontal sliders.

The ReDraw system uses this approach to generate Android code. The evaluation showed that ReDraw classified GUI components with an average accuracy of 91% and assembled prototype applications that closely mirrored the target mock-ups in visual affinity while exhibiting sound code structure.

Rico

The largest mobile UI dataset at the time of its release, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. The Rico dataset spans 27 categories, more than 10,000 applications, and approximately 70,000 screenshots.

The paper introducing it, “Rico: A Mobile App Dataset for Building Data-Driven Design Applications”, was presented at the 30th Annual ACM Symposium on User Interface Software and Technology (UIST) in 2017.

Several studies and applications have since built on Rico. For example, Learning Design Semantics for Mobile Apps introduces a code- and vision-based method for adding semantic annotations to mobile UI elements: from UI screenshots and view hierarchies, it automatically identifies 25 UI component categories, 197 text button concepts, and 99 icon classes.

Application scenarios

Here are some research and application scenarios built on the datasets above.

Intelligent code generation

Machine Learning-Based Prototyping of Graphical User Interfaces for Mobile Apps | ReDraw Dataset

Intelligent layout generation

Neural Design Network: Graphic Layout Generation with Constraints | Rico Dataset

User perception prediction

Modeling Mobile Interface Tappability Using Crowdsourcing and Deep Learning | Rico Dataset

Automated UI testing

A Deep Learning based Approach to Automated Android App Testing | Rico Dataset

Problem definition

In the ReDraw approach to generating Android code described above, the second step mines large software repositories and applies automated dynamic analysis to obtain a large number of component samples for training the CNN, so that it can recognize specific types of components in the UI, such as Progress Bar and Switch.

For our imgcook scenario, the underlying problem is to find this kind of component information in the UI: categories and bounding boxes. We can therefore define it as an object detection problem and use deep learning to detect objects in the UI. So what exactly are the targets?

The detection targets are the Progress Bar, Switch, Tab Bar and other page elements that can be componentized in code.

Object detection in UIs

Basics

Machine learning

How do humans learn? Given information, the brain learns and summarizes knowledge and experience, then makes decisions or takes actions based on that experience when it encounters similar tasks.

Machine learning works in much the same way. In essence, a machine learning algorithm produces a model represented by a function f(x). If feeding a sample x to f(x) yields a category, the model solves a classification problem; if it yields a specific numeric value, it solves a regression problem.

Machine learning mirrors human learning overall, with one difference: the human brain can generalize highly applicable knowledge or experience from very little information. Having seen only a few cats and dogs, we can correctly tell cats from dogs, whereas a machine needs a great deal of training material. What machines offer in return is the ability to become intelligent without human involvement.
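To make the classification/regression distinction concrete, here is a minimal pure-Python sketch with two toy models, each returning a learned function f(x). The data points and the choice of least-squares regression and a nearest-centroid classifier are illustrative assumptions, not part of the imgcook pipeline.

```python
def fit_linear_regression(xs, ys):
    """Least-squares fit of y = a*x + b. The returned f(x) outputs a number (regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_nearest_centroid(xs, labels):
    """Nearest-centroid classifier. The returned f(x) outputs a category (classification)."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    centroids = {label: sum(v) / len(v) for label, v in groups.items()}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


# Regression: learn f from (x, y) pairs, then predict a value
f_reg = fit_linear_regression([1, 2, 3, 4], [2, 4, 6, 8])
print(f_reg(5))  # 10.0

# Classification: learn f from element widths labeled with a category, then predict a category
f_cls = fit_nearest_centroid([20, 24, 300, 320], ["icon", "icon", "banner", "banner"])
print(f_cls(30))  # icon
```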

Deep learning

Deep learning, a branch of machine learning, is a family of algorithms that attempt to build high-level abstractions of data using multiple processing layers made of complex structures or multiple nonlinear transformations.

The differences between deep learning and traditional machine learning are covered in Deep Learning vs. Machine Learning, which compares data dependence, hardware dependence, feature processing, problem-solving approach, execution time, and interpretability.

Deep learning demands a lot of data and powerful hardware, and it takes a long time to run. Its main difference from traditional machine learning lies in how features are handled. When traditional machine learning is applied to real tasks, the features describing the samples usually have to be designed by human experts, which is called “feature engineering”. The quality of those features has a crucial impact on generalization performance, and good features are hard to design. Deep learning instead uses feature learning to generate good features automatically from the data.
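As an illustration of stacking nonlinear layers, the sketch below hand-sets the weights of a two-layer network that computes XOR, which no single linear layer can represent. The weights follow a well-known textbook construction and are not learned here; in deep learning, training would find such weights (and many more of them) automatically.

```python
def relu(vector):
    """Element-wise nonlinearity: max(0, x)."""
    return [max(0.0, x) for x in vector]


def dense(vector, weights, bias):
    """Fully connected layer: one row of weights per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


def xor_network(x1, x2):
    # Layer 1: linear transform followed by a nonlinearity
    hidden = relu(dense([x1, x2], weights=[[1, 1], [1, 1]], bias=[0, -1]))
    # Layer 2: linear read-out of the hidden features
    (output,) = dense(hidden, weights=[[1, -2]], bias=[0])
    return output


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_network(a, b))
```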

Object detection

Machine learning has many applications, such as:

  • Computer vision (CV), for applications such as license plate recognition and facial recognition.
  • Information retrieval, for applications such as search engines, including text search and image search.
  • Marketing, for applications such as automated email marketing and target audience identification.
  • Medical diagnosis, for applications such as cancer identification and anomaly detection.
  • Natural language processing (NLP), for applications such as sentiment analysis and photo tagging.

Object detection is a computer technology related to computer vision and image processing that detects instances of specific semantic objects (such as people, animals, or cars) in digital images and videos.

In our case, the objects are UI design elements, whether atomic ones such as Icon, Image, and Text or componentized ones such as Searchbar and Tabbar.

Algorithm selection

Object detection methods are usually divided into machine-learning-based methods (traditional object detection) and deep-learning-based methods (deep learning object detection). Over time, the field has shifted from traditional methods to deep learning methods.

Traditional object detection methods

For machine learning-based approaches, features need to be defined using one of the following methods and then classified using techniques such as support vector machines (SVM).

  • Viola-Jones object detection framework based on Haar features
  • Scale-invariant feature transform (SIFT)
  • Histogram of oriented gradients (HOG) features
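For a flavor of hand-designed features, here is a minimal sketch of the integral image and a two-rectangle Haar-like feature, the building blocks that Viola-Jones evaluates at many positions and scales. The AdaBoost feature selection and the classifier cascade are not shown, and the function names are illustrative.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (one row/column of zero padding)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w*h rectangle at (x, y) using four lookups, independent of its size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (responds to vertical edges)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


img = [[9, 9, 1, 1],
       [9, 9, 1, 1]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 2), haar_two_rect(ii, 0, 0, 4, 2))
```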

Deep learning object detection methods

Deep-learning-based methods, usually built on convolutional neural networks (CNNs), perform end-to-end object detection without hand-defined features. They fall into one-stage and two-stage methods, plus algorithms such as RefineDet that inherit the advantages of both.

One-stage

One-stage detectors output class and location information directly from the backbone network, without a region proposal network (RPN). They are faster but slightly less accurate than two-stage detectors. Typical algorithms include:

  • SSD (Single Shot MultiBox Detector) series
  • YOLO (You Only Look Once) Series (YOLOv1, YOLOv2, YOLOv3)
  • RetinaNet

Two-stage

Two-stage detectors use a convolutional neural network to extract convolutional features and then detect objects in two steps. Training covers two parts: first the RPN, then the network that detects objects within the proposed regions.

In other words, the algorithm first generates a series of candidate boxes (proposals) and then classifies them with a convolutional neural network. These networks are more accurate but slower than one-stage detectors. Typical algorithms include:

  • R-CNN, Fast R-CNN, Faster R-CNN
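Faster R-CNN's RPN starts from anchors: reference boxes of several sizes and aspect ratios placed at every feature-map location, which the network then scores and refines into proposals. The sketch below generates such anchors in plain Python; the sizes, ratios, and half-cell center offset are illustrative assumptions, and Detectron2's own anchor generator follows its own configurable conventions.

```python
import math


def generate_anchors(feat_h, feat_w, stride, sizes=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors in image pixels for every feature-map cell.

    ratio = height / width; each anchor keeps an area of size * size.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell, projected back onto the input image
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for size in sizes:
                for ratio in ratios:
                    w = size / math.sqrt(ratio)
                    h = size * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# 2x3 feature map, 3 sizes x 3 ratios per cell -> 54 anchors
print(len(generate_anchors(2, 3, stride=16)))
```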

Others (RefineDet)

RefineDet (Single-Shot Refinement Neural Network for Object Detection) is an improvement on the SSD algorithm. It inherits the advantages of both the one-stage and two-stage designs while overcoming their disadvantages.

Comparison of object detection methods

Traditional methods vs. deep learning

The figure shows the algorithm flow of the machine-learning-based and deep-learning-based methods. Traditional object detection requires hand-designed features, obtains candidate boxes with sliding windows, and then uses a traditional classifier to decide which regions contain objects; training is split into multiple steps. Deep learning methods obtain candidates through more efficient region proposals, or regress them directly from learned features, giving better accuracy and real-time performance.

Today, research on object detection is almost entirely based on deep learning, and traditional algorithms are rarely used. Deep learning methods are also better suited to engineering use; the specific comparison is as follows:
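The traditional sliding-window pipeline can be sketched in a few lines of Python. Here classify is a hypothetical stand-in for hand-crafted features plus a classifier (for example HOG + SVM); real systems also scan an image pyramid and apply non-maximum suppression, both omitted here.

```python
def sliding_windows(img_w, img_h, win_w, win_h, step):
    """Yield every (x, y, w, h) window position, scanning left to right, top to bottom."""
    for y in range(0, img_h - win_h + 1, step):
        for x in range(0, img_w - win_w + 1, step):
            yield (x, y, win_w, win_h)


def detect(img_w, img_h, classify, window_sizes=((64, 64),), step=16):
    """Propose windows, then let the classifier accept or reject each one.

    classify(box) returns (label, score) for a hit or None for background.
    """
    detections = []
    for win_w, win_h in window_sizes:  # several window sizes to cover different object scales
        for box in sliding_windows(img_w, img_h, win_w, win_h, step):
            result = classify(box)
            if result is not None:
                label, score = result
                detections.append((box, label, score))
    return detections


# Toy classifier that "finds" a button in one window only
toy_classify = lambda box: ("button", 1.0) if box[:2] == (2, 2) else None
print(detect(4, 4, toy_classify, window_sizes=((2, 2),), step=2))
```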

One-stage vs. two-stage

Pros and cons of each algorithm

I won't go into the principles of each algorithm here; instead I'll focus on their pros and cons.

Conclusion

Because UI element detection demands high accuracy, we ultimately chose the Faster R-CNN algorithm.

Framework selection

Machine learning framework

Here is a brief look at a few machine learning frameworks: scikit-learn, TensorFlow, PyTorch, and Keras.

scikit-learn is a general-purpose machine learning framework that implements a variety of classification, regression, and clustering algorithms (including support vector machines, random forests, gradient boosting, and k-means). It also includes tools for dimensionality reduction, model selection, and data preprocessing. It is easy to install and use, and it comes with plenty of examples and detailed tutorials and documentation.

TensorFlow, Keras, and PyTorch are currently the main deep learning frameworks, each offering implementations of many deep learning algorithms. For hands-on material, there is a collection of TensorFlow, PyTorch, and Keras example resources; I agree with its author that running through these examples once is a quick way to understand and start using all three frameworks.

You can see how these frameworks are used in actual tasks in the model training code below.

Target detection framework

An object detection framework can be understood as a library that integrates object detection algorithms. For example, the deep learning framework TensorFlow is not itself an object detection framework, but it provides the Object Detection API.

The main object detection frameworks are Detectron, maskrcnn-benchmark, MMDetection, and Detectron2. Detectron2, open-sourced by Facebook AI Research (FAIR) on October 10, 2019, is currently widely used, and it is also what we use to identify UI components. Sample code for it appears later in this article.

(Related discussion: “What do you think of FAIR's Detectron2 object detection framework, open-sourced on October 10, 2019?”)

Front-end machine learning framework Pipcook

As front-end developers, we can also choose Pipcook, an open-source algorithm engineering framework developed by the intelligence group of Alibaba's Front-end Committee to help front-end engineers use machine learning.

Pipcook runs in a front-end-friendly JS environment, uses TensorFlow.js as its underlying algorithm engine, and packages algorithms for front-end business scenarios, so that front-end engineers can apply machine learning quickly and easily.

Pipcook is a pipeline-based framework that wraps seven stages of the machine learning engineering workflow for front-end developers: data collection, data access, data processing, model configuration, model training, model service deployment, and online training.

For the principles and usage of Pipcook, see the Pipcook documentation.

Sample preparation

Once the environment and model are ready, the main work in machine learning is collecting and processing the dataset. Our samples come from two sources:

  • UI screenshots from Alibaba apps: currently 25,647 mobile UI images, in which 49,120 components across 10 categories have been labeled manually.
  • Images generated automatically by code: generation supports 10 categories, and samples are annotated automatically as the images are generated.

Component Type Definition

The component types defined so far include Statusbar, Navbar, Searchbar, Tabbar, and others. Whether a target component is annotated manually or generated automatically, it needs a clear type definition:

  • Manual annotation labels components according to their clearly defined characteristics.
  • Automatic generation requires writing style code that follows those clearly defined characteristics.

For example, the definition of mobile Status Bar is as follows:

And the definition of the Tab Bar:

Samples from Alibaba app UIs

Alibaba has many apps and product lines, and their design drafts are managed centrally on an internal platform, which we can use as a sample source. For now we only use Sketch drafts, because it is difficult to export page-level images from PSD files.

Collecting samples

I'll go into a bit more detail here, because some of these steps may be instructive.

1. Download Sketch files

The first step is downloading Sketch files from Alibaba's internal platform. Each step's script name starts with its sequence number, such as 1-download-sketch.ts; because there are many sample-processing scripts, descriptive names make them easier to follow.

/**
 * [Purpose] Download Sketch files
 * [Command] ts-node 1-download-sketch.ts
 */

2. Batch export images with sketchtool

Sketch ships with a command-line tool, sketchtool, which can export images in batches. See the sketchtool documentation for more usage.

# [Purpose] Use sketchtool to export the artboards of each Sketch file as 1x PNG images
# [Command] sh 2-export-image-from-sketch $inputDir $outputDir
for file in "$1"/*
do
  sketchtool export artboards "$file" --output="$2" --formats='png' --scales=1.0
done

Sample preprocessing

Anyone who has seen designers' drafts knows the pattern: Taobao Live 4.0_V1, Taobao Live 4.0_V2, Taobao Live 4.0_V3, and so on, with little change between minor versions. Pages within a single Sketch file may likewise be the first, second, and final versions of the same detail page, so the exported images contain many duplicates.

There are also non-standard drafts we don't need, such as artboards containing a single icon, interaction drafts, and PC design drafts, so the images need some preprocessing.

3. Filter by size

Split the images into mobile and PC, and remove invalid images by size.

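# [Purpose] Sort images by width and set aside those with unusable sizes
# [Command] python3 3-classify-by-size.py $inputDir $outputDir
import os
import shutil
import sys
from PIL import Image

# Expected artboard widths; this list is an example, set it to your own design widths
width_list = [375, 750]

def move_file(src_dir, dst_dir, name):
    shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))

img_dir, other_img_dir = sys.argv[1], sys.argv[2]  # outputDir collects the rejected images
os.makedirs(other_img_dir, exist_ok=True)
for img_name in os.listdir(img_dir):
    if not img_name.lower().endswith('.png'):
        continue
    with Image.open(os.path.join(img_dir, img_name)) as img:
        width, height = img.size
    # Set aside images whose width is not an expected width
    if width not in width_list:
        print('move {}'.format(img_name))
        move_file(img_dir, other_img_dir, img_name)
    # Set aside images less than 30px high
    elif height < 30:
        print('move {}'.format(img_name))
        move_file(img_dir, other_img_dir, img_name)
    # File the remaining images into one folder per width
    else:
        width_dir = os.path.join(img_dir, str(width))
        if not os.path.exists(width_dir):
            print('mkdir:{}'.format(width))
            os.mkdir(width_dir)
        print('move {}'.format(img_name))
        move_file(img_dir, width_dir, img_name)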

4. Image deduplication

You could write your own deduplication logic based on image similarity comparison, but there are existing tools such as Duplicate Photos Fixer Pro. Briefly, as shown in the red box, it lets you adjust the detection criteria and the similarity level.

It computes a hash for each image, and you can then adjust the similarity threshold to filter out duplicates.

5. Rename the images

This step is about sample management. As the dataset grows, it may go through many versions, so descriptive names make it easier to manage. For example, generator-mobile-sample-10cate-20200101-1.png indicates the first automatically generated mobile sample from January 1, 2020, in a dataset with 10 categories.
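If you do want to write the deduplication logic yourself, a common starting point is a perceptual hash. The sketch below implements average hash (aHash) with a Hamming-distance threshold; it is one common technique, not necessarily what Duplicate Photos Fixer Pro uses, and the threshold of 5 bits is an assumed value to tune.

```python
def average_hash(pixels):
    """aHash: one bit per pixel, set when the pixel is at least the mean brightness.

    pixels is a small grayscale grid (e.g. an image downscaled to 8x8), values 0-255.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_duplicate(pixels_a, pixels_b, max_distance=5):
    return hamming_distance(average_hash(pixels_a), average_hash(pixels_b)) <= max_distance


left_dark = [[0] * 4 + [255] * 4 for _ in range(8)]
nearly_same = [row[:] for row in left_dark]
nearly_same[0][0] = 10  # a tiny change, e.g. from a re-export
inverted = [[255 - p for p in row] for row in left_dark]
print(is_duplicate(left_dark, nearly_same), is_duplicate(left_dark, inverted))
```

In practice the 8x8 grid would come from downscaling each exported PNG to grayscale first; larger hashes or other hash types trade speed for sensitivity.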
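A small helper keeps such names consistent. This sketch assumes the pattern source-platform-sample-Ncate-YYYYMMDD-index.ext inferred from the example name above, and that neither source nor platform contains a hyphen; the function names are hypothetical.

```python
from datetime import date, datetime


def sample_name(source, platform, num_categories, day, index, ext="png"):
    """Build names like generator-mobile-sample-10cate-20200101-1.png."""
    return f"{source}-{platform}-sample-{num_categories}cate-{day:%Y%m%d}-{index}.{ext}"


def parse_sample_name(name):
    """Inverse of sample_name; assumes source and platform contain no hyphens."""
    stem, ext = name.rsplit(".", 1)
    source, platform, _sample, cates, day, index = stem.split("-")
    return {
        "source": source,
        "platform": platform,
        "num_categories": int(cates[: -len("cate")]),
        "date": datetime.strptime(day, "%Y%m%d").date(),
        "index": int(index),
        "ext": ext,
    }


name = sample_name("generator", "mobile", 10, date(2020, 1, 1), 1)
print(name, parse_sample_name(name))
```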

Sample labeling

6. Semi-automatic labeling

Some components, such as StatusBar and NavBar, can be annotated automatically: because they have nearly the same position and size in almost every image, we can generate a Pascal VOC XML file for each image containing these two component categories. Annotators then only need to make small adjustments while labeling the other components manually, which saves a lot of labor.
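A minimal sketch of pre-generating such a Pascal VOC annotation with Python's standard library. The fixed StatusBar and NavBar boxes below are hypothetical values for a 750px-wide artboard, and only a minimal subset of VOC fields is written; labelImg may add further fields when it saves.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed boxes for a 750px-wide artboard: (name, xmin, ymin, xmax, ymax)
FIXED_BOXES = [("StatusBar", 0, 0, 750, 40), ("NavBar", 0, 40, 750, 128)]


def voc_annotation(filename, width, height, boxes=FIXED_BOXES):
    """Build a Pascal VOC annotation string pre-filled with the given boxes."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)
    for name, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(voc_annotation("generator-mobile-sample-10cate-20200101-1.png", 750, 1334))
```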

At present, more semi-automatic labeling methods are being explored to reduce the cost of manual labeling.

7. Manual labeling

Manual annotation uses the labelImg tool; install it by following the steps at the linked repository. Here is a brief introduction to its usage.

# Download labelImg
git clone https://github.com/tzutalin/labelImg.git
# Enter the labelImg directory
cd labelImg
# Install the dependencies following the README on GitHub, then run this to open the GUI
python3 labelImg.py

The interface is shown below; it can save annotations in Pascal VOC or YOLO format.

I won't walk through the whole interface, but here are some shortcuts that speed up annotation. Also, enable View > Auto Save mode so annotations are saved automatically.

W                  Create a new rectangle
D                  Next image
A                  Previous image
Del                Delete the selected rectangle (my laptop needs Fn + Del)
Ctrl/Command + +   Zoom in
Ctrl/Command + -   Zoom out
↑ → ↓ ←            Move the selected rectangle

Generating samples automatically with Puppeteer

Components are generated randomly according to the component type definitions, and their styles are randomized as well. As the following example shows, each positive sample carries a class starting with element-, such as element-button, so that its component category can be retrieved later.

Randomly generated pages

We write a page that randomly selects some components to display. Start the service locally and open the page, for example http://127.0.0.1:3333/#/generator:

However, this page differs too much from real UIs: it contains only positive samples on a very plain background. We improve the generated samples by cropping fragments from real UI screenshots and combining them with the auto-generated target components. A sample is shown below; can you spot the auto-generated components on the page?

Taking screenshots with Puppeteer

Once the random page (http://127.0.0.1:3333/#/generator) is accessible, we use Puppeteer to drive the page with a script, save screenshots, and retrieve the component categories and bounding boxes. The main logic is as follows:

const fs = require('fs');
const pptr = require('puppeteer');

(async () => {
  const browser = await pptr.launch();
  const page = await browser.newPage();
  // A timestamp in the URL makes the generator render a fresh random page
  await page.goto(`http://127.0.0.1:3333/#/generator/${Date.now()}`);
  // Runs inside the page: collect the category and bounding box of every .element node
  const msg = await page.evaluate(() => {
    const bbox = [];
    document.querySelectorAll('.element').forEach((element) => {
      // The category comes from the class starting with "element-", e.g. element-button -> button
      const typeClass = Array.from(element.classList).find((c) => c.startsWith('element-'));
      if (!typeClass) return;
      const rect = element.getBoundingClientRect();
      // Bounding box as [x, y, width, height], the layout COCO uses
      bbox.push({ type: typeClass.slice('element-'.length), bbox: [rect.x, rect.y, rect.width, rect.height] });
    });
    return { bbox };
  });
  // Save the sample data (to be assembled into a COCO-format annotation file)
  fs.writeFileSync('xxx.json', JSON.stringify(msg));
  // Save the UI screenshot
  await page.screenshot({ path: 'xxx.png' });
  // Close the browser
  await browser.close();
})();
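Collected boxes still need to be assembled into a COCO-format annotation file for training. Below is a minimal sketch assuming per-image lists of category names and [x, y, width, height] boxes, such as a screenshot script might collect; the input shape, file name, and function names are assumptions, and only the core COCO fields (images, annotations, categories) are produced.

```python
import json


def to_coco(samples, category_names):
    """samples: list of (file_name, width, height, boxes); each box is
    {"type": category_name, "bbox": [x, y, width, height]}."""
    cat_ids = {name: i for i, name in enumerate(category_names, start=1)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": name} for name, i in cat_ids.items()],
    }
    ann_id = 1
    for image_id, (file_name, width, height, boxes) in enumerate(samples, start=1):
        coco["images"].append({"id": image_id, "file_name": file_name, "width": width, "height": height})
        for box in boxes:
            x, y, w, h = box["bbox"]
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": cat_ids[box["type"]],
                "bbox": [x, y, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


def save_coco(coco, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(coco, f)


coco = to_coco(
    [("generator-1.png", 750, 1334, [
        {"type": "button", "bbox": [10, 20, 100, 40]},
        {"type": "tabbar", "bbox": [0, 1234, 750, 100]},
    ])],
    ["button", "tabbar"],
)
print(json.dumps(coco["annotations"][0]))
```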

Sample evaluation

The number and variety of components in the Alibaba app UI samples are unbalanced, so we balance the count of each component with automatically generated samples. But how do we evaluate the quality of the generated samples? If the generation logic for a target component can only produce, say, 10,000 distinct variants, then generating 20,000 of that component adds nothing.

How do you evaluate whether the richness and number of automatically generated samples are reasonable? That’s the question we’re exploring right now.
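Although judging sample quality is still an open question, checking class balance is mechanical. The sketch below counts annotations per category in a COCO-format dict and computes how many synthetic samples each category would need to reach a target, optionally capped by how many distinct variants the generator can produce (the 10,000-variant point above). The target and cap values are assumptions to choose per project.

```python
from collections import Counter


def category_counts(coco):
    """Annotation count per category name, including categories with zero annotations."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter({name: 0 for name in id_to_name.values()})
    counts.update(id_to_name[a["category_id"]] for a in coco["annotations"])
    return counts


def generation_quota(counts, target, max_variants=None):
    """Synthetic samples needed per category to reach target, optionally capped by
    the number of distinct variants the generator can produce for that category."""
    max_variants = max_variants or {}
    quota = {}
    for name, count in counts.items():
        needed = max(0, target - count)
        if name in max_variants:
            needed = min(needed, max_variants[name])
        quota[name] = needed
    return quota


coco = {
    "categories": [{"id": 1, "name": "button"}, {"id": 2, "name": "tabbar"}, {"id": 3, "name": "switch"}],
    "annotations": [{"category_id": 1}] * 100 + [{"category_id": 2}] * 900,
}
counts = category_counts(coco)
print(generation_quota(counts, target=1000, max_variants={"switch": 300}))
```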

Model training

Detectron2

We use Detectron2, Facebook's open-source object detection framework, and select Faster R-CNN by loading its config file with merge_from_file.

import os

from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

# Register the datasets (COCO-format annotation JSON + image folder)
# training set
register_coco_instances("train_dataset", {}, "data/train.json", "data/img")
# test set
register_coco_instances("val_dataset", {}, "data/val.json", "data/img")

cfg = get_cfg()
# Faster R-CNN with a ResNet-50 C4 backbone and the 3x schedule
cfg.merge_from_file("./lib/detectron2/configs/COCO-Detection/faster_rcnn_R_50_C4_3x.yaml")
cfg.DATASETS.TRAIN = ("train_dataset",)
cfg.DATASETS.TEST = ("val_dataset",)
cfg.DATALOADER.NUM_WORKERS = 4  # more data-loading workers keep the GPUs from idling
cfg.MODEL.WEIGHTS = "detectron2://ImageNetPretrained/MSRA/R-50.pkl"  # ImageNet-pretrained backbone
cfg.SOLVER.IMS_PER_BATCH = 4  # total images per batch across all GPUs
cfg.SOLVER.BASE_LR = 0.000025
cfg.SOLVER.MAX_ITER = 100000
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  # RoIs sampled per image for the ROI head loss
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 29  # number of component categories in the dataset
# Detectron2 has no SOLVER.NUM_GPUS key; multi-GPU training is started with detectron2.engine.launch
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

class Trainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        # The datasets are registered in COCO format, so evaluate with COCO-style AP
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        return COCOEvaluator(dataset_name, output_dir=output_folder)
    # test_with_TTA can also be overridden if test-time augmentation is needed

trainer = Trainer(cfg)
trainer.resume_or_load(resume=True)
trainer.train()

Detectron2 training produces a .pth model file. For details of the .pth format, see the PyTorch documentation on saving and loading models.

Pipcook

Pipcook already wraps data collection, data access, model training, and model evaluation, so we don't need to write Python scripts for these stages. Pipcook's detection pipeline uses Detectron2 with the Faster R-CNN algorithm; see Pipcook Plugins for the implementation.

Here is sample code for object detection using Pipcook.

const {DataCollect, DataAccess, ModelLoad, ModelTrain, ModelEvaluate, PipcookRunner} = require('@pipcook/pipcook-core');

const imageCocoDataCollect = require('@pipcook/pipcook-plugins-image-coco-data-collect').default;
const imageDetectronAccess = require('@pipcook/pipcook-plugins-detection-detectron-data-access').default;
const detectronModelLoad = require('@pipcook/pipcook-plugins-detection-detectron-model-load').default;
const detectronModelTrain = require('@pipcook/pipcook-plugins-detection-detectron-model-train').default;
const detectronModelEvaluate = require('@pipcook/pipcook-plugins-detection-detectron-model-evaluate').default;

async function startPipeline() {
  // collect detection data
  const dataCollect = DataCollect(imageCocoDataCollect, {
    url: 'http://ai-sample.oss-cn-hangzhou.aliyuncs.com/image_classification/datasets/autoLayoutGroupRecognition.zip',
    testSplit: 0.1,
    annotationFileName: 'annotation.json'
  });
  const dataAccess = DataAccess(imageDetectronAccess);
  const modelLoad = ModelLoad(detectronModelLoad, {
    device: 'cpu'
  });
  const modelTrain = ModelTrain(detectronModelTrain);
  const modelEvaluate = ModelEvaluate(detectronModelEvaluate);

  const runner = new PipcookRunner( {
    predictServer: true
  });
  runner.run([dataCollect, dataAccess, modelLoad, modelTrain, modelEvaluate])
}
startPipeline();

Model evaluation

Evaluation metrics

A good reference is the MOOC Python3 Introduction to Machine Learning: Classic Algorithms and Applications; if you don't have time for the course, you can read someone else's notes on it. Chapter 10 covers evaluating classification results with precision and recall in detail.

Here is a brief explanation.

Precision measures how many predictions are correct: if the model predicts 100 buttons and 80 of them really are buttons, precision is 80/100. Recall measures how many real targets are found: if there are actually 60 buttons and the model successfully finds 40 of them, recall is 40/60.
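The two examples above in code. The text gives only part of the counts for each example, so the sketch fills the unstated count with zero to isolate each metric; the function name is illustrative.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """precision = TP / (TP + FP): share of predictions that are correct.
    recall    = TP / (TP + FN): share of real objects that were found."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Example 1: 100 predicted buttons, 80 of them correct
p, _ = precision_recall(true_positives=80, false_positives=20, false_negatives=0)
# Example 2: 60 real buttons, 40 of them found
_, r = precision_recall(true_positives=40, false_positives=0, false_negatives=20)
print(p, round(r, 3))
```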

The performance evaluation indexes of target detection are mAP and FPS. MAP calculates the average accuracy of all categories. However, since target detection results also have a boundary box in addition to categories, how to evaluate the prediction accuracy of this boundary box? Intersection over Union refers to the concept of IoU (Intersection over Union), which is used to represent the Intersection ratio of predicted boundary box and real boundary box.
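A minimal sketch of IoU for two boxes in corner form (x1, y1, x2, y2), plus mAP as the mean of per-category AP values. Note that COCO stores boxes as [x, y, width, height], so convert before calling iou; computing AP itself (ranking detections and integrating the precision-recall curve) is left to tools such as pycocotools.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def mean_average_precision(ap_per_category):
    """mAP: the mean of the per-category AP values."""
    return sum(ap_per_category.values()) / len(ap_per_category)


# Two 10x10 boxes overlapping by half their width
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```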

As can be seen from the following evaluation results, when IoU=0.50:0.95, that is, the intersection ratio between the predicted boundary box and the real boundary box is between 0.5 and 0.95, the boundary box prediction is correct, and the accuracy rate AP is 0.772 at this time.

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.772
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.951
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.915
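To make the IoU=0.50:0.95 averaging concrete, here is a simplified illustration, not pycocotools' implementation. It assumes each detection has already been matched to a distinct ground-truth box, so only its score and IoU matter, and it uses all-point interpolation rather than COCO's 101-point interpolation. All names are hypothetical.

```python
# IoU thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift)
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def average_precision(tp_flags, num_gt):
    """All-point interpolated AP.

    tp_flags: one bool per detection, sorted by descending score.
    num_gt: number of ground-truth boxes (assumed > 0).
    """
    precisions, recalls = [], []
    tp = 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Interpolate: precision at each rank becomes the max precision at any later rank
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the precision-recall curve, one step per recall increase
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap_over_thresholds(detections, num_gt):
    """detections: (score, iou_with_matched_gt) pairs; mean AP over THRESHOLDS."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    per_threshold = [
        average_precision([iou >= t for _, iou in ranked], num_gt)
        for t in THRESHOLDS
    ]
    return sum(per_threshold) / len(per_threshold)


# Two ground-truth buttons, three detections as (score, IoU with matched box)
dets = [(0.9, 0.92), (0.8, 0.71), (0.6, 0.40)]
print(ap_over_thresholds(dets, num_gt=2))  # 0.7
```

AP is 1.0 at thresholds up to 0.70, drops to 0.5 once the second detection (IoU 0.71) stops counting, and reaches 0 at 0.95, so the average is 0.7. The same effect is why AP@[0.50:0.95] in the log is lower than AP@0.50.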

Evaluation code

The evaluation code is as follows:

from pycocotools.cocoeval import COCOeval
from pycocotools.coco import COCO

annType = 'bbox'
# Ground truth of the test set
gt_path = '/Users/chang/coco-test-sample/data.json'
# Model predictions on the test set
dt_path = '/Users/chang/coco-test-sample/predict.json'

gt = COCO(gt_path)
# If predict.json is in COCO results format (a list of detections with
# image_id, category_id, bbox and score), load it with gt.loadRes(dt_path) instead
dt = COCO(dt_path)
imgIds = sorted(gt.getImgIds())
cocoEval = COCOeval(gt, dt, annType)

# Evaluate and print metrics for each category separately
for cat in gt.loadCats(gt.getCatIds()):
    cocoEval.params.imgIds = imgIds
    cocoEval.params.catIds = [cat['id']]
    print('-' * 30, cat['name'], '-' * 30)
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()

Evaluation results

We currently train on a combination of Alibaba UI screenshots and automatically generated samples, and the mAP is roughly 75%.

------------------------------   searchbar   ---------------------------------
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=2.60s).
Accumulating evaluation results...
DONE (t=0.89s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.772
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.951
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.915
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.795
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.756
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.816
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.830
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.830
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.838
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.823

Looking at predictions on UIs from outside Alibaba, we can see that some components are easily mistaken for other types.

Model service deployment

What we want is a service that takes an image as input and returns the model's predictions for it. So once we have the model file, we need to write a model service that receives a sample and returns the model's prediction results.

import json

import cv2
from detectron2.config import get_cfg
from detectron2.engine.defaults import DefaultPredictor

# EAS python sdk
import spark

# Label mapping saved during training
with open('label.json') as f:
    mp = json.load(f)

cfg = get_cfg()
cfg.merge_from_file("./config/faster_rcnn_R_50_C4_3x.yaml")
cfg.MODEL.WEIGHTS = "./output/model_final.pth"  # weights produced by training
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(mp)  # one class per entry in label.json
cfg.MODEL.DEVICE = 'cpu'

model = DefaultPredictor(cfg)


def predict(image):
    # Assumes `image` is a path to an image file that OpenCV can read
    im = cv2.imread(image)
    out = model(im)
    data = {'status': 200}
    # trans_data (not shown here) converts the detectron2 output into JSON-serializable results
    data['content'] = trans_data(out)
    return data


num_io_threads = 8
endpoint = ''  # e.g. '127.0.0.1:8080'

spark.default_properties().put('rpc.keepalive', '60000')
context = spark.Context(num_io_threads)
queued = context.queued_service(endpoint)

# Serve requests: read one, run prediction, write back the result or an error
while True:
    receive_data = queued.read()
    try:
        msg = json.loads(receive_data.decode())
        ret = predict(msg["image"])
        queued.write(json.dumps(ret).encode())
    except Exception as e:
        queued.error(500, str(e))

Once the model is deployed, it can be called for prediction directly through an access URL such as example.com/api/predict…

Model application

Once the model is deployed, our application can call it to get predictions (the component categories and bounding boxes found in the visual draft) and match them against the JSON tree exported from the visual draft. This yields a JSON description that carries component information (the smart.layerProtocol.component field of the D2C Schema); after further processing, this final JSON becomes the input to DSL code generation.

const detectUrl = 'http://example.com/api/predict/detect';
const res = await request(detectUrl, {
    method: 'post',
    dataType: 'json',
    timeout: 1000 * 10,
    content: JSON.stringify({
        image: image,
    }),
});
const json = res.content;
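This article does not show how predictions are matched against the exported JSON tree. One plausible approach, sketched here purely as an assumption, is to walk the tree and give each node the component type of the prediction whose box overlaps it most, provided the IoU clears a threshold. The node fields rect (x, y, width, height) and children, the prediction shape {"bbox": [x1, y1, x2, y2], "type": ...}, and the 0.5 threshold are all hypothetical; only the smart.layerProtocol.component.type path comes from the D2C Schema described earlier.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def annotate_tree(node, predictions, threshold=0.5):
    """Recursively tag nodes with the best-overlapping predicted component type."""
    rect = node["rect"]  # hypothetical field name
    node_box = (rect["x"], rect["y"], rect["x"] + rect["width"], rect["y"] + rect["height"])
    best = max(predictions, key=lambda p: iou(node_box, p["bbox"]), default=None)
    if best is not None and iou(node_box, best["bbox"]) >= threshold:
        layer = node.setdefault("smart", {}).setdefault("layerProtocol", {})
        layer["component"] = {"type": best["type"]}
    for child in node.get("children", []):  # hypothetical field name
        annotate_tree(child, predictions, threshold)
    return node


tree = {
    "rect": {"x": 0, "y": 0, "width": 750, "height": 1334},
    "children": [{"rect": {"x": 20, "y": 1200, "width": 340, "height": 88}}],
}
preds = [{"bbox": [22, 1198, 360, 1290], "type": "Button", "score": 0.93}]
annotate_tree(tree, preds)
print(tree["children"][0]["smart"])
```

The child node overlaps the predicted button almost exactly and gets tagged, while the full-screen root overlaps it only slightly and is left alone. A real implementation would also need to stop one prediction from tagging both a container and its child, which this greedy per-node choice does not handle.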

For details on deploying and invoking model services, see the PAI documentation.

Future work

Because we chose a deep learning algorithm, we need a large number of training samples, so the quantity and quality of samples are the most pressing problems to solve.

So far we have more than 25,000 UI samples covering 10 categories, and the automatically generated samples also cover 10 categories. However, the manually annotated UI samples all come from Alibaba products. Although the screenshots differ, their design styles are similar and their design specifications fairly uniform, so component styles lack variety, and the model generalizes worse to design drafts from outside Alibaba than to Alibaba's own. In addition, the automatically generated samples, whose layouts and styles follow certain randomization rules, differ from real samples, and we currently have no way to evaluate their quality.

Going forward, we plan to add large industry datasets, improve the automatic sample generation logic, and explore ways to evaluate the quality of automatically generated samples.

Appendix

Data formats

We used two common dataset formats from the object detection field: MS COCO and Pascal VOC.

MS COCO

A COCO-format dataset has an img folder for the images and a JSON file for the object annotations. All sample information is stored in a single data.json file, which becomes hard to manage when there is a lot of data.

.
├── data.json
└── img
    ├── demoplus-20200216-1.png
    └── demoplus-20200216-2.png

In data.json, "images" holds the image information, "annotations" holds the labels, and "categories" holds the class definitions. Each annotation is linked to its image through image_id and to its category through category_id.

{
    "images": [
        {
            "file_name": "demoplus-20200216-1.png",
            "url": "img/demoplus-20200216-1.png",
            "width": 750,
            "height": 2562,
            "id": 1
        },
        {
            "file_name": "demoplus-20200216-2.png",
            "url": "img/demoplus-20200216-2.png",
            "width": 750,
            "height": 1334,
            "id": 2
        }
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 2,
            "category_id": 8,
            "category_name": "navbar",
            "bbox": [0, 1, 750, 90],
            "area": 67500,
            "iscrowd": 0
        }
    ],
    "categories": [
        {
            "id": 8,
            "supercategory": "navbar",
            "name": "navbar"
        }
    ]
}

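Because annotations reference images and categories only by ID, reading a COCO file usually starts by joining the three lists. A minimal sketch using only the standard library; the function names are illustrative, and boxes are converted from COCO's [x, y, width, height] to corner form.

```python
import json
from collections import defaultdict


def index_coco(data):
    """Map each image file name to a list of (category name, (x1, y1, x2, y2))."""
    names = {c["id"]: c["name"] for c in data["categories"]}
    boxes = defaultdict(list)
    for ann in data["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO bbox: top-left corner plus width and height
        boxes[ann["image_id"]].append((names[ann["category_id"]], (x, y, x + w, y + h)))
    # Images without annotations map to an empty list
    return {img["file_name"]: boxes[img["id"]] for img in data["images"]}


def load_coco(path):
    """Read a COCO-format JSON file such as data.json and index it."""
    with open(path) as f:
        return index_coco(json.load(f))


sample = {
    "images": [
        {"file_name": "demoplus-20200216-1.png", "id": 1},
        {"file_name": "demoplus-20200216-2.png", "id": 2},
    ],
    "annotations": [{"id": 1, "image_id": 2, "category_id": 8, "bbox": [0, 1, 750, 90]}],
    "categories": [{"id": 8, "name": "navbar"}],
}
print(index_coco(sample))
```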
Pascal VOC

The VOC-formatted dataset has two folders, Annotations (XML files) and JPEGImages (image data).

.
├── Annotations
│   ├── demoplus-20200216-1.xml
│   └── demoplus-20200216-2.xml
└── JPEGImages
    ├── demoplus-20200216-1.png
    └── demoplus-20200216-2.png

Sample XML file contents:

<annotation>
  <folder>PASCAL VOC</folder>
  <filename>demo.jpg</filename> <!-- file name -->
  <source> <!-- image source -->
    <database>MOBILE-SAMPLE-GENERATOR</database>
    <annotation>MOBILE-SAMPLE-GENERATOR</annotation>
    <image>ANTD-MOBILE</image>
  </source>
  <size> <!-- image size: width, height and number of channels -->
    <width>832</width>
    <height>832</height>
    <depth>3</depth>
  </size>
  <object> <!-- target information: category and bounding box -->
    <name>navbar</name>
    <bndbox>
      <xmin>0</xmin>
      <ymin>0</ymin>
      <xmax>812</xmax>
      <ymax>45</ymax>
    </bndbox>
  </object>
</annotation>
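Converting between the two formats mostly comes down to the box representation: VOC stores corners (xmin, ymin, xmax, ymax), while COCO stores [x, y, width, height]. A minimal sketch with Python's standard xml.etree.ElementTree; the function name is illustrative, and it computes width as xmax - xmin, although some converters add 1 pixel.

```python
import xml.etree.ElementTree as ET


def voc_to_coco_boxes(xml_text):
    """Return (filename, [{"name": ..., "bbox": [x, y, w, h]}]) for one VOC annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Corner coordinates -> top-left corner plus width and height
        objects.append({"name": obj.findtext("name"),
                        "bbox": [xmin, ymin, xmax - xmin, ymax - ymin]})
    return filename, objects


sample = """<annotation>
  <filename>demo.jpg</filename>
  <object>
    <name>navbar</name>
    <bndbox><xmin>0</xmin><ymin>0</ymin><xmax>812</xmax><ymax>45</ymax></bndbox>
  </object>
</annotation>"""
print(voc_to_coco_boxes(sample))
```

ElementTree skips XML comments by default, so annotated files like the sample above parse the same way.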