Background
In the previous article, “Detailed explanation of ali 99 promotion activity page content recognition technology”, we introduced the algorithm model we used to recognize page content and automate testing in the Taobao 99 promotion.
The sample problem
Taobao's large promotions involve hundreds of modules and thousands of pages. Modules resemble one another, and each module has many internal states. To identify each module type accurately, every single module needs a large number of samples. Manual annotation is costly, inefficient, and yields little data, so manpower alone cannot meet the model's demands. For this reason, this article introduces the technical scheme behind model recognition for generating data samples in bulk.
Approach
The overall technical scheme is as follows; each part is described in detail later:
Sample requirements for the model
The input to the algorithm model is a screenshot of each 99 promotion venue, and the output is the name of each target module and its coordinate position in the screenshot. Training feeds the model module renderings together with their coordinate positions and module types for supervised learning. What the model needs is a large number of image samples of each module. A module is composed of a View and a ViewModel: the View is fixed, while the ViewModel changes dynamically with the scenario. So if we take a DSL that describes the module's View, supply dynamic ViewModel data, and render the View and ViewModel into images, we can generate a practically unlimited amount of sample data.
Describe the View
After careful analysis, the View is split into atomic elements (Text, Image, Shape) and groups of atomic elements, following the same logic as the nesting of container and leaf nodes in the HTML DOM tree. With a DSL for node types and node styles, we can describe a complete View.
{
  "layers": [{
    "frame": {
      "y": 354,
      "x": 44,
      "height": 32,
      "width": 312
    },
    "id": 2,
    "type": "text",
    "value": "Adidas Stan Smith",
    "textStyles": {
      "fontFamily": "Helvetica, sans-serif",
      "fontSize": 24
    }
  }, {
    "frame": {
      "y": 0,
      "x": 384,
      "height": 342,
      "width": 342
    },
    "id": 3,
    "type": "image",
    "value": "//img.alicdn.com/bao/uploaded/i1/TB1.mcuNpXXXXctXFXXSutbFXXX.jpg_350x350Q50s50.jpg_.webp",
    "styles": {
      "height": 342,
      "width": 342
    }
  }, {
    "frame": {
      "y": 0,
      "x": 384,
      "height": 342,
      "width": 342
    },
    "id": 4,
    "type": "shape",
    "styles": {
      "height": 342,
      "width": 342,
      "backgroundColor": "rgba(0, 0, 0, 0.1)"
    }
  }],
  "frame": {
    "y": 0,
    "x": 0,
    "height": 4920,
    "width": 750
  },
  "id": 1,
  "type": "group",
  "moduleName": "pmod-zebra-recommand-item"
}
In addition to node type and node style, the outermost moduleName is the name of the module, id marks each child element, and frame is the coordinate position of each child element, which helps the algorithm model identify the child elements inside the module. value exists only on text and image nodes and holds the corresponding text value or image link.
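To illustrate how these fields could feed model training, here is a minimal Python sketch that walks a DSL tree and collects one label per node. The label layout (`id`, `name`, `box`) and the helper names are assumptions made for this example, not the format used by the team, and the image URL is a placeholder. The sketch also does not decide whether child frames are absolute or relative to their parent; it simply copies them.

```python
from typing import Any, Dict, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def iter_layers(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield the node itself, then every descendant in depth-first order."""
    yield node
    for child in node.get("layers", []):
        yield from iter_layers(child)


def frame_to_box(frame: Dict[str, int]) -> Box:
    """Convert a DSL frame into an (x, y, width, height) tuple."""
    return (frame["x"], frame["y"], frame["width"], frame["height"])


def collect_labels(dsl: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one label per node: the module first, then its child elements."""
    labels = []
    for node in iter_layers(dsl):
        labels.append({
            "id": node["id"],
            # Only the outermost group carries moduleName; children fall back to their type.
            "name": node.get("moduleName", node["type"]),
            "box": frame_to_box(node["frame"]),
        })
    return labels


# A shortened version of the DSL shown above (placeholder image URL).
example_dsl = {
    "id": 1, "type": "group", "moduleName": "pmod-zebra-recommand-item",
    "frame": {"x": 0, "y": 0, "width": 750, "height": 4920},
    "layers": [
        {"id": 2, "type": "text", "value": "Adidas Stan Smith",
         "frame": {"x": 44, "y": 354, "width": 312, "height": 32}},
        {"id": 3, "type": "image", "value": "//example.com/item.jpg",
         "frame": {"x": 384, "y": 0, "width": 342, "height": 342}},
    ],
}

for label in collect_labels(example_dsl):
    print(label)
```

The first label describes the module as a whole, and the remaining ones describe its child elements.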
Get the DSL of the module View
There are three ways to obtain the DSL of the module View: 1. Parse it from the module's source code; 2. Generate it from the Sketch design draft; 3. Extract it from the page rendered by the browser.
I finally chose the third option. I gave up on the first because coding styles differ widely between modules, a lot of presentation logic lives in the JS code, and one would have to handle the mapping between sub-views generated in for loops and their styles, which is too complicated. The second option, imgCook, is an existing technical solution within the group; its accuracy is said to be good and it is being continuously optimized. I settled on the third because it can restore the module DSL with 100% accuracy and only needs to care about how the module looks when it is finally shown to users. There is no need to worry about all the complicated business logic developers wrote along the way, so the complexity is much lower.
Technical solution
During development, each module gets a corresponding preview page once it is finished. I used puppeteer to simulate a real browser, extract the node information of the module, and save it as a standard DSL.
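The extraction itself runs in the browser through puppeteer. The sketch below illustrates only the conversion step, in Python, under the assumption that the page has already been dumped into a nested snapshot of nodes with `tag`, `rect`, `style`, `text`, `src`, and `children` keys. That snapshot shape, the classification rules, and the function names are all hypothetical.

```python
from typing import Any, Dict, List, Optional


def classify(node: Dict[str, Any]) -> str:
    """Pick a DSL node type for a snapshot node (hypothetical rules)."""
    if node.get("children"):
        return "group"
    if node.get("tag") == "img":
        return "image"
    if node.get("text"):
        return "text"
    return "shape"


def to_dsl(node: Dict[str, Any], counter: Optional[List[int]] = None) -> Dict[str, Any]:
    """Convert a nested snapshot into the layered DSL, numbering nodes depth-first."""
    counter = counter if counter is not None else [0]
    counter[0] += 1
    kind = classify(node)
    # The DSL above stores text styles under textStyles and other styles under styles.
    style_key = "textStyles" if kind == "text" else "styles"
    layer: Dict[str, Any] = {
        "id": counter[0],
        "type": kind,
        "frame": dict(node["rect"]),
        style_key: dict(node.get("style", {})),
    }
    if kind == "text":
        layer["value"] = node["text"]
    elif kind == "image":
        layer["value"] = node["src"]
    elif kind == "group":
        layer["layers"] = [to_dsl(child, counter) for child in node["children"]]
    return layer


# Hypothetical snapshot of a small module preview page.
snapshot = {
    "tag": "div", "rect": {"x": 0, "y": 0, "width": 750, "height": 400},
    "children": [
        {"tag": "span", "text": "Adidas Stan Smith", "style": {"fontSize": 24},
         "rect": {"x": 44, "y": 354, "width": 312, "height": 32}},
        {"tag": "img", "src": "//example.com/item.jpg",
         "rect": {"x": 384, "y": 0, "width": 342, "height": 342}},
    ],
}

dsl = to_dsl(snapshot)
dsl["moduleName"] = "example-module"  # hypothetical module name
print(dsl["type"], [child["type"] for child in dsl["layers"]])
```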
Cleaning up window.getComputedStyle
Retrieving the style of a DOM node with window.getComputedStyle returns an object containing 280 style attributes. Storing all 280 style attributes of every DOM node in the DSL causes two problems:
- DSL files become redundant and large, and take time to parse.
- It raises the cost for the algorithm engineers of understanding and adjusting the DSL.
The first step is to remove default property values. Most style attributes hold their default values, so we strip those out first.
{
  alignSelf: 'auto',
  ...
}
The second step is to eliminate ineffective attributes. Developers commonly use only about 20 style attributes, and many of the attributes returned have no effect on rendering. We remove these ineffective style attributes, for example:
{
  zoom: '1',
  writingMode: 'horizontal-tb',
  ...
}
The third step is to convert the transform value. The transform value returned by getComputedStyle is a matrix() function. Interested readers can read up on 2D transformation matrices. We use puppeteer to emulate a browser with the screen width set to 750, so the translateX and translateY components of the returned transform value are relative to a width of 750. To render the DSL into images as described below (the algorithm engineers want to simulate a variety of screen sizes when generating samples), the transform value must be converted to the corresponding value for the target screen.
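# Implemented in Python so that the algorithm engineers can use it directly in the DSL rendering tool.
# screenshotShape is an array holding the target screen width and height.
# style is the style dict of one DSL node; 750 is the width used when the DSL was captured.
transform = style.get('transform', '')
if transform.startswith('matrix('):
    # 'matrix(a, b, c, d, tx, ty)' -> ['a', ' b', ' c', ' d', ' tx', ' ty']
    matrix = transform[7:-1].split(',')
    translate = list(map(float, matrix[-2:]))
    # Rescale translateX and translateY from the 750-wide capture to the target width.
    translateResult = [str(distance * (screenshotShape[0] / 750)) for distance in translate]
    matrix[-2:] = translateResult
    # Assumed final step (not in the original snippet): write the rescaled matrix back.
    style['transform'] = 'matrix(' + ','.join(matrix) + ')'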
After these three steps, the number of style attributes per DOM node generally stays within 20, which greatly simplifies the output DSL.
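The first two cleaning steps can be sketched in Python as a simple filter. The two lookup tables below are hypothetical and contain only a few entries for illustration; in practice they would be built from the browser's default values and from the attributes found to have no visual effect.

```python
from typing import Dict, Mapping, Set

# Hypothetical tables with a handful of entries for illustration only.
DEFAULT_VALUES: Dict[str, str] = {"alignSelf": "auto", "opacity": "1"}
INEFFECTIVE: Set[str] = {"zoom", "writingMode"}


def clean_style(style: Mapping[str, str],
                defaults: Mapping[str, str] = DEFAULT_VALUES,
                ineffective: Set[str] = INEFFECTIVE) -> Dict[str, str]:
    """Drop attributes that have no effect or still hold their default value."""
    cleaned = {}
    for name, value in style.items():
        if name in ineffective:
            continue  # step 2: attribute has no visual effect
        if defaults.get(name) == value:
            continue  # step 1: attribute still has its default value
        cleaned[name] = value
    return cleaned


computed = {
    "alignSelf": "auto",
    "zoom": "1",
    "writingMode": "horizontal-tb",
    "fontSize": "24px",
    "opacity": "0.5",
}
print(clean_style(computed))  # {'fontSize': '24px', 'opacity': '0.5'}
```

Note that an attribute listed in the defaults table is kept whenever its value differs from the default, as `opacity` is here.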
Rendering the DSL into images
Similarly, we can use puppeteer to drive a page, render the DSL into the target module page, and take a screenshot.
First, establish the mapping between DSL node types and HTML tags.
Second, if a DSL node is of type group, recursively traverse all of its children, and so on. The complete rendering flow chart is as follows:
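In code, the two steps could look roughly like the Python sketch below, which turns a DSL node into an HTML string that a browser page could then load and screenshot. The type-to-tag mapping, the use of absolute positioning, and the treatment of bare numbers as pixel values are assumptions of this sketch, not the mapping used in the actual tool. Because it nests absolutely positioned elements, it also assumes that each child frame is relative to its parent group.

```python
from html import escape
from typing import Any, Dict

# Assumed mapping from DSL node type to HTML tag.
TAGS = {"group": "div", "shape": "div", "text": "span", "image": "img"}


def css_name(name: str) -> str:
    """Convert a camelCase style key such as fontSize into font-size."""
    return "".join("-" + ch.lower() if ch.isupper() else ch for ch in name)


def css_value(value: Any) -> str:
    """Treat bare numbers as pixel lengths (an assumption of this sketch)."""
    return f"{value}px" if isinstance(value, (int, float)) else str(value)


def render(node: Dict[str, Any]) -> str:
    """Render one DSL node, recursing into the children of group nodes."""
    frame = node["frame"]
    style = {"position": "absolute", "left": frame["x"], "top": frame["y"],
             "width": frame["width"], "height": frame["height"]}
    style.update(node.get("styles", {}))
    style.update(node.get("textStyles", {}))
    css = ";".join(f"{css_name(k)}:{css_value(v)}" for k, v in style.items())
    attrs = f'style="{escape(css, quote=True)}"'
    tag = TAGS[node["type"]]
    if node["type"] == "image":
        return f'<img src="{escape(node["value"], quote=True)}" {attrs}>'
    if node["type"] == "text":
        return f"<{tag} {attrs}>{escape(node['value'])}</{tag}>"
    children = "".join(render(child) for child in node.get("layers", []))
    return f"<{tag} {attrs}>{children}</{tag}>"


shape = {"id": 4, "type": "shape",
         "frame": {"x": 0, "y": 0, "width": 10, "height": 10},
         "styles": {"backgroundColor": "rgba(0, 0, 0, 0.1)"}}
group = {"id": 1, "type": "group",
         "frame": {"x": 0, "y": 0, "width": 750, "height": 100},
         "layers": [shape]}
print(render(group))
```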
ViewModel dynamic data
When a module is used in both the 99 promotion venue and the Double 11 venue, its style stays the same and only the data differs. The dynamic data is generally product images and product information.
Xianyu (Idle Fish) has more than one hundred million product records. If product data is combined with the View and rendered into modules, each module gets tens of thousands of display variations. Because this is the same kind of input the algorithm model receives during actual recognition, it both meets the sample-size requirement and matches the model's real recognition scenario, which further improves the model's accuracy.
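As a rough Python illustration of combining one View with many product records, the sketch below copies the View DSL once per product and overwrites the `value` of text and image nodes. The product field names (`title`, `image`) and the rule that every text node receives the title and every image node receives the product image are simplifying assumptions; a real module would map specific nodes to specific fields.

```python
import copy
from typing import Any, Dict, Iterable, Iterator


def bind_view_model(view: Dict[str, Any], product: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of the View DSL with text and image values taken from one product."""
    bound = copy.deepcopy(view)

    def fill(node: Dict[str, Any]) -> None:
        if node["type"] == "text":
            node["value"] = product["title"]
        elif node["type"] == "image":
            node["value"] = product["image"]
        for child in node.get("layers", []):
            fill(child)

    fill(bound)
    return bound


def generate_samples(view: Dict[str, Any],
                     products: Iterable[Dict[str, str]]) -> Iterator[Dict[str, Any]]:
    """Yield one bound DSL per product; the shared View itself is never modified."""
    for product in products:
        yield bind_view_model(view, product)


# Hypothetical View and product records.
view = {
    "id": 1, "type": "group", "moduleName": "example-module",
    "frame": {"x": 0, "y": 0, "width": 750, "height": 400},
    "layers": [
        {"id": 2, "type": "text", "value": "placeholder",
         "frame": {"x": 44, "y": 354, "width": 312, "height": 32}},
        {"id": 3, "type": "image", "value": "placeholder",
         "frame": {"x": 384, "y": 0, "width": 342, "height": 342}},
    ],
}
products = [
    {"title": "Product A", "image": "//example.com/a.jpg"},
    {"title": "Product B", "image": "//example.com/b.jpg"},
]

samples = list(generate_samples(view, products))
print(len(samples), samples[1]["layers"][0]["value"])
```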
Results
With this sample generation pipeline, each module can supply the algorithm engineers with tens of thousands of high-quality sample screenshots, which brings the accuracy of the model above 98%.
Outlook
This article described how to generate samples in batches to help the algorithm model identify each module in the 99 promotion and Double 11 (Singles' Day) venues.
At present, dynamic adjustment of a module's DSL depends on the algorithm engineers' understanding of the module, e.g. changing the borderRadius of rounded corners to generate more positive samples, or adding noise, e.g. deleting product content nodes to generate negative samples. All of these operations require the algorithm engineers to customize the DSL configuration by hand. In the future, we hope to hand this part of the work over to the model as well, letting the model make decisions about sample generation, tweak parts of the DSL, and produce samples with richer and more reliable styles.
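The two manual adjustments mentioned above can be sketched in Python as small DSL transformations. Which nodes count as product content is not specified in this article, so the sketch selects nodes by type; that rule, the helper names, and the example module are assumptions.

```python
import copy
from typing import Any, Dict, Set


def with_border_radius(dsl: Dict[str, Any], radius: int) -> Dict[str, Any]:
    """Positive sample: same module, different corner radius on image and shape nodes."""
    variant = copy.deepcopy(dsl)
    stack = [variant]
    while stack:
        node = stack.pop()
        if node["type"] in ("image", "shape"):
            node.setdefault("styles", {})["borderRadius"] = radius
        stack.extend(node.get("layers", []))
    return variant


def without_types(dsl: Dict[str, Any], removed: Set[str]) -> Dict[str, Any]:
    """Negative sample: drop every child node whose type is in `removed`."""
    variant = copy.deepcopy(dsl)

    def prune(node: Dict[str, Any]) -> None:
        if "layers" in node:
            node["layers"] = [c for c in node["layers"] if c["type"] not in removed]
            for child in node["layers"]:
                prune(child)

    prune(variant)
    return variant


# Hypothetical module with one text node and one image node.
module = {
    "id": 1, "type": "group", "moduleName": "example-module",
    "frame": {"x": 0, "y": 0, "width": 750, "height": 400},
    "layers": [
        {"id": 2, "type": "text", "value": "Product A",
         "frame": {"x": 44, "y": 354, "width": 312, "height": 32}},
        {"id": 3, "type": "image", "value": "//example.com/a.jpg",
         "frame": {"x": 384, "y": 0, "width": 342, "height": 342}},
    ],
}

positive = with_border_radius(module, 12)
negative = without_types(module, {"text", "image"})
print(positive["layers"][1]["styles"], len(negative["layers"]))
```

Both helpers work on deep copies, so the original module DSL can be reused to derive any number of variants.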
For more details, please continue to follow the Xianyu official account.