Background

Imgcook's ability to automatically generate code comes down to two things: identifying information in the visual design draft, and then expressing that information as code.

The essence is to extract the JSON description information from the design draft through the design-tool plug-in, process and transform that JSON with a rule system, computer vision, machine learning and other intelligent restoration techniques to obtain a JSON that conforms to code structure and code semantics, and then use a DSL converter to turn it into front-end code. The DSL converter is a JS function: the input is a JSON, and the output is the code we need.

For example, the React DSL outputs React code that conforms to the React development specification, and its core part is the JSON-to-JSON transformation.
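To make this concrete, below is a minimal sketch of such a JSON-to-code function. The node shape (componentName, props, children) follows the schema examples later in this article, but the rendering logic is deliberately simplified and is not imgcook's actual React DSL implementation:

// Minimal sketch of a DSL converter: schema JSON in, code string out.
// The schema shape (componentName, props, children) follows the examples
// used later in this article; the rendering logic is simplified for illustration.
function jsonToReactCode(node) {
  const props = Object.entries(node.props || {})
    .map(([key, value]) => ` ${key}=${JSON.stringify(value)}`)
    .join('');
  const children = (node.children || []).map(jsonToReactCode).join('');
  return `<${node.componentName}${props}>${children}</${node.componentName}>`;
}

// e.g. jsonToReactCode({ componentName: 'View', props: { className: 'side' }, children: [] })
// => '<View className="side"></View>'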

In the design draft there is only meta information such as images and text, and positions are absolute coordinates. The directly generated code is therefore composed of element-granularity tags such as DIV, IMG, SPAN or View, Image, Text. In actual development, however, we componentize the UI into materials of different granularity: basic components such as search boxes and buttons, components with business semantics such as timers, coupons, videos and carousels, or even larger-granularity UI blocks.

If we want to generate component-granularity code, we need to be able to identify the components in the visual draft and express them as componentized code. For example, the rice cooker area in the following visual draft is actually a video, but only image information can be extracted from the visual draft, so the code on the right is generated.

 

The actual generated code needs to express this with the Rax component rax-video, as follows:

import { createElement, useState, useEffect, memo } from 'rax';
import View from 'rax-view';
import Picture from 'rax-picture';
import Text from 'rax-text';
import Video from 'rax-video';

<View className="side">
  <Video
    className="group"
    autoPlay={true}
    src="//cloud.video.taobao.com/play/u/2979107860/p/1/e/6/t/1/272458092675.mp4"
  />
</View>

So we need to do two things:

  • Identification: Identify the componentized parts of the design draft, down to specific DOM nodes.
  • Expression: express them with front-end components, including importing the component package, replacing the component name, and setting the component properties.

Technical solution

Identification scheme

According to the classification of intelligent capability levels, level L1 generates componentized code with manual assistance through a design-draft protocol; level L2 uses rule algorithms to analyze element patterns and generate componentized code; level L3 uses a target detection model to identify components, but the target detection scheme cannot avoid the problem of low model accuracy caused by the complex backgrounds of design drafts. After exploring an image classification scheme, the trained model achieves high accuracy within a specific business domain even when the design drafts are very complex. At present we are optimizing the algorithm engineering link toward level L4 to reduce the cost of business access.

(Component identification capability Model)

Phase L1 manually assisted generation: design draft component protocol

Tag the component name directly on the layer in the design draft; when the JSON description data is exported with the imgcook plug-in, the tag is parsed to obtain the manually declared component information on that layer.

(Manually set component protocol to generate componentized code)

Phase L2 automatic rule-based generation: style rule matching

The previous method requires manually marking up the visual draft to fill in component names and attributes. A page may contain many components, so this manual convention adds a lot of extra work for developers. We would like to automatically and intelligently identify the UI that needs to be componentized in the visual draft.

Rule algorithms can automatically detect some components with common style characteristics. For example, a node with four rounded corners whose width is greater than its height can be judged to be a button, as sketched below. However, the generalization ability of such rules is poor and cannot cope with complex and diverse visual representations.
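As a concrete illustration of this kind of rule, a minimal sketch might look like the following. The style field names are assumptions for illustration; real rules would check many more conditions:

// Minimal sketch of a style-based rule: a node with rounded corners and a
// width larger than its height is judged to be a button.
// The style field names here are illustrative assumptions.
function looksLikeButton(node) {
  const style = node.style || {};
  const hasRoundedCorners = (style.borderRadius || 0) > 0;
  const widerThanTall = (style.width || 0) > (style.height || 0);
  return hasRoundedCorners && widerThanTall;
}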

Phase L3-L4 intelligence-assisted generation: learning to identify UI components

Finding the elements in the visual draft that need to be componentized, determining what components they are, and locating them in the DOM tree or in the design draft is a problem that deep learning techniques are well suited to solve: a model can take in a large amount of rich sample data, learn and generalize from experience, and predict the categories of similar component samples. The similarity is no longer limited to width, height and style as in rule algorithms, so the generalization ability is much stronger.

How do we use deep learning to identify UI components? In an earlier article I defined this as a target detection problem: with deep-learning-based target detection on UI images, the categories and bounding boxes of componentized UI on a page can be found. However, that article mainly used UI component recognition as an example to introduce how deep learning solves such problems, without considering practical application. This article focuses on solving the practical problem of D2C componentized coding and shares how to apply component identification capabilities in real projects.

It is difficult for us to collect samples from all users and provide a general-purpose component recognition model with high accuracy. Moreover, the component categories and styles used by different teams differ: samples of the same category may look very different, while samples of different categories may look very similar, which would make recognition perform poorly. Therefore we need to support users in training a proprietary component recognition model with their own components as the training set. This article introduces the application scheme of component identification using several components commonly used in Taobao marketing as examples.

Image-based target detection scheme

How do we use deep learning to identify UI components? The earlier article mentioned above introduces target detection in detail: the image of the visual draft is used as input to train a target detection model, which is then used to identify the components in the picture.

(Target detection model training and prediction path)

As shown above, training a target detection model requires a large number of samples. Each sample is the full image of a visual draft, annotated with the components that the model should recognize. The trained target detection model can then identify those components: when a new design draft needs to be recognized, its image is fed into the model, and the model returns the recognition results.

There are some problems with schemes that use target detection:

  • Samples need to be labeled entirely by hand: UI images must be collected and the components in each image annotated. If a category is added, every picture needs to be re-labeled, so the labeling cost is very high;
  • Both the correct location and the correct category need to be identified. The background of a visual draft image is very complex, so components are easily misidentified;
  • Even if the identified category is accurate, there will be positional deviation.

In the scenario where imgcook intelligently generates code, the result of component recognition needs to be accurate down to a specific DOM node, while the target detection scheme has to get both the exact position and the correct category right. The model accuracy in offline experiments was not high, so the accuracy of online applications would be even lower, and it is almost impossible to determine which DOM node the final recognition result should be attached to.

Image classification scheme based on layout tree

Since we can obtain the JSON description information of the image from the design draft, each text node and image node already carries location information, and a reasonable layout tree can be generated after imgcook's intelligent restoration. So, based on this layout tree, we can crop the candidate component nodes at the granularity of container nodes.

(Image classification model training and prediction path)

For example, we can crop out all the Div/View nodes here to get a small collection of images, and then send these images to an image classification model for prediction. In this way an object detection problem is turned into an image classification problem.

The model assigns a probability value to each image for each category; the higher the probability value of a category, the more likely the model considers the image to belong to that category. We can set a confidence threshold of 0.7: when the probability value is greater than 0.7, the prediction is taken as the final classification result, as sketched below. For example, in the figure above only two images end up as credible recognition results. If the classification accuracy is very high, the confidence threshold can be set higher.
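A rough sketch of this flow is shown below, assuming hypothetical `cropNodeImage` and `classify` helpers that crop a node's image and return a category with a probability; these helper names and return shapes are assumptions for illustration, not imgcook APIs:

// Sketch: crop candidate container nodes from the layout tree, classify each
// cropped image, and keep only predictions above the confidence threshold.
const CONFIDENCE_THRESHOLD = 0.7;

async function recognizeComponents(containerNodes) {
  const results = [];
  for (const node of containerNodes) {
    const image = await cropNodeImage(node);                  // crop by the node's absolute position
    const { category, probability } = await classify(image);  // image classification model
    if (probability > CONFIDENCE_THRESHOLD) {
      results.push({ node, category, probability });          // credible recognition result
    }
  }
  return results;
}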

Compared with target detection, the image classification scheme allows samples to be generated automatically by a program without manual labeling, and only the category needs to be identified: if the category is accurate, the location information is exactly accurate. So we adopt the image classification scheme based on the result of layout recognition, and the recognition accuracy is greatly improved.

Expression scheme

The layout algorithm generates the JSON Schema after layout and passes it into the component classification recognition layer. The component recognition result is written back into the JSON Schema and passed on to the next layer.

(Where component classification recognition sits in the technical hierarchy)

We can visually inspect the result of component recognition. The recognition result is attached to the smart field of the corresponding node.

(Component classification results)
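For reference, a node carrying such a recognition result might look roughly like the sketch below. The smart.layerProtocol.component.type path matches the one read by the custom recognition function shown later in this article; the remaining fields are simplified for illustration:

// Simplified sketch of a D2C Schema node after component classification.
// The recognition result sits under the node's `smart` field; other fields
// are illustrative.
const recognizedNode = {
  componentName: 'Div',
  props: { className: 'group' },
  smart: {
    layerProtocol: {
      component: { type: 'videobtn' }, // category predicted by the model
    },
  },
  children: [ /* e.g. a Text node holding the time text */ ],
};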

As can be seen in (Image classification model training and prediction path), after component classification is applied to the images cropped out according to the layout structure, multiple nodes may be identified as videobtn due to model accuracy issues.

Now we need to find the node that needs to be replaced with component Video according to the result of component identification. These problems will be encountered:

  • How do I find DOM nodes that should be replaced with components based on component identification results?
  • Once you find the node that should be replaced with a component, how do you know what the component name is?
  • How do I extract component property values from the visual draft after the component name has been replaced? For example, the time needed by the VideoBtn component here, or other properties such as the poster (cover image) property of the Video component, or the text property of a Button component?
  • What if the identification is inaccurate?
  • After the recognition results are applied and nodes are replaced with components, can the canvas still render a WYSIWYG result?

Based on these problems, in order to apply the results of component identification to the engineering link and support users' personalized component requirements, we need to provide an open, intelligent material system in which components are configurable, identifiable, intervenable, renderable and codable.

  • Configurable – Users can customize component libraries for component identification, intervention, rendering, and coding.
  • Identifiability – Enables component identification and expression by training samples of custom components.
  • Interventionability – Editor intervention, where component types and component properties can be manually changed.
  • Renderable – Custom canvas rendering capability that allows users to package custom components into a canvas and support component rendering.
  • Codable – supports generating component-granularity code. The DSL receives the D2C Schema and the user-defined components and generates component-granularity code.

The whole process of applying component identification is as follows. After users configure their component library, they also need to configure the model service that identifies the components. The model service is invoked in the component identification stage of visual draft restoration. The configured business logic library is then called to express the component recognition result (the smart field) as a component (componentName), and the component property information that can be obtained from the visual draft is detected to fill in the component props. Finally, to render the result in the canvas, the canvas resources need to be configured in advance to support component rendering.

(Component identification application process)
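As a rough picture of what such a configuration ties together, the sketch below shows a hypothetical component-library entry. The field names and the service URL are illustrative assumptions, not imgcook's actual configuration schema; only the component name, category label and npm package correspond to the example used in this article:

// Hypothetical sketch of a user component-library entry; field names are
// assumptions for illustration, not imgcook's real configuration format.
const componentLibraryEntry = {
  componentName: 'VideoBtn',                    // name used in the generated code
  npmPackage: '@ali/pcom-imgcook-video-58096',  // package imported by the DSL
  category: 'videobtn',                         // label used when training the model
  modelService: 'https://example.com/component-classify', // placeholder recognition service endpoint
  props: ['data'],                              // props that can be filled from the visual draft
};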

Here is a detailed description of how the business logic library takes on the application expression of component identification results during the business logic generation phase, and how the canvas supports component rendering to visually express identification results.

Logical application expression × business logic library

One of the core functions of the business logic library is that users can customize recognition functions and expression functions; in the business logic generation stage, the recognition function and the expression function are called for each node. The recognition function determines whether the current node is a desired node, and if so, the corresponding expression logic is executed.

For example, the result of component identification is placed on the smart node of the D2C Schema protocol, so we can customize the recognition function to determine whether the current node is identified as a component. The difficulty here is that multiple nodes may be identified as components, so we need to accurately determine which nodes should be expressed as components: some nodes are identified incorrectly, and some are identified correctly but the appropriate node to replace must still be found rather than directly changing the componentName of the identified node.

For the video time display component VideoBtn here, there are multiple identification results. Based on these results, we need to find the node that should be replaced by the front-end video time component VideoBtn and change that node's componentName to VideoBtn. This component name is associated with the videobtn category label that was entered when the component was registered; that is, the component's category must be registered for component identification.

Therefore, we need to add some filtering rules to the custom recognition function. For example, if multiple nodes with a nested containment relationship are all identified as videobtn, only the innermost node is taken as the recognition result.

/**
 * allSchema: original schema data
 * ctx: context
 * Execution timing: executed once for each node
 */
async function recognize(allSchema, ctx) {
  // ctx.curSchema - schema of the currently visited node
  // ctx.activeKey - key of the currently visited node

  // Determine whether a node is recognized as videobtn
  const isVideoBtnComp = (node) => {
    return _.get(node, 'smart.layerProtocol.component.type', '') === 'videobtn';
  };

  // Determine whether any descendant node is recognized as videobtn
  const isChildVideoBtnComp = (node) => {
    if (node.children) {
      for (var i = 0; i < node.children.length; i++) {
        if (isVideoBtnComp(node.children[i])) {
          return true;
        }
        if (isChildVideoBtnComp(node.children[i])) {
          return true;
        }
      }
    }
    return false;
  };

  // The current node is the VideoBtn node we need if the node itself is
  // recognized as videobtn and none of its descendants are.
  // Returning true passes this node on to the expression (logic) function.
  const isMatchVideoBtn = isVideoBtnComp(ctx.curSchema) && !isChildVideoBtnComp(ctx.curSchema);
  return isMatchVideoBtn;
}

Then customize the expression function. If a node's recognition function returns true, the corresponding expression function is executed. Below, the custom expression function changes componentName to VideoBtn and extracts the time information as the attribute value of the VideoBtn component.

/**
 * json: original schema data
 * ctx: context
 */
async function logic(json, ctx) {
  // Extract the time text from the child Text node
  const getTime = (node) => {
    for (var i = 0; i < node.children.length; i++) {
      if (_.get(node.children[i], 'componentName', '') === 'Text') {
        return _.get(node.children[i], 'props.text', '');
      }
    }
    return "Prefer";
  };

  // Set the node name to VideoBtn (from the component @ali/pcom-imgcook-video-58096)
  _.set(ctx.curSchema, 'componentName', 'VideoBtn');
  // Get the time text to use as the component property value
  const time = getTime(ctx.curSchema);
  // Set the extracted time as the value of the VideoBtn component's data property
  _.set(ctx.curSchema, 'props.data', { time: time });
  // Remove the child nodes under VideoBtn
  ctx.curSchema.children = [];
  return json;
}


After the component-identified results are expressed through the business logic layer, the componentized Schema can be obtained and the componentized code can be generated.

This is an example of identifying the location of VideoBtn in the visual draft through the component classification model and finally expressing it in code with the front-end component @ali/pcom-imgcook-video-58096.
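The resulting component-granularity code would look roughly like the sketch below. The surrounding markup, the import style and the concrete time value are illustrative assumptions; only the VideoBtn name, the data.time prop and the package name come from the example above:

// Simplified sketch of the component-granularity code produced after expression.
import { createElement } from 'rax';
import View from 'rax-view';
import VideoBtn from '@ali/pcom-imgcook-video-58096';

export default function VideoCard() {
  return (
    <View className="side">
      <VideoBtn className="group" data={{ time: '02:30' }} />
    </View>
  );
}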

After identifying the videobtn category in the visual draft, we can also replace the product image in the visual draft with a video, for example with the rax-video component. To do so, we can add a custom expression function that finds the image node at the same level as the VideoBtn node and replaces that node with the rax-video component.

/**
 * json: original schema data
 * ctx: context
 */
async function logic(json, ctx) {
  // Find the sibling image node of the current node
  const getBrotherImageNode = (node) => {
    const pKey = node.__ctx.parentKey;
    const parentNode = ctx.schemaMap[pKey];
    for (var i = 0; i < parentNode.children.length; i++) {
      if (parentNode.children[i].componentName === 'Picture') {
        return parentNode.children[i];
      }
    }
  };

  // Replace the sibling image node with the Video component and
  // reuse the original image as the video poster
  const videoNode = getBrotherImageNode(ctx.curSchema);
  _.set(videoNode, 'componentName', 'Video');
  _.set(videoNode, 'props.poster', _.get(videoNode, 'props.source.uri'));
  _.unset(videoNode, 'props.source');
  return json;
}

The benefit of using the business logic library to apply component recognition results is that component recognition is decoupled from business logic. The components each user uses are not fixed; every component has different names and attributes, and the application logic after recognition differs as well. The business logic library can support user-defined component application requirements; without it, component identification results could not be put to practical use.

Visual rendering expression × canvas building

If the editor canvas does not support rendering components, component nodes will be rendered as empty nodes and cannot be displayed in the canvas, so the WYSIWYG effect after visual restoration cannot be seen. Component rendering support in the canvas is therefore necessary.

Component support is currently provided by packaging components as NPM packages into canvas resources. With the help of iceluna's open rendering engine SDK, imgcook users can customize the editor canvas: the user selects the required components to package and build, and the canvas resources produced by the build take effect through configuration.

(Editor Canvas build architecture)

Landing

At present, a dedicated component recognition model has been trained for the carousel and video components commonly used in the Taobao system. The full online link supports components being configured, identified, rendered, intervened on and coded, and it has been applied in the Double 11 venue and Juhuasuan businesses. The recognition accuracy of a model trained on component samples from a specific domain is high, up to 82%, which makes online application feasible.

(Full link demonstration of component identification application)

Future

Applying the component recognition capability requires users to configure components, train a model for identification, and build canvas resources for rendering. Component configuration and canvas building are relatively simple, but for a user's own component library, corresponding component sample images need to be generated to train the model. At present, the samples used to train the dedicated component identification model for the Taobao system are collected manually or generated automatically by hand-written programs. If users had to collect samples or write the sample generation program themselves, the cost would be high.

Some users hope to access the component recognition capability, but the recognition effect depends on the generalization ability of the model, and the model's generalization ability depends on the samples it is trained on. We cannot provide a universal model that recognizes all components, so we need to provide users with the ability to train custom models and to generate samples automatically, reducing the access cost as much as possible.

(Prototype of one-stop management for sample management, model training and model service application)

At present, the sample generator can automatically produce training samples from design drafts uploaded by users, and the algorithm model service already supports online training, but the whole process has not yet been linked up end to end online. The next step is to bring the whole process online so that the model can self-iterate based on feedback from online user data.