Text by Xu Yu, Alibaba Cloud Flow Product Team

As training algorithms iterate, the recognition accuracy of the model improves steadily. Even so, with current intelligent techniques, accuracy cannot reach 100%, and some UI patterns are so similar in visual form that the model can barely tell them apart. Moreover, the recognized result is a flat array, and such data cannot be consumed downstream directly: both canvas-engine rendering and code generation require hierarchical, structured data, namely a DSL. This article introduces Dumbo's design ideas and implementation in this area.

Background

Dumbo is an intelligent development platform that uses image recognition algorithms to generate front-end code with one click. It has already landed in several Alibaba Cloud console and mid/back-office projects.


The basic pipeline of Dumbo is as follows: intelligent technology identifies the various pieces of information in a picture; the DSL engine converts them into a JSON description (Schema) that conforms to agreed specifications; the visual building platform is then used for manual fine-tuning and correction; and finally the React module code is generated.

Whether we are building on the visual platform or generating React module code directly from the Schema, the data produced by the image recognition algorithm must first be converted into a Schema that the building platform and code-generation module can understand. That conversion is the subject of this article.

The solution

Dumbo has open model-training capability, but the data and DSL transformation that follow model training and recognition may be owned independently by different teams, whose base component libraries also differ. Early on we built an internal Adapter transformation engine aimed mainly at Fusion/Antd conversion. However, many teams have their own component libraries and business components that can be annotated during model training, and our default DSL did not support recognizing them, so a dedicated module was needed to handle this data and support later extension. We therefore split the DSL transformation layer out as a separate project, with the goal that a small amount of configuration is enough to produce a business-customized DSL and quickly assemble a complete Adapter service.

Before implementing the DSL transformation engine, we need to know what the input is, and what the goals and outputs are. This is a prerequisite for the transformation: once we know the structure of the input, we can process it according to that structure. First, take a look at our input structure:

[
    {
        "name": "Message",
        "props": {
        },
        "probability": 1,
        "position": {
        },
        "id": "534df201-bdcb-11eb-8e5e-8fd6c9c0318c"
    }
]

The input structure is flat overall, so the input to our Adapter is one large array of recognized nodes.
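
For illustration, the node shape can be captured in a minimal TypeScript type. Since the sample above leaves props and position empty, the bounding-box fields below are assumptions rather than Dumbo's actual field names:

interface RecognizedNode {
  id: string;                     // unique id assigned during recognition
  name: string;                   // recognized component name, e.g. "Message"
  probability: number;            // model confidence in the range 0..1
  props: Record<string, unknown>; // recognized component properties
  position: {                     // assumed bounding box of the node
    x: number;
    y: number;
    width: number;
    height: number;
  };
}

// The Adapter input is simply a flat array of such nodes.
type AdapterInput = RecognizedNode[];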

{"version": "1.0.0", "componentsMap": [{"componentName": "Page", "exportName": "Page", "package": "@alifd/ Fusion ", "version": "2.19.0", "Destructuring ": true}], "componentsTree": [{"componentName": "Page", "children": [], "props": {} } ] }Copy the code

The Schema is a hierarchical structure that can be used directly by the platform, together with its associated dependencies. Based on the input/output characteristics above, we defined the Adapter engine; the overall design is as follows:
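
As a rough sketch (not Dumbo's actual API), the engine can be thought of as three pluggable stages that turn the flat node array into the final Schema, matching the modules described in the Implementation section below:

// Hierarchical node in the output tree.
interface TreeNode {
  componentName: string;
  props: Record<string, unknown>;
  children: TreeNode[];
}

// The target Schema, matching the sample output above.
interface Schema {
  version: string;
  componentsMap: Array<{
    componentName: string;
    exportName: string;
    package: string;
    version: string;
    destructuring: boolean;
  }>;
  componentsTree: TreeNode[];
}

// The engine runs three stages over the RecognizedNode array from the
// earlier sketch: pre-process, adapt, post-process.
function runAdapter(
  nodes: RecognizedNode[],
  preProcess: (nodes: RecognizedNode[]) => TreeNode[],
  adapt: (tree: TreeNode[]) => TreeNode[],
  postProcess: (tree: TreeNode[]) => Schema,
): Schema {
  return postProcess(adapt(preProcess(nodes)));
}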


Implementation

Below, we expand on each module and briefly describe its concrete implementation.

Node pre-processing comes first. The original node data is a flat array, while our end goal is structured, hierarchical data, so we pre-process the data to make the position judgments and feature recognition of later stages more accurate. Taking Table as an example, it passes through the following pre-processing plugins:


First we process the text, including but not limited to fixing abnormal text segmentation and filtering specific invalid text. Machine-recognized text may contain invalid characters and incorrectly split sentences, which would distort later position judgments. So in text pre-processing we merge the text as a whole, using an algorithm to merge and filter fragments with a high merge score.

Next comes de-duplication. It mainly targets certain components that contain no children internally; for these we must remove invalid text according to the specific scene, such as special or numeric characters recognized from the glyphs of Icon buttons. A dedicated de-duplication step handles this.

Finally there is relationship processing. Using node depth, whether a node contains child nodes, position, overlap ratio, and a similarity algorithm, we preliminarily derive the child-parent relationships between nodes and obtain hierarchical data.
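
As a minimal sketch of that relationship step (reusing the RecognizedNode type from the earlier sketch), child-parent relationships can be derived from bounding-box containment; the 0.9 overlap threshold is an assumption:

// Fraction of b's area that lies inside a's bounding box.
function overlapRatio(a: RecognizedNode, b: RecognizedNode): number {
  const w = Math.min(a.position.x + a.position.width, b.position.x + b.position.width)
          - Math.max(a.position.x, b.position.x);
  const h = Math.min(a.position.y + a.position.height, b.position.y + b.position.height)
          - Math.max(a.position.y, b.position.y);
  const area = b.position.width * b.position.height;
  return area <= 0 ? 0 : (Math.max(0, w) * Math.max(0, h)) / area;
}

// Pick a node's parent: the smallest node that covers most of it.
function findParent(node: RecognizedNode, all: RecognizedNode[]): RecognizedNode | null {
  const containers = all.filter(
    c => c.id !== node.id && overlapRatio(c, node) >= 0.9,
  );
  containers.sort(
    (a, b) => a.position.width * a.position.height - b.position.width * b.position.height,
  );
  return containers[0] ?? null;
}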

Adapter processing handles each component's properties and refines its internal attribute information. Roughly, the following operations are performed internally; which of them run is determined by the complexity of the component. Let's again take Table as an example.


First, we look through the child nodes for the node data related to column titles and reorganize them into Table.Column nodes, one per column of the Table. We also extract the English text inside the relevant child Text nodes as the Key value for subsequent data matching. The Table.Column search judges whether a Table.Column exists from the height of the first row and the overall recognized width, and encapsulates it as a single node. Next, a Table carries a variety of attributes; taking our group's Fusion components as an example, a Table includes a series of sorting, filtering, and selection fields, and we decide whether the Table has such attributes by analyzing its Icons. We also process the data inside the Table and generate a dataSource of the recognized types for downstream rendering, and fill in some of the Table's basic attributes.
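
A hedged sketch of the column step, reusing the TreeNode type from above: header Text nodes are folded into Table.Column nodes, and embedded English text becomes the column key. The helper names and the key-extraction rule are assumptions:

// Turn recognized header texts (with x positions) into Table.Column nodes.
function toColumns(headerTexts: Array<{ text: string; x: number }>): TreeNode[] {
  return [...headerTexts]
    .sort((a, b) => a.x - b.x) // preserve left-to-right column order
    .map(({ text }) => ({
      componentName: 'Table.Column',
      props: {
        title: text,
        // Reuse embedded English text as the key for data matching.
        dataIndex: (text.match(/[A-Za-z]+/)?.[0] ?? text).toLowerCase(),
      },
      children: [],
    }));
}

// Generate a placeholder dataSource for downstream rendering.
function mockDataSource(columns: TreeNode[], rows = 3): Array<Record<string, string>> {
  return Array.from({ length: rows }, (_, i) =>
    Object.fromEntries(
      columns.map(c => [String(c.props.dataIndex), `row ${i + 1}`]),
    ),
  );
}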

By the time node post-processing begins, the nodes' special attributes and previously missing attributes are essentially complete; what remains is adapting to the component library, extracting dependencies, and so on. Taking Table as an example, the processing steps are as follows:


During execution we determine the underlying component library that the output must rely on and look up the corresponding fields in that library. Some attributes have to be converted per component library: internal attribute names are mapped to the corresponding library's prop names. We maintain a large component Map annotating the attributes each component supports and their identifiers, which determines the attributes that end up in the final Schema. Dependency extraction is likewise driven by the base component library the user selected: we collect the components actually used and automatically generate a DSL dependency description that conforms to the group specification. The overall format of the final DSL is also produced in this post-processing stage, and user-customized DSLs are realized here as well, giving users the ability to customize the output DSL.
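
The component Map and dependency extraction can be sketched like this; the mapping entries are invented for illustration and are not the real table:

// internal attribute name -> target library prop name, per component.
const componentMap: Record<string, Record<string, Record<string, string>>> = {
  fusion: { Table: { sortable: 'sortable', filterable: 'filterMode' } },
  antd:   { Table: { sortable: 'sorter',   filterable: 'filters' } },
};

// Rename props according to the selected base component library.
function adaptProps(
  lib: string,
  componentName: string,
  props: Record<string, unknown>,
): Record<string, unknown> {
  const mapping = componentMap[lib]?.[componentName] ?? {};
  return Object.fromEntries(
    Object.entries(props).map(([key, value]) => [mapping[key] ?? key, value]),
  );
}

// Collect the components actually used, for the dependency description.
function collectDeps(tree: TreeNode[], used = new Set<string>()): Set<string> {
  for (const node of tree) {
    used.add(node.componentName.split('.')[0]); // Table.Column -> Table
    collectDeps(node.children, used);
  }
  return used;
}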

Details

We have optimized the DSL transformation many times, repeatedly discovering bad cases along the way. A lot of internal logic exists to handle them, since they cannot be dealt with uniformly. Two examples follow.

Take Button first. At the very beginning we classified a Button's children according to the design specification: a Button whose child is an Icon is an Icon Button, and a Button whose child is text is an ordinary Button. In practice, however, we found that recognition could misidentify text such as "green" as a sorting Icon because the two look similar, turning an ordinary Button into an Icon Button, which was clearly unexpected. We therefore refined the processing: by examining the children's text length, special strings, and position information we predict whether a Button is an Icon Button, so that the conversion result finally matches the design.
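
That refined check can be sketched as a simple heuristic over a Button's recognized child node; the glyph set and thresholds below are assumptions for illustration, not the production rules:

// Characters that icon glyphs are commonly mis-recognized as.
const SUSPICIOUS_GLYPHS = /^[+×xX↑↓<>|·.\-]{1,2}$/;

function isIconButton(child: { text: string; width: number; height: number }): boolean {
  const text = child.text.trim();
  const veryShort = text.length <= 2;                         // icons decode to tiny strings
  const squarish = Math.abs(child.width - child.height) <= 4; // icon glyphs are roughly square
  return SUSPICIOUS_GLYPHS.test(text) || (veryShort && squarish);
}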

Our component library offers both vertical and horizontal label layouts for the detail list. Because this pattern has few distinctive features, recognition results for it were poor overall, and some items were not identified accurately. In that case we must use the position information of the items that were identified to judge whether unidentified items exist and where their labels sit. We apply a series of steps to recover label positions as far as possible:

1. Processing splits into two branches. The first branch groups nodes by row and determines whether each row contains an item; a label is then identified from the neighboring nodes to its left/right or above/below, based on position.

2. The remaining rows are then processed. Our interaction specification generally never mixes vertical and horizontal layouts in one list, so we take the label position used by the majority of items as the label position for all of them (a minimal sketch follows this list), and re-scan the rows that contain no item a second time, finally achieving the optimized recognition result.
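
A minimal sketch of the majority vote in step 2, with an assumed LabelPos type and item shape:

type LabelPos = 'left' | 'top'; // horizontal vs. vertical label layout

// Take the label position used by most identified items as the layout
// for the whole detail list, then re-scan item-less rows under it.
function dominantLabelPos(items: Array<{ labelPos: LabelPos }>): LabelPos {
  const left = items.filter(i => i.labelPos === 'left').length;
  return left >= items.length - left ? 'left' : 'top';
}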

There are many other details like these that we won't enumerate one by one; each is a small point in itself, and many algorithms remain to be optimized.

Looking ahead

Above, we introduced the capabilities and solutions our DSL engine currently supports. Going forward we will keep optimizing it: for each scenario and set of component characteristics, we will process a component's features with multiple algorithms and measure the overall effectiveness. This year we also plan to make the whole DSL a link in the intelligent recognition service and build some customized functions for our group, so that integrating teams can effectively reuse our feature algorithms or customize component feature algorithms of their own. Further out, we are considering turning this into a marketplace of intelligent DSL transformations, even letting users freely combine plugins and Adapters in a visual interface to customize DSL output online. Of course, the project is still iterating rapidly; the DSL engine has many rough edges, plenty of room for optimization, and many ideas yet to be implemented. If you have ideas, we welcome you to join us: contact [email protected]. We look forward to exploring the road ahead together.


