This article is a written version of material shared with the external community.
About me
Hello everyone. Let me introduce myself first. I am a front-end engineer from Alibaba's Taobao technology department. I joined the Taobao front-end team after graduating in 2010 and have been responsible for Taobao's front-end wireless infrastructure library, Taobao's wireless performance optimization system, and building Taobao's marketing system. I now lead the Taobao shopping-guide and Tmall brand-marketing front-end teams, and I am responsible for imgcook, the code-generation part of our front-end intelligence platform, which is what I will mainly talk about today. You can check out the video to see what imgcook can do: currently it can generate view code, data bindings, and some logic code from a design draft. imgcook is open for public use at imgcook.com.
Imgcook background
Origin
Imgcook was born on the main battlefield of Alibaba e-commerce, where front-end products come in many forms. According to incomplete statistics, more than 10,000 lightweight modules and more than 1,000,000 pages (a large share of them for marketing campaigns) are added every year. Whatever can be reused is built with reusable components, but a large amount of the work is special-purpose business by nature and hard to reuse, so these 10,000+ modules and the many strong-mindshare C-side product pages still require front-end development.
Development history
Under this huge business pressure, in 2017 the Taobao front-end team created Da Vinci (later imgcook), a tool that generates maintainable front-end code directly from design drafts. It was then gradually optimized: to loosen the constraints on design draft conventions, CV and NLP were introduced to solve layer recognition and semantic naming problems, and imgcook was officially opened to the public in January 2019. After that we continued to improve restoration quality and extended the tool to generate logic code, while strengthening the platform's extension capabilities, evolving into imgcook 2.0. imgcook 3.0 is currently being planned and implemented; it mainly adds NLP-based understanding of PRDs (requirements documents) to assist in generating business logic code and part of the server-side business logic code. The generated artifacts now range from view code and data field bindings to business logic code. At present, module development time recorded on the platform is reduced by about 40%, and a weighted average of internal and external user research puts the overall front-end efficiency improvement at about 40%.
Front-end industry efficiency improvement analysis
Improving front-end R&D efficiency is a commonplace yet enduring topic; front-end frameworks, front-end engineering, visual page building and other fields all attack it from different dimensions. From the earliest ProCode, through LowCode and NoCode, to hpaPaaS (high-productivity application platforms), efficiency is improved on one hand by reusing materials, visual configuration, and full-link online R&D and deployment, and on the other hand by enabling other roles (operations, PD) to take over production, which frees up front-end manpower. The AutoCode approach defined by imgcook generates code directly from the design draft, or in the future from the PRD. However, because of accuracy issues such as design draft recognition, it also incorporates LowCode's visualization capabilities for intervention: the cost of visual correction is relatively low, and the corrected results can be fed back to iterate the model itself. At the same time, for some complex business logic, forcing everything into a visual builder costs more than writing the code, which is also why LowCode does not force NoCode, so we also integrate the advantages of ProCode: the generated code is maintainable and can be iterated on further, and so can be applied to business scenarios of different complexity. Our main efficiency strategy here is AutoCode one-click code generation for maximum efficiency, with visual building in the link used mostly for accuracy intervention, correction, and model iteration.
Efficiency improvements from intelligence in other industries
The efficiency improvement that intelligent one-click code generation brings to front-end R&D is the most direct one, but the influence of intelligence on other industries is also very large. Take Industry 4.0, proposed in recent years with intelligence as its backdrop: its core strategy roughly includes intelligent manufacturing processes, visualization, standardized management and collaboration, and cross-domain integration to achieve efficient production. The front-end industry is similar: the underlying facilities are moving to the cloud and gradually becoming standardized (data standardization, service standardization, etc.), front-end engineering is maturing, collaborative R&D across PD/designer/server roles is being integrated, and personalized business production is becoming visual and intelligent. Looking at the results brought by intelligence: the leading Xiamen Yuanhai fully automated intelligent wharf reduced front-line personnel by 70% and increased efficiency by 20%. In the financial sector, known as the mother of industries, the proportion of counter operation staff in typical smart bank branches dropped by 15 percent; after retraining, the share of counter staff who became multi-skilled rose to 90%; and with the addition of super counter machines, self-service foreign exchange machines and virtual teller machines, smaller branch areas and fewer tellers will further reduce costs. In summary, as industries become intelligent, efficiency rises and costs fall: certain types of personnel are reduced, while others are transformed and upgraded, improving quality or expanding the business. By analogy, after a few years of intelligence in the front-end industry, overall efficiency will be greatly improved, some simple repetitive work will be replaced by intelligence, some new positions may appear, such as business logic configurators (working together with code-generating robots), and front-end engineers will move on to more challenging work.
Imgcook introduction
Goal
imgcook's goal is to generate code intelligently and directly from resources such as design drafts, prototypes, PRDs, APIHub, and CodeHub. At present, the accuracy of design-draft-to-code is being continuously optimized: it can generate all view code (HTML + CSS), guess bindings for data fields, and intelligently identify some highly reusable logic points; what remains to be covered is the large amount of personalized business logic. The generation of business logic code is currently supplemented with structured PRD documents, including the generation of server-side business logic code, so the resources we currently rely on run from the design draft to the PRD. Our ultimate goal is to intelligently generate increasingly accurate code from design drafts, prototypes, PRDs, APIHub, CodeHub, and so on, and become a codeRobot.
Core functions
imgcook's current core function is to generate code directly from design drafts through a combination of CV/NLP deep learning, traditional machine learning, an expert rule system, and algorithm engineering.
Product usage flow
Concretely, the current product usage flow is as follows. After importing the design draft, you can generate code with one click, and you can intervene in the what-you-see-is-what-you-get visual editor. imgcook's advanced team customization supports customization in every dimension; for example, to bring the generated code into your own project you can use the VS Code plugin, imgcook-cli, and so on, to meet the needs of generating code for different scenarios.
Step 1: Import the design draft
Step 2: Visual intervention
Step 3: View the generated code (optional)
Step 4: Import into the project workflow (import directly with the VS Code plugin)
Optional Step: Advanced team customization
High-frequency customization:
To summarize, the current product capability picture looks like this:
The current usage is as follows:
- Availability = code generated by imgcook that survives in the final code released via GitLab / all released code (see the sketch below).
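As a rough illustration of how this metric might be computed, here is a minimal TypeScript sketch; the `ModuleReleaseStats` shape and the `availability` function are hypothetical names for illustration, not imgcook's actual measurement code.

```typescript
// Hypothetical per-module release statistics (not imgcook's real schema).
interface ModuleReleaseStats {
  moduleId: string;
  generatedLinesRetained: number; // generated lines still present in the code released via GitLab
  totalReleasedLines: number;     // all released lines
}

// Availability = retained generated code / all released code, aggregated over modules.
function availability(stats: ModuleReleaseStats[]): number {
  const retained = stats.reduce((sum, s) => sum + s.generatedLinesRetained, 0);
  const released = stats.reduce((sum, s) => sum + s.totalReleasedLines, 0);
  return released === 0 ? 0 : retained / released;
}

// Example: 790 of 1000 released lines were generated and kept -> availability 0.79.
console.log(availability([{ moduleId: "demo", generatedLinesRetained: 790, totalReleasedLines: 1000 }]));
```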
The technical architecture
imgcook's technical architecture is as follows. From bottom to top, it is built on an algorithm engineering framework and related products; the design draft is analyzed by CV and NLP respectively, elements are recognized along multiple dimensions to generate code, and the recognition result is presented visually. If a recognition error occurs it can be corrected visually, and additional logic can also be added visually. The result is then integrated into each team's engineering links (the VS Code plugin, the WebIDE plugin, imgcook-cli). The generated code also supports customization, most commonly DSLs (React / Vue / mini-program DSLs, etc.) and plugins (different teams have different directory conventions, etc.). Across the whole technical system, we pay the most attention to technical metrics such as code availability, model accuracy, and efficiency improvement data.
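To make the DSL and plugin customization points more concrete, here is a minimal TypeScript sketch of what such extension points could look like. The `ProtocolNode` and `DslGenerator` types and the `addHeaderComment` plugin are illustrative assumptions, not imgcook's actual plugin API.

```typescript
// Hypothetical intermediate protocol node produced by the recognition layers (simplified).
interface ProtocolNode {
  componentName: string;            // e.g. "div", "img", "text"
  props: Record<string, string>;    // styles/attributes after the layout and semantic layers
  children: ProtocolNode[];
}

// A DSL generator turns the protocol into code for one target (React, Vue, mini-program...).
interface DslGenerator {
  name: string;
  render(root: ProtocolNode): string;
}

// Minimal React-flavored generator, purely as an illustration.
function renderReact(node: ProtocolNode): string {
  const props = Object.entries(node.props).map(([k, v]) => ` ${k}="${v}"`).join("");
  const children = node.children.map(renderReact).join("");
  return `<${node.componentName}${props}>${children}</${node.componentName}>`;
}
const reactDsl: DslGenerator = { name: "react", render: renderReact };

// A team plugin could then post-process the output, e.g. to match directory conventions.
type Plugin = (code: string) => string;
const addHeaderComment: Plugin = (code) => `/* generated (illustrative) */\n${code}`;

console.log(addHeaderComment(reactDsl.render({ componentName: "div", props: { class: "banner" }, children: [] })));
```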
Core technical difficulties
Let’s take a look at some of the more complex parts of the technology diagram:
Decomposing intelligent recognition and expression
First, there is the problem of decomposing intelligent recognition and expression. There is currently no direct end-to-end solution available in the industry; there are approaches such as pix2code and screenshot2code, but especially where C-side visual fidelity is concerned, C-side designers do not accept pixel-level deviation. So let's look at how to decompose this into solvable problems. Intuitively, to generate the code needed for expression, multi-dimensional information must be input and extracted: various detailed meta-information (images, text, styles, attributes, etc.), reusable units (materials) of different granularity, and the dynamic logic behind them (dynamic fields, loops, interaction logic, and so on).
Specifically, the technology above is implemented in layers. From top to bottom, the design draft is input first and its layer information is processed; the layers are as follows:
- Layer processing layer: separates the layers in the design draft or image and organizes the layer meta-information together with the recognition results.
- Material identification layer: identifies materials in the image through image recognition capabilities (module identification, atomic module identification, base component identification, business component identification).
- Layer reprocessing layer: further normalizes the layer data produced by the layer processing layer.
- Layout algorithm layer: converts the absolutely positioned 2D layer layout into relative positioning and Flex layout (a simplified sketch follows this list).
- Semantic layer: gives layers semantic names on the code generation side using multi-dimensional features.
- Field binding layer: combines the static data in the layers with the data interface to map and bind dynamic interface fields.
- Business logic layer: generates the business logic code protocol for the configured business logic through business logic recognizers and expressers.
- Visual orchestration layer: outputs the code protocol that has been intelligently processed by each layer and presents it in a what-you-see-is-what-you-get visualization engine, where it can be corrected and supplemented manually.
- Code engine layer: after manual intervention, the now more accurate protocol is output as code in various DSLs through the expression capability (the protocol-to-code engine).
- Engineering link layer: finally, the code is delivered into each project's engineering environment through the VS Code plugin and the WebIDE plugin.
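As an illustration of the layout algorithm layer, here is a minimal TypeScript sketch of the core idea: group absolutely positioned layers into rows by vertical overlap and emit a Flex container per row. It is a simplification under assumed types (`AbsLayer`, `FlexNode`), not imgcook's actual layout algorithm.

```typescript
// Hypothetical absolutely positioned layer from the design draft (illustrative only).
interface AbsLayer { id: string; x: number; y: number; width: number; height: number; }

// A Flex node in the output layout tree.
interface FlexNode { direction: "row" | "column"; children: (AbsLayer | FlexNode)[]; }

// Group layers into horizontal rows: two layers share a row if their vertical ranges overlap.
function toFlexLayout(layers: AbsLayer[]): FlexNode {
  const sorted = [...layers].sort((a, b) => a.y - b.y);
  const rows: AbsLayer[][] = [];
  for (const layer of sorted) {
    const row = rows.find((r) => r.some((l) => layer.y < l.y + l.height && l.y < layer.y + layer.height));
    if (row) row.push(layer);
    else rows.push([layer]);
  }
  // Each row becomes a flex row (children ordered left to right); rows stack in a column.
  return {
    direction: "column",
    children: rows.map((r) => ({ direction: "row", children: r.sort((a, b) => a.x - b.x) })),
  };
}
```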
UI information architecture identification
The material identification layer above is the most complicated, because it involves the whole UI information architecture. In the traditional front-end way of working, the intuition is that an application project contains multiple pages, a page is divided into several blocks, a block is further split into atomic blocks, and a block may contain business components or reusable base components, which are in turn composed of indivisible atomic elements. These distinctions are easy for a brain with a front-end industry background to process, but not so easy for a machine to understand. Directly inferring the result end to end is currently quite complex, especially for business components, since each business domain defines them completely differently. So our current strategy is to decompose the whole project from large to small, from the outside in, with multiple models cooperating and each model responsible for recognizing one type of feature. The business component model is open here, with each business domain defining its own business component recognition capability.
Take the Juhuasuan landing page below as an example. The page segmentation model first divides the page into multiple blocks from top to bottom; the second block in the middle is the more complex one. Each block is then checked for business components and base components, such as the Slider in the first block, the Tabbar in the third, and the BottomBar in the fourth.
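As a rough data-structure sketch, the recognized information architecture can be thought of as a tree like the one below (TypeScript; the `UINode` type and the component classifications in the example are illustrative assumptions based on the Juhuasuan example, not imgcook's real schema).

```typescript
// Hypothetical node in the recognized UI information architecture (illustrative).
type NodeKind = "page" | "block" | "businessComponent" | "baseComponent" | "atom";

interface UINode {
  kind: NodeKind;
  name: string;
  children?: UINode[];
}

// The Juhuasuan landing page from the example, decomposed top-down by the page
// segmentation and component recognition models (component kinds assumed here).
const juhuasuanPage: UINode = {
  kind: "page",
  name: "JuhuasuanLanding",
  children: [
    { kind: "block", name: "Block1", children: [{ kind: "baseComponent", name: "Slider" }] },
    { kind: "block", name: "Block2" }, // the more complex middle block
    { kind: "block", name: "Block3", children: [{ kind: "baseComponent", name: "Tabbar" }] },
    { kind: "block", name: "Block4", children: [{ kind: "baseComponent", name: "BottomBar" }] },
  ],
};
```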
Basic component identification – Target detection model
Tabbar is a very common base component in C-side business. Here is an example of how the base component identification model is done.
- Problem definition: this is the most critical step. Base component identification is first framed as an object detection problem; then the targets to be detected and their core features are defined, which provides the basis for producing positive and negative samples later.
- Sample generation: the cost of producing samples in the front-end domain is relatively low, since samples can be generated by various screenshot and mock methods, but such samples are not rich or balanced enough and have to be adjusted continuously according to model performance. Where necessary, more real-scene samples are collected and annotated. Our samples are currently stored in Pascal VOC format.
- Modeling: there are many object detection models in the industry, such as SSD and YOLO, but most of them are aimed at detecting physical objects such as flowers, cats and dogs, whereas front-end component recognition cares more about contour features. After testing several models, Detectron2 gave the best result, about 75% mAP, and it is still being optimized.
- Training and prediction: Detectron2 is a relatively large model, so training and prediction currently run independently on a GPU server (a sketch of consuming the detection output follows this list).
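To illustrate how detection output might feed the rest of the pipeline, here is a small TypeScript sketch that filters raw detections by confidence and turns them into base-component proposals; the `Detection` shape and the threshold are assumptions for illustration, not the actual Detectron2 integration.

```typescript
// Hypothetical detection result returned by the object detection service (illustrative).
interface Detection {
  label: "Tabbar" | "Slider" | "BottomBar" | string; // predicted component class
  score: number;                                      // confidence in [0, 1]
  bbox: { x: number; y: number; width: number; height: number };
}

interface ComponentProposal { componentName: string; bbox: Detection["bbox"]; }

// Keep only confident detections and map them to component proposals for the later
// layers (layout, semantics, field binding) to consume.
function toComponentProposals(detections: Detection[], minScore = 0.75): ComponentProposal[] {
  return detections
    .filter((d) => d.score >= minScore)
    .map((d) => ({ componentName: d.label, bbox: d.bbox }));
}
```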
Here’s what imgCook actually looks like:
Logical point problem analysis
Next, let's talk about intelligent identification and generation of logic points. Start from the most intuitive angle: take a design draft and analyze what business logic may be hidden behind it. You can see loop logic, white-background image logic, a high-probability guess at which data field a piece of text corresponds to, and perhaps a coupon-claiming logic, and so on. In other words, a lot of logic can be identified, and these small logic points are tied to the domain background and experience of each business domain.
The business logic above was analyzed intuitively; how do we unify so many kinds of logic? Next comes the technical analysis, going deeper into front-end business logic. We first introduce the concept of a finite state machine for modeling, which generally includes several elements: under what condition or timing on the page (an event or trigger), what action is performed, and from which state the page transitions to which other state. Mapped onto the data-driven front-end domain, both the source and target states can be expressed as data state: data changes drive UI changes, and triggers (lifecycle hooks, events, scheduled tasks, etc.) produce actions (operations, or OPs) that further change the data, which again drives the UI. In this way all front-end business logic can be covered with three core elements: the trigger, the action, and data binding, which is treated as a special OP.
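A minimal TypeScript sketch of this model, with data state, triggers, and actions (OPs) and data binding treated as a special OP, might look like the following; the names (`LogicPoint`, `Op`, etc.) are illustrative assumptions, not imgcook's real logic protocol.

```typescript
// Data state drives the UI; actions (OPs) change the data (illustrative model).
type DataState = Record<string, unknown>;
type Op = (state: DataState) => DataState;

// A logic point = a trigger plus the OPs it fires.
interface LogicPoint {
  trigger: "onMount" | "onClick" | "onTimer" | string; // lifecycle hook, event, scheduled task...
  ops: Op[];
}

// Data binding expressed as a special OP: write a fetched field into the state.
const bindCouponField: Op = (state) => ({ ...state, couponTitle: "placeholder from data source" });

// A coupon-claiming action as another OP.
const claimCoupon: Op = (state) => ({ ...state, couponClaimed: true });

const logicPoints: LogicPoint[] = [
  { trigger: "onMount", ops: [bindCouponField] },
  { trigger: "onClick", ops: [claimCoupon] },
];

// Running a trigger folds its OPs over the current state; the new state re-renders the UI.
function run(trigger: string, state: DataState): DataState {
  return logicPoints
    .filter((lp) => lp.trigger === trigger)
    .flatMap((lp) => lp.ops)
    .reduce((s, op) => op(s), state);
}
```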
After this induction, these few elements can express all of the business logic, and deep learning is very good at the recognition side: each logic point can be recognized by different means. Since model accuracy can never be guaranteed to be 100%, in real business rollouts we also introduce more rule-based recognizers to assist, such as regular-expression and custom-function recognizers. In this way the recognition and expression of logic points in the business is assembled intelligently, at a finer granularity than traditional business component packaging, and assembled according to business requirements.
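A simplified sketch of how model-based and rule-based recognizers could be combined is shown below (TypeScript; the `Recognizer` interface and the example recognizers are hypothetical illustrations, not imgcook's implementation).

```typescript
// A recognizer inspects a node (e.g. a text layer) and proposes a logic point with a confidence.
interface LayerNode { text?: string; imageUrl?: string; }
interface LogicMatch { logicId: string; confidence: number; }
interface Recognizer { recognize(node: LayerNode): LogicMatch | null; }

// Rule-based recognizer: a regex that spots price-like text and binds it to a price field.
const priceRegexRecognizer: Recognizer = {
  recognize: (node) =>
    node.text !== undefined && /^[¥$]\d+(\.\d{1,2})?$/.test(node.text)
      ? { logicId: "bind-price-field", confidence: 1 }
      : null,
};

// Stand-in for a model-based recognizer: in reality a classification model would make this
// prediction; a simple keyword check stands in for the model's output here.
const couponModelRecognizer: Recognizer = {
  recognize: (node) =>
    node.text?.toLowerCase().includes("coupon")
      ? { logicId: "claim-coupon", confidence: 0.8 }
      : null,
};

// Combine recognizers: take the highest-confidence match above a threshold.
function recognizeLogic(node: LayerNode, recognizers: Recognizer[], minConfidence = 0.7): LogicMatch | null {
  const matches = recognizers
    .map((r) => r.recognize(node))
    .filter((m): m is LogicMatch => m !== null && m.confidence >= minConfidence);
  return matches.sort((a, b) => b.confidence - a.confidence)[0] ?? null;
}
```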
Finally, the ability to identify and express logic points is deposited into a set of productized capabilities. Each business can customize its own logic library and then apply it automatically to its own business development. The whole link, from adding a new logic classification, producing samples, and training the model through to online deployment, can be completed directly through the product, without requiring a deep learning background. Because the resource cost of the model training and deployment servers is high, the corresponding service is currently provided only internally.
Conclusion
imgcook will continue to focus on intelligent front-end development, and we expect AI to further empower it. In addition to Design2Code, we are currently working on PRD2Code, and on Service2Code for understanding server-side data services, so that intelligent generation can cover more front-end code and part of the server-side logic code. At the same time, against the background of front-end intelligence, we will make further attempts in the broader front-end intelligence domain; there are already many applications in image perception and real-time prediction of user intent. To better popularize front-end intelligence and let more front-end engineers participate, we cooperated with Google's TF.js team to open source the front-end algorithm engineering framework Pipcook, giving the front end powerful machine learning tooling and lowering the threshold for front-end engineers to enter the field, while gradually improving the integration of the JavaScript and machine learning ecosystems. At present there are few front-end engineers with a machine learning background at home or abroad, but there are people with a machine learning background who know JavaScript. We hope more front-end students will pay attention to front-end intelligence and join us.
Front-end intelligence is one of the four directions of the Alibaba economy front-end committee. We are a virtual team, and the Tao D2C intelligence team is one of its core teams. We are the front-end team that understands AI best, building intelligent and international products with the romance of poets and the rigor of scientists. Let's define and create a new future together.
Team card
Alibaba Taobao D2C intelligence front-end team