F(x) Team of Alibaba's Tao Technology Department – Minchao

Background

The semantization of UI elements has long been a puzzle for both the D2C (Design to Code) and AI communities. Semantization is a key step for artificial intelligence in code-generation products such as D2C, and plays a crucial role in how human-friendly the generated code is. At present, most mainstream semantization techniques are built around plain text fields, such as TextCNN, Attention, and BERT. Although these methods work well, they still have limitations when applied to D2C products, because the vision of D2C is to become an end-to-end system, and a plain field by itself is often impossible to semantize: for example, it is hard to bind an appropriate semantic name to the field "¥200", which may be an original price or an activity price. Even the exhaustive rules of existing rule-based algorithms have their limits. Therefore, to solve the semantization task for interface elements, D2C needs to solve at least two problems: 1. the ability to generate element semantics that match the current interface; 2. fewer constraints on users, without requiring them to input additional auxiliary information.

In recent years, reinforcement learning based on game theory has performed well in many fields, such as AlphaGo, robotics, autonomous driving, and games, and its excellent performance has attracted extensive research. In this paper, Deep Reinforcement Learning (DRL) is introduced for the semantization of interface elements: a semantization solution based on deep reinforcement learning is proposed, and a reinforcement learning training environment suited to the semantization problem is constructed. Experimental results demonstrate the effectiveness of the proposed method.

In this paper, the semantization of interface elements is treated as a contextual decision problem. We start directly from the interface picture and take it as the input of a deep reinforcement learning model, which learns the optimal strategy (i.e., the optimal semantics) through a continuous "trial and error" mechanism. In addition, for fields whose semantic names the decision model has difficulty outputting, a text classification model is adopted to guarantee the semantic effect. This paper is organized as follows: 1. the background of field semantization in D2C products and its problems are introduced in detail, to clarify the purpose of this paper; 2. the key technologies, reinforcement learning and the Attention-based text classification model, are introduced to better describe the technical scheme; 3. the reinforcement-learning-based semantic decision model and the attention-based text classification model are described in detail; 4. summary and outlook.

Problem analysis

So far, D2C has introduced artificial intelligence models into the field semantization feature to reduce the limitations and error rate of rule-based algorithms. The article "Imgcook semantization" describes in detail the rule-based semantization scheme currently used by D2C. The classification of field semantization capabilities in the overall D2C capability map is shown in the figure below. Last year we enhanced code semantics with image and text classification models, and this year we introduce reinforcement learning to analyze semantics in combination with the UI context.

(Hierarchical semantic analysis ability)


As a benchmark of front-end intelligent products, D2C faces both innovation and problems on its way forward. Semantization of graphic-and-text interface elements is a major stumbling block for intelligent code generation and hinders D2C from becoming a truly end-to-end intelligent system. Effective intelligent methods are therefore urgently needed to support D2C's semantization features. The problems faced by intelligent semantization can be understood with the following figure:


As the figure shows, existing techniques in the AI community struggle with the intelligent semantization of interface elements that mix images and text, and both the image perspective and the language perspective have their own disadvantages. From the image perspective, current detection and multimodal techniques have difficulty handling the image-and-text combinations that D2C frequently encounters; from the text perspective, a plain text classification model has difficulty with polysemous fields, such as prices. Therefore, for the difficult problem of semantizing graphic-and-text interface elements, this paper proposes a scheme that combines multiple AI models. At the same time, this scheme is a general framework: it is applicable not only to the semantization of interface elements, but also to other tasks that are hard to solve with a single AI model.

Technical solution

To solve the semantization of interface elements, this paper adopts a two-step scheme: semantic decision + text classification. The process is as follows: first, the interface elements are filtered according to style rules, and the reinforcement-learning-based decision model determines semantic names for the polysemous elements that the text classification model cannot handle; then, the remaining non-polysemous elements are classified by the text classification model. The overall framework is as follows:


What the text classification model struggles with in interface-element semantization are generally the polysemous fields, such as the "¥85" in the figure above. Such fields, however, show clear regularities in style: if "¥85" is followed on the same line by a strikethrough price, its semantics can be considered the activity price. The idea of this paper's technical solution can therefore be summarized as: first, the reinforcement-learning-based decision model determines, from the image, the semantic names of the polysemous elements that follow style rules; then the text classification model recognizes the semantic names of the non-polysemous elements. The practice and experience with these two models are described in detail below.
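To make this division of labor concrete, the following minimal Python sketch shows how such a two-step pipeline could route an element either to the decision model or to the text classifier. All names here (bind_semantics, POLYSEMOUS_HINTS, the predict methods) are illustrative assumptions, not imgcook's actual API.

    # Illustrative routing for the two-step scheme; names and heuristics are hypothetical.
    POLYSEMOUS_HINTS = ("¥", "￥")  # price-like fields whose meaning depends on surrounding style

    def bind_semantics(element, module_image, decision_model, text_classifier):
        """Return a semantic name for one interface element."""
        if any(hint in element.text for hint in POLYSEMOUS_HINTS):
            # Polysemous field with style regularities: decide from the module image.
            return decision_model.predict(module_image, element.box)
        # Non-polysemous field: plain text classification is enough.
        return text_classifier.predict(element.text)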

Semantic recognition of polysemous elements

Introduction to deep reinforcement learning

Reinforcement learning is mainly used to learn a strategy that maximizes the long-term reward obtained from the interaction between an agent and its environment, and is traditionally applied to tasks with small state and action spaces. In the era of big data and deep learning, traditional reinforcement learning cannot handle high-dimensional inputs. In 2013, Mnih et al. introduced Convolutional Neural Networks (CNN) from deep learning [1][2][3] into reinforcement learning for the first time and proposed the DQN (Deep Q-Network) algorithm [4][5]. Since then, research on deep reinforcement learning (DRL) has taken off worldwide. A milestone in the field was the Go match between AlphaGo and Lee Sedol in 2016 [6][7]: AlphaGo, the Go program developed by Google's artificial intelligence team DeepMind on the basis of deep reinforcement learning, defeated the world-class Go master Lee Sedol and shocked the world, bringing deep reinforcement learning from academia into public awareness.

Deep reinforcement learning combines the feature-extraction ability of Deep Learning (DL) [8] with the decision-making ability of Reinforcement Learning (RL) [9], and can make optimal decisions directly from high-dimensional input data. It is an end-to-end decision and control system, widely used in dynamic decision-making, real-time prediction, simulation and games. Through continuous real-time interaction with the environment, it takes environmental information as input, gains experience from failures and successes to update the parameters of the decision network, and thereby learns the optimal decision. The framework of deep reinforcement learning is as follows:


In the deep reinforcement learning framework in the figure above, the agent interacts with the environment: it extracts features of the environment state through deep learning, passes the result to the reinforcement learning part to make a decision and perform an action, and after the action is performed obtains the new state and the reward fed back by the environment to update the decision algorithm. This process is iterated until the agent learns the strategy that maximizes the long-term reward.
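The interaction loop described above can be written in a few lines of Python; agent and env are placeholders for any concrete implementation of the two roles.

    # Generic deep reinforcement learning interaction loop (placeholder names).
    state = env.reset()
    done = False
    while not done:
        action = agent.decide(state)                      # deep network extracts features and picks an action
        next_state, reward, done, _ = env.step(action)    # environment feeds back the new state and reward
        agent.update(state, action, reward, next_state)   # update the decision network from the feedback
        state = next_state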

Practical effect and experience

The practice of reinforcement learning in D2C semantic field binding consists of three parts: construction of the reinforcement learning training environment; algorithm model training; and model testing. They are described one by one below.

Decision model training environment construction

The key factors of reinforcement learning include the agent, the environment, the design of the reward function and the design of the steps. The idea of this paper is to treat the semantic field recognition task as the process of playing a game, in which the algorithm model constantly updates its parameters according to the environment's feedback and learns how to maximize the reward function. Concretely, the module picture is used directly as the environment, and the bounding box of an element in the module as the agent. During training, the agent (the element's bounding box) moves from top to bottom and from left to right, like walking through a maze. At every step it must decide on an Action (the Action is the semantic field we want). Only when the Action is chosen correctly can the agent "walk" on; when the agent has "walked" through all the elements, the algorithm model has learned how to win.

This approach models the semantic field binding problem as a contextual decision problem, and a reinforcement learning model is used to find the semantic properties of fields. The key factors of reinforcement learning in the semantic field binding task are defined as follows:

Environment: the whole module picture is regarded as the environment of reinforcement learning, and the position of the agent within it constitutes the current environment state;

Agent: the bounding box of an element in the module is regarded as the agent;

The definitions of both are shown below:


Action: the action (semantic name) selected by the agent (element bounding box) as it moves from top to bottom and from left to right in the environment (module picture);

Reward: the agent must choose an action (a semantic name) at every step it moves, gaining points for a correct choice and losing points for a wrong one. The reward can be written as

    r_t = 1{a_t = c} − 1{a_t ≠ c}

where 1{x} equals 1 if x is true and 0 if x is false, a_t and a_(t−1) are the actions taken in the current and previous states (i.e., the selected semantic names), and c is the true semantic name of the element.
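To make the definitions above concrete, here is a minimal Gym-style sketch of what such a training environment could look like. The class name, the example label set, the image highlighting and the reward values are illustrative assumptions (the module image is assumed to be a NumPy array), not the actual imgcook environment.

    # Minimal sketch of a semantic-binding environment; all names are illustrative.
    SEMANTIC_ACTIONS = ["title", "subTitle", "originalPrice", "activityPrice", "couponText"]

    class SemanticEnv:
        def __init__(self, module_image, element_boxes, true_labels):
            self.image = module_image    # H x W x 3 screenshot of the module (the environment)
            self.boxes = element_boxes   # element bounding boxes, sorted top-to-bottom, left-to-right
            self.labels = true_labels    # ground-truth semantic name of each element
            self.cursor = 0              # index of the element the agent currently sits on

        def reset(self):
            self.cursor = 0
            return self._observation()

        def _observation(self):
            # State = module image with the current element's border highlighted.
            x1, y1, x2, y2 = self.boxes[self.cursor]
            state = self.image.copy()
            state[y1:y2, [x1, x2 - 1]] = 255
            state[[y1, y2 - 1], x1:x2] = 255
            return state

        def step(self, action):
            # +1 when the chosen semantic name matches the ground truth, -1 otherwise.
            correct = SEMANTIC_ACTIONS[action] == self.labels[self.cursor]
            reward = 1.0 if correct else -1.0
            done = (not correct) or self.cursor == len(self.boxes) - 1
            if correct and not done:
                self.cursor += 1         # the agent only "walks" forward on a correct choice
            return self._observation(), reward, done, {}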


Reinforcement learning itself does not require labeled data, but for the semantic field recognition task the data set is labeled manually to make it easier to build the reward function (if the CSS code of the module image is available, no labeling is needed). The data set can be annotated with LabelImg, so that each element has a corresponding semantic field. The annotation method is as follows (the first picture is the module picture, the second is the element annotation information in the module, and the third is the semantic field information):




Decision model training

Training the semantic decision model requires importing the environment constructed above, in which the agent learns the optimal strategy by constant "trial and error". In this paper, a value-function-based deep reinforcement learning algorithm is used as the concrete implementation for semantic field recognition. Such algorithms use a CNN to approximate the action-value function of traditional reinforcement learning, and the representative algorithm is DQN. The framework of the DQN algorithm is as follows:


There are two neural networks with the same structure in DQN, called the target network and the evaluation network. The output of the target network, Q_target(s, a), represents the discounted return obtained when action a is selected in state s; the output of the evaluation network, Q_eval(s, a), represents the estimated value of action a in state s.

DQN training is divided into three stages:

  • Experience accumulation stage: the experience pool D is not yet full. At each time step t an experience tuple (s_t, a_t, r_t, s_(t+1)) is obtained by selecting actions at random and is stored into the experience pool. This stage only accumulates experience; neither of the two DQN networks is trained.

  • Exploration stage: this phase uses an ε-greedy strategy (ε gradually decreasing from 1 to 0) to obtain action a, so that while the evaluation network makes decisions, other possibly optimal behaviors can still be explored with a certain probability, which avoids getting stuck in a local optimum. In this stage the experience tuples in the experience pool are continuously updated and fed into the evaluation network and the target network to obtain Q_eval(s, a) and Q_target(s', a'). The difference r + γ · max_a' Q_target(s', a') − Q_eval(s, a) is used as the loss, and the weight parameters of the evaluation network are updated by gradient descent. To make training converge, the target network is updated by copying the evaluation network's weights into it every fixed number of iterations.

  • Exploitation stage: ε has fallen to 0, that is, the selected actions all come from the output of the evaluation network. The evaluation network and the target network are updated in the same way as in the exploration stage.

DQN, a value-function-based deep reinforcement learning algorithm, is trained according to the above three stages. When training converges, the evaluation network approaches the optimal action-value function, and the optimal strategy has been learned.
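As a reference for how the three stages fit together, the following PyTorch sketch implements a basic DQN training loop over an environment like the one sketched earlier. The network architecture and all hyper-parameters (learning rate, pool size, ε decay, sync interval) are illustrative assumptions, not the settings actually used in our experiments.

    # Sketch of the three-stage DQN training loop; sizes and hyper-parameters are illustrative.
    import random
    from collections import deque
    import torch
    import torch.nn as nn

    class QNet(nn.Module):
        # CNN that maps a module screenshot to one Q value per semantic action.
        def __init__(self, n_actions):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(7), nn.Flatten(),
            )
            self.head = nn.Sequential(nn.Linear(64 * 49, 256), nn.ReLU(),
                                      nn.Linear(256, n_actions))

        def forward(self, x):
            return self.head(self.features(x))

    def to_tensor(img):
        # HWC uint8 image -> 1 x 3 x H x W float tensor in [0, 1].
        return torch.as_tensor(img).permute(2, 0, 1).unsqueeze(0).float() / 255.0

    def train_dqn(env, n_actions, episodes=500, gamma=0.9, batch=32, sync_every=200):
        eval_net, target_net = QNet(n_actions), QNet(n_actions)
        target_net.load_state_dict(eval_net.state_dict())
        optimizer = torch.optim.Adam(eval_net.parameters(), lr=1e-4)
        memory, eps, step = deque(maxlen=10_000), 1.0, 0   # experience pool D
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy selection; eps decays from 1 toward 0 (exploration -> exploitation).
                if random.random() < eps:
                    action = random.randrange(n_actions)
                else:
                    with torch.no_grad():
                        action = eval_net(to_tensor(state)).argmax(1).item()
                next_state, reward, done, _ = env.step(action)
                memory.append((state, action, reward, next_state, done))
                state, eps, step = next_state, max(0.0, eps - 1e-4), step + 1
                if len(memory) < batch:
                    continue   # accumulation stage: only store experience, no training yet
                # Sample a minibatch and minimize the TD error between the two networks.
                s, a, r, s2, d = zip(*random.sample(memory, batch))
                s = torch.cat([to_tensor(x) for x in s])
                s2 = torch.cat([to_tensor(x) for x in s2])
                q = eval_net(s).gather(1, torch.tensor(a).unsqueeze(1)).squeeze(1)
                with torch.no_grad():
                    not_done = 1.0 - torch.tensor(d, dtype=torch.float32)
                    target = torch.tensor(r, dtype=torch.float32) + gamma * target_net(s2).max(1).values * not_done
                loss = nn.functional.mse_loss(q, target)
                optimizer.zero_grad(); loss.backward(); optimizer.step()
                if step % sync_every == 0:
                    target_net.load_state_dict(eval_net.state_dict())   # sync the target network
        return eval_net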

Decision model evaluation

Evaluating a reinforcement learning model is different from evaluating a classification or detection model; there is no metric such as accuracy. The criterion is that the score keeps rising and does not fall back, that is, the "game" does not have to start over. In our experiment the score did keep rising without any restart. The overall score curve is as follows:


Decision model effect demonstration

The training results of the reinforcement learning algorithm are shown below:


Semantic recognition of non-polysemous elements

Text classification based on the Attention mechanism

In this paper, the Attention mechanism from natural language processing is used for the text classification task. The Attention Model (AM) is an Encoder-Decoder framework with an attention mechanism, and is suited to tasks where one sentence (or paragraph) generates another sentence (or paragraph). The attention mechanism applies weighted changes to the target data and can output a semantic "translation", while the fields in the field binding task have points of emphasis, such as the field text plus its style data. The Attention model is therefore expected to outperform a traditional text classification model, which is why we experimented with it.

The Attention Model (AM) applies weighted changes to the target data. It is a resource-allocation model: it is sensitive to key words in the text and therefore "understands" the text well. The AM model is essentially still an Encoder-Decoder framework; it differs from the traditional Encoder-Decoder in that the intermediate semantic representation produced by the Encoder is dynamic, and each word in the target sentence learns the attention-distribution probabilities over the corresponding words of the source sentence. This means that for every generated word y_i, the fixed intermediate semantic representation C is replaced by a C_i that changes with the word currently being generated. This is the key to understanding the AM model: the fixed intermediate representation C becomes a C_i that adjusts to the current output word according to the attention model. The Encoder-Decoder framework with the AM model added can be understood as shown in the figure:
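For reference, in the classic attention formulation the dynamic intermediate representation C_i is a weighted sum of the Encoder's hidden states, with the weights given by the learned attention distribution:

    C_i = Σ_j α_ij · h_j,    α_ij = exp(e_ij) / Σ_k exp(e_ik),    e_ij = score(s_(i−1), h_j)

Here h_j is the Encoder's hidden state for the j-th source word, s_(i−1) is the Decoder state before generating y_i, and α_ij is the attention probability of the i-th target word on the j-th source word.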


Practical effect and experience

The application of the attention-based text classification model to interface element semantization is divided into three parts: training data set construction, model training, and model testing. They are introduced below.

Text classification model training data set construction

To embed the fields, two JSON files are built. The first JSON file: the input is the pinyin of the field text, and the output is the bound semantic field.


The second JSON file serves as a dictionary: an input field looks up its corresponding key values in this file, which makes embedding convenient.
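As a purely hypothetical illustration of what the contents of these two files might look like (the actual formats are shown in the figures; all names and values below are made up):

    # Hypothetical contents of the two files (illustrative only).
    training_samples = [
        {"input": "huo dong jia", "output": "activityPrice"},   # field pinyin -> bound semantic field
        {"input": "yuan jia", "output": "originalPrice"},
    ]
    char_dict = {"h": 1, "u": 2, "o": 3, "d": 4, "n": 5, "g": 6,
                 "j": 7, "i": 8, "a": 9, " ": 10}                # character -> index, used for embedding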


Text classification model training

The AM model network structure used in this paper is as follows:


Step 1, the attention layer:


Step 2, calculate the weighted sum between the attention weights and the input (called the context):


Step 3, RNN layer:


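A compact PyTorch sketch of a network following these three steps (attention scores over the embedded characters, weighting of the input by those scores, then an RNN layer) might look like the following. Layer sizes and names are illustrative, and for simplicity this sketch outputs a fixed set of classes, whereas the AM model described above can decode the bound field character by character.

    # Sketch of an attention-weighted text classifier; sizes and names are illustrative.
    import torch
    import torch.nn as nn

    class AttentionTextClassifier(nn.Module):
        def __init__(self, vocab_size, n_classes, emb_dim=64, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.attn = nn.Linear(emb_dim, 1)                      # step 1: one attention score per character
            self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)   # step 3: RNN layer
            self.out = nn.Linear(hidden, n_classes)

        def forward(self, x):                                      # x: (batch, seq_len) character indices
            e = self.embed(x)                                      # (batch, seq_len, emb_dim)
            weights = torch.softmax(self.attn(e), dim=1)           # attention distribution over positions
            context = weights * e                                  # step 2: input weighted by attention (the "context")
            _, h = self.rnn(context)                               # step 3: summarize the weighted sequence
            return self.out(h[-1])                                 # logits over semantic names

In training, the cross-entropy between these logits and the labeled semantic field would be minimized.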
Text classification model evaluation and effect demonstration

Changes in loss during model training are as follows:


Here are some test examples:





Summary and Prospect

The proposed scheme is a two-step "style decision + text classification" scheme: from the picture perspective, the decision ability of reinforcement learning is used to recognize the semantic names of polysemous elements; from the text perspective, the text classification model is used to recognize the semantic names of non-polysemous elements. The scheme is also a model training framework with broad application scenarios and can be applied to many other tasks. In addition, an advantage of the attention-based text classification model is that its labels are not limited to a fixed number of categories: it can directly output the bound field letter by letter.

In the future, this scheme will focus on the construction of the reinforcement learning environment, because that is the core of the decision model; the training environment should be designed to be general and easy for agents to learn in. In addition, the reinforcement-learning-based method is not limited to the semantization of interface elements; we also expect to apply it to D2C layout group recognition in the future, towards a pipeline of: interface picture -> grouped modules -> binding of styled fields -> binding of unstyled fields. Another prospect is to first use OCR text detection to extract the text from the interface, and then decide whether the extracted text should be sent to the decision model or to the text classification model for semantization, realizing a more ideal end-to-end system.

References

[1] Ketkar N. Convolutional Neural Networks [J]. 2017.
[2] Aghdam H H, Heravi E J. Convolutional Neural Networks [M]// Guide to Convolutional Neural Networks. Springer International Publishing, 2017.
[3] Gu J, Wang Z, et al. Recent advances in convolutional neural networks [J]. Pattern Recognition, 2018.
[4] Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning [J]. Computer Science, 2013: 1-9.
[5] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning [J]. Nature, 2015, 518(7540): 529-533.
[6] Research progress of deep reinforcement learning. (in Chinese)
[7] Liu Shao-tong. Research on the development direction of artificial intelligence from AlphaGo's victory over Lee Sedol [J]. Computer Fan, 2018, 115(12): 188.
[8] LeCun Y, Bengio Y, Hinton G. Deep learning [J]. Nature, 2015, 521(7553): 436-444.
[9] Zhao Dongbin, Shao Kun, Zhu Yuanheng, Li Dong, Chen Yaran, Wang Haitao, Liu Derong, Zhou Tong, Wang Chenghong. Review of deep reinforcement learning and discussions on the development of computer Go [J]. Control Theory & Applications, 2016, 33(06): 701-717. (in Chinese)


