Alibaba Cloud DataWorks team – Jifeng, Qin Qi

Abstract

As AI changes every aspect of daily life, it is also changing the research and development ecosystem across engineering roles. Alibaba is well advanced in this area. Its front-end intelligence group, for example, has built Imgcook (Design to Code) and Pipcook, a front-end algorithm engineering framework, along with C2C (Code to Code), intelligent UI, and other capabilities. This article introduces some C2C practices on the Alibaba Cloud Feitian big data platform, in the hope that concrete solutions will give readers a deeper understanding of front-end intelligence.

Business background

First, a brief introduction to the business background. The Alibaba Cloud Feitian big data platform is the product of Alibaba's ten years of best practices in big data construction. Tens of thousands of data and algorithm engineers use the Feitian big data platform every day, and it carries 99% of Alibaba's data business. It is also widely used for big data construction in fields such as City Brain, digital government, electric power, finance, new retail, intelligent manufacturing, and smart agriculture. The development history, product architecture, and an overview of the front-end pages are shown in the figure below.



Business challenges

From the figures above, we can see two characteristics of the front-end pages of the Alibaba Cloud big data R&D platform:

  • Programming-heavy: there are many WebIDE and editor scenarios, and more than 70% of users write code every day.
  • Visualization- and interaction-heavy: there are many scenarios that display data visually and lay out (orchestrate) tasks.

For an R&D platform, besides stability, the most important thing is to improve the efficiency of its customers, that is, of developers. The core goal of front-end intelligence in our products is therefore to solve the efficiency problem.

The solution

From the perspective of the business challenges, the front-end intelligence solution mainly addresses two problems in business implementation:

  • Intelligent upgrades of each product component
  • A unified algorithm engineering capability that keeps algorithms continuously updated and iterated and allows rapid deployment

With these considerations in mind, we laid out the overall intelligence effort as follows:


The following sections cover three aspects: the intelligent editor, intelligent visualization, and algorithm engineering.

Intelligent editor

The editor is the core component of big data development. Enabling developers to develop data quickly has always been our core requirement. With the help of machine learning, we have added core capabilities to the editor, such as intelligent code recommendation and code diagnosis.


Intelligent code recommendation

Code recommendation means that, while the user is writing code, the editor lists possible candidates based on the current context. Once the user selects a recommended item, the corresponding input is completed automatically, which can greatly improve coding efficiency. Using intelligent algorithms and the usage habits of most users, we built an intelligent model for code recommendation. Combined with the grammar rules of the language, it can recommend the code that best fits the syntax at the current position. The algorithms used for code recommendation are generally language models. Common ones include n-gram and LSTM, as well as the recently popular GPT and CodeGPT (a GPT model pre-trained on programming languages).

Because everyone who writes code has a different coding style and habits, a generic recommendation algorithm may not be the best solution. Inspired by Taobao's "a thousand faces for a thousand people" personalization mechanism, we asked whether code recommendation could also follow individual coding habits. To this end, we implemented a lightweight on-device recommendation model that trains and recommends on the client, so that its recommendations match personal habits.
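To make the idea of a lightweight, personalized language model concrete, here is a minimal sketch of a bigram (n-gram with n = 2) recommender that learns only from one user's own code. It is an illustration under stated assumptions, not the model used on the platform: the class name BigramRecommender, the tokenizer, and the sample SQL history are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: identifiers, integers, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\d+|\S")


def tokenize(code):
    """Split a code string into a flat list of tokens."""
    return TOKEN_RE.findall(code)


class BigramRecommender:
    """Illustrative bigram model: counts which token follows which."""

    def __init__(self):
        # following[prev][next] = how often `next` appeared right after `prev`
        self.following = defaultdict(Counter)

    def train(self, code):
        """Update the counts with one snippet of the user's code."""
        tokens = tokenize(code)
        for prev, nxt in zip(tokens, tokens[1:]):
            self.following[prev][nxt] += 1

    def recommend(self, context, top_n=3):
        """Return up to top_n likely next tokens for the given context."""
        tokens = tokenize(context)
        if not tokens:
            return []
        candidates = self.following.get(tokens[-1])
        if not candidates:
            return []
        return [token for token, _ in candidates.most_common(top_n)]


# Hypothetical editing history of a single user.
history = [
    "SELECT id, name FROM users WHERE id = 1",
    "SELECT id FROM orders WHERE user_id = 2",
    "SELECT COUNT(*) FROM users",
]

model = BigramRecommender()
for snippet in history:
    model.train(snippet)

suggestions = model.recommend("SELECT id FROM")
print(suggestions)  # most frequent follower of FROM comes first
```

Because the counts come from a single user's history, the ranking reflects that user's habits. A production model would also need grammar-aware filtering and a longer context window, which this sketch leaves out.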

Code diagnosis

Code defects have always been a headache for developers, and their causes are diverse. Finding defects while the code is being written saves a great deal of manpower and resources, and this is what code diagnosis is for. Drawing on engine-side capabilities, a large number of grammar rules, and information from code reviews, we compiled a large set of defect types and the corresponding defective code. After training on this data, we have basically achieved automatic defect detection. Supervised models are suitable for this training; a Support Vector Machine (SVM) is a common choice.


Intelligent visualization

In a big data platform, data is the core content, and visualization is the best tool for showing the characteristics and value of data. Especially in data analysis scenarios, visualization helps users discover patterns in data quickly and intuitively.

Data profiling

Data profiling is the process of examining the data available from an existing information source (e.g. a database or a file) and collecting statistics or informative summaries about that data. The purpose of these statistics may be to: Find out whether existing data can be easily used for other purposes. — Wikipedia


Data profiling is the process of obtaining statistics and summaries of data. The core problem is how to analyze data types and features automatically and how to select charts automatically. Data type analysis is based on the Analyzer and Statistic modules of DataWizard, which analyze field types and basic field characteristics. Chart recommendation follows the classic chart decision diagram below, which is organized around four dimensions: comparison, distribution, composition, and relationship. We have encapsulated the whole profiling process into components that can be invoked quickly to build your own profiling product.
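The two steps described above, field type analysis followed by rule-based chart selection, can be sketched roughly as follows. This is a simplified illustration and does not use the DataWizard API. The function names, the three field types, the date format, and the decision rules are all assumptions made for the example, and the part_of_whole flag is a hypothetical way to tell composition apart from comparison.

```python
from datetime import datetime


def _is_number(value):
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def _is_date(value):
    # Assumption: dates arrive as ISO-like strings, e.g. "2024-01-31".
    try:
        datetime.strptime(str(value), "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_field_type(values):
    """Classify a column as 'numeric', 'temporal', or 'categorical'."""
    non_empty = [v for v in values if v not in (None, "")]
    if not non_empty:
        return "categorical"
    if all(_is_number(v) for v in non_empty):
        return "numeric"
    if all(_is_date(v) for v in non_empty):
        return "temporal"
    return "categorical"


def profile(rows):
    """Map each field of a list of row dicts to its inferred type."""
    fields = list(rows[0].keys()) if rows else []
    return {f: infer_field_type([row.get(f) for row in rows]) for f in fields}


def recommend_chart(field_types, part_of_whole=False):
    """Return (dimension, chart) using simplified decision rules."""
    kinds = list(field_types.values())
    numeric = kinds.count("numeric")
    temporal = kinds.count("temporal")
    categorical = kinds.count("categorical")
    if temporal >= 1 and numeric >= 1:
        return ("comparison", "line")        # values compared over time
    if numeric >= 2 and categorical == 0:
        return ("relationship", "scatter")   # two measures against each other
    if categorical >= 1 and numeric == 1:
        if part_of_whole:
            return ("composition", "pie")    # shares of a total
        return ("comparison", "bar")         # values compared among items
    if numeric == 1:
        return ("distribution", "histogram")
    return ("none", "table")                 # fall back to a plain table


# Hypothetical sample data.
rows = [
    {"date": "2024-01-01", "region": "east", "sales": "120.5"},
    {"date": "2024-01-02", "region": "west", "sales": "98"},
]
types = profile(rows)
chart = recommend_chart(types)
print(types, chart)
```

Hand-written rules like these are easy to explain and debug, but they cannot capture every case, which is one reason to train a model on the correspondence between data and charts.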


We are also training an intelligent model on the correspondence between data and charts, to achieve intelligent recognition of data features and intelligent chart recommendation.

Algorithm engineering

With the help of the Alibaba Cloud machine learning platform (PAI), we have built a general pipeline for model training, evaluation, and deployment that serves the intelligent models mentioned above.

With the interactive modeling capability of PAI DSW, we can quickly implement the model training process in a Notebook, including data loading, preprocessing, and splitting the data into training and test sets. Then, building on TensorFlow's mature model support, you can quickly implement your own training process.


Model evaluation is an important way to measure how well a model works. A model exists to solve a real production problem, so the evaluation scheme must match the problem definition and truly reflect the model's effect on that problem. Common evaluation metrics include accuracy and recall. It is also often necessary to define custom metrics for the actual problem. For code recommendation, for example, besides top-N accuracy there are metrics such as recommendation latency and recommendation length. Only a comprehensive evaluation that combines these metrics reflects the real effect of the model.
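As a rough illustration of the metrics mentioned for code recommendation, the sketch below computes top-N accuracy and the average length of the first suggestion. The function names and the sample data are hypothetical, and the definition of recommendation length used here (character length of the top suggestion) is an assumption, since the exact definition is not given.

```python
def top_n_accuracy(ranked_predictions, expected, n):
    """Fraction of cases whose expected token is among the first n suggestions."""
    if not expected:
        return 0.0
    hits = sum(
        1
        for predictions, truth in zip(ranked_predictions, expected)
        if truth in predictions[:n]
    )
    return hits / len(expected)


def mean_top1_length(ranked_predictions):
    """Average character length of the first suggestion, skipping empty lists."""
    lengths = [len(p[0]) for p in ranked_predictions if p]
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)


# Hypothetical evaluation set: ranked suggestions per case and what the user typed.
ranked = [
    ["users", "orders"],
    ["id", "name", "COUNT"],
    ["WHERE", "JOIN"],
    [],
]
expected = ["orders", "id", "GROUP", "users"]

top1 = top_n_accuracy(ranked, expected, 1)
top3 = top_n_accuracy(ranked, expected, 3)
avg_len = mean_top1_length(ranked)
print(top1, top3, avg_len)
```

Reporting top-1 and top-3 together shows how much the ranking, rather than the candidate set, limits the result. Recommendation latency could be measured in the same loop with time.perf_counter.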

PAI EAS provides the ability to deploy models online. Models can be uploaded with one click and then quickly invoked through the API.

Future

  • Building on this intelligence work, keep up with progress in the industry and apply good algorithms to our products, to provide users with more powerful services.
  • Explore more intelligent scenarios and use machine learning methods to solve problems.

Final words

Machine learning offers a new way to solve problems, and we have found that many business problems can be approached with machine learning ideas. Further, I think machine learning is the trend of the future. Like data analysis, machine learning will become more accessible. For example, Pipcook is a framework that helps front-end engineers get started with machine learning quickly. I hope there will be more and more such tools. The group's front-end intelligence effort covers P2C (PRD to Code), D2C (Design to Code), and C2C (Code to Code), and uses intelligent methods to solve business problems in front-end scenarios. Interested colleagues are welcome to cooperate and build with us.


