How do you turn a team of data engineers into AI experts?
Everyone is talking about artificial intelligence now, yet doubts remain: is AI really as mysterious as people say, and can it actually be put into practice and bring value to everyone?
Hu Shiwei, co-founder of The Fourth Paradigm, gave an answer at the FMI AI Summit forum held by Peima, and told us how companies can turn data engineers into AI experts in the age of AI.
Hu Shiwei told us that today AI has not only made impressive attempts at image recognition and at playing Go and other games, but has also performed remarkably in commercial and industrial settings, where people can clearly feel the convenience it brings.
Why is AI so effective?
Why is AI so good? According to Hu Shiwei, this can be understood along three main lines:
First, fine-grained. AI systems can analyze and predict personalized, micro-level business scenarios at a granularity traditional enterprises could hardly imagine. Traditional manual, coarse-grained classification and detection methods have become inefficient and laborious in today's big data environment; with AI, analysis and prediction can reach down to personalized, micro-level scenarios, greatly improving accuracy.
Second, intelligent. In the past we built models and generated business intelligence the traditional way, typically using BI to analyze big data or hunting for a few strong variables to encode as rules in a database. But as time passes, those strong variables drift. A system built on data with machine learning, by contrast, forms a closed loop: machines, rather than people, screen rules out of a very wide range of data, so the whole system itself becomes intelligent.
Third, efficient. In the past, enterprise intelligence systems typically produced lists, solving problems in batches. But in areas such as computational advertising or intelligent customer service, the demands on efficiency are high: we may have to judge within a few milliseconds whether there is a problem, or how much a customer is willing to spend on a certain product. In that sense, the enterprise needs real-time or near-real-time data collection and transmission, plus the ability to respond with model predictions and decisions, so intelligence moves from a batch process to behavior that can be delivered in real time (a minimal sketch of the contrast follows).
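To make the contrast concrete, here is a minimal Python sketch, with invented feature names and hand-set weights rather than any real system, of the difference between the old batch-list style and per-request real-time scoring:

```python
import math
import time

# Hand-set weights standing in for a model trained offline (illustrative only).
WEIGHTS = {"visits_last_30d": 0.08, "avg_spend": 0.01}
BIAS = -1.2

def score(features):
    """Logistic score for one customer; cheap enough for a millisecond budget."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def batch_job(customers):
    # Old style: score everyone overnight, hand the business a ranked list.
    return sorted(customers, key=score, reverse=True)

def handle_request(features):
    # New style: score one customer inside the request, within milliseconds.
    start = time.perf_counter()
    p = score(features)
    return p, (time.perf_counter() - start) * 1000  # probability, latency (ms)

print(handle_request({"visits_last_30d": 12, "avg_spend": 45.0}))
```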
Big data is the raw rice, machine learning is the rice cooker, and AI is the cooked meal!
Opinions in the market are divided on the relationship between big data and AI. Hu Shiwei described it vividly: big data is the raw rice, machine learning is the rice cooker, and artificial intelligence is the cooked meal you can eat right away. The field of AI takes big data from various domains and, through machine learning, outputs the capabilities of artificial intelligence.
Five elements for building business AI capabilities
Hu Shiwei introduced five elements for building commercial AI capabilities: big data, external feedback, algorithms, computing resources, and demand. Three of these must come from the enterprise scenario itself: meaningful process data, demand, and external feedback.
Hu Shiwei explained them in detail through the example of ordering food in a restaurant.
1. Meaningful process data. This is the big data we usually talk about. Suppose I provide the ordering Pads used in many restaurants, which means I cover a diner's footprints across different places and times, and I collect the ordering data. What kind of data should I collect? That is the process data we are talking about: which dishes this person has actually ordered, what the foot traffic of his current restaurant looks like, and what historical footprint this person has.
2. Demand. We know we want to do something with the layout of this ordering Pad, but what are the actual requirements? The key operating indicators for a restaurant are the table turnover rate, serving time, and dining time. When the waiter takes orders on the Pad, we want to raise the turnover rate: if dishes a diner prefers are already prepared and offered to him, his ordering time shortens and the kitchen's cooking time drops as well. Suppose there are 25 dishes and I recommend the one he is most likely to choose; if the recommendation is wrong, he simply doesn't pick it and his experience is not hurt. This requirement translates into: after collecting the data, how do we use it to determine whether a diner likes a certain dish?
3. External feedback. For machine learning, besides process data we also need to collect external feedback. For example, today I pushed dish X to diner A; if he liked it and chose it, I get a feedback of 1; if he didn't, the feedback is 0. In fact, in a large number of scenarios, today's enterprises already have these three elements prepared (a toy illustration follows).
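Putting the three elements together, a hypothetical training table for the ordering-Pad scenario might look like the sketch below; the field names are invented for illustration. Each row joins process data about the diner and the restaurant with one recommended dish and the 0/1 external feedback:

```python
# One training row per (diner, dish) recommendation, built from the three pieces:
# process data = features, demand = which dish to recommend, feedback = 0/1 label.
training_rows = [
    {"diner_history": ["hotpot", "noodles"],  # dishes this diner ordered before
     "foot_traffic": 320,                     # current restaurant's foot traffic
     "dish": "X",                             # the dish we pushed
     "label": 1},                             # the diner chose it -> feedback 1
    {"diner_history": ["salad"],
     "foot_traffic": 320,
     "dish": "X",
     "label": 0},                             # this diner didn't -> feedback 0
]
```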
Why aren't today's AI capabilities widely used, then? Hu Shiwei believes two factors are missing: one is algorithms, the other is computing resources.
So the question is, what problem does the algorithm solve?
For example, the data sits in a database, and in the end I hope to generate a service: I ask it whether person A likes dish B at this moment, in this scene and context, and the machine gives back a probability. Without a good algorithm, that probability will be inaccurate. How do I make it accurate? Today's practical machine learning systems, including AlphaGo, driverless cars, and face recognition, all spend enormous resources computing over data. Excellent algorithms also need strong resource support.
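As a rough sketch of such a "probability service" (toy data and scikit-learn here, assumed for illustration, not Fourth Paradigm's actual stack): train a classifier on past contexts and their 0/1 feedback, then ask it for a probability at serving time.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Past recommendations with their 0/1 external feedback (toy data).
rows = [
    {"diner": "A", "dish": "X", "hour": "12"},
    {"diner": "A", "dish": "Y", "hour": "12"},
    {"diner": "B", "dish": "X", "hour": "19"},
    {"diner": "B", "dish": "Y", "hour": "19"},
]
labels = [1, 0, 0, 1]

vec = DictVectorizer()                 # one-hot encodes the categorical fields
X = vec.fit_transform(rows)
model = LogisticRegression().fit(X, labels)

# "Does diner A like dish X at noon?" -> a probability, not a yes/no rule.
query = vec.transform([{"diner": "A", "dish": "X", "hour": "12"}])
print(model.predict_proba(query)[0, 1])
```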
So ideally, what conditions should an enterprise have if it wants to enter the era of AI? And what does reality look like?
▲ In how data is used: ideally, large-scale feature engineering exploration should be conducted; in reality, modelers do very little feature engineering exploration.
▲ In model scale: the ideal is tens of millions to billions of dimensions; the reality is often only tens to thousands of dimensions.
▲ In model algorithms: the ideal is to use large-scale machine learning algorithms and adapt to the scenario through feature engineering; the reality is often repeated hand-tuning of deep neural networks, adapting to the scenario through model changes.
▲ In model error correction: ideally a veteran leads the team and draws on rich experience to eliminate the various risks in the modeling process; in reality there are often problems such as time-travel (data leakage) and over-fitting, where the model looks good offline but disappoints once online.
How do you turn a team of data engineers into AI experts? (Some technology needs to be introduced)
Feature engineering: derive new variables from the raw data in ways that can, for example, divide the population into tens of millions of groups (see the sketch after this list). The goal is to enable data engineers to efficiently explore a sufficiently valid feature set.
Model scale: introduce a machine learning system that supports training ultra-high-dimensional models.
Model algorithms and debugging: there also need to be templated components that data scientists and engineers can call directly to produce models, so that data engineers can quickly see whether a model has errors and eliminate them.
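One way to read the feature-derivation point above: cross raw fields against each other and hash the results into a huge sparse index space, so a handful of raw columns becomes tens of millions of very specific segments. A minimal sketch follows; the hashing-trick style is a common industry technique assumed here, not something quoted from the talk.

```python
import zlib

N_DIMS = 1 << 24  # ~16.7 million hashed feature slots

def featurize(row):
    """Return the active sparse feature indices for one record."""
    raw = [f"{k}={v}" for k, v in row.items()]
    # Pairwise crosses are what blow a few raw fields up into millions of
    # distinct, very fine-grained population segments.
    crosses = [f"{a}&{b}" for i, a in enumerate(raw) for b in raw[i + 1:]]
    return [zlib.crc32(f.encode()) % N_DIMS for f in raw + crosses]

print(featurize({"city": "Beijing", "hour": 12, "dish": "X"}))
```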
In his speech, Hu Shiwei took the Fourth Paradigm's Prophet platform as an example to explain some of the thinking that went into building it:
Hu Shiwei said that in machine learning, the most important thing is still the algorithm. Algorithm choices map onto the four quadrants of industrial applications in his figure. The lower-left quadrant is what is usually used: logistic regression models or decision trees built with tools such as SAS, characterized by few variables and few layers, which work well when the amount of data is relatively small.
The upper-right quadrant is micro features plus complex models: think of a very deep network with multiple models ensembled, or a very complex network structure, using a huge number of variables. In fact, we cannot do the upper-right quadrant today, because it would take thousands of machines to work on one problem; it is too expensive.
That leaves two directions to choose from. One is macro features plus complex models, which is what deep neural networks do today: the network's input may be only hundreds of variables, but the network is relatively deep, and with human tuning it can be debugged into a good result.
The other direction is the one Google and Baidu take: micro features plus simple models. For example, we still use logistic regression, but we raise the number of variables to hundreds of millions or even billions; through combinations of variables, a linear model can express nonlinear problems and achieve better results.
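A toy illustration of "linear model plus combined features expresses nonlinear problems" (my example, not the speaker's): XOR cannot be separated linearly on two raw inputs, but adding one crossed feature a·b makes it linearly separable, so even logistic regression fits it.

```python
from sklearn.linear_model import LogisticRegression

X_raw = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR: no line through the raw 2-D points separates it

# Add the crossed feature a*b; in the 3-D space the classes become separable.
X_crossed = [[a, b, a * b] for a, b in X_raw]

model = LogisticRegression(C=100.0)  # weak regularization for a clean toy fit
model.fit(X_crossed, y)
print(model.predict(X_crossed))      # [0 1 1 0] -- the XOR pattern recovered
```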
The following is the Q&A between Hu Shiwei and the participants after his speech:
Q: When using a neural network to find the global optimum, I thought about using ensemble methods to approach the global optimum, but found the efficiency was lower. Do you have a more efficient or effective approach to recommend?
A: Generally speaking, much of the optimization we do today, such as the ensemble learning you mentioned, feature folding, or sampling techniques, exists to solve the problem of a job that otherwise would not finish running. The ensemble approaches you describe hope to combine multiple models to reach a good result. If the amount of data is large enough, you can instead use a single network and obtain a better model by raising its dimensionality and the complexity of training. So in industry we usually turn the problem into a shallow network with very high dimensionality, say one layer like LR, plus very complex feature combinations, which achieves a good effect. With the approach you describe, you may still need to look at the data case by case; with the approach I describe, it is more likely completed by brute-force search, which makes it a computation problem rather than a math problem.
Q: What if there is not enough data for an ensemble algorithm to seek the global optimum?
A: If the amount of data is not enough, generally speaking I would choose to reduce the feature scale. With insufficient data, the more complex the network structure, the worse the expected effect; even when the effect looks good, it will be unstable in monitoring. If the amount of data is not particularly large, some traditional methods will actually do better than neural network methods.
Q: May I ask what the difference is between the Prophet platform and other similar platforms?
A: At the product level, all cloud computing and AI platforms are solving the same problem: letting people complete the machine learning process through drag-and-drop and other simple interactions. But behind that, the algorithms take different routes. Other cloud machine learning platforms usually integrate open-source algorithms, which means you still have to learn how to build a good network structure or choose a good algorithm to finally reach a good result. The Prophet platform has put a great deal of energy into GBDT: on the same machine in the same amount of time, we can use five or six times as much data, at tens or hundreds of times the model scale, which firmly guarantees the computational power behind the algorithm.
Because the underlying computing power is strong, the results on that basis are better. Take automatic parameter tuning: where an open-source GBDT needs several minutes per run, we can finish hundreds of runs in the same unit of time, so our main effort goes into raising computational power to further lower the threshold. After drag-and-drop on this platform, the initial result is already better than hand-coding or hand-writing a network structure. The part behind the interface is the most important.
In addition, the Prophet platform covers the whole process from offline to online: every model built offline can be published as an online application service with one click, which is another big difference from other machine learning platforms.