What are the application scenarios of various machine learning algorithms (such as naive Bayes, decision trees, K-nearest neighbors, SVM, logistic regression, and the maximum entropy model)?

How often do you use each of these in day-to-day work? What are they typically used for? What should I pay attention to?

The core keywords of this question are basic algorithms and application scenarios. What people really worry about is whether these basic algorithms are still worth learning and using. After all, the algorithms listed above have been around for a long time. You may still have classics like Introduction to Data Mining and Pattern Classification on your bookshelf, yet wonder whether basic algorithms still have a place in the era of deep learning.

There is already plenty of content online explaining the basic ideas of classical algorithms and their theoretical application scenarios. But those scenarios describe the scope of applicability of a model, which is quite different from how algorithms actually land in industrial settings.

From an industry perspective, business value is the yardstick by which an algorithm is measured, and a business scenario comes bundled with business objectives, constraints, and implementation costs. If we look only at the objective, cutting-edge algorithms tend to dominate; but once we factor in algorithmic complexity, iteration speed, and the constraints the business imposes, classical algorithms become far more practical and earn their place.

To address this question, an algorithm engineer from Alibaba's Taobao technology team has written the detailed answer below.

In terms of actual business value, the impact of the algorithm model is approximately 10%

In industry, taking an algorithm from idea to production is never a solo effort. Take recommendation as an example: suppose we are tasked with powering recommendations for a channel-page feed stream. The first thing to realize is that, for the business, the model accounts for roughly 10% of the impact. The other major factors are product design (40%), data (30%), and the representation and modeling of domain knowledge (20%).

This means that if you upgrade a normal LR model to a deep model, even if you improve it by 20%, you might only contribute about 2% to the business. Of course, 2% is still a lot, but it’s a painful discount.

Of course, these proportions are not fixed. Since 2015, the breakout year of Alibaba's recommendation algorithms, personalized recommendation has routinely been benchmarked against hand-crafted operational rules, and anything less than a 20% lift was almost embarrassing to report. Back then the routine of an algorithm engineer was to do optimization rather than take requirements: the business feeds me logs, features, and an optimization objective, and I handle the rest.

But as the bar rose year after year, our models naturally grew more and more complex, from LR to FTRL to WDL to DeepFM to MMoE, following step by step the path laid down by our predecessors. Ask around for anyone who still uses LR or plain decision trees and you will get an awkward smile.

Gradually, though, we realized that model optimization is ultimately subject to diminishing marginal returns. If we wall ourselves off from the business and optimize only within a confined space, the ceiling is predetermined. So Taobao recommendation slowly moved into its second stage: algorithm and business co-construction. Business requirements and algorithmic optimization, while still separate, began to converge.

The group's expectations of algorithm engineers changed as well: a lofty deep model that can neither articulate its business value nor deliver a significant improvement is seen as self-indulgent. At this point, an excellent algorithm engineer needs to be familiar with the business and able to pin down its pain points through repeated communication with the business side.

Note that the business side may itself be too deep in the game, asking for this and that at the same time; you need to identify the problem that is genuinely most worth solving, and then translate it into an algorithmic formulation. Once you do this, you will find that it is not you who decides which model is best: the problem chooses the model. The first version is most likely a classical algorithm, because you need to get the end-to-end pipeline running as quickly as possible to verify that your idea works. Iterating toward a better model afterward is only a matter of time.

Application scenarios of classical algorithms in the Taobao ecosystem: TF-IDF, K-nearest neighbors, naive Bayes, logistic regression, and more

At this stage, in most Taobao scenarios, algorithms do not single-handedly drive the business; they grow together with it. An algorithm engineer who only knows the technology can at best capture that 10 percent. To make this concrete, here are a few examples:

For example, the business problem is to tag user segments such as fishing enthusiast, teenage girl, headphone audiophile, or male-idol style. In practice we need to consider not only attributes such as age, gender, and purchasing power, but also the user's long-term behavior in the Taobao system, which turns this into a multi-class classification task. If the model relied only on monthly visit counts, the teenage-girl segment could end up with an enormous audience, since women's wear is the most frequently visited category.

For example, suppose a user makes four visits in a month to both the headphone-audiophile categories and the teenage-girl categories, while the average visit count is 3.2 for headphone audiophiles and 4.8 for teenage girls; then the user's preference for the headphone-audiophile segment should be judged higher. So the model features should capture not only the user's absolute behavior frequency for a segment, but also the relative frequency measured against the overall market water line.

This is where the TF-IDF algorithm, featured in Wu Jun's The Beauty of Mathematics, comes in handy. Introducing TF-IDF-style features significantly improves the crowd-tagging model, even though TF-IDF itself is a very basic text-classification technique.
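
As a rough illustration of the idea (the visit counts and segment names below are made up), here is a minimal sketch of TF-IDF-style weighting applied to behavior: each user plays the role of a document and each interest segment the role of a term, so a segment that nearly everyone visits heavily (like women's wear) is automatically discounted.

```python
import math

# Hypothetical monthly visit counts: user -> {segment: visits}
user_visits = {
    "u1": {"headphone_audiophile": 4, "teenage_girl": 4},
    "u2": {"teenage_girl": 6},
    "u3": {"teenage_girl": 5, "fishing_enthusiast": 2},
}
num_users = len(user_visits)

# "Document frequency": how many users show any behavior for each segment
seg_df = {}
for visits in user_visits.values():
    for seg in visits:
        seg_df[seg] = seg_df.get(seg, 0) + 1

def tfidf(user):
    """TF-IDF-style weights: the user's visit share (TF) times a
    discount for segments that almost every user visits (IDF)."""
    visits = user_visits[user]
    total = sum(visits.values())
    return {
        seg: (cnt / total) * math.log((1 + num_users) / (1 + seg_df[seg]))
        for seg, cnt in visits.items()
    }

# headphone_audiophile outweighs teenage_girl despite equal raw counts
print(tfidf("u1"))
```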

Improving click-through or conversion rate in a feed stream is the common setup in search and recommendation, but the business will always hand you a surprise: every item has an inventory of exactly one (Ali Auction); most of the recommended items are brand-new products (Tmall VPI); users are drawn in with trial-size samples in the hope that they later buy the full-size product, and most of these users are first-time visitors (Tmall U-First); or efficiency must improve while category richness is preserved (several scenes).

These differing business constraints are the real application scenarios we face. The first step is to set the direction. For example, the Ali Auction problem can be framed as "how to do personalized recommendation under a shallow-inventory constraint." If you decide this is a traffic-control problem, you need to write down the optimization objective and constraints and investigate how to solve it with the method of Lagrange multipliers. Crucially, the final result still has to be integrated with the personalized recommendation system. For details, see: the first public disclosure of Ali Auction's guided-shopping strategy.
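
As a hedged sketch of what "writing down the objective and constraints" might look like (the notation here is assumed, not taken from the Ali Auction write-up): let x_{u,i} in [0,1] be the exposure probability of item i for user u, s_{u,i} the personalized score, and c_i the shallow inventory of item i.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{u}\sum_{i} s_{u,i}\,x_{u,i} \\
\text{s.t.}\quad & \sum_{u} x_{u,i} \le c_i \quad \forall i,
\qquad 0 \le x_{u,i} \le 1 .
\end{aligned}

% Lagrangian with multipliers \lambda_i \ge 0 on the inventory constraints:
L(x,\lambda) \;=\; \sum_{u,i} s_{u,i}\,x_{u,i}
\;-\; \sum_{i} \lambda_i \Big(\sum_{u} x_{u,i} - c_i\Big)
```

For a fixed set of multipliers, each user simply ranks items by the adjusted score s_{u,i} minus lambda_i, which is exactly why such a solution can be plugged back into an ordinary personalized ranking pipeline.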

In a scenario like this, your strategic goal is to show that recommendation under a shallow-inventory constraint really can be treated as a traffic-control problem, and to verify the results quickly. Adopting a mature, classical method to experiment fast and then iterate gradually is the wise choice.

For example, the K-nearest-neighbor algorithm seems simple enough to be described in a single picture or sentence, yet it is genuinely useful for handling imbalance between positive and negative samples. Upsampling means replicating samples from the minority class (often the positives), but the resulting duplicated data can lead to overfitting. One cause of this overfitting is that the local ratio of positive to negative samples differs from region to region, as shown in the figure below:

We only need to upsample the minority samples in the local region (region C in the figure), and that is exactly where the K-nearest-neighbor algorithm comes in. The classical algorithm here is only one piece of the pipeline, perhaps just a corner of it, but the pipeline cannot do without it.
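
A minimal SMOTE-style sketch of the idea (assuming scikit-learn and a toy imbalanced dataset; a production system would be far more involved): each synthetic minority sample is interpolated between a minority point and one of its K nearest minority neighbors, so the new points stay in the local region rather than being exact duplicates.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_upsample(X_minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    a random minority point and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)  # +1: a point is its own neighbor
    _, idx = nn.kneighbors(X_minority)

    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))
        j = rng.choice(idx[i][1:])     # a random true neighbor (skip the point itself)
        lam = rng.random()             # interpolation weight in [0, 1)
        synthetic.append(X_minority[i] + lam * (X_minority[j] - X_minority[i]))
    return np.vstack(synthetic)

# Toy usage: a handful of positives in a sea of negatives
X_pos = np.random.randn(10, 4) + 2.0
X_new = knn_upsample(X_pos, n_new=50, k=3)
print(X_new.shape)  # (50, 4)
```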

Naive Bayes itself is simple, but what Bayesian theory later grew into, including Bayesian networks and causal graphs, is not. For example, whether in the LR scorecard models of financial businesses or in the deep models used for recommendation ranking, cross features are still largely configured by hand from experience, which remains one of the least automated parts of the pipeline.

Using structure learning in Bayesian networks, together with domain knowledge supplied by the business, we can build a Bayesian probability graph and read off from it which features are worth crossing. This yields a measurable improvement over manual configuration and better interpretability. It is a good example of combining business domain knowledge with an algorithmic model, but it is hard to get this far without a solid grounding in Bayesian theory.
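
As an assumed illustration only (the edge list below is hypothetical, standing in for whatever a structure learner and the business experts agree on), once the graph is in hand, turning its edges into explicit cross features is mechanical:

```python
import pandas as pd

# Hypothetical raw features for a scorecard / ranking model
df = pd.DataFrame({
    "age_bucket":    ["18-24", "25-34", "25-34", "35-44"],
    "city_tier":     ["t1", "t2", "t1", "t3"],
    "category_pref": ["headphone", "womenswear", "fishing", "headphone"],
})

# Hypothetical edges from a learned Bayesian network plus expert review:
# an edge (a, b) suggests the pair is worth crossing explicitly.
learned_edges = [("age_bucket", "category_pref"), ("city_tier", "category_pref")]

for a, b in learned_edges:
    df[f"{a}_x_{b}"] = df[a].astype(str) + "&" + df[b].astype(str)

print(df.filter(like="_x_"))
```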

No matter how popular deep learning becomes, LR remains the fallback for most scenarios. During the Double Eleven promotion, for example, if every scene ran its deep model at full load in the peak window from 0:00 to 0:30, machine resources would certainly fall short. A downgrade plan based on LR or FTRL must be prepared in advance.
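
A hedged sketch of what such a downgrade switch can look like (the load threshold, feature vector, and weights below are all made up); the point is simply that the LR path shares the deep model's feature pipeline, so flipping the switch under peak load is cheap and safe.

```python
import numpy as np

LR_WEIGHTS = np.array([0.8, -0.3, 1.2])  # hypothetical pre-trained LR / FTRL weights
LR_BIAS = -1.0
PEAK_LOAD_THRESHOLD = 0.85               # hypothetical cluster-load watermark

def lr_score(features: np.ndarray) -> float:
    """Cheap logistic-regression fallback on the same feature vector."""
    return 1.0 / (1.0 + np.exp(-(features @ LR_WEIGHTS + LR_BIAS)))

def score(features: np.ndarray, cluster_load: float, deep_model=None) -> float:
    """Serve the deep model when resources allow, otherwise degrade to LR."""
    if deep_model is not None and cluster_load < PEAK_LOAD_THRESHOLD:
        return deep_model(features)
    return lr_score(features)

print(score(np.array([1.0, 0.5, 0.2]), cluster_load=0.95))  # peak traffic: LR path
```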

How to become a good algorithm engineer in the industry?

At Taobao, algorithms do not exist in isolation, and an algorithm engineer is not a hermit swordsman. Cutting to the business pain point, verifying your idea quickly, and making the right algorithm selection all require solid fundamentals and a broad algorithmic horizon. No matter how complex a model is, the final answer is business value. An algorithm engineer with strong fundamentals, fast learning ability, and a knack for uncovering business value has plenty of room to grow.

Well, assuming that is our career goal, how do we get there? It really comes down to one sentence: combine theory with practice.

Theory

Understanding the basic idea of an algorithm may take only twenty minutes, and forgetting it may take only two weeks. The point is to internalize it: with a decision tree, for example, you should be able to walk through the information-gain calculation, feature selection, and node splitting in your head, and judge where the method works well and where it does not. In the end, it all comes down to recognizing quickly, in practice, whether it is the right model for the problem at hand.
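
If you want to check that you can still do that mental simulation, here is a small self-contained sketch (toy labels, no external data) of the entropy and information-gain computation behind a single candidate split:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Gain from splitting `labels` into the given `groups` (lists of labels)."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

# Toy example: 5 positives and 5 negatives, split by some candidate feature
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
split  = [[1, 1, 1, 1, 0], [1, 0, 0, 0, 0]]
print(round(entropy(labels), 3))                  # 1.0
print(round(information_gain(labels, split), 3))  # ~0.278
```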

Getting the basics solid is a slow process. It can be done by reading classic books, watching video lectures and shared talks, or implementing the algorithms yourself in a high-level language. What matters is the attitude: stay calm, do not fixate on expectations, and keep the enthusiasm alive. And if you have a like-minded group to share with, congratulations, because working through a book or a paper can be dull, yet the motivation to share will carry you a long way.

Practice

We all know that practice is the mother of wisdom, but practice is often cruel: requirements and constraints abound, the real problem has to be discovered, and the time window is short. An algorithm is a largely deterministic, clearly bounded, standardized thing; a business is a divergent, multi-objective, experience-driven affair.

First you need a discerning eye to find the point most worth attacking, which takes data analysis plus business experience to peel the problem down to its most important objective and constraints and simplify it as much as possible. Second, you need to communicate persuasively, otherwise the business will not give you that window of time to try. Finally, once you have staked out your promise, it is your algorithmic fundamentals that are tested under pressure to produce results within the window as soon as possible.

Conclusion

Finally, a riddle: it is a greedy fellow of modest computational complexity; it can do dynamic feature selection whether the features are discrete or continuous; it can be pre-pruned or post-pruned to avoid overfitting; it builds on information-entropy calculations and can even introduce random factors; it can stand alone, join an ensemble, or pair with LR to solve the feature-combination problem; it copes with small samples, imbalanced positives and negatives, and online data-stream learning; and its output is a set of highly interpretable rules. It is the decision tree.

Every basic algorithm is like a seed of wisdom: not mere icing on the cake, but an original idea that broke new ground. Sometimes it is worth going back and retracing the masters' original steps; that original wisdom pays dividends over the long road of an algorithm career. Neural networks took nearly fifty years to go from planting to blossom; Bayesian methods are still on their way to bearing fruit. Who will be next?