1. Introduction
I originally planned to title this article something like "The Skills of an Algorithm Engineer", but I figured that putting "machine learning" in the title would get more people to click, so the title became what it is now, ha ha. A popular keyword should also bring more exposure once search engines index the article. But rest assured, the article stays on topic. Let's get down to business.
Today's topic is machine learning, one of the hottest topics in computing over the past two years. This is not an article about machine learning techniques. It only wants to tell you that machine learning is full of pits, and that many friends who haven't started yet, or who are still beginners, are standing right in front of one. If you want to keep going down this road, please be prepared.
2. The purpose of learning machine learning
To be honest, most people who take courses on machine learning and big data are, in the end, hoping to find a good job and earn a higher salary. Of course, part of the reason is that they are interested in this field and want to know more about it.
Personally, I think the first reason is more important.
3. What are we talking about when we talk about machine learning
First, what does a machine learning system look like?
Almost every machine learning system has the structure shown in the system diagram above. The difference between the types is that the training data for a supervised system may require manual intervention, such as labeling, while an unsupervised system does not. Put simply, you give a batch of training data to a machine learning model, get a prediction model out, and then use that model to make predictions on new, unseen data.
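As a rough illustration of that train-then-predict flow, here is a minimal sketch in plain Python. The nearest-centroid classifier, the function names `train` and `predict`, and the tiny data set are all assumptions chosen for brevity, not something taken from a real system.

```python
from collections import defaultdict

def train(samples, labels):
    """Fit a nearest-centroid model: one mean vector per label."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, x))
    return min(model, key=lambda label: sq_dist(model[label]))

# Hypothetical labeled training data: (clicks, favorites) -> label.
train_x = [[1.0, 1.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
train_y = ["unpopular", "unpopular", "popular", "popular"]

model = train(train_x, train_y)      # training data in, model out
print(predict(model, [7.5, 8.0]))    # the model applied to new data
```

The labels in `train_y` are the "manual intervention" of a supervised system: someone had to decide which examples count as popular.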
Articles about machine learning are everywhere on the Internet, blogs are all over the place, and there are all kinds of books on the market. It is also one of the hottest areas of online education right now, and the various online machine learning courses are very expensive.
But look closely: all this talk about machine learning is about models. The articles and books have titles like "In-Depth Understanding of the XXX Model", "Probably the Best Article for Understanding XXX", "Machine Learning Is Not Difficult: A Detailed Explanation of the XXXX Model", and so on. There are introductions to logistic regression, deep learning, neural networks, SVMs (support vector machines), BP neural networks, convolutional neural networks, and so on and so on.
So when we talk about machine learning, we're really talking about machine learning models, that is, the various machine learning algorithms. And everyone thinks that once you have learned the theory behind the models and algorithms, you're an expert in machine learning. I'm sure most people think so.
4. Xiao Ming has become an “expert” in machine learning
There is a kid called Xiao Ming who works with computers. After watching videos of AlphaGo crushing Lee Sedol, he was shocked, even though he doesn't play Go, and he made up his mind to learn this legendary machine learning. So he searched everywhere online for tutorials, blog posts, and books, studied hard for half a year, and finally felt that he had gotten started. He could explain the reasoning behind every model and algorithm in machine learning.
I wonder how many of you are at this stage?
But Xiao Ming wanted to go further, so he began to study the code and tools for the various models. Hadoop and Spark are standard equipment here, and there are plenty of articles, books, and online classes that cover them. Online courses in particular: these days, a course without a big data processing class or a Hadoop class might as well not open.
Most of another half year went by this way, and finally Xiao Ming felt that he had mastered it all: he had the theory and he had the big data processing tools. He was invincible!
How many people are at this stage and think they already know how to do machine learning? By this stage, if you are good at explaining things, you are ready to start teaching a machine learning class. But do you think that is enough to work as an algorithm engineer?
Because Xiao Ming had solid theoretical knowledge, could derive all the formulas, knew Hadoop and Spark, and expressed himself well, job hunting was very easy for him. After several rounds of interviews he was hired by a big company as an algorithm engineer at an e-commerce firm, with a very high salary. Then came the assignment: use your awesome knowledge to raise search clicks by one percentage point.
If you were Xiao Ming, fresh out of a machine learning class, what would you do? Would you be dumbfounded?
5. Machine learning is more than models
The root of this problem is that everyone treats the models of machine learning as machine learning itself and believes that understanding those algorithms makes you a master of machine learning. That's not the case at all.
Who actually gets to play with models? Models are invented by scientists, the individual scientists and researchers at the big companies, and each invention comes out as a paper. These are the people who have always outclassed us on IQ. Realistically, can you invent a model? **(If you can, you don't need to read any further; you can take the academic route.)** You can't even modify a model, can you?
So having learned the models, you are only just getting started, if that.
So what are all these algorithm engineers doing in companies? Take an algorithm engineer who works on search ranking. What does that person do? They go around this loop:
Observe the data -> find features -> design the algorithm -> validate the algorithm -> clean the data -> engineering -> check the effect online -> go back to observing the data
In a mature system, the model is usually more or less fixed, and it won't be changed unless the results are particularly bad. For example, if a company's search ranking system uses a logistic regression model, switching to another model is generally unlikely, so all you can do is add features.
Ok, so let’s go through this process and see what a machine learning algorithm engineer really needs.
5.1 Observe the Data
Xiao Ming sat at his workstation every day looking at data: running queries, reading tables, plotting curves. He found that every feature he could think of, such as sales, favorites, and clicks, was already in use. Three months went by with no progress, and he was close to breaking down. He had been there so long and hadn't seen any machine learning code at all.
In the fourth month, he spotted a problem. Some products had good reviews and seemed to be of good quality, but their sales never picked up, so they always ranked near the bottom. He therefore pulled out the products with five-star reviews but low sales to see what they had in common.
What abilities does the data observation stage call for? Ha ha, all I can tell you is that you need sensitivity to data. In fact, you also need all-round ability, experience, and the skills of a product manager.
Beyond these, you also need to be able to knock out scripts on the fly. Some data needs preliminary processing, and you may have to write a quick script to do it, and write it fast, because the code may be thrown away after one or two uses. So you need strong scripting skills, and you should at least be familiar with Python and the shell.
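The kind of throwaway script described in this section might look like the sketch below, which pulls out well-reviewed products that sell poorly. The record layout, the field names `avg_rating` and `monthly_sales`, and the thresholds are all hypothetical.

```python
# Hypothetical product records; field names are assumptions for illustration.
products = [
    {"id": "A", "avg_rating": 5.0, "monthly_sales": 12},
    {"id": "B", "avg_rating": 5.0, "monthly_sales": 950},
    {"id": "C", "avg_rating": 3.2, "monthly_sales": 8},
    {"id": "D", "avg_rating": 4.9, "monthly_sales": 30},
]

def well_reviewed_but_slow(items, min_rating=5.0, max_sales=50):
    """Return items whose rating is high but whose sales are low."""
    return [
        item for item in items
        if item["avg_rating"] >= min_rating and item["monthly_sales"] <= max_sales
    ]

candidates = well_reviewed_but_slow(products)
print([item["id"] for item in candidates])
```

In practice the data would come from a database or log files rather than a literal list, but the filtering step stays this small.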
5.2 Find Features
Observing the data revealed a problem; now it is time to find features, that is, to find the factors that are holding sales down. First you need imagination, and then you need to verify what you imagined.
Xiao Ming's imagination was overflowing, but even so it took him a month to discover what these products had in common: their pictures were rather bad and made people not want to click. Well then, what if picture quality could be added as a ranking factor? Image quality as a feature was something no one had done before. He had finally found a feature.
At this stage, everyone's imagination is limited after all, so finding features that fit the current scenario depends more on experience.
5.3 Design Algorithm
He had found the feature, but how could it be added to the ranking model? How good is a picture, and how can a machine understand that? If you can't turn image quality into a mathematical vector, you'll never be able to add it to the ranking model.
This stage is the real test of an algorithm engineer: turning the feature into a vector. Xiao Ming observed that better images tend to have more color variation, while poor-quality images often have little. So he thought of applying a Fourier transform to the image data to move it into the frequency domain. By the properties of the Fourier transform, large amplitudes in the high-frequency part mean that the colors of the image change sharply, whereas if the amplitude is concentrated in the low-frequency part, the colors change little. This roughly matches what he had observed, so the quality of an image can be represented by the amplitude of the high-frequency part after the Fourier transform. After some normalization the image is vectorized, and the vector can then be added to the ranking model.
In this step you may get to use the machine learning models you have studied, but that is definitely the minority of cases. Most of the time you have to build your own mathematical model for the situation at hand rather than use a machine learning model. What skills does this stage require? The example here is fairly extreme, but mathematical abstraction, mathematical modeling, and proficiency with mathematical tools are essential. Strong programming ability is also required: not the scripting ability of the previous step, but real algorithm programming ability.
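To make the Fourier idea concrete, here is one possible sketch using NumPy. The article does not say how Xiao Ming normalized the result, so the choices here are assumptions: the image is treated as a single grayscale channel, the mean is removed, and the feature is the share of spectral magnitude above a cutoff frequency. The function name `high_frequency_ratio` and the cutoff value are hypothetical.

```python
import numpy as np

def high_frequency_ratio(gray, cutoff=0.25):
    """Share of spectral magnitude above `cutoff` (in cycles per pixel).

    `gray` is a 2-D array of pixel intensities. The result lies in [0, 1]:
    values near 1 mean rapid pixel-to-pixel change, values near 0 mean a
    smooth or flat image.
    """
    gray = np.asarray(gray, dtype=float)
    # Remove the mean so overall brightness does not dominate the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(gray - gray.mean()))
    magnitude = np.abs(spectrum)

    rows, cols = gray.shape
    # Frequency of each spectrum cell, shifted the same way as the spectrum.
    fy = np.fft.fftshift(np.fft.fftfreq(rows))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(cols))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    total = magnitude.sum()
    if total < 1e-12:          # completely flat image: no variation at all
        return 0.0
    return float(magnitude[radius > cutoff].sum() / total)

# Synthetic test images (assumptions, not real product photos).
flat = np.full((16, 16), 0.5)
smooth = np.tile(np.sin(2 * np.pi * np.arange(16) / 16), (16, 1))
checker = np.indices((16, 16)).sum(axis=0) % 2

for name, img in [("flat", flat), ("smooth", smooth), ("checker", checker)]:
    print(name, round(high_frequency_ratio(img), 3))
```

A single number like this is only a one-dimensional feature. Whether it actually tracks perceived picture quality would still have to be checked against real data, which is what the validation step is for.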
5.4 Algorithm Verification
Once the algorithm is designed, you have to design an offline validation method to show your boss that it works. Otherwise, why would anyone give you so many chances to try things out in production? This step again calls on a combination of abilities. The key is that you have to convince your boss of the theory in plain language. What kind of ability is that? Strong communication skills.
In addition, you need to design an A/B test plan for after the launch, one that can properly test whether your algorithm really works.
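One common way to judge an A/B test on click-through rate is a two-proportion z-test. The article does not say which test was used, so treat the sketch below as an assumption. The function name and the traffic numbers are hypothetical.

```python
import math

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Compare the click-through rates of groups A and B.

    Returns (z, p_value), where p_value is two-sided. A positive z means
    group B has the higher click-through rate.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both groups are the same.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical traffic: control at 5.0% CTR, new ranking at 5.2% CTR.
z, p = two_proportion_z_test(5000, 100_000, 5200, 100_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up numbers, a 0.2 percentage point lift only becomes distinguishable from noise at around a hundred thousand views per group, which is why the test plan has to be designed before launch rather than improvised afterwards.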
5.5 Clean the Data
The feature has been found and the algorithm is more or less designed, so now comes the grunt work: cleaning the data. This is a required course for algorithm engineers. The data never looks the way you want it to, so turning it into the form you need, and then removing the invalid records, is manual labor.
Take the example above. First, the pictures may all be different sizes, so they need to be converted to one size to make them easy to process. Some products have multiple pictures, and you may need to pick the best-quality one before processing, and so on.
This stage also takes scripting ability, and you need to master some data processing tools. The key is to have enough patience and confidence, and of course excellent programming ability is essential.
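A small sketch of this kind of cleaning follows: it drops invalid image records and keeps the best-scoring picture for each product. The record fields and the notion of a precomputed `score` are assumptions. Resizing is left out because it would need an image library.

```python
def clean_image_records(records):
    """Drop invalid rows, then keep the highest-scoring image per product."""
    best = {}
    for rec in records:
        score = rec.get("score")
        # Invalid data: missing score or non-positive dimensions.
        if score is None or rec.get("width", 0) <= 0 or rec.get("height", 0) <= 0:
            continue
        pid = rec["product_id"]
        if pid not in best or score > best[pid]["score"]:
            best[pid] = rec
    return best

# Hypothetical raw records, including the usual kinds of dirt.
records = [
    {"product_id": "A", "path": "a1.jpg", "width": 800, "height": 600, "score": 0.4},
    {"product_id": "A", "path": "a2.jpg", "width": 640, "height": 640, "score": 0.7},
    {"product_id": "B", "path": "b1.jpg", "width": 0, "height": 0, "score": 0.9},
    {"product_id": "C", "path": "c1.jpg", "width": 500, "height": 500, "score": None},
]

cleaned = clean_image_records(records)
print({pid: rec["path"] for pid, rec in cleaned.items()})
```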
5.6 Engineering
Well, you have made it across every pit so far and arrived at this step, ha ha. The algorithm is designed and the data is ready, and probably half a year has gone by, so hurry up and go live. Do you think you can go live with a pile of scripts? You have to think about engineering: how to embed your algorithm into the existing system, and how to make it efficient enough that it doesn't take a whole day to run. Consider the robustness of your code, and if it's an online algorithm, consider performance so that it doesn't run out of memory.
In this step you can finally put to use the tools you learned earlier, Hadoop and Spark. To finish the engineering you need solid development ability; I don't need to say more.
5.7 Check the Effect Online
Everything is done. After about 10 months, it can finally go live. Now for the real test: see how it performs after launch. The product manager says to run an A/B test, and the result, ha ha: the click-through rate went down. Oh, Xiao Ming! Ten months of hard work and clicks dropped? Isn't the boss going to kill you? So you need to be able to take a hit.
Ha ha, take it offline quickly and go back over everything from the beginning to see where it went wrong. He spent another month modifying the algorithm and put it back online. This time it went well: the click-through rate rose by 0.2 percentage points. Keep working hard and see what else can be mined, which means going back to the step of observing the data.
Don't look down on that 0.2. On a large data set, an increase of 0.2 percentage points is a very good gain, so an algorithm engineer who produces a few such gains a year is worth the money.
6. Let’s sum up
Completing the whole process above on your own can be a bit difficult, and I have exaggerated somewhat. For some steps you have help: a product manager works with you when observing the data, a data engineer when cleaning the data, and a systems engineer on the engineering. But as a machine learning algorithm engineer you need to have a grip on the whole process, so you should be able to handle it even when you are alone.
These are just the abilities a standard algorithm engineer should have. I have used search algorithms as the example here, but other kinds of algorithm engineers are not much different; they always run through the same few steps. Of course, if you are really good and can modify the machine learning model to fit the scenario, or even come up with a model of your own, that is even more impressive.
All right, let's pull out the highlights and put them together to see what skills an algorithm engineer needs:
Sensitivity to data and powers of observation
Mathematical abstraction, mathematical modeling, and the ability to use mathematical tools
The ability to write quick scripts on the fly, strong algorithm programming ability, and the qualities of a senior development engineer
Imagination, patience and confidence, strong communication skills, and resilience
Then there is the key point: you need to be very smart. Of course, if you can do all of the above, you are basically already very smart. And if you really can, then the theory and tools of machine learning models matter less, because those are just knowledge and tools that can be learned at any time.
Do you think you can do this by reading a few blogs and a few books and taking a few classes?
Of course, we’re talking about the general situation here, but if you’re focused on doing research, you’ll need to increase your proficiency by an order of magnitude.
Finally, are you ready to step into these pits as a machine learning and algorithm engineer?