Hello everyone. Today's article again comes from a reader question. A while ago, a friend messaged me to say he wanted to work as an algorithm engineer but did not know where to start, and asked what he should do.

It's an old topic and a broad one. It's hard for me to tell you exactly what to do, but I can tell you what a qualified algorithm engineer should know at a minimum, and maybe that will help you figure out where to go.

The basics

Algorithms and data structures

A qualified algorithm engineer may not be able to write a red-black tree or understand network flow, but must be able to handle the most basic algorithms and data structures, such as sorting, recursion, dynamic programming, trees, stacks, and queues, and needs a reasonable depth of understanding of them.

A lot of people who don't know better hold the idea that these things aren't important, and they keep spreading it. However, based on my personal experience and observation, in Internet companies of all sizes the performance of an algorithm engineer is positively correlated with their command of algorithms and data structures. These two areas are also the focus of interviews, so if you want to join a good company and learn there, algorithms and data structures are a must-have basic skill. They also help a great deal with continued learning of other technologies in the Internet industry, such as distributed systems and machine learning, where a lot of the content builds on data structures and algorithms. It's no accident that people who are strong at algorithms pick up other technologies with frightening speed and quality.

For the average practitioner, the bar in these two areas is not very high. You can work through the first 300 LeetCode problems, which basically cover all the commonly used algorithms. You can also read the algorithm and data structure series on this public account, which covers basically all of the fundamentals as well.
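To give a feel for the level meant here, below is a minimal sketch of one classic dynamic programming exercise: the length of the longest common subsequence of two strings. The choice of problem and the function name lcs_length are assumptions made purely for illustration; they are not taken from any particular problem list.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]  # LCS with an empty prefix of b is always 0
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either a or b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(lcs_length("abcde", "ace"))
```

Keeping only two rows of the table brings the memory cost down from O(len(a) * len(b)) to O(len(b)), which is the kind of detail interviewers tend to follow up on.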

Machine learning

For an algorithm engineer, knowledge of machine learning is also indispensable.

A good place to start is Andrew Ng's machine learning course. He explains things very clearly and covers almost all the points that need covering. If you are a beginner, it is perfectly normal to find it difficult. In that case, find a printed textbook to read alongside it: after listening to one of Andrew's lectures, compare it with the description in the book and with other blogs or related material. Learning this way gives the best quality and efficiency.

The common machine learning models are not very difficult to implement in Python. Once you understand the principles behind a model, it is best to implement it yourself in Python and experience the details. Models are also not the whole story of machine learning; many other things matter just as much, for example the causes of overfitting, regularization terms, the derivation of loss functions, and how metrics such as AUC are calculated.
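As a rough illustration of what implementing a model and experiencing the details can look like, here is a minimal sketch in plain Python of logistic regression trained by batch gradient descent with an L2 regular term, followed by a pairwise AUC calculation. The function names, the hyperparameter defaults, and the toy data are all assumptions chosen for the example, and a real implementation would normally use NumPy or an existing library.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, l2=0.01, epochs=500):
    """Batch gradient descent on mean log loss plus (l2 / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # Mean data gradient plus the L2 term; the bias is not regularized.
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy, hypothetical data: one feature, label is 1 for the larger values.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
scores = [predict_proba(w, b, x) for x in X]
print(w, b, auc(y, scores))
```

The pairwise AUC is O(number of positives times number of negatives), which is fine for understanding the definition; on large data a rank-based version after sorting is the usual choice.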

There are a lot of models in machine learning, but not many are commonly used and frequently asked about in interviews. It comes down to LR, logistic regression, Bayes, decision trees, random forests, GBDT, XGBoost, KNN, K-means, and a few others, around a dozen in total. Rather than biting off more than we can chew, knowing a little about each and mastering none, it is better to focus on a few of them and learn them thoroughly.

Deep learning

Deep learning has to be considered field by field. For CTR estimation fields such as recommendation, advertising, and search, the requirements are relatively low and interviews will not go too deep. This is partly because models and practices in these areas are fairly fixed, and the work depends more on features, data, and systems than on models.

Another reason is that deep learning topics are hard to examine in an interview. If the candidate has not previously worked in the relevant industry, he probably has no idea what models and practices are used in recommendation, so we can't ask him about DIN or FM. Moreover, deep learning frameworks are currently split into two camps. If the candidate has not used the same framework as the interviewer in depth, there is little point in asking about framework details. Frameworks are not the key point anyway: they can be learned quickly, and with a solid foundation you can become reasonably proficient in a few days.

So as a beginner, if you're determined to be an algorithm engineer in recommendation, advertising, and similar fields, you don't even need to know convolutional neural networks (I have never once used convolution at work). Spend the time saved on reading industry papers and practicing in competitions, and you will get better results.

Data processing

A lot of people leave this out when they talk about how to become an algorithm engineer, but it is actually very important, and it is a basic skill for the job.

Data processing has two main parts. The first is the data processing done before model training, such as feature processing, sampling, outlier filtering, and feature distribution analysis, which can be done once you learn NumPy and pandas. You can look at some Kaggle kernels or articles to learn these. This part is relatively simple, and anyone who has done Kaggle will be more or less familiar with it.
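The pre-training steps named above can be sketched with pandas roughly as follows. This is only an illustration under assumptions: the column names (clicks, label), the quantile thresholds, and the two helper functions are hypothetical and not taken from any particular dataset or Kaggle kernel.

```python
import pandas as pd


def filter_outliers(df, column, lower_q=0.01, upper_q=0.99):
    """Keep rows whose value in `column` lies inside the given quantile range."""
    lo, hi = df[column].quantile([lower_q, upper_q])
    return df[df[column].between(lo, hi)]


def downsample_negatives(df, label_col, keep_frac, seed=0):
    """Keep every positive row and a random fraction of the negative rows."""
    pos = df[df[label_col] == 1]
    neg = df[df[label_col] == 0].sample(frac=keep_frac, random_state=seed)
    # Shuffle so positives and negatives are not grouped together.
    mixed = pd.concat([pos, neg]).sample(frac=1.0, random_state=seed)
    return mixed.reset_index(drop=True)


# Hypothetical toy data: 100 ordinary rows plus one extreme outlier.
df = pd.DataFrame({
    "clicks": list(range(100)) + [10_000],
    "label": [1] * 10 + [0] * 91,
})

print(df["clicks"].describe())           # feature distribution analysis
clean = filter_outliers(df, "clicks")    # outlier filtering
balanced = downsample_negatives(clean, "label", keep_frac=0.5)  # sampling
print(len(df), len(clean), len(balanced))
```

Quantile clipping is only one of several ways to treat outliers; whether to drop, cap, or keep them depends on the feature and the model.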

The second is the big data processing stack built on Hadoop clusters, including platforms and tools such as MapReduce, Spark, Flink, and Hive. People outside the industry generally know little about these, let alone think to learn them, yet they are what actually gets used in day-to-day work. Pure MapReduce is a bit outdated now; Spark, Hive, and Flink are what the industry commonly uses. You don't need to learn all of them, but you should understand and master one. Because practices generally differ from company to company, you will have to relearn things when you start a job anyway, and interviews generally don't require an exact match.

Development ability

This is not mentioned much, but it is important. After all, algorithm engineers are engineers, and they write code. In daily work, algorithm engineers mainly develop three kinds of things: models, scripts, and systems.

Models are easy to understand: reproducing a paper, for instance, or reimplementing a classic model. But when we implement a model, we don't just implement the model itself; we often need to implement a lot of surrounding things, such as splitting the data into training and validation sets, logging the training run, reading data, and formatting it.
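To make those surrounding pieces concrete, here is a minimal sketch of a train/validation split and a logged training loop using only the Python standard library. The names train_val_split and run_training, and the train_step and evaluate callbacks, are hypothetical placeholders for whatever the actual model code provides.

```python
import logging
import random

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("train")


def train_val_split(rows, val_frac=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, validation)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_frac)
    return rows[n_val:], rows[:n_val]


def run_training(train_rows, val_rows, epochs, train_step, evaluate):
    """Run the loop, logging one line per epoch and returning the history."""
    history = []
    for epoch in range(1, epochs + 1):
        loss = train_step(train_rows)   # assumed to return the training loss
        metric = evaluate(val_rows)     # assumed to return a validation metric
        logger.info("epoch=%d train_loss=%.4f val_metric=%.4f",
                    epoch, loss, metric)
        history.append((epoch, loss, metric))
    return history


# Stand-in callbacks so the sketch runs without a real model.
train_rows, val_rows = train_val_split(range(10))
history = run_training(train_rows, val_rows, epochs=3,
                       train_step=lambda rows: 1.0 / len(rows),
                       evaluate=lambda rows: 0.5)
```

Fixing the shuffle seed keeps the split identical between runs, which makes two training runs comparable when only the model has changed.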

Scripting refers to scripts for feature and data processing, and it depends on the data processing platform the company uses. For example, Spark often means writing Scala, and Hive requires some SQL. These scripts often involve very complex feature generation and data joining logic, which can be tedious and easy to get wrong if you are not careful.

Finally, systems. Algorithm engineers also need to take part in developing some systems, such as online ranking systems and online model scoring services. The details of these systems are often tied to models and algorithms, which ordinary developers often do not understand, so algorithm engineers still need to be involved. Basic development ability is therefore essential.

Frameworks

Framework here means a deep learning framework. There are many on the market, such as Keras, MXNet, and Caffe, in addition to the commonly used TensorFlow and PyTorch. In general, you can choose either TensorFlow or PyTorch to learn in depth; it is much easier to learn the other once you have mastered one.

As I said earlier, frameworks are not the core technical focus, and it doesn't matter much which one you use. Personally, I recommend PyTorch if you've never learned a framework before. PyTorch has a gentler learning curve, its object-oriented support is friendlier, and its syntax is cleaner. You will find learning PyTorch a much better experience than TensorFlow, and faster too.

You don't have to worry too much about interviews either, because most job requirements ask for TensorFlow, PyTorch, Keras, or another commonly used framework rather than one in particular. If you are asked about TensorFlow, you can simply tell the interviewer that you mostly use PyTorch and are not familiar with TensorFlow.

Hands-on practice

After learning a lot of theory, we still need to apply it in practice to test what we have learned and understand it better. There are many places to practice machine learning, such as the well-known Kaggle and Alibaba's Tianchi Big Data platform. Beyond these two, many other companies also hold their own algorithm competitions. For beginners, these are very valuable opportunities to practice.

Kaggle has many competitions, and not all of them are meaningful. Look for ones closely related to the direction you are applying for. For example, if you are aiming at search or advertising, you can do CTR estimation; if you do NLP, you can look for text processing problems. Kaggle has so many competitions that you can almost always find a suitable one. By comparison, Tianchi has fewer topics, but it lets you use Alibaba's real platform and desensitized data, which I personally find more realistic than Kaggle. Working through one yourself, you can basically understand the whole process of building models at a company as large as Alibaba, and also experience the computing power of Alibaba Cloud.

Finally, when we do a competition or problem, the goal is not just to get a good score. Try to think about the context of the problem and how the model works in that context. In other words, we can't just do it and move on; we have to think while doing it and summarize afterwards. Only then do we really grow.

So, having seen these requirements, do you feel that the threshold for algorithm positions is quite high and that there is a lot to learn? It is true that algorithm engineer is a rather special position that requires some familiarity with models, data, algorithms, and systems. However, we don't need to cover everything at once. Focus on the essentials and let the minor things go; starting from these core areas gets you twice the result with half the effort.

That's all for today. I sincerely wish you all a fruitful day. If you liked today's content, please support it with a like, a share, and a follow.
