Hi, everyone. My name is Chris. I worked as an algorithm engineer at a listed game company for five years before joining my current company.

At present, I am responsible for algorithm work at Ali, covering CV, NLP, architecture, and more. The business lines I support have extended to advertising, operations, customer service, risk control, and other areas.


Why is it so hard to recruit for algorithm positions?


In the eyes of outsiders, an algorithm engineer picks up a new paper recently published by some big name, or works on a theory of their own, derives formulas and produces theoretical results, implements them with parallel programming to support large-scale data training, then beats the existing model, raises CTR by 200%, raises revenue by 200%, and makes millions of dollars a year. But here’s the thing:



As the head of the algorithm department, I have interviewed many candidates. Generally, I evaluate them on logical thinking, basic algorithms and data structures, mathematics, deep learning, communication skills, and engineering experience.


I found that many people only think they know algorithms: they skim the “watermelon book” (the nickname of a popular machine learning textbook) and dare to show up for an interview. Then there are this year’s graduates with a solid mathematical foundation and good algorithm knowledge who may have written fewer than 1,000 lines of code in three years, so their hands-on ability is very poor.


After interviewing several young people with excellent resumes, I was surprised to find that many beginners did not have a good understanding of the actual workflow of a data mining/algorithm engineer, so their professional skills are off target. This is why companies receive more and more resumes yet find only one or two usable candidates, whose asking salary is 50% over budget. Even after the company painfully signs them, they may be poached by competitors.


So what does the actual workflow of an algorithm position look like?


Let’s walk through a small NLP project to give you an idea of the larger context of machine learning projects:


1. Understand requirements and obtain data. Meet with product and operations to understand the requirements, then pull the large volume of data the company has accumulated, along with data you download or crawl from the Internet.


2. Data preprocessing. Data processing probably accounts for 50%-70% of the total workload. Corpus preprocessing consists of data cleaning, word segmentation, part-of-speech tagging, and stop-word removal.


3. Feature engineering. After corpus preprocessing, we need to decide how to represent the segmented words in a form a computer can calculate with. Two representation models are commonly used to convert segmented Chinese text into numbers: the bag-of-words model and word vectors.


4. Feature selection. To construct a good feature vector, you need to select suitable features with strong expressive power. Feature selection is a challenging process that relies heavily on experience and expertise, although many ready-made algorithms for it exist.


5. Model training. Different application requirements call for different models. Traditional supervised and unsupervised machine learning models include KNN, SVM, Naive Bayes, decision trees, GBDT, K-means, etc. Deep learning models include CNN, RNN, LSTM, Seq2Seq, FastText, TextCNN, etc.


6. Evaluation metrics. The trained model should be evaluated before going online, to check that it generalizes well to corpus data it has not seen.


7. Online deployment. The model is trained offline, then deployed online and published as an interface service for business systems to call.
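Steps 2 through 6 can be sketched end to end in a few dozen lines. The following is a minimal illustration using only the Python standard library, not a production pipeline. Everything in it is an assumption made for the example: the toy English corpus, the stop-word list, the function names, the use of document frequency as the feature selection rule, and the choice of a multinomial Naive Bayes classifier. For Chinese text, a real word segmentation tool would take the place of the simple regular-expression tokenizer.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration: (text, label) pairs.
TRAIN = [
    ("The refund was fast and the support team was great!", "pos"),
    ("Great game, I love the new update", "pos"),
    ("Love it, fast delivery and great service", "pos"),
    ("Terrible support, the app keeps crashing", "neg"),
    ("The update is terrible and slow", "neg"),
    ("Slow refund, the service keeps failing", "neg"),
]
TEST = [
    ("great update, love the fast support", "pos"),
    ("the app is slow and keeps crashing", "neg"),
]
STOP_WORDS = {"the", "was", "and", "i", "it", "is", "a"}  # assumed list


def preprocess(text):
    """Step 2: clean the text, split it into tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def select_vocabulary(docs, min_df=2):
    """Step 4: keep only tokens that occur in at least min_df documents."""
    doc_freq = Counter(t for doc in docs for t in set(doc))
    return sorted(t for t, n in doc_freq.items() if n >= min_df)


def bag_of_words(tokens, vocab):
    """Step 3: represent a document as a vector of token counts."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]


def train_naive_bayes(vectors, labels):
    """Step 5: multinomial Naive Bayes with add-one smoothing."""
    n_features = len(vectors[0])
    class_docs = Counter(labels)
    totals = defaultdict(lambda: [0] * n_features)
    for vec, label in zip(vectors, labels):
        totals[label] = [a + b for a, b in zip(totals[label], vec)]
    model = {}
    for label, counts in totals.items():
        denom = sum(counts) + n_features
        log_prior = math.log(class_docs[label] / len(labels))
        model[label] = (log_prior, [math.log((c + 1) / denom) for c in counts])
    return model


def predict(model, vec):
    """Return the label with the highest log-probability score."""
    scores = {
        label: prior + sum(x * lp for x, lp in zip(vec, log_probs))
        for label, (prior, log_probs) in model.items()
    }
    return max(scores, key=scores.get)


train_docs = [preprocess(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
vocab = select_vocabulary(train_docs)
model = train_naive_bayes([bag_of_words(d, vocab) for d in train_docs], train_labels)

# Step 6: evaluate on held-out examples before any deployment.
predictions = [predict(model, bag_of_words(preprocess(t), vocab)) for t, _ in TEST]
accuracy = sum(p == y for p, (_, y) in zip(predictions, TEST)) / len(TEST)
print(vocab)
print(predictions, accuracy)
```

Note how little of the sketch is the model itself: most of the code prepares and represents the data, which matches the workload split described in step 2. Step 7 would wrap `predict` in a service interface.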


From a business-process perspective, a machine learning project is basically: understand business needs -> investigate industry solutions -> check whether they are applicable -> measure the online effect. It is not hard to see that how algorithm engineers improve their machine learning skills through “practice”, and how they improve a company’s business and revenue by applying machine learning/deep learning in practice, matter a great deal to specific businesses.


I often say that algorithms are just tools. What matters is achieving business goals through a correct understanding of the industry and the product.


So the fear that algorithm engineers will be replaced by their own algorithms is ludicrous. Although machines can do a lot, they can’t replace people’s understanding of data, which is where the value of algorithm engineers lies. Although deep learning can replace manual feature extraction to some extent, at most it solves the problem of feature transformation. It still cannot handle the cases where domain knowledge is needed in data cleaning and preprocessing.


In my experience, an algorithm engineer is a hybrid talent who combines the roles of technologist and product manager.


For students and practitioners from different majors, crossing over is an advantage rather than an obstacle. This is especially true if you are an ordinary programmer in another field (physics, engineering, chemistry, medicine, agriculture, satellite image recognition, network security, the social sciences) with a deep theoretical and experimental background in that field and access to huge amounts of data. You can then do innovative work there. This is the “AI+” talent.


There are a lot of machine learning courses and textbooks out there, and they’re mostly about how to build ovens from scratch, rather than how to cook and invent recipes. This learning path is not only difficult: 90% of the learners who follow it never go deep in any one direction, lack core competencies, and do not match what companies look for in talent.


The industry’s best AI boot camp


To help beginners learn more about the workflow of positions such as machine learning, data analysis, and data mining, and to find their entry point, I have invited two experts from different areas of artificial intelligence: @panda sauce, a data mining engineer at a BAT company, and @Angela, an expert in computer vision. Together with me, @Chris, a senior algorithm engineer at Ali, we will hold four consecutive introductory AI sharing sessions built around the actual workflow of these positions.



This is a rare introductory course designed to give AI enthusiasts and cross-industry learners a solid foundation. The courses cover Python data analysis, machine learning theory, mathematics for machine learning, and the algorithm engineer’s workflow.

                         

This sharing session will answer the following questions:


Am I suited to studying artificial intelligence? I am a medical student; how is the AI job market in medicine right now? What distinguishes data analysis, data mining, and algorithm engineering roles, and what is the capability model for each? How well does an algorithm engineer need to understand an algorithm? Are model selection and parameter tuning techniques general-purpose? What are the application scenarios of deep learning algorithms? ... (All your doubts will be answered here!)


Don’t miss an opportunity at a time when you should be growing fast. Join this training camp: front-line tutors will wholeheartedly answer your questions online, and you and your peers will supervise and encourage each other! This session is free and will help you work out a concrete direction for your professional growth!



Learning materials

Video course: Machine Learning from Introduction to Practice


In addition, the first 500 students who successfully sign up for this course will get the video course Machine Learning from Introduction to Practice, written by tutors from Peking University, Tsinghua University, Shanghai Jiao Tong University and other famous universities, as well as front-line engineers from major tech companies. The video course is worth 1388 yuan. It covers five categories of essential material: Python basics, data analysis, big data, machine learning, and hands-on projects. The videos, courseware, and source code can all be downloaded. The catalog follows.

Video course: Machine Learning from Introduction to Practice

— Five chapters, 63 lectures


Fundamentals of Linux and Python programming

1. Install a VMware virtual machine

2. Install CentOS 6.9

3. Use basic Linux commands

4. Introduction to Python

5. Python installation

6. Install Python

7. First Python program

8. Use of PyCharm

9. Variables, integers, floating point, and strings

10. None, Booleans, lists, tuples, dictionaries, sets

11. The if conditional statement and the input function

12. Loop statements

13. Function introduction, function definition, function call, function parameters

14. The return value of the function

15. Global and local variables

16. Framework of the student management system

17. Student management system: writing the add and view modules

18. Student management system: the modify and delete modules, homework


Python data analysis

19. Introduction to Python Data Science

20. Introduction to common Python libraries

21. Data analysis environment construction

22. NumPy data types and index handling

23. NumPy API and matrix operations

24. NumPy advanced features and universal functions

25. Pandas overview and Series

26. Pandas DataFrame in detail

27. DataFrame and Series indexes


Big data and data processing

28. What is big data

29. The relationship between big data, artificial intelligence and machine learning

30. Data volume and high concurrency (does high concurrency necessarily mean large data volume?)

31. Introduction to Hadoop: HDFS overview, architecture, hands-on drill

32. Introduction to Hadoop: MapReduce, WordCount example, framework flow

33. Introduction to Spark: environment setup, cluster installation, and example demonstration


Introduction to machine learning

34. Introduction to machine learning

35. Machine learning development environment

36. Introduction to machine learning IDEs

37. Basic theory and philosophy of machine learning

38. Machine learning algorithm classification

39. Machine learning common tasks

40. Data cleaning

41. Standardization of data

42. Python and scikit-learn data standardization practice

43. Similarity measurement in machine learning

44. The KNN algorithm

45. Case: Iris flower data classification based on KNN (scikit-learn)

46. Case: Iris flower data classification based on KNN (Python)

47. Simple (single-variable) linear regression

48. Multiple linear regression

49. Polynomial regression

50. scikit-learn linear regression practice

51. Python linear regression practice

52. Case: Advertising revenue analysis based on linear regression

53. Logistic regression classification algorithm

54. Using binary classifiers for multi-class problems

55. Case: Iris flower data classification based on logistic regression (scikit-learn)

56. Case: Iris flower data classification based on logistic regression (Python)


Machine learning in practice

57. Preface

58. Preparation

59. High-end but generic word clouds

60. DCGAN face image generation

61. Stock price forecast

62. TensorFlow object detection

63. Deep Dream

Of course, any materials are only an aid. The most important thing is to follow the teachers in hands-on practice, learn how front-line developers think about AI, understand the actual workflow at major tech companies, and take the most solid first step into artificial intelligence!