Since becoming the lead of an algorithm group in 2015, I have interviewed more than 200 candidates. Many of them went on to do well, which strengthened my confidence in my selection criteria.

Finding a job is especially difficult in 2020, so I have compiled 80 important interview questions from my years as an interviewer in the hope that they help you.

For your convenience, I have grouped them into six categories: machine learning, feature engineering, deep learning, NLP, CV, and recommendation systems. These are not only common interview questions; they can also serve as a reference for organizing your own knowledge. (The answers are available for free; see the end of the article.)

Machine Learning Theory:

1. Write the total probability formula & Bayes formula

2. Why do we introduce bias and variance into model training?

3. CRF / Naive Bayes / EM / maximum entropy model / Markov random field / Gaussian mixture model

4. How to solve the over-fitting problem?

5. What is the purpose of one-hot encoding? Why not just use numbers?

6. What is the difference between decision tree and random forest?

7. Why is Naive Bayes called "naive"?

8. Methods for choosing the initial points in K-means other than random selection

9. Logistic regression (LR) is clearly a classification model; why is it called regression?

10. How to parallelize gradient descent

11. What are the L1/L2 regularization terms in LR?

12. Describe the decision tree construction process

13. Explain the Gini coefficient

14. Advantages and disadvantages of decision trees

15. What should be done if an estimated probability comes out as 0?

16. The generation process of random forest

17. Introduce the idea of Boosting

18. What kind of tree is used in GBDT? What are its characteristics?

19. Compared with GBDT / Boosting Tree, in which respects is XGBoost optimized?

20. What is the optimal hyperplane?

21. What are support vectors

22. How does SVM handle multi-class classification?

23. What does the kernel function do
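
As a reference for question 1 in this section, the two formulas can be sketched as follows. The symbols are chosen here for illustration and are not taken from the original list: the events B_1, ..., B_n are assumed to partition the sample space with P(B_i) > 0, and P(A) > 0 is assumed for the second formula.

```latex
% Total probability formula
P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)

% Bayes formula
P(B_j \mid A) = \frac{P(A \mid B_j)\, P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)}
```

The denominator of the Bayes formula is exactly the total probability of A, which is why the two are usually asked together.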

Feature Engineering:

1. How to remove missing values from DataFrame?

2. Common methods for making features dimensionless (scaling and normalization)

3. How to one-hot encode categorical variables?

4. How to bin the “Age” field according to given thresholds?

5. How to draw a heatmap of the correlations between variables?

6. How to transform a distribution into an approximately normal one?

7. How to use PCA to separate and visualize data in a simple way?

8. How to use LDA to separate and visualize data in a simple way?
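
Questions 1, 3, and 4 in this section are how-to questions that fit in a few lines of pandas. The following is only a minimal sketch: the toy data, the column names, the thresholds, and the group labels are all assumptions made for illustration.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "Age": [5, 17, 34, None, 70],
    "City": ["A", "B", "A", "C", "B"],
})

# Question 1: drop every row that contains a missing value.
clean = df.dropna()

# Question 3: one-hot encode the categorical column.
encoded = pd.get_dummies(clean, columns=["City"])

# Question 4: bin "Age" using hand-picked thresholds.
# pd.cut uses right-closed intervals by default: (0, 12], (12, 18], ...
bins = [0, 12, 18, 60, 120]
labels = ["child", "teen", "adult", "senior"]
encoded["AgeGroup"] = pd.cut(encoded["Age"], bins=bins, labels=labels)

print(encoded)
```

Dropping rows is only one way to handle missing values; depending on the data, filling them in (for example with `fillna`) may be a better choice.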

Deep Learning:

1. How would you describe the batch normalization process?

2. What is the purpose of the activation function? What are the differences between common activation functions?

3. How does Softmax work? What does it do? What is the translation invariance of CNN? How is it done?

4. What are the differences between VGG, GoogLeNet, ResNet, etc.?

5. Why can residual networks solve the vanishing gradient problem?

6. Why can LSTM solve the vanishing/exploding gradient problem?

7. Compared with RNN and CNN, what do you think are the advantages of Attention?

8. Write the formula for Attention

9. In the attention mechanism, what do Q, K, and V stand for?

10. Why can self-attention replace seq2seq
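
For questions 8 and 9 in this section, the widely used scaled dot-product form is Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, where Q holds the queries, K the keys, V the values, and d_k is the key dimension. The NumPy sketch below assumes 2-D inputs (one row per position) and omits masking and batching; the function name is hypothetical.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v). Returns (output, weights)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value rows.
    return weights @ V, weights
```

Each row of `weights` sums to 1, so every output row is a convex combination of the rows of `V`.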

Natural Language Processing (NLP):

1. GloVe’s loss function

2. Why is GloVe used less than W2V?

3. Hierarchical Softmax process

4. Negative sampling process

5. How to measure the quality of learned embeddings

6. Explain the CRF principle

7. Detail the principle of LDA

8. How to calculate the topic matrix in LDA

9. What is the difference between LDA and Word2Vec? Between LDA and Doc2Vec?

10. Where does BERT’s bidirectionality show up?

11. How is BERT pre-trained?

12. Why are 15% of the tokens in the data randomly selected, of which 80% are replaced with [MASK], 10% are left unchanged, and 10% are replaced with random words?

13. Why does BERT have 3 embedding layers and how are they implemented

14. Write a multi-head attention by hand
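
To make the 15% / 80% / 10% / 10% scheme from question 12 concrete, here is a rough pure-Python sketch. It is a simplification and not BERT's actual preprocessing code: each token is selected independently with probability 0.15 instead of sampling a fixed number of positions, and the function name, the `vocab` argument, and the seed are hypothetical.

```python
import random

def mask_tokens(tokens, vocab, select_prob=0.15, seed=0):
    """Return (masked_tokens, labels); labels[i] is None where no prediction is needed."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < select_prob:
            labels.append(tok)  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                masked.append("[MASK]")           # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(tok)                # 10%: keep the token unchanged
            else:
                masked.append(rng.choice(vocab))  # 10%: replace with a random word
        else:
            labels.append(None)                   # not selected, no loss at this position
            masked.append(tok)
    return masked, labels
```

The prediction target is always the original token, whichever of the three replacements was applied to the input.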

Recommendation Systems:

1. Differences between DNN and DeepFM

2. How do you deal with under-fitting and over-fitting problems when using DeepFM?

3. What is worth noting about DeepFM’s embedding initialization?

4. How does YouTubeNet process variable-length data?

5. How does YouTubeNet avoid computing a softmax over millions of classes?

6. What are the common evaluation indicators of the recommendation system?

7. What is the principle of MLR? What optimizations have been made?
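
For question 6 in this section, precision@k and recall@k are two commonly used offline metrics. They are chosen here only as examples and are not an exhaustive answer. The sketch below assumes a ranked list of recommended item IDs and a set of relevant items; the function name is hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """recommended: ranked list of item IDs; relevant: set of relevant item IDs."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0                   # share of the top k that is relevant
    recall = hits / len(relevant) if relevant else 0.0   # share of relevant items retrieved
    return precision, recall

p, r = precision_recall_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=2)
print(p, r)
```

In this usage example only "a" among the top two items is relevant, so precision@2 is 1/2 and recall@2 is 1/3.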

Computer Vision (CV):

1. Common model acceleration methods

2. How to effectively handle the common foreground-background imbalance (little foreground, much background) in object detection

3. Is there any situation in object detection that cannot be solved by SSD, YOLOv3, Faster R-CNN, etc., assuming infinite network fitting ability?

4. Differences between ROIPool and ROIAlign

5. Introduce common gradient descent optimization methods

6. What else do you think can be done in detection?

7. What are the advantages of Mini-Batch SGD over GD

8. What are the two main methods of human pose estimation? Give a brief introduction.

9. The implementation principle of convolution, and how to implement the locally weight-shared convolution operation quickly and efficiently

10. Why does CycleGAN generally produce position-invariant texture changes, and why can it not generate changes in position?
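
One well-known answer to question 9 in this section is im2col: unroll every sliding window into a row so that the whole convolution becomes a single matrix product. The NumPy sketch below assumes a single-channel 2-D input, stride 1, and no padding, and computes cross-correlation (no kernel flip), as deep learning frameworks usually do. The function names are hypothetical.

```python
import numpy as np

def conv2d_naive(x, w):
    """Direct sliding-window version, used as a reference."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same kernel w is reused at every position (weight sharing).
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv2d_im2col(x, w):
    """Unroll each window into a row, then do one matrix-vector product."""
    H, W = x.shape
    kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.stack([x[i:i + kh, j:j + kw].ravel()
                     for i in range(oh) for j in range(ow)])
    return (cols @ w.ravel()).reshape(oh, ow)
```

The unrolling step here is still a Python loop; the point of the sketch is only to show that the arithmetic reduces to one matrix product, which optimized libraries can then execute efficiently.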

We have compiled the answers to these questions into a package for you; scan the code and add us to receive it. I wish you a smooth job search!
