Random forests
1. What is the definition of a strong learner and a weak learner, and what is the basis for the division?
   - "Strong" and "weak" learner are relative concepts; there is no sharp demarcation line. The distinction reflects how well a learner handles complex scenarios.
   - Compared with a single decision tree model, a random forest can be called a strong learner; compared with other, more complex models, it is a weak learner.
   - Model integration builds a strong learner by combining multiple weak learners.
2. Explain the concepts of model integration and model fusion.
   - Model integration: combining multiple weak learners (base models) to improve the generalization ability of the model.
   - Homogeneous ensemble: integrates models of the same kind; heterogeneous ensemble: integrates models of different kinds.
   - Bagging and Boosting are the two main integration schemes; Random Forest and GBDT are their respective representatives.
3. In model integration, the results of the base models must be combined to get the final result; this step is called model fusion. Common model fusion methods (a small sketch follows this list):
   - Averaging: for regression problems, the predictions of the base models are averaged as the final result.
   - Voting: for classification problems, the category predicted **most often** by the base models is selected as the final result.
4. Explain the basic principle of the random forest model.
   - Random forest is the typical representative of the Bagging method.
   - Perform n rounds of random sampling of samples (and variables) to obtain n sample sets.
   - Each sample set **independently trains a decision tree model**, giving the results of n decision tree models.
   - An **ensemble strategy** (averaging or voting) produces the final output.
   - The n decision tree models are relatively independent (not completely independent).
   - The sampling method is Bootstrap sampling: samples are drawn at random **with replacement**. Since 1 - (1 - 1/n)^n → 1 - 1/e ≈ 63.2% as n → ∞, about 63.2% of the distinct samples are selected in each round of sampling. The same idea also applies to random sampling of variables. (See the sketches after this list.)
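A minimal sketch of the two fusion methods from item 3, using made-up predictions from three hypothetical base models (the values are purely illustrative):

```python
import numpy as np

# Hypothetical outputs from three base models (illustrative values only).
reg_preds = np.array([2.9, 3.1, 3.3])        # regression outputs
clf_preds = np.array(["cat", "dog", "cat"])  # classification outputs

# Averaging: the mean of the base-model predictions is the final result.
print("averaged prediction:", reg_preds.mean())  # 3.1

# Voting: the most frequently predicted category is the final result.
labels, counts = np.unique(clf_preds, return_counts=True)
print("voted prediction:", labels[counts.argmax()])  # "cat"
```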
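The Bagging principle from item 4 can be written out by hand. Below is a minimal sketch, assuming scikit-learn is available; the synthetic dataset, the number of trees (50), and `max_features="sqrt"` are arbitrary illustrative choices, not values from the original notes:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n_trees = 50
preds = []
for _ in range(n_trees):
    # Bootstrap sample: draw len(X_tr) rows with replacement.
    idx = rng.integers(0, len(X_tr), len(X_tr))
    # max_features="sqrt" adds the random sampling of variables at each split.
    tree = DecisionTreeClassifier(max_features="sqrt").fit(X_tr[idx], y_tr[idx])
    preds.append(tree.predict(X_te))

# Voting: the majority class across the n trees is the final output
# (labels are 0/1 here, so a mean >= 0.5 is a majority for class 1).
votes = np.stack(preds)
final = (votes.mean(axis=0) >= 0.5).astype(int)
print("single-tree accuracy:", (preds[0] == y_te).mean())
print("ensemble accuracy:   ", (final == y_te).mean())
```

Each tree sees a different bootstrap sample and a different random subset of variables at each split, which is what keeps the trees relatively independent.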
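A quick numeric check of the 1 - 1/e limit quoted in item 4:

```python
import math

# Probability that a given sample is drawn at least once when drawing
# n times with replacement from n samples: 1 - (1 - 1/n)^n.
for n in (10, 100, 1000, 10_000):
    print(f"n={n:>6}: {1 - (1 - 1 / n) ** n:.4f}")

print(f"limit 1 - 1/e = {1 - 1 / math.e:.4f}")  # 0.6321
```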
5. Why does the random forest model achieve better results than a single decision tree model? (The answer applies to all integration methods.)
   - Model error decomposes into bias + variance (plus irreducible noise).
   - Each decision tree model in the forest has roughly the same bias and variance, so averaging/voting over their results leaves the **bias** of the random forest model about the same as that of a single decision tree model.
   - Because the decision tree models are relatively independent, averaging/voting over their results **greatly reduces the variance** of the random forest model, and therefore reduces the error.
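A small simulation of this variance-reduction argument. It assumes the base models' errors are independent, identically distributed Gaussian noise on top of a fixed bias; real trees are only partially independent, so the reduction in practice is smaller than the ideal 1/n factor shown here:

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
bias, sigma, n_models = 0.2, 1.0, 25

# Each "model" predicts true_value + bias + noise. Averaging keeps the
# bias but shrinks the noise variance by ~1/n for independent models.
preds = true_value + bias + rng.normal(0, sigma, size=(100_000, n_models))
single = preds[:, 0]
ensemble = preds.mean(axis=1)

print("single model:    bias=%.3f  var=%.3f" % (single.mean() - true_value, single.var()))
print("ensemble (n=25): bias=%.3f  var=%.3f" % (ensemble.mean() - true_value, ensemble.var()))
# Expect: the same bias (~0.2); variance drops from ~1.0 to ~1/25 = 0.04.
```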