Decision tree model
A decision tree is a non-parametric model: it makes few assumptions about the objective function and the variables, so it is more flexible to use and can handle more complex scenarios.
Each node: a **test** of a **feature**; each branch of the tree: the **result** of that test; each leaf node: a **category**.
Building the tree proceeds step by step: at each step a branch is selected and a new node is determined (a small sketch follows below).
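A minimal sketch of this structure, assuming scikit-learn and its iris dataset (neither is part of the original notes): fit a shallow tree and print it, so each internal node appears as a feature test, each branch as a test outcome, and each leaf as a category.

```python
# Sketch only: scikit-learn and the iris dataset are assumptions,
# not part of the original notes.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders internal nodes as "feature <= threshold" tests
# and leaves as "class: ..." categories.
print(export_text(tree, feature_names=load_iris().feature_names))
```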
1. How are the features for each node determined? What are the common methods and their respective characteristics?
   - ID3 and C4.5
     - At each step the feature is selected based on **information entropy** (a measure of the uncertainty of a random variable).
     - The new branches created at a node should reduce the information entropy.
     - Both handle classification problems only; ID3 cannot deal with continuous values, and C4.5 can but is much more complex than CART.
   - Information entropy: H(X) = -Σ pᵢ log pᵢ
     - **pᵢ**: the probability that a sample falls in each class, with Σ pᵢ = 1.
     - For n = 2: p₁ = p₂ = 1/2 gives the maximum entropy; p₁ = 1 or p₁ = 0 gives the minimum entropy.
     - A decision tree therefore selects as node the feature that reduces the information entropy (see the first sketch after this list).
   - CART
     - Uses the **Gini coefficient** (Gini impurity) instead of information entropy: Gini(D) = 1 - Σ pᵢ².
     - Selects as node the feature that reduces the Gini impurity.
     - Supports prediction of continuous values (regression).
     - The default decision tree model in Python's sklearn also uses the CART method to select branches.
2. What are the similarities and differences between ID3 and C4.5?
   - ID3: selects the feature A that maximizes the **information gain** g(D, A) = H(D) - H(D|A), where H(D) is the information entropy of the current decision tree and H(D|A) is the information entropy after the new node is created.
     - Tends to choose features with many branches as nodes, which leads to overfitting.
   - C4.5: maximizes the **information gain ratio** g'(D, A) = g(D, A) / H'(D) = (H(D) - H(D|A)) / H'(D), where H'(D) = -Σ (|Dᵢ|/|D|) log₂(|Dᵢ|/|D|) and |Dᵢ|/|D| is the proportion of samples falling into each branch of the node.
     - The more branches a feature creates, the larger H'(D) and the smaller the gain ratio, which avoids choosing features with too many branches as nodes.
3. Compared with linear regression and logistic regression models, what are the decision tree's pros and cons?
   - Advantages
     - Non-parametric model: **does not** make assumptions about the samples in advance and can handle **more complex** samples.
     - Computes **faster**.
     - Results are **easier to interpret**.
     - Can handle classification and prediction (regression) problems at the same time.
     - **Insensitive to missing values**.
     - Very strong interpretability: by drawing the branches one can clearly see the overall selection process of the model, quickly discover the factors that affect the result, and guide the business to adjust accordingly.
   - Disadvantages
     - A single tree is a weak learner and needs tuning methods for optimization.
     - Still easy to overfit, leading to large errors in the final result.
     - Does not perform well on data with strongly correlated features.
4. What are the commonly used tuning methods for the decision tree model? (See the second sketch after this list.)
   1. Control tree depth, number of nodes and other parameters to avoid overfitting.
   2. Model ensembles: build a more complex model on top of decision trees.
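First sketch: the entropy, Gini, information-gain and gain-ratio formulas above, computed with numpy on a toy array. The function names and the toy data are illustrative, not from the original notes.

```python
# Sketch of the quantities defined above (numpy only; names are illustrative).
import numpy as np

def entropy(labels):
    """H(D) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2), the impurity used by CART."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(labels, feature_values):
    """g(D, A) = H(D) - H(D|A): entropy before the split minus the
    weighted entropy of the subsets created by splitting on feature A (ID3)."""
    h_after = 0.0
    for v in np.unique(feature_values):
        subset = labels[feature_values == v]
        h_after += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - h_after

def gain_ratio(labels, feature_values):
    """g'(D, A) = g(D, A) / H'(D), where H'(D) is the entropy of the
    branch proportions |Di|/|D| (C4.5); penalizes many-valued features."""
    split_info = entropy(feature_values)  # -sum(|Di|/|D| * log2(|Di|/|D|))
    return information_gain(labels, feature_values) / split_info

# Toy example: a 50/50 binary label perfectly separated by feature a.
y = np.array([0, 0, 1, 1, 1, 0])
a = np.array(["x", "x", "y", "y", "y", "x"])
print("H(D)   =", entropy(y))                 # 1.0
print("Gini   =", gini(y))                    # 0.5
print("gain   =", information_gain(y, a))     # 1.0
print("ratio  =", gain_ratio(y, a))           # 1.0
```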
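Second sketch: the two tuning directions listed above, again assuming scikit-learn; the parameter values are illustrative, not recommendations from the notes.

```python
# Sketch of the two tuning approaches (scikit-learn; values are illustrative).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 1. Control tree depth, number of nodes and other parameters to limit
#    overfitting. sklearn's trees split CART-style (Gini by default).
pruned_tree = DecisionTreeClassifier(
    criterion="gini",     # CART impurity; "entropy" is also available
    max_depth=3,          # cap the depth of the tree
    min_samples_leaf=5,   # require a minimum number of samples per leaf
    max_leaf_nodes=8,     # cap the total number of leaves
    random_state=0,
)

# 2. Model ensembles: combine many trees (weak learners) into a more
#    complex model, e.g. a random forest.
forest = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0)

for name, model in [("pruned tree", pruned_tree), ("random forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```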