
The decision tree is one of the classic models in machine learning. Its basic principle is to reach a decision through a series of if/else judgments.

The figure below shows a simple demonstration of a typical decision tree model: an employee turnover prediction model. The tree first checks whether the employee's satisfaction is less than 5. If the answer is "yes", the employee is predicted to quit. If the answer is "no", the tree then checks whether the employee's income is less than 10,000 yuan: if "yes", the employee is predicted to quit; if "no", the employee is predicted to stay.

The figure illustrates the core principle of the decision tree model. The employee turnover prediction model explained later is a somewhat more complicated model built on a larger dataset. In actual business practice, the model predicts a turnover probability from multiple features and then applies a threshold to that probability to decide the outcome; for example, if the predicted probability exceeds 50%, the employee is considered likely to leave.
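The decision logic in the figure can be sketched as plain if/else code. This is only an illustration: the thresholds come from the demonstration figure, and the function name is made up.

```python
def predict_quit(satisfaction, income):
    """Toy re-implementation of the decision tree in the figure.

    The thresholds (satisfaction < 5, income < 10000) are the ones shown
    in the demonstration, not values learned from real data.
    """
    if satisfaction < 5:
        return "quit"   # low satisfaction -> predicted to leave
    if income < 10000:
        return "quit"   # adequate satisfaction but low income -> predicted to leave
    return "stay"       # otherwise predicted to stay


print(predict_quit(3, 20000))  # -> quit
print(predict_quit(8, 8000))   # -> quit
print(predict_quit(8, 20000))  # -> stay
```

A real decision tree model learns both the features and the thresholds from data; this sketch only fixes them by hand to show the if/else structure.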

Several important concepts of the decision tree model are explained below: parent and child nodes, and root and leaf nodes.

Parent and child nodes are defined relative to each other: a child node is split from its parent according to some rule, and the child then becomes a new parent node and continues to split until no further split is possible. Root and leaf nodes are likewise paired concepts: the root node is the node with no parent, i.e., the initial node, while a leaf node is a node with no children, i.e., a final node. The key to building a decision tree model is how to choose appropriate nodes to split.

In the figure above, "satisfaction < 5" is both the root node and a parent node; it splits into the two child nodes "quit" and "income < 10,000 yuan". The child node "quit" does not split any further, so it is a leaf node. The other child node, "income < 10,000 yuan", is the parent of the two nodes below it; those nodes, "quit" and "stay", are leaf nodes.

In practice, an enterprise analyzes its existing data to find the characteristics shared by departing employees, such as satisfaction, income, age, working hours, and so on, and then selects the corresponding features for node splitting. This yields a decision tree model like the one above, which can then be reused to predict employee turnover and to take countermeasures based on the predictions.

The concept of a decision tree is not complicated: it reaches a final conclusion through a chain of logical judgments. The key lies in how to build such a "tree". For example, which feature should the root node use? Choosing "satisfaction < 5" versus "income < 10,000 yuan" as the root produces different trees. Moreover, income is a continuous variable, so choosing "income < 10,000 yuan" versus "income < 100,000 yuan" as a node also produces different results. The following sections explain the basis on which a decision tree is built.
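To make the split-selection idea concrete, here is a minimal sketch of how a classification tree can score candidate thresholds using Gini impurity, one common splitting criterion. The feature values and labels below are made up purely for illustration.

```python
def gini(labels):
    """Gini impurity for a two-class label list (1 = quit, 0 = stay)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = labels.count(1) / n
    return 2 * p1 * (1 - p1)


def weighted_gini(values, labels, threshold):
    """Weighted impurity of the two groups produced by 'value < threshold'."""
    left = [l for v, l in zip(values, labels) if v < threshold]
    right = [l for v, l in zip(values, labels) if v >= threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


satisfaction = [2, 4, 6, 8, 9]  # hypothetical satisfaction scores
quit_labels = [1, 1, 0, 0, 0]   # hypothetical outcomes

for t in [3, 5, 7]:
    print(t, weighted_gini(satisfaction, quit_labels, t))
# The threshold with the lowest weighted impurity is chosen as the split;
# here "satisfaction < 5" separates the classes perfectly (impurity 0.0).
```

The tree-building algorithm repeats this scoring over every feature and every candidate threshold, which is how it answers the "which feature, which threshold" questions raised above.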

Classification decision tree model

The demonstration code for the classification decision tree model is as follows.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [1, 0, 0, 1, 1]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
print(model.predict([[5, 5]]))
```

The prediction results obtained after running are as follows.

[0]

As you can see, the sample [5, 5] is predicted to be category 0.
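To relate this small model back to the node concepts above, sklearn's `export_text` can print the learned if/else rules: the first condition printed is the root node, and each "class:" line is a leaf node. (The exact splits chosen may depend on tie-breaking under `random_state`.)

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [1, 0, 0, 1, 1]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Print the learned rules as indented if/else text; the outermost
# condition is the root node and every "class: ..." line is a leaf.
print(export_text(model, feature_names=["feature_0", "feature_1"]))
```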

If you want to predict multiple samples at the same time, you can use the following code.

```python
print(model.predict([[5, 5], [7, 7], [9, 9]]))
```

The predictions are as follows.

[0 0 1]
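Related to the probability threshold mentioned earlier, `predict_proba` returns class probabilities instead of hard labels; in a real turnover model, a cutoff such as 50% would be applied to these probabilities. With this tiny training set every leaf is pure, so the probabilities here are all 0 or 1.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [1, 0, 0, 1, 1]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Each row is [P(class 0), P(class 1)]; thresholding the second column
# at 0.5 reproduces predict() for this binary problem.
print(model.predict_proba([[5, 5], [7, 7], [9, 9]]))
```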

Regression decision tree model

Decision trees can perform not only classification analysis but also regression analysis, that is, prediction of continuous variables; a tree used this way is called a regression decision tree. A simple demonstration of the regression decision tree model is shown below.

```python
from sklearn.tree import DecisionTreeRegressor

X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [1, 2, 3, 4, 5]

model = DecisionTreeRegressor(max_depth=2, random_state=0)
model.fit(X, y)
print(model.predict([[9, 9]]))
```

The predictions are as follows.

[4.5]
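To see where the fractional prediction 4.5 comes from, the fitted tree can be printed with `export_text`: with `max_depth=2` the tree has at most four leaves, and each leaf predicts the mean of the training targets that fall into it, so [9, 9] lands in a leaf whose targets average to 4.5.

```python
from sklearn.tree import DecisionTreeRegressor, export_text

X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [1, 2, 3, 4, 5]

model = DecisionTreeRegressor(max_depth=2, random_state=0)
model.fit(X, y)

# For a regression tree, each leaf's "value" in the printout is the mean
# of the training targets in that leaf; max_depth=2 caps the tree at
# two levels of splits, so some leaves must contain multiple samples.
print(export_text(model, feature_names=["feature_0", "feature_1"]))
print(model.predict([[9, 9]]))
```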