Preface
A decision tree is one of the most common machine learning classification algorithms. As the name suggests, the model has the shape of a tree: by learning from a sample set, it mines useful rules. A decision tree can be thought of as a collection of if-then rules, so once trained it behaves like a series of conditional statements, which makes its prediction speed very fast.
Example
To understand how a decision tree classifies, consider the classic example of a girl screening for a "tall, rich and handsome" boyfriend. If the candidate is already married, he is rejected outright. If he is unmarried, she checks whether he owns property; if not, he is also set aside. If he owns property, she looks at his height: above 180 cm he is accepted, while below 180 cm she checks whether he owns two apartments, which can make up for the lack of height; otherwise he is rejected.
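Written out as code, this story is literally a handful of nested conditions. A toy sketch in Python (the attribute names and the 180 cm / two-apartment thresholds are simply taken from the story above):

```python
def accept_candidate(married, has_property, height_cm, num_properties):
    """Toy illustration: the 'pick a boyfriend' decision tree as if-then rules."""
    if married:
        return False            # already married -> reject
    if not has_property:
        return False            # no property -> reject
    if height_cm >= 180:
        return True             # tall enough -> accept
    return num_properties >= 2  # below 180 cm: two apartments make up for height
```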
In general, a decision tree consists of one root node, several internal nodes (the circular nodes in the figure) and several leaf nodes (the square nodes in the figure). Each internal node corresponds to a test on an attribute, and each leaf node represents a classification result.
How to learn
For example, suppose the table below records whether loan applications were approved. How do you build a decision tree from these samples?
| No. | Real estate | Marital status | Monthly income | Approved? |
|---|---|---|---|---|
| 1 | Yes | Single | 30k | Yes |
| 2 | No | Married | 20k | Yes |
| 3 | No | Single | 10k | Yes |
| 4 | Yes | Married | 30k | Yes |
| 5 | No | Divorced | 8k | No |
| 6 | No | Single | 5k | No |
| 7 | Yes | Divorced | 10k | Yes |
The attribute set is {"real estate", "marital status", "monthly income"}. If real estate is chosen as the optimal splitting attribute, the sample set is split into {1, 4, 7} (has property) and {2, 3, 5, 6} (no property).
The "no property" subset {2, 3, 5, 6} is split further; the remaining attribute set is {"marital status", "monthly income"}. Choosing marital status as the splitting attribute divides it into {2} (married), {3, 6} (single) and {5} (divorced).
Finally, only {3, 6} still needs to be split. Using the monthly income attribute with 10k as the boundary completes the construction of the whole decision tree.
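The tree derived above can be transcribed directly into nested conditions. This is just a hand-coded sketch of the splits worked out from the seven samples; the parameter names are translations of the table headers:

```python
def approve_loan(has_property, marital_status, monthly_income_k):
    """Hand-coded version of the decision tree learned from the 7 samples above."""
    if has_property:                    # samples {1, 4, 7}
        return True
    if marital_status == "married":     # sample {2}
        return True
    if marital_status == "divorced":    # sample {5}
        return False
    # single and no property: samples {3, 6}, split on monthly income at 10k
    return monthly_income_k >= 10

# e.g. approve_loan(False, "single", 10) -> True  (sample 3)
#      approve_loan(False, "single", 5)  -> False (sample 6)
```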
Key points:
- All data starts at the root node
- Divide and conquer from the top down
- The sample is segmented recursively based on the attribute set
- Properties are selected by certain rules or algorithms
- Splitting stops when all samples at a node belong to the same class
- As far as possible, the trained decision tree should be consistent with the sample set while still being able to generalize to new data
- Decision tree generation only considers local optimality; pruning considers global optimality
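These key points boil down to a short recursive procedure. Below is a generic sketch of top-down, divide-and-conquer tree construction, not any particular algorithm: `choose_best_attribute` is a placeholder for whatever selection rule (information gain, gain rate, etc.) is plugged in, and the sample/attribute representations are assumptions made for illustration.

```python
from collections import Counter

def build_tree(samples, attributes, choose_best_attribute):
    """Generic top-down, divide-and-conquer tree construction (sketch).

    samples    -- list of (feature_dict, label) pairs
    attributes -- attribute names still available for splitting
    choose_best_attribute -- callable(samples, attributes) -> attribute name
    """
    labels = [label for _, label in samples]
    # Stop when all samples at the node share one class, or nothing is left to split on.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class

    best = choose_best_attribute(samples, attributes)
    tree = {best: {}}
    for value in {feats[best] for feats, _ in samples}:
        subset = [(f, y) for f, y in samples if f[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, remaining, choose_best_attribute)
    return tree
```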
Attribute partitioning selection
The core idea is to select the splitting attribute by information gain. Before defining information gain, we need information entropy. Given a sample set $D$ containing $K$ classes, where the proportion of the $k$-th class is $p_k$ ($k = 1, 2, \dots, K$), the information entropy of $D$ is

$$\mathrm{Ent}(D) = -\sum_{k=1}^{K} p_k \log_2 p_k$$
If attribute $a$ is selected for splitting and has $V$ possible values $\{a^1, a^2, \dots, a^V\}$, there will be $V$ branches. The $v$-th branch node contains all samples of $D$ whose value on attribute $a$ is $a^v$; this new subset is denoted $D^v$. The information gain can then be defined as

$$\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\, \mathrm{Ent}(D^v)$$
Going back to the previous example, let us calculate the information gain of "real estate". First, the information entropy of the set $D$ (5 of the 7 samples are approved, 2 are not):

$$\mathrm{Ent}(D) = -\frac{5}{7}\log_2\frac{5}{7} - \frac{2}{7}\log_2\frac{2}{7} \approx 0.863$$
Selecting "real estate" as the splitting attribute divides the sample set into $D^1 = \{1, 4, 7\}$ and $D^2 = \{2, 3, 5, 6\}$, so

$$\mathrm{Ent}(D^1) = -\frac{3}{3}\log_2\frac{3}{3} = 0, \qquad \mathrm{Ent}(D^2) = -\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4} = 1$$
Here the convention $0 \log_2 0 = 0$ is used; an entropy of 0 means the node is already "pure". The information gain is

$$\mathrm{Gain}(D, \text{real estate}) = 0.863 - \left(\frac{3}{7} \times 0 + \frac{4}{7} \times 1\right) \approx 0.292$$
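As a sanity check, the two numbers above can be reproduced with a few lines of Python; the `labels` list simply encodes the "Approved?" column of the table:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Information entropy Ent(D) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# "Approved?" column for samples 1..7 (True = yes, False = no)
labels = [True, True, True, True, False, False, True]

ent_D = entropy(labels)                               # ~0.863
# Splitting on "real estate": {1,4,7} have property, {2,3,5,6} do not
with_property    = [labels[i] for i in (0, 3, 6)]     # all yes      -> entropy 0
without_property = [labels[i] for i in (1, 2, 4, 5)]  # 2 yes, 2 no  -> entropy 1

gain = ent_D - (3/7 * entropy(with_property) + 4/7 * entropy(without_property))
print(round(ent_D, 3), round(gain, 3))                # 0.863 0.292
```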
The information gains of the other attributes are calculated in the same way, and the attribute with the maximum information gain is selected as the splitting attribute.
Information gain rate
As mentioned above, splitting attributes are selected by information gain, but information gain is biased towards attributes with many possible values, which in turn biases the final choice of attribute. The information gain rate is therefore introduced. It is defined as

$$\mathrm{Gain\_ratio}(D, a) = \frac{\mathrm{Gain}(D, a)}{\mathrm{IV}(a)}$$

where

$$\mathrm{IV}(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|}\log_2\frac{|D^v|}{|D|}$$

and $V$ is the number of possible values of attribute $a$.
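In the same spirit, IV(a) and the gain rate can be sketched in a few lines; the `real_estate` list is the "Real estate" column of the table and 0.292 is the gain computed above:

```python
from collections import Counter
from math import log2

def intrinsic_value(values):
    """IV(a): entropy of the attribute's own value distribution."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_rate(gain, values):
    """Information gain rate = Gain(D, a) / IV(a)."""
    return gain / intrinsic_value(values)

# "Real estate" column of the table: samples {1, 4, 7} have property
real_estate = ["yes", "no", "no", "yes", "no", "no", "yes"]
print(round(gain_rate(0.292, real_estate), 3))   # ~0.296
```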
ID3 and C4.5 algorithms
In the ID3 decision tree algorithm, each node uses information gain to select its splitting attribute. The general procedure is: starting from the root node, compute the information gain of each candidate attribute and select the attribute with the maximum information gain as the splitting attribute of the root node. The values of the chosen attribute divide the samples into several branches, each branch corresponding to a child node. The same steps are then applied recursively, computing the information gain of the remaining attributes at each node, until the information gain of every attribute at a node is very small or there are no attributes left to choose from. At that point the recursion stops and the final decision tree is obtained.
C4.5 is similar to ID3 in how it builds the decision tree, except that attribute selection is based on the information gain rate instead of the information gain.
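In terms of the generic `build_tree` sketch above, the only thing that differs between the two algorithms (as described here) is the attribute-selection rule. A minimal, self-contained illustration, assuming samples are given as (feature dict, label) pairs and with no guard for the IV = 0 corner case:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(samples, attr):
    """Gain(D, a) for samples given as (feature_dict, label) pairs."""
    labels = [y for _, y in samples]
    gain = entropy(labels)
    for value in {f[attr] for f, _ in samples}:
        subset = [y for f, y in samples if f[attr] == value]
        gain -= len(subset) / len(samples) * entropy(subset)
    return gain

def gain_rate(samples, attr):
    iv = entropy([f[attr] for f, _ in samples])   # IV(a)
    return information_gain(samples, attr) / iv

# ID3 picks the attribute with the largest information gain,
# C4.5 the one with the largest gain rate -- everything else is identical.
def id3_choice(samples, attributes):
    return max(attributes, key=lambda a: information_gain(samples, a))

def c45_choice(samples, attributes):
    return max(attributes, key=lambda a: gain_rate(samples, a))
```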
Pruning
Why prune? During decision tree generation, the tree grown by repeated splitting easily overfits: although its accuracy on the training samples is high, its generalization ability may be low. Pruning is therefore needed.
The main criterion for deciding whether a branch should be cut is whether splitting a node improves generalization ability. If it does, the node is split; otherwise the branch is cut. The problem then becomes: how do we tell whether generalization improves? To answer this, some data is held out from the training set as a validation set and does not participate in training. During training, the accuracy on the validation set is computed before and after splitting on an attribute. If accuracy is higher after the split than before, the split is kept; otherwise the node is left as a leaf, which is equivalent to pruning.
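A minimal sketch of that accuracy comparison (in the pre-pruning style described here), assuming a `predict(tree, features)` helper and a validation set of (features, label) pairs; both names are hypothetical and introduced only for illustration:

```python
def accuracy(tree, predict, validation):
    """Fraction of validation samples (features, label) the tree classifies correctly."""
    return sum(predict(tree, f) == y for f, y in validation) / len(validation)

def should_split(node_majority_label, candidate_subtree, predict, validation):
    """Pre-pruning rule: keep the split only if it raises validation accuracy.

    node_majority_label -- label the node would predict if left as a leaf
    candidate_subtree   -- the tree obtained by actually performing the split
    """
    acc_as_leaf  = sum(y == node_majority_label for _, y in validation) / len(validation)
    acc_if_split = accuracy(candidate_subtree, predict, validation)
    return acc_if_split > acc_as_leaf
```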