4.2 Decision tree ID3 practice

The earliest version of the decision tree algorithm is ID3, invented by Ross Quinlan and based on Occam's Razor: other things being equal, a smaller decision tree is preferable to a larger one (keep it simple). ID3 evaluates features by information gain and, at each step, selects the feature with the maximum information gain as the splitting criterion for the current node. ID3 can split nominal data sets and, in its basic form, does not prune. To alleviate overfitting, leaf nodes that contribute little information gain can be merged by pruning (for example, by setting an information gain threshold). The drawback of using information gain is that it is biased towards attributes with many distinct values: the more different values an attribute takes in the training set, the more likely it is to be chosen as the split attribute, which is sometimes meaningless. In addition, ID3 cannot handle continuous features, which is one motivation for the C4.5 algorithm; the CART algorithm also supports continuous features.

The previous article covered the basics of decision trees. Building a decision tree consists of three parts: feature selection, tree generation, and pruning. Next, we implement a decision tree step by step, combining the algorithm with a concrete example.

4.2.1 Simple decision tree example: computer purchase

First, consider the sample data in Table 1. We want to predict whether a person buys a computer from their age, student status, credit rating, and income. Let us work through this example with a decision tree.

Table 1 Sample data table

RID age student credit_rating income class_buys_computer
1 youth no fair high no
2 youth no excellent high no
3 middle_aged no fair high yes
4 senior no fair medium yes
5 senior yes fair low yes
6 senior yes excellent low no
7 middle_aged yes excellent low yes
8 youth no fair medium no
9 youth yes fair low yes
10 senior yes fair medium yes
11 youth yes excellent medium yes
12 middle_aged no excellent medium yes
13 middle_aged yes fair high yes
14 senior no excellent medium no

4.2.1.1 Feature Selection

The feature selection problem aims to select features with strong classification ability for the training data, so as to improve the efficiency of decision tree learning. If splitting on a feature gives results that are not significantly different from random classification, the feature is said to have no classification ability. To solve the feature selection problem and find the optimal feature, we need to introduce some concepts from information theory. **Entropy** is a measure of the uncertainty of a random variable. Let X be a discrete random variable taking finitely many values, with probability distribution:

$P(X = x_i) = p_i, \quad i = 1, 2, \ldots, n$

The entropy of the random variable X is then defined as: $H(X) = -\sum_{i=1}^{n} p_i \log p_i$

where n is the number of distinct values X can take, $p_i$ is the probability that $X = x_i$, and the logarithm is taken to base 2 or base e. When the base is 2, the unit of entropy is the bit; when the base is e, the unit is the nat. The higher the entropy, the greater the uncertainty of the random variable. From the definition it can be verified that:

$0 \le H(p) \le \log n$

These concepts were explained in the previous article and are repeated here for convenience. Now let's calculate the entropy. With such a small data set the probability distribution is easy to work out by hand, but programming it takes a bit more care; see the code for the details. The entropy computed from the class probabilities is as follows.
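Counting the class labels in Table 1 gives 9 "yes" and 5 "no" samples, so the empirical entropy of the data set is

$H(D) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940$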



So entropy is 0.94. Let’s do this in Python.

We first annotate the attributes of the data set.

Age: 0 for youth, 1 for middle_aged, 2 for senior;

Student: 0 for no, 1 for yes;

Credit rating: 0 for fair, 1 for excellent;

Income: 0 for low, 1 for medium, 2 for high;

Class: 'no' for not buying, 'yes' for buying.
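For example, the first row of Table 1 (youth, not a student, fair credit, high income, does not buy) is encoded as [0, 0, 0, 2, 'no'].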

With that in mind, how do we do this in Python? The code is as follows; the comments are fairly clear, so I will not explain it line by line.

# -*- coding: UTF-8 -*-
from math import log
"""
Function description: create the data set
Parameters:
    none
Returns:
    dataSet - data set
    labels - feature labels
"""
def createDataSet() :
    # data set
    dataSet = [[0, 0, 0, 2, 'no'],
               [0, 0, 1, 2, 'no'],
               [1, 0, 0, 2, 'yes'],
               [2, 0, 0, 1, 'yes'],
               [2, 1, 0, 0, 'yes'],
               [2, 1, 1, 0, 'no'],
               [1, 1, 1, 0, 'yes'],
               [0, 0, 0, 1, 'no'],
               [0, 1, 0, 0, 'yes'],
               [2, 1, 0, 1, 'yes'],
               [0, 1, 1, 1, 'yes'],
               [1, 0, 1, 1, 'yes'],
               [1, 1, 0, 2, 'yes'],
               [2, 0, 1, 1, 'no']]
    # Feature labels
    labels = ['age', 'student', 'credit_rating', 'income']

    # return data set and category attributes
    return dataSet, labels 	

"""
Function description: calculate the empirical entropy (Shannon entropy) of the data set
Parameters:
    dataSet - data set
Returns:
    shannonEnt - empirical entropy (Shannon entropy)
"""
def calcShannonEnt(dataSet):
    # Number of rows in the data set
    numEntires = len(dataSet)

    # Dictionary of occurrence counts for each label
    labelCounts = {}

    # Count each group of feature vectors
    for featVec in dataSet:
        # Extract the label information
        currentLabel = featVec[-1]

        # If the label is not in the count dictionary, add it
        if currentLabel not in labelCounts.keys():
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1   # label counting

    shannonEnt = 0.0   # empirical entropy (Shannon entropy)

    # Calculate the Shannon entropy
    for key in labelCounts:
        # Probability of the label
        prob = float(labelCounts[key]) / numEntires

        # Apply the formula
        shannonEnt -= prob * log(prob, 2)

    # Return the empirical entropy (Shannon entropy)
    return shannonEnt
# test
if __name__ == '__main__':
    dataSet, features = createDataSet()
    print(dataSet)
    print(calcShannonEnt(dataSet))

The results are shown below.

As you can see, this is the same as the manual calculation above.

**Conditional entropy**: let (X, Y) be a pair of random variables with joint probability distribution $P(X = x_i, Y = y_j) = p_{ij}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, m$.

The conditional entropy $H(Y|X)$ expresses the uncertainty of random variable Y given that random variable X is known. It is defined as the expectation, over X, of the entropy of the conditional distribution of Y given $X = x_i$: $H(Y|X) = \sum_{i=1}^{n} p_i H(Y \mid X = x_i)$, where $p_i = P(X = x_i)$.

Taking age as an example, the calculated conditional entropy is as follows.
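Splitting Table 1 by age gives 5 youth samples (2 yes, 3 no), 4 middle_aged samples (all yes) and 5 senior samples (3 yes, 2 no), so

$H(D \mid \text{age}) = \frac{5}{14} \times 0.971 + \frac{4}{14} \times 0 + \frac{5}{14} \times 0.971 \approx 0.694$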



The Python code is shown below.

# -*- coding: utf-8 -*-
from math import log
import operator

"""
Function description: create the data set
Parameters:
    none
Returns:
    dataSet - data set
    labels - feature labels
"""
def createDataSet() :
    # data set
    dataSet = [[0, 0, 0, 2, 'no'],
               [0, 0, 1, 2, 'no'],
               [1, 0, 0, 2, 'yes'],
               [2, 0, 0, 1, 'yes'],
               [2, 1, 0, 0, 'yes'],
               [2, 1, 1, 0, 'no'],
               [1, 1, 1, 0, 'yes'],
               [0, 0, 0, 1, 'no'],
               [0, 1, 0, 0, 'yes'],
               [2, 1, 0, 1, 'yes'],
               [0, 1, 1, 1, 'yes'],
               [1, 0, 1, 1, 'yes'],
               [1, 1, 0, 2, 'yes'],
               [2, 0, 1, 1, 'no']]
    # Feature labels
    labels = ['age', 'student', 'credit_rating', 'income']

    # return data set and category attributes
    return dataSet, labels 	

"""
Function description: split the data set on a given feature value
Parameters:
    dataSet - data set to be split
    axis - index of the feature used for splitting
    value - feature value that the returned subset must match
Returns:
    retDataSet - the split subset
"""
def splitDataSet(dataSet, axis, value):
    # Create a list for the returned data set
    retDataSet = []
    # Walk through the data set
    for featVec in dataSet:
        if featVec[axis] == value:
            # Remove the axis feature
            reducedFeatVec = featVec[:axis]
            # Add the matching samples to the returned data set
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)

    # Return the split data set
    return retDataSet

"""
Function description: calculate the conditional entropy of the class given feature X_i
Parameters:
    dataSet - data set to be split
    i - index of the i-th feature
    uniqueVals - set of values taken by the i-th feature
Returns:
    newEntropy - conditional entropy
"""
# Note: calcShannonEnt from the previous listing must be defined in the same file.
def calcConditionalEntropy(dataSet, i, uniqueVals):
    # Empirical conditional entropy
    newEntropy = 0.0
    for value in uniqueVals:
        subDataSet = splitDataSet(dataSet, i, value)
        prob = len(subDataSet) / float(len(dataSet))  # maximum likelihood estimate of the probability
        newEntropy += prob * calcShannonEnt(subDataSet)  # accumulate the conditional entropy
    return newEntropy

# test
if __name__ == '__main__':
    dataSet, labels = createDataSet()
    # Number of features
    numFeatures = len(dataSet[0]) - 1
    # Walk through all features
    for i in range(numFeatures):
        # Collect all values of the i-th feature of the dataSet
        featList = [example[i] for example in dataSet]
        # Create a set {}; elements cannot be repeated
        uniqueVals = set(featList)

        # Empirical conditional entropy of the i-th feature
        newEntropy = calcConditionalEntropy(dataSet, i, uniqueVals)
        print(newEntropy)

The results are shown below.
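With the data set above, the four printed conditional entropies should be roughly 0.694 (age), 0.789 (student), 0.892 (credit_rating) and 0.911 (income); subtracting each from H(D) = 0.940 gives the information gains used below.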



As you can see, the first value is the conditional entropy of the age feature, which matches the manual calculation.

Information gain measures how much knowing feature X reduces the uncertainty about the class Y. The information gain g(D, A) of feature A with respect to training data set D is defined as the difference between the empirical entropy H(D) of D and the empirical conditional entropy H(D|A) of D given A: $g(D, A) = H(D) - H(D \mid A)$.

This difference is also called mutual information. Features with large information gain have stronger classification ability. The feature selection method based on the information gain criterion is to calculate the information gain of each feature of the training data set (or subset) and select the feature with the maximum information gain. Let’s try to explain it with examples. The age gain is as follows.
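Using the values computed above,

$g(D, \text{age}) = H(D) - H(D \mid \text{age}) = 0.940 - 0.694 = 0.246$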



Similarly, Gain(income) = 0.029, Gain(student) = 0.151, and Gain(credit_rating) = 0.048.

Since age has the largest information gain, it is chosen as the feature of the root node.

Figure 1

The Python code is shown below.

# -*- coding: utf-8 -*-
from math import log
import operator

"""
Function description: create the data set
Parameters:
    none
Returns:
    dataSet - data set
    labels - feature labels
"""
def createDataSet() :
    # data set
    dataSet = [[0, 0, 0, 2, 'no'],
               [0, 0, 1, 2, 'no'],
               [1, 0, 0, 2, 'yes'],
               [2, 0, 0, 1, 'yes'],
               [2, 1, 0, 0, 'yes'],
               [2, 1, 1, 0, 'no'],
               [1, 1, 1, 0, 'yes'],
               [0, 0, 0, 1, 'no'],
               [0, 1, 0, 0, 'yes'],
               [2, 1, 0, 1, 'yes'],
               [0, 1, 1, 1, 'yes'],
               [1, 0, 1, 1, 'yes'],
               [1, 1, 0, 2, 'yes'],
               [2, 0, 1, 1, 'no']]
    # Feature labels
    labels = ['age', 'student', 'credit_rating', 'income']

    # return data set and category attributes
    return dataSet, labels 	

"""
Function description: calculate the empirical entropy (Shannon entropy) of the data set
Parameters:
    dataSet - data set
Returns:
    shannonEnt - empirical entropy (Shannon entropy)
"""
def calcShannonEnt(dataSet):
    # Number of rows in the data set
    numEntires = len(dataSet)

    # Dictionary of occurrence counts for each label
    labelCounts = {}

    # Count each group of feature vectors
    for featVec in dataSet:
        # Extract the label information
        currentLabel = featVec[-1]

        # If the label is not in the count dictionary, add it
        if currentLabel not in labelCounts.keys():
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1   # label counting

    shannonEnt = 0.0   # empirical entropy (Shannon entropy)

    # Calculate the Shannon entropy
    for key in labelCounts:
        # Probability of the label
        prob = float(labelCounts[key]) / numEntires

        # Apply the formula
        shannonEnt -= prob * log(prob, 2)

    # Return the empirical entropy (Shannon entropy)
    return shannonEnt

"""
Function description: split the data set on a given feature value
Parameters:
    dataSet - data set to be split
    axis - index of the feature used for splitting
    value - feature value that the returned subset must match
Returns:
    retDataSet - the split subset
"""
def splitDataSet(dataSet, axis, value):
    # Create a list for the returned data set
    retDataSet = []
    # Walk through the data set
    for featVec in dataSet:
        if featVec[axis] == value:
            # Remove the axis feature
            reducedFeatVec = featVec[:axis]
            # Add the matching samples to the returned data set
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)

    # Return the split data set
    return retDataSet

"""
Function description: calculate the conditional entropy of the class given feature X_i
Parameters:
    dataSet - data set to be split
    i - index of the i-th feature
    uniqueVals - set of values taken by the i-th feature
Returns:
    newEntropy - conditional entropy
"""
def calcConditionalEntropy(dataSet, i, uniqueVals):
    # Empirical conditional entropy
    newEntropy = 0.0
    for value in uniqueVals:
        subDataSet = splitDataSet(dataSet, i, value)
        prob = len(subDataSet) / float(len(dataSet))  # maximum likelihood estimate of the probability
        newEntropy += prob * calcShannonEnt(subDataSet)  # accumulate the conditional entropy
    return newEntropy

"""
Function description: compute the information gain of every feature and select the best one
Parameters:
    dataSet - data set
    baseEntropy - empirical entropy H(D) of the data set
Returns:
    bestIndex - index of the best feature
    bestInfoGain - best information gain
"""
def calcInformationGain(dataSet, baseEntropy):
    # Index of the optimal feature
    bestIndex = -1
    # Best information gain
    bestInfoGain = 0.0

    # Number of features
    numFeatures = len(dataSet[0]) - 1
    # Walk through all features
    for i in range(numFeatures):
        # Collect all values of the i-th feature of the dataSet
        featList = [example[i] for example in dataSet]
        # Create a set {}; elements cannot be repeated
        uniqueVals = set(featList)

        # Empirical conditional entropy
        newEntropy = calcConditionalEntropy(dataSet, i, uniqueVals)
        # Information gain is the reduction in entropy, i.e., the reduction in uncertainty
        infoGain = baseEntropy - newEntropy
        if infoGain > bestInfoGain:
            # Update the information gain to keep the maximum
            bestInfoGain = infoGain
            # Record the index of the feature with the maximum information gain
            bestIndex = i

    return bestIndex, bestInfoGain

# test
if __name__ == '__main__':
    dataSet, labels = createDataSet()
    ent = calcShannonEnt(dataSet)

    index, gain = calcInformationGain(dataSet, ent)
    print(gain)

The results are shown below.
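With the data set above, the printed value should be about 0.246, the information gain of the age feature (index 0).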

The result is exactly what we calculated by hand, so the code works. OK, let's summarize the algorithm for computing information gain. Algorithm (computing information gain). Input: training data set D and feature A. Output: the information gain g(D, A) of feature A with respect to training data set D.

  • Step 1: Calculate the empirical entropy H(D) of data set D.

  • Step 2: Calculate the empirical conditional entropy H(D|A) of D given feature A.

  • Step 3: Calculate the information gain g(D, A) = H(D) - H(D|A).



    [Note] The algorithm that uses information gain as the criterion for selecting features of the training data set is called the ID3 algorithm.

**Information gain ratio**

The information gain ratio g_R(D, A) of feature A with respect to training data set D is defined as the ratio of its information gain g(D, A) to the entropy H_A(D) of D with respect to the values of feature A, i.e.

$g_R(D, A) = \dfrac{g(D, A)}{H_A(D)}$, where $H_A(D) = -\sum_{i=1}^{n} \dfrac{|D_i|}{|D|} \log_2 \dfrac{|D_i|}{|D|}$,

where n is the number of distinct values of feature A and D_i is the subset of D whose samples take the i-th value of A.
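As a quick worked example on the data above: age takes three values that cover 5, 4 and 5 of the 14 samples, so

$H_{\text{age}}(D) = -\frac{5}{14}\log_2\frac{5}{14} - \frac{4}{14}\log_2\frac{4}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 1.577$, and therefore $g_R(D, \text{age}) = 0.246 / 1.577 \approx 0.156$.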

The algorithm that selects features by the information gain ratio is called C4.5; it will be introduced later.

4.2.1.2 Specific process of ID3 algorithm

We now have the subroutines needed to construct a decision tree from the data set, including computing the empirical entropy and selecting the optimal feature. The procedure works as follows: start from the original data set and split it on the best feature; since that feature may take more than two values, the split may produce more than two branches. After the first split, each subset is passed down to the corresponding branch of the tree, where the data can be split again. The data set is therefore processed recursively.

There are many algorithms for building decision trees, such as C4.5, ID3 and CART, and not all of them consume a feature every time they split the data. Because the number of features does not necessarily decrease with each split, such algorithms can cause some issues in practice. We do not need to worry about that here; before running the algorithm we simply count the columns to check whether all attributes have been used.

The decision tree generation algorithm recursively grows the tree until it can no longer continue. A tree generated this way often classifies the training data very accurately but classifies unknown test data much less accurately; that is, over-fitting occurs. Over-fitting arises because the learning process focuses too much on classifying the training data correctly and therefore builds an overly complex tree. The solution is to take the complexity of the tree into account and simplify the generated tree by pruning.

The core of the ID3 algorithm is to select features by the information gain criterion at each node and to build the decision tree recursively. The specific method is as follows: starting from the root node, compute the information gain of every candidate feature for the node, select the feature with the maximum information gain as the feature of the node, and create child nodes according to the different values of that feature. Then call the same procedure recursively on the child nodes, until the information gain of every feature is small or no feature is left, which yields a decision tree. ID3 is equivalent to choosing a probability model by maximum likelihood.

Algorithm (ID3). Input: training data set D, feature set A, threshold ε. Output: decision tree T.

  • Step 1: Set the information gain threshold ε.

  • Step 2: If all samples in D belong to the same class C_k, then T is a single-node tree; mark the node with class C_k and return T.

  • Step 3: If feature set A is empty, return the single-node tree T, marked with the class C_k that has the most samples in D.

  • Step 4: Compute the information gain of each feature in A with respect to D, and select the feature A_g with the maximum information gain.

  • Step 5: If the information gain of A_g is less than the threshold ε, return the single-node tree T, marked with the class C_k that has the most samples in D.

  • Step 6: Otherwise, split D into non-empty subsets D_i according to the different values a_i of feature A_g, build a child node for each subset (marked with the majority class in D_i), and let the node together with its children form the tree T; return T.

  • Step 7: For the i-th child node, take D_i as the training set and A - {A_g} as the feature set, and recursively call Steps 2 to 6 to obtain the subtree T_i; return T_i.

According to the calculation above, the feature "age", which has the maximum information gain, is selected as the feature of the root node. Since "age" has three possible values, three children are derived from this node: the child corresponding to "middle_aged" contains four samples that all belong to the same class ("yes"), so it becomes a leaf node; the children corresponding to "youth" and "senior" contain samples of mixed classes, so they must be split further (on "student" and "credit_rating" respectively, as the tree built below shows).

All right, let’s look at the code.

# -*- coding: utf-8 -*-
from math import log
import operator

"""
Function description: create the data set
Parameters:
    none
Returns:
    dataSet - data set
    labels - feature labels
"""
def createDataSet() :
    # data set
    dataSet = [[0, 0, 0, 2, 'no'],
               [0, 0, 1, 2, 'no'],
               [1, 0, 0, 2, 'yes'],
               [2, 0, 0, 1, 'yes'],
               [2, 1, 0, 0, 'yes'],
               [2, 1, 1, 0, 'no'],
               [1, 1, 1, 0, 'yes'],
               [0, 0, 0, 1, 'no'],
               [0, 1, 0, 0, 'yes'],
               [2, 1, 0, 1, 'yes'],
               [0, 1, 1, 1, 'yes'],
               [1, 0, 1, 1, 'yes'],
               [1, 1, 0, 2, 'yes'],
               [2, 0, 1, 1, 'no']]
    # Feature labels
    labels = ['age', 'student', 'credit_rating', 'income']

    # return data set and category attributes
    return dataSet, labels 	

"""
Function description: calculate the empirical entropy (Shannon entropy) of the data set
Parameters:
    dataSet - data set
Returns:
    shannonEnt - empirical entropy (Shannon entropy)
"""
def calcShannonEnt(dataSet):
    # Number of rows in the data set
    numEntires = len(dataSet)

    # Dictionary of occurrence counts for each label
    labelCounts = {}

    # Count each group of feature vectors
    for featVec in dataSet:
        # Extract the label information
        currentLabel = featVec[-1]

        # If the label is not in the count dictionary, add it
        if currentLabel not in labelCounts.keys():
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1   # label counting

    shannonEnt = 0.0   # empirical entropy (Shannon entropy)

    # Calculate the Shannon entropy
    for key in labelCounts:
        # Probability of the label
        prob = float(labelCounts[key]) / numEntires

        # Apply the formula
        shannonEnt -= prob * log(prob, 2)

    # Return the empirical entropy (Shannon entropy)
    return shannonEnt

"""
Function description: split the data set on a given feature value
Parameters:
    dataSet - data set to be split
    axis - index of the feature used for splitting
    value - feature value that the returned subset must match
Returns:
    retDataSet - the split subset
"""
def splitDataSet(dataSet, axis, value):
    # Create a list for the returned data set
    retDataSet = []
    # Walk through the data set
    for featVec in dataSet:
        if featVec[axis] == value:
            # Remove the axis feature
            reducedFeatVec = featVec[:axis]
            # Add the matching samples to the returned data set
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)

    # Return the split data set
    return retDataSet

"""
Function description: calculate the conditional entropy of the class given feature X_i
Parameters:
    dataSet - data set to be split
    i - index of the i-th feature
    uniqueVals - set of values taken by the i-th feature
Returns:
    newEntropy - conditional entropy
"""
def calcConditionalEntropy(dataSet, i, uniqueVals):
    # Empirical conditional entropy
    newEntropy = 0.0
    for value in uniqueVals:
        subDataSet = splitDataSet(dataSet, i, value)
        prob = len(subDataSet) / float(len(dataSet))  # maximum likelihood estimate of the probability
        newEntropy += prob * calcShannonEnt(subDataSet)  # accumulate the conditional entropy
    return newEntropy

"""
Function description: compute the information gain of every feature and select the best one
Parameters:
    dataSet - data set
Returns:
    bestIndex - index of the best feature
    bestInfoGain - best information gain
"""
def calcInformationGain(dataSet):
    # Index of the optimal feature
    bestIndex = -1
    # Best information gain
    bestInfoGain = 0.0

    # Empirical entropy of the whole data set
    baseEntropy = calcShannonEnt(dataSet)

    # Number of features
    numFeatures = len(dataSet[0]) - 1
    # Walk through all features
    for i in range(numFeatures):
        # Collect all values of the i-th feature of the dataSet
        featList = [example[i] for example in dataSet]
        # Create a set {}; elements cannot be repeated
        uniqueVals = set(featList)

        # Empirical conditional entropy
        newEntropy = calcConditionalEntropy(dataSet, i, uniqueVals)
        # Information gain is the reduction in entropy, i.e., the reduction in uncertainty
        infoGain = baseEntropy - newEntropy

        # Keep the best gain
        if infoGain > bestInfoGain:
            # Update the information gain to keep the maximum
            bestInfoGain = infoGain
            # Record the index of the feature with the maximum information gain
            bestIndex = i

    return bestIndex, bestInfoGain

"""
Function description: return the class that occurs most often in classList
Parameters:
    classList - list of class labels
Returns:
    sortedClassCount[0][0] - the most common class label
"""
def majorityCnt(classList):
    classCount = {}

    # Count the number of occurrences of each element in classList
    for vote in classList:
        if vote not in classCount.keys():
            classCount[vote] = 0
        classCount[vote] += 1
    # Sort by dictionary value in descending order
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)

    # Return the most common element in classList
    return sortedClassCount[0][0]

"""
Function description: create the decision tree
Parameters:
    dataSet - training data set
    labels - feature labels
    featLabels - list storing the selected optimal feature labels
Returns:
    myTree - decision tree
"""
def createTree(dataSet, labels, featLabels):
    # Take the class labels (buy a computer or not: 'yes' or 'no')
    classList = [example[-1] for example in dataSet]

    # Stop splitting if all classes are identical
    if classList.count(classList[0]) == len(classList):
        return classList[0]

    # When all features have been used, return the most frequent class label
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)

    bestFeat, bestInfoGain = calcInformationGain(dataSet)    # select the optimal feature
    bestFeatLabel = labels[bestFeat]    # optimal feature label

    featLabels.append(bestFeatLabel)
    myTree = {bestFeatLabel: {}}    # grow the tree from the optimal feature label
    del(labels[bestFeat])    # delete the feature label that has been used

    # Get all values of the optimal feature in the training set
    featValues = [example[bestFeat] for example in dataSet]

    uniqueVals = set(featValues)    # remove duplicate values

    for value in uniqueVals:    # iterate over the values and build the subtrees
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), labels, featLabels)

    return myTree

# test
if __name__ == '__main__':
    dataSet, labels = createDataSet()
    featLabels = []
    myTree = createTree(dataSet, labels, featLabels)
    print(myTree)

The results are as follows.
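With the data set above, the printed dictionary should look roughly like this: youth samples are split on student, middle_aged samples are all 'yes', and senior samples are split on credit_rating.

{'age': {0: {'student': {0: 'no', 1: 'yes'}}, 1: 'yes', 2: {'credit_rating': {0: 'yes', 1: 'no'}}}}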



So we have a decision tree, but in this dictionary form it is rather abstract. How do we use it to classify a new sample? Let's look at the code.

"""
Function description: classify a test vector with the decision tree
Parameters:
    inputTree - the decision tree (dictionary)
    featLabels - the selected optimal feature labels, in order
    testVec - the test vector, indexed in the order of featLabels
Returns:
    classLabel - classification result
"""
def classify(inputTree, featLabels, testVec):
    firstStr = next(iter(inputTree))		# get the root node of the decision tree
    secondDict = inputTree[firstStr]				# Next dictionary
    featIndex = featLabels.index(firstStr)		
    
    for key in secondDict.keys():
        if testVec[featIndex] == key:

            if type(secondDict[key]).__name__ == 'dict':
                classLabel = classify(secondDict[key], featLabels, testVec)
            else: 
                classLabel = secondDict[key]
    return classLabel
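For completeness, here is a minimal way to drive classify. It is a sketch that assumes classify sits in the same file as the createDataSet and createTree functions from the previous listing, so that featLabels comes out as ['age', 'student', 'credit_rating'] and the test vector is indexed in that order.

# test (sketch; assumes createDataSet/createTree from the listing above are in this file)
if __name__ == '__main__':
    dataSet, labels = createDataSet()
    featLabels = []
    myTree = createTree(dataSet, labels, featLabels)
    # a youth (0) who is a student (1) with a fair credit rating (0)
    testVec = [0, 1, 0]
    print(classify(myTree, featLabels, testVec))    # expected output: 'yes'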

The results are as follows.



From the example we can judge by hand that this young person will buy a computer, and the algorithm gives the same answer.

The complete code is at DT_buys_computer_Classifty\DT_buys_computer_Classifty_v1.0\DT_buys_computer_Classifty_v1.0.py.

4.2.1.3 Computer purchase with a decision tree – using the sklearn library

Previous chapters have used Sklearn to implement algorithms, and this one is no exception. Let’s just look at the code.

# -*- coding: utf-8 -*-
from sklearn import tree
from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import preprocessing

# test
if __name__ == '__main__':
    
    ## Step 1: load data
    print("Step 1: load data...")

    # Read in the csv file and put features into list of dict and list of class label
    Data = open("C:/TensorFlow/data.csv", "rt")

    # Read the raw data of the file
    reader = csv.reader(Data)  # each row of the CSV file is returned as a list of values

    # Python 3 uses this syntax; Python 2.7 uses headers = reader.next()
    headers = next(reader)  # read one line; reader then points to the next line
    # headers stores the first row of the CSV, which is also used as the keys of rowDict below
    #print("headers :\n" + str(headers))
    
    featureList = []
    labelList = []
    
    for row in reader:
        labelList.append(row[len(row)-1])
        rowDict = {}
        for i in range(1, len(row)-1):
            rowDict[headers[i]] = row[i]
        featureList.append(rowDict)
    
    #print("featureList:\n" + str(featureList))
    #print("labelList:\n" + str(labelList))
    
    ## Step 2: Vectorize data...
    print("Step 2: Vectorize data...")

    # Extract data
    # Vectorize features
    vec = DictVectorizer()  # initialize the dictionary feature extractor
    dummyX = vec.fit_transform(featureList).toarray()
    # View the extracted feature values
    # Output the transformed feature matrix
    #print("dummyX: \n" + str(dummyX))
    # Output the meaning of each feature dimension
    #print(vec.get_feature_names())

    # vectorize class labels
    lb = preprocessing.LabelBinarizer()# Binarize the tag matrix
    dummyY = lb.fit_transform(labelList)
    #print("dummyY: \n" + str(dummyY))
    
    ## Step 3: init DT...
    print("Step 3: init DT...")
    # Using decision tree for classification
    # http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
    # criterion can be 'gini' or 'entropy'; the default gini corresponds to CART, entropy to ID3
    clf = tree.DecisionTreeClassifier(criterion='entropy')
    
    ## Step 4: training...
    print("Step 4: training...")
    clf = clf.fit(dummyX, dummyY)

    # Forecast data
    oneRowX = dummyX[0, :]
    #print("oneRowX: " + str(oneRowX))

    newRowX = oneRowX
    newRowX[0] = 1
    newRowX[2] = 0
    print("newRowX: " + str(newRowX))

    ## Step 5: testing
    print("Step 5: testing...")
    #predictedLabel = clf.predict([newRowX])              # method 1
    predictedLabel = clf.predict(newRowX.reshape(1, -1))  # method 2

    ## Step 6: show the result
    print("Step 4: show the result...")
    #print("predictedLabel" + str(predictedLabel))

    if predictedLabel == 1:
        print("To buy")
    else:
        print("Don't buy")

The resulting run is shown below.



The complete code is at DT_buys_computer_Classifty-sklearn_v2.0\DT_buys_computer_Classifty-sklearn_v2.0.0\DT_buys_computer_Classifty-sklearn_v2.0.0.py.

4.2.1.4 Visualization of decision tree

4.2.1.4.1 Visualization of the decision tree with Matplotlib

Decision tree visualization requires Matplotlib, which provides the plotting API. The functions used for visualization in this section are: getNumLeafs, which obtains the number of leaf nodes of the decision tree; getTreeDepth, which obtains the number of layers of the decision tree; and plotNode, plotMidText, plotTree and createPlot, which draw the annotated tree.

# -*- coding: utf-8 -*-
import pandas as pd
from math import log
import operator
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt

"""
Function description: calculate the empirical entropy (Shannon entropy) of the data set
Parameters:
    dataSet - data set
Returns:
    shannonEnt - empirical entropy (Shannon entropy)
"""
def calcShannonEnt(dataSet):
    # Number of rows in the data set
    numEntires = len(dataSet)

    # Dictionary of occurrence counts for each label
    labelCounts = {}

    # Count each group of feature vectors
    for featVec in dataSet:
        # Extract the label information
        currentLabel = featVec[-1]

        # If the label is not in the count dictionary, add it
        if currentLabel not in labelCounts.keys():
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1   # label counting

    shannonEnt = 0.0   # empirical entropy (Shannon entropy)

    # Calculate the Shannon entropy
    for key in labelCounts:
        # Probability of the label
        prob = float(labelCounts[key]) / numEntires

        # Apply the formula
        shannonEnt -= prob * log(prob, 2)

    # Return the empirical entropy (Shannon entropy)
    return shannonEnt

"""
Function description: split the data set on a given feature value
Parameters:
    dataSet - data set to be split
    axis - index of the feature used for splitting
    value - feature value that the returned subset must match
Returns:
    retDataSet - the split subset
"""
def splitDataSet(dataSet, axis, value):
    # Create a list for the returned data set
    retDataSet = []
    # Walk through the data set
    for featVec in dataSet:
        if featVec[axis] == value:
            # Remove the axis feature
            reducedFeatVec = featVec[:axis]
            # Add the matching samples to the returned data set
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)

    # Return the split data set
    return retDataSet

"""
Function description: calculate the conditional entropy of the class given feature X_i
Parameters:
    dataSet - data set to be split
    i - index of the i-th feature
    uniqueVals - set of values taken by the i-th feature
Returns:
    newEntropy - conditional entropy
"""
def calcConditionalEntropy(dataSet, i, uniqueVals):
    # Empirical conditional entropy
    newEntropy = 0.0
    for value in uniqueVals:
        subDataSet = splitDataSet(dataSet, i, value)
        prob = len(subDataSet) / float(len(dataSet))  # maximum likelihood estimate of the probability
        newEntropy += prob * calcShannonEnt(subDataSet)  # accumulate the conditional entropy
    return newEntropy

"""
Function description: compute the information gain of every feature and select the best one
Parameters:
    dataSet - data set
Returns:
    bestIndex - index of the best feature
    bestInfoGain - best information gain
"""
def calcInformationGain(dataSet):
    # Index of the optimal feature
    bestIndex = -1
    # Best information gain
    bestInfoGain = 0.0

    # Empirical entropy of the whole data set
    baseEntropy = calcShannonEnt(dataSet)

    # Number of features
    numFeatures = len(dataSet[0]) - 1
    # Walk through all features
    for i in range(numFeatures):
        # Collect all values of the i-th feature of the dataSet
        featList = [example[i] for example in dataSet]
        # Create a set {}; elements cannot be repeated
        uniqueVals = set(featList)

        # Empirical conditional entropy
        newEntropy = calcConditionalEntropy(dataSet, i, uniqueVals)
        # Information gain is the reduction in entropy, i.e., the reduction in uncertainty
        infoGain = baseEntropy - newEntropy

        # Keep the best gain
        if infoGain > bestInfoGain:
            # Update the information gain to keep the maximum
            bestInfoGain = infoGain
            # Record the index of the feature with the maximum information gain
            bestIndex = i

    return bestIndex, bestInfoGain

"""
Function description: return the class that occurs most often in classList
Parameters:
    classList - list of class labels
Returns:
    sortedClassCount[0][0] - the most common class label
"""
def majorityCnt(classList):
    classCount = {}

    # Count the number of occurrences of each element in classList
    for vote in classList:
        if vote not in classCount.keys():
            classCount[vote] = 0
        classCount[vote] += 1
    # Sort by dictionary value in descending order
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)

    # Return the most common element in classList
    return sortedClassCount[0][0]

"""
Function description: create the decision tree
Parameters:
    dataSet - training data set
    labels - feature labels
    featLabels - list storing the selected optimal feature labels
Returns:
    myTree - decision tree
"""
def createTree(dataSet, labels, featLabels):
    # Take the class labels (buy a computer or not: 'yes' or 'no')
    classList = [example[-1] for example in dataSet]

    # Stop splitting if all classes are identical
    if classList.count(classList[0]) == len(classList):
        return classList[0]

    # When all features have been used, return the most frequent class label
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)

    bestFeat, bestInfoGain = calcInformationGain(dataSet)    # select the optimal feature
    bestFeatLabel = labels[bestFeat]    # optimal feature label

    featLabels.append(bestFeatLabel)
    myTree = {bestFeatLabel: {}}    # grow the tree from the optimal feature label
    del(labels[bestFeat])    # delete the feature label that has been used

    # Get all values of the optimal feature in the training set
    featValues = [example[bestFeat] for example in dataSet]

    uniqueVals = set(featValues)    # remove duplicate values

    for value in uniqueVals:    # iterate over the values and build the subtrees
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), labels, featLabels)

    return myTree

"""
Function description: classify a test vector with the decision tree
Parameters:
    inputTree - the decision tree (dictionary)
    featLabels - the selected optimal feature labels, in order
    testVec - the test vector, indexed in the order of featLabels
Returns:
    classLabel - classification result
"""
def classify(inputTree, featLabels, testVec):
    firstStr = next(iter(inputTree))		# get the root node of the decision tree
    secondDict = inputTree[firstStr]				# Next dictionary
    featIndex = featLabels.index(firstStr)		
    
    for key in secondDict.keys():
        if testVec[featIndex] == key:

            if type(secondDict[key]).__name__ == 'dict':
                classLabel = classify(secondDict[key], featLabels, testVec)
            else: 
                classLabel = secondDict[key]
    return classLabel

############################# Visualization #############################
"""
Function description: get the number of leaf nodes of the decision tree
Parameters:
    myTree - decision tree
Returns:
    numLeafs - number of leaf nodes
"""
def getNumLeafs(myTree):
    # Initialize the number of leaves
    numLeafs = 0

    # In Python 3, myTree.keys() returns dict_keys, not a list, so myTree.keys()[0]
    # no longer works; use next(iter(myTree)) or list(myTree.keys())[0] instead
    firstStr = next(iter(myTree))

    # Get the next group of dictionaries
    secondDict = myTree[firstStr]

    for key in secondDict.keys():
        # Test whether this node is a dictionary; if not, it is a leaf node
        if type(secondDict[key]).__name__ == 'dict':
            numLeafs += getNumLeafs(secondDict[key])
        else:
            numLeafs += 1
    return numLeafs

"""
Function description: get the number of layers of the decision tree
Parameters:
    myTree - decision tree
Returns:
    maxDepth - number of layers of the decision tree
"""
def getTreeDepth(myTree):
    # Initialize the decision tree depth
    maxDepth = 0

    # In Python 3, myTree.keys() returns dict_keys, not a list, so myTree.keys()[0]
    # no longer works; use next(iter(myTree)) or list(myTree.keys())[0] instead
    firstStr = next(iter(myTree))

    # Get the next dictionary
    secondDict = myTree[firstStr]

    for key in secondDict.keys():
        # Test whether this node is a dictionary; if not, it is a leaf node
        if type(secondDict[key]).__name__ == 'dict':
            thisDepth = 1 + getTreeDepth(secondDict[key])
        else:
            thisDepth = 1
        if thisDepth > maxDepth:
            maxDepth = thisDepth   # update the number of layers
    return maxDepth

"""
Function description: draw a node
Parameters:
    nodeTxt - node name
    centerPt - text position
    parentPt - arrow start position
    nodeType - node format
Returns:
    none
"""
def plotNode(nodeTxt, centerPt, parentPt, nodeType):
    arrow_args = dict(arrowstyle="<-")     # define the arrow format
    font = FontProperties(fname=r"c:\windows\fonts\simsun.ttc", size=14)  # set a Chinese-capable font
    createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction',   # draw the node
                            xytext=centerPt, textcoords='axes fraction',
                            va="center", ha="center", bbox=nodeType,
                            arrowprops=arrow_args, fontproperties=font)

"""
Function description: label the text on the edge between parent and child
Parameters:
    cntrPt, parentPt - positions used to compute the label position
    txtString - the label content
Returns:
    none
"""
def plotMidText(cntrPt, parentPt, txtString):
    xMid = (parentPt[0] - cntrPt[0]) / 2.0 + cntrPt[0]     # compute the label position
    yMid = (parentPt[1] - cntrPt[1]) / 2.0 + cntrPt[1]
    createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center", rotation=30)

"""
Function description: draw the decision tree
Parameters:
    myTree - decision tree (dictionary)
    parentPt - position of the parent node
    nodeTxt - node name
Returns:
    none
"""
def plotTree(myTree, parentPt, nodeTxt):
    # Set the decision node format
    decisionNode = dict(boxstyle="sawtooth", fc="0.8")

    # Set the leaf node format
    leafNode = dict(boxstyle="round4", fc="0.8")

    # Get the number of leaf nodes, which determines the width of the tree
    numLeafs = getNumLeafs(myTree)

    depth = getTreeDepth(myTree)   # get the number of layers of the decision tree
    firstStr = next(iter(myTree))                 # next dictionary
    cntrPt = (plotTree.xOff + (1.0 + float(numLeafs)) / 2.0 / plotTree.totalW, plotTree.yOff)    # center position
    plotMidText(cntrPt, parentPt, nodeTxt)         # label the value on the directed edge
    plotNode(firstStr, cntrPt, parentPt, decisionNode)     # draw the node
    secondDict = myTree[firstStr]             # next dictionary, i.e., continue drawing the children
    plotTree.yOff = plotTree.yOff - 1.0 / plotTree.totalD               # y offset

    for key in secondDict.keys():
        if type(secondDict[key]).__name__ == 'dict':       # test whether this node is a dictionary; if not, it is a leaf node
            plotTree(secondDict[key], cntrPt, str(key))            # not a leaf: recurse and keep drawing
        else:                                              # a leaf node: draw it and label the directed edge value
            plotTree.xOff = plotTree.xOff + 1.0 / plotTree.totalW
            plotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode)
            plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key))
    plotTree.yOff = plotTree.yOff + 1.0 / plotTree.totalD

"""
Function description: create the drawing panel
Parameters:
    inTree - decision tree (dictionary)
Returns:
    none
"""
def createPlot(inTree):
    fig = plt.figure(1, facecolor='white')   # create the figure
    fig.clf()        # clear the figure
    axprops = dict(xticks=[], yticks=[])
    createPlot.ax1 = plt.subplot(111, frameon=False, **axprops)    # remove the x and y axes

    # Get the number of leaf nodes of the decision tree
    plotTree.totalW = float(getNumLeafs(inTree))

    # Get the number of layers of the decision tree
    plotTree.totalD = float(getTreeDepth(inTree))

    plotTree.xOff = -0.5 / plotTree.totalW; plotTree.yOff = 1.0       # x and y offsets
    plotTree(inTree, (0.5, 1.0), ' ')                                 # draw the decision tree
    plt.show()

# test
if __name__ == '__main__':
    ## Step 1: load data
    print("Step 1: load data...")

    df = pd.read_csv('data.csv')
    data = df.values[:-1, 1:].tolist()

    dataSet = data[:]
    label = df.columns.values[1:-1].tolist()
    labels = label[:]
    
    #print(dataSet)
    #print(labels)
    ## Step 2: training...
    print("Step 2: training...")

    featLabels = []
    myTree = createTree(dataSet, labels, featLabels)
    #print(myTree)

    ## Step 3: show pic
    print("Step 3: show the picture...")
    createPlot(myTree)
    
    ## Step 4: testing
    print("Step 4: testing...")
    # Test data
    testVec = ['middle_aged', 'yes', 'excellent', 'low']
    
    print("Test Example:"+ str(testVec))
    result = classify(myTree, featLabels, testVec)
    
    ## Step 5: show the result
    print("Step 5: show the result...")
    print("result:"+ str(result))
    if result == 'yes':
        print("To buy")
    else:
        print("Don't buy")

The results are shown below.



You can see that the decision tree has been drawn, and the test shows that the middle-aged person buys a computer.

The complete code is at DT_buys_computer_Classifty\DT_buys_computer_Classifty_v1.2\DT_buys_computer_Classifty_v1.2.py.

Besides Matplotlib, is there another way to draw the decision tree? The answer is yes: Graphviz provides a convenient way to visualize it, so here is how to use Graphviz.

4.2.1.4.2 Visualization of the decision tree using Graphviz

First, you need to install Graphviz. The author used the Anaconda integration environment. Just type in the terminal window:

conda install python-graphviz
You need to add the following code to your script:

    # Visualize the model
    with open("Infor_Gain.dot", 'w') as f:
        f = tree.export_graphviz(clf, feature_names=vec.get_feature_names(), out_file=f)

The complete code is at DT_buys_computer_Classifty-sklearn_v2.0\DT_buys_computer_Classifty-sklearn_v2.0.1\DT_buys_computer_Classifty-sklearn_v2.0.1.py.

After the script runs successfully, Infor_Gain.dot is generated in the working directory. The .dot file then needs to be converted into a PDF to view the decision tree:

dot -Tpdf Infor_Gain.dot -o output.pdf

The results are as follows.

Figure 2

This is still somewhat cumbersome, since the conversion is a separate step. Is there a simpler way? The answer is yes. Here is the code.

# -*- coding: utf-8 -*-
import pandas as pd
import graphviz
from sklearn import tree

def createDataSet() :
    # data set
    dataSet = [[0, 0, 0, 2],
               [0, 0, 1, 2],
               [1, 0, 0, 2],
               [2, 0, 0, 1],
               [2, 1, 0, 0],
               [2, 1, 1, 0],
               [1, 1, 1, 0],
               [0, 0, 0, 1],
               [0, 1, 0, 0],
               [2, 1, 0, 1],
               [0, 1, 1, 1],
               [1, 0, 1, 1],
               [1, 1, 0, 2],
               [2, 0, 1, 1]]
    # Class labels (0 = no, 1 = yes)
    labels = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

    # return data set and category attributes
    return dataSet, labels 

# test
if __name__ == '__main__':
    ## Step 1: load data
    print("Step 1: load data...")
    # Method 1: read from the CSV file
    df = pd.read_csv('data.csv')
    data = df.values[:-1, 1:5]
    dataSet = data[:]

    labels = df.values[:-1, 5:6]

    # Method 2: use the hard-coded data set
    #dataSet,labels = createDataSet()
    
    #print(dataSet)
    #print(labels)
    
    ## Step 2: init DT...
    print("Step 2: init DT...")

    # criterion can be 'gini' or 'entropy'; the default gini corresponds to CART, entropy to ID3
    clf = tree.DecisionTreeClassifier(criterion='entropy')
    
    
    ## Step 3: training...
    print("Step 3: training...")
    clf = clf.fit(dataSet, labels)
    
    ## Step 4: picture...
    print("Step 4: picture...")
    """ dot_data = tree.export_graphviz(clf, out_file=None) """
    # Advanced configuration
    dot_data = tree.export_graphviz(clf, out_file=None, 
                            filled=True, rounded=True,  
                            special_characters=True)  
    graph = graphviz.Source(dot_data)  
    graph.render("tree")
    
    ## Step 5: testing
    print("Step 5: testing...")
    test = [1, 0, 0, 2]
    predictedLabel = clf.predict([test])
    
    # Step 6: show the result
    print("Step 6: show the result...")
    print("predictedLabel" + str(predictedLabel))

The results are as follows.



The complete code is at DT_buys_computer_Classifty-sklearn_v2\DT_buys_computer_Classifty-sklearn_v2.1\DT_buys_computer_Classifty-sklearn_v2.1.py.

Graphviz website: www.graphviz.org/. References: blog.csdn.net/WuchangI/ar…, pypi.org/project/pyd…

4.2.2 Simple decision tree example: Iris flower classification

As in the previous chapters, we use the same Iris data set and see how well the decision tree classifies it.

# -*- coding: utf-8 -*-
from sklearn import datasets
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def test_DT() :
    ## Step 1: load data
    print("Step 1: load data...")
    # import data
    iris = datasets.load_iris()

    ## Step 2: split data
    print("Step 2: split data...")
    # Split data
    # X = features
    X = iris.data
    # Y = label
    Y = iris.target
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.6)

    ## Step 3: init DT
    print("Step 3: init DT...")
    # Initialize the decision tree classifier
    clf = tree.DecisionTreeClassifier(criterion='entropy')

    ## Step 4: training...
    print("Step 4: training...")
    # Training data
    clf.fit(X_train, Y_train)

    ## Step 5: testing
    print("Step 5: testing...")
    # Forecast data
    predictedLabel =  clf.predict(X_test)
    #predictedLabel = clf.fit(X_train, Y_train).predict(X_test)

    ## Step 6: show the result
    print("Step 6: show the result...")
    # Request accuracy
    # http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
    print(accuracy_score(Y_test, predictedLabel))
    #print("predictedLabel is :")
    #print(predictedLabel)

if __name__ == '__main__':
    test_DT()

The results are as follows.



DT_Iris_Classify\DT_Iris_Classify-sklearn_v1.0

The accuracy can be compared with that of the previous algorithms; compare them yourself if you are interested. The running time can be compared as well. You can also draw the decision tree.

# -*- coding: utf-8 -*-
from sklearn import datasets
from sklearn import tree
import graphviz

def test_DT() :
    ## Step 1: load data
    print("Step 1: load data...")
    # import data
    iris = datasets.load_iris()

    ## Step 2: split data
    print("Step 2: split data...")
    # Split data
    # X = features
    X = iris.data
    # Y = label
    Y = iris.target

    ## Step 3: init DT
    print("Step 3: init DT...")
    # Initialize the decision tree classifier
    clf = tree.DecisionTreeClassifier(criterion='entropy')

    ## Step 4: training...
    print("Step 4: training...")
    # Training data
    clf.fit(X, Y)
    
    ## Step 5: picture..
    print("Step 5: picture...")
    """ dot_data = tree.export_graphviz(clf, out_file=None) """
    # Advanced configuration
    dot_data = tree.export_graphviz(clf, out_file=None, 
                            feature_names=iris.feature_names,  
                            class_names=iris.target_names,  
                            filled=True, rounded=True,  
                            special_characters=True)  
    graph = graphviz.Source(dot_data)  
    graph.render("tree")

if __name__ == '__main__':
    test_DT()

The results are as follows.

DT_Iris_Classify\DT_Iris_Classify-sklearn_v2.0

4.3 Decision tree data storage

Constructing a decision tree can be time-consuming: even a small data set such as the sample data above takes a few seconds to process, and a large data set can consume a great deal of computation. Classifying with an already constructed decision tree, on the other hand, is very fast. Therefore, to save computing time, it is best to reuse the already constructed decision tree for every classification. To do this, we use the Python pickle module to serialize the object: a serialized object can be saved to disk and read back when needed.
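A minimal sketch of this idea (storeTree and grabTree are illustrative helper names, not functions from the listing below):

# Store and reload a decision tree with pickle (illustrative helper names)
import pickle

def storeTree(inputTree, filename):
    # Serialize the tree dictionary to disk
    with open(filename, 'wb') as fw:
        pickle.dump(inputTree, fw)

def grabTree(filename):
    # Read the serialized tree back from disk
    with open(filename, 'rb') as fr:
        return pickle.load(fr)

# Usage: storeTree(myTree, 'classifierStorage.txt'); later, myTree = grabTree('classifierStorage.txt')

The full listing that builds the tree to be stored follows.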

# -*- coding: utf-8 -*-
import pandas as pd
from math import log
import operator
import pickle

"""
Function description: create the data set
Parameters:
    none
Returns:
    dataSet - data set
    labels - feature labels
"""
def createDataSet() :
    # data set
    dataSet = [[0, 0, 0, 2, 'no'],
               [0, 0, 1, 2, 'no'],
               [1, 0, 0, 2, 'yes'],
               [2, 0, 0, 1, 'yes'],
               [2, 1, 0, 0, 'yes'],
               [2, 1, 1, 0, 'no'],
               [1, 1, 1, 0, 'yes'],
               [0, 0, 0, 1, 'no'],
               [0, 1, 0, 0, 'yes'],
               [2, 1, 0, 1, 'yes'],
               [0, 1, 1, 1, 'yes'],
               [1, 0, 1, 1, 'yes'],
               [1, 1, 0, 2, 'yes'],
               [2, 0, 1, 1, 'no']]
    # Feature labels
    labels = ['age', 'student', 'credit_rating', 'income']

    # return data set and category attributes
    return dataSet, labels 	

"""
Function description: calculate the empirical entropy (Shannon entropy) of the data set
Parameters:
    dataSet - data set
Returns:
    shannonEnt - empirical entropy (Shannon entropy)
"""
def calcShannonEnt(dataSet):
    # Number of rows in the data set
    numEntires = len(dataSet)

    # Dictionary of occurrence counts for each label
    labelCounts = {}

    # Count each group of feature vectors
    for featVec in dataSet:
        # Extract the label information
        currentLabel = featVec[-1]

        # If the label is not in the count dictionary, add it
        if currentLabel not in labelCounts.keys():
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1   # label counting

    shannonEnt = 0.0   # empirical entropy (Shannon entropy)

    # Calculate the Shannon entropy
    for key in labelCounts:
        # Probability of the label
        prob = float(labelCounts[key]) / numEntires

        # Apply the formula
        shannonEnt -= prob * log(prob, 2)

    # Return the empirical entropy (Shannon entropy)
    return shannonEnt

"""
Function description: split the data set on a given feature value
Parameters:
    dataSet - data set to be split
    axis - index of the feature used for splitting
    value - feature value that the returned subset must match
Returns:
    retDataSet - the split subset
"""
def splitDataSet(dataSet, axis, value):
    # Create a list for the returned data set
    retDataSet = []
    # Walk through the data set
    for featVec in dataSet:
        if featVec[axis] == value:
            # Remove the axis feature
            reducedFeatVec = featVec[:axis]
            # Add the matching samples to the returned data set
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)

    # Return the split data set
    return retDataSet

"""
Parameters:
    dataSet    - the data set
    i          - the index of the feature
    uniqueVals - the set of values taken by feature i
Returns:
    newEntropy - the empirical conditional entropy
"""
def calcConditionalEntropy(dataSet, i, uniqueVals):
    # Empirical conditional entropy
    newEntropy = 0.0
    for value in uniqueVals:
        subDataSet = splitDataSet(dataSet, i, value)
        prob = len(subDataSet) / float(len(dataSet))  # maximum likelihood estimate of the probability
        newEntropy += prob * calcShannonEnt(subDataSet)  # accumulate the conditional entropy
    return newEntropy

"""
Parameters:
    dataSet - the data set
Returns:
    bestIndex    - the index of the best feature
    bestInfoGain - the best information gain
"""
def calcInformationGain(dataSet):

    # Index of the optimal feature
    bestIndex = -1
    # Best information gain so far
    bestInfoGain = 0.0

    baseEntropy = calcShannonEnt(dataSet)

    # Number of features
    numFeatures = len(dataSet[0]) - 1
    # Walk through all features
    for i in range(numFeatures):
        # Collect the i-th feature of every sample in the dataSet
        featList = [example[i] for example in dataSet]
        # Create a set, so elements cannot be repeated
        uniqueVals = set(featList)

        # Empirical conditional entropy
        newEntropy = calcConditionalEntropy(dataSet, i, uniqueVals)
        # Information gain is the reduction in entropy, i.e. the reduction in uncertainty
        infoGain = baseEntropy - newEntropy

        # Keep the best gain
        if (infoGain > bestInfoGain):
            # Update the information gain to the maximum found so far
            bestInfoGain = infoGain

            # Record the index of the feature with the maximum information gain
            bestIndex = i

    return bestIndex, bestInfoGain

"""
Parameters:
    classList - the list of class labels
Returns:
    sortedClassCount[0][0] - the class label that occurs most often
"""
def majorityCnt(classList):
    classCount = {}

    for vote in classList:
        # Count the occurrences of each element in classList
        if vote not in classCount.keys():
            classCount[vote] = 0
        classCount[vote] += 1
    # Sort by dictionary value in descending order
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)

    # Return the most common element in classList
    return sortedClassCount[0][0]

"""
Parameters:
    dataSet    - the training data set
    labels     - the feature labels
    featLabels - stores the optimal feature labels that have been selected
Returns:
    myTree - the decision tree
"""
def createTree(dataSet, labels, featLabels):

    # Take the class labels (buys a computer: yes or no)
    classList = [example[-1] for example in dataSet]

    # Stop splitting if all classes are identical
    if classList.count(classList[0]) == len(classList):
        return classList[0]

    # When all features have been used, return the class label that occurs most often
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)

    bestFeat, bestInfoGain = calcInformationGain(dataSet)   # select the optimal feature
    bestFeatLabel = labels[bestFeat]                        # label of the optimal feature

    featLabels.append(bestFeatLabel)
    myTree = {bestFeatLabel: {}}   # grow the tree from the optimal feature label
    del(labels[bestFeat])          # delete the feature label that has been used

    # Get the values of the optimal feature over the training set
    featValues = [example[bestFeat] for example in dataSet]

    uniqueVals = set(featValues)   # remove duplicate values

    for value in uniqueVals:       # recurse over the values to create the subtrees
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), labels, featLabels)

    return myTree

"""
Parameters: inputTree - the generated decision tree; filename - the file in which to store the tree
Returns: none
"""
def storeTree(inputTree, filename):
    with open(filename, 'wb') as fw:
        pickle.dump(inputTree, fw)
        
        
# test
if __name__ == '__main__':
    ## Step 1: load data
    print("Step 1: load data...")

    # Option 1: use the built-in data set
    #dataSet, labels = createDataSet()

    # Option 2: read the data from a CSV file
    # (assumes data.csv mirrors the sample table: the first column is RID
    #  and the last column is the class label)
    df = pd.read_csv('data.csv')
    data = df.values[:, 1:].tolist()

    dataSet = data[:]
    label = df.columns.values[1:-1].tolist()
    labels = label[:]

    #print(dataSet)
    #print(labels)
    ## Step 2: training...
    print("Step 2: training...")

    featLabels = []
    myTree = createTree(dataSet, labels, featLabels)

    #print(myTree)

    ## Step 3: store the tree
    print("Step 3: Storage tree...")
    storeTree(myTree, 'classifierStorage.txt')

Running this code in the same directory as the Python file produces a file named classifierStorage.txt, which stores our decision tree in binary form. We can open it and look at the stored contents.



[Note] Open classifierStorage.txt with an editor that can display binary content; I use Sublime Text.

Because this is a binary file, we do not need to understand its contents; it only needs to be stored and used. So how do we use it? Load it with pickle.load, as in the following code.

# -*- coding: utf-8 -*-
import pickle

"""
Parameters: filename - the file storing the decision tree
Returns: pickle.load(fr) - the decision tree dictionary
"""
def grabTree(filename):
    with open(filename, 'rb') as fr:
        return pickle.load(fr)
# test
if __name__ == '__main__':
    myTree = grabTree('classifierStorage.txt')
    print(myTree)  

The results are as follows.

4.4 Shortcomings of the ID3 decision tree algorithm

Although the ID3 algorithm introduced new ideas, there are still many places where it can be improved. A) ID3 does not handle continuous features: attributes such as length or density take continuous values and cannot be fed to ID3 directly, which greatly limits its use. C4.5 and CART address this by allowing binary splits of a continuous attribute at a threshold.
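As a rough illustration of that threshold idea (not part of the original code; the entropy helper, the density values and the class labels below are made up for this example), a continuous attribute can be handled by sorting its values and trying the midpoints between consecutive values as candidate split points, keeping the one with the largest information gain:

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try the midpoints between consecutive sorted values as binary
    split points and return the one with the largest information gain."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_gain, best_t = 0.0, None
    for k in range(1, len(pairs)):
        t = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        cond = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - cond > best_gain:
            best_gain, best_t = base - cond, t
    return best_t, best_gain

# Hypothetical continuous 'density' attribute with yes/no class labels
density = [0.24, 0.36, 0.40, 0.59, 0.61, 0.70, 0.77]
classes = ['no', 'no', 'no', 'yes', 'yes', 'yes', 'yes']
print(best_threshold(density, classes))   # about (0.495, 0.985): a perfect split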

B) ID3 preferentially uses the feature with the largest information gain to build the nodes of the decision tree. It was soon noticed that, under otherwise equal conditions, features with more distinct values receive a larger information gain than features with fewer values. For example, one feature takes 2 values with probability 1/2 each and another takes 3 values with probability 1/3 each; both leave the outcome completely uncertain, yet the 3-valued feature gets the larger information gain. How can this bias be corrected?
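To make the bias concrete, consider the extreme case of an identifier-like attribute such as the RID column of the sample table: every record has its own value, so every subset after the split is pure, the conditional entropy is 0, and the information gain reaches its maximum H(D), even though the attribute is useless for prediction. A small standalone check (the entropy helper below is written ad hoc for this illustration):

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

# 9 'yes' and 5 'no', as in the sample table: H(D) is about 0.940 bits
classes = ['yes'] * 9 + ['no'] * 5
base = entropy(classes)

# An identifier-like attribute: each record forms its own one-element
# subset, so every subset is pure and the conditional entropy is 0
cond = sum(1 / len(classes) * entropy([c]) for c in classes)

print('H(D) =', round(base, 3))                             # 0.94
print('gain on the ID attribute =', round(base - cond, 3))  # 0.94, the maximum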

C) The ID3 algorithm does not consider missing values;

D) Overfitting is not considered.

Based on the shortcomings above, Quinlan, the author of ID3, improved the algorithm into what became C4.5. You may ask why it is not called ID4 or ID5. Decision trees were so popular that soon after ID3 appeared, other people's variants had already taken the names ID4 and ID5, so Quinlan switched to a new naming scheme, calling his algorithm C4.0; its later evolution became C4.5. We will discuss the C4.5 algorithm later.

Reference documents: English documentation: scikit-learn.org/stable/modu… Chinese documentation: sklearn.apachecn.org/cn/0.19.0/m…


References:
[1] Zhou Zhihua. Machine Learning.
[2] Li Hang. Statistical Learning Methods.
[3] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.
[4] J. R. Quinlan. Induction of Decision Trees. Machine Learning, 1:81-106, 1986.
[5] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[6] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2009.

A note on a common error:

ValueError: Expected 2D array, got 1D array instead: array=[1. 0. 0. 0. 1. 1. 0. 0. 1. 0.]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

This error occurs with newer scikit-learn versions, which require the input of predict to be a two-dimensional array even when it contains a single row or column. The hint attached to the error already tells you the fix: reshape the array with array.reshape(-1, 1) if your data has a single feature, or array.reshape(1, -1) if it contains a single sample. Here are two ways to do it. Method 1:

predictedY = clf.predict([newRowX])

Method 2:

predictedY = clf.predict(newRowX.reshape(1, -1))
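For context, a small standalone check (the numpy array below is hypothetical and no classifier is involved) showing that both fixes turn the 1-D vector into the (1, n) shape that predict expects:

import numpy as np

# A hypothetical 1-D feature vector like the one in the error message
newRowX = np.array([1., 0., 0., 0., 1., 1., 0., 0., 1., 0.])

print(newRowX.shape)                 # (10,)   1-D, rejected by predict()
print(np.array([newRowX]).shape)     # (1, 10) method 1: wrap it in a list
print(newRowX.reshape(1, -1).shape)  # (1, 10) method 2: reshape explicitly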

Reference: blog.csdn.net/dongyanwen6…

The complete code for this chapter is provided in the attachment.