Evaluating classification models in machine learning

For regression problems, model quality is usually evaluated with metrics such as MSE, MAE, RMSE and R^2. For classification problems, the simplest approach is to use accuracy. For example, the default score for classifiers in scikit-learn is accuracy.

Accuracy is easy to understand, but it can be misleading on extremely skewed data. For example, in cancer prediction the ratio of healthy to sick people might be 10,000 to one. For such data, the simplest possible model just predicts that every sample belongs to the healthy category, and its accuracy still reaches 99.99%, even though it never finds a single patient.
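A minimal sketch of this effect, assuming a hypothetical population of 10,000 healthy people and one patient (the variable names are illustrative only):

```python
# Hypothetical labels: 0 = healthy, 1 = sick (10,000 healthy, 1 sick).
y_true = [0] * 10000 + [1]

# A "model" that ignores its input and always predicts healthy.
y_pred = [0] * len(y_true)

# Accuracy counts every correct prediction, whatever the class.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Number of sick people the model actually identified.
found_patients = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(f"accuracy = {accuracy:.4%}")
print(f"patients found = {found_patients}")
```

The accuracy is close to 100% while the number of patients found is zero, which is why accuracy alone is not enough here.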

For this type of data, a classification model is better evaluated with the confusion matrix.

Confusion matrix

To explain the confusion matrix and terms such as precision and recall, we first look at a binary classification problem as an example.

Actual/Predicted	0	1
0	TN	FP
1	FN	TP

In the table above, the rows represent the actual values and the columns represent the predicted values.
0 means negative, 1 means positive.
TN (True Negative): the actual value is negative and the predicted value is negative, so the prediction is correct.
FP (False Positive): the actual value is negative and the predicted value is positive, so the prediction is wrong.
FN (False Negative): the actual value is positive and the predicted value is negative, so the prediction is wrong.
TP (True Positive): the actual value is positive and the predicted value is positive, so the prediction is correct.

This is a little abstract, so let’s work through a concrete example.

Actual/Predicted	0	1
0	9980	10
1	3	7

9,980 people did not have cancer, and the algorithm predicted that they did not have cancer (TN).
10 people did not have cancer, but the algorithm predicted that they did (FP).
3 people had cancer, but the algorithm predicted that they did not (FN).
7 people had cancer, and the algorithm predicted that they had cancer (TP).
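As a rough check, the same table can be reproduced with scikit-learn's `confusion_matrix`. The label arrays below are rebuilt by hand from the four counts and are an assumption for illustration, not data from a real model:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Rebuild label arrays that match the four counts in the example:
# 9980 TN, 10 FP, 3 FN, 7 TP.
y_true = np.array([0] * 9980 + [0] * 10 + [1] * 3 + [1] * 7)
y_pred = np.array([0] * 9980 + [1] * 10 + [0] * 3 + [1] * 7)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# For a binary problem, ravel() yields the cells in the order TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

print(cm)
print(tn, fp, fn, tp)
```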

Precision

Precision is the proportion of positive predictions that are actually correct. In the example, the algorithm predicted cancer 17 times in total: 7 of those predictions were right and 10 were wrong.

Precision = TP/(TP + FP) = 7 / (7 + 10) ≈ 41.2%, which means that out of every 100 positive predictions, about 41 are correct on average.
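The calculation can be sketched in a few lines. The helper name `precision_from_counts` is hypothetical, and returning 0.0 when nothing is predicted positive is a convention assumed here, not something the article prescribes:

```python
def precision_from_counts(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    predicted_positives = tp + fp
    return tp / predicted_positives if predicted_positives else 0.0


# Counts from the cancer example: 7 correct and 10 wrong positive predictions.
precision = precision_from_counts(tp=7, fp=10)
print(f"precision = {precision:.3f}")
```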

Recall

Recall is the proportion of actual positive cases that the algorithm correctly identifies. In the example, there are 10 patients and the algorithm found 7 of them.

Recall = TP/(TP + FN) = 7 / (7 + 3) = 70%. In other words, for every 100 patients, the algorithm finds 70 on average and misses 30.
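A small sketch comparing the example model with the "predict everyone as healthy" baseline. The helper name `recall_from_counts` is hypothetical, and the baseline counts are derived from the 10 patients in the example:

```python
def recall_from_counts(tp, fn):
    """Recall = TP / (TP + FN); returns 0.0 when there are no actual positives."""
    actual_positives = tp + fn
    return tp / actual_positives if actual_positives else 0.0


# The algorithm from the cancer example: 7 patients found, 3 missed.
print(f"example model: {recall_from_counts(tp=7, fn=3):.0%}")

# The all-healthy baseline: no patient found, all 10 missed.
print(f"all-healthy baseline: {recall_from_counts(tp=0, fn=10):.0%}")
```

The baseline has very high accuracy on this data but a recall of zero, which is exactly the weakness that accuracy hides.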

F1-Score

In some scenarios precision matters more. In stock prediction, for example, where we predict whether a stock will rise or fall, the business need is that the stocks predicted to rise really do rise. In disease prediction, where we predict whether a patient is ill, recall matters more: the business need is to find all the sick patients and not to miss any. It may not matter much if a healthy person is diagnosed as sick, as long as a sick patient is not diagnosed as healthy.

But what about situations where you need to balance precision and recall? This is what the F1-score is for. F1 is the harmonic mean of precision and recall:

F1 = 2 * precision * recall / (precision + recall)
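A minimal sketch of the harmonic mean of precision and recall, using the values from the cancer example (precision 7/17, recall 7/10). The helper name `f1_from` is hypothetical:

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 7 / 17
recall = 7 / 10

print(f"F1 = {f1_from(precision, recall):.3f}")
print(f"arithmetic mean = {(precision + recall) / 2:.3f}")

# An unbalanced pair is penalized: the harmonic mean stays close to the smaller value.
print(f"F1(0.9, 0.1) = {f1_from(0.9, 0.1):.2f}")
```

Unlike the arithmetic mean, the harmonic mean is only high when both precision and recall are high.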

Example

To demonstrate the concepts above, we first build an extremely skewed data set. We use the handwritten digits data set provided by scikit-learn, in which the ten digits from 0 to 9 are roughly evenly distributed. To create the skew, we label the digit 9 as class 1 and every other digit as class 0.

import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
X = digits.data
y = digits.target.copy()

y[digits.target==9] = 1
y[digits.target!=9] = 0

Use logistic regression to make predictions:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

log_reg.score(X_test, y_test)

Because the data is so skewed, a model that predicts 0 for every sample would already reach an accuracy of about 90 percent. Accuracy only shows how often the model's prediction is right across all samples; it does not reflect whether the model can reliably find the samples of class 1. The sklearn.metrics package provides direct support for the confusion matrix, precision, recall, and F1-score.

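from sklearn.metrics import confusion_matrix

# Predicted labels for the test set, used by all the metrics below
y_log_predict = log_reg.predict(X_test)

confusion_matrix(y_test, y_log_predict)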

from sklearn.metrics import precision_score

precision_score(y_test, y_log_predict)

from sklearn.metrics import recall_score

recall_score(y_test, y_log_predict)

from sklearn.metrics import f1_score

f1_score(y_test, y_log_predict)

PR curve

For binary classification problems, we can shift the decision threshold to trade precision against recall. When score > threshold, the sample is classified as 1; when score < threshold, it is classified as 0. As the threshold increases, precision tends to increase and recall decreases. As the threshold decreases, precision tends to decrease and recall increases. Precision and recall pull in opposite directions, so in general they cannot both be improved by moving the threshold.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()
X = digits.data
y = digits.target.copy()
y[digits.target==9] = 1
y[digits.target!=9] = 0

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
decision_scores = log_reg.decision_function(X_test)


from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

precisions = []
recalls = []
thresholds = np.arange(np.min(decision_scores), np.max(decision_scores), 0.1)

for threshold in thresholds:
    y_predict = np.array(decision_scores >= threshold, dtype='int')
    precisions.append(precision_score(y_test, y_predict))
    recalls.append(recall_score(y_test, y_predict))

plt.plot(precisions, recalls)
plt.show()
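scikit-learn also provides `precision_recall_curve`, which computes precision and recall at every distinct score threshold instead of a hand-picked step. The sketch below rebuilds the same skewed digits data; `random_state=0` and `max_iter=5000` are assumptions made here so the block runs on its own without convergence warnings:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X = digits.data
y = (digits.target == 9).astype(int)  # 1 for digit 9, 0 otherwise

# random_state=0 is an arbitrary seed chosen for this sketch.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_iter is raised (an assumption) to give the solver room to converge.
log_reg = LogisticRegression(max_iter=5000)
log_reg.fit(X_train, y_train)
decision_scores = log_reg.decision_function(X_test)

# precisions and recalls have one more entry than thresholds:
# the final point (precision=1, recall=0) has no threshold attached.
precisions, recalls, thresholds = precision_recall_curve(y_test, decision_scores)
print(len(precisions), len(recalls), len(thresholds))
```

The returned arrays can be passed to `plt.plot` in the same way as the manually computed lists.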

The ROC curve

The Receiver Operating Characteristic (ROC) curve describes the relationship between TPR and FPR. The related rates are defined as follows:

TPR (True Positive Rate): the number of positive samples predicted as positive / the actual number of positive samples: TPR = TP /(TP + FN)
TNR (True Negative Rate): the number of negative samples predicted as negative / the actual number of negative samples: TNR = TN /(TN + FP)
FPR (False Positive Rate): the number of negative samples predicted as positive / the actual number of negative samples: FPR = FP /(TN + FP)
FNR (False Negative Rate): the number of positive samples predicted as negative / the actual number of positive samples: FNR = FN /(TP + FN)
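As a quick illustration, the four rates can be computed from the counts of the earlier cancer example (TN = 9980, FP = 10, FN = 3, TP = 7). This is only a sketch of the formulas, not output from a trained model:

```python
# Counts from the cancer example.
tn, fp, fn, tp = 9980, 10, 3, 7

tpr = tp / (tp + fn)  # share of actual positives that were found (same as recall)
tnr = tn / (tn + fp)  # share of actual negatives correctly cleared
fpr = fp / (tn + fp)  # share of actual negatives wrongly flagged
fnr = fn / (tp + fn)  # share of actual positives that were missed

print(f"TPR = {tpr:.4f}, FNR = {fnr:.4f}")
print(f"TNR = {tnr:.4f}, FPR = {fpr:.4f}")
```

TPR and FNR share a denominator and sum to 1, as do TNR and FPR, so the ROC curve only needs one rate from each pair.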

Example

import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
X = digits.data
y = digits.target.copy()
y[digits.target==9] = 1
y[digits.target!=9] = 0

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
decision_scores = log_reg.decision_function(X_test)

from sklearn.metrics import roc_curve

fprs, tprs, thresholds = roc_curve(y_test, decision_scores)

import matplotlib.pyplot as plt
plt.plot(fprs, tprs)
plt.show()

The area enclosed by the ROC curve and the axes (the area under the curve, AUC) is a standard measure of model quality. The larger the area, the better the model.
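A minimal sketch of computing this area with scikit-learn. The four labels and scores below are hypothetical toy values chosen for illustration; with the article's data you would pass `y_test` and `decision_scores` instead:

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical toy data: two negatives and two positives with their scores.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Option 1: integrate the ROC curve with the trapezoidal rule.
fprs, tprs, thresholds = roc_curve(y_true, scores)
area_from_curve = auc(fprs, tprs)

# Option 2: let scikit-learn compute the area directly.
area_direct = roc_auc_score(y_true, scores)

print(area_from_curve, area_direct)
```

Both calls give the same value. An area of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positives above negatives.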
