Abstract: ROC/AUC is an important evaluation metric in machine learning and one of the most frequently asked interview questions.
This article is shared from the Huawei Cloud community post "80% Interview Essentials | Solving the Problem Technically: Implementing AUC/ROC with MindSpore", original author: Li Jiaqi.
ROC/AUC is an important evaluation metric in machine learning and comes up in interviews all the time. It is not hard to understand, but many people run into the same problem: everything seems clear while reading, yet the concepts blur together soon afterwards and are easy to mix up. Some memorize the definitions right before an interview, then go blank under pressure and give a poor answer.
I have run into the same thing. In my experience, written tests like to ask multiple-choice questions about this rate or that rate, or give you a scenario and ask which metric to use. In interviews I was asked repeatedly: What are AUC and ROC? What do the horizontal and vertical axes represent? What are their advantages? Why use them?
I remember that in my first answer I mixed up accuracy, precision, recall, and so on, and made a mess of it. Afterwards I went through all the related concepts from start to finish, and the later interviews went much better. Here I want to share my understanding, in the hope that you remember ROC/AUC once and for all.
ROC stands for Receiver Operating Characteristic, and its main analysis tool is a curve drawn on a two-dimensional plane: the ROC curve. The horizontal axis of this plane is the false positive rate (FPR) and the vertical axis is the true positive rate (TPR). For a classifier, its performance on the test set yields one (FPR, TPR) pair, so the classifier maps to a single point on the ROC plane. By adjusting the classification threshold of the classifier, we obtain a curve that passes through (0, 0) and (1, 1); this is the classifier's ROC curve. In general the curve should lie above the straight line connecting (0, 0) and (1, 1), because that diagonal corresponds to a random classifier. If you are unlucky enough to end up with a classifier below this line, an intuitive remedy is to invert all predictions: whenever the classifier outputs positive, classify as negative, and vice versa. In any case, the ROC curve is an intuitive and useful way to represent a classifier's performance.
However, we would still like a single number that says how good a classifier is. This is where the Area Under the ROC Curve (AUC) comes in. As the name implies, the AUC is the area under the ROC curve. AUC values typically range from 0.5 to 1.0, and a larger AUC indicates better performance; it is a standard measure of the quality of a classification model.
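To make the threshold sweep concrete, here is a minimal NumPy sketch (the scores and labels are made up for illustration; this is not the MindSpore code discussed later). Each threshold turns the scores into hard predictions and yields one (FPR, TPR) point; sweeping the threshold traces the curve from (1, 1) down to (0, 0).

```python
import numpy as np

# Hypothetical predicted scores and ground-truth labels (1 = positive, 0 = negative).
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
labels = np.array([1,   1,   0,   1,   0,    1,   0,   0])

# Sweep the decision threshold; each threshold yields one (FPR, TPR) point.
for t in [0.1, 0.35, 0.5, 0.65, 0.85, 1.0]:
    pred = (scores >= t).astype(int)
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    fn = np.sum((pred == 0) & (labels == 1))
    tn = np.sum((pred == 0) & (labels == 0))
    tpr = tp / (tp + fn)   # true positive rate
    fpr = fp / (fp + tn)   # false positive rate
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```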
Sample ROC curve (binary classification problem):
Some basic definitions used in ROC analysis:
- True Positive (TP): a positive sample predicted by the model to be positive;
- False Negative (FN): a positive sample predicted by the model to be negative;
- False Positive (FP): a negative sample predicted by the model to be positive;
- True Negative (TN): a negative sample predicted by the model to be negative.
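As a quick illustration of these four counts, here is a minimal sketch with made-up labels (not from the article):

```python
import numpy as np

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])   # actual labels
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])   # model predictions

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # positive predicted as positive
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # positive predicted as negative
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # negative predicted as positive
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # negative predicted as negative
print(tp, fn, fp, tn)   # 3 1 1 3
```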
Sensitivity, specificity, true positive rate, false positive rate
Before formally introducing ROC/AUC, we need two indicators, and the choice of these two indicators is the reason ROC and AUC can ignore sample imbalance. They are sensitivity and (1 - specificity), also known as the true positive rate (TPR) and the false positive rate (FPR).
Sensitivity = TP/(TP+FN)
Specificity = TN/(FP+TN)
In fact, sensitivity and recall are exactly the same thing under a different name.
Since we care more about the positive samples, we want to know how many negative samples are wrongly predicted as positive, so (1 - specificity) is used instead of specificity.
True positive rate (TPR) = sensitivity = TP/(TP+FN)
False positive rate (FPR) = 1 - specificity = FP/(FP+TN)
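A tiny worked example of these two formulas, using hypothetical counts:

```python
# Hypothetical confusion-matrix counts for illustration.
tp, fn, fp, tn = 3, 1, 1, 3

sensitivity = tp / (tp + fn)   # = recall = TPR
specificity = tn / (fp + tn)
tpr = sensitivity              # true positive rate
fpr = 1 - specificity          # = fp / (fp + tn), false positive rate
print(tpr, fpr)                # 0.75 0.25
```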
Below are the diagrams of the true positive rate and the false positive rate. Notice that TPR and FPR are defined with respect to the actual labels 1 and 0 respectively; that is, each looks at a conditional probability within the actual positive samples or within the actual negative samples.
Because of this, it does not matter whether the classes are balanced. Suppose 90% of the samples are positive and 10% are negative. We know accuracy is unreliable here, but TPR and FPR are not affected: TPR only measures how many of the 90% positive samples are truly covered and has nothing to do with the 10%, and likewise FPR only measures how many of the 10% negative samples are wrongly covered and has nothing to do with the 90%. So it can be seen that:
If we start from the perspective of the actual outcomes, we avoid the sample imbalance problem, which is why TPR and FPR are used as the ROC/AUC indicators.
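Here is a small sketch (with made-up labels) of this point: replicating the negative samples ten times changes the accuracy, but leaves TPR and FPR exactly where they were.

```python
import numpy as np

def rates(y_true, y_pred):
    """Return (accuracy, TPR, FPR) for binary labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    acc = (tp + tn) / len(y_true)
    return acc, tp / (tp + fn), fp / (fp + tn)

y_true = np.array([1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0])
print(rates(y_true, y_pred))

# Replicate the negative samples 10x: accuracy drops from ~0.67 to ~0.54,
# while TPR stays at 0.75 and FPR stays at 0.5.
y_true_imb = np.concatenate([y_true, np.tile(y_true[y_true == 0], 10)])
y_pred_imb = np.concatenate([y_pred, np.tile(y_pred[y_true == 0], 10)])
print(rates(y_true_imb, y_pred_imb))
```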
Or we can look at it from another angle: conditional probability. Suppose X is the predicted label and Y is the true label. Then these indicators can be written as conditional probabilities:
- Precision = P(Y = 1 | X = 1)
- Recall = sensitivity = P(X = 1 | Y = 1)
- Specificity = P(X = 0 | Y = 0)
From these three formulas we can see that when the condition is the actual label (recall, specificity), we only need to consider samples of one class, whereas when the condition is the predicted label (precision), we have to consider positive and negative samples together. Therefore indicators conditioned on the actual labels are unaffected by sample imbalance, while those conditioned on the predictions are.
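These conditional probabilities can also be estimated directly from data as ratios of counts; a minimal sketch with hypothetical labels:

```python
import numpy as np

y = np.array([1, 1, 0, 1, 0, 0, 1, 0])   # true value Y
x = np.array([1, 0, 0, 1, 1, 0, 1, 0])   # predicted value X

precision   = np.mean(y[x == 1] == 1)    # P(Y=1 | X=1)
recall      = np.mean(x[y == 1] == 1)    # P(X=1 | Y=1), i.e. sensitivity / TPR
specificity = np.mean(x[y == 0] == 0)    # P(X=0 | Y=0)
print(precision, recall, specificity)    # 0.75 0.75 0.75
```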
ROC (Receiver Operating Characteristic curve)
**The Receiver Operating Characteristic (ROC) curve.** This curve was first used in the field of radar signal detection to distinguish signal from noise, and was later adopted to evaluate the predictive power of models. The ROC curve is based on the confusion matrix.
The two main indicators in the ROC curve are the true positive rate and the false positive rate; the benefit of this choice was explained above. The abscissa is the false positive rate (FPR) and the ordinate is the true positive rate (TPR). Below is a standard ROC curve.
- The threshold problem of the ROC curve
Similar to the P-R curve discussed earlier, the ROC curve is drawn by traversing all thresholds. As we sweep the threshold, the samples predicted positive and negative keep changing, and the operating point slides along the ROC curve accordingly.
- How do we judge whether an ROC curve is good?
Changing the threshold only changes how many samples are predicted positive or negative, i.e. the TPR and FPR; the curve itself does not change. So how do we tell whether a model's ROC curve is good? This comes back to our purpose: FPR measures how much the model falsely alarms, while TPR measures how much of the positive class the model covers. Naturally we want as few false alarms and as much coverage as possible. In short, the higher the TPR and the lower the FPR (i.e. the steeper the ROC curve), the better the model. The animated figure below illustrates this.
- The ROC curve ignores sample imbalance
We have already explained why the ROC curve can ignore sample imbalance. The animated figure shows this again: no matter how the proportion of red and blue samples changes, the ROC curve is unaffected.
AUC (Area Under the Curve)
To calculate points on the ROC curve, we could evaluate the logistic regression model many times with different classification thresholds, but this is very inefficient. Fortunately, there is an efficient, sorting-based algorithm that can provide this information for us, and its result is the AUC (Area Under the ROC Curve).
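As an illustration of the sorting idea, here is a sketch with made-up scores (not the MindSpore implementation shown later, though its `_binary_clf_curve` helper works along the same lines): sort the scores once, take cumulative true/false positive counts, and every cut position becomes one ROC point.

```python
import numpy as np

def roc_by_sorting(scores, labels):
    """Build all ROC points with a single sort instead of re-scoring at every threshold."""
    order = np.argsort(scores)[::-1]              # indices of scores in descending order
    labels = labels[order]
    tps = np.hstack([0, np.cumsum(labels == 1)])  # cumulative true positives after each cut
    fps = np.hstack([0, np.cumsum(labels == 0)])  # cumulative false positives after each cut
    return fps / fps[-1], tps / tps[-1]           # (FPR, TPR); ties in scores ignored for simplicity

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
labels = np.array([1,   1,   0,   1,   0,    1,   0,   0])
fpr, tpr = roc_by_sorting(scores, labels)
print(np.trapz(tpr, fpr))                         # AUC by the trapezoidal rule: 0.8125
```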
Interestingly, if we draw the diagonal, the area under it is exactly 0.5. The practical meaning of the diagonal is random guessing: judging response versus non-response at random covers 50% of both the positive and negative samples, i.e. a purely random effect. The steeper the ROC curve the better, so the ideal value is 1 (the whole unit square), while random guessing gives 0.5; hence AUC values generally lie between 0.5 and 1.
- General criteria for AUC
- 0.5-0.7: low accuracy (though already quite useful for, say, predicting stocks)
- 0.7-0.85: fair
- 0.85-0.95: very good
- 0.95-1: excellent, but rarely achieved in practice
- The physical significance of AUC
The area under the curve aggregates the effect of all possible classification thresholds. One way to read the AUC is as the probability that the model ranks a randomly chosen positive sample above a randomly chosen negative sample. Take the following samples, with the logistic regression predictions arranged in ascending order from left to right:
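Independently of that figure, this probabilistic reading can be checked numerically: compare every (positive, negative) pair of scores and count how often the positive one is ranked higher, with ties counting as one half. A sketch with the same made-up scores as above; the result matches the trapezoidal area from the sorting sketch.

```python
import numpy as np

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
labels = np.array([1,   1,   0,   1,   0,    1,   0,   0])

pos = scores[labels == 1]
neg = scores[labels == 0]

# P(random positive is scored above random negative), ties counted as 1/2.
pairs = pos[:, None] - neg[None, :]
auc_prob = (np.sum(pairs > 0) + 0.5 * np.sum(pairs == 0)) / pairs.size
print(auc_prob)   # 0.8125, matches the trapezoidal area from the previous sketch
```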
Ok, that covers the principles; now let's look at the code in the MindSpore framework.
MindSpore Code Implementation (ROC)
"""ROC""" import numpy as np from mindspore._checkparam import Validator as validator from .metric import Metric class ROC(Metric): def __init__(self, class_num=None, pos_label=None): Super ().__init__() # The class number is an integer self.class_num = class_num if class_num is None else Validator.check_value_type ("class_num", class_num, [int]) # Determines the integer of the positive class, which is converted to 1 for binary problems. For multi-class problems, this parameter should not be set because it iterates in the range [0, num_classes-1]. self.pos_label = pos_label if pos_label is None else validator.check_value_type("pos_label", pos_label, [int]) self.clear() def clear(self): Self.y_pred = 0 self.y = 0 self.sample_weights = None self._is_update = False def _precision_recall_curve_update(self, y_pred, y, class_num, pos_label): Shape == len(y_pred.shape) and len(y_pred.shape) == len(y.shape) + 1): raise ValueError("y_pred and y must have the same number of dimensions, Or one additional dimension for" "y_pred.") # If len(y_pred.shape) == len(y.shape): if class_num is not None and class_num ! = 1: raise ValueError('y_pred and y should have the same shape, but number of classes is different from 1.') class_num = 1 if pos_label is None: Pos_label = 1 y_pred = y_pred.flatten() y = y.flatten() # elif (y_pred.shape) == len(y.shape) + 1: if pos_label is not None: raise ValueError('Argument `pos_label` should be `None` when running multiclass precision recall ' 'curve, but got {}.'.format(pos_label)) if class_num ! = y_pred.shape[1]: raise ValueError('Argument `class_num` was set to {}, but detected {} number of classes from ' 'predictions.'.format(class_num, y_pred.shape[1])) y_pred = y_pred.transpose(0, 1).reshape(class_num, -1).transpose(0, 1) y = y.flatten() return y_pred, Y, class_num, pos_label def update(self, *inputs): "" Check the inputs in the inputs: = 2: raise ValueError('ROC need 2 inputs (y_pred, y), Format (inputs[0])) # numpy y_pred = self._convert_data(inputs[0]) y = self._convert_data(inputs[1]) # Y_pred, y, class_num, pos_label = self._precision_recall_curve_update(y_pred, y, self.class_num, self.pos_label) self.y_pred = y_pred self.y = y self.class_num = class_num self.pos_label = pos_label self._is_update = True def _roc_(self, y_pred, y, class_num, pos_label, sample_weights=None): if class_num == 1: fps, tps, thresholds = self._binary_clf_curve(y_pred, y, sample_weights=sample_weights, pos_label=pos_label) tps = np.squeeze(np.hstack([np.zeros(1, dtype=tps.dtype), tps])) fps = np.squeeze(np.hstack([np.zeros(1, dtype=fps.dtype), fps])) thresholds = np.hstack([thresholds[0][None] + 1, thresholds]) if fps[-1] <= 0: raise ValueError("No negative samples in y, false positive value should be meaningless.") fpr = fps / fps[-1] if tps[-1] <= 0: raise ValueError("No positive samples in y, Chinglish: true positive value should be chinglish.") TPR = TPS/TPS [-1] return FPR, TPR, thresholds # thresholds = [], [], [] for c in range(class_num): preds_c = y_pred[:, c] res = self.roc(preds_c, y, class_num=1, pos_label=c, sample_weights=sample_weights) fpr.append(res[0]) tpr.append(res[1]) thresholds.append(res[2]) return fpr, tpr, thresholds def roc(self, y_pred, y, class_num=None, pos_label=None, sample_weights=None): """roc""" y_pred, y, class_num, pos_label = self._precision_recall_curve_update(y_pred, y, class_num, Pos_label) return self._roc_(y_pred, y, class_num, pos_label, sample_weights) def (self): """ Calculates the ROC curve. 
Return is a tuple composed of 'FPR', 'TPR' and 'thresholds'. """ if self._is_update is False: raise RuntimeError('Call the update method before calling .') y_pred = np.squeeze(np.vstack(self.y_pred)) y = np.squeeze(np.vstack(self.y)) return self._roc_(y_pred, y, self.class_num, self.pos_label)Copy the code
Usage is as follows:
- Binary classification example

```python
import numpy as np
from mindspore import Tensor
from mindspore.nn.metrics import ROC

# binary classification example
x = Tensor(np.array([3, 1, 4, 2]))
y = Tensor(np.array([0, 1, 2, 3]))
metric = ROC(pos_label=2)
metric.clear()
metric.update(x, y)
fpr, tpr, thresholds = metric.eval()
print(fpr, tpr, thresholds)
# [0. 0. 0.33333333 0.66666667 1.] [0. 1. 1. 1. 1.] [5 4 3 2 1]
```
- Multiclass classification example

```python
import numpy as np
from mindspore import Tensor
from mindspore.nn.metrics import ROC

# multiclass classification example
x = Tensor(np.array([[0.28, 0.55, 0.15, 0.05],
                     [0.10, 0.20, 0.05, 0.05],
                     [0.20, 0.05, 0.15, 0.05],
                     [0.05, 0.05, 0.05, 0.75]]))
y = Tensor(np.array([0, 1, 2, 3]))
metric = ROC(class_num=4)
metric.clear()
metric.update(x, y)
fpr, tpr, thresholds = metric.eval()
print(fpr, tpr, thresholds)
# fpr: [array([0., 0., 0.33333333, 0.66666667, 1.]), array([0., 0.33333333, 0.33333333, 1.]),
#       array([0., 0.33333333, 1.]), array([0., 0., 1.])]
# tpr: [array([0., 1., 1., 1., 1.]), array([0., 0., 1., 1.]), array([0., 1., 1.]), array([0., 1., 1.])]
# thresholds: [array([1.28, 0.28, 0.2, 0.1, 0.05]), array([1.55, 0.55, 0.2, 0.05]),
#              array([1.15, 0.15, 0.05]), array([1.75, 0.75, 0.05])]
```
MindSpore Code Implementation (AUC)
""" "import numpy as np def auc(x, y, reorder=False): """ Use the trapezoidal rule to compute the area under the curve (AUC). This is a general function, given a point on a curve. Calculate the area under the ROC curve. """ # Input x is either the FPR value derived from the ROC curve or an array of false positives numPY. If it's multi-class, this is a list numpy like this, where each group represents a class. Input y is either the TPR value derived from the ROC curve or a true positive NUMPY array. If it's multi-class, this is a list numpy like this, where each group represents a class. if not isinstance(x, np.ndarray) or not isinstance(y, np.ndarray): Raise TypeError('The inputs must be Np.ndarray, but got {}, {}'. Format (type(x), type(y))) # check whether The inputs must be The same. Check that all objects in the array have the same shape or length. _check_consistent_length(x, y) # Expand column or 1D NUMpy array. X = _column_or_1D (x) y = _column_or_1D (y) if x.shape[0] < 2: raise ValueError('At least 2 points are needed to compute the AUC, but x.shape = {}.'.format(x.shape)) direction = 1 if reorder: order = np.lexsort((y, x)) x, y = x[order], y[order] else: dx = np.diff(x) if np.any(dx < 0): if np.all(dx 1: raise ValueError("Found input variables with inconsistent numbers of samples: {}." .format([int(length) for length in lengths]))Copy the code
Usage is as follows:
- Compute the AUC from the FPR and TPR values of the ROC curve

```python
import numpy as np
from mindspore import Tensor
from mindspore.nn.metrics import ROC, auc

x = Tensor(np.array([[3, 0, 1], [1, 3, 0], [1, 0, 2]]))
y = Tensor(np.array([[0, 2, 1], [1, 2, 1], [0, 0, 1]]))
metric = ROC(pos_label=1)
metric.clear()
metric.update(x, y)
fpr, tpr, thresholds = metric.eval()
output = auc(fpr, tpr)
print(output)
# 0.45
```