Earlier we discussed the advantages of using the ROC curve to describe classifiers. Someone said that the diagonal describes “strategies for randomly guessing categories.” Let’s go back to the ROC curve, with a very simple dataset of 10 observations (not linearly separable).

Here we can check that it is indeed not linearly separable.

Consider a logistic regression:

reg = glm(y~x1+x2,data=df,family=binomial(link = "logit"))
We can use our own functions to compute the ROC curve
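A minimal sketch of such a hand-written ROC function, assuming 0/1 classes `Y` and scores `S` (for instance the fitted probabilities `predict(reg, type = "response")`). The toy values and the name `roc_point` are hypothetical, not the post's data or code. Each threshold `s` gives one (FPR, TPR) point; using every observed score as a threshold, plus `-Inf` and `Inf`, traces the whole empirical curve.

```r
# Hypothetical toy data (not the post's data set): observed classes and scores
Y <- c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1)
S <- c(0.10, 0.30, 0.35, 0.40, 0.55, 0.60, 0.65, 0.70, 0.80, 0.90)

# Hypothetical helper: (FPR, TPR) when we predict class 1 iff the score is at least s
roc_point <- function(s, S, Y) {
  Ps <- S >= s
  c(FPR = sum(Ps & Y == 0) / sum(Y == 0),
    TPR = sum(Ps & Y == 1) / sum(Y == 1))
}

# Confusion table at the usual 0.5 cut-off
print(table(Observed = Y, Predicted = as.integer(S >= 0.5)))
roc_point(0.5, S, Y)  # FPR = 0.25, TPR = 5/6

# Every distinct score is a relevant threshold; -Inf and Inf add (1,1) and (0,0)
thresholds <- c(-Inf, sort(unique(S)), Inf)
M <- sapply(thresholds, roc_point, S = S, Y = Y)  # 2 rows: "FPR" and "TPR"

if (interactive()) {
  plot(M["FPR", ], M["TPR", ], type = "l", xlab = "FPR", ylab = "TPR")
  segments(0, 0, 1, 1, col = "lightblue")
}
```

Using the observed scores as thresholds, rather than a fixed grid such as `seq(0, 1, by = 0.01)`, guarantees that no step of the curve is skipped when two scores are very close.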






Or we can use existing R functions, such as prediction() and performance() from the ROCR package

We can plot both curves on the same graph

So our code works. Now let’s think about the diagonal. A first strategy: everyone gets the same probability (say 50%)

But we only get two points: (0,0) and (1,1). In fact, this is the case whatever probability we choose, since any threshold classifies either everyone or no one as positive.
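A quick numerical check, as a sketch with hypothetical labels and a hypothetical `roc_point` helper: with a constant score, every threshold in a grid over [0, 1] puts either everyone or nobody in class 1, so only (1,1) and (0,0) ever appear.

```r
# Hypothetical observed classes (any 0/1 vector with both classes would do)
Y <- c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1)

# Hypothetical helper: (FPR, TPR) when we predict class 1 iff the score is at least s
roc_point <- function(s, S, Y) {
  Ps <- S >= s
  c(FPR = sum(Ps & Y == 0) / sum(Y == 0),
    TPR = sum(Ps & Y == 1) / sum(Y == 1))
}

# Number of distinct ROC points when everyone gets the same probability p
n_points_constant <- function(p, thresholds = seq(0, 1, by = 0.01)) {
  S <- rep(p, length(Y))
  M <- sapply(thresholds, roc_point, S = S, Y = Y)
  nrow(unique(t(M)))
}

sapply(c(0.1, 0.5, 0.9), n_points_constant)  # 2 2 2: only (1,1) and (0,0)
```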

We can try another strategy, such as “prediction by flipping a fair coin”. The ROC curve then has three points: (0,0), (1,1), and the point given by the coin.
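A sketch of the coin strategy, assuming hypothetical labels and a hypothetical `roc_point` helper. The coin outcomes act as scores in {0, 1}, so only four thresholds matter, and the curve has at most three distinct points. Since heads is equally likely in both classes, the coin's own (FPR, TPR) is (0.5, 0.5) on average, i.e. on the diagonal.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
Y <- c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1)  # hypothetical observed classes

# Hypothetical helper: (FPR, TPR) when we predict class 1 iff the score is at least s
roc_point <- function(s, S, Y) {
  Ps <- S >= s
  c(FPR = sum(Ps & Y == 0) / sum(Y == 0),
    TPR = sum(Ps & Y == 1) / sum(Y == 1))
}

# Fair coin: heads (1) means "predict class 1", tails (0) means "predict class 0"
S <- sample(0:1, length(Y), replace = TRUE)

# Thresholds below 0, at 0, at 1 and above 1 cover every possible cut-off
M <- sapply(c(-Inf, 0, 1, Inf), roc_point, S = S, Y = Y)
pts <- unique(t(M))
pts  # (1,1), the coin's own (FPR, TPR), and (0,0)
```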

We can also try “random classifiers”, in which scores are drawn at random
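Unlike a constant score or a coin, continuous random scores produce a full staircase: each threshold step moves a single observation, so with n observations (and no ties) there are n + 1 distinct points. A sketch with hypothetical labels, uniform scores, and a hypothetical `roc_point` helper:

```r
set.seed(2)  # arbitrary seed
Y <- c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1)  # hypothetical observed classes

# Hypothetical helper: (FPR, TPR) when we predict class 1 iff the score is at least s
roc_point <- function(s, S, Y) {
  Ps <- S >= s
  c(FPR = sum(Ps & Y == 0) / sum(Y == 0),
    TPR = sum(Ps & Y == 1) / sum(Y == 1))
}

# Scores drawn uniformly on the unit interval, independently of Y
S <- runif(length(Y))
M <- sapply(c(-Inf, sort(S), Inf), roc_point, S = S, Y = Y)
pts <- unique(t(M))
nrow(pts)  # length(Y) + 1 = 11 distinct points (no ties among the scores)
```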

Let’s go one step further and consider another function to plot the ROC curve



Now let’s consider many randomly chosen strategies

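x = seq(0,1,length=251)           # grid on the FPR axis
MY = matrix(NA,500,length(x))     # one row per random classifier
for(i in 1:500){
  S = runif(10)                   # random scores for the 10 observations
  V = Vectorize(roc.curve)(seq(0,1,length=251))
  MY[i,] = roc_curve(x)           # roc.curve() and roc_curve(): the ROC helpers defined above
}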


The red line is the average of all the random classifiers. It is not a straight line: it oscillates around the diagonal.
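One way to see why the average oscillates rather than lying exactly on the diagonal, as a sketch under our own assumptions (hypothetical labels with 4 observations in class 0, and hypothetical helpers `roc_point` and `roc_at`; the grid `x` mirrors the post's `seq(0,1,length=251)`). With 4 negatives, each random curve is a staircase, and its expected height between FPR = k/4 and (k+1)/4 is (k+1)/5: the probability that a random positive scores above the (k+1)-th highest negative. The average is therefore itself a staircase that crosses the diagonal on every step.

```r
set.seed(3)  # arbitrary seed
Y <- c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1)  # hypothetical: 4 in class 0, 6 in class 1

# Hypothetical helper: (FPR, TPR) when we predict class 1 iff the score is at least s
roc_point <- function(s, S, Y) {
  Ps <- S >= s
  c(FPR = sum(Ps & Y == 0) / sum(Y == 0),
    TPR = sum(Ps & Y == 1) / sum(Y == 1))
}

# Hypothetical helper: read one empirical ROC curve at the FPR values in x,
# keeping the largest TPR reached without exceeding each FPR
roc_at <- function(x, S, Y) {
  M <- sapply(c(-Inf, sort(S), Inf), roc_point, S = S, Y = Y)
  sapply(x, function(u) max(M["TPR", M["FPR", ] <= u]))
}

x <- seq(0, 1, length.out = 251)
MY <- replicate(500, roc_at(x, runif(length(Y)), Y))  # 251 x 500 matrix
avg <- rowMeans(MY)                                    # the average curve

# Expected values at FPR = 0.124, 0.372, 0.624, 0.872 and 1 are
# 0.2, 0.4, 0.6, 0.8 and 1, i.e. (k + 1) / 5 on each step k/4 <= FPR < (k + 1)/4
avg[c(32, 94, 157, 219, 251)]
```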

library(ROCR)   # provides prediction() and performance()
reg = glm(PRO~.,data=my,family=binomial(link = "logit"))
plot(performance(prediction(S,Y),"tpr","fpr"))
segments(0,0,1,1,col="lightblue")

This is a “random classifier”, where scores are drawn at random on the unit interval

segments(0,0,1,1,col="lightblue")

If we repeat this 500 times, we get

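MY = matrix(NA,500,length(x))     # one row per random classifier
for(i in 1:500){
  S = runif(length(Y))            # random scores on the unit interval
  MY[i,] = roc(x)                 # roc(): the ROC function defined above, evaluated on the grid x
}
lines(c(0,x),c(0,colMeans(MY)),col="red",type="s",lwd=3)
segments(0,0,1,1,col="lightblue")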

So when scores are drawn at random on the unit interval, we get (on average) the diagonal. Given Y, we can also plot the two empirical cumulative distribution functions of the scores, one for each class
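This is the link between the two pictures: an ROC curve plots the two empirical cdfs against each other. Predicting class 1 when S > t gives FPR = 1 - F0(t) and TPR = 1 - F1(t), where F0 and F1 are the empirical cdfs of the scores in classes 0 and 1. A sketch with hypothetical labels and random scores (note the strict inequality, which matches `ecdf()`'s right-continuous convention):

```r
set.seed(4)  # arbitrary seed
Y <- c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1)  # hypothetical observed classes
S <- runif(length(Y))                 # random scores, independent of Y

F0 <- ecdf(S[Y == 0])  # empirical cdf of the scores in class 0
F1 <- ecdf(S[Y == 1])  # empirical cdf of the scores in class 1

# ROC points obtained from the two cdfs
t_grid <- seq(0, 1, by = 0.01)
roc_from_ecdf <- cbind(FPR = 1 - F0(t_grid), TPR = 1 - F1(t_grid))

# Same points computed directly by counting scores above each threshold
direct <- cbind(FPR = sapply(t_grid, function(t) mean(S[Y == 0] > t)),
                TPR = sapply(t_grid, function(t) mean(S[Y == 1] > t)))
all.equal(roc_from_ecdf, direct)  # TRUE

if (interactive()) {
  plot(F0, col = "red", main = "Empirical cdfs of the scores")
  lines(F1, col = "blue")
}
```

When the scores are independent of Y, F0 and F1 estimate the same (here uniform) distribution, so FPR and TPR are close for every t and the points fall near the diagonal.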



We can also use histograms (or density estimates) to see the distribution of scores

hist(S[Y==0],col=rgb(1,0,0,.2),probability=TRUE,breaks=(0:10)/10,border="white")
hist(S[Y==1],col=rgb(0,0,1,.2),probability=TRUE,breaks=(0:10)/10,border="white",add=TRUE)
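To inspect the numbers behind such histograms rather than the bars, `hist()` can be called with `plot = FALSE`. A sketch with hypothetical labels and random scores, using the same ten bins on [0, 1]; with `probability = TRUE`, the bar heights are the `density` component, and each histogram integrates to one.

```r
set.seed(5)  # arbitrary seed
Y <- c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1)  # hypothetical observed classes
S <- runif(length(Y))                 # random scores

# Same ten bins on [0, 1]; plot = FALSE returns the histogram object without drawing
h0 <- hist(S[Y == 0], breaks = (0:10) / 10, plot = FALSE)
h1 <- hist(S[Y == 1], breaks = (0:10) / 10, plot = FALSE)

# Estimated densities of the scores in each class, bin by bin
rbind(class0 = h0$density, class1 = h1$density)
```

With random scores both rows estimate the same flat density, but with only 4 and 6 observations they are very noisy, which is also why a single random ROC curve can wander far from the diagonal.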


Here we do have a “perfect classifier” (the curve reaches the upper-left corner)
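A sketch of what "perfect" means, under a hypothetical score model where every class-1 score lies above every class-0 score: some threshold then gives FPR = 0 and TPR = 1, and the empirical AUC (the proportion of (class 1, class 0) pairs that are correctly ordered) equals 1. The labels, seed, and `roc_point` helper are hypothetical.

```r
set.seed(6)  # arbitrary seed
Y <- c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1)  # hypothetical observed classes
n <- length(Y)

# Hypothetical helper: (FPR, TPR) when we predict class 1 iff the score is at least s
roc_point <- function(s, S, Y) {
  Ps <- S >= s
  c(FPR = sum(Ps & Y == 0) / sum(Y == 0),
    TPR = sum(Ps & Y == 1) / sum(Y == 1))
}

# Hypothetical perfect scores: class 1 in (0.5, 1), class 0 in (0, 0.5)
S <- ifelse(Y == 1, runif(n, 0.5, 1), runif(n, 0, 0.5))

M <- sapply(c(-Inf, sort(S), Inf), roc_point, S = S, Y = Y)
any(M["FPR", ] == 0 & M["TPR", ] == 1)  # TRUE: the curve goes through (0,1)

# Empirical AUC: share of (class 1, class 0) pairs where the class-1 score is higher
auc <- mean(outer(S[Y == 1], S[Y == 0], ">"))
auc  # 1
```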



Now there is one error, which gives the case below


Here, about ten percent of the time, the classification is wrong


With more misclassification



And finally we have the diagonal
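The whole sequence (perfect classifier, about ten percent of errors, more errors, diagonal) can be mimicked with a hypothetical score model of our own, not the simulation behind the figures: with probability p, an observation gets its score from the "wrong" half of [0, 1]. Then a class-1 score beats a class-0 score for sure when only the class-0 one is misplaced, never when only the class-1 one is, and with probability 1/2 when they are in the same half, so

$$\text{AUC} = (1-p)^2 + \tfrac{1}{2}\cdot 2p(1-p) = 1-p,$$

and p = 0.5 gives the diagonal (AUC = 0.5).

```r
set.seed(7)  # arbitrary seed
Y <- rep(c(0, 1), each = 2000)  # hypothetical labels, large sample to reduce noise
n <- length(Y)

# Hypothetical score model: with probability p the score falls in the wrong half
scores <- function(p) {
  wrong <- runif(n) < p
  upper <- (Y == 1) != wrong  # TRUE: score drawn in (0.5, 1)
  ifelse(upper, runif(n, 0.5, 1), runif(n, 0, 0.5))
}

# Empirical AUC: share of (class 1, class 0) pairs where the class-1 score is higher
auc <- function(S) mean(outer(S[Y == 1], S[Y == 0], ">"))

p_values <- c(0, 0.1, 0.3, 0.5)
res <- sapply(p_values, function(p) auc(scores(p)))
res  # in expectation 1 - p: 1, 0.9, 0.7 and 0.5 (exactly 1 for p = 0)
```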



