Original link: tecdat.cn/?p=19018

Original source: Tuoduan Data Tribe (拓端数据部落) WeChat public account

 

Earlier we discussed the advantages of using the ROC curve to describe a classifier. Some say the diagonal of the ROC plot describes the “strategy of randomly guessing a class.” Let’s go back to the ROC curve and check. Consider a very simple data set containing 10 observations that is not linearly separable.
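The data themselves are not reproduced in the text. As a minimal sketch, the values below are illustrative assumptions (not confirmed as the article's data) that give 10 observations with two features and a binary label, using the names x1, x2, y and df that the later code expects. The error count at the end is only a quick sanity check: a non-zero count is consistent with the classes not being linearly separable, though it does not prove it by itself.

```r
# Illustrative values only (assumed, not taken from the article)
x1 <- c(.4, .55, .65, .9, .1, .35, .5, .15, .2, .85)
x2 <- c(.85, .95, .8, .87, .5, .55, .5, .2, .1, .3)
y  <- c(1, 1, 1, 1, 1, 0, 0, 1, 0, 0)
df <- data.frame(x1 = x1, x2 = x2, y = y)

# Fit a linear classifier and count in-sample errors at the 0.5 cutoff
fit <- glm(y ~ x1 + x2, data = df, family = binomial(link = "logit"))
errors <- sum((predict(fit, type = "response") > 0.5) != y)
print(errors)
```

With these values, one class-0 point lies inside the convex hull of the class-1 points, so no straight line can separate the two classes.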

We can plot the data to check that the two classes are indeed not linearly separable:

plot(x1,x2,col=c("red","blue")[1+y],pch=19)

Consider logistic regression

reg = glm(y~x1+x2,data=df,family=binomial(link = "logit"))

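We can write our own ROC function. Here S holds the scores (the fitted probabilities) and Y the observed classes; these two definitions are assumed, since they are not shown above.

S=predict(reg,type="response")
Y=df$y
roc=function(s,print=FALSE){
# predict class 1 when the score exceeds the threshold s
Ps=(S>s)*1
# false and true positive rates at threshold s
FP=sum((Ps==1)*(Y==0))/sum(Y==0)
TP=sum((Ps==1)*(Y==1))/sum(Y==1)
if(print==TRUE){
print(table(Observed=Y,Predicted=Ps))
}
vect=c(FP,TP)
names(vect)=c("FPR","TPR")
return(vect)
}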
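To show how a single-threshold function like this turns into a whole curve, here is a self-contained sketch that sweeps the threshold over a grid with Vectorize. The labels, the simulated scores and the name roc_point are hypothetical and only serve the illustration.

```r
set.seed(1)
# Hypothetical labels and scores, for illustration only
Y <- c(1, 1, 1, 1, 1, 0, 0, 1, 0, 0)
S <- ifelse(Y == 1, rbeta(10, 3, 2), rbeta(10, 2, 3))

# One ROC point: predict positive when the score exceeds the threshold s
roc_point <- function(s) {
  Ps <- (S > s) * 1
  c(FPR = sum((Ps == 1) * (Y == 0)) / sum(Y == 0),
    TPR = sum((Ps == 1) * (Y == 1)) / sum(Y == 1))
}

# Sweep the threshold: the result is a 2 x 251 matrix with rows FPR and TPR
V <- Vectorize(roc_point)(seq(0, 1, length = 251))
print(dim(V))
```

Calling plot(V["FPR", ], V["TPR", ], type = "s") then draws the empirical curve, which runs from (1,1) at the lowest threshold to (0,0) at the highest.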

Or we can use the ROCR package in R

library(ROCR)
performance(prediction(S,Y),"tpr","fpr")

We can draw both of them here

So our code works fine here. Now let’s think about the diagonal. First case: every observation gets the same probability (say 50%)


points(V[1,],V[2,])

 

But we only get two points here: (0,0) and (1,1). In fact, this is the case no matter which probability we choose
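A small sketch can confirm this for several constant probabilities. The labels and the helper roc_points are hypothetical, and the rule "predict positive when the score exceeds the threshold" is assumed.

```r
Y <- c(1, 1, 1, 1, 1, 0, 0, 1, 0, 0)   # hypothetical labels

# ROC points (one row per threshold) for a score vector S
roc_points <- function(S, Y, s = seq(0, 1, length = 251)) {
  t(sapply(s, function(u) c(FPR = mean(S[Y == 0] > u),
                            TPR = mean(S[Y == 1] > u))))
}

# Give every observation the same score p and count the distinct ROC points
n_points <- sapply(c(0.2, 0.5, 0.8), function(p) {
  nrow(unique(roc_points(rep(p, length(Y)), Y)))
})
print(n_points)
```

Whatever the constant, every threshold below it classifies all observations as positive and every threshold at or above it classifies none as positive, so only the two corner points appear.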


plot(performance(prediction(S,Y),"tpr","fpr"))
points(V[1,],V[2,])

We can try another strategy, such as “prediction by flipping an unbiased coin”. We get

segments(0,0,1,1,col="light blue")
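The coin-flip strategy can be sketched as follows. The labels and the use of sample(0:1, ...) to represent the coin are assumptions for illustration. A single flip of all coins produces one interior ROC point between (0,0) and (1,1); averaging over many repetitions shows where that point sits in expectation.

```r
set.seed(42)
Y <- c(1, 1, 1, 1, 1, 0, 0, 1, 0, 0)   # hypothetical labels

# One experiment: flip a fair coin per observation, record (FPR, TPR)
one_flip <- function() {
  S <- sample(0:1, size = length(Y), replace = TRUE)
  c(FPR = mean(S[Y == 0] == 1), TPR = mean(S[Y == 1] == 1))
}

flips <- replicate(2000, one_flip())   # 2 x 2000 matrix
avg <- rowMeans(flips)
print(round(avg, 2))
```

Both averages should come out close to 0.5, so the expected interior point lies on the diagonal, halfway between the two corners.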

We can also try “random classifiers” in which we randomly select scores



S=runif(10)

One step further. Let’s consider another function to plot the ROC curve

 

y=roc(x) 
lines(x,y,type="s",col="red")

But now let’s think about randomly chosen strategies

for(i in 1:500){
S=runif(10)
V=Vectorize(roc.curve)(seq(0,1,length=251))
MY[i,]=roc_curve(x)
}

 

The red line is the average of all the random classifiers. It’s not a straight line, and we see it oscillating around the diagonal.

reg = glm(PRO~.,data=my,family=binomial(link = "logit"))
plot(performance(prediction(S,Y),"tpr","fpr"))
segments(0,0,1,1,col="light blue")

This is a “random classifier” where we randomly draw scores on the unit interval

segments(0,0,1,1,col="light blue")

If we repeat it 500 times, we can get

for(i in 1:500){
S=runif(length(Y))
MY[i,]=roc(x)
}
lines(c(0,x),c(0,apply(MY,2,mean)),col="red",type="s",lwd=3)
segments(0,0,1,1,col="light blue")
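Since the loop above relies on objects that are not shown (MY, x and the curve function), here is a self-contained sketch of the same experiment. It swaps in a simpler averaging scheme: instead of averaging the TPR at fixed FPR levels, it averages FPR and TPR separately at each threshold. The labels and all names are hypothetical.

```r
set.seed(123)
Y <- c(1, 1, 1, 1, 1, 0, 0, 1, 0, 0)   # hypothetical labels
s <- seq(0, 1, length = 251)           # threshold grid
n_sim <- 500
FPR <- TPR <- matrix(NA, n_sim, length(s))

for (i in 1:n_sim) {
  S <- runif(length(Y))                # random scores, independent of Y
  FPR[i, ] <- sapply(s, function(u) mean(S[Y == 0] > u))
  TPR[i, ] <- sapply(s, function(u) mean(S[Y == 1] > u))
}

# Largest vertical distance between the averaged curve and the diagonal
gap <- max(abs(colMeans(TPR) - colMeans(FPR)))
print(gap)
```

On an existing ROC plot, lines(colMeans(FPR), colMeans(TPR), col = "red") would overlay the averaged curve. With uniform scores both averages are close to 1 - s at every threshold, which is why the averaged points stay near the diagonal.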

So, when I randomly draw the scores on the unit interval, I get the diagonal. Given Y, we can plot the two empirical cumulative distribution functions of the scores



plot(f0,(0:(length(f0)-1))/(length(f0)-1))

lines(f1,(0:(length(f1)-1))/(length(f1)-1))
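The link between these two distribution functions and the ROC curve can be written out explicitly. The notation is introduced here for illustration: $F_0$ and $F_1$ denote the distribution functions of the score given $Y=0$ and $Y=1$, with $F_0$ assumed continuous and invertible, and $x$ denotes a false positive rate.

```latex
\begin{align*}
\mathrm{FPR}(s) &= \mathbb{P}(S > s \mid Y = 0) = 1 - F_0(s), \\
\mathrm{TPR}(s) &= \mathbb{P}(S > s \mid Y = 1) = 1 - F_1(s), \\
\mathrm{ROC}(x) &= 1 - F_1\bigl(F_0^{-1}(1 - x)\bigr), \qquad x \in [0, 1].
\end{align*}
```

When the scores are drawn independently of $Y$, the two distributions coincide ($F_0 = F_1$), so $\mathrm{ROC}(x) = 1 - (1 - x) = x$, which is the diagonal.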

We can also use histograms (or density estimates) to see the distribution of scores

hist(S[Y==0],col=rgb(1,0,0,.2),probability=TRUE,breaks=(0:10)/10,border="white")

 

We do have a “perfect classifier” (curve near upper left)

 

 

With one classification error, we get the case below

 

Ten percent of the time, we might get the classification wrong

 

More misclassification

 

 

And finally we have the diagonal
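The progression from the perfect classifier down to the diagonal can be mimicked with a small simulation. This is only a hypothetical setup, not the one behind the figures: each score falls in the correct half of the unit interval unless it is flipped to the wrong half with probability p, and the area under the ROC curve (AUC) is computed as the share of (positive, negative) pairs ranked correctly.

```r
set.seed(2020)
n <- 2000
Y <- rep(0:1, each = n / 2)            # hypothetical balanced labels

# Scores land in the correct half of [0,1] unless flipped with probability p
sim_scores <- function(p) {
  Z <- ifelse(runif(n) < p, 1 - Y, Y)
  (Z + runif(n)) / 2
}

# AUC as the proportion of correctly ordered (positive, negative) pairs
auc <- function(S) mean(outer(S[Y == 1], S[Y == 0], ">"))

p_grid <- c(0, 0.1, 0.3, 0.5)
res <- sapply(p_grid, function(p) auc(sim_scores(p)))
print(round(res, 2))
```

In this setup the expected AUC equals 1 - p, so p = 0 gives the perfect classifier, and p = 0.5 gives an AUC near 0.5, the value of the diagonal.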

 


 
