Original link: tecdat.cn/?p=13913

Original source: Tuoduan Data Tribe official account

 

 

We discuss how to obtain confidence intervals for predictions programmatically, using linear regression as the working example.

 

> plot(cars)
> reg=lm(dist~speed,data=cars)
> abline(reg,col="red")
> n=nrow(cars)
> x=21
> points(x,predict(reg,newdata= data.frame(speed=x)),pch=19,col="red")

 

We are making a prediction here. As discussed in the R course (and in the review of forecasting models), when we want to attach a confidence interval to a prediction, it is recommended to distinguish between the confidence interval for the prediction itself (which depends on the estimation error of the parameters) and the interval for the potential values of the response (which also depends on the model error, i.e. the dispersion of the residuals). Let's start with the confidence interval for the prediction.
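For reference, under the normality assumption both types of interval are available directly from predict(); a minimal sketch, reusing reg and x from the code above and the 90% level used throughout this post:

predict(reg, newdata=data.frame(speed=x), interval="confidence", level=0.90)  # estimation error only
predict(reg, newdata=data.frame(speed=x), interval="prediction", level=0.90)  # also includes the residual dispersion

In the post, however, the intervals are built by resampling; the fragment below marks the resampled predictions in blue.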


abline(reg,col="light blue")
points(x,predict(reg,newdata=data.frame(speed=x)),pch=19,col="blue")
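The two lines above are the core of a resampling loop. A fuller sketch, with an assumed loop structure (the names B, pred_boot and idx are illustrative; n and x come from the code above):

set.seed(1)                                   # for reproducibility (assumed)
B <- 500                                      # number of resampled data sets
pred_boot <- rep(NA, B)
for (b in 1:B) {
  idx   <- sample(1:n, n, replace=TRUE)       # resample the observations
  reg_b <- lm(dist ~ speed, data=cars[idx,])  # refit the regression
  abline(reg_b, col="lightblue")              # one possible regression line
  pred_boot[b] <- predict(reg_b, newdata=data.frame(speed=x))
  points(x, pred_boot[b], pch=19, col="blue") # one possible prediction at speed = x
}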

The blue values are possible predictions obtained by resampling from our observed data. The 90% confidence interval under the assumption of normality of the residuals (and hence of the estimated slope and intercept of the regression line) is as follows:


U <- predict(reg, newdata=data.frame(speed=0:30), interval="confidence", level=0.90)  # assumed definition of U (not shown in the excerpt)
lines(0:30,U[,2],col="red",lwd=2)  # lower bound of the 90% band
lines(0:30,U[,3],col="red",lwd=2)  # upper bound of the 90% band

 

Here we can look at the distribution of the values obtained over 500 generated data sets, and compare the empirical quantiles with the quantiles obtained under the normality assumption:

# D: density of the resampled predictions; I: indices of D$x inside the plotted interval (assumed)
 polygon(c(D$x[I],rev(D$x[I])),c(D$y[I],rep(0,length(I))),col="blue",border=NA)
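The figures below can be reproduced along these lines (assuming pred_boot holds the resampled predictions from the sketch above):

quantile(pred_boot, c(0.05, 0.95))                                            # empirical 90% interval from resampling
predict(reg, newdata=data.frame(speed=x), interval="confidence", level=0.90)  # interval under the normality assumption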

Numerically, this gives:

      5%      95% 
58.63689 70.31281 
       fit      lwr      upr
65.00149 59.65934 70.34364

Now, let's look at another type of interval, one that focuses on the possible values of the response variable. This time, in addition to drawing new samples and computing predicted values, we add noise to each draw in order to obtain possible observed values.
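A minimal sketch of this second scheme; the vector name Yx matches the fragment below (which would sit inside the loop), while the loop structure and the use of a resampled residual as the added noise are assumptions:

set.seed(1)
B  <- 500
Yx <- rep(NA, B)
for (s in 1:B) {
  idx   <- sample(1:n, n, replace=TRUE)          # resample the observations
  reg_s <- lm(dist ~ speed, data=cars[idx,])     # refit the regression
  eps   <- sample(residuals(reg_s), 1)           # added noise: one resampled residual
  Yx[s] <- predict(reg_s, newdata=data.frame(speed=x)) + eps
}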

 points(x,Yx[s],pch=19,col="red")

 

Again, we can compare (graphically, to start with) the values obtained by resampling with those obtained under the normality assumption:

# D: density of the simulated values; I: indices of D$x inside the plotted interval (assumed)
 polygon(c(D$x[I],rev(D$x[I])),c(D$y[I],rep(0,length(I))),col="blue",border=NA)
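These values can be reproduced along these lines (again assuming Yx from the sketch above; the exact call used in the original post may differ):

quantile(Yx, c(0.05, 0.95))                                                   # empirical 90% interval of the simulated values
predict(reg, newdata=data.frame(speed=x), interval="prediction", level=0.90)  # prediction interval under normality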

 

Numerically, the comparison is as follows:

      5%      95% 
44.43468 96.01357 
       fit      lwr      upr
1 67.63136 45.16967 90.09305

This time there is a slight asymmetry to the right. Clearly, we cannot assume Gaussian residuals, since the positive values are larger (in magnitude) than the negative ones. This makes sense given the nature of the data (a stopping distance cannot be negative).

Next, let's talk about using regression models on a claims run-off triangle. The first matrix below shows cumulative payments, and the second the corresponding incremental payments.
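A minimal sketch of how these two triangles can be set up (the matrix name PAID is illustrative; the values are those printed below, and n is redefined as the triangle dimension):

n <- 6
PAID <- matrix(c(3209, 3367, 3871, 4239, 4929, 5217,
                 4372, 4659, 5345, 5917, 6794,   NA,
                 4411, 4696, 5398, 6020,   NA,   NA,
                 4428, 4720, 5420,   NA,   NA,   NA,
                 4435, 4730,   NA,   NA,   NA,   NA,
                 4456,   NA,   NA,   NA,   NA,   NA), nrow=n)
INC <- PAID
INC[, 2:n] <- PAID[, 2:n] - PAID[, 1:(n-1)]   # incremental payments
PAID
INC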

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3209 4372 4411 4428 4435 4456
[2,] 3367 4659 4696 4720 4730   NA
[3,] 3871 5345 5398 5420   NA   NA
[4,] 4239 5917 6020   NA   NA   NA
[5,] 4929 6794   NA   NA   NA   NA
[6,] 5217   NA   NA   NA   NA   NA
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3209 1163   39   17    7   21
[2,] 3367 1292   37   24   10   NA
[3,] 3871 1474   53   22   NA   NA
[4,] 4239 1678  103   NA   NA   NA
[5,] 4929 1865   NA   NA   NA   NA
[6,] 5217   NA   NA   NA   NA   NA

Then, we can build a data set in long format, with one row per cell of the triangle: the incremental payment y, the accident year ai and the development year bj.
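A minimal sketch, assuming the INC triangle from above (unstacking the matrix column by column matches the output below):

base <- data.frame(
  y  = as.vector(INC),             # incremental payments, column by column
  ai = rep(2000:2005, n),          # accident (origin) year
  bj = rep(0:(n-1), each=n)        # development year
)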


> head(base,12)
      y   ai bj
1  3209 2000  0
2  3367 2001  0
3  3871 2002  0
4  4239 2003  0
5  4929 2004  0
6  5217 2005  0
7  1163 2000  1
8  1292 2001  1
9  1474 2002  1
10 1678 2003  1
11 1865 2004  1
12   NA 2005  1
> tail(base,12)
    y   ai bj
25  7 2000  4
26 10 2001  4
27 NA 2002  4
28 NA 2003  4
29 NA 2004  4
30 NA 2005  4
31 21 2000  5
32 NA 2001  5
33 NA 2002  5
34 NA 2003  5
35 NA 2004  5
36 NA 2005  5

We can then use a regression model based on Stavros Christofides' log-incremental payment model, which builds on the lognormal model originally proposed by Etienne de Vylder in 1978.
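A sketch of this log-incremental regression; base$py (used in the output below) is assumed to hold the fitted incremental payments, and the lognormal bias correction exp(sigma^2/2) is an assumption on my part:

reg1 <- lm(log(y) ~ as.factor(ai) + as.factor(bj), data=base)
summary(reg1)
sigma2  <- summary(reg1)$sigma^2
base$py <- exp(predict(reg1, newdata=base) + sigma2/2)   # back-transform to the payment scale
round(matrix(base$py, n, n), 1)
sum(base$py[is.na(base$y)])   # estimated reserve: sum over the unobserved (lower) part of the triangle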

Residuals:
     Min       1Q   Median       3Q      Max 
-0.26374 -0.05681  0.00000  0.04419  0.33014 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)         7.9471     0.1101  72.188 6.35e-15 ***
as.factor(ai)2001   0.1604     0.1109   1.447
as.factor(ai)2002   0.2718     0.1208   2.250  0.04819 *  
as.factor(ai)2003   0.5904     0.1342   4.399  0.00134 ** 
as.factor(ai)2004   0.5535     0.1562   3.543  0.00533 ** 
as.factor(ai)2005   0.6126     0.2070   2.959  0.01431 *  
as.factor(bj)1     -0.9674
as.factor(bj)2     -4.2329     0.1208 -35.038 8.50e-12 ***
as.factor(bj)3     -5.0571     0.1342 -37.684
as.factor(bj)5     -4.9026     0.2070 -23.685 4.08e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1753 on 10 degrees of freedom
  (15 observations deleted due to missingness)
Multiple R-squared: 0.9975,     Adjusted R-squared: 0.9949
F-statistic: 391.7 on 10 and 10 DF,  p-value: 1.338e-11

       [,1]   [,2] [,3] [,4] [,5] [,6]
[1,] 2871.2 1091.3 41.7 18.3  7.8 21.3
[2,] 3370.8 1281.2 48.9 21.5  9.2 25.0
[3,] 3768.0 1432.1 54.7 24.0 10.3 28.0
[4,] 5181.5 1969.4 75.2 33.0 14.2 38.5
[5,] 4994.1 1898.1 72.5 31.8 13.6 37.1
[6,] 5297.8 2013.6 76.9 33.7 14.5 39.3
> sum(base$py[is.na(base$y)])
[1] 2481.857

We obtain slightly different results from those of the chain-ladder method. We can also try a Poisson regression (with a logarithmic link), as suggested by Hachemeister and Stanard in 1975:
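A sketch of the Poisson regression; base$py2 matches the output below:

reg2 <- glm(y ~ as.factor(ai) + as.factor(bj), data=base, family=poisson(link="log"))
summary(reg2)
base$py2 <- predict(reg2, newdata=base, type="response")   # fitted incremental payments
round(matrix(base$py2, n, n), 1)
sum(base$py2[is.na(base$y)])   # estimated reserve, matching the chain-ladder total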

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3426  -0.4996   0.0000   0.2770   3.9355  

Coefficients:
                   Estimate Std. Error  z value Pr(>|z|)    
(Intercept)         8.05697    0.01551  519.426  < 2e-16 ***
as.factor(ai)2001   0.06440    0.02090
as.factor(ai)2002   0.20242    0.02025    9.995  < 2e-16 ***
as.factor(ai)2003   0.31175    0.01980   15.744  < 2e-16 ***
as.factor(ai)2004   0.44407    0.01933   22.971  < 2e-16 ***
as.factor(ai)2005   0.50271    0.02079   24.179  < 2e-16 ***
as.factor(bj)1     -0.96513    0.01359  -70.994  < 2e-16 ***
as.factor(bj)2     -4.14853    0.06613  -62.729  < 2e-16 ***
as.factor(bj)3     -5.10499    0.12632  -40.413  < 2e-16 ***
as.factor(bj)4     -5.94962    0.24279  -24.505  < 2e-16 ***
as.factor(bj)5     -5.01244    0.21877  -22.912  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 46695.269  on 20  degrees of freedom
Residual deviance:    30.214  on 10  degrees of freedom
  (15 observations deleted due to missingness)
AIC: 209.52

Number of Fisher Scoring iterations: 4

> round(matrix(base$py2, n, n), 1)
       [,1]   [,2] [,3] [,4] [,5] [,6]
[1,] 3155.7 1202.1 49.8 19.1  8.2 21.0
[2,] 3365.6 1282.1 53.1 20.4  8.8 22.4
[3,] 3863.7 1471.8 61.0 23.4 10.1 25.7
[4,] 4310.1 1641.9 68.0 26.1 11.2 28.7
[5,] 4919.9 1874.1
> sum(base$py2[is.na(base$y)])
[1] 2426.985

This prediction is consistent with the estimator obtained by the chain-ladder method. In 1998, Klaus Schmidt and Angela Wünsche established the link between the chain-ladder method, minimum-distance methods, marginal-sum estimation and maximum-likelihood estimation.