Original link:tecdat.cn/?p=10207
Original source: Tuoduan Data Tribe WeChat official account
When OLS estimation is applied to regression models in the social sciences, one of the assumptions is homoscedasticity, that is, constant error variance. This means there is no systematic pattern in the error variances: the model is equally bad at all levels of prediction.
Heteroscedasticity is the violation of homoscedasticity, and it does not bias the OLS coefficient estimates themselves. If you do not care about p-values as much as most people in the social sciences do, heteroscedasticity may not be a problem for you.
Econometricians have developed a wide variety of heteroscedasticity-consistent standard errors, so they can keep applying OLS while correcting for non-constant error variance. The Wikipedia page on these corrections lists many of the names used for these alternative standard errors.
Here we instead model the variance directly by maximum likelihood: we supply a likelihood function, and the optimizer finds the parameter estimates that maximize it.
Let’s look at a simple example:
First, I draw 500 observations from a normal distribution with mean 3 and standard deviation 1.5 and save them in a data frame:
dat <- data.frame(y = rnorm(n = 500, mean = 3, sd = 1.5))
The sample mean and standard deviation are:
mean(dat$y)
[1] 2.999048
sd(dat$y)
[1] 1.462059
I can also ask: what mean and standard deviation of a normal distribution maximize the likelihood of the observed values? The mle2() function from the bbmle package answers this:
library(bbmle)  # provides mle2()
m.sd <- mle2(y ~ dnorm(mean = a, sd = exp(b)), data = dat,
             start = list(a = rnorm(1), b = rnorm(1)))
In the syntax above, the mean of y is modeled as a constant, a, and the standard deviation of y as a constant, b. I exponentiate the standard deviation so it can never be negative. We supply starting values so the optimizer has somewhere to begin before converging on the values that maximize the likelihood; random numbers are good enough as starting values.
m.sd
Call:
mle2(minuslogl = y ~ dnorm(mean = a, sd = exp(b)), start = list(a = rnorm(1),
b = rnorm(1)), data = dat)
Coefficients:
a b
2.9990478 0.3788449
Log-likelihood: -898.89
The coefficient a is very close to the sample mean. The coefficient b must be exponentiated to obtain the standard deviation:
exp(coef(m.sd)[2])
b
1.460596
This is close to the sample standard deviation we obtained above. Another nice feature of the syntax demonstrated above is that helper functions familiar from lm(), such as coef() and summary(), also work on mle2() objects.
The maximum likelihood estimation we performed above is equivalent to an intercept-only regression model estimated with OLS:
coef(lm(y ~ 1, dat))
(Intercept)
2.999048
sigma(lm(y ~ 1, dat))
[1] 1.462059
The intercept is the mean value of the data, and the residual standard deviation is the standard deviation.
Heteroscedastic regression model
Consider the following study. We assign participants to two groups: a treatment group of 30 people and a control group of 100 people, matched on the covariates that determine the outcome. We are interested in the treatment effect, and let us assume a simple difference in means suffices. As it happens, the treatment is not only effective but also homogenizing: treated subjects end up with much less variable outcomes. The following data set conforms to this protocol:
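The data-generating code is omitted in the source. A minimal reconstruction consistent with the description below (the seed is my own choice):

set.seed(123)  # hypothetical seed, not in the source
dat <- data.frame(
  treat = rep(c(0, 1), c(100, 30)),
  y = c(rnorm(100, mean = 0, sd = 1),      # control group
        rnorm(30, mean = 0.3, sd = 0.25))  # treatment group
)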
One hundred participants have treatment status 0 (the control group), drawn with mean 0 and standard deviation 1. Thirty participants have treatment status 1 (the treatment group), drawn with mean 0.3 and standard deviation 0.25.
This situation clearly violates the homoscedasticity assumption. Nonetheless, we proceed with an OLS estimate of the treatment effect:
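The fitting code is omitted in the source; since later calls reference m.ols, it was presumably just:

m.ols <- lm(y ~ treat, data = dat)
summary(m.ols)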
Call:
lm(formula = y ~ treat, data = dat)
Residuals:
Min 1Q Median 3Q Max
-2.8734 -0.5055 -0.0287 0.4231 3.4097
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.03386 0.09298 0.364 0.716
treat 0.21733 0.19355 1.123 0.264
Residual standard error: 0.9298 on 128 degrees of freedom
Multiple R-squared: 0.009754, Adjusted R-squared: 0.002018
F-statistic: 1.261 on 1 and 128 DF, p-value: 0.2636
The treatment effect is 0.22 and is not statistically significant at the .05 level, p = .26. But we know the variance is not homoscedastic because we created the data, and a simple diagnostic plot of residuals against fitted values confirms it:
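The plotting code is omitted; a residuals-versus-fitted plot along these lines shows the two groups' very different spreads:

plot(fitted(m.ols), resid(m.ols),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)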
First, I recreate the OLS model using maximum likelihood:
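The call itself is omitted in the source, but the likelihood ratio table later labels this fit m.mle with the formula y ~ dnorm(mean = b_int + b_treat * treat, sd = exp(s1)), so it was presumably:

m.mle <- mle2(y ~ dnorm(mean = b_int + b_treat * treat, sd = exp(s1)),
              data = dat,
              start = list(b_int = rnorm(1), b_treat = rnorm(1), s1 = rnorm(1)))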
In this model, the mean of the outcome is a function of an intercept, b_int, and a coefficient on the treatment predictor, b_treat. The standard deviation is still a single exponentiated constant, so this model is equivalent to the linear model.
But we know the variance is not constant: the two groups differ. So we can specify the standard deviation as a function of group. Here, the model for the standard deviation has an intercept, s_int, representing the control group, and a deviation from that intercept for the treatment group, s_treat. We can do even better by using the coefficients from the OLS model as starting values for b_int and b_treat. Fitting the model:
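Reconstructed from the printed call below (the data argument is assumed):

m.het <- mle2(y ~ dnorm(mean = b_int + b_treat * treat,
                        sd = exp(s_int + s_treat * treat)),
              data = dat,
              start = list(b_int = coef(m.ols)[1], b_treat = coef(m.ols)[2],
                           s_int = rnorm(1), s_treat = rnorm(1)))
summary(m.het)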
Maximum likelihood estimation
Call:
mle2(minuslogl = y ~ dnorm(mean = b_int + b_treat * treat, sd = exp(s_int +
    s_treat * treat)), start = list(b_int = coef(m.ols)[1], b_treat = coef(m.ols)[2],
    s_int = rnorm(1), s_treat = rnorm(1)))
Coefficients:
Estimate Std. Error z value Pr(z)
b_int 0.033862 0.104470 0.3241 0.74584
b_treat 0.217334 0.112249 1.9362 0.05285 .
s_int 0.043731 0.070711 0.6184 0.53628
s_treat -1.535894 0.147196 -10.4344 < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-2 log L: 288.1408
The treatment effect estimate is about the same, but now p = .053, much smaller than the p = .26 from the naive analysis. The b_treat estimate is also much more precise: its standard error is .11 here versus .19 before.
According to the standard deviation model, the standard deviation is:
exp(coef(m.het)[3])
s_int
1.044701
1.045 for the control group, and:
exp(coef(m.het)[3] + coef(m.het)[4])
s_int
0.2248858
0.225 for the treatment group. These values are close to the ones we used to simulate the data. We can confirm them against the sample statistics:
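A one-liner such as the following (my own, the source's code is omitted) produces the table below:

aggregate(y ~ treat, data = dat, FUN = sd)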
treat y
1 0 1.0499657
2 1 0.2287307
We can also easily compare the model that assumes constant variance with the one that allows heteroscedasticity:
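bbmle provides an anova() method for mle2 fits, and the likelihood ratio test below presumably comes from a call like this (m.mle is the constant-variance fit from earlier):

anova(m.mle, m.het)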
Likelihood Ratio Tests
Model 1: m.mle, y~dnorm(mean=b_int+b_treat*treat,sd=exp(s1))
Model 2: m.het, y~dnorm(mean=b_int+b_treat*treat,sd=exp(s_int+s_treat*treat))
Tot Df Deviance Chisq Df Pr(>Chisq)
1 3 347.98
2 4 288.14 59.841 1 1.028e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The likelihood ratio test suggests that allowing heteroscedasticity improves the model, χ²(1) = 59.84, p < .001.
So in this one example, modeling the variance improved precision. It is easy to write simulation code comparing the heteroscedastic MLE to OLS estimation when the effect is zero and the data are heteroscedastic.
I changed the data-generating code from above so that the treatment group mean is also zero, making the null hypothesis of no mean difference true. I repeated the process 500 times, saving the OLS treatment effect and its p-value as well as the heteroscedastic-MLE treatment effect and its p-value.
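The simulation code is omitted in the source; here is a hedged sketch of one way to run it (the function and names are my own, not the author's):

# simulate one data set and return the OLS and heteroscedastic-MLE
# treatment effects with their p-values
sim_once <- function(effect = 0) {
  d <- data.frame(
    treat = rep(c(0, 1), c(100, 30)),
    y = c(rnorm(100, 0, 1), rnorm(30, effect, 0.25))
  )
  ols <- summary(lm(y ~ treat, data = d))$coefficients["treat", ]
  het <- coef(summary(mle2(
    y ~ dnorm(mean = b_int + b_treat * treat,
              sd = exp(s_int + s_treat * treat)),
    data = d,
    start = list(b_int = 0, b_treat = 0, s_int = 0, s_treat = 0)
  )))["b_treat", ]
  c(ols_b = ols[["Estimate"]], ols_p = ols[["Pr(>|t|)"]],
    het_b = het[["Estimate"]], het_p = het[["Pr(z)"]])
}
res <- t(replicate(500, sim_once(effect = 0)))  # effect = 0: the null is true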
Then I plot the results:
par(mfrow = c(1, 1))
OLS and the heteroscedastic MLE produce similar distributions of treatment effects. However, the p-values of the heteroscedastic MLE behave better when the null is true: under a true null, p-values should be uniformly distributed, but the OLS p-values pile up at the high end.
Next, I repeated the process with a treatment group mean of 0.15, so that the null hypothesis of zero effect is false.
Again, the treatment effects have the same distribution. But compared with OLS, the heteroscedastic MLE p-values are much smaller, giving greater statistical power to detect the treatment effect.
As an alternative to the formula interface, you can specify a function for the negative log-likelihood yourself and pass it to mle2():
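The author's ll function is not shown; it presumably looked something like this hand-written negative log-likelihood:

# negative log-likelihood of the heteroscedastic model, written by hand
ll <- function(b_int, b_treat, s_int, s_treat) {
  mu <- b_int + b_treat * dat$treat
  sigma <- exp(s_int + s_treat * dat$treat)
  -sum(dnorm(dat$y, mean = mu, sd = sigma, log = TRUE))
}
m.ll <- mle2(minuslogl = ll,
             start = list(b_int = rnorm(1), b_treat = rnorm(1),
                          s_int = rnorm(1), s_treat = rnorm(1)))
summary(m.ll)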
Maximum likelihood estimation
Call:
mle2(minuslogl = ll, start = list(b_int = rnorm(1), b_treat = rnorm(1),
    s_int = rnorm(1), s_treat = rnorm(1)))
Coefficients:
Estimate Std. Error z value Pr(z)
b_int 0.033862 0.104470 0.3241 0.74584
b_treat 0.217334 0.112249 1.9362 0.05285 .
s_int 0.043733 0.070711 0.6185 0.53626
s_treat -1.535893 0.147196 -10.4343 < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-2 log L: 288.1408
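The Family/Formula/Dispersion header below indicates a glmmTMB fit with a dispersion formula. A hedged reconstruction (the object name is my own):

library(glmmTMB)
# mean model y ~ treat; log residual variance modeled by ~ treat
m.tmb <- glmmTMB(y ~ treat, dispformula = ~ treat, data = dat)
summary(m.tmb)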
Family: gaussian ( identity )
Formula: y ~ treat
Dispersion: ~treat
Data: dat
AIC BIC logLik deviance df.resid
296.1 307.6 -144.1 288.1 126
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.03386 0.10447 0.324 0.7458
treat 0.21733 0.11225 1.936 0.0528 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Dispersion model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.08746 0.14142 0.618 0.536
treat -3.07179 0.29439 -10.434 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this output, the dispersion is reported on the log-variance scale, so you must exponentiate and then take the square root to recover the group standard deviations reported above.
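For example, using the dispersion estimates printed above (my own arithmetic as a check):

sqrt(exp(0.08746))            # control group SD, about 1.045
sqrt(exp(0.08746 - 3.07179))  # treatment group SD, about 0.225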