Original link: tecdat.cn/?p=14887
A generalized linear model (GLM) uses a link function to relate a linear combination of the independent variables to the probability distribution of the dependent variable, which can be Gaussian, binomial, multinomial, Poisson, gamma, or exponential. The link functions discussed here are:
- Square root link (for the Poisson model)
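Consider a random variable $Y$ with mean $\mu$ and variance $\sigma^2$. A first-order Taylor expansion of $g(Y)$ around $\mu$, $g(Y) \approx g(\mu) + (Y-\mu)\,g'(\mu)$, gives
$$\mathbb{E}[g(Y)] \approx g(\mu)$$
$$\operatorname{Var}[g(Y)] \approx [g'(\mu)]^2\,\sigma^2$$
If $g(y)=\sqrt{y}$, then the second equation becomes
$$\operatorname{Var}[\sqrt{Y}] \approx \frac{\sigma^2}{4\mu}$$
For a Poisson variable, $\sigma^2=\mu$, so this is approximately $1/4$ whatever the mean. The square root transformation therefore stabilizes the variance, which can be interpreted as a form of homoscedasticity.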
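The variance-stabilizing effect can be checked numerically. The following is a minimal sketch in base R, not part of the original analysis; the three Poisson means, the sample size, and the seed are arbitrary assumptions chosen for illustration.

```r
# Simulation check of the approximation Var(sqrt(Y)) ~ 1/4 for Poisson counts.
# The means in `lambdas`, the sample size `n`, and the seed are illustrative choices.
set.seed(1)
lambdas <- c(5, 20, 100)
n <- 100000

# Variance of the raw counts: grows with the mean (equal to lambda in theory)
raw_var <- sapply(lambdas, function(l) var(rpois(n, l)))

# Variance after the square root transformation: roughly constant, near 0.25
sqrt_var <- sapply(lambdas, function(l) var(sqrt(rpois(n, l))))

print(round(cbind(lambda = lambdas, raw_var, sqrt_var), 3))
```

The approximation is a first-order one, so it is expected to be closer to 1/4 for larger means than for small ones.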
- Log link (for the Bernoulli model)
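Suppose the count variable $N$ is Poisson, $N \sim \mathcal{P}(\lambda_x)$ with $\lambda_x = \exp(x^\top\beta)$, so that $\mathbb{P}(N=0) = e^{-\lambda_x}$.
This model looks like a Bernoulli regression with $H$ as the (inverse) link function, $\mathbb{P}(N>0) = 1-e^{-\lambda_x} = H(x^\top\beta)$, where $H(u) = 1-\exp(-e^{u})$.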
Now suppose that instead of observing $N$, we only observe $Y = \mathbf{1}(N>0)$. In that case, running a Bernoulli regression with a logarithmic link function is equivalent to first running a Poisson regression on the raw counts and then applying it to the binary variable (zero versus non-zero). Let us compare $e^{-\lambda_x}$ with $p_x$ from a standard binary regression (a probit link in the code below) on some simulated data:
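The data frame `base` used in the code is not shown in the text. A minimal sketch of how such data could be simulated, assuming two standard normal covariates `X1` and `X2` and a log-linear Poisson mean with intercept 0 and slopes 1 (values chosen only because they are close to the count-model estimates reported later in the text); the seed and the sample size are arbitrary:

```r
# Hypothetical simulated data; this design is an assumption, not the original one.
set.seed(123)
n <- 1000
X1 <- rnorm(n)
X2 <- rnorm(n)

# Poisson counts with a log-linear mean: log(lambda) = 0 + 1 * X1 + 1 * X2
Y <- rpois(n, lambda = exp(0 + 1 * X1 + 1 * X2))

base <- data.frame(Y = Y, X1 = X1, X2 = X2)

# Number of zero and non-zero counts that the binary model will see
print(table(base$Y == 0))
```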
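regPois = glm(Y~.,data=base,family=poisson(link="log"))              # Poisson regression on the counts
regBinom = glm((Y==0)~.,data=base,family=binomial(link="probit"))    # Bernoulli regression on the zero indicator
lambda = predict(regPois, type="response")                           # fitted Poisson means
prob = predict(regBinom, type="response")                            # fitted P(Y = 0), to compare with exp(-lambda)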
What if $p_x$ is obtained from a Bernoulli regression with that link function?
plot(prob,1-exp(-lambda),xlim=0:1,ylim=0:1)
abline(a=0,b=1,lty=2,col="red")
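For a Poisson count with a log-linear mean, $\mathbb{P}(N>0) = 1-\exp(-e^{x^\top\beta})$, which corresponds to the complementary log-log link, `binomial(link="cloglog")` in R. The sketch below is an illustration on hypothetical simulated data (the names `sim`, `fitPois`, `fitCll` and the coefficient values are assumptions, not taken from the original analysis). It fits a Poisson regression on the counts and a cloglog Bernoulli regression on the non-zero indicator, then compares the two sets of coefficients and the implied probabilities.

```r
# Hypothetical simulation: coefficients, sample size and seed are arbitrary choices.
set.seed(42)
n <- 5000
X1 <- rnorm(n)
X2 <- rnorm(n)
N <- rpois(n, exp(-0.5 + 0.5 * X1 + 0.5 * X2))
sim <- data.frame(N = N, pos = as.integer(N > 0), X1 = X1, X2 = X2)

# Poisson regression on the counts (log link)
fitPois <- glm(N ~ X1 + X2, data = sim, family = poisson(link = "log"))

# Bernoulli regression on the non-zero indicator (complementary log-log link)
fitCll <- glm(pos ~ X1 + X2, data = sim, family = binomial(link = "cloglog"))

# Both fits estimate the same linear predictor, so the coefficients should be close
print(round(cbind(poisson = coef(fitPois), cloglog = coef(fitCll)), 3))

# Probability of a non-zero count implied by each model
lambda_hat <- predict(fitPois, type = "response")
p_hat <- predict(fitCll, type = "response")
mean_gap <- mean(abs(p_hat - (1 - exp(-lambda_hat))))
print(mean_gap)
```

When the counts really are Poisson, the two columns should agree up to sampling noise; a large gap on real data suggests that the Poisson assumption is doubtful.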
The fit is good. Now consider the marital infidelity data set published by Ray Fair in 1978 in the Journal of Political Economy (563 observations, nine variables):
prob = predict(regBinom, type="response")
plot(prob,exp(-lambda),xlim=0:1,ylim=0:1)
abline(a=0,b=1,lty=2,col="red")
In this case, the two models turn out to be very different. The same is true of the second model:
plot(prob,1-exp(-lambda),xlim=0:1,ylim=0:1)
abline(a=0,b=1,lty=2,col="red")
How can we explain this? Is it because the Poisson model fits poorly? For comparison, we fit a zero-inflated Poisson model here:
summary(regZIP)
Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.002274   0.048413   0.047    0.963
X1          1.019814   0.026186  38.945   <2e-16 ***
X2          1.004814   0.024172  41.570   <2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.90190    2.07846   2.358   0.0184 *
X1           2.00227    0.86897   2.304   0.0212 *
X2          -0.01545    0.96121  -0.016   0.9872
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because of the excess of zeros (zero inflation), we reject the Poisson assumption here. The logarithmic link can therefore be used to check whether the Poisson distribution is a good model.
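One informal way to look at excess zeros, without fitting a zero-inflated model, is to compare the observed share of zeros with the share implied by a fitted Poisson regression, that is, the average of $e^{-\hat\lambda_x}$. The sketch below does this in base R on two hypothetical simulated samples; the helper `zero_check`, the 30% rate of structural zeros, the coefficients, and the seed are all assumptions made for illustration.

```r
# Hypothetical data: one plain Poisson sample and one with extra (structural) zeros.
set.seed(7)
n <- 2000
x <- rnorm(n)
lam <- exp(0.3 + 0.5 * x)

y_pois <- rpois(n, lam)                               # plain Poisson counts
y_zi <- ifelse(runif(n) < 0.3, 0L, rpois(n, lam))     # 30% structural zeros (assumed rate)

# Observed share of zeros versus the share implied by a Poisson regression
zero_check <- function(y, x) {
  fit <- glm(y ~ x, family = poisson(link = "log"))
  c(observed = mean(y == 0), expected = mean(exp(-fitted(fit))))
}

check_pois <- zero_check(y_pois, x)
check_zi <- zero_check(y_zi, x)
print(round(rbind(poisson = check_pois, zero_inflated = check_zi), 3))
```

On the plain Poisson sample the two shares should be close, while on the zero-inflated sample the observed share of zeros should clearly exceed the one implied by the Poisson fit.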