Original link:tecdat.cn/?p=21641 

Original source: Tuoduan Data Tribe (tecdat) official account

 

Wage model

In the field of labor economics, the study of income and wages offers insights into issues ranging from gender discrimination to higher education. In this article, we analyze cross-sectional wage data in order to build wage prediction models using Bayesian methods such as BIC and Bayesian model averaging.

Load the package

In this lab we will explore the data using the dplyr package and visualize it using the ggplot2 package. One of the exercises also uses the MASS package for stepwise linear regression.

We will later use bas.lm from the BAS package to implement Bayesian model averaging.
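
A minimal setup sketch loading the packages named above:

library(dplyr)    # data manipulation
library(ggplot2)  # visualization
library(MASS)     # stepAIC for stepwise selection
library(BAS)      # bas.lm for Bayesian model averaging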

Data

The data to be used in this laboratory were randomly selected from a nationwide sample of 935 respondents.
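
The original does not show how the data are loaded; a sketch under the assumption that they are available as a local CSV file (the file name is hypothetical):

wage <- read.csv("wage.csv")   # hypothetical path; adjust to wherever the lab data live
dim(wage)                      # should be 935 rows and 17 columns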

variable   description
wage       Weekly earnings
hours      Average hours worked per week
iq         IQ score
kww        Knowledge of world of work score
educ       Years of education
exper      Years of work experience
tenure     Years with current employer
age        Age in years
married    = 1 if married
black      = 1 if black
south      = 1 if living in the south
urban      = 1 if living in an urban area
sibs       Number of siblings
brthord    Birth order
meduc      Mother's years of education
feduc      Father's years of education
lwage      Natural log of wage

Is this an observational study or an experiment?

  1. An observational study

 

To explore the data

As with any new data set, standard exploratory data analysis is a good place to start. We’ll start with the wage variable, because it’s the dependent variable in our model.

  1. Which of the following statements about wage is not true?

    1. Seven respondents earned less than $300 per week

summary(wage$wage)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   115.0   669.0   905.0   957.9  1160.0  3078.0

Since wage is our dependent variable, we want to explore its relationship with the other variables, which serve as predictors. Exercise: Excluding wage and lwage, choose two other variables that you think would be good predictors of wage, and visualize their relationship with wage using appropriate plots. Years of education and experience appear to be reasonable predictors of a worker's wage.


ggplot(data = wage, aes(y = wage, x = exper)) + geom_point()

ggplot(data = wage, aes(y = wage, x = educ)) + geom_point()

Simple linear regression

One possible explanation for the wage differences we see in the data is that smarter people make more money. The graph below shows the scatter plot between weekly wages and IQ scores.

ggplot(data = wage, aes(x = iq, y = wage)) +
  geom_point()

The scatter plot is quite noisy. While there may be a slight positive relationship between IQ scores and wages, IQ is at best a rough predictor of wages. We can quantify this by fitting a simple linear regression.
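The regression itself is not shown above; a minimal sketch of the fit:

m_wage_iq = lm(wage ~ iq, data = wage)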

m_wage_iq$coefficients

## (Intercept)          iq 
##  116.991565    8.303064

summary(m_wage_iq)$sigma

## [1] 384.7667

Remember that under the model

wage_i = α + β · iq_i + ε_i,   with ε_i ~ N(0, σ²),

if we use the reference prior p(α, β, σ²) ∝ 1/σ², then the Bayesian posterior means and standard deviations are equal to the frequentist estimates and standard errors, respectively.

The Bayesian model specification assumes that the errors are normally distributed with constant variance. As with the frequentist approach, we check this assumption by examining the distribution of the model's residuals. If the residuals are highly non-normal or skewed, the assumption is violated and any subsequent inference is invalid.

  1. Check the residuals of m_wage_iq. Is the assumption of normally distributed errors valid?

    1. No, because the model's residual distribution is skewed to the right.
qqnorm(m_wage_iq$residuals)
qqline(m_wage_iq$residuals)

Exercise: Refit the model, this time using educ (years of education) as the independent variable. Does this change your answer to the previous exercise?
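A sketch of the refit described in this exercise (the model itself is not shown in the original), including a residual check:

m_wage_educ = lm(wage ~ educ, data = wage)

qqnorm(m_wage_educ$residuals)
qqline(m_wage_educ$residuals)

m_wage_educ$coefficients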

## (Intercept)        educ 
##   146.95244    60.21428

summary(m_wage_educ)$sigma

## [1] 382.3203

Here, too, the residuals of the linear model are approximately normally distributed, ε_i ~ N(0, σ²), so further inference can be carried out based on the linear model.

Variable transformation

One way to accommodate right-skewed data is to apply a (natural) log transformation to the dependent variable. Note that this is only possible when the variable is strictly positive, since the logarithm of a negative number is not defined and log(0) = −∞. We try to fit a linear model with log wage as the dependent variable. Question 4 will be based on this log-transformed model.

m_lwage_iq = lm(lwage ~ iq, data = wage)

Exercise: Check the residuals of this model. Is the assumption of normally distributed residuals reasonable?
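A sketch of the residual check, mirroring the earlier one:

qqnorm(m_lwage_iq$residuals)
qqline(m_lwage_iq$residuals)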

Based on the residual plot, normality of the residuals is a reasonable assumption for the log(wage) versus IQ model.

Recall that, given σ², the posterior distributions of α and β are normal, but marginally they follow a t-distribution with n − p − 1 degrees of freedom. In this case p = 1, since iq is the only predictor of log wage in our model. The posteriors of both α and β therefore follow t-distributions with 933 degrees of freedom, and since the degrees of freedom are so large, these distributions are approximately normal.

  1. Using the reference prior p(α, β, σ²) ∝ 1/σ², give a 95% posterior credible interval for β, the iq coefficient.

    1. (0.00709, 0.01050)

# iq_mean_estimate and iq_sd are the point estimate and standard error of the iq coefficient
qnorm(c(0.025, 0.975), mean = iq_mean_estimate, sd = iq_sd)

## [1] 0.007103173 0.010511141

Exercise: The iq coefficient is small, which is to be expected, since a one-point increase in IQ score can hardly have a large multiplicative effect on wages. One way to make the coefficient easier to interpret is to standardize IQ before putting it into the model. Under this new model, what percentage increase in wages is associated with a one-standard-deviation (15-point) increase in IQ?

IQ is standardized with the scale function; the estimated effect of a 15-point increase in IQ is then:
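The standardized-IQ model is not shown in the original; a sketch of one way it might be fit (the column name siq is an assumption, chosen to match the coefficient name used below):

wage_scaled = wage %>% mutate(siq = as.numeric(scale(iq)))   # siq has mean 0 and sd 1
m_lwage_scaled_iq = lm(lwage ~ siq, data = wage_scaled)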


coef(summary(m_lwage_scaled_iq))["siq", "Estimate"]*15 + coef(summary(m_lwage_scaled_iq))["(Intercept)", "Estimate"]

## [1] 8.767568

Multiple linear regression

Clearly, wages can be explained by many predictors, such as experience, education, and IQ. We can try to account for as much of the variation in wages as possible by including all the relevant covariates in the regression model.

Using . in the lm formula tells R to include all covariates in the model, and -wage then excludes the wage variable. By default, lm performs a complete-case analysis, so it drops any observation with a missing (NA) value in one or more of the predictors.

Because of these missing values, we must make one additional assumption for our inferences to be valid: the data must be missing at random. For example, if all first-born children had failed to report their birth order, the data would not be missing at random. In the absence of any information to the contrary, we will assume this is reasonable and use the 663 complete observations (as opposed to the original 935) to fit the model. Both Bayesian and frequentist methods exist for handling data sets with missing values, but they are beyond the scope of this course.
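
The full model itself is not shown above; a minimal sketch of the fit being described:

# all covariates except wage itself; lm drops incomplete cases by default,
# which here leaves 663 of the 935 observations
m_lwage_full = lm(lwage ~ . -wage, data = wage)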

  1. From this model, who earns more: married blacks or single non-blacks?

    1. Married black

All else being equal, a married black man's predicted log wage exceeds a single non-black man's by the following amount (the sum of the married and black coefficients; exponentiating it gives the multiplicative effect on wages).


# coefficients from the full model fitted above
married_coef <- coef(m_lwage_full)[["married1"]]
black_coef   <- coef(m_lwage_full)[["black1"]]

married_black <- married_coef*1 + black_coef*1
married_black

## [1] 0.09561888

A quick summary of the full linear model shows that many of the coefficients of the independent variables are not statistically significant. Variables can be selected based on adjusted R²; here, instead, we introduce the Bayesian information criterion (BIC), a metric that can be used for model selection. BIC is based on model fit, with a penalty on the number of parameters that scales with the logarithm of the sample size. We can compute the BIC of the full linear model with the following command:

BIC(m_lwage_full)

## [1] 586.3732

We can compare the BIC of the full model with that of a reduced model. Let's try removing birth order from the model. To keep the set of observations constant, the data set is specified as na.omit(wage), which contains only the observations with no missing values.

m_lwage_nobrthord = lm(lwage ~ . -wage -brthord, data = na.omit(wage))
BIC(m_lwage_nobrthord)

## [1] 582.4815

As you can see, removing birth order from the regression lowers the BIC, which is what we try to minimize when selecting a model.

  1. Removing which variable from the full model gives the lowest BIC?

    1. feduc
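
The reduced models compared below are not shown in the original; a sketch of how they might be fit, each dropping one variable and using the complete cases:

m_lwage_sibs  = lm(lwage ~ . -wage -sibs,  data = na.omit(wage))
m_lwage_feduc = lm(lwage ~ . -wage -feduc, data = na.omit(wage))
m_lwage_meduc = lm(lwage ~ . -wage -meduc, data = na.omit(wage))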

BIC(m_lwage_sibs)

## [1] 581.4031

BIC(m_lwage_feduc)

## [1] 580.9743

BIC(m_lwage_meduc)

## [1] 582.3722

Exercise: R's stepAIC function will step backwards through the model space, removing variables until the BIC no longer decreases. It takes a full model and a penalty parameter k as input, and finds the best model according to BIC when k = log(n).

# for AIC the penalty factor is k = 2; for BIC we set k = log(n)
n = nrow(na.omit(wage))
stepAIC(m_lwage_full, direction = "backward", k = log(n))
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -172.57  -63.43  -35.43   23.39 1065.78 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5546.2382    84.7839 -65.416  < 2e-16 ***
## hours            1.9072     0.6548   2.913   0.0037 ** 
## lwage          951.0113    11.5041  82.667  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 120.1 on 659 degrees of freedom
## Multiple R-squared: 0.9131, Adjusted R-squared: 0.9127
## F-statistic: 2307 on 3 and 659 DF, p-value: < 2.2e-16

Bayesian model averaging

Often several models are almost equally credible, and choosing just one of them ignores the uncertainty inherent in selecting the variables included in the model. One way around this problem is Bayesian model averaging (BMA), which averages over many models to obtain posterior distributions of the coefficients and predictions from new data. We can use the BAS package to implement BMA or to select models. We first apply BMA to the wage data.

# wage_no_na holds the complete cases, e.g. wage_no_na = na.omit(wage)
bma_lwage = bas.lm(lwage ~ . -wage, data = wage_no_na,
                   prior = "BIC",
                   modelprior = uniform())
bma_lwage
## 
## Marginal Posterior Inclusion Probabilities: 
## Intercept     hours        iq       kww      educ     exper 
##   1.00000   0.85540   0.89732   0.34790   0.99887   0.70999 
##    tenure       age  married1    black1    south1    urban1 
##   0.70389   0.52468   0.99894   0.34636   0.32029   1.00000 
##      sibs   brthord     meduc     feduc 
##   0.04152   0.12241   0.57339   0.23274
summary(bma_lwage)
##            P(B != 0 | Y)  model 1       model 2       model 3
## Intercept  1.00000000     1.0000        1.0000000     1.0000000
## hours      0.85540453     1.0000        1.0000000     1.0000000
## iq         0.89732383     1.0000        1.0000000     1.0000000
## kww        0.34789688     0.0000        0.0000000     0.0000000
## educ       0.99887165     1.0000001.0000000
## exper      0.70999255     0.0000        1.0000000     1.0000000
## tenure     0.70388781     1.00001.0000000
## married1   0.99894488     1.0000        1.0000000     1.0000000
## black1     0.34636467     0.0000        0.0000000     0.0000000
## south1     0.32028825     0.0000        0.0000000     0.0000000
## urban1     0.99999983     1.0000        1.0000000     1.0000000
## sibs       0.04152242     0.0000        0.0000000     0.0000000
## brthord    0.12241286     0.0000        0.0000000     0.0000000
## meduc      0.57339302     1.0000        1.0000000     1.0000000
## feduc      0.23274084     0.0000        0.0000000     0.0000000
## BF         NA             1.0000        0.5219483     0.5182769
## PostProbs  NA             0.0455        0.0237000     0.0236000
## R2         NA             0.2710        0.2767000     0.2696000
## dim        NA             9.0000       10.0000000     9.0000000
## logmarg    NA         -1490.0530    -1490.7032349 -1490.7102938
##            Model 4        Model 5
## Intercept  1.0000000      1.0000000
## hours      1.0000000      1.0000000
## iq         1.0000000      1.0000000
## kww        1.0000000      0.0000000
## educ       1.0000000      1.0000000
## exper      1.0000000      0.0000000
## tenure     1.0000000      1.0000000
## age        0.0000000      1.0000000
## married1   1.0000000      1.0000000
## black1     0.0000000
## south1     0.0000000      0.0000000
## urban1     1.0000000      1.0000000
## sibs       0.0000000      0.0000000
## brthord    0.0000000      0.0000000
## feduc      0.0000000      0.0000000
## BF         0.4414346      0.4126565
## PostProbs  0.0201000      0.0188000
## R2         0.2763000      0.2762000
## dim        10.0000000     10.0000000
## logmarg    -1490.8707736

The printed model object and the summary command give us the posterior inclusion probability of each variable and the most likely models. For example, the posterior probability that hours is included in the model is 0.855. Furthermore, the most likely model, with a posterior probability of 0.0455, includes the intercept, hours worked, IQ, education, age, marital status, urban living status, and maternal education. While a posterior probability of 0.0455 sounds small, it is much larger than the uniform prior probability assigned to each model, since there are 2^16 possible models. Under the model averaging approach, we can also visualize the posterior distributions of the coefficients. We plot the posterior distribution of the IQ coefficient as follows.

coef_lwage = coef(bma_lwage)   # posterior distributions of the coefficients under BMA
plot(coef_lwage, subset = c(3, 13), ask = FALSE)

We can also obtain 95% credible intervals for these coefficients:
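A sketch of how these intervals can be obtained from the coefficient object (assuming coef_lwage = coef(bma_lwage) as above):

confint(coef_lwage)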

##                     2.5%          97.5%          beta
## (Intercept)  6.787648e+00   6.841901e+00   6.8142970694
## hours        9.213871e-03   0.000000e+00   0.0053079979
## ...
## married1     1.173088e-01   3.015797e-01   0.2092940731
## black1      -1.900452e-01   0.000000e+00  -0.0441863361
## south1      -1.021332e-01   1.296992e-05  -0.0221757978
## urban1       1.374767e-01   2.621311e-01   0.1981221313
## sibs         0.000000e+00   0.000000e+00   0.00218455
## brthord     -2.072393e-02   0.000000e+00
## feduc       -7.996880e-06   1.558576e-02   0.0025125930
## attr(, "Probability")
## [1] 0.95
## attr(, "class")
## [1] "confint.bas"

For questions 7-8, we will use a simplified data set that excludes siblings, birth order, and parental education.

wage_red = wage %>% dplyr::select(-sibs, -brthord, -meduc, -feduc)
  1. Based on this simplified data set, which of the following variables has the lowest marginal posterior inclusion probability under Bayesian model averaging?

    1. age
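
The BMA fit on the reduced data is not shown in the original; a sketch mirroring the earlier call (the object name bma_lwage_red is an assumption):

bma_lwage_red = bas.lm(lwage ~ . -wage, data = na.omit(wage_red),
                       prior = "BIC",
                       modelprior = uniform())
bma_lwage_red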
## 
## Call:
## 
## 
##  Marginal Posterior Inclusion Probabilities: 
## Intercept      hours         iq        kww       educ      exper  
##    1.0000     0.8692     0.9172     0.3217     1.0000     0.9335  
##    tenure        age   married1     black1     south1     urban1  
##    0.9980     0.1786     0.9999     0.9761     0.8149     1.0000
##            P(B != 0 | Y)  model 1       model 2       model 3
## Intercept  1.0000000      1.0000        1.0000000     1.0000000
## hours      0.8691891      1.0000        1.0000000     1.0000000
## iq         0.9171607      1.0000        1.0000000     1.0000000
## kww        0.3216992      0.0000        1.0000000     0.0000000
## educ       1.0000000
## exper      0.9334844      1.0000        1.0000000     1.0000000
## tenure     0.9980015      1.0000        1.0000000     1.0000000
## married1   0.9999368      1.0000        1.0000000     1.0000000
## black1     0.9761347      1.0000        1.0000000     1.0000000
## south1     0.8148861      1.0000        1.0000000     0.0000000
## urban1     1.0000000      1.0000        1.0000000     1.0000000
## BF         NA             1.0000        0.5089209     0.2629789
## PostProbs  NA             0.3311        0.1685000     0.0871000
## R2         NA             0.2708        0.2751000     0.2634000
## dim        NA            10.0000       11.0000000     9.0000000
## logmarg    NA         -2275.4209    -2276.0963811 -2276.7565998
##            Model 4        Model 5
## Intercept  1.0000000      1.0000000
## hours      1.0000000      0.0000000
## iq         1.0000000      1.0000000
## kww        0.0000000      0.0000000
## educ       1.0000000      1.0000000
## married1   1.0000000      1.0000000
## south1     1.0000000      1.0000000
## urban1     1.0000000      1.0000000
## BF         0.2032217      0.1823138
## PostProbs  0.0673000      0.0604000
## R2         0.2737000      0.2628000
## dim        11.0000000     9.0000000
## logmarg    -2277.0143763  -2277.1229445
  1. True or false: the posterior probability of the model containing all variables is greater than 0.5. (Use a Zellner-Siow null prior for the coefficients and a Beta-Binomial(1,1) prior for the models.)

    1. True
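
A sketch of the corresponding fit on the complete cases of the full data (the object name bma_lwage_full matches the print call below):

bma_lwage_full = bas.lm(lwage ~ . -wage, data = na.omit(wage),
                        prior = "ZS-null",
                        modelprior = beta.binomial(1, 1))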
bma_lwage_full
## 
## Call:
## 
## 
##  Marginal Posterior Inclusion Probabilities: 
## Intercept      hours         iq        kww       educ      exper  
##    1.0000     0.9792     0.9505     0.6671     0.9998     0.8951  
##    tenure        age   married1     black1     south1     urban1  
##    0.9040     0.7093     0.9998     0.7160     0.6904     1.0000  
##      sibs    brthord      meduc      feduc  
##    0.3939     0.5329     0.7575     0.5360
##            P(B != 0 | Y)  model 1      model 2      model 3      model 4
## Intercept  1.0000000      1.00000000   1.00000000   1.00000000   1.0000
## hours      0.9791805      1.00000000   1.00000000   1.00000000   1.00000000 1.0000
## iq         0.9504649      1.00000000   1.00000000   1.00000000   1.00000000 1.0000
## kww        0.6670580      1.00000000   1.00000000   1.00000000   1.00000000 1.0000
## educ       0.9998424      1.00000000   1.00000000   1.00000000   1.00000000 1.0000
## exper      0.8950911      1.00000000   1.00000000   1.00000000   1.00000000 1.0000
## tenure     0.9040156      1.00000000   1.00000000   1.00000000   1.00000000 1.0000
## age        0.7092839      1.00000000   1.00000000   1.00000000   1.0000
## married1   0.9997881      1.00000000   1.00000000   1.00000000   1.00000000 1.0000
## black1     0.7160065      1.00000000   1.00000000   1.00000000   1.00000000 1.0000
## south1     0.6903763      1.00000000   1.00000000   1.00000000   1.00000000 1.0000
## urban1     1.0000000      1.00000000   1.00000000   1.00000000   1.00000000 1.0000
## sibs       0.3938833      1.00000000   1.00000000   0.00000000   0.0000
## brthord    0.5329258      1.00000000   1.00000000   1.00000000   1.00000000 0.0000
## meduc      0.7575462      1.00000000   1.00000000   1.00000000   1.00000000 1.0000
## feduc      0.5359832      1.00000000   0.00000000   1.00000000   0.0000
## BF         NA             0.01282537   0.06040366   0.04899546   1.0000
## PostProbs  NA             0.07380000   0.02320000   0.01880000   0.0126
## R2         NA             0.29250000   0.29140000   0.29090000   0.2882
## dim        NA            16.00000000  15.00000000  15.00000000  13.0000
## logmarg    NA            76.00726935  77.55689364  77.34757164  80.3636
##            Model 5
## Intercept  1.000000
## hours      1.000000
## iq         1.000000
## kww        1.000000
## educ       1.000000
## exper      1.000000
## tenure     1.000000
## age
## married1   1.000000
## black1     1.000000
## south1     1.000000
## urban1     1.000000
## sibs       0.000000
## brthord    1.000000
## meduc      1.000000
## feduc      0.000000
## BF         0.227823
## PostProbs  0.012500
## R2         0.289600
## logmarg    78.884413

Exercise: Plot the posterior distribution of the age coefficient from the reduced data set.


coef_bma_wage_red = coef(bma_lwage_red)   # assuming bma_lwage_red is the BMA fit on the reduced data
plot(coef_bma_wage_red, ask = FALSE)

Prediction

A major advantage of Bayesian statistics is prediction and the probabilistic interpretation of predictions. Most Bayesian prediction is done using simulation techniques. This is typically applied in regression modeling, although we will examine it with an example that contains only an intercept term.

Suppose we observe four numerical observations of y, namely 2, 2, 0, and 0, with sample mean ȳ = 1 and sample variance s² = 4/3. Assuming y ~ N(μ, σ²) and the reference prior p(μ, σ²) ∝ 1/σ², the posterior distribution becomes

μ | σ², y ~ N(ȳ, σ²/n),

which is centered on the sample mean, and

1/σ² | y ~ Gamma(a, b),

where a = (n − 1)/2 = 3/2 and b = s²(n − 1)/2 = 2.

To obtain the predictive distribution of y_5, we can first simulate σ² from its posterior and then μ given σ². Our prediction for y_5 then comes from the posterior predictive distribution of a new observation. The example below draws 100,000 samples from the posterior predictive distribution of y_5.

set.seed(314)
N = 100000

# posterior draws of sigma^2 and mu, using a = 3/2, b = 2, y-bar = 1 and n = 4 from above
sigma2 = 1 / rgamma(N, shape = 3/2, rate = 2)
mu     = rnorm(N, mean = 1, sd = sqrt(sigma2 / 4))

# draws from the posterior predictive distribution of a new observation y_5
y_5 = rnorm(N, mu, sqrt(sigma2))

We can see an estimate of the predictive distribution by looking at a smoothed version of the histogram of the simulated data.
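A sketch of that smoothed histogram:

hist(y_5, breaks = 100, freq = FALSE, main = "", xlab = "y_5")
lines(density(y_5))   # kernel-smoothed estimate of the predictive density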

The 95% central credible interval for the new observation is (L, U), where L is the 0.025 quantile and U is the 0.975 quantile. We can use the quantile function to find the 0.025 and 0.975 sample quantiles of y_5.

  1. Estimate a 95% credible interval for a new observation y_5

    1. (-3.11, 5.13)

quantile(y_5, probs = c(0.025, 0.975))

## -3.109585  5.132511

Exercise: In the simple example above, the posterior predictive distribution can be derived analytically by integration: it is a t-distribution with n − 1 = 3 degrees of freedom, centered at the sample mean. Plot the empirical density of y_5 and the exact t density. How do they compare?

 
den_of_y = density(y_5)   # empirical density of the simulated predictive draws
plot(den_of_y)
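A sketch of overlaying the exact t density with 3 degrees of freedom (location ȳ = 1, scale s·√(1 + 1/n) = √(5/3)) on that plot:

pred_scale = sqrt((4/3) * (1 + 1/4))                  # s * sqrt(1 + 1/n)
curve(dt((x - 1) / pred_scale, df = 3) / pred_scale,  # exact posterior predictive density
      add = TRUE, lty = 2)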

BAS prediction

In BAS, prediction intervals under Bayesian model averaging are constructed by simulation; when a single model has been selected, however, exact inference for prediction intervals is often feasible.

Returning to the wage data set, let's find the fitted values under the best predictive model (BPM), the model whose predictions are closest to those from BMA, together with the corresponding posterior standard deviations.

predict(bma_lwage, estimator="BPM")
##  [1] "Intercept" "hours"     "iq"        "kww"       "educ"     
##  [6] "exper"     "tenure"    "age"       "married1"  "urban1"   
## [11] "meduc"

We can compare this to the highest probability model and the median probability model (MPM) that we found earlier

predict(bma_lwage, estimator="MPM")
##  [1] "Intercept" "hours"     "iq"        "educ"      "exper"    
##  [6] "tenure"    "age"       "married1"  "urban1"    "meduc"

The MPM includes exper in addition to all the variables in the HPM, and the BPM includes kww in addition to all the variables in the MPM. Exercise: Using the simplified data, which covariates are included in the best predictive model, the median probability model, and the highest posterior probability model? Let's look at the characteristics of the individual with the highest predicted wage under the BPM.

##         [,1]     
## wage    "1586"   
## hours   "40"     
## iq      "127"    
## kww     "48"     
## educ    "16"     
## exper   "16"     
## tenure  "12"     
## age     "37"     
## married "1"      
## black   "0"      
## south   "0"      
## urban   "1"      
## sibs    "4"      
## brthord "4"      
## meduc   "16"     
## feduc   "16"     
## lwage   "7.36897"

A 95% prediction interval for this individual's predicted log wage can be obtained:
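The calls producing the intervals below are not shown in the original; a sketch using BAS's predict and confint methods (object names and the row selection are assumptions):

BPM_pred_lwage = predict(bma_lwage, estimator = "BPM", se.fit = TRUE)
# 95% prediction interval (log-wage scale) for the individual with the highest prediction
confint(BPM_pred_lwage, parm = "pred")[which.max(BPM_pred_lwage$fit), ]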

## 6.661869 8.056452 7.359160

To express the interval on the wage scale, we exponentiate it.

## 782.0108 3154.0780 1570.5169

This gives the 95% prediction interval for the wage. Using BMA instead, the interval is:

## 733.3446 2989.2076 1494.9899

Exercise: Using simplified data, construct a 95% prediction interval for the individual with the highest predicted salary under BPM.

Reference

Wooldridge, Jeffrey. 2000. Introductory Econometrics: A Modern Approach. South-Western College Publishing.

