Original link: tecdat.cn/?p=4612

Original source: Tecdat Data Tribe WeChat official account

 

 

Many introductions to Bayesian analysis use relatively simple teaching examples (such as reasoning about the probability of success based on Bernoulli data). While this is a good introduction to Bayes’ principles, extending these principles to regression is not straightforward.

This article will outline how these principles can be extended to simple linear regression. I will derive the conditional posterior distributions of the parameters of interest, give R code for implementing a Gibbs sampler, and demonstrate the so-called grid method.

 

Bayesian model

Suppose we observe data $(y_i, x_i)$ for $i = 1, \dots, n$.

Our model is

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \phi)$$

 

The quantities we want to infer are $\beta_0$, $\beta_1$, and $\phi$. If we place normal priors on the coefficients and an inverse-gamma prior on the variance term, the complete Bayesian model for this data can be written as:

$$y_i \mid \beta_0, \beta_1, \phi \sim N(\beta_0 + \beta_1 x_i, \phi), \qquad \beta_0 \sim N(m_0, \tau_0), \quad \beta_1 \sim N(m_1, \tau_1), \quad \phi \sim \mathrm{InvGamma}(a, g)$$

Assuming the hyperparameters $m_0$, $\tau_0$, $m_1$, $\tau_1$, $a$, and $g$ are known, the posterior can be written up to a constant of proportionality:

$$p(\beta_0, \beta_1, \phi \mid y) \propto \left[\prod_{i=1}^{n} N(y_i \mid \beta_0 + \beta_1 x_i, \phi)\right] N(\beta_0 \mid m_0, \tau_0)\, N(\beta_1 \mid m_1, \tau_1)\, \mathrm{InvGamma}(\phi \mid a, g)$$

The term in brackets is the joint distribution (likelihood) of the data. The remaining terms make up the joint prior distribution of the parameters.

The R code below generates data from a model with specified "true" parameter values. We will later fit the Bayesian regression model to this data to see whether we can recover these true parameters.




# Draw "true" parameters from the priors and simulate data.
# rinvgamma() comes from the invgamma package; the values of n, x, and the
# hyperparameters below are assumptions made here so the snippet runs on its own.
library(invgamma)
set.seed(1)
n <- 1000; x <- rnorm(n)                              # sample size and covariate
a <- 3; g <- 4; m0 <- 0; t0 <- 1; m1 <- 1; t1 <- 1    # prior hyperparameters

tphi <- rinvgamma(1, shape = a, rate = g)   # true error variance
tb0  <- rnorm(1, m0, sqrt(t0))              # true intercept
tb1  <- rnorm(1, m1, sqrt(t1))              # true slope
tphi; tb0; tb1

y <- rnorm(n, tb0 + tb1*x, sqrt(tphi))      # simulated response

 

Gibbs sampler

To draw samples from this posterior distribution, we can use the Gibbs sampling algorithm. Gibbs sampling is an iterative algorithm that produces samples from the posterior distribution of each parameter of interest. It does so by sequentially drawing from the conditional posterior distribution of each parameter, in the following manner:

1. Draw $\phi^{(t)}$ from $p(\phi \mid \beta_0^{(t-1)}, \beta_1^{(t-1)}, y)$
2. Draw $\beta_0^{(t)}$ from $p(\beta_0 \mid \phi^{(t)}, \beta_1^{(t-1)}, y)$
3. Draw $\beta_1^{(t)}$ from $p(\beta_1 \mid \phi^{(t)}, \beta_0^{(t)}, y)$

repeating these three steps for $t = 1, \dots, \text{iter}$.

After discarding an initial burn-in period, the remaining samples can be treated as draws from the posterior distribution. These samples are not independent: each draw depends on the previous one.

Conditional posterior distribution

To use Gibbs sampling, we need to identify the conditional posterior of each parameter.

To find a parameter's conditional posterior, we simply drop from the joint posterior every term that does not involve that parameter. For example, the conditional posterior of the intercept term $\beta_0$ is

$$p(\beta_0 \mid \phi, \beta_1, y) \propto \left[\prod_{i=1}^{n} N(y_i \mid \beta_0 + \beta_1 x_i, \phi)\right] N(\beta_0 \mid m_0, \tau_0)$$

Similarly, for the slope term $\beta_1$,

$$p(\beta_1 \mid \phi, \beta_0, y) \propto \left[\prod_{i=1}^{n} N(y_i \mid \beta_0 + \beta_1 x_i, \phi)\right] N(\beta_1 \mid m_1, \tau_1)$$

With some algebraic manipulation, the conditional posterior of $\phi$ can be recognized as another inverse-gamma distribution:

$$\phi \mid \beta_0, \beta_1, y \sim \mathrm{InvGamma}\!\left(a + \tfrac{n}{2},\; g + \tfrac{1}{2}\sum_{i=1}^{n}\bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2\right)$$

The conditional posteriors of $\beta_0$ and $\beta_1$ are not as easy to recognize. But if we are willing to use the grid method, we do not have to work through the algebra.

Consider the grid approach. The grid method is a brute-force way to sample from a conditional posterior distribution. The conditional posterior is just a function of the parameter, so we can evaluate its (unnormalized) density over a set of candidate values. In R, this could be grid = seq(-10, 10, by = .001). This sequence is a "grid" of points.

Evaluating the conditional posterior at each grid point tells us the relative probability of drawing that point.

We can then draw from the grid using R's sample() function, with sampling probabilities proportional to the density evaluated at each grid point.





  # Evaluate the log conditional posterior of b0 over a grid of candidate values;
  # phi and b1 are the current values of the other parameters
  grid <- seq(-10, 10, by = .001)
  p <- numeric(length(grid))
  for(i in 1:length(p) ){
    p[i]<- (-(1/(2*phi))*sum( (y - (grid[i]+b1*x))^2 ))  + ( -(1/(2*t0))*(grid[i] - m0)^2)
  }

  # sample one grid point with probability proportional to its density;
  # subtracting max(p) before exponentiating avoids numerical overflow/underflow
  draw<-sample(grid, size = 1, prob = exp(p - max(p)))

This is implemented in the functions rb0cond() and rb1cond() in the first part of the R code.

Numerical problems are common with the grid approach. Because we evaluate an unnormalized posterior over the grid, the results can become extremely large or small, producing Inf and -Inf values in R.

For example, in the functions rb0cond() and rb1cond(), I actually evaluate the logarithm of the conditional posterior distribution, then normalize it by subtracting the maximum before exponentiating.
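
To see the problem and the fix, consider a small made-up example (the numbers are purely illustrative):

logp <- c(-1000, -1001, -1002)   # unnormalized log densities at three grid points
exp(logp)                        # underflows to 0 0 0 - useless as sampling weights
exp(logp - max(logp))            # 1.0000 0.3679 0.1353 - proportional to the true densities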

We do not need the grid method to sample $\phi$ from its conditional posterior, because that conditional is a known (inverse-gamma) distribution.
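
As a minimal sketch, assuming the inverse-gamma conditional derived above, a single draw of phi could look like the following, where b0_cur and b1_cur are hypothetical names for the current values of the intercept and slope:

# one draw of phi from its inverse-gamma full conditional (b0_cur, b1_cur assumed current values)
phi_draw <- rinvgamma(1, shape = a + n/2,
                      rate  = g + 0.5*sum((y - (b0_cur + b1_cur*x))^2))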

Note that this grid approach has some drawbacks.

First, it is computationally expensive. Working through the algebra to obtain a known conditional posterior would give us a computationally efficient sampler.

Second, the grid method requires specifying a region of grid points. If the conditional posterior has significant density outside the grid interval we specified, [-10, 10], we will not obtain accurate samples from it, so some experimentation with a wider grid may be needed. Wider grids, in turn, make it even more important to handle values that approach Inf and -Inf in R.

 

Simulation results

Now that we can sample from the conditional posterior of each parameter, we can implement the Gibbs sampler.

iter   <- 1000                     # total number of Gibbs iterations
burnin <- 101                      # draws before this index are treated as burn-in
phi <- b0 <- b1 <- numeric(iter)   # storage for the three chains
phi[1] <- b0[1] <- b1[1] <- 6      # arbitrary starting values
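
The loop over iterations then cycles through the three conditional draws described earlier. A minimal sketch, assuming rb0cond() and rb1cond() accept the data, the other parameters' current values, and the prior hyperparameters as arguments (the exact signatures are an assumption here, not the original code):

for(i in 2:iter){
  # 1. draw phi from its inverse-gamma full conditional
  phi[i] <- rinvgamma(1, shape = a + n/2,
                      rate  = g + 0.5*sum((y - (b0[i-1] + b1[i-1]*x))^2))
  # 2. draw the intercept via the grid method
  b0[i] <- rb0cond(y, x, b1[i-1], phi[i], m0, t0)
  # 3. draw the slope via the grid method
  b1[i] <- rb1cond(y, x, b0[i], phi[i], m1, t1)
}
# draws with index below burnin can be discarded before summarizing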

The sampler works well. The figure below shows trace plots of the 1,000 Gibbs draws for each parameter; the red lines mark the true parameter values used to simulate the data. The fourth panel shows the joint posterior distribution of the intercept and slope terms, with the red lines representing contours.
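
A minimal sketch of the trace plots, assuming the chains phi, b0, b1 and the true values tphi, tb0, tb1 from the simulation step:

par(mfrow = c(2, 2))   # 2 x 2 layout: three trace plots plus the joint-density plot below
plot(phi, type = 'l', main = 'phi'); abline(h = tphi, col = 'red')   # trace of the variance
plot(b0,  type = 'l', main = 'b0');  abline(h = tb0,  col = 'red')   # trace of the intercept
plot(b1,  type = 'l', main = 'b1');  abline(h = tb1,  col = 'red')   # trace of the slope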

library(MASS)                       # provides kde2d()
z <- kde2d(b0, b1, n=50)            # 2-D kernel density estimate of the joint posterior
plot(b0, b1, pch=19, cex=.4)
contour(z, drawlabels=FALSE, nlevels=10, col='red', add=TRUE)

To summarize, we first derived an expression for the joint posterior distribution of the parameters. Then we outlined the Gibbs algorithm for sampling from this posterior. The Gibbs method relies on the conditional posterior distribution of each parameter: for the variance term this is easily recognized as a known (inverse-gamma) distribution, but for the slope and intercept terms we chose to sidestep the algebra with the grid method. The nice thing about this is that we bypass a lot of algebra; the trade-off is increased computational cost.

 
