Original link:tecdat.cn/?p=23524

Original source:Tuo End number according to the tribe public number

In this article, I want to show you how to sample from a Bayesian Poisson regression model using R’s Metropolis sampling.

Metropolis – Hastings algorithm

Metropolis-hastings sampling algorithm is a kind of Markov chain Monte Carlo (MCMC) method, whose main idea is to generate a Markov chainThe stationary distribution is the target distribution. One of the most common applications of this algorithm is sampling posterior density in Bayesian statistics, which is the goal of this paper.

The algorithm says for a given state Xt, how to generate the next stateI have a candidate point Y, which is from a proposed distribution, is accepted according to the decision criteria, so the chain moves to state Y at time t+1, that is, Xt+1=Y or is rejected, so the chain remains in state Xt at time t+1, that is, Xt+1=Xt.

Metropolis sampling

In the Metropolis algorithm, the proposal distribution is symmetric, that is, the proposal distributionmeet 

, so the Metropolis sampler produces a Markov chain as follows.

  1. Pick a proposal distributionBefore selecting it, understand the ideal features in this function.
  2. Generate X0 from the proposal distribution G.
  3. Repeat until the chain converges to a stationary distribution.
  • fromGeneration Y.
  • Generate U from Uniform(0, 1).
  • if, accept Y and set Xt+1=Y, otherwise set Xt+1=Xt. That means candidate Y is accepted with a high probability.
  • Increasing t.

Bayes method

As I mentioned earlier, we’re sampling from bayes defined as poisson regression models.

For parameter estimation in Bayesian analysis, we need to find the likelihood function of the model of interest, in this case from the Poisson regression model.

Now we must specify a prior distribution for each parameter β0 and β1. We will use a normal distribution without information for these two parameters, β0 ~ N(0,100) and β1 ~ N(0,100).

Finally, we define the posterior distribution as the product of the prior distribution and the likelihood distribution.

When using the Metropolis sampler, the posterior distribution is the target distribution.

Calculation method

Here you’ll learn how to use the Metropolis sampler of R to sample from a posterior distribution of parameters β0 and β1.

data

First, we generate data from poisson regression model introduced above.

N < -1000 # sample size J < -2 # number of arguments X < -runif (n,-2,2) # generate argument value beta < -runif (J,-2,2) # generate argument value y < -rpois (n, Lambda = lambda) # Generate the dependent variable valueCopy the code

Likelihood function

Now we define the likelihood function. In this case, we will use the logarithm of this function, which is strongly recommended to avoid numerical problems when running the algorithm.

LikelihoodFunction < -function (param){beta0 < -param [1] beta1 < -param [2] lambda < -exp (beta1*X + beta0) # logarithmic LikelihoodFunction loglikelihoods <- sum(dpois(y, lambda = lambda, log=T)) return(loglikelihoods) }Copy the code

Prior distribution

Next we define the prior distributions of the parameters β0 and β1. As with the likelihood function, we will use the logarithm of the prior distribution.

beta0prior <- dnorm(beta0, 0, sqrt(100), log=TRUE) beta1prior <- dnorm(beta1, 0, sqrt(100), Log =TRUE) return(beta0prior + beta1prior) # log of prior distributionCopy the code

Posterior distribution

Since we are working with logarithms, we define the posterior distribution as the sum of the logarithms of the likelihood function and the logarithms of the prior distribution. Remember, this function is our target function f(.). We’re going to sample it.

Proposed function

Finally, we define proposal distribution g (. | Xt). Since we will be using the Metropolis sampler, the proposed distribution must be symmetric and depends on the current state of the chain, so we will use a normal distribution whose average is equal to the parameter values in the current state.

Metropolis sampler

Finally, we write code that helps us execute the Metropolis sampler. In this case, since we are using logarithms, we must define the probability that candidate Y is accepted as.

# create an array to hold the chain value chain[1,] < -startvalue # define the chain startvalue for (I in 1:iterations){# generate Y Y < -proposalfunction (chain[I, PosteriorFunction(chain[I,])) accept or reject the decision criteria for Y if(runif(1) < probability) {chain[I +1, ] <- Y }else{ chain[i+1, ] <- chain[i, ]Copy the code

Due to the strong autocorrelation of MCMC chains, it may produce samples that are not representative of true underlying posterior distributions in the short term. So, in order to reduce the autocorrelation, we can just use each n value on the chain to dilute the sample. In this case, we will choose a value for our final chain every 20 iterations of the algorithm.

For (I in 1:10,000){if (I == 1){cfinal[I, ] < -chain [I *20,]} else {cfinal[I,] < -chain [I *20,] # burnIn < -5000Copy the code

Here, you can see the ACF diagram, which gives us the autocorrelation of any sequence with its lag value. In this case, we show the ACF diagram of the initial MCMC chain and the final chain after diluting a sample of two parameters. From the figure we can conclude that the program used can actually significantly reduce autocorrelation.

The results of

In this section, we introduce the chain generated by the Metropolis sampler and its distribution for the parameters β0 and β1. The true value of the parameter is indicated by the red line.

Comparison with GLM ()

Now we must compare the results obtained using Metropolis sampling with the GLM () function used to fit the generalized Linera model.

The following table lists the actual values of the parameters and the average of the estimated values obtained using the Metropolis sampler.

##       True value Mean MCMC       glm
## beta0  1.0578047 1.0769213 1.0769789
## beta1  0.8113144 0.8007347 0.8009269
Copy the code

conclusion

From the results, we can conclude that the estimates for the parameters β0 and β1 of the Poisson regression model obtained using the Metropolis sampler and the GLM () function are very similar and close to the actual values of the parameters. In addition, it must be recognized that the choice of the prior distribution, the proposed distribution and the initial value of the chain has a great influence on the results, so this choice must be made correctly.


Most welcome insight

1.R language multiple Logistic Logistic regression application case

2. Panel smooth transfer regression (PSTR) analysis case implementation

3. Partial least squares regression (PLSR) and principal component regression (PCR) in MATLAB

4.R language Poisson regression model analysis cases

5. Hosmer-lemeshow goodness of fit test in R language regression

6. Implementation of LASSO regression, Ridge regression and Elastic Net model in R language

7. Realize Logistic Logistic regression in R language

8. Python predicts stock prices using linear regression

9. How to calculate IDI and NRI indices for R language in survival analysis and Cox regression