Original link:tecdat.cn/?p=21545

Original source:Tuo End number according to the tribe public number

 

Example 1: Sampling using MCMC’s exponential distribution

The goal of any MCMC scheme is to generate samples from a “target” distribution. In this case, we will use an exponential distribution with an average value of 1 as our target distribution. So let’s start by defining the target density:

target = function(x){
  if(x<0){
    return(0)}
  else {
    return( exp(-x))
  }
}
Copy the code

Having defined the function, we can now use it to evaluate several values (just to illustrate the concept of a function) :

target(1)
Copy the code
[1] of 0.3678794Copy the code
target(-1)
Copy the code
[1] 0
Copy the code

Next, we plan out a Metropolis-Hastings scheme that samples from a distribution proportional to the target

X [1] = 3 # this is just a starting point, For (I in 2:1000){A = target(proposedx)/target(currentX) if(runif(1)<A){x[I] = proposedx # accept probability min (1, A)} else {x[I] = currentX #Copy the code

Note that x is an implementation of a Markov chain. We can draw some graphs of x:

 

 

 

 

We can wrap this in an MCMC function to make the code cleaner, making it easier to change the starting values and proposal distribution

for(i in 2:niter){ currentx = x[i-1] proposedx = rnorm(1,mean=currentx,sd=proposalsd) A = Target (proposedx)/target(currentX) if(runif(1)<A){x[I] = proposedx # min (1, A)} else {x[I] = currentX # Leave it as it is}Copy the code

Now we’ll run the MCMC scheme 3 times to see how similar the results are:

Z1 = MCMC (1000,3,1) z2 = MCMC (1000,3,1) = z3 MCMC (1000,3,1) plot (z1, type = "l")Copy the code

 

 

Hist (z1,breaks=seq(0,maxz,length=20)) par(mfcol=c(3,1)) #Copy the code

 

 

practice

Use the function easyMCMC to learn the following:

  1. How do different starting values affect MCMC schemes?
  2. What is the effect of a larger/smaller proposal standard deviation?
  3. Try changing the target function to
target = function(x){
  
  return((x>0 & x <1) + (x>2 & x<3))
}
Copy the code

What does this goal look like? What if the proposed SD is too small? (For example, try 1 and 0.1)

Example 2: Estimate allele frequency

When modeling biallelic loci genotypes, such as loci with BOTH AA and AA alleles, a standard assumption is that populations are “random”. That means if P is the frequency of the AA allele, then the genotypeandWill have frequencies separatelyand.

P a simple prior is to assume that it is uniform on [0,1]. Let’s say we sample n individuals and lookgenotype,genotypeandgenotype.

The R code below gives a short MCMC routine to sample from a posterior distribution of P. Try walking through the code to see how it works.

The prior = function (p) {if ((p < 0) | | (p > 1)) {# | | here means "or" return (0)} else {return (1)}} likelihood = function (p, nAA, nAA, naa){ return(p^(2*nAA) * (2*p*(1-p))^nAa * (1-p)^(2*naa)) } psampler = function(nAA, nAa, Naa){for(I in 2:niter){if(runif(1)<A){p[I] = newp # min (1, A)} else {p[I] = currentp #Copy the code

rightRun this sample.

Now use some R code to compare a posterior sample with a theoretical posterior sample (which can be obtained by analysis in this case; Because 121 As and 79 As were observed, the posterior sample of P was β (121+1,79+1) out of 200 samples.

Hist (z, Prob =T) lines(x,dbeta(x,122, 80)) # Superimpose the β density on the histogramCopy the code

You may also want to discard the first 5000 Z values as “burnin” (pre-burn period). Here’s one way to do it, just pick the last 5,000 z in R

hist(z[5001:10000])
Copy the code

 

 

practice

How the research starting point and proposal standard deviation affect the convergence of the algorithm.

Example 3: Estimate allele frequency and inbreeding number

A slightly more complicated alternative is to assume that people have a tendency to inbreed with other people who are more closely related than “random” (for example, this can happen in geographically structured populations). An easy way to do this is to introduce an additional parameter, the “inbreeding coefficient” F, and assumeandGenotypes have frequencies

and.

In most cases, it is natural to treat F as a population characteristic and therefore assume that F is constant at various loci.

Note that both f and p are constrained to be between 0 and 1 (inclusive). A simple prior for each of these two parameters is to assume that they are independent on [0,1]. Let’s say we sample n individuals and lookgenotype,genotypeandgenotype.

Practice:

  • Write a short MCMC program to sample the joint distribution of F and P.
sampler = function(){ f[1] = fstartval p[1] = pstartval for(i in 2:niter){ currentf = f[i-1] currentp = p[i-1] newf = Currentf + newp = currentp +} return(list(f=f,p=p)) # return "list" containing two elements named f and p}Copy the code
  • Use this sample to obtain point estimates of F and P (e.g., using a posterior mean) and interval estimates of f and P (e.g., 90% posterior confidence intervals), data:

 

Appendix: GIBBS sampling

You can also solve this problem with the Gibbs sampler

To do this, you will want to represent the model using the following “latent variables” :

Adding zi yields the same model as above:

 


Most welcome insight

1. Use R language for Metroplis-in-Gibbs sampling and MCMC operation

2. Bayesian model of MCMC sampling for Stan probabilistic programming in R language

3.R language implements Metropolis — Hastings algorithm and Gibbs sampling in MCMC

4. Bayesian analysis markov chain Monte Carlo method (MCMC) sampling

5. Block Gibbs Sampling Bayesian multiple linear regression in R language

6. Simple Bayesian linear regression simulation analysis of Gibbs sampling in R language

7.R language uses Rcpp to speed up metropolis-Hastings sampling to estimate the parameters of bayesian logistic regression models

8.R language uses metropolis-Hasting sampling algorithm for logistic regression

9. Har-rv model in R language based on mixed data sampling (MIDAS) regression predicts GDP growth