Original link:http://tecdat.cn/?p=23099
In this paper, we discuss how to conduct Bayesian analysis in R. We introduce Bayesian analysis, an example of the number of goals scored in professional football.
model
First, we think that the number of goals scored in professional football comes from distributionWhere θ is the average number of goals scored. Now suppose we use the opinion of a football expert to get the average number of goals scored in a football match, parameter θ, we get:.
Curve (dnorm(x, 2.5, 0.2), from = -2, to = 8,...)
What do we want to know?
In this case, we want to know what the posterior distribution of θ looks like, and what the average value of this distribution is. To do this, we will analyze three scenarios:
We have one observation, x equals 1, which comes from the distributionThe overall.
We have 3 observed values x=c(1,3,5), derived from one withThe population of the distribution.
We have 10 observations x=c(5,4,3,4,3,2,7,2,4,5) from a sample withThe population of the distribution.
Theoretical method
Here, I want to tell you how Bayesian analysis is done. First, we have a likelihood function from a Poisson distribution of a population with an unknown parameter θ.
We know that the prior distribution of parameter θ P (θ) is given by the following formula.
Finally, the posterior distribution of θ is.
The calculation method of constant C is as follows.
And then check distribution E (theta | x) average value given by the following formula.
Calculation method
Here you will learn how to use Monte Carlo simulation in R to answer the questions posed above. For all three cases, you will follow these steps.
1. Define data
First, you need to define the data according to the schema.
X is less than minus 1 # 1
2. Calculate constant C
Now use the Monte Carlo simulation to compute the integral. To do this, it is necessary to generate N=10000 values θi from the prior distribution, and in the likelihood functionEvaluate them in. Finally, in order to get C, these values are averaged. The code in R is as follows.
Rnorm (N =N, mean = 2.5, sd = 0.2) # Prod (dpois(x=x, lambda = theta)) # likelihood function
3. Search for a posterior distribution
After calculating C, you can get the posterior distribution, as shown below.
fvero(theta) * dnorm(x=theta) / C
4. Calculate the mean of the posterior distribution
And finally you can use the Monte Carlo simulation to compute the integral to get the average of the posterior distribution.
integral <- mean(aux)
posterior <- integral/C
The results of
As mentioned earlier, the code described above is used in all three cases; the only thing that varies is X. In this section, we will show a graph for each case containing the prior and posterior distributions of θ, the mean of the posterior distributions (dotted blue line), and the observed values (pink dots).
The first case
Curve (dnorm(x, 2.5, 0.2), col=4,,x=x, y=rep(0, length(x)), line,v = mposterior,legend=c("topright", legend=c(" posterior "), "A priori"),
The second case
The third case
conclusion
From the results we can draw the conclusion that when we have very little observational data, as shown in Figure 1 and Figure 2, the posterior distribution will tend to be similar to the prior distribution due to lack of sample evidence. Conversely, when we have a large number of observed data, as shown in Figure 3, the posterior distribution will deviate from the prior distribution because the data will have a greater impact.
I hope you enjoyed this article and learned about Bayesian statistics. I encourage you to run this program with other distributions.
The most popular insight
1. Application case of multiple Logistic Logistic regression in R language
2. Implementation of Panel Smooth Transfer Regression (PSTR) analysis case
3. Partial least squares regression (PLSR) and principal component regression (PCR) in MATLAB
4. Case study of R language Poisson regression model
5. Hosmer-Lemeshow goodness of fit test in R language regression
6. Realization of Lasso regression, Ridge Ridge regression and Elastic Net model in R language
7. Logistic Logistic regression was realized in R language
8. Python uses linear regression to predict stock prices
9. How does R language calculate IDI and NRI indexes in survival analysis and Cox regression