Original link: tecdat.cn/?p=22702

Original source: Tuoduan Data Tribe WeChat official account

 

Abstract

Bayesian quantile regression has received much attention in the recent literature. In this paper, we implement Bayesian coefficient estimation and variable selection in quantile regression (RQ), with Bayesian LASSO and adaptive LASSO penalties. Further modeling capabilities for summarizing results, plotting trace plots, posterior histograms, autocorrelation plots, and quantile plots are also included.

Introduction

Quantile regression (RQ), proposed by Koenker and Bassett (1978), models the conditional quantiles of the outcome of interest as a function of predictors. Since its introduction, quantile regression has been a topic of great theoretical interest and has been widely applied in many research fields, such as econometrics, marketing, medicine, ecology, and survival analysis (Neelon et al., 2015; Davino et al., 2013; Hao and Naiman, 2007). Suppose we have an observed sample {(xi, yi); i = 1, 2, …, n}, where yi denotes the dependent variable and xi denotes the k-dimensional vector of covariates.
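For reference, the τ-th conditional quantile of yi given xi is modeled as a linear function of the covariates, and the classical RQ estimator of Koenker and Bassett (1978) solves the check-loss minimization

\[
Q_{y_i}(\tau \mid x_i) = x_i^{\top}\beta(\tau),
\qquad
\hat{\beta}(\tau) = \arg\min_{\beta}\sum_{i=1}^{n}\rho_{\tau}\big(y_i - x_i^{\top}\beta\big),
\qquad
\rho_{\tau}(u) = u\big(\tau - I(u < 0)\big),
\]

where I(·) is the indicator function.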

Bayesian quantile regression

Tobit RQ provides a method for describing the relationship between a non-negative dependent variable and a vector of covariates, and it can be expressed as a quantile regression model in which the dependent variable is not fully observed. There is a considerable literature on Tobit quantile regression models; see Powell (1986), Portnoy (2003), Portnoy and Lin (2010), and Kozumi and Kobayashi (2011) for an overview. Consider the model

\[
y_i = \max\{y^0,\, y_i^*\},
\]

where yi is the observed dependent variable, yi* is the corresponding latent (unobserved) dependent variable, and y⁰ is a known censoring point. It can be shown that the RQ coefficient vector β can be consistently estimated by the solution of the following minimization problem:

\[
\min_{\beta}\sum_{i=1}^{n}\rho_{\tau}\big(y_i - \max\{y^0,\, x_i^{\top}\beta\}\big).
\]

Yu and Stander (2007) proposed a Bayesian approach to Tobit RQ, modeling the error with the asymmetric Laplace distribution (ALD) and drawing β from the posterior distribution using the Metropolis-Hastings (MH) method.
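The link between the ALD and the check loss is standard and worth stating. The ALD density with skewness parameter τ is

\[
f_{\tau}(u) = \tau(1-\tau)\exp\{-\rho_{\tau}(u)\},
\]

so the working likelihood for the regression model is

\[
L(y \mid \beta) = \tau^{n}(1-\tau)^{n}\exp\Big\{-\sum_{i=1}^{n}\rho_{\tau}\big(y_i - x_i^{\top}\beta\big)\Big\},
\]

and maximizing this likelihood is equivalent to minimizing the check loss above. Kozumi and Kobayashi (2011) exploit a location-scale mixture representation of the ALD to replace MH with a tractable Gibbs sampler.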

Real data examples

Let us illustrate with real data examples.

Immunoglobulin G data

This data set, which includes the serum concentration of immunoglobulin G (g/l) in 298 children aged 6 months to 6 years, was discussed in detail by Isaacs et al. (1983) and also used by Yu et al. (2003). To illustrate, a Bayesian quantile regression model for this data set can be fitted as follows.

rq(serum_concentration ~ age, tau = 0.5, runs = 2000)

The function provides parameter estimates and 95% credible intervals.
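As a minimal sketch of what those summaries amount to, the posterior means and 95% credible intervals can be computed directly from the matrix of MCMC draws. Here `beta_draws` is a hypothetical stand-in for the draws stored in the fitted object, simulated so the snippet runs on its own.

# Hypothetical stand-in for the sampler output: a (runs x 2) matrix of
# posterior draws for the intercept and the age coefficient.
set.seed(42)
beta_draws <- matrix(rnorm(2000 * 2, mean = c(4, 0.5)), ncol = 2, byrow = TRUE)
colnames(beta_draws) <- c("(Intercept)", "age")
post_mean <- colMeans(beta_draws)                       # point estimates
ci <- apply(beta_draws, 2, quantile, c(0.025, 0.975))   # 95% credible intervals
round(rbind(mean = post_mean, ci), 3)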

 

Plot the data and then superimpose the five fitted RQ lines on the scatter plot.

R> taus = c(0.05, 0.25, 0.5, 0.75, 0.95)
R> for (i in 1:5) {
+   fit = rq(serum_concentration ~ age, tau = taus[i], runs = 500, burn = 100)
+   abline(fit, col = i)
+ }
R> for (i in 1:5) {
+   fit = rq(serum_concentration ~ age + I(age^2), tau = taus[i], runs = 500, burn = 100)
+   curve(coef(fit)[1] + coef(fit)[2] * x + coef(fit)[3] * x^2, add = TRUE, col = i)
+ }

Figure 2: Scatter plot and RQ fits for the immunoglobulin G data.

The graph shows a scatter plot of immunoglobulin G against age for the 298 children aged 6 months to 6 years. Superimposed on the figure are the RQ lines for τ ∈ {0.05, 0.25, 0.50, 0.75, 0.95} (left) and the corresponding RQ curves (right).

Trace plots can be used to evaluate the convergence of the Gibbs sampler to the stationary distribution. Here we report only the trace plot and posterior histogram of each parameter for τ = 0.50. We use the following code

 

plot(fit, "tracehist", D = c(1, 2))

The output of the Gibbs sampler can be summarized by generating trace plots, posterior histograms, and autocorrelation plots, either singly or in combination: trace plots with histograms, trace plots with autocorrelation plots, histograms with autocorrelation plots, or all three together; the plotting function has an option for each of these combinations. In Figure 3, the trace plots of the immunoglobulin G data coefficients show that the sampler needs relatively few steps to move from one remote region of the posterior space to another. Furthermore, the histograms show that the marginal densities are indeed the expected stationary univariate normals.

 

Figure 3: Trace plots and posterior density plots of the coefficients for the immunoglobulin G data set with τ = 0.50.
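For readers without the package at hand, the same three diagnostics can be produced from a raw vector of posterior draws with base R graphics; `draws` below is a mock chain simulated purely for illustration.

# Mock chain standing in for the posterior draws of one coefficient.
set.seed(1)
draws <- as.numeric(arima.sim(model = list(ar = 0.3), n = 2000)) + 0.5
par(mfrow = c(1, 3))
plot(draws, type = "l", main = "Trace", xlab = "Iteration", ylab = "beta")
hist(draws, breaks = 30, main = "Posterior histogram", xlab = "beta")
acf(draws, main = "Autocorrelation")
par(mfrow = c(1, 1))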

Prostate cancer data

In this section, we illustrate the performance of Bayesian quantile regression on the prostate cancer data set (Stamey et al., 1989). This data set examines the relationship between the level of prostate-specific antigen on the log scale (lpsa) and eight clinical covariates in patients awaiting radical prostatectomy.

These covariates are: log cancer volume (lcavol), log prostate weight (lweight), age (age), log amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log capsular penetration (lcp), Gleason score (gleason), and percentage of Gleason scores 4 or 5 (pgg45).

In this section, we assume that the dependent variable (lpsa) has been centered at zero and that the predictors have been standardized to have mean zero. To illustrate, we consider the Bayesian LASSO RQ (method = "BLqr") at τ = 0.50. In this case, we use the following code


R> # x: matrix of the eight standardized covariates; y: centered lpsa
R> x = as.matrix(x)
R> fit = rq(y ~ x, tau = 0.5, method = "BLqr", runs = 5000, burn = 1000, thin = 1)

The model summary can be used to determine the active variables in the regression; one common rule is sketched below.
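A minimal sketch of that rule: flag a predictor as active when its 95% credible interval excludes zero. The matrix `beta_draws` is a hypothetical stand-in for the sampler output stored in the fitted object, simulated here so the snippet is self-contained.

vars <- c("lcavol", "lweight", "age", "lbph", "svi", "lcp", "gleason", "pgg45")
# Hypothetical stand-in for the (runs x 8) matrix of posterior draws.
set.seed(7)
beta_draws <- matrix(rnorm(4000 * 8), ncol = 8, dimnames = list(NULL, vars))
ci <- t(apply(beta_draws, 2, quantile, c(0.025, 0.975)))
active <- ci[, 1] > 0 | ci[, 2] < 0   # interval excludes zero
data.frame(lower = ci[, 1], upper = ci[, 2], active = active)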

The convergence of the corresponding Gibbs sampler is assessed through the trace plots and marginal posterior histograms of the generated samples. A graphical check of convergence can thus be obtained with the following code.

plot(fit, type="trace")

The results of the above code are shown in Figures 4 and 5, respectively. The trace plots in Figure 4 show that the generated samples traverse the posterior space rapidly, and the marginal posterior histograms in Figure 5 show that the conditional posterior distributions are indeed the desired stationary univariate normals.

Wheat data

Let us consider a wheat data set. This data set comes from the National Wheat Planting Development Program (2017). The wheat data consist of 584 observations on 11 variables. The dependent variable is the percentage increase in wheat yield per 2500 square meters. The covariates are urea fertilizer (U), wheat seed sowing date (Ds), wheat seeding rate (Qs), laser land-leveling technology (LT), compound fertilizer (NPK), seed-drill technology (SMT), mung bean intercropping (SC), herbicide (H), potash fertilizer (K), and trace element fertilizer (ME).

The following command gives the posterior distribution of the Tobit RQ at τ = 0.50.

 

rq(y ~ x, tau = 0.5, method = "Btqr")

Bayesian LASSO Tobit quantile regression and Bayesian adaptive LASSO Tobit quantile regression can also be fitted. At τ = 0.50, the function can be used to obtain the posterior means of the Tobit quantile regression coefficients and their 95% credible intervals.
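Under the same interface used above, those two fits would look like the following sketch. The method labels "BLtqr" and "BALtqr" are assumptions formed by analogy with the "BLqr" and "Btqr" labels shown earlier, so check the package documentation before relying on them.

# Assumed method names, by analogy with "BLqr"/"Btqr" above; verify
# against the package documentation.
rq(y ~ x, tau = 0.5, method = "BLtqr", runs = 5000, burn = 1000)    # Bayesian LASSO Tobit RQ
rq(y ~ x, tau = 0.5, method = "BALtqr", runs = 5000, burn = 1000)   # Bayesian adaptive LASSO Tobit RQ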

Conclusion

In this paper, we have described Bayesian coefficient estimation and variable selection in quantile regression (RQ). In addition, we implemented Bayesian Tobit quantile regression with LASSO and adaptive LASSO penalties. Further modeling capabilities for summarizing results, plotting trace plots, posterior histograms, autocorrelation plots, and quantile plots are also included.

References

Alhamzawi, R., K. Yu, and D. F. Benoit (2012). Bayesian adaptive Lasso quantile regression. Statistical Modelling 12(3), 279–297.

Brownlee, K. A. (1965). Statistical theory and methodology in science and engineering, Volume 150. Wiley, New York.

Davino, C., M. Furno, and D. Vistocco (2013). Quantile regression: theory and applications. John Wiley & Sons.

