Original link: tecdat.cn/?p=23509

Original source: Tuoduan Data Tribe (WeChat official account)

We use generalized additive models (GAMs) in our research work. The mgcv package is an excellent tool for specifying, fitting, and visualizing GAMs, including on very large data sets.

This article gives an overview of what can currently be done with GAMs in mgcv: fitting a model, plotting its estimated smooth functions, and checking it with diagnostic plots.

First, load mgcv:

library('mgcv')

A well-known example data set

The data in dat come from a simulated example that is widely used in GAM studies. They contain four covariates, x0 to x3, each related to the response with a different degree of non-linearity.
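The text does not show how dat is built. Assuming it is the classic four-term example of Gu and Wahba, mgcv's gamSim() can simulate it; the seed, sample size (n = 400) and noise scale below are illustrative choices, not taken from the original.

```r
library(mgcv)

# Assumption: dat is the Gu & Wahba four-term example simulated by gamSim()
set.seed(1)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)

# Response y and covariates x0 to x3; gamSim() also returns the true
# component functions (f, f0 to f3), which are handy for checking a fit
head(dat[, c("y", "x0", "x1", "x2", "x3")])
```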

We want to model these relationships with splines that approximate the true relationships between the covariates and the response. To fit the additive model, we use

mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

mgcv provides a summary() method for extracting information about the fitted GAM.
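A minimal sketch of what summary() returns, on data simulated with gamSim() (an assumption; the seed and size are arbitrary). The s.table and r.sq components are standard parts of the object returned by mgcv's summary method for GAMs.

```r
library(mgcv)

# Assumed data: simulated four-term example (seed and size are arbitrary)
set.seed(1)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

sm <- summary(mod)
sm            # full printed summary
sm$s.table    # one row per smooth: edf, Ref.df, F statistic, p-value
sm$r.sq       # adjusted R-squared
```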

The gam.check() function checks whether each smooth in the model uses a sufficient number of basis functions. It does more than that, though: besides this basis-dimension check, it prints other diagnostics and draws four model diagnostic plots, so you may not want to call it when you only need one of these outputs.
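If you only want the basis-dimension check without any plots, mgcv's k.check() reports it as a table. The sketch below assumes simulated example data. A k-index well below 1 together with a small p-value is the usual hint that k should be increased for that smooth; the refit with k = 20 for x2 is an arbitrary illustration of how that is done, not a recommendation for this data.

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)   # assumed example data
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

# Basis-dimension check only: columns k', edf, k-index and p-value, one row per smooth
kc <- k.check(mod)
kc

# If a smooth looked under-resourced, refit it with a larger basis, e.g. for x2
mod_k <- gam(y ~ s(x0) + s(x1) + s(x2, k = 20) + s(x3), data = dat, method = "REML")
```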

Plotting smooth functions

To visualize fitted GAMs, mgcv provides the plot.gam() method and the vis.gam() function. To plot the four estimated smooth functions in our model, we use

plot(mod)

This draws the estimated smooth function for each term in mod, with a confidence band around each estimate.

When a model contains several smooths, plot() draws one panel per smooth. By default the panels are drawn one after another; setting pages = 1 arranges them all on a single page.
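A short sketch of both layouts, on assumed simulated data. shade and residuals are plot.gam() arguments that shade the confidence band and add partial residuals; the one-row layout simply uses base graphics' par(mfrow).

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)   # assumed example data
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

# All four smooths on one page, with shaded confidence bands and partial residuals
plot(mod, pages = 1, shade = TRUE, residuals = TRUE)

# Alternatively, arrange the four panels in a single row with base graphics
op <- par(mfrow = c(1, 4))
plot(mod, shade = TRUE)
par(op)
```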

Extracting smooth function data

If you want the data behind these plots, for example to draw them yourself, you can extract the estimated contribution of an individual smooth, here s(x1), together with its standard errors:

predict(mod, type = "terms", terms = "s(x1)", se.fit = TRUE)
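To get the estimate of s(x1) as a tidy table on a regular grid, which is convenient for redrawing the curve, predict() with type = "terms" can be used on new data. A sketch on assumed simulated data: the grid size and the fixed values of the other covariates are arbitrary, and those covariates do not change the s(x1) column. The band of plus or minus two standard errors is an approximate 95% interval.

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)   # assumed example data
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

# Regular grid over x1; the other covariates must be supplied but do not
# change the s(x1) contribution, so they are fixed at arbitrary values
grid <- data.frame(x0 = 0.5,
                   x1 = seq(min(dat$x1), max(dat$x1), length.out = 100),
                   x2 = 0.5, x3 = 0.5)
p <- predict(mod, newdata = grid, type = "terms", terms = "s(x1)", se.fit = TRUE)

# Estimate of s(x1) with an approximate 95% band (estimate +/- 2 standard errors)
sm_x1 <- data.frame(x1    = grid$x1,
                    est   = p$fit[, 1],
                    lower = p$fit[, 1] - 2 * p$se.fit[, 1],
                    upper = p$fit[, 1] + 2 * p$se.fit[, 1])
head(sm_x1)
```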

Diagnostic plots

The model diagnostic plots are produced by gam.check():

gam.check(mod)

The result is a 2 × 2 array of diagnostic plots: a Q-Q plot of the model residuals (upper left), residuals versus the linear predictor (upper right), a histogram of the residuals (lower left), and observed response values versus fitted values (lower right).
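To see what goes into each panel, here is a rough base-graphics recreation built from qq.gam(), residuals() and fitted(). It is a sketch on assumed simulated data, not gam.check()'s own code, so titles and styling differ.

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)   # assumed example data
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

res <- residuals(mod, type = "deviance")
op <- par(mfrow = c(2, 2))
qq.gam(mod)                                              # upper left: Q-Q plot of residuals
plot(mod$linear.predictors, res,                         # upper right
     xlab = "linear predictor", ylab = "residuals")
hist(res, xlab = "residuals", main = "Histogram of residuals")   # lower left
plot(fitted(mod), mod$y,                                 # lower right
     xlab = "fitted values", ylab = "response")
par(op)
```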

Each of the four panels can also be drawn on its own. For example, qq.gam(mod) draws the Q-Q plot shown in the upper-left panel. Setting its rep argument to a positive number makes it obtain the reference quantiles by simulating data from the fitted model (Augustin et al., 2012):

qq.gam(mod, rep = 100)

The result is a Q-Q plot of the residuals whose reference quantiles come from data simulated from the model.
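The simulation idea can be sketched by hand for a Gaussian model, where the deviance residual is just the response minus the fitted mean: simulate many response vectors from the fitted model, compute their residuals against the same fitted values (this sketch does not refit the model), sort each set, and average across replicates. This is a simplified illustration under those assumptions (Gaussian family, 100 replicates, simulated data), not mgcv's implementation of qq.gam().

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)   # assumed example data
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

mu <- fitted(mod)
n <- length(mu)
n_rep <- 100

# Each column: sorted residuals of one response vector simulated from the
# fitted Gaussian model (the model is not refitted)
sim_res <- replicate(n_rep, sort(rnorm(n, mean = mu, sd = sqrt(mod$sig2)) - mu))

# Averaging the sorted residuals across replicates gives the reference quantiles
ref_q <- rowMeans(sim_res)
obs <- sort(residuals(mod, type = "deviance"))

plot(ref_q, obs, xlab = "reference quantiles (simulated)",
     ylab = "sorted deviance residuals")
abline(0, 1, col = "red")   # points close to this line support the Gaussian assumption
```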

plot() can also handle many of the more specialized smooths currently available, for example a smooth of two variables:

plot(mod)  # mod is assumed here to contain a two-dimensional smooth such as s(x, z)

By default, plot() draws a two-dimensional smooth as a contour plot.
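A self-contained sketch of such a model. The data, the surface sin(2πx)cos(2πz) and the basis size k = 40 are illustrative assumptions, not from the original. It shows the default contour plot, the heat-map style selected with plot.gam()'s scheme argument, and a perspective view from vis.gam().

```r
library(mgcv)

# Hypothetical data: a smooth surface in two covariates plus Gaussian noise
set.seed(3)
n <- 1000
d2 <- data.frame(x = runif(n), z = runif(n))
d2$y <- sin(2 * pi * d2$x) * cos(2 * pi * d2$z) + rnorm(n, sd = 0.2)

# Isotropic thin plate regression spline of x and z (k = 40 is an arbitrary choice)
mod2 <- gam(y ~ s(x, z, k = 40), data = d2, method = "REML")

plot(mod2)              # default: contour plot of the estimated surface
plot(mod2, scheme = 2)  # heat map with contour lines
vis.gam(mod2, view = c("x", "z"), plot.type = "persp", theta = 30)  # perspective view
```

When the two covariates are measured on different scales, a tensor-product smooth such as te(x, z) is usually preferred over the isotropic s(x, z).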

Factor-smooth interaction terms fit one smooth curve per factor level and act like a random intercept and random slope for a smooth curve. plot() draws all of these curves in a single panel, using colour to tell the levels apart.

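# Simulated data; lines marked "assumed" are not recoverable from the original snippet
set.seed(0)                                      # assumed
n <- 500; nf <- 10                               # assumed sample size and number of levels
f0 <- function(x) 2 * sin(pi * x)                # assumed: f0 is used below but not shown
f1 <- function(x, a = 2, b = -1) exp(a * x) + b
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
fac <- sample(1:nf, n, replace = TRUE)           # assumed: factor level of each observation
x0 <- runif(n); x1 <- runif(n); x2 <- runif(n)   # assumed covariates
a <- rnorm(nf) * 0.2 + 2; b <- rnorm(nf) * 0.5   # assumed level-specific parameters of f1
f <- f0(x0) + f1(x1, a[fac], b[fac]) + f2(x2)
fac <- factor(fac)
y <- f + rnorm(n) * 2
# bs = "fs": one smooth curve of x1 per level of fac
mod <- gam(y ~ s(x0) + s(x1, fac, bs = "fs", k = 5) + s(x2, k = 20),
           data = data.frame(y, x0, x1, x2, fac), method = "REML")
plot(mod, pages = 1)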

The result: plots for a more complex GAM that includes a factor-smooth interaction term (bs = "fs").
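The level-specific curves of a factor-smooth term can also be pulled out with predict() and drawn in one panel, one colour per level. A sketch on hypothetical data: a simplified variant of the f1 curves above with only the x1 effect, where each of 10 levels has its own curve exp(a * x1) + b. Sample size, noise level, grid and colours are arbitrary choices.

```r
library(mgcv)

# Hypothetical data: one curve exp(a * x1) + b per level of fac
set.seed(0)
n <- 500; nf <- 10
fac <- factor(sample(1:nf, n, replace = TRUE))
x1 <- runif(n)
a <- rnorm(nf) * 0.2 + 2
b <- rnorm(nf) * 0.5
y <- exp(a[fac] * x1) + b[fac] + rnorm(n) * 0.5   # a[fac] indexes by level code
d <- data.frame(y, x1, fac)

# Factor-smooth interaction: one penalized smooth of x1 per level of fac
m_fs <- gam(y ~ s(x1, fac, bs = "fs", k = 5), data = d, method = "REML")

# Evaluate the term for every level on a common x1 grid
x_grid <- seq(0, 1, length.out = 50)
grid <- expand.grid(x1 = x_grid, fac = levels(d$fac))
grid$est <- predict(m_fs, newdata = grid, type = "terms")[, 1]

# expand.grid varies x1 fastest, so column j holds the curve for level j
curves <- matrix(grid$est, nrow = length(x_grid))
matplot(x_grid, curves, type = "l", lty = 1, col = rainbow(nf),
        xlab = "x1", ylab = "s(x1, fac)")
```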

What else can you do?

Beyond these examples, plot() handles most of the smooths that mgcv can estimate, including varying-coefficient ("by") smooths with factor or continuous by variables, random-effect smooths (bs = "re"), tensor-product smooths of two variables, and models with parametric terms (the latter drawn when all.terms = TRUE).
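A compact sketch of several of these term types, on hypothetical simulated data (three groups, arbitrary effect sizes and noise). It combines a parametric factor term, a by-factor smooth, and a tensor-product smooth in one model, and a random-intercept smooth in a second model.

```r
library(mgcv)

# Hypothetical data: three groups with their own x1 curves and shifts,
# plus a joint effect of x0 and x2
set.seed(2)
n <- 400
d <- data.frame(x0 = runif(n), x1 = runif(n), x2 = runif(n),
                fac = factor(sample(c("a", "b", "c"), n, replace = TRUE)))
level_shift <- c(a = 0, b = 1, c = -1)
d$y <- level_shift[as.character(d$fac)] +
  ifelse(d$fac == "a", sin(2 * pi * d$x1), cos(2 * pi * d$x1)) +
  d$x0 * d$x2 + rnorm(n, sd = 0.3)

# Parametric factor term, one smooth of x1 per level ("by" smooth),
# and a tensor-product smooth of x0 and x2
m_by <- gam(y ~ fac + s(x1, by = fac) + te(x0, x2), data = d, method = "REML")
plot(m_by, pages = 1, all.terms = TRUE)   # all.terms = TRUE also draws the fac term

# Random-effect smooth: a random intercept for each level of fac
m_re <- gam(y ~ s(x1) + s(fac, bs = "re"), data = d, method = "REML")
plot(m_re, pages = 1)
```

The by-factor smooths are paired with the parametric fac term because each by smooth is centred, so the group means have to come from somewhere else in the model.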

References

Augustin, N. H., Sauleau, E.-A., and Wood, S. N. (2012). On quantile quantile plots for generalized linear models. Computational Statistics & Data Analysis 56, 2404-2409. doi: 10.1016/j.csda.2012.01.026.

