Generalized Additive Model (GAM)

 

· The basics
· Smoothing plot
· Degrees of freedom
· Distributions and link functions
· Sample input window
· Smooth conditioned on factors
· Sample output
· Explanation of output

 

 

The Basics

 

GAM (generalized additive model) keeps the additive structure of the GLM, g(μ) = β0 + β1*X1 + ... + βm*Xm, where μ = E(Y) and g is the link function, but replaces each linear term βi*Xi with fi(Xi), where fi is a non-parametric function of the predictor Xi: g(μ) = β0 + f1(X1) + ... + fm(Xm). Instead of a single coefficient for each variable (additive term) in the model, an unspecified (non-parametric) function is estimated for each predictor.
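As a minimal sketch (Python, standard library only), the difference between the two models is just what each term is allowed to be. The component functions below are made up for illustration; in a real GAM they are estimated from the data, and the logit link is only one of the link choices described later.

```python
import math
from typing import Callable, Sequence

def additive_predictor(beta0: float,
                       smooths: Sequence[Callable[[float], float]],
                       x: Sequence[float]) -> float:
    """eta = beta0 + f1(x1) + ... + fm(xm): one function per predictor."""
    return beta0 + sum(f(xi) for f, xi in zip(smooths, x))

def inverse_logit(eta: float) -> float:
    """Mean of a binomial model with logit link: mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

# GLM special case: every f_i is a straight line beta_i * x_i.
glm_terms = [lambda x1: 0.8 * x1, lambda x2: -0.3 * x2]

# GAM: each f_i may have any smooth shape. These curves are hypothetical,
# standing in for functions that a fitted model would estimate from data.
gam_terms = [lambda x1: math.sin(x1), lambda x2: 0.1 * (x2 - 25.0)]

eta_glm = additive_predictor(-0.5, glm_terms, [2.0, 27.0])
eta_gam = additive_predictor(-0.5, gam_terms, [2.0, 27.0])
mu_gam = inverse_logit(eta_gam)  # predicted mean (probability) on the response scale
```

The same `additive_predictor` covers both models; only the functions passed in change.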

Smoothing plot

The main result of interest is how the predictors are related to the dependent variable. For each predictor, a scatter plot can be drawn that shows the estimated smooth function of the predictor together with the partial residuals, i.e., the residuals after removing the effects of all other predictor variables, plotted against the predictor values. Below is a sample smoothing plot of diastolic blood pressure (DBP) versus body mass index (BMI).

 

This plot allows us to evaluate the nature of the relationship between BMI and the residual (adjusted) DBP values.
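Partial residuals are also the core of backfitting, one classic way to estimate the fi: each fi is re-estimated in turn by smoothing the partial residuals for Xi against Xi. The sketch below assumes a Gaussian response with identity link and, to stay short, uses a simple running-mean smoother in place of a cubic spline; it illustrates the idea and is not necessarily how Empower fits the model.

```python
from typing import List, Tuple

def running_mean_smooth(x: List[float], r: List[float], span: int = 5) -> List[float]:
    """Smooth r against x by averaging r over the `span` neighbours nearest in x-order.
    (A simple stand-in for the cubic spline smoother, chosen for brevity.)"""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    half = span // 2
    fitted = [0.0] * n
    for pos, i in enumerate(order):
        window = order[max(0, pos - half):min(n, pos + half + 1)]
        fitted[i] = sum(r[j] for j in window) / len(window)
    return fitted

def backfit(X: List[List[float]], y: List[float],
            iters: int = 20, span: int = 5) -> Tuple[float, List[List[float]]]:
    """Fit y ~ beta0 + f_1(X[0]) + ... + f_m(X[m-1]); X holds one list per predictor."""
    n, m = len(y), len(X)
    beta0 = sum(y) / n
    f = [[0.0] * n for _ in range(m)]
    for _ in range(iters):
        for j in range(m):
            # Partial residuals for X_j: remove the intercept and every other f_k.
            partial = [y[i] - beta0 - sum(f[k][i] for k in range(m) if k != j)
                       for i in range(n)]
            fj = running_mean_smooth(X[j], partial, span)
            centre = sum(fj) / n
            f[j] = [v - centre for v in fj]  # centre f_j so beta0 stays identifiable
    return beta0, f

# Toy data (no noise): a curved effect of x1 plus a linear effect of x2.
x1 = [i / 10 for i in range(50)]
x2 = [((i * 7) % 50) / 10 for i in range(50)]
y = [a * a + 2 * b for a, b in zip(x1, x2)]
beta0, (f1, f2) = backfit([x1, x2], y)
```

Plotting `partial` (the last partial residuals for a predictor) against that predictor, with its `f` overlaid, gives a plot of the kind described above.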

Many of the standard result statistics computed by Generalized Additive Models are similar to those customarily reported by linear or nonlinear model fitting procedures. For example, predicted and residual values for the final model can be computed, and various graphs of the residuals can be displayed to help identify possible outliers. Refer also to the description of the residual statistics computed by Generalized Linear/Nonlinear Models for details.

Degrees of Freedom

GAM replaces each βi*Xi with fi(Xi), a cubic spline smoother for Xi. Estimating a single parameter βi uses one degree of freedom, but it is not immediately clear how many degrees of freedom are used by estimating a cubic spline smoother for each variable.

Intuitively, a smoother can range from very smooth to very rough. At one extreme, a simple straight line is very smooth and requires estimating only a single slope parameter, i.e., one degree of freedom is used to fit the smoother; at the other extreme, a very "non-smooth" curve could be forced through every data point, in which case approximately as many degrees of freedom would be used up as there are points in the plot.

Generalized Additive Models allows you to specify the degrees of freedom for the cubic spline smoother: the fewer degrees of freedom you specify, the smoother the cubic spline fit to the partial residuals and, typically, the worse the overall fit of the model.
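For a linear smoother, whose fitted values are ŷ = S·y, the effective degrees of freedom are commonly defined as the trace of S. The sketch below uses a Whittaker-type smoother, S = (I + λ·DᵀD)⁻¹ with D the second-difference matrix, as a rough discrete analogue of a cubic smoothing spline (an assumption for illustration, not the exact smoother Empower uses). With λ = 0 it passes through every point and uses n degrees of freedom; as λ grows it approaches a straight line and uses 2 (this standalone smoother fits its own intercept as well as the slope).

```python
from typing import List

def second_diff_penalty(n: int) -> List[List[float]]:
    """Return P = D^T D, where D is the (n-2) x n second-difference matrix."""
    P = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        row = {r: 1.0, r + 1: -2.0, r + 2: 1.0}  # non-zero entries of row r of D
        for a, va in row.items():
            for b, vb in row.items():
                P[a][b] += va * vb
    return P

def invert(A: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (fine for small matrices)."""
    n = len(A)
    M = [A[i][:] + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def effective_df(n: int, lam: float) -> float:
    """edf = trace(S) for the penalized smoother S = (I + lam * D^T D)^(-1)."""
    P = second_diff_penalty(n)
    A = [[(1.0 if i == j else 0.0) + lam * P[i][j] for j in range(n)]
         for i in range(n)]
    S = invert(A)
    return sum(S[i][i] for i in range(n))

# Larger penalties give smoother fits and fewer effective degrees of freedom.
for lam in (0.0, 0.1, 1.0, 10.0, 1e6):
    print(f"lambda = {lam:>9}: edf = {effective_df(20, lam):.2f}")
```

Specifying a degrees-of-freedom value amounts to picking the penalty whose trace matches it.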

By default, GAM chooses the appropriate smoothness for each applicable model term automatically, by minimizing a prediction error criterion: Generalized Cross Validation (GCV) or Generalized Approximate Cross Validation (GACV).
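GCV scores a smoothness setting by trading goodness of fit against the degrees of freedom used: GCV = n·RSS / (n − tr(S))², where tr(S) is the effective degrees of freedom of the smoother. A minimal sketch, assuming a Gaussian response and a running-mean smoother whose span plays the role of the smoothing parameter (a stand-in for the cubic spline); GACV is not shown.

```python
import math
from typing import List, Tuple

def gcv_score(y: List[float], fitted: List[float], trace: float) -> float:
    """GCV = n * RSS / (n - tr(S))^2 for a linear smoother with fitted = S y."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, fitted))
    return n * rss / (n - trace) ** 2

def running_mean_fit(y: List[float], span: int) -> Tuple[List[float], float]:
    """Running mean over `span` neighbours (y assumed already ordered by x).
    Returns the fitted values and tr(S), using S_ii = 1 / (size of window i)."""
    n, half = len(y), span // 2
    fitted, trace = [], 0.0
    for i in range(n):
        window = y[max(0, i - half):min(n, i + half + 1)]
        fitted.append(sum(window) / len(window))
        trace += 1.0 / len(window)
    return fitted, trace

# Toy data ordered by x: a smooth signal plus a fixed, deterministic wiggle.
x = [i / 10 for i in range(60)]
y = [math.sin(v) + 0.3 * (-1) ** i for i, v in enumerate(x)]

scores = {}
for span in (3, 5, 7, 9, 15, 25):
    fitted, tr = running_mean_fit(y, span)
    scores[span] = gcv_score(y, fitted, tr)
best_span = min(scores, key=scores.get)  # smoothness chosen by minimizing GCV
```

A very small span fits the data closely but uses many degrees of freedom, which the (n − tr(S))² denominator penalizes.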

Distributions and Link Functions

Like GLM, Generalized Additive Models allows you to choose from a wide variety of distributions for the dependent variable and link functions for the effects of the predictor variables on the dependent variable.

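The following table lists commonly used distributions with their link functions and the corresponding mean functions (inverse links). Here Xβ denotes the linear predictor; in a GAM it is replaced by the additive predictor β0 + f1(X1) + ... + fm(Xm).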

Distribution            Link name         Link function          Mean function
Normal                  Identity          Xβ = μ                 μ = Xβ
Exponential, Gamma      Inverse           Xβ = μ^(-1)            μ = (Xβ)^(-1)
Inverse Gaussian        Inverse squared   Xβ = μ^(-2)            μ = (Xβ)^(-1/2)
Poisson                 Log               Xβ = ln(μ)             μ = exp(Xβ)
Binomial, Multinomial   Logit             Xβ = ln(μ/(1 - μ))     μ = exp(Xβ)/(1 + exp(Xβ)) = 1/(1 + exp(-Xβ))
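The table can be checked numerically: each mean function is the inverse of its link. In this sketch the dictionary keys are illustrative labels, not option names from Empower; the inverse and inverse-squared links assume μ > 0, and the logit assumes 0 < μ < 1.

```python
import math

# label: (link g(mu) = eta, mean function mu = g^-1(eta)); labels are illustrative
LINKS = {
    "identity":        (lambda mu: mu,        lambda eta: eta),
    "inverse":         (lambda mu: 1.0 / mu,  lambda eta: 1.0 / eta),
    "inverse_squared": (lambda mu: mu ** -2,  lambda eta: eta ** -0.5),
    "log":             (math.log,             math.exp),
    "logit":           (lambda mu: math.log(mu / (1.0 - mu)),
                        lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

# Round trip: applying the mean function to the link should return mu.
for name, (link, inverse) in LINKS.items():
    mu = 0.3
    eta = link(mu)
    print(f"{name:16s} eta = {eta:8.4f}  back to mu = {inverse(eta):.4f}")
```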

 

After you specify the dependent variable (Y), Empower automatically checks the variable type. If it is a continuous variable, Empower uses the normal distribution with the identity link function by default; if it is a dichotomous variable, Empower uses the binomial distribution with the logit link function by default.
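A minimal sketch of that default rule, assuming "dichotomous" means exactly two distinct non-missing values. Empower's actual type check is not documented here, so `default_family` is a hypothetical helper, not part of Empower.

```python
from typing import Iterable, Optional, Tuple

def default_family(values: Iterable[Optional[float]]) -> Tuple[str, str]:
    """Hypothetical version of the default described above:
    a dichotomous (two-valued) Y gets binomial/logit; otherwise normal/identity."""
    distinct = {v for v in values if v is not None}  # ignore missing values
    if len(distinct) == 2:
        return ("binomial", "logit")
    return ("normal", "identity")

print(default_family([0, 1, 1, None, 0]))   # a 0/1 outcome
print(default_family([22.1, 25.3, 19.8]))   # a continuous outcome such as BMI
```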

 

 

Below is the sample input window

 

 

Smooth conditioned on factors

 

In the above example, a separate smoother of AGE is estimated for males and for females, while the coefficients of all other variables are shared by males and females. This differs from a stratified analysis, which gives not only a different smoother of AGE for each sex but also different regression coefficients for the other variables (SMOKE, OCCUPATION, …).
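A sketch of how such a model forms a prediction, matching the formula `s(AGE, k = 4, by = factor(SEX)) + SEX + ...` in the sample output: the linear coefficients are copied from that output and used for both sexes, while the AGE smooth is looked up by SEX. The two smooth functions are hypothetical (the real ones are only shown as plots). The separate SEX term carries the overall level difference between the sexes because, at least in R's mgcv, smooths with a factor `by` variable are centred.

```python
import math
from typing import Callable, Dict

# Parametric coefficients copied from the sample output; shared by both sexes.
BETA = {"(Intercept)": 20.8405, "SEX": 0.4770, "SMOKE.NEW": -0.3457,
        "OCCU": 0.1326, "ALH": 0.3554, "PSMK": -0.3284}

# Hypothetical AGE smooths, one per SEX level (illustrative shapes only).
SMOOTH_BY_SEX: Dict[int, Callable[[float], float]] = {
    1: lambda age: 0.5 * math.tanh((age - 30.0) / 10.0),  # male (SEX=1)
    2: lambda age: 0.2 * math.sin(age / 8.0),             # female (SEX=2)
}

def predict_bmi(row: Dict[str, float]) -> float:
    """BMI = intercept + shared linear terms + the AGE smooth for this row's SEX."""
    eta = BETA["(Intercept)"]
    for name in ("SEX", "SMOKE.NEW", "OCCU", "ALH", "PSMK"):
        eta += BETA[name] * row[name]
    return eta + SMOOTH_BY_SEX[int(row["SEX"])](row["AGE"])

male = {"SEX": 1, "SMOKE.NEW": 0, "OCCU": 1, "ALH": 0, "PSMK": 0, "AGE": 35.0}
female = {**male, "SEX": 2}
```

Because SMOKE.NEW has one shared coefficient, switching it from 0 to 1 changes the prediction by -0.3457 for both sexes, whereas a stratified analysis would estimate that effect separately in each sex.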

 

Below is the sample output of the above model:

 

Generalized additive model

Family: gaussian
Link function: identity

Formula:
BMI ~ s(AGE, k = 4, by = factor(SEX)) + SEX + SMOKE.NEW + OCCU +
    ALH + PSMK

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  20.8405     0.4931  42.264   <2e-16 ***
SEX           0.4770     0.2713   1.758   0.0792 .
SMOKE.NEW    -0.3457     0.2569  -1.345   0.1790
OCCU          0.1326     0.1883   0.704   0.4815
ALH           0.3554     0.2764   1.286   0.1990
PSMK         -0.3284     0.2020  -1.626   0.1045
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                      edf Ref.df     F p-value
s(AGE):factor(SEX)1 1.894  2.294 3.263  0.0325 *
s(AGE):factor(SEX)2 2.406  2.759 0.953  0.4092
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.0322   Deviance explained =  4.6%
GCV score = 5.0186  Scale est. = 4.9394    n = 653

 

Explanation of the output:

 

In the above output, the “Parametric coefficients” section lists the regression coefficients (the βs) and their significance tests for the non-smooth (parametric) terms.

 

For example:

 

The β for SMOKE.NEW is -0.3457, which means that smokers (SMOKE.NEW=1) have a BMI 0.3457 kg/m² lower than non-smokers (SMOKE.NEW=0), holding the other terms in the model fixed. The p-value is 0.1790, so the coefficient is not significantly different from 0.
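The t value and p-value in each row can be reproduced from the estimate and its standard error. A sketch for the SMOKE.NEW row, assuming a normal approximation to the t reference distribution (reasonable with roughly 640 residual degrees of freedom), so the p-value agrees with the printed 0.1790 to about three decimals rather than exactly:

```python
import math

estimate, std_error = -0.3457, 0.2569   # SMOKE.NEW row of the sample output
t_value = estimate / std_error          # the statistic printed as "t value"

# Two-sided p-value under a normal approximation: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
p_approx = math.erfc(abs(t_value) / math.sqrt(2.0))

# Approximate 95% confidence interval for the smoker vs non-smoker difference (kg/m²).
ci = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)
print(f"t = {t_value:.3f}, p = {p_approx:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval contains 0, which is consistent with the non-significant p-value.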

 

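The “Approximate significance of smooth terms” section lists the smoother of AGE for males (SEX=1) and for females (SEX=2). The estimated degrees of freedom (edf) are 1.894 for males and 2.406 for females, and the p-values are 0.0325 and 0.4092, respectively, so the AGE smoother is significant at the 0.05 level for males but not for females.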
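The two numbers in the last line of the output are linked. Assuming the usual Gaussian-case definitions GCV = n·D/(n − τ)² and scale estimate = D/(n − τ), where D is the residual sum of squares and τ the total effective degrees of freedom (parametric coefficients plus smooth edfs), the GCV score can be recovered from the scale estimate:

```python
n = 653               # observations ("n = 653")
scale_est = 4.9394    # "Scale est." = D / (n - tau)

# Total effective degrees of freedom tau: 6 parametric coefficients
# (Intercept, SEX, SMOKE.NEW, OCCU, ALH, PSMK) plus the two smooth edfs.
tau = 6 + 1.894 + 2.406

# GCV = n * D / (n - tau)^2 = scale_est * n / (n - tau)
gcv = scale_est * n / (n - tau)
print(f"tau = {tau:.3f}, GCV = {gcv:.4f}")
```

This reproduces the printed GCV score of 5.0186, which illustrates how the edf values enter the smoothness criterion.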

 

Besides the model output, Empower also produces the following plots:

 

(1) Smoothing plots for males and females.

(2) A plot of the difference between the male and female smooths.