Generalized Additive Model (GAM)


·         The basics
·         Smoothing plot
·         Degrees of freedom
·         Distributions and link functions
·         Sample input window
·         Smooth conditioned on factors
·         Sample output
·         Explanation of output



The Basics


GAM (generalized additive model) keeps the additive structure of the GLM, g(E[Y]) = β0 + β1*X1 + ... + βm*Xm, where g is the link function. It replaces each simple linear term βi*Xi with fi(Xi), where fi is a non-parametric function of the predictor Xi, giving g(E[Y]) = β0 + f1(X1) + ... + fm(Xm). Instead of estimating a single coefficient for each variable (additive term), an additive model estimates an unspecified (non-parametric) function for each predictor.
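The structural difference can be sketched in a few lines: both models build a linear predictor and pass it through the inverse link, but the GAM sums functions fi(Xi) where the GLM sums βi*Xi. This is a minimal illustration, not Empower's code; the function names and all numbers are hypothetical.

```python
import math
from typing import Callable, Sequence

Link = Callable[[float], float]


def identity(eta: float) -> float:
    """Inverse of the identity link: the mean equals the linear predictor."""
    return eta


def inverse_logit(eta: float) -> float:
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def glm_mean(x: Sequence[float], beta0: float, betas: Sequence[float],
             inv_link: Link) -> float:
    """GLM: g(E[Y]) = beta0 + beta1*x1 + ... + betam*xm."""
    eta = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return inv_link(eta)


def gam_mean(x: Sequence[float], beta0: float,
             smooths: Sequence[Callable[[float], float]], inv_link: Link) -> float:
    """GAM: g(E[Y]) = beta0 + f1(x1) + ... + fm(xm)."""
    eta = beta0 + sum(f(xi) for f, xi in zip(smooths, x))
    return inv_link(eta)


# Hypothetical two-predictor example (all numbers are made up).
x = [2.0, 3.0]
print(glm_mean(x, 1.0, [0.5, -0.2], identity))
# The second predictor gets a curved effect instead of a single slope.
print(gam_mean(x, 1.0, [lambda v: 0.5 * v, lambda v: -0.1 * v ** 2], identity))
```

If every fi is a straight line fi(v) = βi*v, the GAM reduces exactly to the GLM.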

Smoothing plot

The main result of interest is how the predictors are related to the dependent variable. Scatter plots can be drawn showing the values of each predictor against its partial residuals, i.e., the residuals after removing the effects of all other predictor variables, with the fitted smooth overlaid. Below is a sample smoothing plot of diastolic blood pressure (DBP) versus body mass index (BMI).


This plot allows us to evaluate the nature of the relationship between BMI and the residual (adjusted) DBP values.
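The partial residuals plotted for one term can be computed from the fitted model: subtract the intercept and every other fitted component from Y. Equivalently, they are the ordinary residuals plus that term's own fitted smooth. The sketch below assumes the fitted component values are already available per observation; the data and the `partial_residuals` helper are hypothetical.

```python
from typing import Dict, List


def partial_residuals(y: List[float], intercept: float,
                      components: Dict[str, List[float]], term: str) -> List[float]:
    """Partial residuals for `term`: y minus the intercept and all OTHER
    fitted components, i.e. (ordinary residual) + f_term(x)."""
    out = []
    for i, yi in enumerate(y):
        others = sum(vals[i] for name, vals in components.items() if name != term)
        out.append(yi - intercept - others)
    return out


# Hypothetical fitted values for a model DBP ~ f_BMI(BMI) + f_AGE(AGE).
y = [80.0, 85.0, 90.0]
components = {"BMI": [5.0, 8.0, 12.0], "AGE": [3.0, 4.0, 6.0]}
print(partial_residuals(y, 70.0, components, "BMI"))  # [7.0, 11.0, 14.0]
```

Plotting these values against BMI, together with the fitted f_BMI curve, gives the kind of smoothing plot described above.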

Many of the standard result statistics computed by Generalized Additive Models are similar to those customarily reported by linear or nonlinear model-fitting procedures. For example, predicted and residual values for the final model can be computed, and various residual plots can be displayed to help identify possible outliers. Refer also to the description of the residual statistics computed by Generalized Linear/Nonlinear Models for details.

Degrees of Freedom

GAM replaces βi*Xi with fi(Xi), a cubic spline smoother for Xi. Estimating a single parameter βi uses one degree of freedom. It is less obvious how many degrees of freedom are used in estimating the cubic spline smoother for each variable.

Intuitively, a smoother can be very smooth or less smooth. At one extreme, a straight line is very smooth and requires estimating a single slope parameter, i.e., one degree of freedom is used to fit the smoother (a simple straight line). At the other extreme, we could force a very "non-smooth" curve through every data point, in which case we would use up approximately as many degrees of freedom as there are points in the plot.
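One common way to make this precise for a linear smoother (fitted values = S·y) is to define the effective degrees of freedom as the trace of the smoother matrix S. The sketch below checks the two extremes described above: a least-squares line through centred data (only the slope is estimated) has trace 1, and an interpolating "smoother" (S = identity) has trace n. The helper names are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def trace(m: Matrix) -> float:
    """Sum of the diagonal: the effective degrees of freedom of smoother m."""
    return sum(m[i][i] for i in range(len(m)))


def centered_line_smoother(x: List[float]) -> Matrix:
    """Smoother matrix of a least-squares straight line through centred data:
    S[i][j] = (x_i - xbar)(x_j - xbar) / Sxx. Only the slope is estimated."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    return [[(x[i] - xbar) * (x[j] - xbar) / sxx for j in range(n)] for i in range(n)]


def interpolating_smoother(n: int) -> Matrix:
    """Identity matrix: the 'fit' passes through every data point."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


x = [1.0, 2.0, 4.0, 7.0, 11.0]
print(trace(centered_line_smoother(x)))        # approximately 1.0
print(trace(interpolating_smoother(len(x))))   # 5.0
```

A cubic spline smoother lies between these extremes, so its effective degrees of freedom are generally not a whole number (compare the edf values 1.894 and 2.406 in the sample output).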

Generalized Additive Models allow you to specify the degrees of freedom for the cubic spline smoother: the fewer degrees of freedom you specify, the smoother the cubic spline fit to the partial residuals and, typically, the worse the overall fit of the model.

By default, GAM chooses the appropriate smoothness for each applicable model term automatically, by minimizing a prediction-error criterion: Generalized Cross Validation (GCV) or Generalized Approximate Cross Validation (GACV).
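To illustrate how GCV trades goodness of fit against degrees of freedom, the sketch below uses a discrete penalized smoother: a Whittaker-type smoother with a squared second-difference penalty. It is substituted for the cubic spline smoother purely for simplicity and is not necessarily the algorithm Empower uses. A larger penalty λ gives fewer effective degrees of freedom (edf = trace of the smoother matrix), and λ is chosen by minimizing GCV = n·RSS/(n - edf)². The data and helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def smoother_matrix(n: int, lam: float) -> Matrix:
    """S = (I + lam * D'D)^-1, where D takes second differences of the fit."""
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        d = {k: 1.0, k + 1: -2.0, k + 2: 1.0}  # one row of D
        for i, di in d.items():
            for j, dj in d.items():
                p[i][j] += lam * di * dj
    cols = [solve(p, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def fit(y: List[float], lam: float) -> Tuple[List[float], float, float]:
    """Return fitted values, edf = trace(S) and GCV = n*RSS/(n - edf)^2."""
    n = len(y)
    s = smoother_matrix(n, lam)
    fitted = [sum(s[i][j] * y[j] for j in range(n)) for i in range(n)]
    edf = sum(s[i][i] for i in range(n))
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    gcv = n * rss / (n - edf) ** 2 if n - edf > 1e-9 else float("inf")
    return fitted, edf, gcv


def choose_lambda(y: List[float], grid: Sequence[float]) -> float:
    """Pick the penalty (and hence the edf) with the smallest GCV score."""
    return min(grid, key=lambda lam: fit(y, lam)[2])


y = [1.0, 2.3, 2.8, 4.4, 4.7, 6.3, 6.8, 8.1, 9.3, 9.6]  # hypothetical data
grid = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
for lam in grid:
    _, edf, gcv = fit(y, lam)
    print(f"lambda={lam:<8} edf={edf:5.2f} GCV={gcv:.4f}")
print("GCV choice:", choose_lambda(y, grid))
```

With λ = 0 the smoother interpolates the data (edf = n, GCV undefined). As λ grows, the edf fall toward 2, because a straight line is never penalized.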

Distributions and Link Functions

Like GLM, Generalized Additive Models allow you to choose from a wide variety of distributions for the dependent variable, and of link functions relating the predictor variables to the dependent variable.

The following table lists commonly used link functions and their corresponding mean functions.

Link Function    Mean Function
Identity         \mu = \mathbf{X}\boldsymbol{\beta}
Logit            \mu = \frac{\exp(\mathbf{X}\boldsymbol{\beta})}{1 + \exp(\mathbf{X}\boldsymbol{\beta})} = \frac{1}{1 + \exp(-\mathbf{X}\boldsymbol{\beta})}
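The two forms of the logit mean function in the table are algebraically identical. Numerically, each form can overflow for large |Xβ| in one direction. The sketch below (function names are illustrative) checks that the forms agree, chooses whichever form keeps the exponent non-positive, and confirms that the logit link inverts the mean function.

```python
import math


def identity_mean(eta: float) -> float:
    """Identity link: mu = X beta."""
    return eta


def logit_mean_form1(eta: float) -> float:
    """Logit link, first form: mu = exp(X beta) / (1 + exp(X beta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))


def logit_mean_form2(eta: float) -> float:
    """Logit link, second form: mu = 1 / (1 + exp(-X beta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_logit(eta: float) -> float:
    """Use whichever form keeps the exponent non-positive, so math.exp
    cannot overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def logit(mu: float) -> float:
    """The link itself: g(mu) = log(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))


for eta in (-2.0, 0.0, 1.5):
    print(eta, logit_mean_form1(eta), logit_mean_form2(eta), inverse_logit(eta))
```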



After you specify the dependent variable (Y), Empower automatically checks the variable type. If it is a continuous variable, Empower uses the normal distribution and the identity link function by default; if it is a dichotomous variable, Empower uses the binomial distribution and the logit link function by default.
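The default choice described above can be sketched as a simple rule. The exact test Empower uses to decide that a variable is "dichotomous" is an assumption here: this hypothetical `default_family` helper treats a variable with exactly two distinct non-missing values as dichotomous and everything else as continuous.

```python
from typing import Optional, Sequence, Tuple


def default_family(values: Sequence[Optional[float]]) -> Tuple[str, str]:
    """Hypothetical rule: two distinct non-missing values -> binomial/logit,
    otherwise gaussian (normal)/identity."""
    distinct = {v for v in values if v is not None}
    if len(distinct) == 2:
        return ("binomial", "logit")
    return ("gaussian", "identity")


print(default_family([0, 1, 1, 0, None]))    # ('binomial', 'logit')
print(default_family([120.5, 130.0, 118.2]))  # ('gaussian', 'identity')
```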



Below is the sample input window



Smooth conditioned on factors


In the above example, a separate smooth of AGE is estimated for males and females, while the coefficients of all other variables are the same for both sexes. This differs from a stratified analysis, which gives not only a different smooth of AGE for each sex but also different regression coefficients for the other variables (SMOKE, OCCUPATION, …).
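In a term like s(AGE, by = factor(SEX)), each observation uses only the smooth for its own SEX level. This equals the sum over levels of f_level(AGE) × I(SEX = level). The parametric coefficients are shared across levels. The sketch below shows how such a linear predictor is assembled. The smooth shapes, coefficients, and data are hypothetical. As we understand the mgcv documentation, factor "by" smooths are centred, which is why SEX also appears as a parametric term in the sample model.

```python
from typing import Callable, Dict, List, Sequence

Smooth = Callable[[float], float]


def by_factor_term(age: Sequence[float], sex: Sequence[int],
                   smooths: Dict[int, Smooth]) -> List[float]:
    """Value of s(AGE, by = factor(SEX)) per observation: the sum over levels
    of f_level(age) * I(sex == level), so each row uses its own level's smooth."""
    return [sum(f(a) * (1.0 if s == level else 0.0) for level, f in smooths.items())
            for a, s in zip(age, sex)]


def linear_predictor(intercept: float, coefs: Dict[str, float],
                     covariates: Dict[str, Sequence[float]],
                     smooth_part: Sequence[float]) -> List[float]:
    """Shared parametric part (one coefficient per covariate, the same for
    both sexes) plus the level-specific smooth contribution."""
    return [intercept + sum(coefs[k] * covariates[k][i] for k in coefs) + smooth_part[i]
            for i in range(len(smooth_part))]


# Hypothetical smooths: linear for SEX=1, curved for SEX=2.
smooths = {1: lambda a: 0.02 * (a - 40.0), 2: lambda a: -0.001 * (a - 40.0) ** 2}
age, sex = [30.0, 40.0, 50.0, 50.0], [1, 2, 1, 2]
part = by_factor_term(age, sex, smooths)
eta = linear_predictor(20.0, {"SEX": 0.5, "SMOKE": -0.3},
                       {"SEX": sex, "SMOKE": [0, 1, 1, 0]}, part)
print(part)
print(eta)
```

A stratified analysis would instead fit two separate models, so "SEX" and "SMOKE" would also receive a different coefficient in each stratum.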


Below is the sample output of the above model:


Generalized additive model

Family: gaussian
Link function: identity

BMI ~ s(AGE, k = 4, by = factor(SEX)) + SEX + SMOKE.NEW + OCCU +
    ALH + PSMK

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  20.8405     0.4931  42.264   <2e-16 ***
SEX           0.4770     0.2713   1.758   0.0792 .
SMOKE.NEW    -0.3457     0.2569  -1.345   0.1790
OCCU          0.1326     0.1883   0.704   0.4815
ALH           0.3554     0.2764   1.286   0.1990
PSMK         -0.3284     0.2020  -1.626   0.1045

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                      edf Ref.df     F p-value
s(AGE):factor(SEX)1 1.894  2.294 3.263  0.0325 *
s(AGE):factor(SEX)2 2.406  2.759 0.953  0.4092

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.0322   Deviance explained =  4.6%
GCV score = 5.0186  Scale est. = 4.9394    n = 653


Explanation of the output:


In the above output, the “Parametric coefficients” section lists the regression coefficients (the βs) and their significance tests for the non-smoothed (parametric) terms.


For example:


The β for SMOKE.NEW is -0.3457, which means that smokers (SMOKE.NEW=1) have a BMI 0.3457 kg/m² lower than non-smokers (SMOKE.NEW=0), holding the other variables constant. The p-value is 0.1790, so the coefficient is not significantly different from 0.
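The "t value" column is the estimate divided by its standard error. The p-value is the two-sided tail probability of that statistic. The sketch below approximates the t distribution by a standard normal, which we assume is reasonable here because the residual degrees of freedom are large (n = 653). It reproduces the printed values to about three decimals. The summary itself uses the t distribution, so small differences are expected.

```python
import math
from typing import Tuple


def wald_t_and_p(estimate: float, std_error: float) -> Tuple[float, float]:
    """t = estimate / SE; two-sided p-value from a normal approximation,
    using erfc(|t| / sqrt(2)) = 2 * (1 - Phi(|t|))."""
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p


# Values copied from the "Parametric coefficients" table above.
for name, est, se in [("SEX", 0.4770, 0.2713), ("SMOKE.NEW", -0.3457, 0.2569)]:
    t, p = wald_t_and_p(est, se)
    print(f"{name}: t = {t:.3f}, p = {p:.4f}")
```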


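The “Approximate significance of smooth terms” section lists the smooth of AGE for males (SEX=1) and females (SEX=2). The estimated (effective) degrees of freedom (edf) are 1.894 for males and 2.406 for females, and the p-values are 0.0325 and 0.4092, respectively. The AGE smooth is therefore significant at the 0.05 level for males but not for females.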
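The "GCV score" and "Scale est." on the last line of the output are linked through the total effective degrees of freedom τ. This count gives each parametric coefficient 1 df and adds the smooth edfs. Assuming the usual Gaussian definitions, scale = RSS/(n - τ) and GCV = n·RSS/(n - τ)², so GCV = scale · n/(n - τ). The sketch below plugs in the printed values. The helper name is illustrative.

```python
def gcv_from_scale(scale: float, n: int, total_edf: float) -> float:
    """Assuming scale = RSS/(n - tau) and GCV = n*RSS/(n - tau)^2,
    GCV = scale * n / (n - tau)."""
    return scale * n / (n - total_edf)


n = 653
parametric_df = 6            # (Intercept), SEX, SMOKE.NEW, OCCU, ALH, PSMK
smooth_edf = 1.894 + 2.406   # s(AGE):factor(SEX)1 and s(AGE):factor(SEX)2
tau = parametric_df + smooth_edf
print(f"{gcv_from_scale(4.9394, n, tau):.4f}")  # 5.0186, matching the output
```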


Besides the model output, Empower also produces the following plots:


(1)   Smoothing plots for males and females.


(2)   The difference between the smooths for males versus females.