Generalized Additive Model (GAM)
· Distribution and link function
· Smooth conditioned on factors
A GAM (generalized additive model) maintains the additive nature of the GLM,
which is g(E(Y)) = β_{0} + β_{1}*X_{1} + ... + β_{m}*X_{m} with link function g,
but replaces the simple terms of the linear equation β_{i}*X_{i} with f_{i}(X_{i}),
where f_{i} is a nonparametric function of the predictor X_{i}.
Instead of a single coefficient for each variable (additive term) in the model,
an additive model estimates an unspecified (nonparametric) function for each
predictor.
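The difference between the two predictors can be sketched in a few lines of Python. This is only an illustration: the function names and all numeric values are hypothetical, and the "smooths" are ordinary functions rather than fitted splines.

```python
def glm_predictor(beta0, betas, x):
    """Linear predictor: beta0 + sum_i beta_i * x_i."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


def gam_predictor(beta0, smooths, x):
    """Additive predictor: beta0 + sum_i f_i(x_i)."""
    return beta0 + sum(f(xi) for f, xi in zip(smooths, x))


# Hypothetical values, chosen only for illustration.
beta0, betas = 1.0, [0.5, -2.0]
x = [4.0, 3.0]

# With f_i(x) = beta_i * x the additive predictor reduces to the linear one.
linear_smooths = [lambda v, b=b: b * v for b in betas]
eta_glm = glm_predictor(beta0, betas, x)
eta_gam = gam_predictor(beta0, linear_smooths, x)

# Replacing one term with a curved function changes the shape of that
# predictor's effect, while the terms are still simply added together.
curved_smooths = [lambda v: 0.5 * v, lambda v: -2.0 * v + 0.1 * v * v]
eta_curved = gam_predictor(beta0, curved_smooths, x)
```

A GLM is thus the special case of a GAM in which every f_{i} is a straight line through the origin.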
The main result of interest is how the predictors are related to the
dependent variable. Scatter plots can be produced showing the smoothed
predictor values plotted against the partial residuals, i.e., the residuals
after removing the effect of all other predictor variables. Below is a sample
smoothing plot of diastolic blood pressure (DBP) versus body mass index (BMI).
This plot allows us to evaluate the nature of the relationship
between BMI and the residual (adjusted) DBP values.
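As a minimal sketch of what "partial residuals" means, the code below removes the intercept and the effect of every other predictor from the observed response, assuming an identity link. The component functions, the data, and the helper name `partial_residuals` are all hypothetical; they are not Empower output.

```python
def partial_residuals(y, beta0, smooths, X, j):
    """Partial residuals for predictor j under an identity link.

    y       : observed responses
    X       : list of rows, one value per predictor
    smooths : component functions f_i, one per predictor
    """
    out = []
    for yi, row in zip(y, X):
        # Effect of all predictors except the j-th one.
        others = sum(f(v) for k, (f, v) in enumerate(zip(smooths, row)) if k != j)
        out.append(yi - beta0 - others)
    return out


# Hypothetical components: DBP as a function of BMI (index 0) and AGE (index 1).
beta0 = 70.0
f_bmi = lambda b: 0.8 * (b - 22.0)
f_age = lambda a: 0.2 * (a - 40.0)
smooths = [f_bmi, f_age]

X = [[20.0, 30.0], [25.0, 50.0], [30.0, 60.0]]   # rows of (BMI, AGE)
y = [68.0, 76.0, 82.0]                           # hypothetical DBP values

# DBP adjusted for AGE: these are the values plotted against BMI.
pr_bmi = partial_residuals(y, beta0, smooths, X, j=0)

# Equivalent view: ordinary residual plus the predictor's own component.
fitted = [beta0 + f_bmi(b) + f_age(a) for b, a in X]
pr_check = [yi - fi + f_bmi(row[0]) for yi, fi, row in zip(y, fitted, X)]
```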
Many of the standard result statistics computed by Generalized
Additive Models are similar to those customarily reported by linear or
nonlinear model fitting procedures. For example, predicted and residual
values for the final model can be computed, and various graphs of the
residuals can be displayed to help the user identify possible outliers, etc.
Refer also to the description of the residual statistics computed by
Generalized Linear/Nonlinear Models for details.
GAM replaces β_{i}*X_{i} with f_{i}(X_{i}), a cubic
spline smoother for X_{i}. When estimating a single
parameter value β_{i}, we lose one degree of freedom.
It is less obvious how many degrees of freedom are lost by
estimating the cubic spline smoother for each variable.
Intuitively, a smoother can be either very smooth or less
smooth. In the most extreme case, a simple line would be very smooth and
require us to estimate a single slope parameter, i.e., we would use one degree
of freedom to fit the smoother (simple straight line); on the other hand, we
could force a very "non-smooth" line to connect each actual data
point, in which case we could "use up" approximately as many degrees
of freedom as there are points in the plot.
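The two extremes can be made concrete with a much simpler smoother than a cubic spline. The sketch below uses a running mean (a swapped-in technique, chosen only because its smoother matrix is easy to write down) and measures the degrees of freedom it uses as the trace of that matrix, a common definition for linear smoothers. The function names are hypothetical.

```python
def running_mean_matrix(n, half_width):
    """Smoother matrix S of a running mean over points sorted by x.

    Row i averages the observations within half_width positions of i,
    so the fitted values are S applied to the response vector.
    """
    rows = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n - 1, i + half_width)
        w = 1.0 / (hi - lo + 1)
        rows.append([w if lo <= j <= hi else 0.0 for j in range(n)])
    return rows


def effective_df(S):
    """Effective degrees of freedom of a linear smoother: trace of S."""
    return sum(S[i][i] for i in range(len(S)))


n = 10
# Window of one point: the "curve" connects every data point.
edf_rough = effective_df(running_mean_matrix(n, 0))
# Window covering all points: every fitted value is the overall mean.
edf_smooth = effective_df(running_mean_matrix(n, n - 1))
# Intermediate windows fall in between and shrink as the window widens.
edf_by_width = [effective_df(running_mean_matrix(n, h)) for h in range(n)]
```

With the narrowest window the smoother uses as many degrees of freedom as there are points; with the widest it uses a single one.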
Generalized Additive Models allow you to specify the
degrees of freedom for the cubic spline smoother; the fewer degrees of freedom
you specify, the smoother the cubic spline fit to the partial residuals and,
typically, the worse the overall fit of the model.
By default, GAM attempts to find the appropriate smoothness for each
applicable model term by minimizing a prediction error criterion, the
Generalized (Approximate) Cross Validation score (GCV or GACV).
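To illustrate the idea of choosing smoothness by GCV, the sketch below scores a running-mean smoother (again a stand-in for the cubic spline) at several window widths and keeps the width with the lowest score. It assumes a Gaussian response and one common form of the criterion, GCV = n * RSS / (n - edf)^2; the data and function names are hypothetical.

```python
import math


def running_mean_fit(y, half_width):
    """Fitted values and effective df of a running mean (y sorted by x)."""
    n = len(y)
    fitted, edf = [], 0.0
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n - 1, i + half_width)
        window = y[lo:hi + 1]
        fitted.append(sum(window) / len(window))
        edf += 1.0 / len(window)   # diagonal entry of the smoother matrix
    return fitted, edf


def gcv_score(y, fitted, edf):
    """GCV = n * RSS / (n - edf)^2 for a Gaussian response."""
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return n * rss / (n - edf) ** 2


# Hypothetical data: a smooth signal plus an alternating disturbance.
y = [math.sin(i / 5.0) + 0.3 * (-1) ** i for i in range(40)]

# half_width = 0 is excluded: it interpolates the data (edf = n),
# which makes the denominator zero.
scores = {}
for h in range(1, 11):
    fitted, edf = running_mean_fit(y, h)
    scores[h] = gcv_score(y, fitted, edf)

best_half_width = min(scores, key=scores.get)
```

The denominator penalizes fits that use many degrees of freedom, so the criterion balances closeness to the data against roughness.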
Like GLM, Generalized Additive Models allow you to choose from a wide
variety of distributions for the dependent variable, and link functions for
the effects of the predictor variables on the dependent variable.
Following is a table of commonly used link functions and
their mean functions.
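Distribution    Name        Link Function    Mean Function
Normal          Identity    η = μ            μ = η
Gamma           Inverse     η = 1/μ          μ = 1/η

Here μ is the mean of Y and η = β_{0} + f_{1}(X_{1}) + ... + f_{m}(X_{m}) is the additive predictor.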
After you specify the dependent variable (Y), Empower
will automatically check the variable type. If it is a continuous variable,
Empower will use the normal distribution and the identity link function as the
default; if it is a dichotomous variable, Empower will use the binomial
distribution and the logit link function as the default.
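The sketch below writes out three link functions with their mean (inverse link) functions and a default-selection rule of the kind described above. How Empower actually decides that a variable is dichotomous is not stated here; the "exactly two distinct values" test and the names `LINKS` and `default_family` are assumptions made for illustration.

```python
import math

# name -> (link g: mean -> predictor, mean function g^-1: predictor -> mean)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}


def default_family(values):
    """Pick a default (distribution, link) pair from the response values.

    Assumption: a response with exactly two distinct values is treated as
    dichotomous; anything else is treated as continuous.
    """
    if len(set(values)) == 2:
        return ("binomial", "logit")
    return ("gaussian", "identity")


# Applying a link and then its mean function returns the original mean.
link, mean_fn = LINKS["logit"]
round_trip = mean_fn(link(0.25))

dbp_default = default_family([72.0, 85.5, 90.0, 64.0])   # continuous response
event_default = default_family([0, 1, 1, 0, 1])          # dichotomous response
```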
Below is the sample input window.
In the above example, a separate smoother of AGE is estimated for males
and for females, while the coefficients of all other variables are the same
for both sexes. This differs from a stratified analysis, which gives not only
a different smoother of AGE but also different regression coefficients for
the other variables (SMOKE, OCCUPATION, …).
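In equation form, using the variable names from the formula in the sample output and assuming the Gaussian family with identity link shown there, the factor-conditioned smooth can be written as follows. The subscripted β labels are notation introduced here for illustration only.

```latex
\[
E[\mathrm{BMI}] = \beta_0
  + \beta_{\mathrm{SEX}}\,\mathrm{SEX}
  + \beta_{\mathrm{SMOKE}}\,\mathrm{SMOKE.NEW}
  + \beta_{\mathrm{OCCU}}\,\mathrm{OCCU}
  + \beta_{\mathrm{ALH}}\,\mathrm{ALH}
  + \beta_{\mathrm{PSMK}}\,\mathrm{PSMK}
  + f_1(\mathrm{AGE})\,\mathbf{1}[\mathrm{SEX}=1]
  + f_2(\mathrm{AGE})\,\mathbf{1}[\mathrm{SEX}=2]
\]
```

A stratified analysis would instead fit one model per stratum s, so that every coefficient, not only the smoother, carries the stratum index:

```latex
\[
E[\mathrm{BMI} \mid \mathrm{SEX}=s] = \beta_0^{(s)}
  + \beta_{\mathrm{SMOKE}}^{(s)}\,\mathrm{SMOKE.NEW}
  + \beta_{\mathrm{OCCU}}^{(s)}\,\mathrm{OCCU}
  + \beta_{\mathrm{ALH}}^{(s)}\,\mathrm{ALH}
  + \beta_{\mathrm{PSMK}}^{(s)}\,\mathrm{PSMK}
  + f_s(\mathrm{AGE}), \qquad s = 1, 2
\]
```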
Below is the sample output of the above model:
Generalized additive model
Family: gaussian
Link function: identity

Formula:
BMI ~ s(AGE, k = 4, by = factor(SEX)) + SEX + SMOKE.NEW + OCCU + ALH + PSMK

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  20.8405     0.4931  42.264   <2e-16 ***
SEX           0.4770     0.2713   1.758   0.0792 .
SMOKE.NEW    -0.3457     0.2569  -1.345   0.1790
OCCU          0.1326     0.1883   0.704   0.4815
ALH           0.3554     0.2764   1.286   0.1990
PSMK          0.3284     0.2020   1.626   0.1045
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                      edf Ref.df     F p-value
s(AGE):factor(SEX)1 1.894  2.294 3.263  0.0325 *
s(AGE):factor(SEX)2 2.406  2.759 0.953  0.4092
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) = 0.0322   Deviance explained = 4.6%
GCV score = 5.0186   Scale est. = 4.9394   n = 653
In the above output, the “Parametric coefficients”
section lists the regression coefficients (the βs) and their significance
tests for the non-smoothing terms.
For example:
The β for SMOKE.NEW is -0.3457, which means that smokers (SMOKE.NEW=1) have
a BMI 0.3457 kg/m² lower than nonsmokers (SMOKE.NEW=0); the p-value is 0.1790,
so the coefficient is not significantly different from 0.
The “Approximate significance of smooth terms” section lists the smoothers
of AGE for males (SEX=1) and females (SEX=2).
The estimated degrees of freedom are 1.894 for males and 2.406 for females;
the p-values are 0.0325 and 0.4092, respectively.
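The summary figures in the output can be tied together with a short calculation. The sketch below assumes that the scale estimate is the residual sum of squares divided by the residual degrees of freedom and that the GCV score has the form n * RSS / (n - edf)^2; both are assumptions about how the figures were computed, checked here only against the numbers printed above.

```python
# Values copied from the sample output.
n = 653
scale_est = 4.9394
edf_smooth = [1.894, 2.406]   # s(AGE) for SEX=1 and SEX=2
n_parametric = 6              # intercept, SEX, SMOKE.NEW, OCCU, ALH, PSMK

# Total effective degrees of freedom used by the model.
edf_total = n_parametric + sum(edf_smooth)
residual_df = n - edf_total

# Assumed relations: scale = RSS / residual_df, GCV = n * RSS / residual_df^2.
rss = scale_est * residual_df
gcv = n * rss / residual_df ** 2

print(f"total edf = {edf_total:.1f}, GCV = {gcv:.4f}")
```

Under these assumptions the computed value agrees with the reported GCV score to four decimal places, which shows how the two smoothers' estimated degrees of freedom enter the criterion.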
Besides the model output, Empower
will also output the following plots:
(1) Smoothing plots for males and females.
(2) The difference between the smooths, comparing males versus females.