Generalized Estimating Equations

The Basics

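The generalized estimating equations (GEE) approach, introduced by Liang and Zeger (1986), is a method for analyzing correlated data that would otherwise be modeled with a generalized linear model (GLM). It keeps the GLM specification of the mean but allows observations from the same subject or cluster to be correlated.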

Correlated data can arise from longitudinal studies, in which subjects are measured at different points in time, or from clustering, in which measurements are taken on subjects who share a common characteristic such as belonging to the same litter.
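For reference, in the notation of Liang and Zeger (1986), the GEE estimate of β solves the estimating equations on the first line below. For subject i = 1, ..., K, Y_i is the response vector, μ_i its GLM mean, D_i = ∂μ_i/∂βᵀ, A_i the diagonal matrix of GLM variance functions v(μ_ij), R_i(α) the working correlation matrix chosen under Type, and φ the scale parameter. Standard errors are usually taken from the robust (sandwich) estimator on the second line. This is a summary of the standard formulation, not necessarily every detail of the tool's implementation.

```latex
\begin{aligned}
&\sum_{i=1}^{K} \mathbf{D}_i^{\top}\mathbf{V}_i^{-1}\left(\mathbf{Y}_i-\boldsymbol{\mu}_i\right)=\mathbf{0},
\qquad \mathbf{V}_i=\phi\,\mathbf{A}_i^{1/2}\,\mathbf{R}_i(\alpha)\,\mathbf{A}_i^{1/2}\\
&\widehat{\operatorname{Cov}}(\hat{\boldsymbol{\beta}})=\mathbf{M}_0^{-1}\mathbf{M}_1\mathbf{M}_0^{-1},
\quad \mathbf{M}_0=\sum_{i=1}^{K}\mathbf{D}_i^{\top}\mathbf{V}_i^{-1}\mathbf{D}_i,
\quad \mathbf{M}_1=\sum_{i=1}^{K}\mathbf{D}_i^{\top}\mathbf{V}_i^{-1}(\mathbf{Y}_i-\hat{\boldsymbol{\mu}}_i)(\mathbf{Y}_i-\hat{\boldsymbol{\mu}}_i)^{\top}\mathbf{V}_i^{-1}\mathbf{D}_i
\end{aligned}
```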

Subject ID

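Subject ID is the variable that identifies which observations come from the same subject (or cluster). Observations that share an ID may be correlated; observations with different IDs are assumed to be independent.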
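A minimal Python sketch of what the subject ID does: rows that share an ID form one cluster, which is where the "Number of clusters" and "Maximum cluster size" figures in the output come from. The helper names and the sample IDs are hypothetical. The contiguity check is included because R's geepack `geeglm` assumes the rows of each cluster appear consecutively in the data, so it is worth sorting by the ID variable first.

```python
from collections import Counter


def cluster_summary(ids):
    """Return (number of clusters, maximum cluster size) for a sequence of subject IDs."""
    sizes = Counter(ids)  # maps each subject ID to its number of observations
    if not sizes:
        return 0, 0
    return len(sizes), max(sizes.values())


def clusters_are_contiguous(ids):
    """Return True if all rows sharing a subject ID form one unbroken run."""
    seen = set()
    previous = object()  # sentinel that never equals a real ID
    for current in ids:
        if current != previous:
            if current in seen:  # this ID already appeared in an earlier run
                return False
            seen.add(current)
            previous = current
    return True


# Hypothetical family IDs, one per row, standing in for an ID column such as FMYID
family_ids = [101, 101, 101, 102, 102, 103]
print(cluster_summary(family_ids))          # (3, 3)
print(clusters_are_contiguous(family_ids))  # True
print(clusters_are_contiguous([1, 2, 1]))   # False
```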

Type

This option specifies the working correlation structure, i.e., the assumed pattern of correlation among observations from the same subject. Common choices are "independence", "exchangeable", "ar1" (first-order autoregressive) and "unstructured".
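To illustrate the structures, here is a small Python sketch that builds the working correlation matrix R(α) for one subject with `size` observations. It assumes a single correlation parameter `alpha`, which is how independence, exchangeable and AR(1) are usually parameterized; "unstructured" gives every pair of occasions its own parameter, so it is left out. The function name is hypothetical.

```python
def working_correlation(structure, size, alpha=0.0):
    """Build a size x size working correlation matrix R(alpha) as nested lists."""
    if structure == "independence":
        # Observations within a subject are treated as uncorrelated.
        return [[1.0 if j == k else 0.0 for k in range(size)] for j in range(size)]
    if structure == "exchangeable":
        # Every pair of observations within a subject shares the same correlation alpha.
        return [[1.0 if j == k else alpha for k in range(size)] for j in range(size)]
    if structure == "ar1":
        # Correlation decays as alpha ** |j - k| with the gap between occasions j and k.
        return [[alpha ** abs(j - k) for k in range(size)] for j in range(size)]
    # "unstructured" needs a separate parameter for each pair, not a single alpha.
    raise ValueError(f"unsupported structure: {structure!r}")


for row in working_correlation("ar1", 3, alpha=0.5):
    print(row)
# [1.0, 0.5, 0.25]
# [0.5, 1.0, 0.5]
# [0.25, 0.5, 1.0]
```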

Distribution and Link Function

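The following table lists distributions and link functions commonly used in GLMs. The link function relates the linear predictor Xβ to the mean μ, and the mean function is its inverse.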

Distribution | Link name | Link function | Mean function
Normal | Identity | \mathbf{X}\boldsymbol{\beta}=\mu | \mu=\mathbf{X}\boldsymbol{\beta}
Exponential, Gamma | Inverse | \mathbf{X}\boldsymbol{\beta}=\mu^{-1} | \mu=(\mathbf{X}\boldsymbol{\beta})^{-1}
Inverse Gaussian | Inverse squared | \mathbf{X}\boldsymbol{\beta}=\mu^{-2} | \mu=(\mathbf{X}\boldsymbol{\beta})^{-1/2}
Poisson | Log | \mathbf{X}\boldsymbol{\beta}=\ln(\mu) | \mu=\exp(\mathbf{X}\boldsymbol{\beta})
Binomial, Multinomial | Logit | \mathbf{X}\boldsymbol{\beta}=\ln\left(\frac{\mu}{1-\mu}\right) | \mu=\frac{\exp(\mathbf{X}\boldsymbol{\beta})}{1+\exp(\mathbf{X}\boldsymbol{\beta})}=\frac{1}{1+\exp(-\mathbf{X}\boldsymbol{\beta})}
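The link/mean pairs in the table above can be written directly as functions. A minimal Python sketch, where `eta` stands for the linear predictor Xβ; the dictionary keys are hypothetical labels, not option names of any particular package. The round-trip check confirms that each mean function inverts its link.

```python
import math

# Each entry maps a link name to (link g(mu), mean function g^-1(eta)), eta = X*beta.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "inverse_squared": (lambda mu: mu ** -2, lambda eta: eta ** -0.5),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

# Round trip: applying the mean function to g(mu) should give back mu.
for name, (link, mean) in LINKS.items():
    mu = 0.3
    assert math.isclose(mean(link(mu)), mu), name

print(LINKS["logit"][1](0.0))  # 0.5: a linear predictor of 0 means even odds
```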

 

 

Below is the sample input window:

[Image: sample input window]

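Below is the sample output for this model, with an explanation of each part.

(1) First, the model call, which records the formula, the distribution family and link, the data set, the subject ID variable (id) and the correlation structure (corstr):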

Call:
geeglm(formula = formula(tmp.formula), family = binomial(link = "logit"),
    data = WD, na.action = na.omit, id = FMYID, corstr = "independence")
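To make the mechanics of this call concrete, here is a minimal pure-Python sketch of a GEE fit for a binary outcome with the logit link and an independence working correlation, the same setting as above. Under independence the point estimates coincide with ordinary logistic regression; clustering enters only through the robust (sandwich) standard errors, whose "meat" sums per-cluster score contributions. The function and helper names are hypothetical; this is not geepack's code and is not guaranteed to reproduce `geeglm` output to every digit.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def inverse(a):
    """Invert a square matrix by solving against each unit vector (one column each)."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gee_logistic_independence(X, y, ids, max_iter=25, tol=1e-10):
    """Return (beta, robust_se) for a logit-link GEE with independence correlation.

    X: covariate rows (include a 1.0 column for the intercept); y: 0/1 outcomes;
    ids: one subject ID per row.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p

    def fitted(b):
        return [1.0 / (1.0 + math.exp(-sum(xj * bj for xj, bj in zip(x, b)))) for x in X]

    def information(mu):
        # sum over rows of mu (1 - mu) x x^T, i.e. X^T A X for the binomial variance
        return [[sum(mu[r] * (1 - mu[r]) * X[r][j] * X[r][k] for r in range(n))
                 for k in range(p)] for j in range(p)]

    # Newton-Raphson on sum_r x_r (y_r - mu_r) = 0, the GEE under independence.
    for _ in range(max_iter):
        mu = fitted(beta)
        score = [sum(X[r][j] * (y[r] - mu[r]) for r in range(n)) for j in range(p)]
        step = solve(information(mu), score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break

    # Sandwich covariance: bread^-1 * meat * bread^-1, with the meat built from
    # outer products of each cluster's summed score contributions.
    mu = fitted(beta)
    cluster_scores = {}
    for r, cid in enumerate(ids):
        s = cluster_scores.setdefault(cid, [0.0] * p)
        for j in range(p):
            s[j] += X[r][j] * (y[r] - mu[r])
    meat = [[sum(s[j] * s[k] for s in cluster_scores.values()) for k in range(p)]
            for j in range(p)]
    bread_inv = inverse(information(mu))
    cov = matmul(matmul(bread_inv, meat), bread_inv)
    return beta, [math.sqrt(cov[j][j]) for j in range(p)]


# Hypothetical toy data: 4 clusters of 2 rows, one binary covariate plus intercept
X = [[1.0, 0.0] for _ in range(4)] + [[1.0, 1.0] for _ in range(4)]
y = [1, 0, 0, 0, 1, 1, 1, 0]
ids = [1, 1, 2, 2, 3, 3, 4, 4]
beta, se = gee_logistic_independence(X, y, ids)
print([round(b, 4) for b in beta])  # [-1.0986, 2.1972]
```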

 

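(2) Next, the regression coefficients. For each term, the output gives the estimate, its robust (sandwich) standard error, the Wald statistic (Estimate / Std.err)^2, and the p-value from a chi-square distribution with 1 degree of freedom: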

Coefficients:
                  Estimate   Std.err   Wald Pr(>|W|)    
(Intercept)      -6.426883  1.244158 26.684 2.40e-07 ***
AGE               0.075278  0.008542 77.663  < 2e-16 ***
SEX              -0.343300  0.316332  1.178 0.277810    
BMI               0.109291  0.043422  6.335 0.011838 *  
factor(EDU.NEW)2  0.111972  0.233449  0.230 0.631482    
factor(EDU.NEW)3 -0.443752  0.274369  2.616 0.105803    
OCCU              0.730275  0.188301 15.041 0.000105 ***
SMOKE.NEW         0.099921  0.290639  0.118 0.730999    
PSMK              0.140254  0.216111  0.421 0.516344    
ALH              -0.285699  0.279560  1.044 0.306800    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
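The Wald and Pr(>|W|) columns can be recomputed from Estimate and Std.err alone. A minimal Python sketch, assuming the statistic is (Estimate / Std.err)^2 referred to a chi-square distribution with 1 degree of freedom, which agrees with the rows above; the function name `wald_test` is hypothetical. It uses the identity P(χ²₁ > W) = P(|Z| > √W) = erfc(√(W/2)), so only the standard library is needed.

```python
import math


def wald_test(estimate, std_err):
    """Return (Wald statistic, p-value) for H0: coefficient = 0.

    W = (estimate / std_err)^2 is compared with chi-square on 1 df;
    its upper tail equals P(|Z| > sqrt(W)) = erfc(sqrt(W / 2)).
    """
    wald = (estimate / std_err) ** 2
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


# Estimates and standard errors copied from the SEX and BMI rows of the output
for name, est, se in [("SEX", -0.343300, 0.316332), ("BMI", 0.109291, 0.043422)]:
    w, p = wald_test(est, se)
    print(f"{name}: Wald = {w:.3f}, p = {p:.6f}")
```

The printed values should agree with the Wald and Pr(>|W|) columns up to the rounding of the inputs.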

 

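Estimated Scale Parameters:
            Estimate Std.err
(Intercept)    1.038  0.3267

Correlation:  Structure = independence
Number of clusters:   195   Maximum cluster size: 9

The estimated scale parameter is reported with its standard error. The correlation summary gives the working correlation structure, the number of clusters (distinct subject IDs) and the largest number of observations in any single cluster.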

(3) Next, if the distribution is binomial, the odds ratios, calculated as exp(β), with 95% confidence intervals, calculated as exp(β ± 1.96 × Std.err), and p-values:

                 Odds ratio Low 95%CI High 95%CI   P value
(Intercept)        0.001617 0.0001412    0.01853 2.396e-07
AGE                1.078184 1.0602831    1.09639 0.000e+00
SEX                0.709425 0.3816276    1.31878 2.778e-01
BMI                1.115487 1.0244785    1.21458 1.184e-02
factor(EDU.NEW)2   1.118481 0.7078042    1.76744 6.315e-01
factor(EDU.NEW)3   0.641625 0.3747427    1.09857 1.058e-01
OCCU               2.075652 1.4350566    3.00220 1.052e-04
SMOKE.NEW          1.105083 0.6251701    1.95340 7.310e-01
PSMK               1.150566 0.7532764    1.75739 5.163e-01
ALH                0.751489 0.4344661    1.29984 3.068e-01
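The odds ratios and their limits follow from the coefficient table by exponentiating β and β ∓ z × Std.err, with z ≈ 1.96 for a 95% interval. A minimal Python sketch, assuming a normal-approximation (Wald) interval, which matches the rows above; the function name `odds_ratio_ci` is hypothetical, and `statistics.NormalDist` (Python 3.8+) supplies the normal quantile.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(estimate, std_err, level=0.95):
    """Return (odds ratio, lower, upper): exp(beta) and exp(beta -/+ z * SE)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return (math.exp(estimate),
            math.exp(estimate - z * std_err),
            math.exp(estimate + z * std_err))


# Estimates and robust standard errors copied from the AGE and OCCU coefficient rows
for name, est, se in [("AGE", 0.075278, 0.008542), ("OCCU", 0.730275, 0.188301)]:
    ratio, low, high = odds_ratio_ci(est, se)
    print(f"{name}: OR = {ratio:.6f}, 95% CI = ({low:.6f}, {high:.6f})")
```

The results should agree with the Odds ratio, Low 95%CI and High 95%CI columns to within rounding.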