Multivariate regression using GEE (generalized estimating equations)

 

In many studies, information on outcomes and/or risk factors is obtained from multiple sources or multiple measurements.  For example, respiratory symptoms include cough, phlegm, wheeze and shortness of breath, and an evaluation of a child's behavior may be obtained from both parents and teachers.

For multiple measurements of an outcome, one approach is to model each outcome separately; for multiple sources of a risk factor, the corresponding approach is to model the outcome against each source of the risk factor separately.

Using GEE (generalized estimating equations), we can model multiple measurements or multiple sources of outcomes and/or risk factors simultaneously, while accounting for the correlation among the measurements or sources taken on the same subject.

In the example below, four respiratory symptoms (cough, phlegm, wheeze and shortness of breath) are treated as multiple measurements of the outcome.  The risk factor is SNP marker 1 (SNP1).
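The output below refers to a stacked data set WD with the variables tmp.y (symptom present or absent), tmp.yidx (which symptom) and tmp.id (subject). How the module builds this data set is not shown; the following is only a minimal sketch in base R, assuming a wide data set with one row per subject. The column names ID, COUGH, PHLEGM, WHEEZE and SOB and all values are hypothetical.

```r
# Toy wide-format data: one row per subject (values are made up for illustration).
wide <- data.frame(
  ID     = 1:3,
  SNP1   = c(0, 1, 2),
  COUGH  = c(0, 1, 0),
  PHLEGM = c(0, 0, 1),
  WHEEZE = c(1, 0, 0),
  SOB    = c(0, 1, 1)
)

symptoms <- c("COUGH", "PHLEGM", "WHEEZE", "SOB")

# Stack the four symptom columns: one row per subject-symptom pair.
WD <- do.call(rbind, lapply(seq_along(symptoms), function(k) {
  data.frame(
    tmp.id   = wide$ID,            # subject identifier
    tmp.yidx = k,                  # 1 = cough, 2 = phlegm, 3 = wheeze, 4 = SOB
    tmp.y    = wide[[symptoms[k]]],
    SNP1     = wide$SNP1
  )
}))

# Sort so that the four rows of each subject are contiguous.
WD <- WD[order(WD$tmp.id, WD$tmp.yidx), ]
rownames(WD) <- NULL

print(WD)
```

The sorting step matters if the stacked data are passed to geeglm from the geepack package, which treats consecutive rows with the same id as one cluster.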

[Figure: screenshot of a sample input window]

A detailed explanation of the module's sample output follows:

Multivariate regression using GEE

                     

 Univariate regression

 

Call:
geeglm(formula = tmp.y ~ +SNP1.1.COUGH + SNP1.1.PHLEGM + SNP1.1.WHEEZE +
    SNP1.1.SOB + SNP1.2.COUGH + SNP1.2.PHLEGM + SNP1.2.WHEEZE +
    SNP1.2.SOB + factor(tmp.yidx) + ALH + PSMK + SMOKE.NEW +
    OCCU + SEX + AGE, family = binomial(link = "logit"), data = WD,
    id = tmp.id, corstr = "independence")

 Coefficients:
                   Estimate   Std.err    Wald Pr(>|W|)   
(Intercept)       -3.445283  0.293670 137.636  < 2e-16 ***
SNP1.1.COUGH      -0.333075  0.239765   1.930   0.1648   
SNP1.1.PHLEGM      0.129920  0.201255   0.417   0.5186   
SNP1.1.WHEEZE     -0.162856  0.240977   0.457   0.4992   
SNP1.1.SOB        -0.012223  0.174548   0.005   0.9442   
SNP1.2.COUGH       0.514358  0.447267   1.323   0.2501   
SNP1.2.PHLEGM      0.261780  0.422261   0.384   0.5353   
SNP1.2.WHEEZE     -0.258877  0.576407   0.202   0.6533   
SNP1.2.SOB        -0.147673  0.376402   0.154   0.6948   
factor(tmp.yidx)2  0.252578  0.177718   2.020   0.1553   
factor(tmp.yidx)3 -0.088223  0.188543   0.219   0.6398   
factor(tmp.yidx)4  0.973951  0.165942  34.448 4.38e-09 ***
ALH               -0.120524  0.141391   0.727   0.3940   
PSMK               0.103992  0.105911   0.964   0.3262   
SMOKE.NEW          0.330857  0.129651   6.512   0.0107 * 
OCCU               0.230644  0.097065   5.646   0.0175 * 
SEX               -0.046315  0.133847   0.120   0.7293   
AGE                0.034453  0.003316 107.959  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimated Scale Parameters:
            Estimate Std.err
(Intercept)   0.9872 0.08221

Correlation: Structure = independenceNumber of clusters:   3224   Maximum cluster size: 1

 

Explanation:

The module first fits the univariate regression (called the univariate model), which models each outcome (symptom) separately.  Although everything is fitted in a single model (one formula), the regression coefficients of SNP1 for cough, phlegm, wheeze and shortness of breath (SOB) are estimated separately: for example, SNP1.1.COUGH is the effect of SNP1=1 on cough only.  The covariates (ALH, PSMK, SMOKE.NEW, OCCU, SEX and AGE) each have a single coefficient shared across the symptoms.

The variable tmp.yidx indicates the symptom, so in the formula factor(tmp.yidx)2, factor(tmp.yidx)3 and factor(tmp.yidx)4 represent the differences in intercept for phlegm, wheeze and SOB compared with cough.
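The symptom-specific terms in the formula (SNP1.1.COUGH, ..., SNP1.2.SOB) can be read as genotype-by-symptom indicator columns. The sketch below shows one way such columns could be built in base R; it is an illustration and not the module's own code. It assumes that SNP1 is coded 0, 1, 2 with 0 as the reference level, and the toy values are made up.

```r
# Toy stacked data: 2 subjects x 4 symptoms (values are made up for illustration).
WD <- data.frame(
  tmp.id   = rep(1:2, each = 4),
  tmp.yidx = rep(1:4, times = 2),    # 1 = cough, 2 = phlegm, 3 = wheeze, 4 = SOB
  SNP1     = rep(c(1, 2), each = 4)  # genotype code, constant within a subject
)

symptoms <- c("COUGH", "PHLEGM", "WHEEZE", "SOB")

# One 0/1 column per genotype level (1, 2) and symptom: it equals 1 only on
# the rows of that symptom for subjects carrying that genotype.
for (g in 1:2) {
  for (k in seq_along(symptoms)) {
    col <- paste0("SNP1.", g, ".", symptoms[k])
    WD[[col]] <- as.integer(WD$SNP1 == g & WD$tmp.yidx == k)
  }
}

# Pooled genotype indicators, ignoring the symptom.
WD$SNP1.1 <- as.integer(WD$SNP1 == 1)
WD$SNP1.2 <- as.integer(WD$SNP1 == 2)

print(WD[, c("tmp.id", "tmp.yidx", "SNP1.1.COUGH", "SNP1.2.SOB", "SNP1.1")])
```

Within each genotype level the four symptom-specific columns add up to the pooled indicator, so a model with one coefficient per pooled indicator is the special case in which the four symptom-specific coefficients are equal.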

 

Multivariate regression using GEE:
 
Call:
geeglm(formula = tmp.y ~ SNP1.1 + SNP1.2 + factor(tmp.yidx) + 
    ALH + PSMK + SMOKE.NEW + OCCU + SEX + AGE, family = binomial(link = "logit"), 
    data = WD, id = tmp.id, corstr = "independence")
 
 Coefficients:
                  Estimate  Std.err   Wald Pr(>|W|)    
(Intercept)       -3.49742  0.28530 150.27  < 2e-16 ***
SNP1.1            -0.06419  0.10375   0.38   0.5361    
SNP1.2             0.09665  0.22336   0.19   0.6652    
factor(tmp.yidx)2  0.37831  0.14115   7.18   0.0074 ** 
factor(tmp.yidx)3 -0.08131  0.15236   0.28   0.5936    
factor(tmp.yidx)4  1.03297  0.13259  60.70  6.7e-15 ***
ALH               -0.12113  0.14116   0.74   0.3908    
PSMK               0.10406  0.10587   0.97   0.3257    
SMOKE.NEW          0.33080  0.12926   6.55   0.0105 *  
OCCU               0.23105  0.09698   5.68   0.0172 *  
SEX               -0.04650  0.13364   0.12   0.7279    
AGE                0.03442  0.00331 108.27  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
 
Estimated Scale Parameters:
            Estimate Std.err
(Intercept)     0.99  0.0823
 
Correlation: Structure = independenceNumber of clusters:   3224   Maximum cluster size: 1 

 

Explanation:

The module then fits the multivariate regression (called the multivariate model), which treats the 4 symptoms as 4 measurements of the same outcome and estimates one overall regression coefficient for SNP1=1 and one for SNP1=2.  Again, the factor(tmp.yidx) terms represent the differences in intercept between symptoms.
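Because the model uses a logit link, the overall coefficients are log odds ratios. The sketch below takes the SNP1.1 and SNP1.2 estimates and standard errors from the multivariate output above, reproduces the Wald statistic and P value, and converts the estimates to odds ratios. The 95% Wald confidence interval is an addition for illustration; the module output shown here does not report it.

```r
# Estimates and standard errors copied from the multivariate output above.
est <- c(SNP1.1 = -0.06419, SNP1.2 = 0.09665)
se  <- c(SNP1.1 =  0.10375, SNP1.2 = 0.22336)

# Wald statistic in the coefficient table: (estimate / std.err)^2 on 1 df.
wald <- (est / se)^2
p    <- pchisq(wald, df = 1, lower.tail = FALSE)

# Odds ratio with a 95% Wald confidence interval.
z  <- qnorm(0.975)
or <- data.frame(
  OR    = exp(est),
  lower = exp(est - z * se),
  upper = exp(est + z * se),
  Wald  = wald,
  p     = p
)

print(round(or, 3))
```

Both intervals include 1, which agrees with the non-significant P values for SNP1.1 and SNP1.2 in the table.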

 

Test for the significant difference of SNP1 regression coefficients for different outcomes
Analysis of 'Wald statistic' Table
 
Model 1 tmp.y ~ +SNP1.1.COUGH + SNP1.1.PHLEGM + SNP1.1.WHEEZE + SNP1.1.SOB + SNP1.2.COUGH + SNP1.2.PHLEGM + SNP1.2.WHEEZE + SNP1.2.SOB + factor(tmp.yidx) + ALH + PSMK + SMOKE.NEW + OCCU + SEX + AGE 
Model 2 tmp.y ~ SNP1.1 + SNP1.2 + factor(tmp.yidx) + ALH + PSMK + SMOKE.NEW + OCCU + SEX + AGE
 
  Df  X2 P(>|Chi|)
1  6 4.6       0.6
                                                                                              
 Multivariate regression coefficient is appropriate for representing the overall effect of SNP1

 

Explanation:

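Finally, the module compares the univariate model with the multivariate model using a Wald test (the output table is labelled Analysis of 'Wald statistic'; GEE is not based on a full likelihood, so this is not a likelihood ratio test).  The comparison is equivalent to testing the interaction between SNP1 and symptom.  The univariate model has 8 SNP1 coefficients and the multivariate model has 2, which gives the 6 degrees of freedom shown in the table.  If the test is not significant, as here (P = 0.6), there is no evidence that the regression coefficients of SNP1 differ between symptoms, and the multivariate regression coefficient is appropriate for representing the overall effect of SNP1.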
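The degrees of freedom and the P value in the comparison table can be checked by hand. The sketch below uses only the reported chi-square statistic and the number of SNP1 terms counted from the two formulas; it does not refit the models.

```r
# Number of SNP1 coefficients in each model, counted from the two formulas.
k_univariate   <- 8  # 2 genotype levels x 4 symptoms
k_multivariate <- 2  # 2 genotype levels, pooled over symptoms

# Constraints imposed by the multivariate model: equal effects across symptoms.
df <- k_univariate - k_multivariate

# Chi-square statistic reported in the comparison table.
x2 <- 4.6
p  <- pchisq(x2, df = df, lower.tail = FALSE)

cat("df =", df, " p =", round(p, 2), "\n")
```

The result agrees with the table (Df = 6, P = 0.6). A table with this heading appears to be what the anova method in the geepack package prints when it is given two nested geeglm fits.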