Cross Table and Chi-square Test (χ² test)

A cross table, also known as a contingency table, is most often used to analyze categorical data. A cross-tabulation is a two-dimensional table that records the number (frequency) of respondents who have the combination of characteristics described by each cell of the table.

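Below is a 2×3 cross table (the row variable is gender; the column variable is smoking status):

             | Never smoker | Former smoker | Current smoker | Total
-------------|--------------|---------------|----------------|------
Male         | N11          | N12           | N13            | R1
Female       | N21          | N22           | N23            | R2
Total        | C1           | C2            | C3             | N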

Chi-square test (χ² test)

The chi-square test is the most commonly used method for testing whether two categorical variables are independent.

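The null hypothesis is that the two variables are independent (have no relationship). Under this hypothesis, the expected frequency of each cell can be calculated from its row and column totals. For example, in the table above, the expected frequency for cell N11 is

E_{11} = \frac{R_1 \times C_1}{N}

and, in general, the expected frequency for the cell in row i and column j is E_{ij} = R_i \times C_j / N.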
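As a minimal Python sketch (the function name `expected_counts` is illustrative, not an Empower feature), the rule E_ij = R_i × C_j / N can be applied to every cell at once. The counts used here are the SNP1-by-HBP frequencies from the sample output later in this section.

```python
def expected_counts(observed):
    """Expected cell counts under independence: E_ij = R_i * C_j / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: SNP1 = 0, 1, 2; columns: HBP = 0, 1 (counts from the sample output).
observed = [[30, 3], [185, 28], [354, 42]]
for row in expected_counts(observed):
    print([round(e, 2) for e in row])
```

By construction, each row of expected counts sums to the observed row total and each column to the observed column total.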

The observed frequencies are then compared with the expected frequencies, and the value of the test statistic is

\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}

where

O_i = the observed frequency in cell i;

E_i = the expected (theoretical) frequency in cell i, asserted by the null hypothesis;

n = the number of cells in the table;

χ² = Pearson's cumulative test statistic, which asymptotically follows a χ² distribution with (r - 1)(c - 1) degrees of freedom, where r and c are the numbers of rows and columns.

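If the χ² value is so large that a value at least as large would occur with only a small probability under the null hypothesis (this probability is the p-value), for example less than 5%, we say the result is "statistically significant" at the 0.05 (5%) level. We then reject the null hypothesis and accept the alternative hypothesis that the two variables are related.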
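The whole procedure can be sketched in plain Python using only the standard library. This is an illustrative implementation, not Empower's code: the helper names `chi2_sf` and `pearson_chi2` are hypothetical, no continuity correction is applied, and the p-value uses a simple series for the regularized incomplete gamma function, which is adequate for moderate statistics but less accurate far in the upper tail than a statistics library.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for X ~ chi-square(df), computed as 1 - P(df/2, x/2)
    using the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def pearson_chi2(observed):
    """Return (statistic, degrees of freedom, p-value) for an r x c table.
    Assumes every row total and column total is positive."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count E_ij
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# SNP1 (rows 0, 1, 2) by HBP (columns 0, 1), from the sample output later in this section.
stat, df, p = pearson_chi2([[30, 3], [185, 28], [354, 42]])
print(f"Chi^2 = {stat:.6f}, d.f. = {df}, p = {p:.7f}")
```

Comparing p with the chosen significance level (for example 0.05) then gives the decision described above.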

Fisher’s exact test

Fisher’s exact test provides an exact p-value.
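To make "exact" concrete, the following Python sketch computes a two-sided Fisher's exact p-value for an r × 2 table by brute force: it enumerates every table with the same row and column totals, assigns each its hypergeometric probability, and sums the probabilities of tables no more probable than the observed one. That two-sided definition is a common one (R's `fisher.test` uses it); the function name `fisher_exact_rx2` is hypothetical, and real software uses much faster algorithms than this enumeration.

```python
from math import comb


def fisher_exact_rx2(observed):
    """Two-sided exact p-value for an r x 2 table with fixed margins."""
    rows = [a + b for a, b in observed]
    col1 = sum(a for a, _ in observed)  # total of the first column
    n = sum(rows)
    denom = comb(n, col1)

    def prob(first_col):
        # Multivariate hypergeometric probability of one table,
        # given its first column (the second column is then determined).
        num = 1
        for r, x in zip(rows, first_col):
            num *= comb(r, x)
        return num / denom

    def tables(i, remaining):
        # Yield every first column with 0 <= x_k <= rows[k] summing to col1.
        if i == len(rows) - 1:
            if remaining <= rows[i]:
                yield (remaining,)
            return
        for x in range(min(rows[i], remaining) + 1):
            for rest in tables(i + 1, remaining - x):
                yield (x,) + rest

    p_obs = prob([a for a, _ in observed])
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, tables(0, col1)) if p <= p_obs * (1 + 1e-7))


print(fisher_exact_rx2([[3, 1], [1, 3]]))
print(fisher_exact_rx2([[30, 3], [185, 28], [354, 42]]))
```

The number of tables to enumerate grows rapidly with the sample size and the number of rows, which is why exact p-values become expensive for large samples.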

 

The chi-square test is only an approximation, because the sampling distribution of the test statistic is only approximately equal to the theoretical chi-squared distribution. The approximation is inadequate when the sample size is small or the data are very unequally distributed among the cells of the table, so that the cell counts predicted under the null hypothesis (the "expected values") are low.

For small, sparse, or unbalanced data, the exact and asymptotic p-values can be quite different and may lead to opposite conclusions about the hypothesis of interest.

Exact p-values become computationally difficult to obtain for large samples or well-balanced tables. Fortunately, these are exactly the conditions under which the chi-square approximation is appropriate.

Empower automatically performs Fisher’s exact test if the sample size is small and/or the observed frequencies are sparse or unbalanced.
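The exact criterion Empower uses is not stated here. As a hypothetical illustration only, the sketch below applies a commonly cited rule of thumb: fall back to Fisher's exact test when any expected cell count is below 5. The function name `choose_test` and the `min_expected` threshold are assumptions, not Empower settings.

```python
def choose_test(observed, min_expected=5.0):
    """Hypothetical rule: return "fisher" if any expected count is below
    min_expected, otherwise "chi-square"."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    smallest = min(r * c / n for r in row_totals for c in col_totals)
    return "fisher" if smallest < min_expected else "chi-square"


print(choose_test([[30, 3], [185, 28], [354, 42]]))
print(choose_test([[50, 50], [50, 50]]))
```

For the SNP1-by-HBP counts in the sample output, the smallest expected count is 33 × 73 / 642 ≈ 3.75, so this hypothetical rule selects Fisher's exact test.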

 

Below is the sample input window

Below is the sample output of the above model:

R*C frequency table and Chi-square test

   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  642

             | HBP
        SNP1 |         0 |         1 | Row Total |
-------------|-----------|-----------|-----------|
           0 |        30 |         3 |        33 |
             |     0.909 |     0.091 |     0.051 |
             |     0.053 |     0.041 |           |
             |     0.047 |     0.005 |           |
-------------|-----------|-----------|-----------|
           1 |       185 |        28 |       213 |
             |     0.869 |     0.131 |     0.332 |
             |     0.325 |     0.384 |           |
             |     0.288 |     0.044 |           |
-------------|-----------|-----------|-----------|
           2 |       354 |        42 |       396 |
             |     0.894 |     0.106 |     0.617 |
             |     0.622 |     0.575 |           |
             |     0.551 |     0.065 |           |
-------------|-----------|-----------|-----------|
Column Total |       569 |        73 |       642 |
             |     0.886 |     0.114 |           |
-------------|-----------|-----------|-----------|

Statistics for All Table Factors

Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 =  1.065719     d.f. =  2     p =  0.5869243

Fisher's Exact Test for Count Data
------------------------------------------------------------
Alternative hypothesis: two.sided
p =  0.5916893

In the cross table above, each cell contains four numbers: the frequency N, the row proportion (N / Row Total), the column proportion (N / Col Total), and the overall proportion (N / Table Total).
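The four numbers in each cell can be recomputed from the counts alone. This short Python sketch (the function name `cell_contents` is hypothetical) returns them for every cell of the sample table and prints them rounded to three decimals, as in the output.

```python
def cell_contents(observed):
    """For each cell return (N, N/Row Total, N/Col Total, N/Table Total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[(o, o / row_totals[i], o / col_totals[j], o / n)
             for j, o in enumerate(row)]
            for i, row in enumerate(observed)]


snp1_by_hbp = [[30, 3], [185, 28], [354, 42]]
for row in cell_contents(snp1_by_hbp):
    print(["%d | %.3f | %.3f | %.3f" % cell for cell in row])
```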

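The χ² statistic is 1.065719 with 2 degrees of freedom, and the p-value is 0.5869243. Since p > 0.05, the null hypothesis that SNP1 and HBP are independent is not rejected at the 5% level.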
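The reported degrees of freedom follow from the table dimensions (three levels of SNP1, two levels of HBP). The same margins also give the smallest expected count, in the cell SNP1 = 0, HBP = 1, which is below the commonly cited threshold of 5:

```latex
\text{d.f.} = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2
E_{\text{SNP1}=0,\,\text{HBP}=1} = \frac{33 \times 73}{642} \approx 3.75
```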

 

Fisher’s exact test was also performed for this example; its p-value is 0.5916893, very close to the chi-square p-value, so both tests lead to the same conclusion.

Empower also outputs a graph: