Customarily, if a data set includes any categorical data, one of the first steps in the data analysis is to compute a frequency table for those categorical variables. For example, the table below shows the number of persons with each genotype of a SNP.
SNP genotype    AA      AB      BB      Total
Frequency       N_{1}   N_{2}   N_{3}   N
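A frequency table like this can be sketched in Python with the standard library's `collections.Counter`. This is a minimal illustration, not how Empower computes it: the function name `frequency_table` is hypothetical, and the input list is rebuilt from the counts shown in the sample output later in this section (33 AA, 216 AB, 398 BB).

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, proportion)} for a list of categorical values.

    Hypothetical helper for illustration only.
    """
    counts = Counter(values)
    total = sum(counts.values())
    # Sort by category label so the rows come out as AA, AB, BB.
    return {cat: (n, n / total) for cat, n in sorted(counts.items())}


# Illustrative input rebuilt from the counts in the sample output.
genotypes = ["AA"] * 33 + ["AB"] * 216 + ["BB"] * 398
table = frequency_table(genotypes)

for genotype, (n, prop) in table.items():
    print(f"{genotype}  {n:4d}  {prop:.8f}")
print(f"Total {sum(n for n, _ in table.values())}")
```

The proportions printed this way match the "frequency" column of the sample output (for example 0.05100464 for AA).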

The chi-square test can be used to compare the observed frequencies against a theoretical distribution. For example, if the SNP has a “B” allele frequency of 90% (so the “A” allele frequency is 10%), the theoretical genotype frequencies under Hardy-Weinberg equilibrium are AA = 0.1^2 = 0.01, AB = 2 × 0.1 × 0.9 = 0.18, and BB = 0.9^2 = 0.81. The expected number of people with the AA genotype is therefore 0.01*N, with AB 0.18*N, and with BB 0.81*N.
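The expected frequencies follow from the Hardy-Weinberg proportions (1 − q)^2, 2q(1 − q) and q^2 for AA, AB and BB, where q is the “B” allele frequency. The sketch below, with hypothetical helper names, computes them for q = 0.9 and scales them by N; N = 647 is taken from the sample output purely for illustration.

```python
def hwe_genotype_probs(b_freq):
    """Expected AA, AB, BB proportions under Hardy-Weinberg equilibrium.

    b_freq is the frequency of the "B" allele (hypothetical helper).
    """
    if not 0.0 <= b_freq <= 1.0:
        raise ValueError("allele frequency must be in [0, 1]")
    a_freq = 1.0 - b_freq
    return {"AA": a_freq ** 2, "AB": 2 * a_freq * b_freq, "BB": b_freq ** 2}


def expected_counts(probs, n):
    """Scale expected proportions by the total sample size N."""
    return {g: p * n for g, p in probs.items()}


probs = hwe_genotype_probs(0.9)          # approximately AA=0.01, AB=0.18, BB=0.81
expected = expected_counts(probs, 647)   # N = 647, the total in the sample output
print({g: round(p, 4) for g, p in probs.items()})
print({g: round(e, 2) for g, e in expected.items()})
```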
Then calculate the Χ^{2} statistic. The value of the test statistic is

$$\chi^{2} = \sum_{i=1}^{n} \frac{(O_{i} - E_{i})^{2}}{E_{i}}$$

where
O_{i} = an observed frequency;
E_{i} = an expected (theoretical) frequency, asserted by the null hypothesis;
n = the number of cells in the table;
Χ^{2} = Pearson's cumulative test statistic, which asymptotically approaches a Χ^{2} distribution with n - 1 degrees of freedom when the expected proportions are specified in advance.
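The formula translates directly into code. This minimal sketch (the function name is hypothetical) applies it to the observed counts from the sample output, assuming the 0.01/0.18/0.81 expected proportions from the example above; under that assumption it reproduces the reported statistic.

```python
def pearson_chi_square(observed, expected):
    """Pearson's statistic: sum over cells of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [33, 216, 398]                        # AA, AB, BB counts from the sample output
n_total = sum(observed)                          # N = 647
expected = [p * n_total for p in (0.01, 0.18, 0.81)]
stat = pearson_chi_square(observed, expected)
df = len(observed) - 1                           # proportions fixed in advance: n - 1
print(f"X-squared = {stat:.4f}, df = {df}")      # prints: X-squared = 224.1909, df = 2
```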
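If the Χ^{2} value is so large that a value at least this extreme would occur with low probability under the null hypothesis, say 5% or less, we say the results are "statistically significant" at the ".05 or 5% level". We then reject the null hypothesis and accept the alternative that the observed genotype frequencies do not follow the theoretical distribution.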
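Turning the statistic into a decision requires the upper-tail probability of the Χ^{2} distribution. General-purpose software uses the regularized incomplete gamma function (for example `scipy.stats.chi2.sf`); to stay dependency-free, this sketch instead uses the closed form that holds only for even degrees of freedom, which covers the df = 2 case of a three-genotype table. The helper names and the default alpha = 0.05 are assumptions for illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, valid only for even df.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def decide(stat, df, alpha=0.05):
    """Return (p-value, reject H0?) for a hypothetical significance level alpha."""
    p = chi2_sf_even_df(stat, df)
    return p, p <= alpha


crit = -2 * math.log(0.05)        # 5% critical value when df = 2
p, reject = decide(224.1909, df=2)
print(f"critical value = {crit:.3f}, p-value = {p:.3g}, reject H0: {reject}")
```

For df = 2 the tail probability reduces to exp(−Χ^{2}/2), so the 5% critical value is −2 ln 0.05 ≈ 5.991; any statistic above it is significant at the 5% level.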
Below is the sample input window:
Below is the sample output of the above analysis:
One-way frequency table and chi-square test
      n   frequency
AA   33  0.05100464
AB  216  0.33384853
BB  398  0.61514683

Chi-squared test for given probabilities
data: tmp.tb
X-squared = 224.1909, df = 2, p-value < 2.2e-16
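As a check on the sample output, the statistic can be worked by hand, assuming the expected proportions 0.01, 0.18 and 0.81 from the example above and N = 33 + 216 + 398 = 647:

```latex
\begin{aligned}
E_{AA} &= 0.01 \times 647 = 6.47, \quad E_{AB} = 0.18 \times 647 = 116.46, \quad E_{BB} = 0.81 \times 647 = 524.07 \\
\chi^{2} &= \frac{(33 - 6.47)^{2}}{6.47} + \frac{(216 - 116.46)^{2}}{116.46} + \frac{(398 - 524.07)^{2}}{524.07} \\
&\approx 108.785 + 85.078 + 30.327 \approx 224.19
\end{aligned}
```

With df = 3 − 1 = 2, the upper-tail probability is e^{−224.19/2}, about 2 × 10^{−49}, which the output reports only as the bound p-value < 2.2e-16.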
Empower also outputs a graph: