One Way Frequency Table and Chi-square Test (χ2 test)

 

A one-way frequency table is the simplest way to summarize categorical data. Such tables are often used as an exploratory step to review how the categories of a variable are distributed in the sample. For example, in survey research, frequency tables can show the number of males and females who participated in the survey, the number of respondents from particular ethnic and racial backgrounds, and so on.

Customarily, if a data set includes any categorical data, one of the first steps in the analysis is to compute a frequency table for each categorical variable. For example, the frequency table below shows the number of persons with each genotype of a SNP.

SNP genotype    AA    AB    BB    Total
Frequency       N1    N2    N3    N
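As a rough illustration of how such a table can be tallied from raw data, here is a minimal Python sketch. The genotype list and the function name `one_way_table` are hypothetical, chosen only for this example; they are not part of Empower.

```python
from collections import Counter

# Hypothetical genotype calls, one per person (illustrative data only).
genotypes = ["AA", "AB", "BB", "BB", "AB", "BB", "BB", "AB", "BB", "BB"]

def one_way_table(values, categories=("AA", "AB", "BB")):
    """Return {category: (count, proportion)} for the listed categories."""
    counts = Counter(values)
    total = sum(counts[c] for c in categories)
    return {c: (counts[c], counts[c] / total) for c in categories}

table = one_way_table(genotypes)
for genotype, (n, freq) in table.items():
    print(f"{genotype:<6}{n:>4}{freq:>8.2f}")
print(f"{'Total':<6}{sum(n for n, _ in table.values()):>4}")
```

Listing the categories explicitly keeps a genotype with zero observations in the table instead of silently dropping it.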

Chi-square test (χ2 test)

The Chi-square test can be used to compare the observed frequencies against the frequencies expected under a theoretical distribution.

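For example, if the SNP has a B allele frequency of 90% (so the A allele frequency is 10%), the theoretical genotype frequencies under Hardy-Weinberg equilibrium are AA = 0.1*0.1 = 0.01, AB = 2*0.1*0.9 = 0.18, and BB = 0.9*0.9 = 0.81. The expected number of people with the AA genotype is therefore 0.01*N, with AB 0.18*N, and with BB 0.81*N, where N is the total number of people.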
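The genotype frequencies quoted above equal p*p, 2*p*q and q*q for allele frequencies p = 0.1 (A) and q = 0.9 (B), that is, Hardy-Weinberg proportions. A small Python sketch of that arithmetic follows; the function name `hardy_weinberg_expected` and the sample size of 1000 are assumptions made for illustration.

```python
def hardy_weinberg_expected(b_allele_freq, n_people):
    """Expected genotype counts (AA, AB, BB) under Hardy-Weinberg proportions."""
    q = b_allele_freq          # frequency of allele B
    p = 1.0 - q                # frequency of allele A
    probs = {"AA": p * p, "AB": 2 * p * q, "BB": q * q}
    return {g: prob * n_people for g, prob in probs.items()}

# N = 1000 is an arbitrary sample size used only for this example.
expected = hardy_weinberg_expected(0.9, 1000)
for genotype, count in expected.items():
    print(f"{genotype}: {count:.1f}")
```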

Then calculate the χ2 statistic.

The value of the test statistic is

\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}

where

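O_i = the observed frequency in cell i;

E_i = the expected (theoretical) frequency in cell i, asserted by the null hypothesis;

n = the number of cells in the table;

χ2 = Pearson's cumulative test statistic, which asymptotically approaches a χ2 distribution (with n - 1 degrees of freedom when the expected frequencies are fully specified in advance, as in the example above).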
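The formula can be checked by hand with a few lines of Python. This is only a sketch, not Empower's own code: the function name `chi_square_statistic` is hypothetical, the observed counts are the AA, AB and BB counts from the sample output further down this page, and the probabilities are the theoretical frequencies from the example above.

```python
def chi_square_statistic(observed, expected_probs):
    """Pearson chi-square statistic for counts against given probabilities."""
    total = sum(observed)
    statistic = 0.0
    for o, prob in zip(observed, expected_probs):
        e = prob * total              # expected count E_i under the null
        statistic += (o - e) ** 2 / e
    return statistic

observed = [33, 216, 398]            # AA, AB, BB counts from the sample output
expected_probs = [0.01, 0.18, 0.81]  # theoretical frequencies from the text
chi2 = chi_square_statistic(observed, expected_probs)
df = len(observed) - 1               # probabilities are fixed in advance
print(f"X-squared = {chi2:.4f}, df = {df}")
```

The printed statistic should agree with the X-squared value and the df reported in the sample output.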

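If the χ2 value is so large that the probability of obtaining a value at least that large under the null hypothesis (the p-value) falls below a chosen threshold, say 5%, we say the results are "statistically significant" at the ".05 or 5% level". We then reject the null hypothesis and accept the alternative that the observed frequencies do not follow the theoretical distribution.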
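To make the decision rule concrete, the sketch below computes the p-value for the special case of 2 degrees of freedom, which is the case for a three-genotype table. It relies on the fact that a χ2 distribution with df = 2 is an exponential distribution with mean 2, so the upper-tail probability is exp(-x/2). The function name is hypothetical, and other df values would need a general routine such as `scipy.stats.chi2.sf`.

```python
import math

def chi_square_p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For df = 2 the chi-square distribution is exponential with mean 2,
    so P(X > x) = exp(-x / 2). This shortcut does not hold for other df.
    """
    return math.exp(-statistic / 2.0)

alpha = 0.05                               # chosen significance level
critical_value = -2.0 * math.log(alpha)    # df = 2 cutoff, roughly 5.99
statistic = 224.1909                       # X-squared from the sample output
p_value = chi_square_p_value_df2(statistic)
reject_null = p_value < alpha
print(f"critical value = {critical_value:.3f}")
print(f"p-value = {p_value:.3g}, reject null: {reject_null}")
```

Comparing the statistic with the critical value and comparing the p-value with alpha are two views of the same decision.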

Below is the sample input window:

 

 

 

Below is the sample output of the above model:

 

One way frequency table and Chi-square test

        n    frequency
AA     33    0.05100464
AB    216    0.33384853
BB    398    0.61514683

 

Chi-squared test for given probabilities

data: tmp.tb
X-squared = 224.1909, df = 2, p-value < 2.2e-16

 

 

Empower also outputs a graph: