Calculate statistics within each record

When do we need to calculate statistics within each record?

Two common scenarios in which we need to calculate statistics such as mean, median, range, percent change, or slope within each record are:

·         A study with repeated measurements, where the data are organized with one subject (participant) per record and you want to calculate statistics for each subject across the multiple measurements.

·         A microarray study, where the data are organized with (1) one probe set (gene) per record and (2) one sample per column. You have 2 or more groups of samples, or a time series (samples collected at different time points), and you want to compare samples or do profile analysis for each gene.
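
As a rough illustration of what "statistics within each record" means, the sketch below computes the mean, median, range, percent change and least-squares slope across the measurements of one record. The function name, the subject names and the values are hypothetical, the measurements are assumed to be equally spaced unless times are supplied, and percent change is assumed to mean last versus first measurement. It is not the program's own implementation.

```python
from statistics import mean, median

def record_stats(values, times=None):
    """Summary statistics across the measurements of one record (row)."""
    if times is None:
        # Assumption: measurements are equally spaced (1, 2, 3, ...).
        times = list(range(1, len(values) + 1))
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return {
        "mean": v_bar,
        "median": median(values),
        "range": max(values) - min(values),
        # Assumption: percent change from the first to the last measurement.
        "pct_change": 100.0 * (values[-1] - values[0]) / values[0],
        "slope": sxy / sxx,  # least-squares slope of value on time
    }

# Hypothetical data: one subject per record, four repeated measurements each.
records = {
    "subject_1": [10.0, 12.0, 14.0, 16.0],
    "subject_2": [8.0, 8.0, 6.0, 4.0],
}
stats = {name: record_stats(vals) for name, vals in records.items()}
```

Each record is processed independently, so the output has the same number of records as the input, with the statistics added as new variables.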

Example 1:

A microarray study was designed to examine whether archived formalin-fixed paraffin-embedded (FFPE) samples can generate results similar to those from frozen samples. There were 3 patients (“SA”, “SD” and “SF”). Each patient provided one FFPE sample and one frozen sample, for a total of 6 samples, and each sample was run in 3 technical replicates.

In the data file “expression.xls”, the column-title prefixes “SA.”, “SD.” and “SF.” identify the 3 patients, the middle parts “.A” (frozen) and “.B” (FFPE) identify the two sample types, and the suffixes “.1”, “.2” and “.3” identify the 3 technical replicates.
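
The naming scheme can be made concrete with a small sketch that splits a column title into its three parts and selects the frozen (“.A”) columns. The helper name parse_column is hypothetical and is only meant to illustrate the convention, not how the program reads the file.

```python
def parse_column(name):
    """Split a column title such as 'SA.A.1' into patient, sample type, replicate."""
    patient, sample, replicate = name.split(".")
    return {"patient": patient, "sample": sample, "replicate": int(replicate)}

# The 18 column titles implied by 3 patients x 2 sample types x 3 replicates.
columns = [
    f"{p}.{s}.{r}"
    for p in ("SA", "SD", "SF")
    for s in ("A", "B")
    for r in (1, 2, 3)
]

# Columns of the frozen samples (".A"), in file order.
frozen = [c for c in columns if parse_column(c)["sample"] == "A"]
```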

We can use this function to compare “SA” versus “SD” versus “SF” using the FFPE samples, repeat the same comparisons using the frozen samples, and then compare the two sets of results to determine whether FFPE samples generate results similar to frozen samples.

Below is the input window for the comparison (ANOVA) of the frozen samples (“.A”):

·         Check “ANOVA/t-test” in “Complex statistics”.

·         Select the variables “SA.A.1”, “SA.A.2”, …, “SF.A.3” from the left panel and move them to the variables list for calculation.

·         Assign the group indicator “1” to patient “SA”, “2” to “SD” and “3” to “SF”. To change a group indicator, highlight the variable (or group indicator) and then click the “+” or “-” button.

·         This function outputs a set of new variables. Assign a prefix for naming the new variables, name the output file, and click “Run”.
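
The group assignment made in the steps above amounts to one group indicator per selected variable. A minimal sketch, assuming the patient is identified by the part of the column title before the first dot (the dictionary name is hypothetical):

```python
variables = [
    "SA.A.1", "SA.A.2", "SA.A.3",
    "SD.A.1", "SD.A.2", "SD.A.3",
    "SF.A.1", "SF.A.2", "SF.A.3",
]

# Group indicator chosen for each patient in the input window.
group_of_patient = {"SA": 1, "SD": 2, "SF": 3}

# One indicator per variable, in the same order as the variables list.
groups = [group_of_patient[v.split(".")[0]] for v in variables]
```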

A new tab-delimited text file “expression_vstat.xls” will be saved and the following message will be displayed:

Analysis of variance/t-test within each record and create expression_vstat.xls succeed

Variables used:

[1] "SA.A.1" "SA.A.2" "SA.A.3" "SD.A.1" "SD.A.2" "SD.A.3" "SF.A.1" "SF.A.2"

[9] "SF.A.3"

Variables were classified to groups:

[1] 1 1 1 2 2 2 3 3 3

Levels: 1 2 3

New variables created in new data:

 [1] "A.BMS"           "A.WMS"           "A.P"             "A.N.1"         

 [5] "A.N.2"           "A.N.3"           "A.MEAN.1"        "A.MEAN.2"      

 [9] "A.MEAN.3"        "A.SD.1"          "A.SD.2"          "A.SD.3"        

[13] "A.DIFF.2.1"      "A.DIFF.3.1"      "A.DIFF.3.2"      "A.DIFF.LOW.2.1"

[17] "A.DIFF.LOW.3.1"  "A.DIFF.LOW.3.2"  "A.DIFF.UPP.2.1"  "A.DIFF.UPP.3.1"

[21] "A.DIFF.UPP.3.2"  "A.DIFF.PADJ.2.1" "A.DIFF.PADJ.3.1" "A.DIFF.PADJ.3.2"

Number of records in original data: 2000

Number of records in new data: 2000

 

Among the new variables:

A.BMS:  between group mean square from ANOVA

A.WMS: within group mean square from ANOVA

A.P:  p value from ANOVA

A.N.1, A.N.2, A.N.3:  sample size for group 1, 2, 3 respectively

A.MEAN.1, A.MEAN.2, A.MEAN.3:  mean value for group 1, 2, 3 respectively

A.SD.1, A.SD.2, A.SD.3:  standard deviation for group 1, 2, 3 respectively

A.DIFF.2.1:  mean difference comparing group 2 versus group 1

A.DIFF.3.1:  mean difference comparing group 3 versus group 1

A.DIFF.3.2:  mean difference comparing group 3 versus group 2

A.DIFF.LOW.2.1:  lower limit of the 95% confidence interval of the mean difference comparing group 2 versus group 1

A.DIFF.UPP.2.1:  upper limit of the 95% confidence interval of the mean difference comparing group 2 versus group 1

The groups here represent the patients. The 95% confidence intervals of the mean differences have been adjusted for multiple comparisons.
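
For readers who want to see how the mean squares relate to the data, here is a minimal sketch of a one-way ANOVA on a single record, using the standard textbook formulas. The function name and the example values are hypothetical, the direction of each difference (higher-numbered group minus lower-numbered group) is an assumption, and the p value, adjusted confidence intervals and adjusted p values are left out because the adjustment method used by the program is not stated here. The p value would come from an F distribution with (k - 1, N - k) degrees of freedom applied to BMS / WMS.

```python
from statistics import mean, stdev

def anova_record(values, groups):
    """One-way ANOVA summary for one record; groups[i] labels values[i]."""
    levels = sorted(set(groups))
    by_group = {g: [v for v, gi in zip(values, groups) if gi == g] for g in levels}
    grand = mean(values)
    # Between-group and within-group sums of squares.
    ssb = sum(len(x) * (mean(x) - grand) ** 2 for x in by_group.values())
    ssw = sum((v - mean(x)) ** 2 for x in by_group.values() for v in x)
    out = {
        "BMS": ssb / (len(levels) - 1),            # between-group mean square
        "WMS": ssw / (len(values) - len(levels)),  # within-group mean square
    }
    for g, x in by_group.items():
        out[f"N.{g}"] = len(x)
        out[f"MEAN.{g}"] = mean(x)
        out[f"SD.{g}"] = stdev(x)
    # Pairwise mean differences. Assumption: higher group minus lower group.
    for i, low in enumerate(levels):
        for high in levels[i + 1:]:
            out[f"DIFF.{high}.{low}"] = mean(by_group[high]) - mean(by_group[low])
    return out

# Hypothetical expression values for one probe set (9 frozen-sample columns).
values = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0]
groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]
result = anova_record(values, groups)
f_stat = result["BMS"] / result["WMS"]
```

In the program this calculation is repeated for every record, which is why the new file has the same number of records as the original.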

The next step is to reload expression_vstat.xls and run a similar analysis for the FFPE samples, naming the new output variables with a different prefix such as “B”. Once we have results for both the frozen and the FFPE samples, we can compare the consistency of the two ANOVA results (e.g. compare “A.P” versus “B.P” to see whether both are significant).
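
One simple way to compare “A.P” with “B.P” is to cross-tabulate which records are significant in each analysis. The sketch below assumes a significance threshold of 0.05 and uses made-up p values purely for illustration. The function name is hypothetical and this is not a feature of the program itself.

```python
def concordance(p_a, p_b, alpha=0.05):
    """Count records by whether A.P and B.P fall below the threshold alpha."""
    table = {"both": 0, "A_only": 0, "B_only": 0, "neither": 0}
    for a, b in zip(p_a, p_b):
        sig_a, sig_b = a < alpha, b < alpha
        if sig_a and sig_b:
            table["both"] += 1
        elif sig_a:
            table["A_only"] += 1
        elif sig_b:
            table["B_only"] += 1
        else:
            table["neither"] += 1
    return table

# Hypothetical p values for four records (not taken from expression.xls).
a_p = [0.01, 0.20, 0.03, 0.60]
b_p = [0.02, 0.04, 0.30, 0.70]
table = concordance(a_p, b_p)
agreement = (table["both"] + table["neither"]) / len(a_p)
```

A high proportion of records in the "both" and "neither" cells would suggest that the FFPE and frozen samples lead to similar conclusions.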

Example 2:

If the 3 patients above (“SA”, “SD”, “SF”) instead represented 3 different time points, we could compare the profile of the “.A” samples with that of the “.B” samples. The input window is filled in as follows:

·         Check “Profile comparison” in “Complex statistics”.

·         Select the variables “SA.A.1”, “SA.A.2”, …, “SF.A.3” from the left panel and move them to “Group 1”, then move “SA.B.1”, “SA.B.2”, …, “SF.B.3” to “Group 2”.

·         Assign the time values. First select “Time” and click “Add”, then enter the time value for each sample.

·         This function outputs a set of new variables. Assign a prefix for naming the new variables, name the output file, and click “Run”.

 

A new tab-delimited text file “expression_pfl.xls” will be saved and the following message will be displayed:

Profile comparison within each record and create expression_pfl.xls succeed

Group 1:

[1] "SA.A.1" "SA.A.2" "SA.A.3" "SD.A.1" "SD.A.2" "SD.A.3" "SF.A.1" "SF.A.2"

[9] "SF.A.3"

        

 Group 2:

[1] "SA.B.1" "SA.B.2" "SA.B.3" "SD.B.1" "SD.B.2" "SD.B.3" "SF.B.1" "SF.B.2"

[9] "SF.B.3"

             

 Time/Test ID:

[1]  1  1  1 12 12 12 48 48 48

                                  

 New variables created in new data:

 [1] "PFL.SLOPE.1"       "PFL.SLOPE.SE.1"    "PFL.SLOPE.P.1"   

 [4] "PFL.SLOPE.2"       "PFL.SLOPE.SE.2"    "PFL.SLOPE.P.2"   

 [7] "PFL.SLOPE.DIFF"    "PFL.SLOPE.DIFF.SE" "PFL.SLOPE.DIFF.P"

[10] "PFL.MEAN.1.1"      "PFL.MEAN.1.2"      "PFL.MEAN.1.3"    

[13] "PFL.SD.1.1"        "PFL.SD.1.2"        "PFL.SD.1.3"      

[16] "PFL.MEAN.2.1"      "PFL.MEAN.2.2"      "PFL.MEAN.2.3"    

[19] "PFL.SD.2.1"        "PFL.SD.2.2"        "PFL.SD.2.3"      

[22] "PFL.TTEST.P.1"     "PFL.TTEST.P.2"     "PFL.TTEST.P.3"   

[25] "PFL.EQUAL.P"       "PFL.PARALLEL.P"    "PFL.COINCIDE.P"  

[28] "PFL.FLAT.P"      

                                      

Number of records in original data: 2000

Number of records in new data: 2000

 

Among the new variables:

PFL.SLOPE.1, PFL.SLOPE.2:  Slope for group 1, 2 respectively

PFL.SLOPE.SE.1, PFL.SLOPE.SE.2:  Standard error of the slope for group 1, 2 respectively

PFL.SLOPE.P.1, PFL.SLOPE.P.2:  P value of the slope for group 1, 2 respectively

PFL.SLOPE.DIFF, PFL.SLOPE.DIFF.SE, PFL.SLOPE.DIFF.P:  The difference between the slopes of the two groups, the standard error of that difference, and its P value.

PFL.MEAN.1.1, PFL.MEAN.1.2, PFL.MEAN.1.3:  The means of group 1 at the 3 time points

PFL.SD.1.1, PFL.SD.1.2, PFL.SD.1.3:  The standard deviations of group 1 at the 3 time points

PFL.MEAN.2.1, PFL.MEAN.2.2, PFL.MEAN.2.3:  The means of group 2 at the 3 time points

PFL.SD.2.1, PFL.SD.2.2, PFL.SD.2.3:  The standard deviations of group 2 at the 3 time points

PFL.TTEST.P.1, PFL.TTEST.P.2, PFL.TTEST.P.3:  The p values from t tests comparing group 1 versus group 2 at each of the 3 time points. These p values are not adjusted for multiple comparisons.

PFL.EQUAL.P:  The p value of equal means from profile comparison

PFL.PARALLEL.P:  The p value of parallel test

PFL.COINCIDE.P: The p value of coincidence

PFL.FLAT.P: The p value of flatness

 

The groups here represent the 2 different sample processing methods (frozen versus FFPE).
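
To illustrate the slope and per-time-point mean variables, the sketch below fits an ordinary least-squares slope of expression on time for each group within one record and averages the replicates at each time point. The expression values are hypothetical, the function names are illustrative, and the sign convention of the slope difference (group 2 minus group 1) is an assumption. Standard errors, p values and the equal/parallel/coincide/flat tests are omitted.

```python
from statistics import mean

def slope(times, values):
    """Least-squares slope of values regressed on times."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return sxy / sxx

def means_by_time(times, values):
    """Mean of the replicates at each distinct time point, in time order."""
    return [
        mean([v for t, v in zip(times, values) if t == tp])
        for tp in sorted(set(times))
    ]

# Time values as entered in the example above (3 replicates per time point).
times = [1, 1, 1, 12, 12, 12, 48, 48, 48]

# Hypothetical expression values for one probe set.
group1 = [2.4, 2.5, 2.6, 7.9, 8.0, 8.1, 25.9, 26.0, 26.1]  # rising profile
group2 = [5.0] * 9                                          # flat profile

profile = {
    "SLOPE.1": slope(times, group1),
    "SLOPE.2": slope(times, group2),
    "MEAN.1": means_by_time(times, group1),
    "MEAN.2": means_by_time(times, group2),
}
# Assumption: difference taken as group 2 minus group 1.
profile["SLOPE.DIFF"] = profile["SLOPE.2"] - profile["SLOPE.1"]
```

With these made-up values, group 1 rises by about 0.5 units per time unit while group 2 is flat, which is the kind of difference the parallelism test is meant to detect.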