**Calculate statistics within each record**

*When do we need to calculate statistics within each record?*

Two common scenarios in which we need to calculate statistics such as the mean, median, range, percent change or slope within each record are:

· *For a study with repeated measurements in which the data are organized with one record per subject (participant), and you want to calculate statistics for each subject over the multiple measurements.*

· *For microarray data organized with (1) one record per probe set (gene) and (2) one column per sample, where you have 2 or more groups of samples, or a time series of samples (samples collected at different time points), and you want to compare the samples or perform a profile analysis for each gene.*
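For illustration, the Python sketch below computes several of the statistics named above for a single record, i.e. one row of repeated measurements. It is not the tool's implementation; the function name `record_stats`, the percent-change definition (last versus first measurement) and the default use of equally spaced time indices for the slope are assumptions.

```python
import statistics

def record_stats(values, times=None):
    """Summary statistics for one record (one row of repeated measurements)."""
    if times is None:
        # Assumption: equally spaced measurements indexed 0, 1, 2, ...
        times = list(range(len(values)))
    mean_t = statistics.fmean(times)
    mean_v = statistics.fmean(values)
    # Ordinary least-squares slope of value on time
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return {
        "mean": mean_v,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        # Assumption: percent change from the first to the last measurement
        "pct_change": (values[-1] - values[0]) / values[0] * 100,
        "slope": sxy / sxx,
    }

# Each record (row) is summarized independently of all other records
rows = {"subject1": [2.0, 4.0, 5.0, 9.0], "subject2": [10.0, 8.0, 8.0, 6.0]}
for name, vals in rows.items():
    print(name, record_stats(vals))
```

Because every record is processed on its own, the same function can be applied row by row to a file with thousands of records.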

*Example 1:*

A microarray study was designed to examine whether archived formalin-fixed, paraffin-embedded (*FFPE*) samples can generate results similar to those from frozen samples. There were 3 patients (“SA”, “SD” and “SF”); each patient provided an FFPE sample and a frozen sample, for a total of 6 samples, and each sample was run in 3 technical replicates.

*In the data file “expression.xls”, the column-title prefixes “SA.”, “SD.” and “SF.” denote the 3 patients; the parts “.A” and “.B” denote the two sample types (frozen and FFPE, respectively); and the suffixes “.1”, “.2” and “.3” denote the 3 technical replicates.*
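Because the titles follow a fixed pattern, the patient, sample and replicate can be recovered by splitting on the dots. A minimal Python sketch, assuming every expression column has exactly the form patient.sample.replicate (the helper `parse_column` is hypothetical, and identifier columns such as probe set IDs would need to be excluded first):

```python
from collections import defaultdict

def parse_column(title):
    """Split a title such as "SA.B.2" into (patient, sample, replicate)."""
    patient, sample, replicate = title.split(".")
    return patient, sample, int(replicate)

# Column titles following the naming convention described above
titles = [f"{p}.{s}.{r}" for p in ("SA", "SD", "SF") for s in ("A", "B") for r in (1, 2, 3)]

# Collect the replicate columns belonging to each (patient, sample) pair
columns = defaultdict(list)
for t in titles:
    patient, sample, _ = parse_column(t)
    columns[(patient, sample)].append(t)

print(columns[("SD", "A")])  # ['SD.A.1', 'SD.A.2', 'SD.A.3']
```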

We can use this function to compare “SA” versus “SD” versus “SF” within the FFPE samples, do the same comparisons within the frozen samples, and then compare the FFPE results with the frozen results to determine whether FFPE samples generate results similar to those from frozen samples.

The input window for the comparisons (ANOVA) of the frozen samples (“.A”) is set up as follows:

· *Check “ANOVA/t-test” in “Complex statistics”.*

· *Select the variables “SA.A.1”, “SA.A.2”, …, “SF.A.3” from the left panel and move them to the list of variables for calculation.*

· *Assign the group indicator “1” to patient “SA”, “2” to “SD” and “3” to “SF”. To change a group indicator, highlight the variables (or the group indicator) and then click the “+” or “-” button.*

· *This function outputs a set of new variables: assign a prefix for naming the new variables, name an output file, and click “Run”.*
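Assigning group indicators amounts to labeling each selected variable with the number of the patient it belongs to. This Python sketch (hypothetical names, not the tool's code) reproduces the grouping 1 1 1 2 2 2 3 3 3 for the nine frozen-sample variables:

```python
# Frozen-sample columns (".A") selected for the ANOVA, in the order listed
selected = ["SA.A.1", "SA.A.2", "SA.A.3", "SD.A.1", "SD.A.2", "SD.A.3",
            "SF.A.1", "SF.A.2", "SF.A.3"]

# Group indicator per patient, as assigned in the input window
group_of_patient = {"SA": 1, "SD": 2, "SF": 3}

# Each variable inherits the indicator of the patient prefix in its title
groups = [group_of_patient[name.split(".")[0]] for name in selected]
print(groups)  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```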

*A new tab-delimited text file, “expression_vstat.xls”, will be saved and the following message will be displayed:*

Analysis of variance/t-test within each record and create expression_vstat.xls succeed

Variables used:

[1] "SA.A.1" "SA.A.2" "SA.A.3" "SD.A.1" "SD.A.2" "SD.A.3" "SF.A.1" "SF.A.2"

[9] "SF.A.3"

Variables were classified to groups:

[1] 1 1 1 2 2 2 3 3 3

Levels: 1 2 3

New variables created in new data:

[1] "A.BMS" "A.WMS" "A.P" "A.N.1"

[5] "A.N.2" "A.N.3" "A.MEAN.1" "A.MEAN.2"

[9] "A.MEAN.3" "A.SD.1" "A.SD.2" "A.SD.3"

[13] "A.DIFF.2.1" "A.DIFF.3.1" "A.DIFF.3.2" "A.DIFF.LOW.2.1"

[17] "A.DIFF.LOW.3.1" "A.DIFF.LOW.3.2" "A.DIFF.UPP.2.1" "A.DIFF.UPP.3.1"

[21] "A.DIFF.UPP.3.2" "A.DIFF.PADJ.2.1" "A.DIFF.PADJ.3.1" "A.DIFF.PADJ.3.2"

Number of records in original data: 2000

Number of records in new data: 2000

Among the new variables:

A.BMS: between-group mean square from ANOVA

A.WMS: within-group mean square from ANOVA

A.P: p value from ANOVA

A.N.1, A.N.2, A.N.3: sample sizes for groups 1, 2 and 3, respectively

A.MEAN.1, A.MEAN.2, A.MEAN.3: mean values for groups 1, 2 and 3, respectively

A.SD.1, A.SD.2, A.SD.3: standard deviations for groups 1, 2 and 3, respectively

A.DIFF.2.1: mean difference comparing group 2 versus group 1

A.DIFF.3.1: mean difference comparing group 3 versus group 1

A.DIFF.3.2: mean difference comparing group 3 versus group 2

A.DIFF.LOW.2.1: lower limit of the 95% confidence interval of the mean difference comparing group 2 versus group 1

A.DIFF.UPP.2.1: upper limit of the 95% confidence interval of the mean difference comparing group 2 versus group 1

…

*The group here represents the patient. The 95% confidence intervals of the mean differences have been adjusted for multiple comparisons.*
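The ANOVA variables can be reproduced record by record with a standard one-way ANOVA. The following stdlib-only Python sketch is an illustration under assumptions, not the tool's code: the function names are hypothetical, and the p value is taken from the upper tail of the F distribution through a regularized incomplete beta function evaluated by a continued fraction (modified Lentz method). The adjusted intervals and p values (A.DIFF.LOW, A.DIFF.UPP, A.DIFF.PADJ) are not reproduced, because the text does not state which multiple-comparison method the tool uses.

```python
import math
import statistics

def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction at step m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry relation on the side where the fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) distribution."""
    return betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def anova_record(values, groups):
    """One-way ANOVA for one record: BMS, WMS, P, plus N, MEAN and SD per group."""
    levels = sorted(set(groups))
    by_group = {g: [v for v, k in zip(values, groups) if k == g] for g in levels}
    grand = statistics.fmean(values)
    ss_between = sum(len(x) * (statistics.fmean(x) - grand) ** 2 for x in by_group.values())
    ss_within = sum((v - statistics.fmean(x)) ** 2 for x in by_group.values() for v in x)
    df_between, df_within = len(levels) - 1, len(values) - len(levels)
    bms, wms = ss_between / df_between, ss_within / df_within
    result = {"BMS": bms, "WMS": wms, "P": f_sf(bms / wms, df_between, df_within)}
    for g, x in by_group.items():
        result[f"N.{g}"] = len(x)
        result[f"MEAN.{g}"] = statistics.fmean(x)
        result[f"SD.{g}"] = statistics.stdev(x)
    return result

# One hypothetical record: nine values grouped by patient (1 = SA, 2 = SD, 3 = SF)
print(anova_record([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 1, 1, 2, 2, 2, 3, 3, 3]))
```

For this record BMS = 27 and WMS = 1, so F = 27 on (2, 6) degrees of freedom. With 2 numerator degrees of freedom the F upper tail has the closed form (1 + 2F/6)^(-3), which gives P = 0.001 and provides a quick check of the sketch.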

*The next step is to reload expression_vstat.xls and perform a similar analysis for the FFPE samples (“.B”), naming the new output variables with a different prefix such as “B”. Once we have results for both the frozen and the FFPE samples, we can assess the consistency of the two ANOVA results (e.g. compare “A.P” with “B.P” to see whether both are significant or not).*
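Comparing “A.P” with “B.P” across all records can be summarized as a 2 × 2 table of significance calls. In this Python sketch the p values are made-up numbers for five records, and the 0.05 cut-off and the function name `concordance` are assumptions; in practice the two columns would be read from the reloaded output file.

```python
from collections import Counter

def concordance(p_a, p_b, alpha=0.05):
    """Cross-tabulate significance calls from two p-value columns, record by record."""
    calls = Counter((pa < alpha, pb < alpha) for pa, pb in zip(p_a, p_b))
    agreement = (calls[(True, True)] + calls[(False, False)]) / len(p_a)
    return calls, agreement

# Hypothetical "A.P" (frozen) and "B.P" (FFPE) values for five records
a_p = [0.001, 0.20, 0.03, 0.60, 0.04]
b_p = [0.004, 0.35, 0.12, 0.70, 0.01]
table, rate = concordance(a_p, b_p)
for (sig_a, sig_b), count in sorted(table.items()):
    print(f"A significant: {sig_a}, B significant: {sig_b}, records: {count}")
print(f"agreement: {rate:.0%}")  # agreement: 80%
```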

*Example 2:*

If the above 3 patients (“SA”, “SD”, “SF”) instead represented 3 different time points, we could compare the profile of the “.A” samples with that of the “.B” samples. The input window is set up as follows:

· *Check “Profile comparison” in “Complex statistics”.*

· *Select the variables “SA.A.1”, “SA.A.2”, …, “SF.A.3” from the left panel and move them to “Group 1”, and move “SA.B.1”, “SA.B.2”, …, “SF.B.3” to “Group 2”.*

· *Assign time values: first select “Time” and click “Add”, then enter the time value for each sample.*

· *This function outputs a set of new variables: assign a prefix for naming the new variables, name an output file, and click “Run”.*

*A new tab-delimited text file, “expression_pfl.xls”, will be saved and the following message will be displayed:*

Profile comparison within each record and create expression_pfl.xls succeed

Group 1:

[1] "SA.A.1" "SA.A.2" "SA.A.3" "SD.A.1" "SD.A.2" "SD.A.3" "SF.A.1" "SF.A.2"

[9] "SF.A.3"

Group 2:

[1] "SA.B.1" "SA.B.2" "SA.B.3" "SD.B.1" "SD.B.2" "SD.B.3" "SF.B.1" "SF.B.2"

[9] "SF.B.3"

Time/Test ID:

[1] 1 1 1 12 12 12 48 48 48

New variables created in new data:

[1] "PFL.SLOPE.1" "PFL.SLOPE.SE.1" "PFL.SLOPE.P.1"

[4] "PFL.SLOPE.2" "PFL.SLOPE.SE.2" "PFL.SLOPE.P.2"

[7] "PFL.SLOPE.DIFF" "PFL.SLOPE.DIFF.SE" "PFL.SLOPE.DIFF.P"

[10] "PFL.MEAN.1.1" "PFL.MEAN.1.2" "PFL.MEAN.1.3"

[13] "PFL.SD.1.1" "PFL.SD.1.2" "PFL.SD.1.3"

[16] "PFL.MEAN.2.1" "PFL.MEAN.2.2" "PFL.MEAN.2.3"

[19] "PFL.SD.2.1" "PFL.SD.2.2" "PFL.SD.2.3"

[22] "PFL.TTEST.P.1" "PFL.TTEST.P.2" "PFL.TTEST.P.3"

[25] "PFL.EQUAL.P" "PFL.PARALLEL.P" "PFL.COINCIDE.P"

[28] "PFL.FLAT.P"

Number of records in original data: 2000

Number of records in new data: 2000

Among the new variables:

PFL.SLOPE.1, PFL.SLOPE.2: Slope for groups 1 and 2, respectively

PFL.SLOPE.SE.1, PFL.SLOPE.SE.2: Standard error of the slope for groups 1 and 2, respectively

PFL.SLOPE.P.1, PFL.SLOPE.P.2: P value of the slope for groups 1 and 2, respectively

PFL.SLOPE.DIFF, PFL.SLOPE.DIFF.SE, PFL.SLOPE.DIFF.P: The difference in slope between the two groups, the standard error of that difference, and its P value.

PFL.MEAN.1.1, PFL.MEAN.1.2, PFL.MEAN.1.3: The means of group 1 at the 3 time points

PFL.SD.1.1, PFL.SD.1.2, PFL.SD.1.3: The standard deviations of group 1 at the 3 time points

PFL.MEAN.2.1, PFL.MEAN.2.2, PFL.MEAN.2.3: The means of group 2 at the 3 time points

PFL.SD.2.1, PFL.SD.2.2, PFL.SD.2.3: The standard deviations of group 2 at the 3 time points

PFL.TTEST.P.1, PFL.TTEST.P.2, PFL.TTEST.P.3: The p values from t tests comparing group 1 versus group 2 at each of the 3 time points, respectively. *These p values were not adjusted for multiple comparisons*.

PFL.EQUAL.P: The p value of the test of equal means from the profile comparison

PFL.PARALLEL.P: The p value of the parallelism test

PFL.COINCIDE.P: The p value of the coincidence test

PFL.FLAT.P: The p value of the flatness test

*The group here represents the 2 different sample processing methods (frozen versus FFPE).*
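The slope and t-test variables can be approximated per record as follows. In this Python sketch the expression values are hypothetical, and several choices are assumptions rather than documented behavior: each group's slope is an ordinary least-squares fit of expression on time across its nine columns, the standard error of the slope difference combines the two groups' standard errors as if they were independent (sign convention: group 1 minus group 2), and the per-time-point test is a pooled-variance two-sample t test. P values would then come from the t distribution with the reported degrees of freedom; the profile-analysis tests (PFL.EQUAL.P, PFL.PARALLEL.P, PFL.COINCIDE.P, PFL.FLAT.P) are not reproduced.

```python
import math
import statistics

def ols_slope(times, values):
    """Least-squares slope of values on times and its standard error."""
    n = len(times)
    mt, mv = statistics.fmean(times), statistics.fmean(values)
    sxx = sum((t - mt) ** 2 for t in times)
    slope = sum((t - mt) * (v - mv) for t, v in zip(times, values)) / sxx
    intercept = mv - slope * mt
    rss = sum((v - intercept - slope * t) ** 2 for t, v in zip(times, values))
    return slope, math.sqrt(rss / (n - 2) / sxx)

def pooled_t(x, y):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

# Time/Test ID as listed in the message above
times = [1, 1, 1, 12, 12, 12, 48, 48, 48]
# Hypothetical values for one record: group 1 (".A") and group 2 (".B")
group1 = [5.1, 5.0, 5.2, 6.0, 6.1, 5.9, 8.0, 8.2, 7.9]
group2 = [5.0, 5.1, 5.0, 5.3, 5.2, 5.4, 5.9, 6.0, 6.1]

s1, se1 = ols_slope(times, group1)
s2, se2 = ols_slope(times, group2)
# Assumption: independent groups, so the standard errors combine in quadrature
diff, diff_se = s1 - s2, math.sqrt(se1 ** 2 + se2 ** 2)
print(f"slopes {s1:.4f} (SE {se1:.4f}), {s2:.4f} (SE {se2:.4f}); "
      f"difference {diff:.4f} (SE {diff_se:.4f})")

# Mean, SD and t statistic of the replicates at each time point
for tp in sorted(set(times)):
    x = [v for t, v in zip(times, group1) if t == tp]
    y = [v for t, v in zip(times, group2) if t == tp]
    t_stat, df = pooled_t(x, y)
    print(f"time {tp}: group 1 {statistics.fmean(x):.2f} (SD {statistics.stdev(x):.2f}), "
          f"group 2 {statistics.fmean(y):.2f} (SD {statistics.stdev(y):.2f}), "
          f"t = {t_stat:.2f} on {df} df")
```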