Correlation Coefficients

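A correlation coefficient measures the strength and direction of the relationship between two variables. Pearson's coefficient measures linear relationship; the rank-based Spearman and Kendall coefficients measure monotonic relationship (whether one variable tends to increase or decrease as the other increases).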

Pearson correlation coefficient:

If the variables are normally distributed, select the method “Pearson”. The Pearson product-moment correlation coefficient (ρ) is then calculated as the covariance of the two variables divided by the product of their standard deviations:

 \rho = \frac{\sum_i(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \sum_i(y_i-\bar{y})^2}}.
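As a minimal sketch (plain Python, not Empower's own implementation), the formula above can be computed term by term. The function name `pearson` is illustrative only, and the sketch assumes both lists have the same length and are not constant.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation, following the formula above."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson([1, 2, 3], [1, 3, 2]))  # 0.5
```

For x = (1, 2, 3) and y = (1, 3, 2), both means are 2, the cross-product sum is 1, and each sum of squares is 2, so ρ = 1 / √(2 · 2) = 0.5.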

If the variables are not normally distributed, one of the two rank-based methods, “Spearman (rank based)” or “Kendall (rank based)”, can be used instead.

Spearman rank based correlation coefficient:

Each variable is first ranked separately. Tied values are assigned a rank equal to the average of their positions in the ascending order of the values. The correlation (ρ) is then calculated with the same formula as above, applied to the ranks instead of the original values.
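The two steps described above (average ranks for ties, then Pearson's formula on the ranks) can be sketched as follows. This is an illustrative plain-Python version, not Empower's code; the names `average_ranks`, `pearson`, and `spearman` are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values in ascending order (1-based); ties get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of tied values starting at sorted position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's formula from the previous section."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson's formula applied to the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

Because only the ranks enter the calculation, a perfectly monotonic but nonlinear relationship such as y = x² for x = 1, …, 5 gives a Spearman coefficient of exactly 1, while Pearson's coefficient on the raw values would be below 1.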

Kendall rank correlation coefficient:

The Kendall rank correlation coefficient, commonly referred to as Kendall's tau (τ) coefficient, measures the agreement between the orderings of two variables, based on how many pairs of observations are ranked in the same order by both variables.

Let (x1, y1), (x2, y2), …, (xn, yn) be a set of joint observations of two random variables X and Y, respectively, such that all the values of xi and all the values of yi are unique (no ties). A pair of observations (xi, yi) and (xj, yj) is said to be concordant if the ranks of both elements agree, that is, if both xi > xj and yi > yj, or both xi < xj and yi < yj. The pair is said to be discordant if xi > xj and yi < yj, or xi < xj and yi > yj. If xi = xj or yi = yj (a tie), the pair is neither concordant nor discordant.

The Kendall τ coefficient is defined as:

\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{\frac{1}{2} n (n-1) } .

The denominator is the total number of pairs, so the coefficient must be in the range −1 ≤ τ ≤ 1.

· If the agreement between the two rankings is perfect (i.e., the two rankings are the same), the coefficient has value 1.

· If the disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other), the coefficient has value −1.

· If X and Y are independent, the coefficient is expected to be approximately zero.
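The definition above can be sketched by counting concordant and discordant pairs directly. This plain-Python version (hypothetical name `kendall_tau_a`) implements exactly the formula given, which assumes no ties; it is often called tau-a. Statistical software commonly offers tie-adjusted variants such as tau-b, and this sketch makes no assumption about which variant Empower uses.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau as defined above: (concordant - discordant) / (n(n-1)/2)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same sign of differences -> concordant; opposite signs -> discordant;
        # a zero difference (tie in x or y) counts as neither.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# 6 pairs: only (2, 3) vs (3, 2) is discordant, so tau = (5 - 1) / 6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The double loop over all pairs is O(n²), which mirrors the definition; faster O(n log n) algorithms exist for large samples.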

Below is a sample input window:

[Screenshot: sample input window]

If only one list of variables is given in the “Selected variable(s)” box, with no “With variables (Options)” list, Empower calculates the correlation for every possible pair of two variables. If a “With variables” list is given, Empower calculates the correlation between each of the “Selected variable(s)” and each of the “With variables”.
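The pairing rule described above can be illustrated with a short sketch. This is a hypothetical illustration of the rule, not Empower's code; the function name `correlation_pairs` is made up. With AGE and BMI as selected variables and SBP and DBP as "with" variables, it yields the same four pairs, in the same order, as the sample output.

```python
from itertools import combinations, product

def correlation_pairs(selected, with_vars=None):
    """Hypothetical illustration of which variable pairs get correlated."""
    if not with_vars:
        # Only "Selected variable(s)": every possible pair of two variables
        return list(combinations(selected, 2))
    # "With variables" given: each selected variable with each "with" variable
    return list(product(selected, with_vars))

print(correlation_pairs(["AGE", "BMI"], ["SBP", "DBP"]))
# [('AGE', 'SBP'), ('AGE', 'DBP'), ('BMI', 'SBP'), ('BMI', 'DBP')]
```

With k selected variables and no "with" list, the number of pairs is k(k − 1)/2; with m "with" variables it is k × m.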

Below is the sample output for the above example, followed by an explanation:

Correlation

        Pearson's product-moment correlation

data:  AGE and SBP
t = 15.0454, df = 793, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4153127 0.5236118
sample estimates:
      cor
0.4712365

        Pearson's product-moment correlation

data:  AGE and DBP
t = 9.7845, df = 793, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2647198 0.3888680
sample estimates:
      cor
0.3282105

        Pearson's product-moment correlation

data:  BMI and SBP
t = 0.2441, df = 793, p-value = 0.8072
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.06089938  0.07815404
sample estimates:
       cor
0.00866924

        Pearson's product-moment correlation

data:  BMI and DBP
t = 0.1835, df = 793, p-value = 0.8545
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.06304556  0.07601237
sample estimates:
        cor
0.006514901

In the above output, Pearson’s correlation coefficient is calculated. For each pair of variables, the output includes the sample estimate of the correlation coefficient, its 95% confidence interval, and a test of statistical significance (the t statistic, its degrees of freedom, and the p-value for the null hypothesis that the true correlation is 0). For example, the correlation between AGE and SBP is 0.4712365, with a 95% CI of 0.4153127 to 0.5236118 and a p-value < 2.2e-16.
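The reported numbers are consistent with the standard formulas for Pearson's test: t = r·√(df / (1 − r²)) with df = n − 2, and a confidence interval built on the Fisher z-transformation, atanh(r) ± z₀.₉₇₅ / √(n − 3), transformed back with tanh. The sketch below assumes Empower uses these formulas (the output layout matches R's `cor.test`, which does); the function name `pearson_test_summary` is hypothetical. Since df = 793, the sample size is n = 795. The p-value is omitted because it needs a t-distribution CDF, which the Python standard library does not provide.

```python
import math
from statistics import NormalDist

def pearson_test_summary(r, n, conf_level=0.95):
    """t statistic, df, and Fisher-z confidence interval for a Pearson r from n pairs."""
    df = n - 2
    # Test statistic for H0: true correlation = 0, referred to a t distribution with df degrees of freedom
    t = r * math.sqrt(df / (1 - r ** 2))
    # Fisher z-transformation: atanh(r) is approximately normal with SE 1 / sqrt(n - 3)
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf((1 + conf_level) / 2) / math.sqrt(n - 3)
    ci = (math.tanh(z - half_width), math.tanh(z + half_width))
    return t, df, ci

# AGE and SBP from the sample output: r = 0.4712365, df = 793, so n = 795
t, df, ci = pearson_test_summary(0.4712365, 795)
print(round(t, 4), df, [round(v, 7) for v in ci])
```

Up to the rounding of the printed r, this should agree with the sample output for AGE and SBP (t = 15.0454, df = 793, CI 0.4153127 to 0.5236118).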

In addition, Empower also outputs a scatter plot for each pair of variables: