Spearman Rank Correlation Coefficient

The Spearman Rank Correlation Coefficient is a non-parametric measure of correlation that is calculated from the ranks of the data rather than from the raw values.

Whenever we are interested in knowing whether two variables are related to each other, we use a statistical technique known as correlation [3]. If a change in one variable is accompanied by a change in the other variable, they are said to be correlated.

A well-known measure of correlation is the Pearson product moment correlation coefficient, which can be calculated if the data are on an interval or ratio scale.

The Spearman Rank Correlation Coefficient is also known as "Spearman's rho" or "Spearman's r correlation".

The Spearman Rank Correlation Coefficient is the analogue of the Pearson coefficient when the data are in the form of ranks. One can therefore also call it the correlation coefficient between the ranks. It is sometimes denoted by r_s.

Example

As an example, let us consider a musical (solo vocal) talent contest where 10 competitors are evaluated by two judges, A and B. Usually judges award a numerical score to each contestant after his/her performance.

A product moment correlation coefficient [4] of scores by the two judges hardly makes sense here as we are not interested in examining the existence or otherwise of a linear relationship [5] between the scores.

What makes more sense is the correlation between the ranks of the contestants as judged by the two judges. The Spearman Rank Correlation Coefficient can indicate whether the judges agree with each other as far as the talent of the contestants is concerned (though they might award different numerical scores), in other words, whether the judges rank the contestants in a similar order.

Interpretation of Numerical Values

The numerical value of the correlation coefficient, r_s, ranges between -1 and +1. It is a single number indicating how closely the two sets of ranks are related.

r_s = correlation coefficient

In general,

r_s > 0 implies positive agreement among ranks
r_s < 0 implies negative agreement (or agreement in the reverse direction)
r_s = 0 implies no agreement

The closer r_s is to +1, the better the agreement, while r_s closer to -1 indicates strong agreement in the reverse direction.
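
The two extremes can be checked with a small sketch. It uses the common textbook shortcut r_s = 1 - 6 * (sum of d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks of the same item. Neither this formula nor the function name spearman_no_ties comes from the text above; both are assumptions added for illustration, and the shortcut is exact only when no ranks are tied.

```python
def spearman_no_ties(ranks_x, ranks_y):
    """Spearman's r_s from two lists of untied ranks (hypothetical helper)."""
    n = len(ranks_x)
    # Sum of squared differences between paired ranks.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Identical rankings: perfect agreement.
same = spearman_no_ties([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
# Exactly reversed rankings: perfect agreement in the reverse direction.
reverse = spearman_no_ties([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])

print(same, reverse)  # 1.0 -1.0
```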

Assigning Ranks

In order to compute Spearman Rank Correlation Coefficient, it is necessary that the data be ranked. There are a few issues here.

Suppose that the scores of the judges (out of 10) were as follows:

Contestant No.	1	2	3	4	5	6	7	8	9	10
Score by Judge A	5	9	3	8	6	7	4	8	4	6
Score by Judge B	7	8	6	7	8	5	10	6	5	8

Ranks are assigned separately for the two judges either starting from the highest or from the lowest score. Here, the highest score given by Judge A is 9.

If we begin from the highest score, we assign rank 1 to contestant 2 corresponding to the score of 9.

The second highest score is 8, but two competitors have been awarded the score of 8. In this case both competitors are assigned a common rank, which is the arithmetic mean [6] of ranks 2 and 3, that is, 2.5. In this way, the scores of Judge A can be converted into ranks.
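
The tie rule described above can be sketched in a few lines of Python. The function name average_ranks and its highest_first parameter are hypothetical, introduced only for this illustration; the scores are those of Judge A from the table.

```python
def average_ranks(scores, highest_first=True):
    """Rank scores; tied scores share the mean of the ranks they occupy."""
    ordered = sorted(scores, reverse=highest_first)
    ranks = []
    for s in scores:
        first = ordered.index(s) + 1   # first rank position held by this score
        count = ordered.count(s)       # number of contestants sharing it
        ranks.append(first + (count - 1) / 2)  # mean of first .. first+count-1
    return ranks


judge_a = [5, 9, 3, 8, 6, 7, 4, 8, 4, 6]
ranks_a = average_ranks(judge_a)
print(ranks_a)
# [7.0, 1.0, 10.0, 2.5, 5.5, 4.0, 8.5, 2.5, 8.5, 5.5]
```

The two contestants with a score of 8 occupy positions 2 and 3, so each receives rank 2.5.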

Similarly, ranks are assigned to the scores awarded by Judge B, and then the differences between the two ranks of each contestant are used to evaluate r_s. For the above example, the ranks are as follows.

Contestant No.	1	2	3	4	5	6	7	8	9	10
Ranks of scores by Judge A	7	1	10	2.5	5.5	4	8.5	2.5	8.5	5.5
Ranks of scores by Judge B	5.5	3	7.5	5.5	3	9.5	1	7.5	9.5	3
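
As a sketch of how r_s might be evaluated from these ranks, the snippet below computes it in two ways. The first is the textbook shortcut based on rank differences, r_s = 1 - 6 * (sum of d^2) / (n * (n^2 - 1)), which is exact only when there are no ties. The second applies the Pearson formula directly to the ranks, which remains valid when ranks are tied, as they are here. Neither formula is given in the text above, so both should be read as assumptions added for illustration.

```python
from math import sqrt

ranks_a = [7, 1, 10, 2.5, 5.5, 4, 8.5, 2.5, 8.5, 5.5]
ranks_b = [5.5, 3, 7.5, 5.5, 3, 9.5, 1, 7.5, 9.5, 3]
n = len(ranks_a)

# Shortcut formula from rank differences (exact only without ties).
sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
rs_shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Pearson correlation applied to the ranks (also valid with ties).
mean_a = sum(ranks_a) / n
mean_b = sum(ranks_b) / n
cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ranks_a, ranks_b))
ss_a = sum((a - mean_a) ** 2 for a in ranks_a)
ss_b = sum((b - mean_b) ** 2 for b in ranks_b)
rs_ranks = cov / sqrt(ss_a * ss_b)

print(sum_d2, round(rs_shortcut, 3), round(rs_ranks, 3))
# 146.5 0.112 0.084
```

Both values are close to zero, which suggests that the two judges agree only weakly on the ordering of the contestants.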

Spearman Rank Correlation Coefficient is a non-parametric measure of correlation.

Spearman Rank Correlation Coefficient tries to assess the relationship between ranks without making any assumptions about the nature of their relationship.

Hence it is a non-parametric measure [7] - a feature which has contributed to its popularity and widespread use.
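
A small sketch may make this concrete. The data below are invented for illustration (y is simply the cube of x), and the helper names pearson and simple_ranks are hypothetical. Because y always increases with x, the ranks agree perfectly and r_s is 1, while the product moment coefficient of the raw values falls below 1 because the relationship is not linear.

```python
from math import sqrt


def pearson(x, y):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def simple_ranks(values):
    """Ascending ranks; assumes the values contain no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


x = [1, 2, 3, 4, 5, 6, 7, 8]   # made-up data
y = [v ** 3 for v in x]        # increasing, but not linear in x

r_raw = pearson(x, y)                              # on the raw values
r_s = pearson(simple_ranks(x), simple_ranks(y))    # on the ranks

print(r_raw < 1, r_s)
```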

Advantages and Caveats

Other measures of correlation are parametric in the sense that they are based on a possible relationship of a parameterized form, such as a linear relationship [5].

Another advantage with this measure is that it is much easier to use since it does not matter which way we rank the data, ascending or descending. We may assign rank 1 to the smallest value or the largest value, provided we do the same thing for both sets of data.
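
The point about ranking direction can be checked with a short sketch using the judges' scores from the example. The helper names average_ranks and pearson are hypothetical, and computing r_s as the Pearson coefficient of the ranks is an assumption of this sketch rather than a method stated in the text.

```python
from math import sqrt


def average_ranks(scores, highest_first):
    """Rank scores; tied scores share the mean of the ranks they occupy."""
    ordered = sorted(scores, reverse=highest_first)
    return [ordered.index(s) + 1 + (ordered.count(s) - 1) / 2 for s in scores]


def pearson(x, y):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


judge_a = [5, 9, 3, 8, 6, 7, 4, 8, 4, 6]
judge_b = [7, 8, 6, 7, 8, 5, 10, 6, 5, 8]

# Same direction for both judges: the result does not change.
both_desc = pearson(average_ranks(judge_a, True), average_ranks(judge_b, True))
both_asc = pearson(average_ranks(judge_a, False), average_ranks(judge_b, False))

# Different directions for the two judges: the sign flips.
mixed = pearson(average_ranks(judge_a, True), average_ranks(judge_b, False))

print(round(both_desc, 3), round(both_asc, 3), round(mixed, 3))
```

Ranking one judge from the highest score and the other from the lowest reverses the sign of r_s, which is why the same direction must be used for both sets of data.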

The only requirement is that the data are already ranked or can at least be converted into ranks.