Pearson Product-Moment Correlation
The Pearson product-moment correlation is a measure of correlation that quantifies both the strength and the direction of the linear relationship between two variables. It is usually denoted by the Greek letter ρ (the sample estimate is commonly written r).
This coefficient is used when two conditions are satisfied:
- the variables are measured on an interval or ratio scale
- a linear relationship between them is suspected
Positive and Negative Correlation
The coefficient ρ is computed as the ratio of the covariance between the variables to the product of their standard deviations. This formulation has two advantages.
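The ratio described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `pearson`, the use of the sample divisor n - 1, and the temperature and rainfall figures are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation: covariance / (std_x * std_y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (divisor n - 1).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (std_x * std_y)

# Hypothetical illustrative data, not taken from the article.
temperature = [20, 22, 25, 27, 30]   # highest day temperature
rainfall = [12, 10, 9, 6, 3]         # rainfall per day

r = pearson(temperature, rainfall)
print(round(r, 3))  # negative: rainfall falls as temperature rises in this toy data
```

Because the same divisor appears in the covariance and in both standard deviations, using n instead of n - 1 throughout would give the same value.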
First, it tells us the direction of the relationship. Once the coefficient is computed, ρ > 0 indicates a positive relationship, ρ < 0 indicates a negative relationship, and ρ = 0 indicates the absence of any linear relationship.
Second, it ensures (mathematically) that the numerical value of ρ ranges from -1.0 to +1.0. This gives us an idea of the strength of the relationship, or more precisely the strength of the linear relationship, between the variables. The closer the coefficient is to +1.0 or -1.0, the greater the strength of the linear relationship.
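The bound can be made explicit. Writing the definition in symbols (the notation Cov, σ_X and σ_Y is introduced here for illustration), the Cauchy-Schwarz inequality says that the absolute covariance never exceeds the product of the standard deviations, which is what confines ρ to the interval from -1 to +1:

```latex
\rho_{X,Y} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \, \sigma_Y},
\qquad
\lvert \operatorname{Cov}(X,Y) \rvert \le \sigma_X \, \sigma_Y
\;\Longrightarrow\;
-1 \le \rho_{X,Y} \le 1 .
```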
As a rule of thumb, the following guidelines are often useful (though experts may disagree on the exact choice of boundaries).
Range of ρ
| Value of ρ | Strength of relationship |
|---|---|
| -1.0 to -0.5 or 0.5 to 1.0 | Strong |
| -0.5 to -0.3 or 0.3 to 0.5 | Moderate |
| -0.3 to -0.1 or 0.1 to 0.3 | Weak |
| -0.1 to 0.1 | None or very weak |
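The rule of thumb can be written as a small lookup. The sketch below is only one way to encode it: the function name `strength` is hypothetical, and because the table's ranges share their end points, the code assumes that a boundary value (for example exactly 0.5) falls into the stronger category.

```python
def strength(rho):
    """Map a correlation coefficient to the rule-of-thumb label."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie between -1.0 and +1.0")
    size = abs(rho)  # the label depends on magnitude only, not on sign
    # Assumption: shared boundary values go to the stronger category.
    if size >= 0.5:
        return "Strong"
    if size >= 0.3:
        return "Moderate"
    if size >= 0.1:
        return "Weak"
    return "None or very weak"

for value in (-0.7, 0.4, 0.2, 0.05):
    print(value, strength(value))
```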
Properties of ρ
This measure of correlation has several interesting properties, some of which are listed below:
- It is independent of the units of measurement. It is in fact unit free. For example, ρ between the highest day temperature (in Centigrade) and rainfall per day (in mm) is expressed neither in Centigrade nor in mm.
- It is symmetric. This means that ρ between X and Y is exactly the same as ρ between Y and X.
- Pearson's correlation coefficient is independent of change in origin and scale, provided the scale factors are positive (multiplying one variable by a negative constant reverses the sign of ρ). Thus ρ between temperature (in Centigrade) and rainfall (in mm) is numerically equal to ρ between temperature (in Fahrenheit) and rainfall (in cm).
- If the variables are independent of each other, then ρ = 0. However, the converse is not true. In other words, ρ = 0 does not imply that the variables are independent; it only indicates the absence of a linear relationship.
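These properties are easy to check numerically. The following sketch uses hypothetical data and a hypothetical helper named `pearson`; it verifies symmetry, invariance under a change of origin and scale (Centigrade to Fahrenheit, mm to cm), and shows a pair of variables that are fully dependent yet have a correlation of zero.

```python
import math

def pearson(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data.
celsius = [20, 22, 25, 27, 30]
rain_mm = [12, 10, 9, 6, 3]

r = pearson(celsius, rain_mm)

# Symmetry: swapping the variables leaves the coefficient unchanged.
r_swapped = pearson(rain_mm, celsius)

# Change of origin and scale: Fahrenheit and centimetres give the same value.
fahrenheit = [c * 9 / 5 + 32 for c in celsius]
rain_cm = [mm / 10 for mm in rain_mm]
r_converted = pearson(fahrenheit, rain_cm)

# Dependent but uncorrelated: y is completely determined by x,
# yet the symmetric parabola has no linear trend.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
r_parabola = pearson(x, y)

print(r, r_swapped, r_converted, r_parabola)
```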
Caveats and Warnings
While ρ is a powerful tool, it is often misused and hence has to be handled carefully.
- People often forget or gloss over the fact that ρ is a measure of linear relationship. Consequently, a small value of ρ is often interpreted to mean that no relationship exists, when it actually indicates only the absence of a linear relationship or, at best, a very weak linear relationship.
Under such circumstances it is possible that a non-linear relationship exists.
A scatter diagram can reveal such a relationship, and one is well advised to examine it before firmly concluding that no relationship exists. If the scatter diagram points to a non-linear relationship, an appropriate transformation can often attain linearity, in which case ρ can be recomputed.
- One has to be careful in interpreting the value of ρ.
For example, one could compute ρ between shoe size and intelligence of individuals, or between height and income. Irrespective of the value of ρ, such a correlation makes no sense and is hence termed a chance or nonsense correlation.
- ρ should not be used to say anything about a cause-and-effect relationship. Put differently, by examining the value of ρ, we could conclude that variables X and Y are related.
However, the same value of ρ does not tell us whether X influences Y or the other way round, a fact that is of grave importance in regression analysis.
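The first caveat, about non-linear relationships and transformations, can be illustrated with a short sketch. The data are hypothetical (an exact exponential curve), the helper `pearson` is defined only for this example, and the logarithm is just one possible transformation; in practice the scatter diagram should guide the choice.

```python
import math

def pearson(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y grows exponentially with x, so the relationship
# is perfect but not linear.
x = list(range(1, 11))
y = [math.exp(v) for v in x]

r_raw = pearson(x, y)

# Taking logarithms of y turns the curve into a straight line,
# after which the coefficient can be recomputed.
log_y = [math.log(v) for v in y]
r_log = pearson(x, log_y)

print(r_raw, r_log)  # r_raw understates the relationship; r_log is essentially 1
```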