The term 'correlation coefficient' was coined by Karl Pearson in 1896. The statistic is therefore over a century old, and it is still going strong. It is one of the most used statistics today, second only to the mean. The correlation coefficient's weaknesses and the warnings against its misuse are well documented. As a consulting statistician with 15 years of practice, who also teaches continuing and professional studies to statisticians in the Database Marketing/Data Mining industry, I see too often that the weaknesses and warnings are not heeded. Among the documented weaknesses, I have never seen discussed the issue that the correlation coefficient interval [-1, +1] is restricted by the individual distributions of the two variables being correlated. The purpose of this article is (1) to introduce the effects that the distributions of the two individual variables have on the correlation coefficient interval and (2) to provide a procedure for calculating an adjusted correlation coefficient, whose realised correlation coefficient interval is often shorter than the original one.
The implication for marketers is that they now have the adjusted correlation coefficient as a more reliable measure of the important 'key drivers' of their marketing models. In turn, this allows marketers to develop more effective targeted marketing strategies for their campaigns.
CORRELATION COEFFICIENT BASICS
The correlation coefficient, denoted by r, is a measure of the strength of the straight-line, or linear, relationship between two variables. The well-known correlation coefficient is often misused because its linearity assumption is not tested. By definition, that is, theoretically, the correlation coefficient can assume any value in the interval from -1 to +1, including the end values -1 and +1.
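The linearity point can be made concrete with a short Python sketch. It computes r from the usual textbook definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations) and applies it to two exact rules, one linear and one curved. The function name pearson_r and the data are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]  # exact linear rule
y_curved = [v ** 2 for v in x]     # exact, but nonlinear, rule

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(r_linear, r_curved)
```

The linear rule gives r = 1, while the curved rule gives r = 0 even though y is fully determined by x. A value of r near 0 therefore rules out a linear relationship only, which is why the linearity assumption should be checked, for example with a scatterplot, before r is interpreted.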
The following points are the accepted guidelines for interpreting the correlation coefficient:
0 indicates no linear relationship.
+1 indicates a perfect positive linear relationship - as one variable increases in its values, the other variable also increases in its values through an exact linear rule.
-1 indicates a perfect negative linear relationship - as one variable increases in its values, the other variable decreases in its values through an exact linear rule.
Values between 0 and 0.3 (0 and -0.3) indicate a weak positive (negative) linear relationship through a shaky linear rule.
Values between 0.3 and 0.7 (-0.3 and -0.7) indicate a moderate positive (negative) linear relationship through a...





