Abstract
Correlation and linear regression are frequently used to evaluate the degree of linear association between two variables and to find an empirical relationship between them. However, violations of their assumptions often give invalid results. A high value of the correlation coefficient is commonly taken as evidence of linearity between two variables, and an attempt is then made to fit a linear regression equation. Linearity implies high correlation, but the converse is not true. This paper shows with examples that the concept of linearity differs from correlation, describes the effects of violating the assumptions of correlation and linear regression, and suggests procedures to improve the correlation between two variables, which can be extended to multiple variables.
Keywords: Linearity; Correlation coefficient; Standard error; Normal distribution; Generalized inverse
Introduction
Correlations are often used in various fields of research. There are different kinds of correlations, depending on the nature of the variables. The Pearsonian correlation is often taken to reflect a cause-and-effect relationship, along with the direction of the linear relationship between two variables. Its assumptions include: each variable is measured on at least an interval scale, the data on each variable follow a normal distribution and contain no outliers, etc. However, correlation does not always imply causation [1]. A correlation between two variables could be due to a third variable affecting both variables under study, e.g. item reliability expressed as item-total correlation. Variables may also be correlated over time when the data are longitudinal. For example, the earth's temperature and levels of greenhouse gases are positively correlated. For two such trending variables, it is desirable to estimate the correlation after removing the trend [2].
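The effect of a shared trend can be illustrated with a minimal sketch. It assumes a straight-line trend removed by least squares against the time index, which is only one possible detrending approach and not necessarily the one used in [2]. The series, slopes, noise level and the helper names `pearson_r` and `detrend` are hypothetical and chosen for illustration only.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation: Cov(X, Y) / (SD(X) * SD(Y))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov_xy / (sd_x * sd_y)

def detrend(series):
    """Residuals after removing a least-squares line fitted against time."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_s = sum(series) / n
    num = sum((t - mean_t) * (s - mean_s) for t, s in enumerate(series))
    den = sum((t - mean_t) ** 2 for t in range(n))
    slope = num / den
    return [s - (mean_s + slope * (t - mean_t)) for t, s in enumerate(series)]

# Two simulated series that share an upward trend but have independent noise.
rng = random.Random(0)  # fixed seed so that runs are repeatable
n = 200
x = [0.5 * t + rng.gauss(0, 5) for t in range(n)]
y = [0.3 * t + rng.gauss(0, 5) for t in range(n)]

r_raw = pearson_r(x, y)                          # dominated by the common trend
r_detrended = pearson_r(detrend(x), detrend(y))  # trend removed from both series

print(f"raw r = {r_raw:.3f}, detrended r = {r_detrended:.3f}")
```

In this simulated setting the raw correlation is high only because both series rise over time; once the trend is removed, the correlation between the residuals is close to zero.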
By definition, the correlation between X and Y is the ratio of Cov(X,Y) to the product of SD(X) and SD(Y). Thus, the average of k correlations, $\bar{r} = \left(\sum_{j=1}^{k} r_j\right)/k$, is meaningless for correlations with mixed signs and the same sample size (Field, 2003). However, the average inter-item correlation is used in the psychological literature to reflect the level of consistency of a test and is regarded as a quality of the test as a whole. The correlation between two variables ($r_{XY}$) is high if the ratio of the change in one variable (Y) to a unit change in the second variable (X) is constant for all values of X [3].
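The definition above can be written out directly. The following is a minimal sketch using population (divide-by-n) covariance and standard deviations; the data and the two correlation values being averaged are hypothetical. It also shows that a monotone but nonlinear relationship can still give a high correlation, and that correlations of opposite sign can average to zero.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: Cov(X, Y) / (SD(X) * SD(Y))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov_xy / (sd_x * sd_y)

x = [float(v) for v in range(1, 21)]

# Exactly linear: the change in Y per unit change in X is constant.
y_linear = [3.0 * v + 2.0 for v in x]
r_linear = pearson_r(x, y_linear)

# Cubic: the change in Y per unit change in X is not constant,
# yet the correlation remains high because Y increases with X.
y_cubic = [v ** 3 for v in x]
r_cubic = pearson_r(x, y_cubic)

# Averaging correlations of mixed sign: two strong associations
# in opposite directions average to zero.
r_values = [0.8, -0.8]
mean_r = sum(r_values) / len(r_values)

print(f"linear r = {r_linear:.3f}, cubic r = {r_cubic:.3f}, mean r = {mean_r:.1f}")
```

The cubic case is consistent with the point that a high correlation alone does not establish linearity, and the last lines show why an average of mixed-sign correlations can hide strong associations.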
Interpretation and use of correlation is important for measurement by practitioners and researchers since simple correlations are used...