How to read a paper: Statistics for the non-statistician. II: "Significant" relations and their pitfalls

Introduction

This article continues the checklist of questions that will help you to appraise the statistical validity of a paper. The first of this pair of articles was published last week. 1

Correlation, regression, and causation

Has correlation been distinguished from regression, and has the correlation coefficient (r value) been calculated and interpreted correctly?

For many non-statisticians, the terms "correlation" and "regression" are synonymous, and refer vaguely to a mental image of a scatter graph with dots sprinkled messily along a diagonal line sprouting from the intercept of the axes. You would be right in assuming that if two things are not correlated, it will be meaningless to attempt a regression. But regression and correlation are both precise statistical terms which serve quite different functions. 1
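The different functions of the two terms can be illustrated with a minimal sketch. The data below are hypothetical, and the helper names `pearson_r` and `least_squares_line` are invented for this illustration. The correlation coefficient summarises how strongly two variables are linearly associated and is the same whichever variable is called x; a regression line, by contrast, is an equation for predicting one variable from the other, so it changes when the roles of the two variables are swapped.

```python
import math


def _sums_of_squares(x, y):
    """Return (sxy, sxx, syy): sums of cross-products and squares about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for paired data."""
    sxy, sxx, syy = _sums_of_squares(x, y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the least-squares regression of y on x."""
    sxy, sxx, _ = _sums_of_squares(x, y)
    slope = sxy / sxx
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return slope, intercept


# Hypothetical measurements, one pair per subject (not taken from the article).
height_cm = [150, 160, 165, 170, 180, 185]
weight_kg = [52, 58, 63, 66, 75, 80]

r = pearson_r(height_cm, weight_kg)
slope, intercept = least_squares_line(height_cm, weight_kg)

print(f"r = {r:.3f}")
print(f"predicted weight = {intercept:.1f} + {slope:.3f} * height")

# Swapping the variables leaves r unchanged but gives a different line:
# the slope of height on weight is not simply 1 / slope.
slope_swapped, _ = least_squares_line(weight_kg, height_cm)
print(f"slope of height on weight = {slope_swapped:.3f}, 1/slope = {1 / slope:.3f}")
```

In this sketch r answers "how closely do the points follow a straight line?", while the slope and intercept answer "what weight would we predict for a given height?".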

Summary points

An association between two variables is likely to be causal if it is strong, consistent, specific, plausible, follows a logical time sequence, and shows a dose-response gradient

A P value of <0.05 means that, if there were no real effect, a result like this would have arisen by chance on fewer than one occasion in 20

The confidence interval around a result in a clinical trial indicates the limits within which the "real" difference between the treatments is likely to lie, and hence the strength of the inference that can be drawn from the result

A statistically significant result may not be clinically significant. The results of intervention trials should be expressed in terms of the likely benefit an individual could expect (for example, the absolute risk reduction)

The r value (Pearson's product-moment correlation coefficient) is among the most overused statistical instruments. Strictly speaking, the r value is not valid unless the following criteria are fulfilled:

The data (or, more accurately, the population from which the data are drawn) should be normally distributed. If they are not, non-parametric tests of correlation should be used instead. 1

The two datasets should be independent (one should not automatically vary with the other). If they are not, a paired t test or other paired test should be used.

Only a single pair of measurements should be made on each subject. If repeated measurements are made, analysis of variance should be used instead. 2

Every r value should be accompanied by a P value, which expresses how...
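The first criterion above, and the advice to report a P value alongside r, can be sketched as follows. This is an illustration under stated assumptions, not the article's own method: the data are hypothetical, the function names are invented here, Spearman's rank correlation stands in as one example of a rank-based (non-parametric) alternative, and the P value is estimated with a permutation test (shuffling one variable and counting how often a coefficient at least as large arises by chance) rather than with the usual t distribution formula.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def permutation_p_value(x, y, statistic, n_shuffles=2000, seed=0):
    """Two-sided P value estimated by repeatedly shuffling y against x."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(statistic(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(statistic(x, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical, markedly skewed data: one extreme value in `response`.
dose = [1, 2, 3, 4, 5, 6, 7, 8]
response = [2, 3, 5, 4, 8, 9, 12, 90]

r = pearson_r(dose, response)
rho = spearman_rho(dose, response)
p_rho = permutation_p_value(dose, response, spearman_rho)

print(f"Pearson r = {r:.2f}")
print(f"Spearman rho = {rho:.2f}, permutation P = {p_rho:.4f}")
```

With data like these, where a single extreme value dominates, the two coefficients can differ considerably, which is why the choice between them should follow from the distribution of the data rather than from habit.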
