Abstract

Background

Stepwise regression is a popular data-mining tool that uses statistical significance to select the explanatory variables to be used in a multiple-regression model.

Findings

A fundamental problem with stepwise regression is that some real explanatory variables that have causal effects on the dependent variable may happen not to be statistically significant, while nuisance variables may be coincidentally significant. As a result, the model may fit the data well in-sample but do poorly out-of-sample.
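
The effect described above can be illustrated with a small simulation. The following is a minimal sketch, not the author's procedure: it assumes a simple forward-selection rule (add the candidate with the largest |t|, stop when none exceeds an approximate 5% critical value of 2.0, with no backward elimination), independent standard-normal regressors, and hypothetical settings of 5 true variables, 50 nuisance variables, and 100 observations. The function names `t_stats`, `forward_stepwise`, and `simulate` are illustrative only.

```python
import numpy as np


def with_intercept(X, cols):
    """Design matrix: a column of ones followed by the chosen columns of X."""
    chosen = X[:, np.asarray(cols, dtype=int)]
    return np.column_stack([np.ones(X.shape[0]), chosen])


def t_stats(design, y):
    """OLS fit; returns (coefficients, t-statistics)."""
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, beta / np.sqrt(np.diag(cov))


def forward_stepwise(X, y, t_crit=2.0):
    """Add the candidate with the largest |t| until none exceeds t_crit."""
    n, k = X.shape
    selected = []
    while len(selected) < min(k, n - 2):
        best_j, best_t = None, t_crit
        for j in range(k):
            if j in selected:
                continue
            _, t = t_stats(with_intercept(X, selected + [j]), y)
            if abs(t[-1]) > best_t:  # t[-1] belongs to candidate j
                best_j, best_t = j, abs(t[-1])
        if best_j is None:
            break
        selected.append(best_j)
    return selected


def r_squared(design, y, beta):
    resid = y - design @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def simulate(n=100, n_true=5, n_nuisance=50, seed=0):
    """Columns 0..n_true-1 affect y; the remaining columns are pure noise."""
    rng = np.random.default_rng(seed)
    k = n_true + n_nuisance
    beta_true = np.zeros(k)
    beta_true[:n_true] = 1.0
    X_train = rng.normal(size=(n, k))
    X_test = rng.normal(size=(n, k))
    y_train = X_train @ beta_true + rng.normal(scale=2.0, size=n)
    y_test = X_test @ beta_true + rng.normal(scale=2.0, size=n)

    selected = forward_stepwise(X_train, y_train)
    beta, _ = t_stats(with_intercept(X_train, selected), y_train)
    return {
        "selected": selected,
        "n_true_selected": sum(j < n_true for j in selected),
        "n_nuisance_selected": sum(j >= n_true for j in selected),
        "r2_in": r_squared(with_intercept(X_train, selected), y_train, beta),
        "r2_out": r_squared(with_intercept(X_test, selected), y_test, beta),
    }


result = simulate()
print(result)
```

Comparing `r2_in` with `r2_out`, and counting how many selected columns are nuisance variables, gives a rough sense of how much of the in-sample fit comes from coincidental significance. Results vary with the seed and with the assumed settings.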

Conclusion

Many Big-Data researchers believe that the larger the number of possible explanatory variables, the more useful stepwise regression is for selecting among them. In reality, stepwise regression becomes less effective as the number of potential explanatory variables grows. Stepwise regression does not solve the Big-Data problem of too many explanatory variables. Big Data exacerbates the failings of stepwise regression.
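
A rough back-of-the-envelope calculation shows why more candidates make matters worse. The sketch below assumes, purely for illustration, that each of k nuisance variables is tested independently at significance level alpha. This is a simplification of what stepwise procedures actually do, and the function name is hypothetical. Under that assumption, the expected number of coincidentally significant nuisance variables is alpha × k, and the probability that at least one appears significant is 1 − (1 − alpha)^k.

```python
def false_positive_summary(k, alpha=0.05):
    """Expected count and probability of at least one spurious 'significant'
    nuisance variable, assuming k independent tests at level alpha."""
    expected_count = alpha * k
    prob_at_least_one = 1.0 - (1.0 - alpha) ** k
    return expected_count, prob_at_least_one


for k in (10, 100, 1000):
    expected_count, prob_any = false_positive_summary(k)
    print(f"k={k}: expected={expected_count:.1f}, P(at least one)={prob_any:.4f}")
```

Both quantities grow with k, so a larger pool of candidates gives a significance-based selection rule more opportunities to pick up nuisance variables.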

Details

Title
Step away from stepwise
Author
Smith, Gary 1

 Department of Economics, Pomona College, Claremont, CA, USA 
Pages
1-12
Publication year
2018
Publication date
Sep 2018
Publisher
Springer Nature B.V.
e-ISSN
21961115
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2104558426
Copyright
Journal of Big Data is a copyright of Springer (2018). All Rights Reserved. © 2018. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.