Full text

Turn on search term navigation

© 2019 Meisner et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Background

Pediatric diarrhea, a leading cause of under-five mortality, is predominantly infectious in etiology. As many putative causal agents are zoonotic, animal exposure is a likely risk factor. To evaluate the effect of animal-related factors on moderate to severe childhood diarrhea in rural Kenya, where animal contact is common, Conan et al. studied 73 matched case-control pairs from 2009-2011, collecting rich exposure data on many dimensions of animal contact. We review the challenges associated with analyzing moderately-sized datasets with a large number of predictors and present two alternative methodological approaches.

Methodology/Principal findings

We conducted a simulation study to demonstrate that forward stepwise selection results in overfit models when data are high-dimensional, and that p values reported directly from the data used to train these models are misleading. We described how automated methods of variable selection, attractive when the number of predictors is large, can result in overadjustment bias. We proposed an alternative a priori regression approach not subject to this bias. Applied to Conan et al.’s data, this approach found a non-significant but positive trend for household’s sharing of water sources with livestock or poultry, child’s presence for poultry slaughter, and child’s habit of playing where poultry sleep or defecate. For many predictors evaluated few pairs were discordant, suggesting matching compromised the power of this analysis. Finally, we proposed latent variable modeling as a complimentary approach and performed Item Response Theory modeling on Conan et al.’s data, with animal contact as the latent trait. We found a moderate but non-significant effect (OR 1.21, 95% CI 0.78, 1.87, unit = 1 standard deviation).

Conclusions/Significance

Automated methods of model selection are appropriate for prediction models when fit and evaluated on separate samples. However when the goal is inference, these methods can produce misleading results. Furthermore, case-control matching should be done with caution.

Details

Title
The curse of dimensionality: Animal-related risk factors for pediatric diarrhea in western Kenya, and methods for dealing with a large number of predictors
Author
Meisner, Julianne; Mooney, Stephen J; Rabinowitz, Peter M
First page
e0215982
Section
Research Article
Publication year
2019
Publication date
Apr 2019
Publisher
Public Library of Science
e-ISSN
19326203
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2215677539
Copyright
© 2019 Meisner et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.