© 2019. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Species distribution modeling often involves high‐dimensional environmental data. Large data volumes and multicollinearity among covariates make variable selection difficult for statistical models and complicate reliable inference about the effects of environmental factors on the spatial distribution of species. Few studies have evaluated and compared the performance of multiple machine learning (ML) models in handling multicollinearity. Here, we assessed how effectively the removal of correlated covariates and regularization cope with multicollinearity in ML models of habitat suitability. Three ML algorithms, maximum entropy (MaxEnt), random forests (RFs), and support vector machines (SVMs), were applied to three data sets: the original data (OD) of 27 landscape variables; reduced data (RD), from which 14 highly correlated covariates had been removed; and 15 principal components (PC) of the OD, which accounted for 90% of the original variability. The performance of the three ML models was measured with the area under the curve and the continuous Boyce index. We collected 663 nonduplicated presence locations of Eastern wild turkeys (Meleagris gallopavo silvestris) across the state of Mississippi, United States. Of these, 453 locations separated by a distance of ≥2 km were used to train the three ML algorithms on the OD, RD, and PC data, respectively. The remaining 210 locations were used to validate the trained models and measure their performance. All three ML models had excellent performance on the RD and PC data. MaxEnt and SVMs had good performance on the OD, indicating that regularization under the default settings was adequate for handling multicollinearity. Weak learning of RFs through bagging appeared to alleviate multicollinearity and resulted in excellent performance on the OD. Regularization of ML algorithms may help exploratory studies of the effects of environmental factors on the spatial distribution and habitat suitability of wildlife.
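
The two data-reduction steps named in the abstract (dropping highly correlated covariates to form the RD, and keeping the principal components that account for 90% of the variability to form the PC data) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the greedy keep-first rule, the 0.7 correlation cutoff, and the synthetic demonstration data are all assumptions made for illustration; the abstract does not state the cutoff or the software used.

```python
import numpy as np


def drop_correlated(X, threshold=0.7):
    """Return indices of columns to keep.

    Greedy rule (an assumption, not the paper's procedure): scan columns
    left to right and keep a column only if its absolute Pearson
    correlation with every already-kept column is below `threshold`.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept


def pca_scores(X, variance=0.90):
    """Project standardized data onto the fewest principal components
    whose cumulative explained variance reaches `variance`."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes; s**2 is proportional to variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    n = min(int(np.searchsorted(explained, variance)) + 1, len(s))
    return Z @ Vt[:n].T, n


# Synthetic stand-in for a covariate matrix (hypothetical data):
# column 1 is a near-copy of column 0, columns 2 and 3 are independent.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X_demo = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.01 * rng.normal(size=200),
    base[:, 1],
    base[:, 2],
])

kept_demo = drop_correlated(X_demo)        # analogue of OD -> RD
scores_demo, n_demo = pca_scores(X_demo)   # analogue of OD -> PC
```

In this sketch the reduced data would be `X_demo[:, kept_demo]`, and `scores_demo` plays the role of the PC data set. Either result could then be passed to any classifier in place of the full covariate matrix.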

Details

Title
Machine learning of large‐scale spatial distributions of wild turkeys with high‐dimensional environmental data
Author
Farrell, Annie 1; Wang, Guiming 1; Rush, Scott A 1; Martin, James A 2; Belant, Jerrold L 3; Butler, Adam B 4; Godwin, Dave 5

1 Department of Wildlife, Fisheries and Aquaculture, Mississippi State University, Mississippi State, Mississippi
2 Warnell School of Forestry and Natural Resources and Savannah River Ecology Laboratory, University of Georgia, Athens, Georgia
3 Camp Fire Program in Wildlife Conservation, State University of New York College of Environmental Science and Forestry, Syracuse, New York
4 The Mississippi Department of Wildlife, Fisheries, and Parks, Jackson, Mississippi
5 Mississippi Forestry Association, Jackson, Mississippi
Pages
5938-5949
Section
ORIGINAL RESEARCH
Publication year
2019
Publication date
May 2019
Publisher
John Wiley & Sons, Inc.
e-ISSN
2045-7758
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2250320927