Abstract

Harmful algal blooms are becoming increasingly prevalent due to climate warming and eutrophication. Machine learning tools for forecasting algal blooms are promising for bloom management in a range of water systems. However, previous findings are site-specific, especially regarding the effects of the forecasting period and the most important input features, and there remains a significant research gap in applying machine learning to predict algal blooms in the Great Lakes, the world’s largest freshwater system. Thus, based on measurements of 16 water quality parameters from 2012 to 2022, the author developed an extreme gradient boosting (XGBoost) model to forecast chlorophyll a (Chl a, a proxy for algal biomass) concentration 1–7 d ahead in Lake Erie. The model performed well: the 1 d forecast had the lowest MSE (10.94) and the highest R² (0.99), and the 7 d forecast had an MSE of 83.90 and an R² of 0.90. Once trained, the model takes only a few seconds to run on an Intel Core i7 personal laptop. Based on Shapley additive explanations (SHAP) feature importance, water depth (Depth) and water temperature (Temp) are more important input features for the 7 d forecasting model than the well-recognized phosphorus and nitrogen nutrients, including particulate organic nitrogen (PON), soluble reactive phosphorus (SRP), nitrate + nitrite (NN), and total phosphorus (TP). Using only the six most important input features by SHAP importance (initial Chl a, Depth, Temp, PON, SRP, and NN) still yields relatively high accuracy for the 7 d forecast, with an R² of 0.83 and an MSE of 144.40. These findings highlight the accuracy and efficiency of the developed XGBoost model for predicting Chl a in the world’s largest freshwater system.
By enabling early detection and identifying key predictive features, the model can make algal bloom monitoring more efficient, support an early warning system for timely interventions, inform policy decisions, and help optimize resource allocation.
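
To make the forecast horizon and the reported metrics concrete, here is a minimal Python sketch. It is not the author's pipeline. It assumes a hypothetical evenly spaced daily Chl a series, pairs each value with the value a given number of days later, and computes MSE and R² for any set of predictions. The function names and the toy numbers are illustrative assumptions, and a naive persistence baseline (predict the initial Chl a unchanged) stands in for the XGBoost model.

```python
from typing import List, Sequence, Tuple


def make_horizon_pairs(chl: Sequence[float], horizon: int) -> List[Tuple[float, float]]:
    """Pair each day's Chl a (initial value) with the value `horizon` days later.

    Assumes an evenly spaced daily series (an illustrative simplification).
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1 day")
    return [(chl[i], chl[i + horizon]) for i in range(len(chl) - horizon)]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when observations have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy daily Chl a values (made up for illustration, not from the study).
    series = [2.0, 4.0, 6.0, 8.0, 10.0]
    pairs = make_horizon_pairs(series, horizon=2)
    observed = [target for _, target in pairs]
    # Persistence baseline: forecast equals the initial Chl a value.
    predicted = [initial for initial, _ in pairs]
    print("MSE:", mse(observed, predicted))
    print("R2:", r2(observed, predicted))
```

A trained model's predictions can be passed to `mse` and `r2` in place of the persistence baseline. R² falls below zero whenever the predictions fit worse than the mean of the observations, which is why a naive baseline is a useful reference point when reading multi-day forecast scores.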

Details

Title
Forecasting short-term chlorophyll a concentration in Lake Erie using the machine learning XGBoost algorithm
Author
Yang, Song
First page
064029
Publication year
2025
Publication date
Jun 2025
Publisher
IOP Publishing
e-ISSN
1748-9326
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3205968985
Copyright
© 2025 The Author(s). Published by IOP Publishing Ltd. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.