Abstract

The shift to sustainable energy systems requires scalable techniques for valorizing organic waste through biogas production. This study presents a data-driven framework that combines statistical analysis, Explainable AI (XAI), clustering, and predictive modeling of methane yield to gain deeper operational insight into large-scale biogas production. The analysis uses operational data from a large-scale biodigester in the Western Cape province of South Africa, covering key biochemical and physicochemical variables: temperature, pH, total solids (TS), volatile solids (VS), moisture content (MC), and FOS/TAC. Insights were derived through correlation mapping, scatter analysis, SHapley Additive exPlanations (SHAP)-based XAI for ranking operational features, principal component analysis (PCA) for addressing multicollinearity, and k-means cluster analysis to identify operational clusters that highlight critical shifts in system stability. In addition, two ensemble learning models (XGBoost and Random Forest), together with a Support Vector Machine and an Artificial Neural Network, were developed for methane yield prediction. The SHAP-based XAI identified FOS/TAC, VS, and MC as the most influential predictors of methane yield, while PCA explained 74% of the data variance in three principal components (PCs), with PC1 dominated by VS, MC, and temperature as key drivers of methane yield. K-means clustering uncovered three distinct operational clusters, offering actionable guidance for feedstock management and process stabilization. Feedstock regression further established municipal solid waste (MSW) as the optimal input for maximizing methane output, with processed organic waste (POW) serving as an effective co-substrate. XGBoost achieved the best performance with an RMSE of 1.18, followed by Random Forest (RMSE = 1.83), demonstrating the robustness of ensemble models in handling non-linear operational datasets. The methodology is limited by its reliance on historical operational data from a single digester and by the lack of direct optimization experiments. Nevertheless, the study demonstrates the potential of data-driven approaches both as standalone tools and as complements to experimental investigations. By transforming raw plant data into actionable intelligence, it offers a scalable methodology for improving energy recovery, enhancing process control, and guiding sustainable development in industrial-scale bioenergy applications.
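
The RMSE values reported above compare predicted and observed methane yield, so a lower value means a closer fit. As a minimal sketch of how the metric is computed, the following uses only the Python standard library. The function name `rmse` and the yield values are hypothetical and chosen for illustration; they are not taken from the study, and the abstract does not state the units of the reported RMSE.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical methane-yield values, for illustration only (not study data).
observed_yield = [10.0, 12.0, 14.0, 16.0]
predicted_yield = [11.0, 11.0, 15.0, 15.0]

print(rmse(observed_yield, predicted_yield))  # every residual is +/-1, so this prints 1.0
```

Because the residuals are squared before averaging, RMSE penalizes a few large errors more heavily than many small ones, which may be one reason it suits comparing models on operational data with occasional process upsets.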

Full text

© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.