
Abstract

The shift to sustainable energy systems necessitates scalable techniques for the valorization of organic waste via biogas production. This study presents a comprehensive data-driven framework encompassing statistical analysis, Explainable AI (XAI), clustering, and predictive modeling of methane yield to gain deeper operational insights into large-scale biogas production. The study uses operational data from a large-scale biodigester in the Western Cape province of South Africa, including key biochemical and physicochemical variables such as temperature, pH, total solids (TS), volatile solids (VS), moisture content (MC), and FOS/TAC. Key insights were derived through correlation mapping, scatter analysis, SHapley Additive exPlanations (SHAP)-based XAI for ranking digestion operational features, principal component analysis (PCA) for addressing multicollinearity, and k-means cluster analysis to identify operational clusters that highlight critical shifts in system stability. In addition, two ensemble learning models (XGBoost and Random Forest), together with Support Vector Machine and Artificial Neural Network models, were developed for methane yield prediction. The SHAP-based XAI identified FOS/TAC, VS, and MC as the most influential predictors of methane yield, while PCA explained 74% of the data variance in three principal components (PCs), with PC1 dominated by VS, MC, and temperature as key drivers of methane yield. K-means clustering uncovered three distinct operational clusters, offering actionable guidance for feedstock management and process stabilization. Feedstock regression further established municipal solid waste (MSW) as the optimal input for maximizing methane output, with processed organic waste (POW) serving as an effective co-substrate. XGBoost achieved the best performance with an RMSE of 1.18, followed by Random Forest (RMSE = 1.83), demonstrating the robustness of ensemble models in handling non-linear operational datasets. The methodology is limited by its reliance on historical operational data from a single digester and by the lack of direct optimization experiments. Nevertheless, the research demonstrates the potential of data-driven approaches both as standalone tools and as complements to experimental investigations. By transforming raw plant data into actionable intelligence, this study offers a scalable methodology for improving energy recovery, enhancing process control, and guiding sustainable development in industrial-scale bioenergy applications.

Details

Title
Data-driven and explainable AI (XAI) framework for optimizing methane yield in large-scale biogas production
Author
Adeleke, Oluwatobi 1; Jen, Tien-Chien 1

1 University of Johannesburg, Mechanical Engineering Science, Johannesburg, South Africa (GRID:grid.412988.e) (ISNI:0000 0001 0109 131X)
Publication title
Volume
12
Issue
1
Pages
65
Publication year
2025
Publication date
Dec 2025
Publisher
Springer Nature B.V.
Place of publication
Heidelberg
Country of publication
Netherlands
Publication subject
e-ISSN
2198994X
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-10-31
Milestone dates
2025-10-18 (Registration); 2025-09-14 (Received); 2025-10-18 (Accepted)
First posting date
31 Oct 2025
ProQuest document ID
3267580384
Document URL
https://www.proquest.com/scholarly-journals/data-driven-explainable-ai-xai-framework/docview/3267580384/se-2?accountid=208611
Copyright
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-01
Database
ProQuest One Academic