© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Heart disease (HD) is one of the most complex and prevalent diseases and is among the leading causes of death worldwide. With changes in lifestyle and the environment, its prevalence is rising rapidly. Predicting the disease in its early stages is crucial, as delays in diagnosis can cause serious complications and even death. Machine learning (ML) can be effective in this regard. Many researchers have used different techniques, including several ensemble models, to detect the disease efficiently and to overcome the drawbacks of existing models. We propose a stacking ensemble model named NCDG, which uses Naive Bayes, Categorical Boosting, and Decision Tree as base learners, with Gradient Boosting serving as the meta-learner classifier. For preprocessing, we use a factorization method to convert string columns into integers. We employ the Synthetic Minority Oversampling TEchnique (SMOTE) and BorderLineSMOTE balancing techniques to address class imbalance in the data. Additionally, we implement hard and soft voting using a voting classifier and compare the results with the proposed stacking model. To provide Artificial Intelligence-based eXplainability for the proposed NCDG model, we use the SHapley Additive exPlanations (SHAP) technique. The results show that the proposed stacking model, NCDG, performs better than the existing benchmark techniques: it achieves the highest accuracy, F1-Score, precision, and recall of 0.91, 0.91, 0.91, and 0.91, respectively, with an execution time of 653 s. Moreover, we use the K-Fold Cross-Validation method to validate the predicted results. The prediction results and their validation coincide closely with each other, which indicates that our approach is symmetric.
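
The difference between the hard and soft voting baselines mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `hard_vote` and `soft_vote` and the probability values are hypothetical, chosen only to show a case where the two schemes disagree.

```python
from collections import Counter


def hard_vote(labels):
    """Majority vote over predicted class labels; ties go to the smallest label."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


def soft_vote(probas):
    """Average the per-class probabilities across models and return the argmax."""
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])


# Hypothetical [P(no HD), P(HD)] outputs of three base learners for one patient.
probas = [[0.55, 0.45], [0.60, 0.40], [0.05, 0.95]]

# Each learner's hard prediction is its most probable class.
labels = [max(range(2), key=lambda c: p[c]) for p in probas]

hard = hard_vote(labels)   # two of the three learners predict class 0
soft = soft_vote(probas)   # averaged probabilities favour class 1
print(hard, soft)
```

In this made-up case two weakly confident learners outvote one highly confident learner under hard voting, while soft voting lets the confident learner dominate. A stacking model such as NCDG replaces both fixed rules with a trained meta-learner.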

Details

Title
Machine Learning-Based Stacking Ensemble Model for Prediction of Heart Disease with Explainable AI and K-Fold Cross-Validation: A Symmetric Approach
Author
Sara Qamar Sultan 1 ; Javaid, Nadeem 2 ; Alrajeh, Nabil 3 ; Aslam, Muhammad 4

1 Department of Mathematics, COMSATS University Islamabad, Islamabad 44000, Pakistan
2 ComSens Lab, International Graduate School of Artificial Intelligence, National Yunlin University of Science and Technology, Douliou 64002, Taiwan
3 Department of Biomedical Technology, College of Applied Medical Sciences, King Saud University, Riyadh 11633, Saudi Arabia
4 Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3FL, UK
First page
185
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
2073-8994
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3171252923