
Abstract

Missing data pose a significant challenge in clinical real-world studies, often arising from unplanned data collection, misplaced records, patient loss to follow-up, and other factors. Multiple imputation by chained equations (MICE) is widely used, but its sequential nature introduces uncertainty that may degrade the performance of downstream prediction models. We proposed and evaluated three uncertainty-aware functions, namely uncertainty sampling (US), probability of improvement (PI), and expected improvement (EI), integrated with linear regression (LinearReg), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). The evaluation used three large datasets with high missing rates: a chronic kidney disease cohort (CKD, n = 31,043) and two hypertension cohorts, one from Ramathibodi Hospital (HT-RAMA, n = 140,047) and one from Khon Kaen University Hospital (HT-KKU, n = 108,942). In the CKD cohort, uncertainty-aware models significantly improved performance over standard MICE, as evaluated by root mean squared error (RMSE) and mean absolute error (MAE), for every base model except XGBoost. LinearReg-EI performed best (RMSE 0.12, MAE 0.36), followed by RF-EI (RMSE 0.22, MAE 0.34) and DT-EI (RMSE 0.21, MAE 0.38). In HT-RAMA, LinearReg-US performed best (RMSE 0.24, MAE 8.15), outperforming RF-US (RMSE 0.92, MAE 8.58) and DT-PI (RMSE 0.96, MAE 8.74). Similarly, in HT-KKU, LinearReg-US performed best (RMSE 0.98, MAE 12.00), followed by RF-PI (RMSE 1.93, MAE 12.90) and DT-US (RMSE 2.10, MAE 12.63). Unlike standard MICE, the uncertainty-aware models produced imputed distributions that closely resembled the distribution of the original data. Our findings suggest that incorporating uncertainty functions can improve MICE, particularly with LinearReg, RF, and DT. Further research is warranted to validate these findings across diverse clinical settings and model types.
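The abstract names US, PI, and EI but gives neither their formulas nor how they enter the MICE loop. The following is a minimal sketch, assuming the textbook Gaussian forms of these acquisition functions scored on a predictive mean `mu` and standard deviation `sigma` for one missing cell (for example, the spread of per-tree predictions in a random forest). The reference value `best`, the margin `xi`, the "larger is better" direction of improvement, and every function name are hypothetical. This is not the authors' implementation. RMSE and MAE, the two evaluation metrics reported above, are included for completeness.

```python
import math
from statistics import mean, pstdev


def norm_pdf(z: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def uncertainty_sampling(mu: float, sigma: float) -> float:
    """US: the score is simply the predictive standard deviation."""
    return sigma


def probability_of_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """PI: P(prediction exceeds best + xi) under a Gaussian assumption.

    Assumption: "improvement" means a larger value than the hypothetical
    reference `best`; the paper's actual criterion is not stated here.
    """
    improvement = mu - best - xi
    if sigma <= 0.0:  # degenerate case: no uncertainty left
        return 1.0 if improvement > 0.0 else 0.0
    return norm_cdf(improvement / sigma)


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """EI: expected amount by which the prediction exceeds best + xi."""
    improvement = mu - best - xi
    if sigma <= 0.0:  # degenerate case: improvement is deterministic
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between true and imputed values."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred) -> float:
    """Mean absolute error between true and imputed values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    # Hypothetical per-tree predictions for a single missing cell.
    tree_preds = [1.0, 1.4, 1.2, 0.9, 1.5]
    mu, sigma = mean(tree_preds), pstdev(tree_preds)
    best = 1.1  # hypothetical reference value
    print("US:", uncertainty_sampling(mu, sigma))
    print("PI:", probability_of_improvement(mu, sigma, best))
    print("EI:", expected_improvement(mu, sigma, best))
```

In this sketch each score depends only on `mu` and `sigma`, so any base model that can supply a predictive spread (bootstrap replicates of a linear regression, or the trees of a forest) could be plugged in. Which candidate imputation the scores then select is left open as an assumption.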

Details

Title
Uncertainty-aware approach for multiple imputation using conventional and machine learning models: a real-world data study
Volume
12
Issue
1
Pages
95
Publication year
2025
Publication date
Apr 2025
Publisher
Springer Nature B.V.
Place of publication
Heidelberg
Country of publication
Netherlands
e-ISSN
21961115
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-04-17
Milestone dates
2024-10-28 (Received); 2025-03-25 (Accepted); 2025-03-25 (Registration)
First posting date
17 Apr 2025
ProQuest document ID
3191260274
Document URL
https://www.proquest.com/scholarly-journals/uncertainty-aware-approach-multiple-imputation/docview/3191260274/se-2?accountid=208611
Copyright
Copyright Springer Nature B.V. Apr 2025
Last updated
2025-11-14
Database
ProQuest One Academic