Abstract

Colorectal cancer is a significant global health issue, ranking as the third most common cancer and the second leading cause of cancer-related deaths worldwide. Early diagnosis is critical for increasing the survival rate and improving the healthcare system. Many machine learning (ML) and deep learning (DL) methods have been proposed to automate early diagnosis of this cancer. However, label noise in medical images and reliance on a single model can lead to suboptimal performance, which can hinder the development of a robust automated solution. In this paper, we address label noise in training data and propose a stacking-ensemble model for classifying colorectal cancer, along with a trustworthy computer-aided diagnosis (CAD) system. First, we analyze a variety of filtering methods to determine the most suitable image representation and then apply data augmentation. Second, we fine-tune a modified VGG-16 model and use it as a feature extractor to obtain meaningful features from the training samples. Third, we apply prediction uncertainty and the probabilistic local outlier factor (pLOF) to the extracted features to address label noise in the training data. Fourth, we adopt a random forest–based recursive feature elimination (RF-RFE) method, tested with various combinations of features, to recursively select the most influential ones for accurate predictions. Fifth, we select four base ML classifiers and a metamodel to build the final stacking-ensemble model, which integrates the prediction probabilities of the base models into a meta-feature set to ensure trustworthy predictions. Finally, we integrate these strategies and deploy them in a web application to demonstrate a CAD system. This system not only predicts the disease but also reports the prediction probability of each class, which improves both clarity and diagnostic insight. Compared with several state-of-the-art ML classifiers on a publicly available dataset, the proposed model achieved the highest accuracy, 92.43%.
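
The feature selection and stacking steps described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the abstract does not name the four base classifiers, the metamodel, or the number of features retained, so the estimators, the synthetic data standing in for VGG-16 features, and all parameter values below are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for deep features (assumption: the paper uses
# features extracted by a fine-tuned VGG-16, not random data).
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=10,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# RF-RFE: recursively drop the least important features, ranked by a
# random forest, until 15 remain (15 is an arbitrary illustrative choice).
selector = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=15,
    step=5,
)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Four hypothetical base classifiers; their out-of-fold class
# probabilities become the meta-features for the metamodel.
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("dt", DecisionTreeClassifier(random_state=0)),
]
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),  # hypothetical metamodel
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train_sel, y_train)

# Per-class probabilities, analogous to what the CAD system reports.
proba = stack.predict_proba(X_test_sel)
print(proba[:3].round(3))
```

Setting `stack_method="predict_proba"` makes the metamodel learn from class probabilities rather than hard labels, which matches the abstract's description of a probability-based meta-feature set. The label noise step (prediction uncertainty and pLOF) is omitted here because scikit-learn provides no pLOF implementation.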

Details

1009240
Title
Addressing Label Noise in Colorectal Cancer Classification Using Cross-Entropy Loss and pLOF Methods With Stacking-Ensemble Technique
Author
Tani, Ishrat Zahan 1; Kah Ong Michael Goh 2; Islam, Md Nazmul 1; Md Tarek Aziz 3; Mahmud, S M Hasan 4; Nandi, Dip 5

1 Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology (RUET), Kazla, Rajshahi 6204, Bangladesh; Centre for Advanced Machine Learning and Applications (CAMLAs), Dhaka 1229, Bangladesh
2 Faculty of Information Science & Technology (FIST), Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia
3 Centre for Advanced Machine Learning and Applications (CAMLAs), Dhaka 1229, Bangladesh
4 Centre for Advanced Machine Learning and Applications (CAMLAs), Dhaka 1229, Bangladesh; Department of Computer Science, American International University-Bangladesh (AIUB), 408/1, Kuratoli, Khilkhet, Dhaka 1229, Bangladesh
5 Department of Computer Science, American International University-Bangladesh (AIUB), 408/1, Kuratoli, Khilkhet, Dhaka 1229, Bangladesh
Editor
Ali Qamar
Volume
2025
Publication year
2025
Publication date
2025
Publisher
John Wiley & Sons, Inc.
Place of publication
New York
Country of publication
United States
ISSN
16879724
e-ISSN
16879732
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Milestone dates
2024-03-10 (Received); 2024-11-22 (Revised); 2024-12-04 (Accepted); 2025-01-30 (Pub)
ProQuest document ID
3164853086
Document URL
https://www.proquest.com/scholarly-journals/addressing-label-noise-colorectal-cancer/docview/3164853086/se-2?accountid=208611
Copyright
Copyright © 2025 Ishrat Zahan Tani et al. Applied Computational Intelligence and Soft Computing published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License (the “License”), which permits use, distribution and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/
Last updated
2025-07-22
Database
ProQuest One Academic