© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Gastrointestinal (GI) diseases are among the leading causes of morbidity and mortality worldwide, so early and accurate diagnosis is important. Traditional methods such as endoscopy are time-consuming and depend heavily on the physician's judgment. We propose the Efficient Vision Transformer (EfficientViT), a deep learning model that combines EfficientNetB0 with the Vision Transformer (ViT) to classify eight types of GI disease. EfficientViT uses EfficientNetB0 to capture local textures and multi-scale features that reflect structural changes in the GI tract. At the same time, it draws on the ViT's ability to model the global context of GI tract images, which helps detect subtle disease patterns and precursors of disease spread. Furthermore, we designed a dual block in which the input is divided into two parts (q1, q2): q1 is processed by EfficientNet to capture local details, and q2 is processed by an encoder block to capture global dependencies, which enables EfficientViT to attend to multiple image regions simultaneously. We evaluated the model using fivefold cross-validation and achieved an accuracy of 99.82%, compared with 99.60% for a MobileNetV2-based model. EfficientViT also achieved high precision, recall, and F1 scores. Overall, our model outperforms existing methods and offers a promising tool to help clinicians diagnose GI diseases more reliably and accurately from endoscopic images.
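The evaluation protocol named in the abstract (fivefold cross-validation, reported with precision, recall, and F1) can be illustrated with a minimal sketch. It is not the authors' code: the function names `k_fold_indices` and `macro_precision_recall_f1` are hypothetical, the split is a plain shuffled split (the abstract does not say whether folds were stratified), and macro averaging over the eight classes is an assumption, since the abstract does not state how the scores were averaged.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Assumption: a simple shuffled split; stratification is not modeled.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def macro_precision_recall_f1(y_true, y_pred, n_classes=8):
    """Return macro-averaged (precision, recall, F1) over n_classes labels.

    Assumption: macro averaging; a class with no predictions or no
    true samples contributes 0.0 to the corresponding average.
    """
    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return (
        sum(precisions) / n_classes,
        sum(recalls) / n_classes,
        sum(f1s) / n_classes,
    )
```

In use, a model would be trained on each `train_idx` and scored on the matching `val_idx`, and the five per-fold scores would then be averaged. Each sample appears in exactly one validation fold, so every image contributes once to the reported metrics.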

Details

Title
Hybrid deep learning framework based on EfficientViT for classification of gastrointestinal diseases
Author
Tanwar, Vishesh 1 ; Sharma, Bhisham 2 ; Yadav, Dhirendra Prasad 3 ; Mehbodniya, Abolfazl 4 

1 Chitkara University, Chitkara University Institute of Engineering and Technology, Rajpura, India (GRID:grid.428245.d) (ISNI:0000 0004 1765 3753)
2 Chitkara University, Centre of Research Impact and Outcome, Rajpura, India (GRID:grid.428245.d) (ISNI:0000 0004 1765 3753)
3 GLA University, Department of Computer Engineering & Applications, Mathura, India (GRID:grid.448881.9) (ISNI:0000 0004 1774 2318)
4 Kuwait College of Science and Technology (KCST), Department of Electronics and Communication Engineering, Kuwait City, Kuwait (GRID:grid.510476.1) (ISNI:0000 0004 4651 6918)
Pages
26982
Publication year
2025
Publication date
2025
Publisher
Nature Publishing Group
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3232917623