Introduction
Oral epithelial dysplasia (OED) presents a significant challenge in the realm of oral pathology, where accurate diagnosis and early detection are paramount for effective intervention and prevention of malignant progression. OED is a potentially malignant histopathological diagnosis encompassing various lesions of the oral mucosa, typically manifesting as white (leukoplakia), red (erythroplakia) or mixed red-white (erythroleukoplakia) lesions1,2.
Histopathological grading of Haematoxylin and Eosin (H&E) stained tissue using the World Health Organisation (WHO, 20173) classification system remains the currently accepted practice for diagnosis and risk stratification of OED lesions. This three-tier system grades OED as mild, moderate or severe based on the presence, severity and location of a wide range of cytological and architectural histological features (28 in total4,5). By its nature, this approach suffers from significant intra- and inter-observer variability and has poor predictive value for malignant transformation risk, which can affect patient management. An alternative binary grading system, which categorises lesions as low- or high-risk based on the number of cytological and architectural features (as listed in the WHO criteria), was proposed to improve the reproducibility of grading6,7. However, studies have shown significant variability and unreliability in grading with both systems, highlighting the need for more objective and reproducible methods that can better predict malignant transformation risk in OED8,9.
To address subjectivity and misclassification in the grading of precancerous and cancerous lesions, there is growing interest in leveraging advanced technologies, particularly deep learning (DL), which has seen extensive use in medical image analysis over the past decade10, 11–12. Several state-of-the-art models, such as U-Net13 and DeepLab14, have been developed to perform image classification and segmentation. These models typically use convolutional neural networks (CNNs), such as ResNet15, as feature extractors. Within digital pathology, weakly supervised methods have become popular choices for the analysis of histology images, enabling slide-level classification based on patch-level predictions. These methods typically divide whole-slide images (WSIs) into smaller patches before using CNNs to extract patch-level features16, 17–18. However, despite their success, CNN-based models have limitations when used for either segmentation or classification, such as high computational overhead and difficulty in capturing long-range dependencies in images.
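As a rough illustration of the patch-based workflow described above, the following minimal sketch tiles an image into patches, computes one feature vector per patch, and pools patch-level outputs into a single slide-level score. It is not the method of any cited study: the function names (`tile_patches`, `patch_features`, `slide_score`), the mean-intensity features standing in for a CNN, the mean pooling and the fixed weights are all assumptions chosen to keep the example self-contained.

```python
import numpy as np


def tile_patches(image, patch_size):
    """Split an (H, W, C) array into non-overlapping square patches.

    Ragged borders that do not fill a whole patch are dropped.
    """
    h, w, c = image.shape
    rows, cols = h // patch_size, w // patch_size
    cropped = image[:rows * patch_size, :cols * patch_size]
    # (rows, ps, cols, ps, C) -> (rows, cols, ps, ps, C) -> (N, ps, ps, C)
    grid = cropped.reshape(rows, patch_size, cols, patch_size, c).swapaxes(1, 2)
    return grid.reshape(rows * cols, patch_size, patch_size, c)


def patch_features(patches):
    """Hypothetical stand-in for a CNN feature extractor.

    Returns the mean intensity per channel for each patch, shape (N, C).
    """
    return patches.mean(axis=(1, 2))


def slide_score(features, weights, bias=0.0):
    """Mean-pool patch-level logits and map them to one slide-level probability."""
    logits = features @ weights + bias
    return float(1.0 / (1.0 + np.exp(-logits.mean())))


# Synthetic "slide": random values in place of real H&E pixel data (assumption).
rng = np.random.default_rng(0)
slide = rng.random((1000, 1500, 3))

patches = tile_patches(slide, 256)        # 3 x 5 grid of 256-pixel patches
features = patch_features(patches)        # one feature vector per patch
weights = np.array([0.5, -0.25, 0.1])     # illustrative, untrained weights
score = slide_score(features, weights)    # single slide-level score
```

In a real weakly supervised pipeline, only the slide-level label would be available for training, the feature extractor would be a pretrained or fine-tuned CNN, and the pooling step is often replaced by a learned aggregation.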
Transformers have gained widespread attention in...