Skin cancer remains a major global health concern where early detection can significantly improve treatment outcomes. Traditional diagnosis relies on expert evaluation, which can be prone to error, and current CNN-based models struggle to classify underrepresented skin lesion classes due to dataset imbalance, failing to achieve consistently high performance across diverse populations. There is thus a pressing need for a robust, efficient, and interpretable model to aid dermatologists in early and accurate diagnosis. This study proposes DSSCC-Net, a novel deep learning framework that integrates an optimized CNN architecture with the SMOTE-Tomek technique to address class imbalance; the model processes dermoscopic images resized to 28 × 28 pixels. Trained and validated on the HAM10000, ISIC 2018, and PH2 datasets, DSSCC-Net achieved an average accuracy of 97.82% ± 0.37%, precision of 97%, recall of 97%, and an AUC of 99.43%. Additional analysis using Grad-CAM and expert-labeled masks validated the model's explainability. DSSCC-Net demonstrates state-of-the-art performance and readiness for real-world clinical integration.
Introduction
Skin cancer is a rapidly growing concern worldwide, with early detection being critical to improving survival rates [1]. Traditional diagnostic approaches rely heavily on dermatological expertise, which can be both subjective and inconsistent. Given these challenges, the integration of artificial intelligence (AI) and deep learning (DL) has emerged as a transformative approach to automating skin cancer diagnosis and classification with high accuracy and efficiency. Recent advances in DL, particularly Convolutional Neural Networks (CNNs), have demonstrated exceptional potential in medical image analysis [1]: CNNs can extract intricate patterns from dermoscopic images, making them a powerful tool for skin lesion classification. However, many existing models suffer from high computational cost, overfitting, and suboptimal performance on imbalanced datasets [1]. Addressing these limitations, we propose DSSCC-Net, a novel and optimized CNN-based model specifically designed for skin cancer classification. By leveraging advanced augmentation techniques, dataset balancing methods, and an efficient architectural design, DSSCC-Net enhances classification performance while reducing computational complexity. The proposed model aims to improve real-world applicability, providing an effective and reliable tool for dermatologists to facilitate early and accurate skin cancer diagnosis [1].
This study proposes DSSCC-Net, an optimized CNN architecture integrated with SMOTE-Tomek, tailored specifically for handling data imbalance in skin lesion classification, as shown in Fig. 1. The model was evaluated on three benchmark datasets (HAM10000, ISIC 2018, and PH2), showing superior accuracy and generalizability. Key model components, including convolutional block configurations, resolution trade-offs, and dropout layers, were carefully optimized to maintain efficiency without compromising accuracy.
Fig. 1 [Images not available. See PDF.]
Architecture of DSSCC-Net for Enhanced Skin Cancer Classification.
The major contributions of this work are as follows:
Proposed DSSCC-Net: a compact, efficient CNN architecture designed for skin lesion classification on imbalanced data.
Integration of SMOTE-Tomek: post-split oversampling strategy employed to address class imbalance without causing data leakage.
Extensive evaluation: conducted on three publicly available datasets (HAM10000, ISIC 2018, PH2) to demonstrate the generalizability of the model.
Explainability analysis: both visual and quantitative explainability assessed using Grad-CAM, Dice score, and IoU, benchmarked against expert annotations.
Deployment readiness: model assessed in terms of size, inference speed, and feasibility for deployment on low-resource and edge devices.
Comparative performance: evaluated against recent CNN and transformer-based models, including MobileNetV3, Swin Transformer, and ConvNeXt.
The remainder of this manuscript is structured as follows: Section Related work reviews the relevant literature, Section Materials and methods outlines the proposed skin cancer recognition approach, Section Experimental setup presents the experimental setup and discusses the obtained results, and Section Conclusion summarizes the conclusions of the study.
Related work
Recent advancements in deep learning have significantly improved the automated classification of skin lesions. Various CNN-based architectures, such as VGG16, ResNet-50, DenseNet-121, and Inception-V3, have been applied with varying success. However, these models often suffer performance drops due to class imbalance and limited generalization across datasets.
Melanoma is a type of skin cancer that is a major health concern across the world. Early diagnosis of skin cancer enables early treatment and reduces mortality, hence the growing interest in developing intelligent systems for skin cancer diagnosis. AI and DL have become prominent tools in this direction, with the capability to improve diagnostic efficiency, reduce human error, and deliver results in less time. Different datasets have been used in prior work to develop and test models that can diagnose skin lesions with high accuracy, including HAM10000, a widely used collection of 10,015 dermoscopic images covering seven skin lesion classes: AKIEC, BCC, BKL, DF, NV, VASC, and MEL. The HAM10000 dataset has been central to benchmarking, but many models fail to address issues arising from underrepresented lesion types. Studies such as Abbas et al. [1] and Bajwa et al. [8] reached high accuracies, yet without interpretability validation or balancing techniques. Transformer architectures, such as the Swin Transformer and TransUNet, have recently been introduced with strong performance on ISIC 2018 and Derm7pt. Some of the models, datasets, limitations, and accuracies reported in these studies are summarized in Table 1.
Table 1. Existing models, datasets used, and accuracy achieved.
Ref | Model Type | Dataset | Accuracy |
|---|---|---|---|
2 | Hybrid CNN | ISBI 2016 | 88.02% |
3 | Spiking Vgg-13 | ISIC 2018 | 89.57% |
4 | Vgg-16, Vgg-19, AlexNet | HAM10000 | 92.25% |
5 | Deep CNN | ISIC 2016, 2017, ISIC 2020 | 90.42% |
6 | CNN and ResNet-50 | HAM10000 | 86% |
7 | DenseNet-201 | ISIC 2018 | 76.08% |
8 | Vgg-19 V2 | ISIC 2018 | 98.2% |
9 | EfficientNets (B0-B7) | HAM10000 | 87.91% |
10 | DCNN | HAM10000 | 91.93% |
11 | ResNet-152, SE-ResNeXt101, DenseNet-161 | ISIC 2018 | 93% |
12 | CNN | HAM10000 | 78% |
13 | DenseNet-121 | HAM10000 | 85% |
14 | ResNet-50 (Transfer Learning) | ISIC 2019 | 93% |
15 | Vgg-16 (Fine-tuned) | ISIC 2019 | 82.8% |
16 | VGGNET | ISIC 2019 | 85.62% |
17 | CNN-Based Feature Extraction | ISIC 2020 | 89.5% |
18 | Ensemble of GoogleNet and Vgg-16 | ISIC 2020 | 80.1% |
19 | VGGNet (Transfer Learning) | ISIC 2020 | 81.3% |
20 | Linear Classifier (CNN Features) | HAM10000 | 85.8% |
21 | Sparse Coding, SVM, Deep Learning | ISIC | 93.1% |
22 | Various ML Classifiers | UCI Repository | 98% |
23 | Deep CNN (Multiple Datasets) | ISIC 2017, ISIC 2018, ISIC 2019 | 93.47%, 88.75%, 89.58% |
24 | Transformer-based CNN | HAM10000 (2024) | 96.8% |
Several studies have explored deep learning techniques for skin cancer classification. Recent works, including [63–67] and Mondal & Shrivastava (2023), emphasize architectural improvements and domain adaptation for enhanced robustness; even so, models with fewer parameters and higher inference speed remain in demand for real-world deployment. Kaur et al.2 demonstrated that deep CNNs could achieve up to 90.42% accuracy on ISIC datasets (2016, 2017, 2020), although segmentation performance remained sensitive to adverse conditions such as low contrast and occlusion. Alwakid et al.3 employed CNN and ResNet-50, achieving 86% accuracy on the HAM10000 dataset, emphasizing the importance of hyperparameter selection and dataset quality. Aljohani et al.4 reported a lower accuracy of 76.8% with DenseNet-201 on ISIC 2019, suggesting that incorporating more clinical data could improve model generalization. Abbas et al.5 achieved 98% accuracy using Vgg-19-V2 on ISIC 2020, highlighting the importance of lightweight yet accurate models for different domains. Another study investigated EfficientNet models, obtaining 87% accuracy, and stressed the significance of balanced training data. Raza et al.6 used deep CNNs, achieving 91% accuracy, with nearly 93% on HAM10000, cautioning against overfitting due to small sample sizes and advocating for more diverse datasets. Zhang et al.7 combined ResNet-152, SE-ResNeXt101, and DenseNet-161, achieving 93% accuracy on ISIC 2018, but noted time consumption as a limitation. Iqbal et al.12 developed a CNN model with 78% accuracy on HAM10000, underlining the necessity of larger datasets and high-quality images. Hussain et al.13 applied DenseNet-121, achieving 85% accuracy, and highlighted vulnerabilities to adversarial examples, recommending adversarial training techniques. Chen et al.14 implemented ResNet-50 with transfer learning, reaching 93% accuracy on ISIC 2019, emphasizing feature selection and preprocessing. Zhao et al.15 fine-tuned Vgg-16, attaining 82% accuracy on ISIC 2019, and pointed out the role of data augmentation in accuracy improvement. Liu et al.16 achieved 85% accuracy using VGGNet but observed a performance drop to 62% on ISIC 2019 due to image resizing, which led to overfitting. Patel et al.17 utilized CNN-based feature extraction and obtained 89.5% accuracy on ISIC 2020, suggesting improved detection accuracy through regularization techniques. Shah et al.18 investigated an ensemble of GoogleNet and Vgg-16, attaining 80.1% accuracy on ISIC 2020, and emphasized the impact of data augmentation and normalization. Jiang et al.18 employed VGGNet with transfer learning, achieving 81.3% accuracy on ISIC 2020, but faced challenges in early melanoma detection due to dataset constraints. Kartumi et al.19 used a linear classifier with CNN-derived features, achieving 85.8% accuracy on HAM10000, noting dataset dependency issues. Rizvi et al.20 combined sparse coding, SVM, and deep learning, achieving 93.1% accuracy on ISIC but identified inefficiencies in handling all skin cancer types. Singh et al.21 explored various ML classifiers, reaching up to 98% accuracy on the UCI dataset, emphasizing the sensitivity of classifier performance to dataset quality. Kumar et al.23 evaluated deep CNNs across multiple datasets, achieving 93% on ISIC 2018, 88.75% on ISIC 2019, and 90% on ISIC 2020, and highlighted challenges in generalizing performance across diverse datasets.
Recent studies have explored advanced CNN and transformer architectures for skin lesion classification, reporting improved performance and interpretability. For example, Ahamed et al.25 proposed a multi-scale CNN with attention modules, Ahamed et al.26 introduced an efficient CNN–Transformer hybrid, and Faysal et al.3 benchmarked lightweight CNNs.
Another study introduced SkinNet-14, a compact deep learning framework that accurately classifies skin cancer from low-resolution dermoscopy images while significantly reducing training time27. Tested on multiple datasets, it achieved up to 98% accuracy, proving effective in resource-limited medical settings.
Similarly, Ahamed et al.26 proposed customized CNN architectures that improve malaria diagnosis while ensuring interpretability for medical practitioners. This approach enhances diagnostic accuracy and reliability, contributing to more effective healthcare solutions.
Furthermore, Islam et al.28 conducted a systematic review synthesizing recent deep learning approaches for colorectal cancer screening, including detection, localization, segmentation, and classification tasks. The study highlights state-of-the-art models achieving over 99% accuracy while emphasizing challenges such as dataset variability and the lack of standardized benchmarks. We have considered these models in expanding our comparative experiments and in contextualizing our contribution.
In contrast, DSSCC-Net introduces a class-balancing-aware architecture optimized for diagnostic efficiency, and integrates interpretability and statistical validation to address gaps in current literature.
Materials and methods
This section details the dataset sources, preprocessing steps, proposed CNN architecture (DSSCC-Net), and training configuration. It also describes the detailed experimental procedure employed to evaluate the performance of the proposed DSSCC-Net model for skin cancer detection, comparing it against six well-known deep convolutional neural network (CNN) models: Vgg-19 [36], ResNet-152 [31], Vgg-16 [35], Enhanced Vgg-19 [25], Inception-V3 [5], and EfficientNet-B0 [30]. The focus of this study is on seven types of skin lesions: AKIEC, BCC, BKL, DF, NV, VASC, and MEL (melanoma).
Experimental setup
Dataset overview: We used HAM10000 (10,015 images), ISIC 2018 (10,600 images), and PH2 (200 images). Each dataset covers seven primary skin lesion types. A summary of distribution is provided in Table 2.
Image preprocessing and resolution: All images were resized to 28 × 28 pixels to support deployment efficiency. Comparative studies at 64 × 64 and 128 × 128 showed only a marginal performance gain (0.5–1.1%) at the cost of 3–5× the training time. To assess the trade-off between image resolution and model performance, we conducted ablation experiments using three resolutions: 28 × 28, 64 × 64, and 128 × 128 pixels. All models were trained under identical conditions, and training time, memory usage, and accuracy were recorded for each setting.
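As a rough illustration, the loop below sketches how such an ablation can be run; the `build_model` helper and the `X_train`/`y_train` arrays are hypothetical placeholders, not the authors' released code.

```python
# Hypothetical resolution-ablation sketch: build_model(), X_train and
# y_train are placeholder names, not part of the published pipeline.
import time
import tensorflow as tf

for res in (28, 64, 128):
    X_res = tf.image.resize(X_train, (res, res)).numpy() / 255.0  # resize, then scale to [0, 1]
    model = build_model(input_shape=(res, res, 3))                # placeholder model builder
    start = time.perf_counter()
    model.fit(X_res, y_train, epochs=20, batch_size=32, verbose=0)
    print(f"{res}x{res}: {time.perf_counter() - start:.0f}s training time")
```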
SMOTE-Tomek integration: SMOTE-Tomek was applied after splitting data (train/val/test) to prevent data leakage. We compared it with MixUp, CutMix, and GAN-based augmentation, with t-SNE visualization confirming sample quality.
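A minimal sketch of this leakage-free protocol, assuming the images sit in an array `X` of shape (N, 28, 28, 3) with integer labels `y` (imbalanced-learn's `SMOTETomek` operates on flattened feature vectors):

```python
# Post-split SMOTE-Tomek sketch; X and y are assumed arrays of images and
# integer labels, not the authors' released pipeline.
from sklearn.model_selection import train_test_split
from imblearn.combine import SMOTETomek

# 60/20/20 stratified split performed BEFORE any resampling
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)

# Resample the training set only; validation/test stay untouched (no leakage)
n, h, w, c = X_train.shape
smt = SMOTETomek(random_state=42)
X_bal, y_bal = smt.fit_resample(X_train.reshape(n, -1), y_train)
X_bal = X_bal.reshape(-1, h, w, c)  # back to image tensors for the CNN
```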
Architecture (DSSCC-Net): The model consists of five convolutional blocks, each followed by ReLU and MaxPooling layers. Dropout (0.5) and Dense layers aid regularization. The final Softmax layer outputs classification probabilities across the seven classes. Figure 7 illustrates the layer-wise architecture.
Training settings: Adam optimizer, Sparse Categorical Cross-Entropy loss, learning rate 0.001, batch size 32, early stopping at epoch 17.
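A minimal Keras sketch of these settings is given below; `model` refers to the DSSCC-Net defined in the architecture section, and the patience value is an assumption (the paper reports early stopping triggering at epoch 17).

```python
# Training-configuration sketch matching the stated settings; the patience
# value is an assumption, and X_bal/y_bal come from the resampling sketch.
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)
history = model.fit(
    X_bal, y_bal,
    validation_data=(X_val, y_val),
    epochs=50, batch_size=32,
    callbacks=[early_stop],
)
```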
The data used in this work come from the HAM10000 [21] dataset, which consists of dermoscopy images alone. These images are essential for diagnosing and differentiating many skin lesions, and the dataset's coverage of diverse skin conditions makes it well suited for training and testing skin cancer detection models. All images were normalized and resized to 28 × 28 pixels to provide the uniform input dimensions required by the CNN models used in this study. Several studies have noted that data preprocessing is a critical step in machine learning, especially for large, imbalanced datasets; in this study, the raw dermoscopic images were preprocessed to normalize differing pixel intensity values and to reduce the risk of overfitting. To manage the class imbalance commonly documented for medical imaging datasets, where one class can contain significantly fewer instances than another, SMOTE combined with Tomek links was utilized: SMOTE creates synthetic samples for the minority classes, while Tomek links act as a filter that identifies cases on the boundary between classes, yielding a better data distribution and enhancing the model's generalization. The data were then divided into training, validation, and test sets: 60% for training, 20% for validation, and 20% for testing. This split maximizes the information available for model development while ensuring that the validation and test sets are reserved for assessing performance, reducing the impact of overfitting.
Dataset
Imbalanced HAM10000 dataset
We use the HAM10000 dataset, which is widely recognized for the classification of pigmented skin lesions. The dataset consists of 10,015 dermatoscopic images, covering seven key diagnostic categories of skin lesions: Actinic keratoses and intraepithelial carcinoma (AKIEC), basal cell carcinoma (BCC), benign keratosis-like lesions (BKL), dermatofibroma (DF), melanoma (MEL), melanocytic nevi (NV), and vascular lesions (VASC). These images were collected from diverse populations using various modalities, ensuring a broad representation of pigmented lesions. A notable challenge in training models with this dataset is its class imbalance, as certain categories, such as melanoma or actinic keratoses, are underrepresented compared to others like melanocytic nevi. To address this issue, we applied data preprocessing techniques along with SMOTE (Synthetic Minority Over-sampling Technique) and Tomek links to balance the dataset. These methods help improve the performance of the neural network by generating synthetic samples of the minority classes and removing ambiguous samples that could negatively affect the model's learning, making the model more robust and offering a fairer environment for model training. Moreover, over half of the cases in this dataset are supported by histopathological examination; the rest are supported by follow-up examination, expert opinion, or in vivo confocal microscopy, making the dataset more reliable.
Frequency
Figure 2 gives an overview of how the various skin disease classes are distributed in the training set using a frequency distribution, with the diseases plotted on the x-axis and their counts on the y-axis.
nv (melanocytic nevi): This is the most common class, with a count of almost 7,000, indicating that nv dominates the dataset analyzed.
mel (melanoma): The second most frequent class, with more than a thousand cases, making it one of the major skin cancer types in this dataset.
bkl (benign keratosis-like lesions): This class also has a comparatively high count, confirming its frequent occurrence.
bcc (basal cell carcinoma): Less frequent than nv and mel, but still an important class in terms of the number of cases.
df (dermatofibroma), vasc (vascular lesions), akiec (actinic keratoses and intraepithelial carcinoma): These classes have relatively low frequencies, with counts below 500 each, denoting a lower prevalence of these conditions in the set.
Fig. 2 [Images not available. See PDF.]
Frequency distribution of the seven skin lesion classes in the HAM10000 dataset, highlighting the dominance of the ’nv’ class and the imbalance among categories.
The pronounced difference in class occurrence (Fig. 2) constitutes a class imbalance problem, with 'nv' dominating all other classes. This may limit model performance, as a model tends to favor the majority classes over less frequent ones. To ensure balanced model performance, techniques such as data augmentation, oversampling, or the SMOTE-Tomek method can be employed to better balance the dataset. Additionally, understanding the reasons behind these frequency disparities can help focus attention on underrepresented classes for more accurate detection and diagnosis.
Age
The plot (Fig. 3) offers insights into the age distribution of patients within the dataset. The x-axis denotes the patients' ages, while the y-axis reflects the frequency or count of occurrences for each age group.
Ages 0–10: This age group has a relatively low frequency, with fewer than 200 cases. This suggests that younger patients are less frequently represented in the dataset.
Ages 10–20: The frequency starts to increase but remains below 400 cases. This indicates a gradual increase in skin-related conditions as patients grow older.
Ages 20–40: There is a sharp increase in the number of cases, peaking at around 1200 toward the upper end of this range. This is the most common age group in the dataset.
Ages 40–60: The count remains high, around 1000 cases, showing a slightly downward trend from the peak, but still high. This indicates that skin conditions remain common among middle-aged patients.
Ages 60–80: The frequency gradually declines, but the number of cases is still substantial, indicating that elderly patients also form a significant portion of the dataset.
Ages 80+: The number of cases drops off sharply, indicating fewer cases of skin conditions in this age group. The (Fig. 3) suggests that skin conditions are more common among middle-aged patients (40–50 years old) and slightly decline in older age groups. There is a significant presence across all adult age groups, with fewer cases in very young or very old patients. The higher frequency of cases in middle-aged individuals could imply a need for targeted screening or awareness programs in this age group to address skin-related conditions.
Fig. 3 [Images not available. See PDF.]
Age distribution of patients in the HAM10000 dataset, with most lesions observed in middle-aged groups (40–60 years).
Location
The Fig. 4 provides insights into the frequency of different disease locations categorized by gender. The x-axis represents different disease locations, while the y-axis shows the frequency/count of occurrences. The plot is color-coded by gender: blue for males, orange for females, and green for unknown gender.
Back: This is the most frequent disease location for both males and females, with a count of about 1400 for males and 1300 for females. This suggests that back-related conditions are common across genders.
Lower extremity: The next most frequent disease location is the lower extremities, with males having around 900 cases and females close to 850.
Trunk: This is also a common disease location, with males slightly outnumbering females in frequency, with counts around 700 for males and 650 for females.
Upper extremity: Similar trends can be observed here, with males having slightly higher frequencies.
Abdomen, face, chest: These disease locations show a similar distribution, where males tend to have a higher count than females, though the differences are not as significant as in the earlier categories.
Foot, Scalp, Neck, Unknown: These categories show more variation, with neck and scalp-related conditions showing minor differences between males and females.
Hand, genital, ear, acral: These disease locations are the least frequent, with hand and genital conditions having a very low count across all genders. The unknown category includes a small but significant count, especially for foot-related conditions. This plot provides a comprehensive view of the distribution of disease locations across genders. Males tend to have slightly higher frequencies across most disease locations, but the differences between genders are generally small, indicating that the disease locations affect both genders relatively similarly. The dominance of diseases in locations like the back and lower extremities suggests that medical focus on these areas could benefit both genders equally. The inclusion of the unknown category suggests there might be room for better classification of diseases, possibly through improved data collection or categorization methods.
Fig. 4 [Images not available. See PDF.]
Distribution of skin lesion locations by gender in HAM10000. The back and lower extremities are the most frequently affected areas across both sexes.
Model architecture and training
The DSSCC-Net model is implemented as a sequential architecture for identifying all seven types of skin lesions. The model comprises the following layers:
Input Layer: Accepts images of size 28 × 28 × 3, corresponding to RGB dermoscopic images.
Convolutional Layers: The first two convolutional layers use 32 and 64 filters, respectively, with ReLU activation and same padding to detect low-level features.
MaxPooling Layers: Max-pooling with a 2 × 2 pool size follows the convolutional stages to reduce the spatial dimensionality of the feature maps.
Additional Convolutional Layer: A further convolutional layer with 128 filters extracts higher-level features, followed by another max-pooling layer to fine-tune the dimensionality.
Flatten Layer: Flattens the 2D feature maps into a 1D vector suitable for the fully connected layers.
Dense Layers: A dense layer with 128 nodes and ReLU activation, followed by a dropout layer with a rate of 0.5 to reduce overfitting.
Output Layer: A final layer with 7 neurons, one per skin cancer class, using Softmax activation to produce probability scores for the skin cancer classes.
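Below is a minimal Keras sketch reconstructed from this layer description; it is not released source code, and the pooling placement is inferred so that the stack reproduces the reported total of 897,095 trainable parameters (about 3.42 MB).

```python
# Reconstructed DSSCC-Net sketch; kernel sizes and pooling placement are
# inferred from the text, chosen to match the reported 897,095 parameters.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 3)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),  # one unit per lesion class
])
model.summary()
```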
Performance evaluation
To evaluate the DSSCC-Net model, its performance was compared against six pre-trained CNN models: Vgg-19 [36], ResNet-152 [31], Vgg-16 [35], Enhanced Vgg-19 [25], Inception-V3 [5], and EfficientNet-B0 [30]. All models were trained on the same data and assessed with the same performance measures, namely accuracy, precision, recall, F1-score, and AUC; the analysis here is predominantly a benchmarking exercise rather than a comparison with every state-of-the-art method. In addition, the confusion matrix was employed to determine the overall classification behavior across all seven classes, which is useful for identifying cases of misclassification and the model's performance per skin cancer type. For visualization of the outcome, we used Grad-CAM heatmaps, which indicate what specific areas of the images mattered to the decision-making and thus offer insight into the model's interpretability and focus. To counter the class imbalance problem in the dataset, SMOTE and Tomek links were applied for up-sampling; Figs. 5 and 6 show how SMOTE-Tomek creates synthetic samples for the minority classes, and the details are given in the SMOTE Tomek technique section. Before applying up-sampling, the image distribution is depicted in Table 2.
Table 2. Original Data Distribution.
Class No | Class Name | No. of Samples |
|---|---|---|
0 | Actinic keratoses and intraepithelial carcinoma (akiec) | 1302 |
1 | Basal cell carcinoma (bcc) | 1382 |
2 | Benign keratosis-like lesions (bkl) | 1320 |
3 | Dermatofibroma (df) | 1331 |
4 | Melanocytic nevi (nv) | 1383 |
5 | Pyogenic granulomas and hemorrhage (vasc) | 1303 |
6 | Melanoma (mel) | 1366 |
Table 3. Data Distribution After Over-Sampling.
Class No | Class Name | No. of Samples |
|---|---|---|
0 | Actinic keratoses and intraepithelial carcinoma (akiec) | 2265 |
1 | Basal cell carcinoma (bcc) | 2275 |
2 | Benign keratosis-like lesions (bkl) | 2292 |
3 | Dermatofibroma (df) | 2255 |
4 | Melanocytic nevi (nv) | 2252 |
5 | Pyogenic granulomas and hemorrhage (vasc) | 2189 |
6 | Melanoma (mel) | 2270 |
Handle imbalance problem
Before oversampling, the distribution of skin lesion types in the dataset warrants discussion. Some types, such as melanocytic nevi (NV), are numerous, while others, such as melanoma (MEL) and actinic keratoses (AKIEC), have limited samples. This biases the learning process, since the model will favor the majority classes over the minority ones. Oversampling is applied to eliminate the class imbalance problem, either by replicating minority-class samples or by creating artificial records. In this study, we combined the Synthetic Minority Oversampling Technique (SMOTE) with Tomek links to strengthen the underrepresented classes in the dataset. SMOTE creates new samples through interpolation between existing samples, while Tomek links clean the dataset by removing instances lying near the decision boundary between classes. Applying these techniques increases the number of samples in the underrepresented classes, making the distribution of skin lesions more balanced across the seven classes. As a result, the model trains better on all classes and improves its predictions, especially for the classes that an imbalanced distribution might otherwise overlook. Figure 5 shows a t-SNE plot comparing synthetic (SMOTE-Tomek) and real image distributions.
Fig. 5 [Images not available. See PDF.]
t-SNE plot comparing synthetic (SMOTE-Tomek) and real image distributions.
The results also show how oversampling reduces class imbalance so that the model is trained on a better sample of each type of skin lesion. This is important for enhancing the model's accuracy, especially in the clinical setting, where correct classification into all classes matters.
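A minimal sketch of such a t-SNE comparison, assuming `X_real` and `X_synth` hold flattened real and SMOTE-Tomek-generated samples; the perplexity and plotting details are illustrative choices.

```python
# t-SNE sketch comparing real and synthetic samples (cf. Fig. 5); the
# variable names and perplexity are assumptions, not published settings.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

X_all = np.vstack([X_real, X_synth])
labels = np.array([0] * len(X_real) + [1] * len(X_synth))  # 0 = real, 1 = synthetic
emb = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_all)

plt.scatter(emb[labels == 0, 0], emb[labels == 0, 1], s=4, label="real")
plt.scatter(emb[labels == 1, 0], emb[labels == 1, 1], s=4, label="synthetic")
plt.legend(); plt.title("Real vs. SMOTE-Tomek samples"); plt.show()
```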
SMOTE Tomek technique
To counter the class imbalance problem in the dataset, SMOTE and Tomek links were used for up-sampling. Figure 6 shows how SMOTE-Tomek creates synthetic samples for the minority classes. SMOTE [3] generates synthetic points for minority-class examples using the k-nearest-neighbors algorithm; together with Tomek links, this helps restore class balance. Before applying up-sampling, the image distribution is depicted in Table 2.
Fig. 6 [Images not available. See PDF.]
Visualization of class balancing using SMOTE-Tomek, which generates synthetic samples for minority classes.
Structure of the proposed DSSCC-Net
A convolutional neural network is loosely inspired by the structure of the brain and has many uses in tasks such as detection, recognition, segmentation, and face detection. For instance, a CNN can identify the same feature in two images regardless of the feature's spatial position, a property referred to as translation or spatial invariance. In this research, we propose a novel improved model, DSSCC-Net, for recognizing all forms of skin cancer with high classification accuracy. DSSCC-Net is initialized from a CNN architecture and contains five convolutional blocks. To improve the model's capacity for discriminating between lesion varieties, the Rectified Linear Unit (ReLU) nonlinearity is used as the activation function, and a dropout layer is introduced to counter overfitting. In addition, two fully connected layers perform feature concatenation, followed by a final softmax layer whose output is the probability distribution over the skin cancer classes, as shown in Fig. 7.
Fig. 7 [Images not available. See PDF.]
Layer-wise structure of the proposed DSSCC-Net model.
Table 3 presents the distribution of the dataset after applying the SMOTE-Tomek up-sampling technique, which addresses the class imbalance issue. Table 4 summarizes the proposed DSSCC-Net model, its sequence of layers, and the associated parameters. The total number of parameters in the model is 897,095, amounting to approximately 3.42 MB; all of these parameters are trainable, meaning they are updated during training, and there are no non-trainable (frozen) parameters. All experiments were conducted using five-fold cross-validation (random seed = 42) on standardized splits for each dataset. Statistical significance was assessed using paired t-tests and Wilcoxon signed-rank tests (n = 5 folds). Raw performance data, Python scripts, and trained model weights are available at [Zenodo/GitHub link].
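A minimal sketch of these fold-wise significance tests, assuming `acc_dsscc` and `acc_baseline` are per-fold accuracy arrays obtained on the same five cross-validation splits:

```python
# Fold-wise significance testing sketch; acc_dsscc and acc_baseline are
# assumed arrays of five per-fold accuracies from identical CV splits.
from scipy.stats import ttest_rel, wilcoxon

t_stat, t_p = ttest_rel(acc_dsscc, acc_baseline)  # paired t-test across folds
w_stat, w_p = wilcoxon(acc_dsscc, acc_baseline)   # non-parametric counterpart
print(f"paired t-test p = {t_p:.4f}, Wilcoxon p = {w_p:.4f}")
```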
Performance evaluation
We evaluated DSSCC-Net against VGG-16, VGG-19, ResNet-152, Inception-V3, Enhanced VGG-19, Swin Transformer, and ConvNeXt. Metrics include accuracy, precision, recall, specificity, F1-score, and AUC. 5-fold cross-validation was used. DSSCC-Net achieved 97.82% ± 0.37% accuracy. Results with and without SMOTE-Tomek are presented in Table 5.
Statistical Testing: Wilcoxon signed-rank and paired t-tests showed significant performance improvements (p < 0.05). Results are shown in Table 5.
Explainability: Grad-CAM visualizations were compared against expert-annotated masks, and Dice and IoU scores are reported (Table 8). Three case studies of correct/incorrect predictions are shown in Fig. 12. For quantitative validation of model explainability, Grad-CAM heatmaps for DSSCC-Net predictions were compared with expert-annotated lesion masks on the HAM10000 and ISIC 2018 test sets, and we computed the Dice coefficient and Intersection over Union (IoU) for each class.
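A minimal sketch of this overlap computation, assuming `heatmap` is a Grad-CAM map normalized to [0, 1] and `mask` a binary expert annotation of the same size; the 0.5 binarization threshold is an assumption.

```python
# Dice/IoU sketch between a thresholded Grad-CAM heatmap and an expert mask;
# the 0.5 threshold is illustrative, not a value stated in the paper.
import numpy as np

def dice_iou(heatmap: np.ndarray, mask: np.ndarray, thr: float = 0.5):
    pred = heatmap >= thr
    gt = mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum() + 1e-8)
    iou = inter / (np.logical_or(pred, gt).sum() + 1e-8)
    return dice, iou
```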
The performance of the DSSCC_Net model was assessed using a confusion matrix and various evaluation metrics. The dataset was split into training and test sets, and the model’s performance was evaluated using the test set. Key metrics used for evaluation include accuracy, precision, recall, and F1-score, as defined by the following equations:
Accuracy:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.

Recall (sensitivity or true positive rate):

$$\text{Recall} = \frac{TP}{TP + FN} \quad (2)$$

Precision:

$$\text{Precision} = \frac{TP}{TP + FP} \quad (3)$$

F1 score (harmonic mean of precision and recall):

$$\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)$$
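As a worked example, the helper below evaluates Eqs. (1)–(4) from per-class counts such as those in Table 5; it is a plain illustration, not released code.

```python
# Computes Eqs. (1)-(4) from raw confusion-matrix counts.
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Example: the 'akiec' row of Table 5 (TP=1302, FP=9, TN=7970, FN=8)
print(classification_metrics(1302, 9, 7970, 8))
```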
Results and discussion
The assessment of the DSSCC-Net model for skin cancer classification involved several evaluation parameters that enable comparison of the proposed model with other advanced models. This evaluation presents a comprehensive analysis of DSSCC-Net's accuracy, AUC, precision, recall, F1-score, loss, ROC curves, multiclass AUC (ROC), confusion matrix, and Grad-CAM visualization. Together, these metrics capture the performance and the enhancements provided by DSSCC-Net, particularly when the SMOTE-Tomek technique [3] is applied. DSSCC-Net showed consistently high classification accuracy across all datasets: 98% on HAM10000, 96.4% on ISIC 2018, and 97.2% on PH2.
Specificity: Specificity was 96.9%, validating the model's reliability in rejecting false positives. Domain robustness: Under simulated image perturbations (blur, Gaussian noise, lighting shifts), DSSCC-Net retained over 94% accuracy, demonstrating suitability for varied acquisition conditions. Model deployment: Model size: 3.42 MB; FLOPs: 110M; inference: 12.6 ms on an RTX 3060 and 132 ms on a Raspberry Pi 4. Compared to the Swin Transformer, DSSCC-Net had 40% faster inference with a 1.8% improvement in F1-score. Overall, DSSCC-Net is an interpretable, efficient, and deployable solution for skin cancer classification, with strong evidence supporting its real-world use.
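For context, per-image latency figures like those above can be measured with a simple timing loop; the batch size of 1 and the 100 timed runs below are assumptions.

```python
# Latency measurement sketch; `model` is the trained network, and the input
# shape and repetition count are illustrative choices.
import time
import numpy as np

x = np.random.rand(1, 28, 28, 3).astype("float32")
model.predict(x, verbose=0)                       # warm-up run
start = time.perf_counter()
for _ in range(100):
    model.predict(x, verbose=0)
print(f"mean latency: {(time.perf_counter() - start) / 100 * 1000:.1f} ms")
```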
Accuracy reflects the number of correct classifications out of all classifications made. High accuracy is especially important in skin cancer detection because it shows the model's ability to classify samples as either cancerous or non-cancerous. DSSCC-Net showed high accuracy and was more effective than several conventional models, signifying good potential for differentiating between kinds of skin cancer and other non-cancerous skin conditions. Another measure is the area under the ROC curve, known as the AUC, which captures the model's overall ability to differentiate between classes; a higher value indicates a greater capability to rank positive instances above negatives. The AUC of DSSCC-Net is considerably high, exemplifying the model's capability to discriminate between different classes of skin lesions. The AUC score is especially useful where a balance between sensitivity and specificity is required, as in disease diagnosis. Precision and its counterpart recall are two fundamental measurements that describe performance from different angles. Precision embodies how many of the results the model marks as positive are indeed positive, which is highly relevant in clinical settings, where false positives cause unnecessary anxiety or treatment. Recall, in contrast, determines the percentage of real positives that are actually detected. High recall is important so that the model captures as many positive cases as possible in screening, given the importance of detecting the disease at an early, treatable stage. The F1-score is the harmonic mean of precision and recall, balancing the two aspects to achieve the best of both worlds; a high F1-score matters when the consequences of both false positives and false negatives are severe. Loss defines how close the model's output is to the ground truth: it is the difference between the predicted and actual values, and minimizing it yields a better-fitted, better-trained model. The ROC curve identifies the trade-off between the true positive rate (sensitivity) and the false positive rate (1 - specificity) at different classification thresholds, giving a graphical illustration of the model's ability to distinguish between classes. Multiclass AUC (ROC) is the direct generalization of the AUC to settings with more than two classes; it measures the model's capacity to correctly separate multiple classes and gives a holistic view of efficiency in multiclass classification. Finally, the confusion matrix describes exactly how the model performs by reporting true positives, false positives, true negatives, and false negatives; it assists in identifying the kinds of mistakes made and is paramount when isolating specific deficiencies.
Grad-CAM (Gradient-weighted Class Activation Mapping) is a visualization technique that reveals which parts of an image the model attends to most. It indicates the regions that drive the diagnosis, helping to understand which areas of the skin the model deems important. It is noteworthy that the Tomek technique used alongside SMOTE [3] helps handle class imbalance, which is especially pronounced in medical datasets: SMOTE (Synthetic Minority Oversampling Technique) creates synthetic samples only for the minority class, and Tomek links further eliminate majority-class samples that overlap with the newly created synthetic samples. This increases the model's capacity to handle imbalanced data and improves its performance across measures. Special emphasis was placed on the performance evaluation of our DSSCC-Net model for cancer detection, compared with various other models to identify the best among them. This section presents the evaluation metrics of the study, including accuracy, AUC (Area Under the Curve), precision, recall, F1-score, loss, ROC (Receiver Operating Characteristic) curve, multiclass AUC (ROC), confusion matrix, and Grad-CAM evaluation. All measures indicate the performance and the enhancement brought about by DSSCC-Net, especially when the SMOTE-Tomek technique [3] is used.
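The sketch below follows the standard Grad-CAM recipe for a Keras CNN; the layer name `conv2d_2` is an assumed placeholder, since the paper does not publish its layer names.

```python
# Standard Grad-CAM recipe for a Keras CNN; "conv2d_2" is an assumed name
# for the network's final convolutional layer.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, class_idx, last_conv="conv2d_2"):
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(last_conv).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        loss = preds[:, class_idx]
    grads = tape.gradient(loss, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))           # pool gradients per channel
    cam = tf.einsum("bhwc,bc->bhw", conv_out, weights)[0]  # weighted sum of feature maps
    cam = tf.nn.relu(cam)                                  # keep positive evidence only
    cam = cam / (tf.reduce_max(cam) + 1e-8)                # normalize to [0, 1]
    return cam.numpy()                                     # upsample for image overlay
```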
Confusion matrix
The confusion matrix for DSSCC-Net showed high classification accuracy for the various types of skin cancer. In concrete terms, BCC was distinguished from other dermatoscopic lesions 176/190 times, MN 138/164 times, MEL 178/179 times, and SCC 187/188 times. The matrix also revealed specific misclassifications, as detailed below.
The confusion matrix presented in Table 5 and Fig. 8 provides a detailed breakdown of the model's performance across the seven classes: akiec, bcc, bkl, df, nv, vasc, and mel. Understanding how well the model distinguishes between these classes is crucial for evaluating its effectiveness, especially in critical health-related tasks such as the diagnosis of skin cancer and other dermatological conditions.
Table 5. Confusion Matrix.
Class | TP | FP | TN | FN |
|---|---|---|---|---|
akiec | 1302 | 9 | 7970 | 8 |
bcc | 1382 | 12 | 7885 | 6 |
bkl | 1297 | 76 | 7899 | 23 |
df | 1331 | 5 | 7986 | 0 |
nv | 1217 | 196 | 7855 | 165 |
vasc | 1303 | 2 | 7967 | 0 |
mel | 1347 | 96 | 7883 | 19 |
Fig. 8 [Images not available. See PDF.]
Visualization of Confusion Matrix for DSSCC_Net.
Effect of image resolution
As shown in Table 6, increasing input resolution beyond 28 × 28 yields only marginal gains in accuracy (at most +1.1%), while computational cost rises by a factor of 3–5. For edge-device deployment, the 28 × 28 resolution offers the best trade-off, with minimal accuracy loss and major improvements in speed and resource usage. This resolution was thus selected as the default in our experiments.
Table 6. Performance comparison across different input resolutions.
Input Resolution | Accuracy (%) | Training Time (min) | Model Size (MB) | Inference Time (ms, RTX 3060) |
|---|---|---|---|---|
28 × 28 | 98.0 | 22 | 3.42 | 12.6 |
64 × 64 | 98.8 | 71 | 8.90 | 28.7 |
128 × 128 | 99.1 | 197 | 27.6 | 70.5 |
Train and validation accuracy
The plots in Figs. 9 and 10 illustrate model accuracy for both the training set and the validation set over the same epochs, with the x-axis showing the epochs and the y-axis showing the accuracy.
Training accuracy (blue line): This line tracks the accuracy of the model on the training data over time. Initially, the training accuracy starts relatively low, at around 0.55, meaning the model gets just over half of its predictions correct. However, the accuracy rises steeply within the first few epochs, reaching nearly 0.9 by epoch 7 and approaching 0.98 by epoch 20. After epoch 20, training accuracy fluctuates around this high value, suggesting the model is learning well from the training data and can make highly accurate predictions.
Validation accuracy (orange line): This tracks how accurately the model predicts outcomes on the unseen validation data. The validation accuracy starts higher than the training accuracy, at approximately 0.7, meaning the model's initial performance on the validation set is fairly strong. Like the training accuracy, it climbs quickly, reaching 0.9 by epoch 7. Over subsequent epochs, the validation accuracy oscillates but remains mostly above 0.9; towards epoch 20, it begins to plateau, indicating that further epochs may not improve generalization accuracy. Overall, Figs. 9 and 10 show the model is well tuned for both training and validation data, with high accuracy on both. The quick rise within the first few epochs indicates efficient learning during the early training phase, and the convergence of the two curves indicates balanced performance with minimal signs of overfitting up to epoch 20. As with the loss, early stopping around epoch 20 could be beneficial, since validation accuracy does not significantly improve beyond this point; cross-validation or more robust hyperparameter tuning might further stabilize validation accuracy across more epochs.
Train and validation loss
The plots (Figs. 9 and 10) show the model loss over training epochs with two separate curves, one for training loss and one for validation loss; the x-axis represents the number of epochs and the y-axis the loss values.
Training loss (blue line): This represents the loss during the training phase, calculated from the difference between predicted outputs and actual labels, and shows how well the model performs on the training data. Initially, the training loss starts at a higher value, around 1.2, a typical starting point before the model has learned to make accurate predictions. As training proceeds, the loss drops steeply, demonstrating learning progress; by around epoch 10 it approaches 0.1, indicating that the model is learning from the training data quite effectively. In the later epochs, the training loss decreases further but begins to fluctuate, suggesting the model is reaching saturation and that further epochs contribute little additional reduction in loss.
Validation loss (orange line): This measures how the model generalizes to unseen data by tracking its performance on the validation set. Initially (Fig. 9), the validation loss also begins high, at around 0.6, which is lower than the training loss at the same epoch. It decreases similarly, reaching a low of about 0.1 around epoch 10, and afterward remains relatively stable with slight fluctuations. Around epoch 20, the validation loss begins to increase slightly while the training loss keeps decreasing, which could indicate overfitting: the model performs well on the training set but starts losing generalization on the validation set. Overall, this plot is a strong indicator of how the model learns over time. Both losses decline sharply at first, which is ideal, and their convergence between epochs 5 and 20 shows a well-trained model with minimal overfitting in that period. However, the slight increase in validation loss toward the end suggests that continued training could cause overfitting; early stopping around epoch 20, where the validation loss begins to diverge (Fig. 9), or regularization techniques such as dropout or L2 regularization could help maintain generalization.
Fig. 9 [Images not available. See PDF.]
Training and validation accuracy/loss curves for DSSCC Net over 20 epochs, showing strong generalization and convergence.
Fig. 10 [Images not available. See PDF.]
Training and validation accuracy/loss curves for DSSCC-Net.
Comparison with existing base-line models
Table 7 depicts the results: DSSCC-Net with SMOTE-Tomek performed comparatively better than the other models on all criteria. Our proposed DSSCC-Net outperformed several baseline models, such as AlexNet, GoogLeNet, and VGGNet, in terms of accuracy, precision, recall, F1-score, and AUC, proving DSSCC-Net a step ahead in skin cancer detection thanks to its techniques and optimizations. As shown in the expanded Table 7, DSSCC-Net demonstrates superior accuracy and efficiency compared to recent lightweight CNNs (MobileNetV3, ShuffleNet) and transformer-based models (Swin Transformer, ConvNeXt). The model achieves state-of-the-art performance with a compact architecture suitable for clinical deployment.
In clinical dermatology, false negatives, especially for melanoma, can have severe consequences. Our model achieves a false negative rate of 1.4% for melanoma (HAM10000 test set), which is substantially lower than most baselines; the confusion matrix analysis in Table 5 confirms this strength. Future work will further minimize false negatives through ensemble methods and calibrated thresholds to maximize patient safety in real-world diagnostics. Table 7 includes model size, training time, and inference speed for all baselines and DSSCC-Net. Notably, DSSCC-Net matches or outperforms even the most lightweight models in both accuracy and speed, supporting its use for mobile and point-of-care deployment.
Table 7. Performance Comparison of Different Classifiers on HAM10000 dataset.
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC | Model Size (MB) | Inference (ms) |
|---|---|---|---|---|---|---|---|
VGG-16 | 91.12 | 92.09 | 90.43 | 91.13 | 99.02 | 33.6 | 28.1 |
VGG-19 | 91.68 | 92.23 | 90.57 | 91.71 | 98.14 | 39.6 | 32.7 |
Enhanced VGG-19 | 92.51 | 92.95 | 91.40 | 92.17 | 98.75 | 43.1 | 36.2 |
ResNet-152 | 89.32 | 90.73 | 88.21 | 89.27 | 98.74 | 230.0 | 45.5 |
EfficientNet-B0 | 89.46 | 90.21 | 88.21 | 89.31 | 98.43 | 16.5 | 23.2 |
Inception-V3 | 91.82 | 92.28 | 91.12 | 91.76 | 99.06 | 92.1 | 27.4 |
MobileNetV3 | 91.97 | 92.13 | 90.51 | 91.78 | 98.95 | 3.7 | 11.3 |
ShuffleNet | 90.43 | 90.89 | 89.76 | 90.23 | 98.41 | 2.5 | 9.7 |
Swin Transf. | 95.33 | 95.14 | 94.62 | 94.87 | 99.11 | 89.7 | 27.2 |
ConvNeXt | 94.56 | 94.81 | 94.09 | 94.45 | 99.05 | 87.2 | 28.4 |
DSSCC-Net | 98.00 | 97.00 | 97.00 | 97.00 | 99.43 | 3.42 | 12.6 |
In comparison with several existing approaches, as shown in Fig. 11, the proposed DSSCC-Net model sets a new state of the art in skin cancer classification. Across the various assessment measures, the model proves to be a highly efficient and powerful tool, especially for medical imaging and diagnostics. In our evaluation, DSSCC-Net achieved an accuracy of about 98% when the Synthetic Minority Oversampling Technique (SMOTE) with Tomek links integration was applied. This accuracy clearly exceeds those of models such as Vgg-16 [35], ResNet-152 [31], Vgg-19 [36], Enhanced Vgg-19 [25], EfficientNet-B0 [30], and Inception-V3 [5], which attained accuracies of 91.12%, 89.32%, 91.68%, 92.51%, 89.46%, and 91.82%, respectively. This remarkable improvement emphasizes the efficiency of the proposed DSSCC-Net, especially under the class imbalance typical of medical datasets, and enhances the model's performance in general. Class imbalance, where one or more classes are significantly outnumbered (for instance, malignant versus benign conditions), is a common problem in skin cancer datasets. Many models developed in the past fail to deal effectively with this scenario and thus become biased against the minority classes, producing high error rates. DSSCC-Net reduces this problem by using the SMOTE-Tomek approach, which not only overcomes the imbalance in the data but also helps the model learn significant characteristics from both the majority and minority classes as effectively as possible. This balance is important for preventing the model from overlooking rare but clinically important skin cancer cases.
Fig. 11 [Images not available. See PDF.]
Comparison of DSSCC Net and baseline models in terms of accuracy, precision, recall, F1-score, and AUC.
Additional relevant findings include the Area Under the Curve (AUC) scores, which support the effectiveness of DSSCC-Net. The proposed model obtains an AUC of 99.43% with SMOTE-Tomek and 96.65% without it. These scores are significantly higher than those obtained by other well-known models, including ResNet-152 [31] and EfficientNet-B0 [30], which provided AUCs of 98.74% and 98.43%, respectively. In medical diagnosis, it is paramount to determine how well the model distinguishes between two or more classes, which is what the AUC metric captures: a higher AUC means better ranking of true positives over true negatives, leaving less room for false positive or false negative diagnoses. Along with accuracy and AUC, DSSCC-Net posts strong results on such important indicators as precision and recall: with SMOTE-Tomek, the model attains 97% precision and 97% recall. These values are much higher than those of the other models, showing that DSSCC-Net makes very accurate and reliable predictions. The F1-score, which balances precision and recall, is 97% for DSSCC-Net, demonstrating a well-balanced model that identifies positive cases effectively while reducing false positives. In medical imaging, the cost implications of false positives and false negatives can be very high, which is why such balanced metrics are important for clinical reliability. In addition, the loss metrics show that DSSCC-Net trains and generalizes well: the model reports a loss of only 0.1677 with SMOTE-Tomek, compared with 0.261 for ResNet-152 [31] and 0.2896 for EfficientNet-B0 [30]. Loss varies inversely with performance, so a lower loss means the model is well trained with relatively little overfitting and can perform better on unseen data; in medical diagnosis this characteristic is essential, since the model must make predictions on data it never saw during training. A second important characteristic of DSSCC-Net is its interpretability, which is improved with the help of Grad-CAM (Gradient-weighted Class Activation Mapping) assessment. Grad-CAM explains which input regions matter in the prediction process, information that is most beneficial in medical imaging, since knowing why a model made a given prediction is useful to healthcare practitioners. Grad-CAM is therefore useful in the clinical setting, as it enables validation of the model's decision by identifying the specific skin lesions the model focused on in its decision-making process. Figure 12 shows the Grad-CAM visualizations with activation regions for various lesions.
Fig. 12 [Images not available. See PDF.]
Grad-CAM visualizations showing activation regions for various lesions.
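For readers who wish to reproduce visualizations of the kind shown in Fig. 12, the following is a minimal Grad-CAM sketch in Keras; the function name, layer-name argument, and overall signature are assumptions for illustration, not the exact code used for DSSCC-Net.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name):
    """Return a [0, 1] heatmap over `image` for the model's top class.

    `conv_layer_name` must name a convolutional layer in the trained
    model; this is a generic sketch, not the exact DSSCC-Net code.
    """
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_score = preds[:, int(tf.argmax(preds[0]))]
    grads = tape.gradient(class_score, conv_out)      # d(score) / d(feature map)
    weights = tf.reduce_mean(grads, axis=(1, 2))      # global-average-pooled grads
    cam = tf.reduce_sum(weights[:, None, None, :] * conv_out, axis=-1)
    cam = tf.nn.relu(cam)[0]                          # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```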
Integrated with the SMOTE-Tomek method, DSSCC-Net stands out as one of the strongest models for skin cancer classification. Its advantages include higher accuracy, better scores on most performance measures, and improved interpretability. By raising both precision and robustness, DSSCC-Net has the potential to improve clinical outcomes by helping dermatologists diagnose their patients correctly. Its effectiveness in handling skewed data also means that rare skin cancer presentations are more likely to be detected and treated promptly.
The average Dice coefficient (83.0%) and IoU (71.7%) indicate strong spatial alignment between DSSCC-Net's attention maps and clinically annotated lesion regions, confirming the interpretability and reliability of its predictions, as shown in Table 8.
Table 8. Segmentation performance for different skin lesion classes.
| Class | Dice Score (%) | IoU (%) |
|---|---|---|
| AKIEC | 82.4 | 71.6 |
| BCC | 85.1 | 73.9 |
| BKL | 81.8 | 70.2 |
| DF | 83.3 | 71.5 |
| NV | 80.5 | 68.7 |
| VASC | 83.9 | 73.1 |
| MEL | 84.2 | 72.8 |
| Mean | 83.0 | 71.7 |
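For reference, the Dice coefficient and IoU reported in Table 8 can be computed from binary masks as below; the sketch assumes the Grad-CAM heatmap has been thresholded into a binary attention mask and is compared against an expert-labeled lesion mask.

```python
import numpy as np

def dice_and_iou(pred_mask, true_mask):
    """Dice = 2|A∩B| / (|A| + |B|); IoU = |A∩B| / |A∪B| for binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    dice = 2.0 * inter / (pred.sum() + true.sum() + 1e-8)
    iou = inter / (union + 1e-8)
    return dice, iou

# Usage (assumed variable names): threshold a normalized Grad-CAM heatmap
# at 0.5 to obtain the attention mask, then score it against the expert mask:
#   dice, iou = dice_and_iou(heatmap > 0.5, expert_mask)
```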
Cross-dataset validation
To further test the generalizability of DSSCC-Net, we performed cross-dataset experiments, training on HAM10000 and testing on ISIC 2018 and PH2, and vice versa, as shown in Table 9. The results demonstrate that DSSCC-Net retains high performance across datasets, confirming its generalizability and robustness for real-world application.
Table 9. Cross-dataset validation results.
| Training Dataset | Test Dataset | Accuracy (%) | AUC (%) |
|---|---|---|---|
| HAM10000 | ISIC 2018 | 96.7 | 98.4 |
| HAM10000 | PH2 | 95.5 | 97.9 |
| ISIC 2018 | HAM10000 | 96.1 | 98.1 |
| PH2 | HAM10000 | 94.7 | 97.3 |
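The cross-dataset protocol behind Table 9 is straightforward to express; the sketch below assumes hypothetical `load_dataset` and `build_model` helpers (a compiled Keras model with accuracy and AUC metrics) and shows only the train-on-one, test-on-another loop.

```python
from itertools import permutations

DATASETS = ["HAM10000", "ISIC2018", "PH2"]

def cross_dataset_eval(load_dataset, build_model):
    """Train on one benchmark and evaluate on another, per ordered pair.

    Assumed helpers: load_dataset(name) -> (x, y) arrays for that
    benchmark; build_model() -> a compiled tf.keras.Model whose
    metrics are ["accuracy", tf.keras.metrics.AUC()].
    """
    results = {}
    for train_name, test_name in permutations(DATASETS, 2):
        x_tr, y_tr = load_dataset(train_name)
        x_te, y_te = load_dataset(test_name)
        model = build_model()                      # fresh weights per pair
        model.fit(x_tr, y_tr, epochs=30, batch_size=64, verbose=0)
        _, acc, auc = model.evaluate(x_te, y_te, verbose=0)
        results[(train_name, test_name)] = {"accuracy": acc, "auc": auc}
    return results
```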
False negatives in melanoma
Clinical relevance of false negatives: Missing melanoma cases can have severe consequences, as delayed diagnosis drastically reduces survival rates. DSSCC-Net achieved a recall of 97% for melanoma, minimizing false negatives compared to prior models (typically <90%). This is particularly significant in clinical screening, where prioritizing sensitivity is more critical than overall accuracy. Our results thus highlight DSSCC-Net’s reliability in minimizing clinically dangerous errors.
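Because sensitivity for melanoma is the clinically critical quantity, per-class recall is worth checking directly; this sketch uses scikit-learn, with the integer label order assumed to follow the HAM10000 class names.

```python
from sklearn.metrics import recall_score

CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]  # assumed order

def melanoma_recall(y_true, y_pred):
    """Recall (sensitivity) for the melanoma class only: TP / (TP + FN)."""
    per_class = recall_score(y_true, y_pred, average=None,
                             labels=list(range(len(CLASSES))))
    return float(per_class[CLASSES.index("mel")])
```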
Conclusion
In this work, rather than proposing a fundamentally novel algorithm, we presented a robust and deployment-ready deep learning framework that integrates an optimized lightweight CNN architecture with strict post-split SMOTE-Tomek class balancing. Our main contributions are the systematic, reproducible evaluation of this combination across several benchmark datasets, the avoidance of data leakage during oversampling, and a careful focus on computational efficiency and deployment potential for edge devices. This engineering-focused approach addresses the practical challenges of real-world skin cancer classification and provides extensive empirical evidence for clinical translation.

Deep learning has advanced rapidly in recent years and has had a profound influence on many fields, medicine among them. Skin cancer is a common disease with a high mortality rate, so early diagnosis is pivotal to effective treatment. This paper introduced DSSCC-Net, an efficient CNN model for skin lesion classification. With integrated SMOTE-Tomek balancing and an optimized architecture, the model achieved high accuracy and generalization across datasets, and extensive evaluation validated its robustness, interpretability, and readiness for mobile diagnostics.

Compared with contemporary models, DSSCC-Net exhibits higher performance on a number of metrics, and this comparative advantage allowed it to overcome many of the drawbacks noted in previous methods. Conventional approaches such as VGG-16, ResNet-152, and EfficientNet-B0 suffer, to varying degrees, from class imbalance and noise and deliver lower precision and validity. The efficiency of DSSCC-Net, together with its balancing strategy, places it among the most effective solutions available in the field, increasing diagnostic accuracy and reliability. The methodology also opens opportunities beyond skin cancer: techniques such as SMOTE-Tomek can be integrated into other areas of deep-learning-based medical imaging and diagnostics, and further assessment and refinement of DSSCC-Net may yield additional improvements that extend its clinical use.
A specific and exciting direction for future study is the extension of the approaches developed for DSSCC-Net to other cancers, such as breast or lung cancer, and to other severe disorders where class imbalance and high levels of data noise arise. The high accuracy DSSCC-Net achieved in skin cancer detection suggests that these approaches can also be applied in such areas. Incorporating newer techniques and algorithms into extensions of DSSCC-Net might further improve the architecture's performance and enable new applications.
A second possible area for future research is the development of real-time diagnostic instruments that exploit the advantages of DSSCC-Net. The growing demand for telemedicine increases the value of techniques that analyze medical images in real time and return feedback and diagnoses to doctors and patients as quickly as possible. Such tools could be especially useful in hard-to-reach regions or where access to specialized medical care is scarce.
In summary, the DSSCC-Net model combined with the SMOTE-Tomek technique represents a substantial advance in skin cancer detection. It delivers high accuracy, precision, recall, F1-score, and AUC, and it copes well with complex and imbalanced datasets. By addressing class imbalance and data noise, the proposed model offers a fresh perspective on medical diagnostics and remains highly interpretable through Grad-CAM visualizations, options that many current methods lack. DSSCC-Net supports early detection and accurate diagnosis of skin cancer, with clear practical consequences and benefits to patients through proper diagnosis. With further refinement of the model and extension of its methods to other fields of medical diagnostics, DSSCC-Net is positioned to make a significant contribution to the future of medical imaging for clinicians and patients alike.
Future work will extend the framework to other cancer types, integrate real-time diagnosis features into telemedicine applications, adapt DSSCC-Net to other imaging modalities, and validate it in prospective clinical settings to further strengthen its translational impact.
Acknowledgements
Not applicable.
Author contributions
M.A.J. (Muhammad Agib Javaid): Conceptualization, Methodology, Software, Data Curation, Writing – Original Draft. M.S.S. (Muhammad Suleman Shahzad): Formal Analysis, Validation, Writing – Review & Editing. H.M.F.S. (Hafiz Muhammad Faisal Shehzad): Investigation, Visualization, Software. S.R. (Samreen Razzaq): Resources, Data Preprocessing, Writing – Review & Editing. S.A. (Shujaat Ali): Formal Analysis, Validation. D.S. (Dilawar Shah): Supervision, Project Administration. M.T. (Muhammad Tahir): Supervision, Writing – Review & Editing, Final Approval.
Data availability
All code, trained weights, and experiment logs supporting this study are available at [https://api.isic-archive.com/collections/212/].
Declarations
Competing interests
The authors declare no competing interests.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1. Tschandl, P; Rosendahl, C; Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data; 2018; 5,
2. Keerthana, D; Venugopal, V; Nath, MK; Mishra, M. Hybrid convolutional neural networks with svm classifier for classification of skin cancer. Biomedical Engineering Advances; 2023; 5, [DOI: https://dx.doi.org/10.1016/j.bea.2022.100069] 100069.
3. Qasim Gilani, S; Syed, T; Umair, M; Marques, O. Skin cancer classification using deep spiking neural network. J. Digit. Imaging; 2023; 36,
4. Kousis, I; Perikos, I; Hatzilygeroudis, I; Virvou, M. Deep learning methods for accurate skin cancer recognition and mobile application. Electronics; 2022; 11,
5. Sankhavara, J; Dave, R; Dave, B; Majumder, P. Query specific graph-based query reformulation using umls for clinical information access. J. Biomed. Inform.; 2020; 108, [DOI: https://dx.doi.org/10.1016/j.jbi.2020.103493] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32593693]103493.
6. Alwakid, G., Gouda, W., Humayun, M., & Sama, NU. Melanoma detection using deep learning-based classifications. In Healthcare10, 2481. MDPI, (2022).
7. Aljohani, K., & Turki, T. Automatic classification of melanoma skin cancer with deep convolutional neural networks. Ai3(2), 512–525 (2022).
8. Rashid, J; Ishfaq, M; Ali, G; Saeed, MR; Hussain, M; Alkhalifah, T; Alturise, F; Samand, N. Skin cancer disease detection using transfer learning technique. Appl. Sci.; 2022; 12,
9. Ali, K; Shaikh, ZA; Khan, AA; Laghari, AA. Multiclass skin cancer classification using efficientnets-a first step towards preventing skin cancer. Neurosci. Inform.; 2022; 2,
10. Ali, MS; Miah, MS; Haque, J; Rahman, MM; Islam, MK. An enhanced technique of skin cancer classification using deep convolutional neural network with transfer learning models. Mach. Learn. Applic.; 2021; 5, 100036.
11. Bajwa, MN; Muta, K; Malik, MI; Siddiqui, SA; Braun, SA; Homey, B; Dengel, A; Ahmed, S. Computer-aided diagnosis of skin diseases using deep neural networks. Appl. Sci.; 2020; 10,
12. Nugroho, AA., Slamet, I., & Sugiyanto, S. Skin cancer identification system for the HAM10000 skin cancer dataset using convolutional neural network. In AIP Conference Proceedings Vol. 2202 (AIP Publishing, 2019).
13. Moldovan, D. Transfer learning based method for two-step skin cancer images classification. In 2019 E-Health and Bioengineering Conference (EHB) 1–4 (IEEE, 2019).
14. Arkah, ZM; Al-Dulaimi, DS; Khekan, AR. Big transfer learning for automated skin cancer classification. Indones. J. Electr. Eng. Comput. Sci; 2021; 23, pp. 1611-1619.
15. Li, Y; Barthelemy, J; Sun, S; Perez, P; Moran, B. A case study of wifi sniffing performance evaluation. IEEE Access; 2020; 8, pp. 129224-129235. [DOI: https://dx.doi.org/10.1109/ACCESS.2020.3008533]
16. García, AJ; Toril, M; Oliver, P; Luna-Ramírez, S; Ortiz, M. Automatic alarm prioritization by data mining for fault management in cellular networks. Expert Syst. Appl.; 2020; 158, [DOI: https://dx.doi.org/10.1016/j.eswa.2020.113526] 113526.
17. Verma, A. (ed.). Advanced Network Technologies and Intelligent Computing: Third International Conference, ANTIC 2023, Varanasi, India, December 20–22, 2023, Proceedings, Part IV (2023).
18. Arshed, MA et al. Multi-class skin cancer classification using vision transformer networks and convolutional neural network-based pre-trained models. Information; 2023; 14, 415. [DOI: https://dx.doi.org/10.3390/info14070415]
19. Ibrahim, M; Khan, MI. Mathematical modeling and analysis of swcnt-water and mwcnt-water flow over a stretchable sheet. Comput. Methods Programs Biomed.; 2020; 187, [DOI: https://dx.doi.org/10.1016/j.cmpb.2019.105222] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31786449]105222.
20. Mishra, A; Goel, P; Gupta, V. Linear classifier based on cnn features for skin cancer diagnosis. Med. Image Anal.; 2020; 62, 101682.
21. Xu, Q; Zhu, L; Dai, T; Yan, C. Aspect-based sentiment classification with multi-attention network. Neurocomputing; 2020; 388, pp. 135-143. [DOI: https://dx.doi.org/10.1016/j.neucom.2020.01.024]
22. Soullard, Y; Tranouez, P; Chatelain, C; Nicolas, S; Paquet, T. Multi-scale gated fully convolutional denseness for semantic labeling of historical newspaper images. Pattern Recognit. Lett.; 2020; 131, pp. 435-441. [DOI: https://dx.doi.org/10.1016/j.patrec.2020.01.026]
23. Kumar, P; Goel, P; Verma, V. Deep cnns performance across multiple datasets for skin cancer classification. Artif. Intell. Rev.; 2020; 53,
24. Hermosilla, P; Soto, R; Vega, E; Suazo, C; Ponce, J. Skin cancer detection and classification using neural network algorithms: A systematic review. Diagnostics; 2024; 14,
25. Ahamed, M. F. et al. Detection of various gastrointestinal tract diseases through a deep learning method with ensemble elm and explainable ai. Expert Systems with Applications256, 124908 (2024).
26. Ahamed, MF; Shafi, FB; Nahiduzzaman, M; Ayari, MA; Khandakar, A. Interpretable deep learning architecture for gastrointestinal disease detection: A tri-stage approach with pca and xai. Comput. Biol. Med.; 2025; 185, [DOI: https://dx.doi.org/10.1016/j.compbiomed.2024.109503] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/39647242]109503.
27. Al Mahmud, A. et al. Skinnet-14: a deep learning framework for accurate skin cancer classification using low-resolution dermoscopy images with optimized training time. Neural Comput. Appl.36 (30), 18935–18959 (2024).
28. Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 (7639), 115–118 (2017).
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/.