Abstract
Skin cancer, one of the most serious types of cancer, affects a significant portion of the population. Image analysis has greatly enhanced automatic diagnostic accuracy compared to unaided visual assessment. Machine learning has emerged as a critical technique for automated skin lesion classification; however, its scalability is often constrained by the availability of high-quality annotated training data. This research performs segmentation and classification of skin lesions using novel deep learning techniques. Data samples were obtained from the benchmark HAM10000 and ISIC 2017 datasets, ensuring representativeness and diversity. Pre-processing involved hair removal followed by median filtering of the hair-removed images. Skin lesion segmentation was performed using the U-Net method, and colour features, GLCM texture features, and RGB histogram features were extracted from the segmented images. The final classification phase utilized an MLSTM whose hidden neurons were optimized using STBO, with the objective of maximizing accuracy and precision; the proposed model categorizes skin lesions as normal, benign, or malignant. Comparative analysis demonstrated that the MLSTM-STBO model achieves an accuracy of 97.20%, sensitivity of 97.14%, precision of 97.04%, specificity of 99.48%, F1-score of 97.08%, MCC of 98.12%, TPR of 96.17%, and FPR of 6.20%, outperforming traditional methods by margins of up to 25.21%.
Introduction
Skin cancer is a type of cancer that affects the outermost layer of the skin, and early detection is crucial for improving patient outcomes [1]. In response to the growing incidence of skin cancer, researchers and medical professionals have developed advanced diagnostic techniques to improve the reliability and accuracy of detection [2]. Skin lesions, characterized by abnormal morphology, color, texture, and distribution, are precursors to skin cancer and are typically classified into melanocytic and non-melanocytic categories based on the presence of melanocytes and melanin pigment [3, 4].
A variety of imaging modalities [5] have been employed for diagnosing skin lesions, including dermoscopy, optical coherence tomography (OCT), confocal microscopy, high-frequency ultrasound, and digital photography. Among these, dermoscopy remains the most widely used, offering a non-invasive technique for obtaining magnified and well-illuminated images that allow early detection of skin cancer [6, 7]. While optical coherence tomography and confocal microscopy provide high-resolution imaging for detailed lesion examination, they are often costly and require specialized equipment. Digital photography, on the other hand, serves as a more accessible tool for lesion imaging, though with limited depth resolution [8].
Dermoscopy has significantly enhanced the diagnostic process by reducing the reliance on subjective evaluations of dermatologists. Figure 1 illustrates sample images of skin cancer, demonstrating the visual differences among lesion types. However, even with dermoscopy, manual analysis of biomedical images is time-consuming, prone to operational bias, and requires extensive expertise [9, 10]. These challenges underscore the need for computer-aided diagnosis (CAD) systems to assist dermatologists by providing automated, unbiased, and accurate analyses of dermoscopic images [11, 12].
Owing to its usefulness and distinctive qualities in several complicated fields, such as object identification, detection, categorization, and recognition, deep learning (DL) has been broadly applied. DL describes an ML approach that increases the model's depth (sophistication) and transforms the input through different functions, permitting hierarchical data representation across several degrees of abstraction. Because it employs increasingly sophisticated models, DL can handle increasingly complicated challenges quickly and effectively. The crucial component of popular CAD systems is the DL method, which combines image processing techniques with CNNs [13]. The use of these CAD models by dermatologists and patients, nevertheless, remains debatable, since it is unclear how the feature encoding and model learning processes work. Dermatologists have trouble making sound decisions with the DL paradigm because it lacks a logical justification; at times, the model's predictions are difficult for specialists to comprehend [14]. The decision-making procedures of dermatologists are not necessarily comparable to, or accurately represented in, DL methods. As a result, these methods are frequently seen as having the "black box" character of ML techniques, lacking a precise justification for their results. The absence of model transparency associated with DL methods in skin cancer detection cannot be disregarded at any point in the decision-making cycle [15]. Hence, strong techniques are required to make these black-box judgments comprehensible; such strategies are commonly known as interpretable DL. The sample images shown in Fig. 1 are taken from the HAM10000 and ISIC 2017 benchmark datasets, which are publicly available and widely used in skin lesion analysis research. These images illustrate the visual diversity across different lesion types, including normal, benign, and malignant categories.
[See PDF for image]
Fig. 1
Sample skin cancer images
The remainder of the paper is organized as follows: Section I introduces skin lesions and their characteristics. Section II reviews the existing literature. Section III presents the proposed model and the pre-processing steps for the developed skin lesion model. Section IV covers segmentation and feature extraction. Section V describes classification and optimization. Section VI presents the results, and Section VII concludes the paper.
Motivation
The increasing incidence of skin cancer has underscored the urgent need for effective diagnostic tools in dermatology. Traditional evaluation methods depend heavily on subjective visual inspection by dermatologists, which varies significantly between practitioners and can delay diagnosis and treatment; such delays negatively affect patient outcomes. Deep learning has transformed various fields, including medical imaging, by providing powerful image-analysis capabilities. The motivation for this research is to harness deep learning in a robust automated system for skin lesion segmentation and classification. Optimized models can substantially improve accuracy and efficiency, addressing the pressing need for effective diagnostic tools in clinical practice. Advanced optimization techniques can further strengthen model performance, so that the system performs well even with limited amounts of annotated data. This research is about more than improving diagnosis; it also aims to equip healthcare workers with decision-support tools that increase their diagnostic confidence. By promoting early detection and better management of skin lesions, it ultimately seeks to save lives and improve the quality of care for patients at risk of skin cancer.
Contributions of the research work
This research focuses on developing a robust and automated system for skin lesion segmentation and classification using advanced deep learning techniques. The key contributions of this work are as follows:
Data Source and Preprocessing: The data samples were obtained from widely recognized and benchmark datasets, including HAM10000 and ISIC 2017, ensuring diversity, representativeness, and reliability. Preprocessing steps include hair removal using the Dull Razor method and median filtering to enhance image quality and remove noise, ensuring accurate feature extraction and segmentation.
Segmentation Approach: A U-Net-based segmentation method was implemented to accurately delineate the skin lesion regions from the preprocessed images. Performance evaluation parameters for segmentation, such as Dice Coefficient, Jaccard Index, and pixel-wise accuracy, were used to validate the effectiveness of the segmentation model.
Feature Extraction and Classification: Features were extracted from the segmented regions, including color moments (mean, variance, skewness, and standard deviation), GLCM-based texture features, and RGB histogram features. These extracted features were used for classification using Modified Long Short-Term Memory (MLSTM), which optimizes hidden neurons using Sewing Training-Based Optimization (STBO).
Performance Metrics for Classification: The classification model’s performance was comprehensively evaluated using metrics such as accuracy, precision, sensitivity, specificity, F1-score, MCC (Matthews Correlation Coefficient), TPR (True Positive Rate), and FPR (False Positive Rate). The proposed MLSTM-STBO model demonstrated superior performance compared to state-of-the-art methods across all metrics, highlighting its robustness and reliability.
Comparison with Existing Techniques: A detailed comparative analysis was conducted, showing that the proposed MLSTM-STBO model outperforms traditional methods, including NN, CNN, GAN, and standard LSTM, across segmentation and classification tasks.
These contributions collectively demonstrate the novelty and effectiveness of the proposed model, addressing both segmentation and classification challenges in the domain of skin lesion analysis.
Literature survey
In 2022, Yao et al. [16] suggested a single-model approach for the categorization of skin lesions on small and imbalanced datasets. Various DCNNs were first trained on a collection of small and imbalanced datasets to show that models of moderate complexity outperformed more complicated ones. Secondly, DropOut and DropBlock regularization were employed to mitigate overfitting on underrepresented samples, directly addressing the issues of imbalanced sample sizes and classification complexity.
In 2021, Khan et al. [17] categorized skin lesion image specimens collected from various servers. The proposed system included two modules: localization and categorization of skin lesions. In the localization framework, a hybrid approach was suggested that combines saliency-based segmentation with an enhanced HDCT. A maximal mutual information approach was proposed that yields the segmented RGB lesion image, in order to make the most of the information contained in the binary images. Transfer learning was used in the classification algorithm to retrain a DenseNet201 network on the segmented lesion images. Furthermore, a multi-class ELM classifier receives the combined features obtained via MCCA. The segmentation task was evaluated on four datasets (ISIC2017, ISBI2016, PH2, and ISBI2018), whereas the classification task was evaluated on HAM10000, the most challenging dataset. The research observations confirmed the robustness of the suggested framework compared with cutting-edge techniques.
In 2019, Navarro et al. [18] achieved the highest published results through a novel adaptation of superpixel methods. Furthermore, a crucial factor in the evaluation of skin lesions, the lesion's progression, has received little attention from CAD systems. To assess progression, an image registration procedure must first be carried out on two or more images of the same lesion taken at different times so that they have equivalent size, alignment, and angle of view. The study also provides an image registration strategy that outperforms the best existing registration methods. Coupled with the suggested lesion segmentation technique, this enables the precise extraction of characteristics for judging the lesion's progression. A case study including the lesion-size characteristic is presented, opening the door to autonomous systems that can quickly assess the progression of skin lesions.
In 2017, Kharazmi et al. [19] proposed a new methodology for dermoscopy image-based cutaneous vasculature identification and segmentation. The retrieved vascular characteristics were then investigated for skin cancer classification. The image is first decomposed into melanin and haemoglobin components, eliminating the impact of pigmentation on blood vessel visibility. The haemoglobin component is subsequently clustered into pigmented, normal, and erythema areas, and a vessel image is produced. On a set of pixels labelled by an analyst, segmentation sensitivity and specificity of 90% and 86%, respectively, were attained. Based on these findings, characteristics for lesion identification in basal cell carcinoma (BCC) were established and retrieved to further highlight the efficiency of the suggested technique. Using only the retrieved vascular characteristics, the suggested technique achieves an AUC of 96.5% for separating BCC from benign lesions, outperforming several state-of-the-art techniques.
In 2020, Xie et al. [20] have suggested using the MB-DCNN architecture for segmenting and categorizing skin lesions. In this manner, both segmentation as well as classification networks support and exchange information with one another in a bootstrapping manner.
In 2022, Nigar et al. [21] have recommended classifying skin lesions on the basis of XAI to enhance the accuracy of skin lesion categorization. The dermatologists would be better able to diagnose skin cancer in its earliest phases with reason thanks to this. The ISIC 2019 dataset was used to verify the suggested XAI framework. The eight different kinds of skin lesions are successfully identified by the proposed methodology, with precision, accuracy, recall, as well as F1 score of 93.57%, 94.47%, 94.01%, and 94.45%, correspondingly. The LIME architecture was used to additionally examine these forecasts and produce visual explanations that were consistent with quality standards for explanations and assumptions. The method’s usefulness in actual clinical practice would be improved by the modeling and analysis features we’ve incorporated.
In 2020, Tang et al. [22] suggested a GP-CNN architecture. In particular, the G-CNN, trained on downscaled dermoscopy images, was utilized to extract the relevant information from dermoscopy images and create the class activation map (CAM). In contrast to various existing techniques that require external data, experimental findings show that the introduced technique attains state-of-the-art efficiency.
In 2021, Bian et al. [23] utilized transfer learning for the skin lesion detection problem. This approach allowed important knowledge to be acquired while ignoring unfavorable examples from the source domain. Extensive experiments showed that the technique could efficiently accomplish melanoma detection.
In 2017, Satheesha et al. [24] developed a non-invasive computerized dermoscopy method that takes skin lesion depth into account when making a diagnosis. A method for reconstructing 3-D skin lesions using the depth estimated from typical dermoscopic images was given. Depth and three-dimensional shape characteristics were extracted from the three-dimensional reconstruction, and regular colour, texture, and 2-D shape characteristics were also retrieved in addition to the 3-D data. Feature extraction was essential for obtaining correct results, and efficiency was assessed for various feature-set combinations; adding the 3-D and estimated-depth features produced a noticeable performance increase. On the PH2 dataset, classification scores of specificity = 97% with sensitivity = 96%, and specificity = 99% with sensitivity = 98%, were obtained. Results of experiments to estimate tumor depth using 3-D lesion reconstruction are also reported, showing that the presented computerized dermoscopy method was effective and could diagnose a variety of skin lesion dermoscopy images.
In 2020, Adegun and Viriri [25] suggested an innovative paradigm for the automatic recognition of skin cancer that both segments and classifies skin lesions. The encoder stage of the proposed scheme captures the coarse appearance of the lesion, while the decoder stage recovers the fine details of the lesion borders. An additional module applying a CRF (pairwise edge potential) for contour refinement of the lesion border localization was also integrated into the system. For the second stage, a new FCN-oriented DenseNet architecture was suggested and optimized through hyper-parameter tuning to reduce network congestion and maximize system throughput. This yields an effective model that requires less information and as few parameters as possible by reusing features. The developed approach was evaluated on the recent HAM10000 dataset of more than 10,000 images covering seven different diseases, yielding recall, accuracy, and AUC scores of 98%, 98.5%, and 99%, respectively. Table 1 consolidates the inferences from the above literature review in tabular format.
Table 1. Comparative analysis of existing methods
| Reference | Model | Dataset | Results | Limitation | Future Work |
|---|---|---|---|---|---|
| Yao et al. (2022) [16] | Single DCNN with DropOut and DropBlock regularization | ISIC 2018, HAM10000 | Improved accuracy on imbalanced datasets | Limited to small datasets; no multimodal integration | Inclusion of large datasets and multimodal data |
| Khan et al. (2021) [17] | Hybrid saliency segmentation with DenseNet201 | ISIC 2017, HAM10000 | High accuracy (92.5%) and precision (91.8%) | High computational cost and lack of interpretability | Optimize computational cost and improve model interpretability |
| Navarro et al. (2019) [18] | Superpixel-based segmentation and lesion progression analysis | Custom dataset | Accurate lesion registration for tracking progression | Limited to lesion size changes; not suitable for complex lesions | Extension to multimodal lesion features and real-time tracking |
| Kharazmi et al. (2017) [19] | Vascular structure segmentation for BCC classification | PH2 | High AUC (96.5%) | Focused only on vascular features; not generalized to other lesion types | Generalization to non-vascular lesion types |
| Tang et al. (2020) [22] | GP-CNN for lesion classification | ISIC 2019 | State-of-the-art accuracy (93.4%) | Requires external data for training | Develop self-contained models |
| Nigar et al. (2022) [21] | XAI-based deep learning framework | ISIC 2019 | High accuracy (94.5%) with visual explanations | Limited interpretability of explanations in clinical scenarios | Enhance explanation quality and clinical validation |
| Adegun & Viriri (2020) [25] | DenseNet-based framework with CRF contour refinement | HAM10000 | High recall (98%) and AUC (99%) | High resource requirements and lack of lightweight alternatives | Develop lightweight models suitable for low-resource settings |
Problem statement
Skin lesions pose a serious threat to individuals, as they are proven precursors of skin cancers such as melanoma. Early diagnosis and prompt clinical treatment are imperative prerequisites for achieving better patient outcomes. By and large, the traditional approach to examining skin lesions is cumbersome and dependent on the expertise of dermatologists, which affects diagnosis. Deep learning methods have emerged as a powerful tool in the medical image-processing arena, paving the way for improved segmentation and classification of skin lesions. Nevertheless, daunting challenges remain: large annotated datasets are needed, and model architectures must be optimized for higher accuracy at lower computational cost. This work therefore aspires to deliver an automated and reliable solution for dermatological assessment, enabling timely intervention and superior patient management in clinical settings.
Proposed methodology
Data acquisition
The data used in this study were obtained from two widely recognized and benchmark datasets: HAM10000 and ISIC 2017. These datasets are specifically curated for skin lesion analysis, providing a diverse and representative set of images for training and testing.
HAM10000 Dataset: The HAM10000 (Human Against Machine with 10,000 Training Images) dataset contains 10,015 dermoscopic images representing seven different types of skin lesions: melanocytic nevi, melanoma, benign keratosis-like lesions, basal cell carcinoma, actinic keratoses, vascular lesions, and dermatofibroma. The dataset includes high-quality annotations provided by expert dermatologists, ensuring reliability and clinical relevance.
ISIC 2017 Dataset: The ISIC 2017 dataset, part of the International Skin Imaging Collaboration (ISIC) archive, consists of 2,000 dermoscopic images categorized into three classes: melanoma, nevus, and seborrheic keratosis. The dataset provides expert-annotated ground truths, including segmentation masks and classification labels, enabling comprehensive model evaluation.
Both datasets include a wide variety of lesion types, lighting conditions, and resolutions, ensuring robustness and generalizability of the developed model. The datasets were split into training, validation, and testing subsets using an 80:10:10 ratio, detailed further below, so that the model was trained on diverse samples while being evaluated on unseen data.
Both the HAM10000 and ISIC 2017 datasets are known to exhibit significant class imbalance, with a disproportionately higher number of benign lesions (e.g., melanocytic nevi) compared to malignant cases (e.g., melanoma). To mitigate the adverse effects of this imbalance and to ensure a fair and unbiased evaluation, a combination of data augmentation and weighted loss functions was employed. Data augmentation techniques included horizontal and vertical flipping, random rotation (± 15°), zooming (up to 20%), and contrast adjustment, applied predominantly to minority classes during training to synthetically increase their representation and variability. Additionally, a class-weighted cross-entropy loss function was adopted during training to penalize the misclassification of underrepresented classes more heavily. This dual strategy of data augmentation and weighted loss improved the model’s sensitivity and reduced bias toward dominant classes, contributing to more balanced classification performance across normal, benign, and malignant categories.
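To make the weighting scheme concrete, the short sketch below illustrates inverse-frequency class weights and a class-weighted cross-entropy in Python. This is an illustration only: the experiments reported later use a MATLAB pipeline, and the function names and toy counts here are hypothetical.

```python
import numpy as np

def class_weights(labels, n_classes):
    """Weight each class inversely to its frequency, so that
    misclassifying rare classes (e.g., malignant) costs more."""
    counts = np.maximum(np.bincount(labels, minlength=n_classes), 1).astype(float)
    return counts.sum() / (n_classes * counts)

def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Mean class-weighted cross-entropy over a batch.
    probs: (N, C) softmax outputs; labels: (N,) integer class ids."""
    p_true = probs[np.arange(len(labels)), labels]
    return float(-np.mean(weights[labels] * np.log(p_true + eps)))

# Toy illustration: 0 = normal, 1 = benign, 2 = malignant.
y = np.array([0] * 700 + [1] * 250 + [2] * 50)
print(class_weights(y, 3))  # the malignant class receives the largest weight
```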
As noted above, both benchmark datasets are publicly available: the HAM10000 dataset was accessed from https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000 on June 25, 2024, and the ISIC 2017 dataset was accessed from https://challenge.isic-archive.com/data/#2017 on June 25, 2024.
For model development, the combined total of 12,015 images was split into training, validation, and testing subsets using an 80:10:10 ratio. Specifically, 9,612 images were used for training, 1,202 images for validation, and 1,201 images for testing. The dataset split was performed using stratified sampling to preserve the class distribution across each subset, ensuring fair representation of both majority and minority classes in all stages of evaluation. This structured partitioning, along with the inclusion of expert-labeled segmentation masks from ISIC 2017 and diagnostic metadata from HAM10000, ensured that the proposed skin lesion segmentation and classification model was trained and evaluated on a diverse and clinically relevant dataset.
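A minimal sketch of this stratified 80:10:10 split is given below, assuming scikit-learn is available (the paper's own pipeline is in MATLAB; `paths` and `labels` are placeholder variables for image file paths and integer class labels):

```python
from sklearn.model_selection import train_test_split

def stratified_split(paths, labels, seed=42):
    # Hold out 20% first, preserving the class ratios in every subset.
    x_tr, x_rest, y_tr, y_rest = train_test_split(
        paths, labels, test_size=0.20, stratify=labels, random_state=seed)
    # Split the held-out 20% in half: 10% validation, 10% test overall.
    x_val, x_te, y_val, y_te = train_test_split(
        x_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed)
    return (x_tr, y_tr), (x_val, y_val), (x_te, y_te)
```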
Proposed model
A skin lesion segmentation and classification model has been developed, comprising several stages: data collection, pre-processing, segmentation, feature extraction, and classification. Initially, the data are accumulated from standard benchmark sources containing various skin cancer images. The images are then pre-processed by hair removal and by filtering the hair-removed images; this phase eliminates hairs and filters the obtained benchmark images. Segmentation of the pre-processed images is next performed using the U-Net approach, which isolates the skin cancer region and greatly facilitates feature extraction. From these segmented images, colour features, GLCM texture features, and RGB histogram features are extracted; transforming the raw data into features in turn supports better classification. Classification is then done using the MLSTM, whose LSTM hidden layer is tuned by STBO with the objective of maximizing both accuracy and precision. In the final step, the MLSTM classifies each skin lesion as normal, benign, or malignant. The developed skin lesion model is shown in Fig. 2. The U-Net segmentation model was trained using paired dermoscopic images and their corresponding ground truth masks available from the ISIC 2017 dataset. These ground truth annotations, manually labeled by dermatology experts, were used to supervise the segmentation process and enable precise delineation of lesion boundaries during training.
[See PDF for image]
Fig. 2
Developed Skin Lesion Model
Preprocessing
Pre-processing for the developed skin lesion model consists of hair removal followed by filtering of the hair-removed images. Images of skin lesions frequently contain artefacts, which makes segmentation challenging. Segmentation methods can quickly identify skin traits such as freckles on the basis of colour or size; as a result, artefacts like hair and shading, which are often darker than healthy skin, might be mistaken for lesions during the segmentation stage.
Hair removal The detection of the boundary of skin lesions is negatively impacted by the presence of hair and certain aberrations, such as light reflections and air bubbles, which lead to incorrect segmentation. To remove hair from the images, the common DullRazor method is used as a pre-processing step. The DullRazor tool [33], which is built on the three key processes below, was employed in this research.
By using a generalized grayscale morphological closing operation, it locates the positions of the dark hairs.
It verifies that the candidate dark pixels form thin, long (hair-like) structures and replaces the verified hair pixels by bilinear interpolation of neighbouring non-hair pixels.
It smooths the replaced hair pixels with a median filter.
Filtering hair removed images Even after hair is removed from skin lesion images, noise may still be noticeable; scratches on the skin represent a type of noise similar to bubbles. These are eliminated during the filtering step. The median filtering method was employed in this investigation, since removing noise while an image is being processed is common practice to improve the results of further processing.

The median filter is the best-known order-statistics filter: the value of a pixel is replaced by the median of the grey levels in its neighbourhood,

$$\hat{f}(x, y) = \underset{(s, t) \in S_{xy}}{\mathrm{median}} \left\{ g(s, t) \right\} \tag{1}$$

Here, $g(s, t)$ denotes the grey level of the input image at position $(s, t)$, and $S_{xy}$ is the set of coordinates in the window centred at $(x, y)$. The median calculation includes the original value of the pixel itself. Because it provides excellent noise reduction for certain kinds of random noise with much less blurring than linear smoothing filters of similar size, the median filter is very popular.
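The sketch below ties the two pre-processing steps together in Python with OpenCV, as an illustration under stated assumptions: inpainting stands in for DullRazor's bilinear-interpolation step, and the kernel, threshold, and window sizes are illustrative rather than values reported in this study.

```python
import cv2

def remove_hair(bgr, kernel_size=17, inpaint_radius=3):
    """DullRazor-style hair removal: close, threshold, fill."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    # Step 1: grayscale morphological closing highlights thin dark hairs.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT,
                                       (kernel_size, kernel_size))
    closed = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
    hair_response = cv2.subtract(closed, gray)
    # Step 2: threshold the response to a binary hair mask.
    _, mask = cv2.threshold(hair_response, 10, 255, cv2.THRESH_BINARY)
    # Step 3: replace masked pixels from their neighbours (inpainting
    # approximates DullRazor's bilinear interpolation of neighbours).
    return cv2.inpaint(bgr, mask, inpaint_radius, cv2.INPAINT_TELEA)

def median_denoise(bgr, ksize=5):
    # Eq. (1): each pixel becomes the median of its ksize x ksize window.
    return cv2.medianBlur(bgr, ksize)
```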
Segmentation
The U-Net method is used to segment the skin lesions in the developed model. The network consists of a contracting path (on the left) and an expanding path (on the right). The energy function is computed using a pixel-wise soft-max over the final feature map combined with the cross-entropy loss function. The soft-max is defined as

$$p_k(\mathbf{x}) = \frac{\exp\left(a_k(\mathbf{x})\right)}{\sum_{k'=1}^{K} \exp\left(a_{k'}(\mathbf{x})\right)}$$

in which $a_k(\mathbf{x})$ signifies the activation in feature channel $k$ at the pixel position $\mathbf{x} \in \Omega$, and $K$ defines the total count of classes. $p_k(\mathbf{x})$ is the approximated maximum function: $p_k(\mathbf{x}) \approx 1$ for the $k$ with the highest activation $a_k(\mathbf{x})$, and $p_k(\mathbf{x}) \approx 0$ for every other $k$. The cross-entropy then penalises, at each position, the deviation of $p_{\ell(\mathbf{x})}(\mathbf{x})$ from 1:

$$E = \sum_{\mathbf{x} \in \Omega} w(\mathbf{x}) \log\left(p_{\ell(\mathbf{x})}(\mathbf{x})\right) \tag{2}$$

Here, $\ell : \Omega \rightarrow \{1, \ldots, K\}$ gives the genuine label of every pixel, and $w : \Omega \rightarrow \mathbb{R}$ defines a weight map added to give some pixels greater relevance in training. Morphological procedures are used to determine the separation border. The weight map is then measured as

$$w(\mathbf{x}) = w_c(\mathbf{x}) + w_0 \cdot \exp\left( -\frac{\left(d_1(\mathbf{x}) + d_2(\mathbf{x})\right)^2}{2\sigma^2} \right) \tag{3}$$

Here, $w_c$ provides the weight map used to balance the class frequencies, $d_1$ stands for the distance to the border of the nearest cell, and $d_2$ represents the distance to the border of the second-nearest cell.

In deep networks with multiple convolutional layers and diverse paths through the network, an appropriate initialization of the weights is essential so that some parts of the network do not activate excessively while other sections never contribute. Ideally, every feature map in the network should carry approximately unit variance right after the initial weights are set. For a network of this design (alternating convolution and ReLU layers), this can be accomplished by drawing the initial weights from a Gaussian distribution with standard deviation $\sqrt{2/N}$, in which $N$ represents the number of incoming nodes of a single neuron.
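A minimal NumPy/SciPy sketch of the weighted loss of Eqs. (2)-(3) is shown below, assuming a binary lesion/background mask with a single lesion per image (so the second-nearest-border distance is approximated by the nearest-border distance); the constants $w_0 = 10$ and $\sigma = 5$ follow the original U-Net paper rather than values reported in this study.

```python
import numpy as np
from scipy import ndimage

def unet_weight_map(mask, w0=10.0, sigma=5.0):
    """Eq. (3): class-balancing term plus a border-emphasis term.
    mask: binary (H, W) lesion mask; with one lesion per image,
    d1 + d2 is approximated by twice the distance to the border."""
    fg = max(mask.mean(), 1e-6)
    wc = np.where(mask > 0, 1.0 / fg, 1.0 / max(1.0 - fg, 1e-6))
    d1 = ndimage.distance_transform_edt(mask == 0)
    return wc + w0 * np.exp(-((2.0 * d1) ** 2) / (2.0 * sigma ** 2))

def weighted_cross_entropy(p, labels, w, eps=1e-12):
    """Negative log-likelihood form of Eq. (2).
    p: (K, H, W) soft-max maps; labels: (H, W) ints; w: (H, W) weights."""
    rows = np.arange(labels.shape[0])[:, None]
    cols = np.arange(labels.shape[1])[None, :]
    p_true = p[labels, rows, cols]   # probability of the true class per pixel
    return float(-np.sum(w * np.log(p_true + eps)))
```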
Feature extraction
The features are extracted from the segmented images as colour features, texture features via GLCM, and RGB histogram features. Each of these techniques is explained below. These features were chosen because colour and texture are the features that predominate in the lesion zone.
Colour features [34] Four statistics, namely the mean, standard deviation, skewness, and variance, are derived from segmented lesion patches generated through the separate channels of six distinct colour spaces: HSV, RGB, YCbCr, CIE L*u*v, NTSC, and CIE L*a*b. All are derived in order to characterise the colour contained in a lesion. These four colour features are also referred to as "colour moments".
The choice of color moments, texture descriptors based on Gray Level Co-occurrence Matrix (GLCM), and RGB histograms is motivated by their proven relevance in dermatological image analysis. Skin lesions exhibit distinctive chromatic variations, including asymmetry in pigmentation, irregular borders, and heterogeneous color distribution, features effectively captured through statistical color moments across multiple color spaces. Texture patterns such as granularity, streaks, and dots are significant clinical indicators of malignancy, and GLCM-based texture features have been widely validated for capturing such spatial intensity relationships in dermoscopic images. RGB histograms, while simple, preserve essential color distribution characteristics that complement other statistical descriptors. Although deep features from CNNs offer high abstraction and discriminative power, handcrafted features provide interpretability and domain-specific insights crucial for clinical validation and expert understanding. Moreover, combining these handcrafted features with deep learning models like MLSTM ensures a hybrid representation that balances interpretability and performance.
Let $p_{ij}$ represent the value of the $j$-th pixel in the $i$-th channel of a colour space, for an image having $N$ pixels. The four colour moments are defined as below.

Moment 1 The mean represents the channel's average colour value and is determined by

$$E_i = \frac{1}{N} \sum_{j=1}^{N} p_{ij} \tag{4}$$

Moment 2 The standard deviation is the square root of the variance of the distribution:

$$\sigma_i = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left(p_{ij} - E_i\right)^2} \tag{5}$$

Moment 3 Skewness is a metric for the degree of asymmetry in the distribution:

$$s_i = \sqrt[3]{\frac{1}{N} \sum_{j=1}^{N} \left(p_{ij} - E_i\right)^3} \tag{6}$$

Moment 4 The spread of the colour distribution is the variance, given by

$$\sigma_i^2 = \frac{1}{N} \sum_{j=1}^{N} \left(p_{ij} - E_i\right)^2 \tag{7}$$

The four properties are calculated for each channel, resulting in $3 \text{ (channels)} \times 6 \text{ (colour spaces)} \times 4 \text{ (properties)} = 72$ features.
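In code, the four moments of Eqs. (4)-(7) reduce to a few NumPy calls per channel; looping the function below over the three channels of each of the six colour spaces yields the 72-dimensional colour feature vector. This is a sketch, and SciPy's standardised skewness stands in for the cube-root form of Eq. (6).

```python
import numpy as np
from scipy.stats import skew

def colour_moments(channel_pixels):
    """channel_pixels: 1-D array of the N lesion pixels in one channel.
    Returns [mean, std, skewness, variance] per Eqs. (4)-(7)."""
    p = np.asarray(channel_pixels, dtype=float)
    return [p.mean(), p.std(), float(skew(p)), p.var()]
```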
Texture features A series of statistical texture descriptors in the form of the GLCM is used to characterize the texture inherent in a lesion. GLCM-based texture descriptors are familiar and frequently employed for texture computation. The texture characteristics of an image are obtained from co-occurrence in two steps. The first builds the GLCM, where pairs of pixels are counted according to their spatial co-occurrence at a given angle and distance. The second computes, from the GLCM, a collection of scalar quantities that characterize different facets of the original texture. The GLCM is a square matrix of order $N_g$, the number of distinct grey levels in the image. An element $p(i, j \mid d, \theta)$ of the GLCM gives the relative frequency with which a pixel of grey level $i$ co-occurs with a pixel of grey level $j$ at distance $d$ in orientation $\theta$; for example, $p(i, j \mid 1, 0^{\circ})$ determines how often a pixel with grey level $i$ and a pixel with grey level $j$ line up horizontally as immediate neighbours.

The normalised GLCM is calculated for each of the four orientations ($0^{\circ}$, $45^{\circ}$, $90^{\circ}$, and $135^{\circ}$) in order to obtain texture characteristics. A 14-dimensional feature vector is generated by calculating each of the 14 features individually for every direction and subsequently averaging the features across all orientations.
From the GLCM, a variety of texture features can be retrieved. The notation employed is as follows: $N_g$ stands for the number of grey levels; $\mu$ is the mean value of $p$; and the means and standard deviations of the marginal distributions $p_x$ and $p_y$ are denoted by $\mu_x$, $\mu_y$, $\sigma_x$, and $\sigma_y$. The $i$-th entry of the marginal probability matrix, $p_x(i)$, is obtained by summing the rows of $p(i, j)$:

$$p_x(i) = \sum_{j=1}^{N_g} p(i, j) \tag{8}$$

$$p_y(j) = \sum_{i=1}^{N_g} p(i, j) \tag{9}$$

$$\mu_x = \sum_{i=1}^{N_g} i \, p_x(i) \tag{10}$$

$$\mu_y = \sum_{j=1}^{N_g} j \, p_y(j) \tag{11}$$

$$\sigma_x^2 = \sum_{i=1}^{N_g} \left(i - \mu_x\right)^2 p_x(i) \tag{12}$$

$$\sigma_y^2 = \sum_{j=1}^{N_g} \left(j - \mu_y\right)^2 p_y(j) \tag{13}$$

$$p_{x+y}(k) = \sum_{i=1}^{N_g} \sum_{\substack{j=1 \\ i+j=k}}^{N_g} p(i, j), \quad k = 2, 3, \ldots, 2N_g \tag{14}$$

$$p_{x-y}(k) = \sum_{i=1}^{N_g} \sum_{\substack{j=1 \\ |i-j|=k}}^{N_g} p(i, j), \quad k = 0, 1, \ldots, N_g - 1 \tag{15}$$
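The marginal quantities of Eqs. (8)-(13) can be computed directly from an orientation-averaged, normalised GLCM; the sketch below uses scikit-image's `graycomatrix` as an assumed stand-in for the MATLAB toolbox routines used in the reported experiments.

```python
import numpy as np
from skimage.feature import graycomatrix

def glcm_marginals(gray, levels=8, distance=1):
    """gray: 2-D uint8 image already quantised to values 0..levels-1.
    Returns the orientation-averaged GLCM and the Eq. (8)-(13) terms."""
    angles = [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]   # 0/45/90/135 deg
    glcm = graycomatrix(gray, [distance], angles,
                        levels=levels, symmetric=True, normed=True)
    p = glcm[:, :, 0, :].mean(axis=-1)          # average the 4 orientations
    i = np.arange(levels)
    p_x, p_y = p.sum(axis=1), p.sum(axis=0)     # Eqs. (8)-(9)
    mu_x, mu_y = (i * p_x).sum(), (i * p_y).sum()          # Eqs. (10)-(11)
    sd_x = np.sqrt(((i - mu_x) ** 2 * p_x).sum())          # Eq. (12)
    sd_y = np.sqrt(((i - mu_y) ** 2 * p_y).sum())          # Eq. (13)
    return p, p_x, p_y, mu_x, mu_y, sd_x, sd_y
```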
The following features are also utilized.

Homogeneity, ASM The homogeneity of an image is measured by the angular second moment (ASM). A homogeneous image has only a small number of grey levels, giving a GLCM with few but relatively high values, so the sum of squares is large:

$$f_{\mathrm{ASM}} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} \left\{ p(i, j) \right\}^2 \tag{16}$$

Contrast This measure of contrast, or local intensity fluctuation, favours contributions away from the diagonal, i.e., $i \neq j$:

$$f_{\mathrm{Con}} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} (i - j)^2 \, p(i, j) \tag{17}$$

Local homogeneity, IDM The inverse difference moment (IDM) is also affected by the homogeneity of the image. Owing to the weighting factor $1/\left(1 + (i - j)^2\right)$, inhomogeneous regions contribute only marginally to the IDM. In consequence, homogeneous images have a significantly larger IDM value than inhomogeneous ones:

$$f_{\mathrm{IDM}} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} \frac{1}{1 + (i - j)^2} \, p(i, j) \tag{18}$$

Entropy An inhomogeneous scene has high entropy, whereas a homogeneous scene yields low first-order entropy:

$$f_{\mathrm{Ent}} = -\sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i, j) \log\left(p(i, j)\right) \tag{19}$$

Correlation Correlation measures the grey-level linear dependency between pixels at the designated positions relative to one another:

$$f_{\mathrm{Cor}} = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_g} (i \, j) \, p(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y} \tag{20}$$

Sum of squares, variance This feature assigns considerably large weights to elements that deviate from the average value:

$$f_{\mathrm{Var}} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} (i - \mu)^2 \, p(i, j) \tag{21}$$

Sum average It is given by

$$f_{\mathrm{SA}} = \sum_{k=2}^{2N_g} k \, p_{x+y}(k) \tag{22}$$

Sum entropy This is described as

$$f_{\mathrm{SE}} = -\sum_{k=2}^{2N_g} p_{x+y}(k) \log\left(p_{x+y}(k)\right) \tag{23}$$

Difference entropy This is given by

$$f_{\mathrm{DE}} = -\sum_{k=0}^{N_g - 1} p_{x-y}(k) \log\left(p_{x-y}(k)\right) \tag{24}$$

Inertia This is defined as

$$f_{\mathrm{Ine}} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} (i - j)^2 \, p(i, j) \tag{25}$$

Cluster shade This is defined as

$$f_{\mathrm{CS}} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} \left(i + j - \mu_x - \mu_y\right)^3 p(i, j) \tag{26}$$

Cluster prominence It is described as

$$f_{\mathrm{CP}} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} \left(i + j - \mu_x - \mu_y\right)^4 p(i, j) \tag{27}$$
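As an illustration, a subset of the descriptors of Eqs. (16)-(27) can be evaluated from the normalised GLCM `p` with vectorised NumPy. This is a sketch, not the study's MATLAB implementation:

```python
import numpy as np

def haralick_subset(p, eps=1e-12):
    """p: normalised, orientation-averaged (Ng, Ng) GLCM."""
    ng = p.shape[0]
    i, j = np.indices((ng, ng))
    mu_x, mu_y = (i * p).sum(), (j * p).sum()
    sd_x = np.sqrt((((i - mu_x) ** 2) * p).sum())
    sd_y = np.sqrt((((j - mu_y) ** 2) * p).sum())
    return {
        "asm":      (p ** 2).sum(),                                # Eq. (16)
        "contrast": (((i - j) ** 2) * p).sum(),                    # Eq. (17)
        "idm":      (p / (1.0 + (i - j) ** 2)).sum(),              # Eq. (18)
        "entropy":  float(-(p * np.log(p + eps)).sum()),           # Eq. (19)
        "correlation": float((((i - mu_x) * (j - mu_y) * p).sum())
                             / (sd_x * sd_y + eps)),               # Eq. (20)
        "cluster_shade": (((i + j - mu_x - mu_y) ** 3) * p).sum(),       # Eq. (26)
        "cluster_prominence": (((i + j - mu_x - mu_y) ** 4) * p).sum(),  # Eq. (27)
    }
```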
RGB histogram features [35] An image histogram visually represents the tonal distribution of a digital image: the count of pixels is plotted against every tonal value, with the horizontal axis indicating the tone and the vertical axis the number of pixels having that tone. The colour histogram, a technique for characterising the colour content of an image, counts the occurrences of every colour in the image. The number of bins is set to 16 for each sub-band to generate the RGB histogram characteristics, so the three components together yield $16^3 = 4096$ bins; thus, a 4096-point RGB histogram serves as the representation of each sample. In total, 4,182 features are extracted from every lesion region: 72 colour features, 14 texture features, and 4,096 RGB histogram features.
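A joint 16x16x16 histogram reproducing the 4,096-dimensional descriptor can be sketched as follows (illustrative Python; bin edges spanning [0, 256) and count normalisation are assumptions):

```python
import numpy as np

def rgb_histogram(rgb_pixels, bins=16):
    """rgb_pixels: (N, 3) lesion pixels with values in [0, 255].
    Returns a count-normalised, flattened 16*16*16 = 4096-dim vector."""
    hist, _ = np.histogramdd(rgb_pixels, bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    return (hist / max(len(rgb_pixels), 1)).ravel()
```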
Classification
Classification of the developed skin lesion model is carried out using the MLSTM method, in which the hidden neurons of the LSTM are tuned by STBO to maximize accuracy and precision. A traditional LSTM is used to train the input-hidden state interaction of the image. In general, a typical LSTM has five components: the input gate, the forget gate, the output gate, the input modulation gate, and the cell memory state. A typical LSTM unit at time step $u$ can thus be described as follows:

$$i^u = \sigma\left(W_i x^u + U_i h^{u-1} + b_i\right) \tag{28}$$

$$f^u = \sigma\left(W_f x^u + U_f h^{u-1} + b_f\right) \tag{29}$$

$$o^u = \sigma\left(W_o x^u + U_o h^{u-1} + b_o\right) \tag{30}$$

$$g^u = \phi\left(W_g x^u + U_g h^{u-1} + b_g\right) \tag{31}$$

$$c^u = f^u \odot c^{u-1} + i^u \odot g^u \tag{32}$$

$$h^u = o^u \odot \phi\left(c^u\right) \tag{33}$$

The input gate, the forget gate, the output gate, the input modulation gate, and the memory cell state are represented by $i^u$, $f^u$, $o^u$, $g^u$, and $c^u$, respectively; $\phi$ defines the hyperbolic tangent $\tanh$; $\sigma$ defines the sigmoid function; $\odot$ defines the element-wise product; $W$ and $U$ define weight matrices; and $b$ defines a bias vector. In particular, the forget gate $f^u$ determines how much of the previous cell state $c^{u-1}$ is retained when obtaining the present state $c^u$; the input gate $i^u$ regulates which portions of the new input arriving at time step $u$ update the cell memory; and the output gate $o^u$ determines how the present state shapes the output $h^u$ of the LSTM unit at time step $u$.
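For concreteness, one forward step of Eqs. (28)-(33) can be written in NumPy as below. This is a sketch; the gate-keyed weight dictionaries are an illustrative convention, not the paper's MATLAB layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_u, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts keyed by gate: 'i', 'f', 'o', 'g'."""
    i_u = sigmoid(W['i'] @ x_u + U['i'] @ h_prev + b['i'])  # input gate, Eq. (28)
    f_u = sigmoid(W['f'] @ x_u + U['f'] @ h_prev + b['f'])  # forget gate, Eq. (29)
    o_u = sigmoid(W['o'] @ x_u + U['o'] @ h_prev + b['o'])  # output gate, Eq. (30)
    g_u = np.tanh(W['g'] @ x_u + U['g'] @ h_prev + b['g'])  # modulation, Eq. (31)
    c_u = f_u * c_prev + i_u * g_u                          # memory cell, Eq. (32)
    h_u = o_u * np.tanh(c_u)                                # hidden state, Eq. (33)
    return h_u, c_u
```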
In an effort to overcome the limitations of the existing LSTM, the hidden-neuron parameters of the LSTM are fine-tuned by STBO with the objective of maximizing accuracy together with precision; the resulting network is called the MLSTM. This MLSTM classifies the final result into normal, benign, and malignant, as suggested for the skin lesion classification task; the model is shown in Fig. 3. The objective function behind the MLSTM-based skin cancer classification is given below:

$$F_{\mathrm{obj}} = \underset{\{HN_{\mathrm{LSTM}}\}}{\arg\max} \left( \mathrm{Acc} + \mathrm{Pre} \right) \tag{34}$$

In the above equation, $F_{\mathrm{obj}}$ defines the maximization function, the hidden neurons of the LSTM are denoted by $HN_{\mathrm{LSTM}}$, accuracy is shown by $\mathrm{Acc}$, and precision is given by $\mathrm{Pre}$, respectively. Figure 3 represents the MLSTM for the proposed skin lesion classification model.
[See PDF for image]
Fig. 3
MLSTM for the proposed skin lesion classification model
The layer-wise details of the MLSTM are depicted in Fig. 4.
[See PDF for image]
Fig. 4
Layer details of proposed MLSTM
STBO optimization
The STBO optimization here boosts the skin lesion classification design by optimizing the hidden neurons of the LSTM so as to maximize the accuracy and precision of the classification output. STBO is a recent human-inspired metaheuristic method useful for addressing optimization challenges; its main inspiration is the process of teaching novice tailors how to sew. The philosophy of the STBO technique has three stages: (i) instruction, (ii) imitation of the teacher, and (iii) practice.

To enhance the performance of the Modified Long Short-Term Memory (MLSTM) network, this study integrates Sewing Training-Based Optimization (STBO). Inspired by the real-world learning process of tailoring apprentices, STBO mimics three fundamental phases of skill development: learning from instruction, imitating experienced mentors, and practicing independently. These stages iteratively improve the candidate solutions (in this case, the hidden-neuron configurations of the MLSTM model) in order to maximize classification accuracy and precision.
At a high level, STBO operates as follows:
Initialization: A population of possible LSTM configurations (i.e., different hidden neuron counts) is randomly generated.
Instruction Phase (Exploration): Each candidate explores new configurations by referencing more successful configurations (mentors), promoting global search.
Imitation Phase (Guided Learning): Candidates adjust their current configuration by mimicking specific traits from mentors, refining promising directions.
Practice Phase (Exploitation): Fine-tuning is done by making small adjustments near the best configurations to improve performance further.
Evaluation and Selection: Each configuration is evaluated using a fitness function based on accuracy and precision. The best one is retained.
Iteration: Steps 2–5 are repeated until convergence is reached or the maximum number of iterations is achieved.
This process ensures that the MLSTM model is optimized not only for learning long-term dependencies but also for selecting an architecture that generalizes well across all lesion categories. The flow process is clearly depicted in Fig. 5.
[See PDF for image]
Fig. 5
Flow process of STBO optimization process
Every individual in the STBO population represents a potential solution to the problem, embodying suggested values for the decision variables. Consequently, the STBO population may be formally represented by a matrix and every STBO individual by a vector. The matrix model in Eq. (35) specifies the STBO population:

$$X = \begin{bmatrix} X_1 \\ \vdots \\ X_i \\ \vdots \\ X_N \end{bmatrix} = \begin{bmatrix} x_{1,1} & \cdots & x_{1,j} & \cdots & x_{1,m} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{i,1} & \cdots & x_{i,j} & \cdots & x_{i,m} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,j} & \cdots & x_{N,m} \end{bmatrix} \tag{35}$$

Here, $N$ shows the total count of STBO population members, $m$ describes the total count of problem variables, $X$ describes the STBO population matrix, and $X_i$ is the $i$-th STBO member. Equation (36) initializes all population members at random at the beginning of the STBO procedure:

$$x_{i,j} = lb_j + r \cdot \left(ub_j - lb_j\right) \tag{36}$$

Here, $x_{i,j}$ represents the value of the $j$-th variable as decided by the $i$-th STBO member; $lb_j$ and $ub_j$ stand for the lower and upper bounds of the $j$-th problem variable, respectively; and $r$ is a random number in the range [0, 1]. Every member of the STBO population is a simulated candidate solution to the problem, and the values proposed by each solution allow it to be evaluated against the objective function. The values computed for the fitness function can be described by a vector through Eq. (37), based on the placement of the candidate solutions within the problem variables:

$$F = \begin{bmatrix} F_1 \\ \vdots \\ F_i \\ \vdots \\ F_N \end{bmatrix} = \begin{bmatrix} F(X_1) \\ \vdots \\ F(X_i) \\ \vdots \\ F(X_N) \end{bmatrix} \tag{37}$$

Here, $F_i$ shows the fitness function value for the $i$-th candidate solution, and $F$ describes the objective function vector.
The fitness function's values are the basic criterion for comparing candidate solutions. The optimal candidate solution, i.e., the best member of the population, is the solution with the highest value of the objective function. Each iteration yields fresh values for the objective function, so the optimal candidate solution must be updated after every cycle; the method's structure ensures that the best candidate result of any iteration is also the best found over all earlier iterations.

The updating of candidate solutions in STBO is accomplished in three stages: training, imitating the instructor's techniques, and practicing.
Stage 1: Training (exploration) In the first stage, the updating of STBO members models how beginner tailors choose a training instructor and learn sewing from them. In general, all STBO members with a better fitness value than a given member are considered candidate training instructors for that member (a novice tailor). The collection of candidate training instructors for the $i$-th STBO member is defined as

$$CTI_i = \left\{ X_k \;\middle|\; F_k > F_i \text{ and } k \neq i \right\}, \quad i = 1, 2, \ldots, N \tag{38}$$

Here, $CTI_i$ shows the collection of all candidate training instructors for the $i$-th STBO member. In the scenario where $CTI_i$ is empty (i.e., $X_i$ is the best member of the population), the sole eligible training instructor is $X_i$ itself, i.e., $TI_i = X_i$. Otherwise, a member of the set $CTI_i$ is randomly chosen as the training instructor of the $i$-th STBO member and is labelled $TI_i$; this process is repeated for every $i$. The chosen instructor teaches the STBO member sewing techniques. Guided by the instructors, the members examine different regions of the search space to locate the primary optimal region; this stage reveals the exploratory (global search) potential of the proposed method. Following this stage of STBO, a new position is first created for each population member:

$$x_{i,j}^{S1} = x_{i,j} + r \cdot \left( TI_{i,j} - I \cdot x_{i,j} \right) \tag{39}$$

Here, $I$ takes integer values generated randomly from the set {1, 2}, and $r$ represents a random number selected uniformly within the range [0, 1]; $x_{i,j}^{S1}$ depicts the $j$-th dimension of the new position $X_i^{S1}$, and $F_i^{S1}$ defines the corresponding objective function value. The new position then replaces the previous position of that population member if it yields a better fitness value. This updating condition is defined by Eq. (40):

$$X_i = \begin{cases} X_i^{S1}, & F_i^{S1} > F_i \\ X_i, & \text{otherwise} \end{cases} \tag{40}$$

Thus, following the first stage of STBO, $X_i^{S1}$ represents the member's new candidate position.
Stage 2: Imitation of the instructor's skills (exploration) In the second stage of updating the STBO learners, the novice tailors are presumed to imitate the performance of their teachers, trying to raise their stitching capability to that of the instructor. Considering that every STBO component reflects a decision variable and every STBO member defines a vector of dimension $m$, this stage assumes that each decision variable reflects a sewing skill. Every STBO member emulates a subset of the selected instructor's talents, $TI_i$. This shifts the population to different coordinates of the search space and again demonstrates the exploration ability of STBO. The set of skills that the $i$-th member imitates is given in Eq. (41):

$$SD_i \subset \{1, 2, \ldots, m\}, \qquad |SD_i| = m_s = \left\lceil m \cdot \left(1 - \frac{t}{T}\right) \right\rceil \tag{41}$$

Here, $SD_i$ defines the randomly chosen collection of decision-variable indexes (i.e., skills) that the student mimics from the teacher, $T$ defines the total count of iterations, $m_s$ defines the count of skills chosen to imitate, and $t$ defines the iteration counter. Based on the simulation of copying these teacher skills, the new position of every STBO member is determined as

$$x_{i,j}^{S2} = \begin{cases} TI_{i,j}, & j \in SD_i \\ x_{i,j}, & \text{otherwise} \end{cases} \tag{42}$$

Here, $x_{i,j}^{S2}$ shows the $j$-th dimension of $X_i^{S2}$, the new position created for the $i$-th STBO member in the second phase of STBO. If this new position improves the fitness function, it replaces the previous position of the associated member:

$$X_i = \begin{cases} X_i^{S2}, & F_i^{S2} > F_i \\ X_i, & \text{otherwise} \end{cases} \tag{43}$$

Here, $F_i^{S2}$ shows the value of the fitness function for $X_i^{S2}$.
Stage 3: Practice (exploitation) The third stage of STBO updating models the trainees' sewing practice, through which their stitching skills improve. In effect, a local search is performed within the neighbourhood of the candidate solutions to find better solutions close to them. This stage demonstrates the exploitation (local search) ability of the proposed method. To describe this stage mathematically, a new position is first produced around every STBO member using Eq. (44) (with an adjustment to keep all newly generated population members inside the specified search space):

$$x_{i,j}^{S3} = x_{i,j} + (1 - 2r) \cdot R \cdot x_{i,j} \tag{44}$$

Here, $r$ is a random number drawn from the range [0, 1] and $R = 0.05$ sets the radius of the local neighbourhood. Next, per Eq. (45), the STBO member's prior position is replaced if the objective function's value improves:

$$X_i = \begin{cases} X_i^{S3}, & F_i^{S3} > F_i \\ X_i, & \text{otherwise} \end{cases} \tag{45}$$

Here, $F_i^{S3}$ shows the value of the fitness function for $X_i^{S3}$, the new position produced for the $i$-th STBO member in the third phase, and $x_{i,j}^{S3}$ shows its $j$-th dimension.
Repetition procedure After all candidate solutions have been processed through the three stages, the first STBO iteration is complete. The update procedure based on Eqs. (38) to (45) is then repeated up to the final iteration. Once STBO has been fully applied to the given problem, the best candidate solution recorded over all iterations is returned as the solution. Algorithm 1 displays the STBO pseudocode.
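The sketch below condenses the three STBO stages of Eqs. (38)-(45) into a Python loop for the single decision variable optimized here, the LSTM hidden-neuron count. `fitness` is a hypothetical stand-in that trains and validates the MLSTM for a given configuration and returns accuracy plus precision as in Eq. (34); the stage-3 radius R = 0.05 is an assumption.

```python
import numpy as np

def stbo(fitness, lb, ub, n=10, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, size=n)                      # Eq. (36)
    f = np.array([fitness(v) for v in x])                # Eq. (37)
    for _ in range(iters):
        for k in range(n):
            # Eq. (38): members with better fitness act as instructors.
            better = np.where(f > f[k])[0]
            teacher = x[rng.choice(better)] if better.size else x[k]
            # Stage 1 - training (exploration), Eqs. (39)-(40).
            cand = np.clip(x[k] + rng.random()
                           * (teacher - rng.integers(1, 3) * x[k]), lb, ub)
            fc = fitness(cand)
            if fc > f[k]:
                x[k], f[k] = cand, fc
            # Stage 2 - imitation, Eqs. (41)-(43): with one decision
            # variable, imitating the instructor copies its value outright.
            ft = fitness(teacher)
            if ft > f[k]:
                x[k], f[k] = teacher, ft
            # Stage 3 - practice (exploitation), Eqs. (44)-(45), R = 0.05.
            cand = np.clip(x[k] * (1 + (1 - 2 * rng.random()) * 0.05), lb, ub)
            fc = fitness(cand)
            if fc > f[k]:
                x[k], f[k] = cand, fc
    return x[np.argmax(f)], f.max()

# Hypothetical usage, where train_mlstm(hn) trains/validates the MLSTM
# with round(hn) hidden neurons and returns accuracy + precision:
# best_hn, best_fit = stbo(lambda hn: train_mlstm(round(hn)), 16, 256)
```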
Experimental results and analysis
Experimental setup
The outputs were analyzed from the skin lesion classification model based on MLSTM developed in MATLAB. All experiments were conducted using MATLAB R2021b on a Windows 10 system with an Intel Core i7 processor and 16 GB RAM. Key MATLAB toolboxes utilized include the Deep Learning Toolbox, Image Processing Toolbox, and Statistics and Machine Learning Toolbox. For the segmentation and classification workflows, built-in functions such as trainNetwork, segnetLayers, lstmLayer, and image augmentation utilities were used, and custom scripts were developed to implement the Sewing Training-Based Optimization (STBO) algorithm from scratch. The population and iteration counts were set to 10 and 100, respectively. The introduced MLSTM-based skin lesion classification model was compared with various state-of-the-art deep learning models, namely NN, CNN, GAN, and LSTM, to demonstrate its superiority. The hyperparameters were carefully selected and optimized to ensure robust performance. A batch size of 32 was employed to balance computational efficiency and gradient stability, while the learning rate was set to 0.001 with the Adam optimizer to facilitate smooth and efficient convergence. The number of hidden neurons in the LSTM layer was optimized to 64 using the STBO algorithm, maximizing accuracy and precision. To prevent overfitting, a dropout rate of 0.2 was applied during training, and the activation functions included ReLU for intermediate layers and SoftMax for the output layer, ensuring non-linearity and probabilistic outputs, respectively. The model was trained for 100 epochs, which allowed sufficient time for convergence without excessive computational overhead. Validation was performed using a 10-fold cross-validation approach, which provided a reliable estimate of the model's generalization by iteratively testing on unseen subsets of the data. This robust hyperparameter tuning and validation strategy ensured the reliability and effectiveness of the proposed model for skin lesion segmentation and classification.
The proposed model, based on Modified Long Short-Term Memory (MLSTM) optimized with Sewing Training-Based Optimization (STBO), was trained and validated to achieve robust performance in skin lesion segmentation and classification. The MLSTM model offers significant advantages, including its ability to capture sequential dependencies and long-range patterns in the extracted features, which are essential for distinguishing subtle variations in skin lesions. The integration of STBO further enhances the model by fine-tuning the hidden neurons of the LSTM, maximizing key metrics such as accuracy and precision. The U-Net segmentation model, employed earlier in the pipeline, provides precise lesion boundary detection, ensuring that the features extracted for classification are highly representative and reliable. Validation of the model was performed using 10-fold cross-validation, which ensures robustness by testing on multiple unseen data subsets, thereby reducing bias and variance in performance estimates. This approach validates the model’s reliability and suitability for real-world applications in skin lesion analysis, making it highly advantageous for clinical and diagnostic purposes.
For segmentation, the models were evaluated using the Dice Coefficient, Jaccard Index, and Pixel-wise Accuracy, which measure the overlap between the predicted and ground truth segmentation masks, ensuring precise lesion boundary detection. For classification, the models were tested using a comprehensive set of metrics, including Accuracy, Precision, Sensitivity, Specificity, F1-Score, Matthews Correlation Coefficient (MCC), True Positive Rate (TPR), and False Positive Rate (FPR). These metrics collectively provide a holistic evaluation of the classification performance, covering aspects such as correctness, reliability, and robustness in distinguishing between normal, benign, and malignant lesions. This detailed evaluation framework ensures a thorough and reliable validation of the proposed model.
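The segmentation metrics named above can be computed from binary masks as in the following sketch (illustrative Python, not the MATLAB evaluation scripts):

```python
import numpy as np

def segmentation_metrics(pred, truth, eps=1e-12):
    """pred, truth: binary segmentation masks of identical shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2.0 * inter / (pred.sum() + truth.sum() + eps)
    jaccard = inter / (union + eps)
    pixel_accuracy = (pred == truth).mean()
    return dice, jaccard, pixel_accuracy
```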
Sample input images from the dataset, together with the corresponding ground truth, pre-processed, and segmented images, are shown in Fig. 6 below.
[See PDF for image]
Fig. 6
Sample experimental images
Performance measures
In this section, popular quantitative measures for classifier performance evaluation are presented. For classification problems, findings are labelled normal or abnormal, referred to as the positive class and negative class, respectively. A prediction may be true or false, signifying a correct or incorrect forecast, respectively. Classification outcomes may therefore be divided into the following four cases.
TP: Accurate forecast of positive class.
TN: Accurate forecast of negative class.
FP: Inaccurate forecast of positive class.
FN: Inaccurate forecast of negative class.
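From these four counts, all of the reported classification metrics follow directly; the sketch below shows the per-class formulas (macro-averaging across the three classes is an assumption about the evaluation protocol):

```python
import numpy as np

def classification_metrics(tp, tn, fp, fn, eps=1e-12):
    precision   = tp / (tp + fp + eps)
    sensitivity = tp / (tp + fn + eps)          # identical to the TPR
    specificity = tn / (tn + fp + eps)
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn + eps),
        "precision":   precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity + eps),
        "mcc": (tp * tn - fp * fn) / (np.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps),
        "tpr": sensitivity,
        "fpr": fp / (fp + tn + eps),
    }
```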
Accuracy analysis
The percentage of correctly classified observations, including both positive and negative classes, is referred to as accuracy. The proposed MLSTM-based skin lesion classification model achieved a final accuracy of 97.20% at the 100th iteration. This value was obtained through rigorous training and validation using a 10-fold cross-validation approach. As shown in Fig. 7 and Table 2, the model maintained consistently high accuracy across all iterations, starting from 95.22% at iteration 20 and reaching 97.20% at iteration 100, confirming its ability to learn and generalize effectively. While comparative improvements over other methods are presented, the absolute accuracy values of the MLSTM-STBO model are explicitly reported to avoid ambiguity.
[See PDF for image]
Fig. 7
Accuracy analysis
Table 2. Accuracy values of the proposed MLSTM-STBO model and baseline methods over 100 iterations
| Methods | Iteration 20 | Iteration 40 | Iteration 60 | Iteration 80 | Iteration 100 |
|---|---|---|---|---|---|
| NN [26] | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 |
| CNN [27] | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 |
| GAN [28] | 0.93 | 0.94 | 0.94 | 0.94 | 0.94 |
| LSTM [29] | 0.94 | 0.94 | 0.94 | 0.93 | 0.94 |
| Proposed method | 0.95 | 0.95 | 0.95 | 0.95 | 0.97 |
Sensitivity analysis
This section compares the sensitivity of the recommended MLSTM-based skin lesion classification model with that of traditional methods, as shown in Fig. 8 and Table 3. Sensitivity quantifies the proportion of correctly recognised positive samples. The suggested model clearly outperforms traditional techniques with a sensitivity of 0.9714, so the suggested strategy can be considered more sensitive than the traditional techniques. With respect to sensitivity, the proposed MLSTM-STBO skin lesion classification model is 5.40%, 5.32%, 5.69%, and 3.84% higher than NN, CNN, GAN, and LSTM, respectively.
[See PDF for image]
Fig. 8
Sensitivity analysis
Table 3. Sensitivity analysis
| Methods | Iteration 20 | Iteration 40 | Iteration 60 | Iteration 80 | Iteration 100 |
|---|---|---|---|---|---|
| NN [26] | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 |
| CNN [27] | 0.91 | 0.92 | 0.92 | 0.92 | 0.92 |
| GAN [28] | 0.92 | 0.92 | 0.92 | 0.91 | 0.91 |
| LSTM [29] | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 |
| Proposed method | 0.93 | 0.93 | 0.93 | 0.93 | 0.97 |
Precision analysis
This section compares the innovative MLSTM-based skin lesion classification technique with widely used conventional methods, as shown in Fig. 9 and Table 4. Precision measures the proportion of relevant samples among the retrieved instances. The recommended model's precision of 0.9704 is clearly higher than that of the conventional methods, making it more advanced with respect to precision. In terms of precision, the proposed MLSTM-STBO skin lesion classification model is 3.01%, 2.93%, 3.28%, and 1.76% higher than NN, CNN, GAN, and LSTM, respectively.
[See PDF for image]
Fig. 9
Precision analysis
Table 4. Precision analysis
| Method | Iter. 20 | Iter. 40 | Iter. 60 | Iter. 80 | Iter. 100 |
|---|---|---|---|---|---|
| NN [26] | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 |
| CNN [27] | 0.93 | 0.94 | 0.94 | 0.94 | 0.94 |
| GAN [28] | 0.94 | 0.94 | 0.94 | 0.93 | 0.93 |
| LSTM [29] | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| Proposed method | 0.95 | 0.95 | 0.95 | 0.95 | 0.97 |
Specificity analysis
This section compares the specificity of the proposed MLSTM-based skin lesion classification model with the existing approaches, as shown in Fig. 10 and Table 5. Specificity measures the proportion of actual negative samples that are correctly identified. With a specificity of 0.9948, the proposed approach clearly outperforms the conventional methods and can be considered more reliable. In terms of specificity, the proposed MLSTM-STBO model is 3.45%, 3.38%, 3.72%, and 1.93% higher than NN, CNN, GAN, and LSTM, respectively.
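Specificity follows the standard definition

$$\text{Specificity} = \frac{TN}{TN + FP}.$$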
Fig. 10. Specificity analysis. [See PDF for image]
Table 5. Specificity analysis
| Method | Iter. 20 | Iter. 40 | Iter. 60 | Iter. 80 | Iter. 100 |
|---|---|---|---|---|---|
| NN [26] | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 |
| CNN [27] | 0.95 | 0.96 | 0.96 | 0.96 | 0.96 |
| GAN [28] | 0.96 | 0.96 | 0.96 | 0.95 | 0.95 |
| LSTM [29] | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 |
| Proposed method | 0.97 | 0.97 | 0.97 | 0.97 | 0.99 |
F1 score analysis
Fig. 11. F1 score analysis. [See PDF for image]
This section compares the F1-score of the proposed MLSTM-based skin lesion classification strategy with that of conventional methods, as shown in Fig. 11 and Table 6. The F1-score is the harmonic mean of precision and sensitivity, balancing false positives against false negatives. With an F1-score of 0.9708, the developed method clearly exceeds the traditional approaches. In terms of F1-score, the proposed MLSTM-STBO model is 4.21%, 4.13%, 4.49%, and 2.92% higher than NN, CNN, GAN, and LSTM, respectively.
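In formula form,

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}.$$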
Table 6. F1 score analysis
| Method | Iter. 20 | Iter. 40 | Iter. 60 | Iter. 80 | Iter. 100 |
|---|---|---|---|---|---|
| NN [26] | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 |
| CNN [27] | 0.92 | 0.93 | 0.93 | 0.93 | 0.93 |
| GAN [28] | 0.93 | 0.93 | 0.93 | 0.92 | 0.92 |
| LSTM [29] | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 |
| Proposed method | 0.94 | 0.94 | 0.94 | 0.94 | 0.97 |
MCC analysis
The Matthews Correlation Coefficient (MCC) analysis compares the five skin lesion classification methods over 100 iterations, as shown in Table 7 and Fig. 12. The proposed method outperforms all other methods, demonstrating an ability to balance true positives and true negatives, as well as false positives and false negatives, at a level unmatched by the other approaches. The existing methods show fair consistency but do not reach such high values. The high MCC indicates strong classification capability and robustness against misclassification errors. In terms of MCC, the proposed MLSTM-STBO model is 4.16%, 2.98%, 1.20%, and 0.77% better than NN, CNN, GAN, and LSTM, respectively.
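MCC is computed from all four confusion counts:

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}.$$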
Fig. 12. MCC analysis. [See PDF for image]
Table 7. MCC analysis
| Method | Iter. 20 | Iter. 40 | Iter. 60 | Iter. 80 | Iter. 100 |
|---|---|---|---|---|---|
| NN [26] | 0.93 | 0.91 | 0.92 | 0.90 | 0.94 |
| CNN [27] | 0.92 | 0.91 | 0.93 | 0.94 | 0.95 |
| GAN [28] | 0.95 | 0.93 | 0.94 | 0.92 | 0.96 |
| LSTM [29] | 0.93 | 0.96 | 0.95 | 0.94 | 0.97 |
| Proposed method | 0.94 | 0.95 | 0.96 | 0.97 | 0.98 |
TPR analysis
Fig. 13. TPR analysis. [See PDF for image]
The True Positive Rate (TPR) analysis is carried out for skin lesion classification using the five techniques over 100 iterations, as shown in Table 8 and Fig. 13. The proposed method attains the highest TPR consistently across all iterations, indicating higher sensitivity in detecting true positives. The existing methods also perform consistently but remain steadily behind the proposed method. These results underline the method's effectiveness in maintaining high sensitivity across iterations, producing reliable lesion detection and limiting missed diagnoses. In terms of TPR, the proposed MLSTM-STBO model is 3.14%, 16.77%, 24.91%, and 8.81% higher than NN, CNN, GAN, and LSTM, respectively.
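In standard form,

$$\text{TPR} = \frac{TP}{TP + FN},$$

which is formally equivalent to sensitivity, although the two are reported separately here.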
Table 8. TPR analysis
| Method | Iter. 20 | Iter. 40 | Iter. 60 | Iter. 80 | Iter. 100 |
|---|---|---|---|---|---|
| NN [26] | 0.92 | 0.89 | 0.91 | 0.90 | 0.93 |
| CNN [27] | 0.79 | 0.81 | 0.80 | 0.78 | 0.82 |
| GAN [28] | 0.75 | 0.74 | 0.73 | 0.72 | 0.76 |
| LSTM [29] | 0.85 | 0.86 | 0.87 | 0.84 | 0.88 |
| Proposed method | 0.92 | 0.93 | 0.94 | 0.95 | 0.96 |
FPR analysis
The False Positive Rate (FPR) analysis compares the five methods over 100 iterations of skin lesion classification, as shown in Table 9 and Fig. 14. The proposed method presents the lowest FPR throughout, showing a better ability to minimize false alarms. The existing methods show higher FPR values, indicating poorer discrimination between classes. These results further strengthen the case for the proposed method in terms of reduced misclassification of benign lesions, improving the reliability of skin lesion diagnosis. In terms of FPR, the proposed MLSTM-STBO model is 25.21%, 59.79%, 48.59%, and 72.36% lower than NN, CNN, GAN, and LSTM, respectively.
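The FPR is defined as

$$\text{FPR} = \frac{FP}{FP + TN} = 1 - \text{Specificity}.$$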
Table 9. FPR analysis
| Method | Iter. 20 | Iter. 40 | Iter. 60 | Iter. 80 | Iter. 100 |
|---|---|---|---|---|---|
| NN [26] | 0.11 | 0.12 | 0.10 | 0.09 | 0.08 |
| CNN [27] | 0.18 | 0.22 | 0.20 | 0.17 | 0.15 |
| GAN [28] | 0.13 | 0.15 | 0.16 | 0.18 | 0.12 |
| LSTM [29] | 0.26 | 0.28 | 0.23 | 0.25 | 0.22 |
| Proposed method | 0.10 | 0.09 | 0.08 | 0.07 | 0.06 |
Fig. 14. FPR analysis. [See PDF for image]
The comparison of segmentation performance in Table 11 reveals that the proposed U-Net model outperformed existing methods such as MFSNet and DSNet on the ISIC 2017 dataset. Specifically, the U-Net achieved a Dice Coefficient of 92.4% and a Jaccard Index of 89.7%, higher than the results reported for MFSNet (91.1% Dice Coefficient, 84.2% Jaccard Index) and DSNet (77.5% Dice Coefficient). The superior performance of the proposed model can be attributed to its architectural design, which effectively captures spatial hierarchies and fine-grained lesion boundaries. Notably, DSNet did not report a Jaccard Index, which limits a comprehensive comparison with this method and highlights the need for consistent evaluation metrics across studies to enable fair benchmarking.
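For reference, both segmentation metrics follow their standard definitions over a predicted mask $A$ and a ground-truth mask $B$:

$$DC = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad JI = \frac{|A \cap B|}{|A \cup B|}.$$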
The overall comparative analysis is summarized in Table 10.
Table 10. Consolidated performance metrics (Final Iteration – 100)
| Method | Accuracy (%) | Sensitivity (%) | Precision (%) | Specificity (%) | F1-Score (%) | MCC (%) | TPR (%) |
|---|---|---|---|---|---|---|---|
| NN [26] | 94.28 | 92.16 | 94.21 | 96.16 | 93.16 | 94.20 | 93.24 |
| CNN [27] | 94.26 | 92.23 | 94.28 | 96.23 | 93.23 | 95.28 | 82.36 |
| GAN [28] | 94.33 | 91.91 | 93.96 | 95.91 | 92.91 | 96.96 | 76.99 |
| LSTM [29] | 94.01 | 93.55 | 95.36 | 97.60 | 94.33 | 97.37 | 88.38 |
| Proposed | 97.20 | 97.14 | 97.04 | 99.48 | 97.08 | 98.12 | 96.17 |
Visualization of the Table 10 values is provided in Fig. 15.
Fig. 15. Comparative analysis of MLSTM-STBO and baseline models (NN, CNN, GAN, LSTM) across eight key evaluation metrics at the final iteration. [See PDF for image]
Fig. 16. Confusion matrices for NN, CNN, and GAN. [See PDF for image]
The confusion matrices for the NN, CNN, and GAN models, shown in Fig. 16, reveal the distribution of classification outcomes for actual positive and negative cases. For the NN model, 552 out of 600 actual positive cases were correctly classified (TP), while 48 were missed (FN), and only 24 false positives (FP) were recorded, indicating strong specificity. As a consistency check, the sensitivity implied by these counts is 552/(552 + 48) = 0.92, matching the value reported in Table 3. The CNN model showed a comparable pattern with 553 true positives and 23 false positives, demonstrating slightly improved sensitivity but a similar error rate in misclassifying negatives. The GAN model, although maintaining competitive true positive identification (551 TP), exhibited a slightly higher false negative count (49 FN) and 25 false positives, suggesting marginally reduced reliability in distinguishing subtle lesion characteristics. These matrices highlight that while all three models perform well in classifying obvious cases, their ability to minimize false negatives, which is crucial in medical diagnostics, varies slightly, with the CNN showing the most balanced confusion profile among the three.
Fig. 17. Confusion matrices for LSTM and the proposed model. [See PDF for image]
The LSTM model, as shown in Fig. 17, demonstrated notable sensitivity with 561 true positives and only 39 false negatives, reflecting strong detection of malignant or abnormal lesions. However, it recorded 15 false positives, which, though relatively low, may still result in unnecessary concern or follow-up in clinical settings. In contrast, the proposed MLSTM-STBO model significantly outperformed all other models, achieving 582 true positives and only 18 false negatives (an implied sensitivity of 582/600 = 97.0%, consistent with the reported value of approximately 97%). Most notably, it drastically reduced false positives to just 4, while correctly identifying 597 true negatives. This near-perfect balance between high sensitivity and exceptional specificity reflects the strength of the model in both detecting actual lesion cases and confidently rejecting benign or normal instances. These results affirm the MLSTM-STBO model's superior generalization, precision, and reliability, making it a highly promising candidate for real-world deployment in computer-aided dermatological diagnosis.
Table 11. Comparison of performance metrics for segmentation
| Study | Method | Dataset | Dice Coefficient (DC) | Jaccard Index (JI) |
|---|---|---|---|---|
| Basak et al. (2022) [30] | MFSNet | ISIC 2017 | 91.1% | 84.2% |
| Hasan et al. (2019) [31] | DSNet | ISIC 2017 | 77.5% | – |
| Proposed Model | U-Net | ISIC 2017 | 92.4% | 89.7% |
For classification, the proposed MLSTM-STBO model demonstrated clear superiority on the HAM10000 dataset, as shown in Table 12, achieving an accuracy of 96.2%, precision of 94.8%, sensitivity of 95.5%, and F1-score of 95.1%. These values surpass the performance of Khan et al.'s method, which reported an accuracy of 91% but did not include precision, sensitivity, or F1-score metrics. The absence of these values limits a comprehensive comparison, as such metrics are crucial for assessing the robustness and reliability of classification models in medical imaging tasks. The proposed model's ability to optimize hidden neurons using STBO and to incorporate diverse features, such as color moments and texture analysis, contributed significantly to its enhanced performance.
Table 12. Comparison of performance metrics for classification
| Study | Method | Dataset | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|---|---|
| Khan et al. (2022) [32] | CNN | HAM10000 | 91.0 | – | – | – |
| Proposed Model | MLSTM-STBO | HAM10000 | 96.2 | 94.8 | 95.5 | 95.1 |
The findings emphasize the effectiveness of the proposed U-Net and MLSTM-STBO models in skin lesion analysis, outperforming existing methods in segmentation and classification tasks while addressing gaps in evaluation metrics observed in other studies. This establishes the proposed framework as a robust and reliable solution for early skin cancer detection.
Ablation studies
The ablation study in Table 13 clearly demonstrates the incremental benefit of combining diverse feature types and applying STBO optimization. Individually, color, texture, and histogram features offered moderate classification performance, with texture features yielding the highest F1-score among single-feature inputs. Integrating all features without STBO improved accuracy to 94.1%, confirming the advantage of feature fusion. The full model, combining all features with the STBO-optimized MLSTM, achieved the highest accuracy (97.2%) and F1-score (97.08%), underscoring the critical role of optimization in fine-tuning the model for robust and balanced skin lesion classification.
Table 13. Ablation study on feature sets and STBO optimization
| Configuration | Accuracy (%) | Sensitivity (%) | Precision (%) | F1-Score (%) |
|---|---|---|---|---|
| Color Features Only | 89.6 | 87.30 | 88.10 | 87.70 |
| Texture Features Only (GLCM) | 91.8 | 89.50 | 89.70 | 90.20 |
| Histogram Features Only (RGB) | 90.4 | 88.10 | 89.00 | 88.50 |
| All Features (No STBO) | 94.1 | 93.20 | 93.50 | 93.30 |
| All Features + MLSTM-STBO (Full Model) | 97.2 | 97.14 | 97.04 | 97.08 |
Table 14. Performance comparison of STBO with other optimization algorithms
| Optimizer | Accuracy (%) | Sensitivity (%) | Precision (%) | F1-Score (%) |
|---|---|---|---|---|
| Genetic Algorithm (GA) | 94.6 | 92.40 | 93.50 | 92.90 |
| Particle Swarm Optimization (PSO) | 95.1 | 93.10 | 94.00 | 93.50 |
| Bayesian Optimization | 95.8 | 94.30 | 94.70 | 94.50 |
| STBO (Proposed) | 97.2 | 97.14 | 97.04 | 97.08 |
The comparative evaluation of STBO against standard optimization algorithms, listed in Table 14, shows that STBO consistently achieves superior performance across all key metrics. While GA, PSO, and Bayesian Optimization improve classification accuracy over baseline training, they fall short of the precision and sensitivity levels reached by STBO. Notably, STBO provides a marked improvement in sensitivity and F1-score, which are critical for detecting true positives in medical diagnosis. This confirms STBO's effectiveness in fine-tuning LSTM parameters for robust skin lesion classification.
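While the full STBO procedure is described in the methodology, the role it plays here can be illustrated schematically. The sketch below is a generic population-based search over the hidden-neuron count only, standing in for the STBO loop rather than reproducing its exact update rules; `train_and_score` is a hypothetical callback that trains the MLSTM with a given hidden size and returns validation accuracy:

```python
import random

def optimize_hidden_neurons(train_and_score, low=32, high=512,
                            pop_size=10, iterations=100):
    """Illustrative population-based search (not the exact STBO rules):
    candidates drift toward the best-scoring hidden size, with a random
    perturbation retained for exploration."""
    population = [random.randint(low, high) for _ in range(pop_size)]
    best, best_score = None, -1.0
    for _ in range(iterations):
        for h in population:
            score = train_and_score(h)   # e.g. validation accuracy
            if score > best_score:
                best, best_score = h, score
        # Move each candidate halfway toward the current best, then jitter.
        population = [
            max(low, min(high, h + (best - h) // 2 + random.randint(-8, 8)))
            for h in population
        ]
    return best, best_score
```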
Misclassification analysis
An analysis of misclassified samples revealed that most errors occurred between benign and malignant classes, indicating visual ambiguity in certain lesion types. Benign lesions with irregular pigmentation or asymmetric borders were occasionally misclassified as malignant, while early-stage melanomas with subtle features were sometimes predicted as benign. These errors highlight the limitations of relying solely on visual features such as color and texture. Incorporating additional clinical metadata, such as patient age, lesion location, and skin type, could provide contextual cues to improve classification reliability. Furthermore, expanding the dataset with more diverse and balanced samples may help reduce such misclassifications and enhance overall model robustness.
Computational analysis
To assess the practicality of the proposed MLSTM-STBO model for real-world deployment, we analyzed its computational efficiency in comparison to the baseline methods. The MLSTM-STBO required approximately 142 s for training, with a peak memory usage of 485 MB and an estimated 6.3 million FLOPs. In contrast, the conventional LSTM consumed 118 s and 440 MB with 5.7 million FLOPs, while the CNN required 101 s, 470 MB, and 8.2 million FLOPs due to its convolutional layers. Despite a slightly higher training time than LSTM, the proposed model achieves significantly better accuracy and remains lightweight enough for deployment in clinical systems with moderate computational resources.
Conclusion
This study proposed a robust and innovative framework for skin lesion segmentation and classification, integrating U-Net for precise lesion segmentation and a Modified Long Short-Term Memory (MLSTM) network optimized using Sewing Training-Based Optimization (STBO). The model demonstrated superior performance across key metrics compared to existing methods. For segmentation, the Dice Coefficient and Jaccard Index reached 92.4% and 89.7%, respectively, reflecting accurate delineation of lesion boundaries. In classification, the proposed model outperformed traditional approaches, achieving an accuracy of 96.2%, sensitivity of 95.5%, precision of 94.8%, F1-score of 95.1%, specificity of 97.4%, and MCC of 94.6%. These results underline the model's reliability and effectiveness in distinguishing between normal, benign, and malignant skin lesions. The inclusion of diverse features, such as color moments, texture via GLCM, and RGB histograms, provided a comprehensive representation of lesion characteristics, further enhancing the classification process. Validated on the benchmark datasets HAM10000 and ISIC 2017, the proposed framework demonstrates its potential for improving clinical diagnostics, offering an interpretable and reliable solution for early skin cancer detection.

However, the study is limited by its reliance on benchmark dermoscopic datasets, which may not fully capture real-world variations in imaging conditions, and by the absence of multimodal data such as patient history or environmental parameters. In addition, potential biases in skin tone representation, particularly within HAM10000, which is known to be skewed toward lighter skin types, pose a critical challenge for the generalizability of the proposed model; such imbalance may limit performance across diverse populations and increase the risk of misdiagnosis in underrepresented skin tones. Furthermore, although the model achieves high accuracy and precision, its adoption in clinical practice is hindered by the "black box" nature of deep learning, which impedes trust and explainability for healthcare professionals.

To mitigate these issues, future work will focus on diversifying training datasets by incorporating images spanning varied Fitzpatrick skin types and leveraging public and clinical data partnerships, incorporating multimodal data, and optimizing the model for low-resource environments. Explainable AI (XAI) techniques, such as Layer-wise Relevance Propagation (LRP), SHAP (SHapley Additive exPlanations), and saliency maps, will be integrated to provide visual or numerical justifications for each classification decision, enhancing clinical transparency, auditability, and acceptance. Addressing these challenges is essential for translating the proposed system into a reliable, equitable, and interpretable clinical decision support tool.
Author contributions
All the authors contributed to this research work in terms of concept creation, conduct of the research work, and manuscript preparation.
Funding
Not applicable.
Data availability
The datasets generated and/or analysed during the current study are available in the HAM10000 data repository.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Xu R, Wang C, Zhang J, Xu S, Meng W, Zhang X. SkinFormer: Learning Statistical Texture Representation With Transformer for Skin Lesion Segmentation, IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 10, pp. 6008–6018, Oct. 2024.
2. Abdelhalim, SA; Mohamed, MF; Mahdy, YB. Data augmentation for skin lesion using self-attention based progressive generative adversarial network. Expert Syst Appl; 2021; 165, 113922.
3. Duggani K, Nath MK. A technical review report on deep learning approach for skin Cancer detection and segmentation, data analytics and management, pp. 87–99, 2021.
4. Al-Masni, MA; Kim, D-H; Kim, T-S. Multiple skin lesions diagnostics via integrated deep convolutional networks for segmentation and classification. Comput Methods Programs Biomed; 2020; 190, 105351.
5. Remya, S; Anjali, T; Sugumaran, V. A novel transfer learning framework for multimodal skin lesion analysis. IEEE Access; 2024; 12, pp. 50738-54.
6. Hanamantray Karaddi, S; Dev Sharma, L; Bhattacharya, A. Softflatten-Net: A deep convolutional neural network design for Monkeypox classification from digital skin lesions images. IEEE Sens J; 2024; 24,
7. Hao, S et al. MEFP-Net: A Dual-Encoding Multi-Scale edge feature perception network for skin lesion segmentation. IEEE Access; 2024; 12, pp. 140039-52.
8. Liu X, Chen C-H, Karvela M, Toumazou C. A DNA-Based intelligent expert system for personalised Skin-Health recommendations. IEEE J Biomedical Health Inf, 2020.
9. Song L, Lin JP, Wang ZJ, Wang H. An End-to-end Multi-task deep learning framework for skin lesion analysis. IEEE J Biomedical Health Inf, 2020.
10. Ghalejoogh, GS; Kordy, HM; Ebrahimi, F. A hierarchical structure based on stacking approach for skin lesion classification. Expert Syst Appl; 2020; 145, 113127.
11. Chatterjee, S; Gil, J-M; Byun, Y-C. Early detection of multiclass skin lesions using transfer Learning-Based IncepX-Ensemble model. IEEE Access; 2024; 12, pp. 113677-93.
12. Ji, Z; Wang, X; Liu, C; Wang, Z; Yuan, N; Ganchev, I. EFAM-Net: A Multi-Class skin lesion classification model utilizing enhanced feature fusion and attention mechanisms. IEEE Access; 2024; 12, pp. 143029-41.
13. Sathvika, VSSBT et al. Pipelined structure in the classification of skin lesions based on Alexnet CNN and SVM model with Bi-Sectional texture features. IEEE Access; 2024; 12, pp. 57366-80.
14. Naqvi, SAR; Toaha Mobashsher, A; Mohammed, B; Foong, D; Abbosh, A. Handheld microwave system for in vivo skin Cancer detection: development and clinical validation. IEEE Trans Instrum Meas; 2024; 73, pp. 1-16.
15. Manoharan JS, et al. A hybrid approach to accelerate the classification accuracy of cervical cancer data with class imbalance problems. International Journal of Data Mining. 2021;25(3/4):234–59.
16. Yao P, et al. Single model deep learning on imbalanced small datasets for skin lesion classification. IEEE Trans Med Imaging. May 2022;41(5):1242–54.
17. Khan MA, Muhammad K, Sharif M, Akram T, de Albuquerque VHC. Multi-class skin lesion detection and classification via teledermatology. IEEE J Biomed Health Inform. 2021;25(12):4267–75.
18. Navarro F, Escudero-Viñolo M, Bescós J. Accurate segmentation and registration of skin lesion images to evaluate lesion change. IEEE J Biomedical Health Inf. March 2019;23(2):501–8.
19. Saleem S, Sharif MI. (2025). An integrated deep learning framework leveraging NASNet and vision transformer with mixprocessing for accurate and precise diagnosis of lung diseases. ArXiv Preprint arXiv:250220570.
20. Xie Y, Zhang J, Xia Y, Shen C. A mutual bootstrapping model for automated skin lesion segmentation and classification. IEEE Trans Med Imaging. July 2020;39(7):2482–93.
21. Nigar N, Umar M, Shahzad MK, Islam S, Abalo D. A deep learning approach based on explainable artificial intelligence for skin lesion classification, in IEEE access, 10, pp. 113715–25, 2022.
22. Tang P, Liang Q, Yan X, Xiang S, Zhang D. GP-CNN-DTEL: Global-Part CNN Model With Data-Transformed Ensemble Learning for Skin Lesion Classification, in IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 10, pp. 2870–2882, Oct. 2020.
23. Bian J, Zhang S, Wang S, Zhang J, Guo J. Skin Lesion Classification by Multi-View Filtered Transfer Learning, in IEEE Access, vol. 9, pp. 66052–66061, 2021.
24. Satheesha, TY; Satyanarayana, D; Prasad, MNG; Dhruve, KD. Melanoma is skin deep: A 3D reconstruction technique for computerized dermoscopic skin lesion classification. IEEE J Translational Eng Health Med; 2017; 5, pp. 1-17.
25. Adegun A, Viriri S. FCN-Based densenet framework for automated detection and classification of skin lesions in dermoscopy images, in IEEE access, 8, pp. 150377–96, 2020.
26. Doukim CA, Dargham JA, Chekima A, Omatu S. Combining neural networks for skin detection. Signal & Image Processing: An International Journal (SIPIJ). 2010;1(2).
27. Noé MC-DS, Otero-Vinas M, Meić I. Convolutional neural network for skin lesion classification: understanding the fundamentals through hands-on learning. Front Med. 2021;8.
28. Izadi S, Mirikharaji Z, Kawahara J, Hamarneh G. Generative adversarial networks to segment skin lesions. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI); April 2018.
29. Shu X, Zhang L, Sun Y, Tang J. Host–parasite: graph LSTM-in-LSTM for group activity recognition. IEEE Trans Neural Netw Learn Syst. 2020.
30. Basak H, Singh A, Dipankar K, Bhattacharyya. MFSNet: A deep learning-based segmentation model for skin lesion detection. arXiv preprint (2022). Accessed January 2025. https://arxiv.org/abs/2203.14341
31. Hasan M, Imran M, Arif, Ahmed F. DSNet: An improved deep learning model for segmentation of dermoscopic images. arXiv preprint (2019). Accessed January 2025. https://arxiv.org/abs/1907.04305
32. Khan M, Arsalan T, Akram M, Sharif M, Saba T. Image classification on skin lesion using HAM10000 dataset. GitHub repository (2021). Accessed January 2025. https://github.com/khanma1962/Image-Classification-on-skin-lesion-HAM10000
33. Moldovanu S, Obreja C-D, Biswas KC, Moraru L. Towards accurate diagnosis of skin lesions using feedforward back propagation neural networks. Diagnostics. 2021;11(6). https://doi.org/10.3390/diagnostics11060936.
34. Tabacaru, G; Moldovanu, S; Raducan, E; Barbu, M. A robust machine learning model for diabetic retinopathy classification. J Imaging; 2024; 10, 1. [DOI: https://dx.doi.org/10.3390/jimaging10010008]
35. Batog F, Moldovanu S. The monitoring of burning buildings with convolutional neural network. Syst Theor Control Comput J. Dec. 2023;3(2):1–8. https://doi.org/10.52846/stccj.2023.3.2.50.
© The Author(s) 2025. This work is published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (http://creativecommons.org/licenses/by-nc-nd/4.0/).