Abstract. Skin cancer is one of the most dangerous cancer types in the world. As with any other cancer type, early detection is the key factor in the patient's recovery. Integrating artificial intelligence with medical image processing can help decrease misdiagnosis. The purpose of this article is to show that deep learning-based image classification can aid doctors in diagnosing skin lesions. The VGG16 and ResNet50 architectures were chosen to examine the effect of CNN networks on the classification of skin cancer types. These networks were trained on the ISIC 2019 Challenge dataset, chosen for its richness of data. Confusion matrices obtained from the experiments show that the ResNet50 architecture achieved 91.23% accuracy and the VGG16 architecture 83.89%. The study shows that deep learning methods can be effectively exploited for skin lesion image classification.
Keywords: Deep Learning, Image classification, ISIC 2019, ResNet50, VGG16
1. Introduction
Cancer is defined as a disease consisting of uncontrolled proliferation of abnormal cells in organs (Cooper, 2019). Cancer is a leading cause of death worldwide, and skin cancer is one of the most common cancer types (WHO, 2022). Skin cancer, caused by damaged DNA, is among the three most dangerous cancer types and can be fatal (Ali et al., 2021). Early detection of skin cancer can greatly increase the cure rate (Codella et al., 2017). Since the time-consuming visual examination of lesions depends on the dermatologist's prior experience, the disease can be misdiagnosed (Kazakevičiūte-Januškevičiene et al., 2015).
Computer-aided diagnostic systems help clinicians improve diagnostic accuracy by providing a second perspective (Yanese and Triantaphyllou, 2019). In this context, image processing and deep learning algorithms can increase dermatologists' performance and minimize diagnostic time in the detection of skin cancer (Hosny et al., 2019).
The use of convolutional neural networks for diagnosis and detection in medicine has increased in recent years. Gessert et al. (2020) used ensembles of multi-resolution EfficientNets to classify the ISIC 2019 dataset, achieving 74.2% sensitivity. Mahbod et al. (2020) used EfficientNet architectures and the ISIC 2017 dataset to study the impact of segmentation on classification; among three segmentation models, manual segmentation achieved the highest classification performance, with an area under the receiver operating characteristic curve (AUC) of 93%. Harangi et al. (2020) created a binary-classification-supported deep learning framework using the GoogLeNet Inception-v3 architecture and performed seven-class classification with an accuracy of approximately 90% for each class. Sekhar et al. (2021) used raw dermoscopic images as input to a CNN and features of segmented dermoscopic images as additional information; the proposed method gives a classification accuracy of 98.13% for identifying melanoma. Maron et al. (2021) implemented the VGG16_BN, ResNet50, DenseNet121 and AlexNet architectures to test the robustness of convolutional neural networks for skin cancer using three datasets (Skin Archive Munich (SAM), SAM-corrupted (SAM-C) and SAM-perturbed (SAM-P)). Calderon et al. (2021) performed classification on the HAM10000 dataset to compare state-of-the-art architectures with a bilinear approach built on the VGG16 and ResNet50 architectures; the new approach achieved higher accuracy than the other methods, with an F1 score of 0.9321. Hasan et al. (2022) used a hybrid convolutional neural network (DermoExpert) to classify the ISIC 2016, ISIC 2017 and ISIC 2018 datasets, achieving AUCs of 0.96, 0.95 and 0.97, respectively. Indraswari et al. (2022) used the MobileNetV2 network to classify melanoma datasets and achieved an accuracy of over 85%.
With these algorithms, results as reliable as an expert's can be obtained and human error can be reduced (Kassam, 2016). Therefore, we performed classification using images from the largest published open dermoscopic dataset, the International Skin Imaging Collaboration (ISIC Archive, 2019) dataset. Due to their success in the literature, and because their depths differ considerably, we chose the VGG16 and ResNet50 architectures for dermoscopic image classification.
In terms of deep learning-based dermoscopic image classification, this study mainly aims to answer the following questions:
* Can architectures with different depths achieve similar accuracy results on the same dataset?
* How does the balance of the dataset affect accuracy?
* Does the accuracy of the architectures increase as the dataset grows?
2. Materials and Methods
2.1. ISIC 2019
The dataset used in this research was obtained from the ISIC 2019 challenge (ISIC Archive, 2019). The dataset contains 9 classes and a total of 25331 images. The classes are Actinic Keratosis (AK), Basal Cell Carcinoma (BCC), Benign Keratosis (BKL), Dermatofibroma (DF), Melanoma (MEL), Nevus (NV), Squamous Cell Carcinoma (SCC), Vascular Lesion (VASC) and None of the Others (UNK). Part of the ISIC dataset was obtained by cropping the lesion areas of images in the HAM10000 dataset to 600x450 pixels, and histogram corrections were applied to some images (Tschandl et al., 2018). Other datasets of skin lesion images also exist, such as BCN_20000 and MSK. BCN_20000 is referred to as a difficult dataset since it consists of lesions occurring in rare regions, with an image size of 1024x1024 (Combalia et al., 2019). The images in the MSK dataset do not have a fixed size; on the other hand, it contains additional information such as the patient's age group, gender and lesion region. However, since these data are missing for some images, the dataset cannot be used in its full form (Gessert et al., 2020).
While determining the classes to be used in the research, homogeneity of the data distribution was taken into consideration. The BCC, MEL and NV classes, which have the highest numbers of images, were selected for this project. Among these, the BCC class has the fewest images, 3323. For this reason, in the first stage of the research, the number of images for the 3 classes was equalized to 3323 for a balanced dataset; this balanced set is referred to as Dataset 1. For the second part of the study, the number of images per class was increased via augmentation to 10911, the size of the largest class (NV); this set is referred to as Dataset 2. Dataset 2 was generated using augmentation techniques such as horizontal flip, vertical flip and random brightness-contrast adjustment.
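The flip and brightness-contrast augmentations used to grow Dataset 2 can be sketched in pure Python on a toy 2D image (the study likely used a library such as Albumentations; the function names and parameter ranges below are illustrative, not the authors' actual pipeline):

```python
import random

def horizontal_flip(img):
    """Flip each row left-to-right (img is a 2D list of pixel values)."""
    return [row[::-1] for row in img]

def vertical_flip(img):
    """Reverse the order of rows (top-to-bottom flip)."""
    return img[::-1]

def brightness_contrast(img, alpha=1.0, beta=0):
    """Scale (contrast, alpha) and shift (brightness, beta) pixels, clipped to [0, 255]."""
    return [[min(255, max(0, round(alpha * p + beta))) for p in row] for row in img]

def augment(img, rng=random):
    """Apply one randomly chosen augmentation to produce an extra sample."""
    op = rng.choice(["hflip", "vflip", "bc"])
    if op == "hflip":
        return horizontal_flip(img)
    if op == "vflip":
        return vertical_flip(img)
    return brightness_contrast(img, alpha=rng.uniform(0.8, 1.2),
                               beta=rng.randint(-20, 20))
```

Flips are label-preserving for lesion images because a mole has no canonical orientation, which is why they are a safe way to triple the minority classes.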
The number of original images, Dataset 1 images and Dataset 2 images belonging to the classes in the dataset are shown in Table 1. Examples from the classes in the dataset are shown in Figure 1.
2.2. Architectures
In this study, the ResNet and VGG architectures were used to classify skin lesion images. The ResNet architecture (He et al., 2016) made its name by winning the 2015 ImageNet classification challenge. The ResNet50 architecture consists of 48 convolution layers, one max-pooling layer and one average-pooling layer. The input image size is 224x224; the first layer is a convolution layer with a 7x7 kernel, followed by a max-pooling layer with a 3x3 kernel. The most important feature of the ResNet architecture is its residual learning method, realized through residual blocks (Figure 2). Each residual block consists of 3 convolution layers with kernel sizes of 1x1, 3x3 and 1x1, respectively. Finally, after the residual blocks, there is an average-pooling layer and a fully connected layer with 1000 neurons (Calderon et al., 2021).
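The essence of residual learning is that each block outputs F(x) + x rather than F(x) alone, so a block can fall back to the identity mapping by driving F toward zero. A minimal numeric sketch (the `transform` argument stands in for the 1x1/3x3/1x1 convolution stack; it is a placeholder, not ResNet's actual layers):

```python
def residual_block(x, transform):
    """Core idea of a ResNet block: output = F(x) + x (identity skip connection).

    `transform` stands in for the learned convolution stack; here it is
    any function mapping a feature vector to a vector of the same length.
    """
    fx = transform(x)
    return [f + xi for f, xi in zip(fx, x)]

# If the transform learns to output zeros, the block reduces to the
# identity mapping, which is what keeps very deep networks trainable.
identity_out = residual_block([1.0, 2.0, 3.0], lambda v: [0.0] * len(v))
```

Because gradients flow through the skip connection unchanged, stacking 50 such blocks avoids the degradation that plain deep stacks suffer.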
The VGG architecture was first introduced by Simonyan and Zisserman (2015) in the ImageNet classification challenge. The VGG16 architecture consists of 16 weight layers: 13 convolution layers and 3 fully connected layers. The convolution layers form 5 convolution blocks, and the input size is 224x224 pixels (Figure 3). Each convolution block contains convolution layers followed by a max-pooling layer; the convolution kernel size is 3x3 and the max-pooling kernel size is 2x2. After the convolution blocks, there are 3 fully connected layers with 4096, 4096 and 1000 neurons (Göçeri, 2019). A Softmax activation is located on the final layer, as can be seen in Figure 3.
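A quick sanity check on these dimensions, assuming the standard VGG16 configuration: the padded 3x3 convolutions preserve spatial size, so only the five 2x2/stride-2 max pools shrink the 224x224 input, leaving a 7x7 map of 512 channels before the first fully connected layer:

```python
def vgg16_feature_map_size(input_size=224, pool_stages=5):
    """Each of VGG16's 5 blocks ends with a 2x2/stride-2 max pool that
    halves the spatial resolution; padded 3x3 convs leave it unchanged."""
    size = input_size
    for _ in range(pool_stages):
        size //= 2
    return size

side = vgg16_feature_map_size()   # 224 -> 112 -> 56 -> 28 -> 14 -> 7
flattened = side * side * 512     # 512 channels after the fifth block
```

The flattened size (25088) feeding a 4096-neuron layer is where most of VGG16's ~138M parameters sit, which is one reason it is so much heavier than ResNet50 despite being shallower.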
The aim is to apply the deep learning approaches described above to lesion classification from dermoscopic images, which may be captured at various angles and under varied lighting conditions.
Information about the hardware on which the training was carried out is shown in Table 2. The hyperparameters of the architectures, determined empirically considering hardware limitations, are shown in Table 3.
3. Results and Discussion
The datasets were split 70%, 20% and 10% into train, validation and test sets, respectively. In Dataset 1, this yields 6978, 1993 and 997 images for the train, validation and test sets, respectively. In Dataset 2, it yields 22913, 6547 and 3273 images, respectively.
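A 70/20/10 split can be sketched as follows (the paper does not describe its exact splitting procedure, so this shuffle-and-slice approach and the seed are assumptions; its train/validation counts match the reported 6978/1993, with the test remainder differing by at most one image due to rounding):

```python
import random

def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle and partition items into train/validation/test subsets."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder absorbs rounding
    return train, val, test

# Dataset 1 holds 3 balanced classes of 3323 images = 9969 in total.
train, val, test = split_dataset(list(range(3 * 3323)))
```

In practice a stratified split (equal class proportions in each subset) would preserve the balance the authors built into Dataset 1; the sketch above splits globally for brevity.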
Overall test accuracies of the VGG16 and ResNet50 architectures are shown in Table 4. The test sets of Dataset 1 and Dataset 2 were also cross-compared: both networks were trained on each dataset and evaluated on the test sets of both Dataset 1 and Dataset 2. The results show that increasing the number of images in Dataset 2 barely affects the performance of VGG16, as its overall accuracy on the Dataset 2 test set increased by only about 1%. In contrast, ResNet50 performed significantly better when trained on Dataset 2, with overall accuracy on the Dataset 2 test set increasing by almost 10%. For both networks, training on the larger Dataset 2 did not transfer well to the smaller Dataset 1 test set: overall accuracy decreased by approximately 14% for VGG16 and 13% for ResNet50.
The confusion matrix (Table 5) shows the relationships between the test classes resulting from predictions on data whose true classes are known. Additionally, Table 6 shows the precision, recall and F1 values for each class in all experiments. Both tables show that the class confused most often, which reduced overall accuracy, was the MEL class, as it has the lowest F1 value; the BCC and NV classes were better distinguished. In terms of F1 values, ResNet50 outperformed VGG16 for all classes in all experiments. In line with the overall accuracy, the best results were obtained with ResNet50 trained on Dataset 2. Some true positive (TP) examples are shown in Figure 4.
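The per-class precision, recall and F1 values in Table 6 follow directly from the confusion matrix: for class i, true positives are the diagonal entry, false negatives the rest of row i, and false positives the rest of column i. A minimal sketch with illustrative numbers (the matrix below is invented for demonstration, not the paper's actual counts):

```python
def per_class_metrics(cm, labels):
    """Compute precision, recall and F1 per class from a confusion matrix.

    cm[i][j] = number of samples with true class i predicted as class j.
    """
    metrics = {}
    n = len(labels)
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i][j] for j in range(n)) - tp   # row total minus diagonal
        fp = sum(cm[j][i] for j in range(n)) - tp   # column total minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

# Illustrative 3-class matrix in the paper's class order.
cm = [[90, 5, 5],
      [10, 80, 10],
      [2, 3, 95]]
scores = per_class_metrics(cm, ["BCC", "MEL", "NV"])
```

Because F1 is the harmonic mean of precision and recall, a class such as MEL that both leaks predictions to other classes and attracts them will show the lowest F1 even when overall accuracy stays high.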
Considering the confusion matrices (Table 5) and the incorrectly predicted images (Figure 5), the MEL class warrants further investigation. Even though we created a balanced dataset with an equal number of images per class, this does not appear sufficient for MEL classification.
4. Conclusion
As a result of the study, it was observed that the accuracy of the ResNet50 architecture increased with the number of images. However, the accuracy of the VGG16 architecture decreased when it was trained with more data. For this reason, VGG16 does not provide the desired performance on our dataset.
The performance of ResNet50 on MEL classification can be improved further by increasing the number of images. For future studies, we aim to apply the recently popular vision transformers and semantic segmentation of skin lesions to extract lesion morphology and boundaries. Thus, a semantic segmentation-based computer-aided diagnosis approach will be developed to give physicians a second opinion to improve their diagnoses.
References
Ali S., Miah S., Haque J., Rahman M., Islam K. (2021). An enhanced technique of skin cancer classification using deep convolutional neural network with transfer learning models, Machine Learning with Applications, 5, 100036.
Calderon C., Sanchez K., Castillo S., Arguello H. (2021). BILSK: A bilinear convolutional neural network approach for skin lesion classification, Computer Methods and Programs in Biomedicine Update 1
Codella, N., Nguyen, Q. B., Pankanti, S., Gutman, D., Helba, B., Halpern, A., Smith, J. R. (2017). Deep Learning Ensembles for Melanoma Recognition in Dermoscopy Images. IBM Journal of Research and Development 61(4/5), 1-15.
Combalia, M., Codella, N. C. F., Rotemberg, V., Helba, B., Vilaplana, V., Reiter, O., Carrera, C., Barreiro, A., Halpern, A. C., Puig, S., Malvehy J. (2019). BCN20000: Dermoscopic Lesions in the Wild, arxiv: 1908.02288
Cooper, G. M. (2019) The Cell: A Molecular Approach, Sinauer Associates, Oxford University Press
Gessert, N., Nielsen, M., Shaikh, M., Werner, R., Schlaefer, A. (2020). Skin lesion classification using ensembles of multi-resolution EfficientNets with meta data, MethodsX 7.
Göçeri, E. (2019). Analysis of Deep Networks With Residual Blocks and Different Activation Functions: Classification of Skin Diseases. 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA), IEEE, pp. 1-6.
Harangi B., Baran A., Hajdu A., (2020), Assisted deep learning framework for multi-class skin lesion classification considering a binary classification support, Biomedical Signal Processing and Control 62
Hasan, K., Elahi, T. E., Alam, A., Jawad, T., Martí, R. (2022). DermoExpert: Skin lesion classification using a hybrid convolutional neural network through segmentation, transfer learning, and augmentation, Informatics in Medicine Unlocked, 28, 100819.
He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep Residual Learning for Image Recognition, IEEE Conference on Computer Vision and Pattern Recognition, 770-778.
Hosny, K. M., Kassem, M. A., Foaud, M. M. (2019) Skin Cancer Classification using Deep Learning and Transfer Learning, 9th Cairo International Biomedical Engineering Conference (CIBEC), 20-22 Dec. 2018, Cairo, Egypt
Indraswari, R., Rokhana, R., Herulambang, W. (2022). Melanoma image classification based on MobileNetV2 network, Sixth Information Systems International Conference (ISICO 2021).
ISIC Archive (2019). International Skin Imaging Collaboration: Melanoma Project website [Online]. https://isic-archive.com
Kassam, A. (2016), Segmentation Of Skin Cancer By Using Image Processing Techniques, Master Thesis, Yıldız Technical University Department Of Computer Engineering, Istanbul
Kazakevičiūte-Januškevičiene, G., Ušinskas, A., Januškevičius, E., Ušinskiene, J. (2015). Region-based Annotations for the Medical Images, Baltic Journal of Modern Computing 3(4), 248-267.
Mahbod, A., Tschandl, P., Langs, G., Ecker, R., Ellinger, I. (2020). The effects of skin lesion segmentation on the performance of dermatoscopic image classification, Computer Methods and Programs in Biomedicine 197.
Sekhar, K. S. R., Tummala, R. B., Goriparthi, P., Kotra, V. (2021). Dermoscopic image classification using CNN with handcrafted features, Journal of King Saud University - Science 33.
Maron, R.C., Schlager, J.G., Hanggenmüller, S., Kalle, C. von, Utikal, J.S., Meier, F., Gellrich, F.F. et al. (2021), A benchmark for neural network robustness in skin cancer classification, European Journal of Cancer 155, 191-199.
Simonyan K., Zisserman A., (2015), Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
Tschandl P., Rosendahl C., Kittler H. (2018). The HAM10000 dataset, a large collection of multisource dermatoscopic images of common pigmented skin lesions, (eng), Scient. Data 5, 180161.
WHO (2022). World Health Organization - Cancer, https://www.who.int/news-room/fact-sheets/detail/cancer, Access Date: 10 Feb 2022.
Yanese, J., Triantaphyllou, E. (2019). A systematic survey of computer-aided diagnosis in medicine: Past and present developments, Expert Systems with Applications 138, 112821.
Received May 31, 2022, accepted June 17, 2022
© 2022. This work is published under the Creative Commons Attribution-ShareAlike 4.0 License (https://creativecommons.org/licenses/by-sa/4.0).
Author affiliations
1 Dermatology Clinic (M.D.), Istanbul, Turkey
2 Yildiz Technical University, Faculty of Civil Engineering, Department of Geomatics Engineering, Istanbul, Turkey