1. Introduction
Breast cancer (BC) is one of the most critical and common diseases affecting women worldwide; according to the American Institute for Cancer Research [1], there were 2 million new cases in 2018. Breast cancer is the fifth leading cause of death in women compared with other types of cancer. In breast cancer, cells in the breast tissue grow abnormally, and the rate of affected cells gradually increases, causing the disease to progress. Breast cancer is a malignant tumor that develops in breast cells. A group of dividing cells forms a lump or mass of extra tissue called a tumor, and tumors can be either cancerous (malignant) or noncancerous (benign). In countries with advanced medical technology, the 5-year survival rate of early-stage breast cancer is 80–90%, but it drops to 24% when the cancer is diagnosed at a later stage [2]. Various invasive techniques have been used for the diagnosis of breast cancer. In the biopsy technique [3], breast tissue is collected for testing, and the results are highly accurate; however, taking a biopsy from the breast is painful for the patient. Another diagnosis technique is mammography [4], in which a 2-dimensional (2D) projection image of the breast is produced; however, mammography does not diagnose benign tumors effectively. Another technique for examining the breast is magnetic resonance imaging (MRI) [5], which is a very complex test that provides excellent 3-dimensional (3D) images and displays dynamic functionality.
These invasive diagnostic techniques are complex to conduct, their results do not always diagnose breast cancer effectively and accurately, and they require more time to generate results [6].
To resolve these complexities of invasive methods for the diagnosis of breast cancer, noninvasive techniques such as machine learning are more effective and reliable. Machine learning techniques have been used in the literature to classify breast tissue as either malignant or benign. The related literature on machine learning techniques for the diagnosis of breast cancer is briefly reviewed in this study.
Azar and El-Said [4] proposed a technique for the diagnosis of breast cancer. They used three classification techniques: radial basis function (RBF), probabilistic neural network (PNN), and multilayer perceptron (MLP). These classifiers were trained and tested on a breast cancer dataset, and performance evaluation metrics such as accuracy, specificity, and sensitivity were used for evaluation. The MLP obtained 97.80% and 97.66% classification accuracy for training and testing, respectively. In another study, Aličković and Subasi [7] proposed a breast cancer prediction system using two Wisconsin Breast Cancer (WBC) datasets along with a genetic algorithm (GA) for feature selection and a rotation forest (RF) classifier for classification. The RF obtained 99.48% classification accuracy on the features selected by the GA. Ahmad et al. [8] proposed a diagnosis system, GA-MOO-NN, for breast cancer diagnosis, in which the GA was used to select optimal features. They split the dataset into three parts: 50% for training, 25% for testing, and 25% for validation. The proposed technique achieved accuracies of 98.85% and 98.10% in the best and average cases, respectively. Hasan et al. [9] proposed a technique for the diagnosis of breast cancer using symbolic regression of multigene genetic programming; with 10-fold cross-validation, it obtained 99.28% accuracy. Albrecht et al. [10] proposed a technique to diagnose breast cancer and achieved 98.8% accuracy. Peña-Reyes and Sipper [11] proposed a classification technique based on a fuzzy-GA method and achieved 97.36% accuracy. Akay [12] proposed a breast cancer diagnosis system using the F-score method for feature selection together with a support vector machine (SVM) and obtained good performance results. Zheng et al. [13] used a K-means algorithm for feature selection and extraction, combined with SVM for the classification of benign and malignant breast tumors; the proposed technique achieved high classification accuracy with low computational time. Ramadevi [14] used hybridized principal component analysis (PCA) combined with different classifiers on different breast cancer datasets and achieved good accuracy. In [15], the author proposed a technique based on a memetic Pareto artificial neural network for the detection of breast cancer; the experimental results demonstrated good classification accuracy with very low computational time. Marcano-Cedeño et al. [16] proposed a method for breast cancer diagnosis using an artificial metaplasticity multilayer perceptron and obtained 99.26% classification accuracy. Liu et al. [17] proposed a breast cancer prediction technique based on a decision tree and applied an undersampling technique to balance the training data; the experimental results show that the proposed method achieved very good accuracy. Onan [18] designed an intelligent technique for breast cancer detection that used fuzzy-rough instance selection and consistency-based feature selection, with a fuzzy-rough nearest neighbor algorithm for detection. Sheikhpour et al. [19] designed a technique based on particle swarm optimization integrated with nonparametric kernel density estimation for breast cancer prediction.
Rasti et al. [20] designed a breast cancer diagnosis technique using a mixture ensemble of convolutional neural networks and achieved 96.39% accuracy. Ani et al. [21] proposed an IoT-based patient monitoring and diagnostic prediction tool using an ensemble classifier, and the system achieved 93% accuracy. Yang et al. [22] proposed an IoT cloud-based wearable ECG monitoring system for smart healthcare, and the proposed system performed very well in the diagnosis of diseases.
The major aim of this article is to propose an IoT-based predictive system based on machine learning to successfully distinguish people with breast cancer from healthy people. The machine learning predictive model SVM was used to classify breast cancer subjects into malignant and benign. The recursive feature elimination (RFE) algorithm was adopted for the selection of features that improve the classification performance of the SVM classifier. We adopted RFE for feature selection in this study because the classification performance of RFE-based feature selection compares well with other methods for classifying BC and healthy people. Related works have used other feature selection algorithms such as LASSO, mRMR, LLBFS [23], Relief with BFO [24], Relief [25], and a two-stage feature selection method [26]. The training/testing split validation method was used to select the best hyperparameters for model evaluation. Performance evaluation metrics such as classification accuracy, sensitivity, specificity, F1-score, Matthews's correlation coefficient (MCC), and model execution time were used to check the performance of the proposed system. The proposed system has been tested on the BC dataset available at the UCI repository.
The important contributions of this research study are as follows:
(1)
Breast cancer detection in the IoT health environment.
(2)
The modified RFE algorithm is used for feature selection, and the SVM classifier is trained and tested on the selected features. The performance of SVM is also checked on the full feature set and compared with its performance on the best-selected feature subset, at which the classifier achieved optimal performance.
(3)
Finally, we concluded that the proposed system can be used for effective diagnosis of BC. Furthermore, it can be incorporated easily in the healthcare system for BC diagnosis.
The remaining sections of this article are organized as follows. Section 2 describes the BC dataset, the preprocessing techniques, the RFE feature selection algorithm, and the SVM classification algorithm in detail; the validation technique and performance evaluation metrics are also discussed in this section. In Section 3, the BC diagnostic experimental results are analyzed and discussed in detail. Finally, the conclusion and future work directions are presented in Section 4.
2. Research Materials and Methods
2.1. Dataset
The dataset “Wisconsin Diagnostic Breast Cancer (WDBC)” was created by Dr. William Wolberg at the University of Wisconsin and is available at the UCI machine learning repository [27]. It was used in this study to design the machine learning-based system for the diagnosis of breast cancer. The dataset contains 569 subjects and 32 attributes, of which 30 are real-valued features. The target output label, diagnosis, has two classes representing malignant and benign subjects. The class distribution is 357 benign and 212 malignant subjects. Thus, the dataset is a 569 × 32 feature matrix.
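For illustration, the sketch below shows one way to load this dataset in Python. It assumes scikit-learn (the paper names only Python, not a specific library); scikit-learn ships the same 569-subject WDBC data without the ID column, so loading the raw UCI file with pandas would work equally well.

```python
# Illustrative loading of the WDBC data; scikit-learn's copy omits the ID column.
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target        # X has shape (569, 30): the real-valued features
print(X.shape)                       # (569, 30)
print(data.target_names)             # ['malignant' 'benign'] -- note: here 0 = malignant,
                                     # 1 = benign, the reverse of the coding in Table 2
```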
2.2. Method Background
In the following subsections, the background of the proposed method is discussed in detail.
2.2.1. Dataset Preprocessing
Before applying machine learning algorithms to classification problems, data preprocessing is necessary. Preprocessed data [28, 29] reduce the computation time of the classifier and increase its classification performance. Methods such as missing value detection, the standard scaler, and the min-max scaler are widely applied for dataset preprocessing. The standard scaler ensures that every feature has mean 0 and variance 1, so that all features are on the same scale. The min-max scaler shifts the data so that all features lie between 0 and 1. Any feature with an empty value in a row is removed from the dataset.
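A minimal sketch of these preprocessing steps is shown below, assuming pandas and scikit-learn; the column name `diagnosis` is introduced here only for illustration and is not part of the original attribute names.

```python
# Sketch of the preprocessing steps described above: missing-value removal,
# standard scaling (mean 0, variance 1), and min-max scaling to [0, 1].
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler, MinMaxScaler

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["diagnosis"] = data.target                     # hypothetical target column name

df = df.dropna()                                  # drop rows containing empty values
X = df.drop(columns=["diagnosis"]).values

X_std = StandardScaler().fit_transform(X)         # each feature: mean 0, variance 1
X_minmax = MinMaxScaler().fit_transform(X)        # each feature rescaled to [0, 1]
```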
2.2.2. Modified Recursive Feature Elimination Algorithm (RFE)
Feature selection can be perceived as a method for selecting a feature subset from the available feature set. The data space is very large, and subspace/feature selection is therefore critically necessary. Feature selection has two advantages: first, it improves the accuracy of the classifier, and second, it reduces the computation time of the machine learning algorithm [6]. RFE is a feature selection algorithm that fits a model and removes the least relevant feature or features until the specified number of features is reached; a model is then built on the features that remain in the set. The remaining features are those that contribute most to the target label. The recursive feature elimination method for the support vector machine [30] can be implemented in the following iterative steps (Algorithm 1).
Algorithm 1: Modified recursive feature elimination.
Begin
(1)
Train SVM model on the training dataset
(2)
Compute the performance metric values such as accuracy, specificity, sensitivity, and F1-score
(3)
Determine which feature is the least important in making the prediction on the testing dataset and eliminate this feature from the feature set.
(4)
The feature set is now reduced by one feature; repeat steps 1–3 until the specified number of features is reached
(5)
Select the feature set that gives the best score (highest or lowest, depending on the metric).
(6)
Finish
The iterative procedure of the recursive feature elimination algorithm is summarized in Algorithm 1.
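A hedged implementation sketch of this procedure is given below, using scikit-learn's `RFE` wrapper around a linear SVM (one possible implementation, not necessarily the exact code used by the authors); `n_features_to_select = 18` mirrors the best subset reported later in Table 8.

```python
# Recursive feature elimination with a linear SVM as the base estimator
# (RFE ranks features using the fitted SVM coefficients).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

rfe = RFE(estimator=SVC(kernel="linear", C=1), n_features_to_select=18, step=1)
rfe.fit(X, y)

print(rfe.support_)            # boolean mask of the retained features
print(rfe.ranking_)            # 1 = selected; larger ranks were eliminated earlier
X_reduced = rfe.transform(X)   # (569, 18) matrix containing only the selected features
```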
2.2.3. Classification
In this study, the following classifier was used for the classification of BC and healthy people. A brief theoretical and mathematical background of the classifier is presented below.
The support vector machine (SVM) is a machine learning algorithm that has been widely used for classification problems [24, 31–35]. SVM uses a maximum-margin strategy, which is transformed into solving a quadratic programming problem. Due to its high classification performance, SVM has been used in many applications [6, 34, 35]. In a binary classification problem, the instances are separated by a hyperplane $w^{T}x + b = 0$, where $w$ is the weight vector and $b$ is the bias.
In the nonlinear scenario, the kernel trick is applied, and the decision function can be written as $f(x) = \operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i} y_{i} K(x_{i}, x) + b\right)$, where $\alpha_{i}$ are the Lagrange multipliers, $y_{i}$ the class labels, and $K(\cdot,\cdot)$ the kernel function.
Positive semidefinite functions that obey Mercer's condition can be used as kernel functions [33]; for example, the polynomial kernel is expressed as $K(x_{i}, x_{j}) = (x_{i} \cdot x_{j} + 1)^{d}$, where $d$ is the polynomial degree.
The Gaussian (RBF) kernel is expressed as $K(x_{i}, x_{j}) = \exp\left(-\gamma \|x_{i} - x_{j}\|^{2}\right)$.
There are two parameters that should be determined in the SVM model: the penalty parameter C and the kernel parameter γ.
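As an illustration, the four kernels compared later in Section 3 could be instantiated in scikit-learn as follows, using the C and γ values reported in Tables 4–7; this is a sketch under those assumptions, not the authors' original code, and the polynomial degree is an added assumption.

```python
# Illustrative instantiation of the four SVM kernels with C = 1 and gamma = 0.0001
# (the values reported in Tables 4-7); gamma is ignored by the linear kernel.
from sklearn.svm import SVC

svm_linear  = SVC(kernel="linear",  C=1)
svm_rbf     = SVC(kernel="rbf",     C=1, gamma=0.0001)
svm_poly    = SVC(kernel="poly",    C=1, gamma=0.0001, degree=3)  # degree is an assumption
svm_sigmoid = SVC(kernel="sigmoid", C=1, gamma=0.0001)
```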
2.2.4. Data Partition
The dataset was divided into 70% for training the classifier and 30% for validation of the classifier.
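A minimal sketch of this 70%/30% partition, assuming scikit-learn's `train_test_split`, is shown below; the random seed and class stratification are illustrative choices that are not stated in the paper.

```python
# 70% of subjects for training, 30% held out for validation of the classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
```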
2.2.5. Performance Evaluation Metrics
Evaluation metrics are used to evaluate the performance of the classifier. In this study, several performance evaluation metrics were used. Table 1 shows the confusion matrix of the binary classification problem.
Table 1
Confusion matrix [6, 12, 36, 37].
— | Predicted BC subject | Predicted healthy subject |
Actual BC subject | TP | FN |
Actual healthy subject | FP | TN |
According to Table 1, the following quantities are computed, from which the metrics in equations (6)–(10) are derived.
(1)
TP (true positive) if a BC subject is classified as BC
(2)
TN (true negative) if a healthy subject is classified as healthy
(3)
FP (false positive) if a healthy subject is classified as BC
(4)
FN (false negative) if a BC subject is classified as healthy
(1) Classification Accuracy. Accuracy shows the overall performance of the classification system, i.e., the proportion of subjects that are classified correctly:
$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (6)
(2) Sensitivity/Recall. Sensitivity is the ratio of correctly classified BC subjects to the total number of BC subjects:
$\text{Sensitivity} = \dfrac{TP}{TP + FN}$ (7)
(3) Specificity. Specificity is the proportion of healthy subjects for which the diagnostic test is negative:
$\text{Specificity} = \dfrac{TN}{TN + FP}$ (8)
(4) F1-Score. The traditional F-measure or balanced F-score (F1-score) is the harmonic mean of precision and recall:
$\text{F1} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ (9)
(5) MCC. MCC represents the prediction ability of a classifier and takes values in the interval [−1, +1]:
$\text{MCC} = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$ (10)
An MCC of +1 means that the classifier's predictions are perfect, −1 indicates completely wrong predictions, and a value near 0 means that the classifier generates essentially random predictions.
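The sketch below shows one way to compute these metrics from the confusion-matrix entries of Table 1 in Python with scikit-learn and NumPy; it assumes binary labels with 1 denoting a BC subject (the positive class), which differs from some public copies of the dataset, as noted in the comments.

```python
# Metrics of equations (6)-(10) computed from the confusion matrix of Table 1.
# Assumption: labels are coded 1 = BC subject (positive), 0 = healthy subject.
import numpy as np
from sklearn.metrics import confusion_matrix

def evaluate(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    accuracy    = (tp + tn) / (tp + tn + fp + fn)                    # equation (6)
    sensitivity = tp / (tp + fn)                                     # equation (7), recall
    specificity = tn / (tn + fp)                                     # equation (8)
    precision   = tp / (tp + fp)
    f1  = 2 * precision * sensitivity / (precision + sensitivity)    # equation (9)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))               # equation (10)
    return accuracy, sensitivity, specificity, f1, mcc
```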
2.3. Proposed Predictive System for Breast Cancer Prediction
The procedure of the proposed system for breast cancer prediction is given below (Algorithm 2). The flowchart of the proposed system is given in Figure 1.
Algorithm 2: Breast cancer predictive system.
Begin
(1)
Step 1: Preprocess the breast cancer dataset using the preprocessing techniques
(2)
Step 2: Select the best feature set using the RFE algorithm
(3)
Step 3: Partition the data using the training/testing split method
(4)
Step 4: Train the predictive model SVM on the training dataset
(5)
Step 5: Validate the predictive model SVM using the testing dataset
(6)
Step 6: Compute the model performance evaluation metrics such as accuracy, sensitivity, specificity, MCC, F1-score, and execution time
(7)
Step 7: Finish
[figure omitted; refer to PDF]
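A simplified end-to-end sketch of Algorithm 2, assuming scikit-learn, is given below. For brevity, RFE is fitted on the full standardized data before the split, whereas fitting it only on the training fold would avoid information leakage; the 18-feature subset and the parameter values mirror Tables 4 and 8, and in scikit-learn's copy of WDBC label 1 denotes a benign subject, so recall and F1 below treat "benign" as the positive class unless the labels are recoded.

```python
# Simplified sketch of the proposed breast cancer predictive system (Algorithm 2).
import time
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, recall_score, f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 1: preprocessing
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Step 2: best feature subset via RFE (18 features, as in Table 8)
X_sel = RFE(SVC(kernel="linear", C=1), n_features_to_select=18).fit_transform(X, y)

# Step 3: 70/30 training/testing split
X_tr, X_te, y_tr, y_te = train_test_split(
    X_sel, y, test_size=0.30, random_state=42, stratify=y)

# Steps 4-5: train and validate the SVM predictive model
clf = SVC(kernel="linear", C=1)
start = time.perf_counter()
clf.fit(X_tr, y_tr)
elapsed = time.perf_counter() - start
y_pred = clf.predict(X_te)

# Step 6: performance evaluation metrics
print("accuracy :", accuracy_score(y_te, y_pred))
print("recall   :", recall_score(y_te, y_pred))
print("F1-score :", f1_score(y_te, y_pred))
print("MCC      :", matthews_corrcoef(y_te, y_pred))
print("time (s) :", elapsed)
```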
3. Experimental Results Analysis and Discussion
In this section, we conduct the experiments for breast cancer prediction using the RFE feature selection algorithm for appropriate feature selection and the machine learning predictive model SVM for prediction. The dataset “Wisconsin Diagnostic Breast Cancer (WDBC)”, created by Dr. William Wolberg at the University of Wisconsin and available at the UCI machine learning repository [27], is used in these experiments; it is split into 70% for training and 30% for testing. To check the performance of the predictive model, various performance evaluation measures are used, such as classification accuracy, specificity, sensitivity, MCC, F1-score, and execution time; all performance metrics are computed automatically. Before applying the feature selection algorithm and the predictive model, preprocessing techniques are applied to the dataset to improve its quality. Furthermore, all experimental results are reported in tables, and some figures are also provided for better understanding. All experiments were conducted in Python on an Intel(R) Core™ i5-2400 CPU @ 3.10 GHz with 4 GB RAM, running Windows 10.
3.1. Results of Preprocessing on the Dataset
The information and description of the 569 instances and 32 attributes of the dataset are given in Table 2, along with some statistical measures that are computed automatically. The class distribution of 357 benign and 212 malignant subjects is shown in Figure 2.
Table 2
Feature information and description with some statistical measures of Wisconsin Diagnostic Breast Cancer.
Label | Feature name | Description | Min–max | Mean ± standard deviation |
F1 | ID number | Integer | — | — |
F2 | Radius mean | Mean of distances from the center to points on the cell perimeter | 6.981000–28.110000 | 14.127292, ±3.524049 |
F3 | Texture mean | The standard deviation of grayscale values | 9.710000–39.280000 | 19.289649, ±4.301036 |
F4 | Perimeter mean | Perimeter of cell | 43.790000–188.500000 | 91.969033, ±24.298981 |
F5 | Area mean | Area of cell | 143.500000–2501.000000 | 654.889104, ±351.914129 |
F6 | Smoothness mean | Local variation in radius lengths | 0.052630–0.163400 | 0.096360, ±0.014064 |
F7 | Compactness mean | Perimeter² / area − 1.0 | 0.019380–0.345400 | 0.104341, ±0.052813 |
F8 | Concavity mean | The severity of concave portions of the contour | 0.000000–0.426800 | 0.088799, ±0.079720 |
F9 | Concave points mean | Number of concave portions of the contour | 0.000000–0.201200 | 0.048919, ±0.038803 |
F10 | Symmetry mean | Symmetry | 0.106000–0.304000 | 0.181162, ±0.027414 |
F11 | Fractal dimension mean | “Coastline approximation” − 1 | 0.049960–0.097440 | 0.062798, ±0.007060 |
F12 | Radius SE (standard error) | — | 0.111500–2.873000 | 0.405172, ±0.277313 |
F13 | Texture SE | — | 0.360200–4.885000 | 1.216853, ±0.551648 |
F14 | Perimeter SE | — | 0.757000–21.980000 | 2.866059, ±2.021855 |
F15 | Area SE | — | 6.802000–542.200000 | 40.337079, ±45.491006 |
F16 | Smoothness SE | — | 0.001713–0.031130 | 0.007041, ±0.003003 |
F17 | Compactness SE | — | 0.002252–0.135400 | 0.025478, ±0.017908 |
F18 | Concavity SE | — | 0.000000–0.396000 | 0.031894, ±0.030186 |
F19 | Concave points SE | — | 0.000000–0.052790 | 0.011796, ±0.006170 |
F20 | Symmetry SE | — | 0.007882–0.078950 | 0.020542, ±0.008266 |
F21 | Fractal dimension SE | — | 0.000895–0.029840 | 0.003795, ±0.002646 |
F22 | Radius worst | — | 7.930000–36.040000 | 16.269190, ±4.833242 |
F23 | Texture worst | — | 12.020000–49.540000 | 25.677223, ±6.146258 |
F24 | Perimeter worst | — | 50.410000–251.200000 | 107.261213, ±33.602542 |
F25 | Area worst | — | 185.200000–4254.000000 | 880.583128, ±569.356993 |
F26 | Smoothness worst | — | 0.071170–0.222600 | 0.132369, ±0.022832 |
F27 | Compactness worst | — | 0.027290–1.058000 | 0.254265, ±0.157336 |
F28 | Concavity worst | — | 0.000000–1.252000 | 0.272188, ±0.208624 |
F29 | Concave points worst | — | 0.000000–0.291000 | 0.114606, ±0.065732 |
F30 | Symmetry worst | — | 0.156500–0.663800 | 0.290076, ±0.061867 |
F31 | Fractal dimension worst | — | 0.055040–0.207500 | 0.083946, ±0.018061 |
y | Diagnosis | M = malignant = 1; B = benign = 0 | — | — |
3.2. Experimental Results of RFE
Instead of using all the features of the dataset, a feature selection algorithm is used to select the most suitable ones. The RFE feature selection (FS) algorithm is well suited for selecting appropriate features for the predictive model. RFE fits a model and removes the least relevant feature or features until the specified number of features is reached; a model is then built on the remaining features, which are those that contribute most to the target label. Different subsets of the thirty real-valued features were created by the RFE FS algorithm, and the results are reported in Table 3. A short sketch of how such nested subsets can be generated is given after the table.
Table 3
Thirty different subsets of features and their ranking created by the RFE FS algorithm.
No of features in a subset | Subset of features |
1 | {F26} |
2 | {F7, F27} |
3 | {F7, F21, F27} |
4 | {F1, F7, F21, F27} |
5 | {F1, F7, F21, F27, F28} |
6 | {F1, F7, F8, F21, F27, F28} |
7 | {F1, F7, F8, F21, F27, F28, F29} |
8 | {F1, F7, F8, F9, F21, F27, F28, F29} |
9 | {F1, F7, F8, F9, F21, F25, F27, F28, F29} |
10 | {F1, F7, F8, F9, F21, F25, F27, F28, F29, F30} |
11 | {F1, F5, F7, F8, F9, F21, F25, F27, F28, F29, F30} |
12 | {F1, F3, F5, F7, F8, F9, F21, F25, F27, F28, F29, F30 } |
13 | {F1, F3, F5, F7, F8, F9, F12, F21, F25, F27, F28, F29, F30} |
14 | {F1, F3, F5, F7, F8, F9, F10, F12, F21, F25, F27, F28, F29, F30} |
15 | {F1, F2, F3, F5, F7, F8, F9, F10, F12, F21, F25, F27, F28, F29, F30 } |
16 | {F1, F2, F3, F5, F7, F8, F9, F10, F12, F17, F21, F25, F27, F28, F29, F30} |
17 | {F1, F2, F3, F5, F7, F8, F9, F10, F12, F14, F17, F21, F25, F27, F28, F29, F30} |
18 | {F1, F2, F3, F5, F7, F8, F9, F12, F14, F17, F21, F22, F23, F25, F27, F28, F29, F30} |
19 | {F1, F2, F3, F5, F6, F7, F8, F9, F12, F14, F17, F21, F22, F23, F25, F27, F28, F29, F30} |
20 | {F1, F2, F3, F5, F6, F7, F8, F9, F12, F14, F17, F19, F21, F22, F23, F25, F27, F28, F29, F30} |
21 | {F1, F2, F3, F5, F6, F7, F8, F9, F12, F14, F17, F18, F19, F21, F22, F23, F25, F27, F28, F29, F30} |
22 | {F1, F2, F3, F5, F6, F7, F8, F9, F12, F14, F17, F18, F19, F21, F22, F23, F25, F26, F27, F28, F29, F30} |
23 | {F1, F2, F3, F5, F6, F7, F8, F9, F12, F14, F17, F18, F19, F21, F22, F23, F24, F25, F26, F27, F28, F29, F30} |
24 | {F1, F2, F3, F5, F6, F7, F8, F9, F10, F12, F14, F17, F18, F19, F21, F22, F23, F24, F25, F26, F27, F28, F29, F30} |
25 | {F1, F2, F3, F5, F6, F7, F8, F9, F10, F11, F12, F14, F17, F18, F19, F21, F22, F23, F24, F25, F26, F27, F28, F29, F30 } |
26 | {F1, F2, F3, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F17, F18, F19, F21, F22, F23, F24, F25, F26, F27, F28, F29, F30} |
27 | {F1, F2, F3, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F17, F18, F19, F21, F22, F23, F24, F25, F26, F27, F28, F29, F30 } |
28 | {F1, F2, F3, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F17, F18, F19, F20, F21, F22, F23, F24, F25, F26, F27, F28, F29, F30} |
29 | {F1, F2, F3, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F16, F17, F18, F19, F20, F21, F22, F23, F24, F25, F26, F27, F28, F29, F30} |
30 | {F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F16, F17, F18, F19, F20, F21, F22, F23, F24, F25, F26, F27, F28, F29, F30} |
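As mentioned above, one way to reproduce a family of nested subsets like those in Table 3 is to run RFE down to a single feature and read the subsets off the resulting ranking. The sketch below assumes scikit-learn and indexes the 30 real-valued features from 0, so the printed indices do not map one-to-one onto the F-labels of Table 2.

```python
# Generate nested feature subsets from a complete RFE ranking (cf. Table 3):
# the k best-ranked features form the k-feature subset.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
rfe = RFE(SVC(kernel="linear", C=1), n_features_to_select=1, step=1).fit(X, y)

for k in range(1, X.shape[1] + 1):
    subset = np.where(rfe.ranking_ <= k)[0]   # 0-based indices of the top-k features
    print(k, subset.tolist())
```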
3.3. Classification Results of SVM (Linear)
The performance of the SVM (kernel = linear) predictive model has been checked for the prediction of breast cancer on the full feature set and on the different selected feature subsets produced by the RFE FS algorithm and tabulated in Table 3. The SVM parameters C = 1 and γ = 0.0001 were used, and the classification results are reported in Table 4.
Table 4
Classification results of SVM (kernel = linear).
Model | Performance evaluation metric | |||||||||
Parameters (C, γ) |
No of features | Acc (%) | Sp (%) | Precision (%) | Sen/recall (%) | F1-score | MCC | Classification error (%) | Execution time (s) | |
SVM (linear) | (1, 0.0001) | 1 | 76 | 88 | 88 | 56 | 70 | 72 | 24 | 0.003 |
2 | 85 | 90 | 92 | 74 | 82 | 83 | 15 | 0.083 | ||
3 | 95 | 94 | 94 | 97 | 96 | 95 | 5 | 0.002 | ||
4 | 96 | 94 | 94 | 95 | 95 | 95 | 4 | 0.005 | ||
5 | 96 | 96 | 96 | 94 | 95 | 95 | 4 | 0.003 | ||
6 | 96 | 96 | 93 | 94 | 95 | 96 | 4 | 0.002 | ||
7 | 96 | 96 | 96 | 96 | 96 | 94 | 4 | 0.002 | ||
8 | 95 | 94 | 94 | 94 | 94 | 95 | 5 | 0.002 | ||
9 | 96 | 96 | 96 | 95 | 94 | 96 | 4 | 0.002 | ||
10 | 97 | 96 | 95 | 94 | 95 | 96 | 3 | 0.004 | ||
11 | 97 | 96 | 96 | 94 | 95 | 96 | 3 | 0.003 | ||
12 | 97 | 97 | 97 | 98 | 97 | 96 | 3 | 0.010 | ||
13 | 95 | 96 | 93 | 94 | 95 | 96 | 5 | 0.010 | ||
14 | 95 | 97 | 97 | 94 | 96 | 96 | 5 | 0.010 | ||
15 | 98 | 99 | 99 | 97 | 98 | 98 | 2 | 0.024 | ||
16 | 98 | 99 | 99 | 97 | 98 | 98 | 2 | 0.019 | ||
17 | 98 | 98 | 98 | 98 | 98 | 98 | 2 | 0.080 | ||
18 | 99 | 99 | 99 | 98 | 99 | 99 | 1 | 0.030 | ||
19 | 97 | 96 | 96 | 99 | 98 | 97 | 3 | 0.022 | ||
20 | 97 | 98 | 98 | 98 | 98 | 97 | 3 | 0.027 | ||
21 | 97 | 96 | 96 | 99 | 98 | 97 | 3 | 0.038 | ||
22 | 96 | 96 | 96 | 99 | 98 | 97 | 4 | 0.107 | ||
23 | 98 | 98 | 98 | 99 | 98 | 98 | 2 | 1.832 | ||
24 | 98 | 98 | 98 | 99 | 98 | 98 | 2 | 0.909 | ||
25 | 97 | 96 | 96 | 97 | 97 | 97 | 3 | 0.733 | ||
26 | 96 | 96 | 96 | 97 | 97 | 97 | 4 | 3.729 | ||
27 | 96 | 95 | 95 | 97 | 96 | 96 | 4 | 3.987 | ||
28 | 96 | 95 | 95 | 96 | 95 | 96 | 4 | 3.594 | ||
29 | 96 | 97 | 97 | 97 | 97 | 97 | 4 | 4.332 | ||
30 | 95 | 96 | 96 | 95 | 98 | 94 | 5 | 4.547 |
[figure omitted; refer to PDF]
[figure omitted; refer to PDF][figure omitted; refer to PDF]
3.4. Classification Results of SVM (RBF)
The performance of the SVM (kernel = RBF) predictive model has been checked for the prediction of breast cancer on the full feature set and on the different selected feature subsets produced by the RFE FS algorithm. The SVM parameters C = 1 and γ = 0.0001 were used, and the classification results are reported in Table 5.
Table 5
Classification results of the SVM (kernel = RBF)-based predictive model on different feature subsets created by the RFE FS algorithm.
Model | Classification performances evaluation metrics | |||||||||
Parameters (C, γ) |
Number of features | Acc (%) | Spe (%) | Precision (%) | Sen/recall (%) | F1-score | MCC | Classification error (%) | Execution time (s) | |
SVM (RBF) | (1, 0.0001) | 1 | 64 | 100 | 99 | 3 | 6 | 50 | 36 | 0.005 |
2 | 64 | 100 | 98 | 5 | 10 | 50 | 36 | 0.008 | ||
3 | 84 | 100 | 98 | 56 | 72 | 78 | 16 | 0.006 | ||
4 | 85 | 100 | 98 | 63 | 77 | 81 | 15 | 0.006 | ||
5 | 85 | 100 | 98 | 62 | 77 | 80 | 15 | 0.006 | ||
6 | 86 | 100 | 99 | 62 | 77 | 80 | 14 | 0.007 | ||
7 | 86 | 100 | 99 | 62 | 77 | 81 | 14 | 0.006 | ||
8 | 86 | 100 | 99 | 62 | 77 | 82 | 14 | 0.006 | ||
9 | 87 | 100 | 98 | 63 | 77 | 82 | 13 | 0.007 | ||
10 | 87 | 100 | 98 | 63 | 77 | 82 | 13 | 0.007 | ||
11 | 87 | 100 | 98 | 63 | 77 | 82 | 13 | 0.007 | ||
12 | 91 | 99 | 99 | 77 | 87 | 88 | 9 | 0.005 | ||
13 | 90 | 98 | 98 | 76 | 86 | 87 | 10 | 0.005 | ||
14 | 91 | 99 | 99 | 77 | 87 | 88 | 9 | 0.004 | ||
15 | 92 | 99 | 99 | 81 | 89 | 91 | 8 | 0.008 | ||
16 | 90 | 100 | 100 | 76 | 87 | 87 | 10 | 0.017 | ||
17 | 92 | 98 | 98 | 86 | 92 | 91 | 8 | 0.004 | ||
18 | 98 | 99 | 99 | 96 | 98 | 97 | 2 | 0.004 | ||
19 | 97 | 98 | 98 | 96 | 98 | 97 | 3 | 0.003 | ||
20 | 97 | 99 | 99 | 96 | 98 | 97 | 3 | 0.004 | ||
21 | 97 | 99 | 99 | 95 | 97 | 97 | 3 | 0.005 | ||
22 | 97 | 99 | 99 | 95 | 97 | 97 | 3 | 0.004 | ||
23 | 95 | 99 | 99 | 89 | 94 | 94 | 5 | 0.008 | ||
24 | 94 | 99 | 99 | 89 | 94 | 94 | 6 | 0.006 | ||
25 | 94 | 99 | 99 | 89 | 94 | 94 | 6 | 0.015 | ||
26 | 94 | 99 | 99 | 89 | 94 | 94 | 6 | 0.009 | ||
27 | 94 | 99 | 99 | 89 | 94 | 93 | 6 | 0.016 | ||
28 | 95 | 98 | 98 | 88 | 93 | 94 | 5 | 0.018 | ||
29 | 94 | 99 | 99 | 89 | 94 | 94 | 6 | 0.017 | ||
30 | 95 | 99 | 99 | 90 | 95 | 94 | 5 | 0.019 |
[figure omitted; refer to PDF]
[figure omitted; refer to PDF][figure omitted; refer to PDF]
3.5. Classification Results of SVM (Polynomial)
The performance of the SVM (kernel = polynomial) predictive model has been checked for the prediction of breast cancer on the full feature set and on the different selected feature subsets produced by the RFE FS algorithm. The SVM parameters C = 1 and γ = 0.0001 were used, and the classification results are reported in Table 6.
Table 6
Classification results of SVM (kernel = polynomial).
Model | Classification performances evaluation metrics | |||||||||
Parameters (C, γ) |
Number of features in subset | Acc (%) | Sp (%) | Precision (%) | Sen/recall (%) | F1-score | MCC | Classification error (%) | Execution time (s) |
SVM (polynomial) | (1, 0.0001) | 1 | 64 | 100 | 99 | 20 | 33 | 50 | 36 | 0.013 |
2 | 65 | 99 | 98 | 23 | 37 | 50 | 35 | 0.009 | ||
3 | 63 | 99 | 98 | 23 | 37 | 50 | 37 | 0.005 | ||
4 | 63 | 99 | 98 | 23 | 37 | 50 | 37 | 0.006 | ||
5 | 63 | 99 | 99 | 23 | 37 | 50 | 37 | 0.006 | ||
6 | 64 | 99 | 99 | 23 | 37 | 50 | 36 | 0.006 | ||
7 | 64 | 99 | 99 | 23 | 37 | 51 | 36 | 0.004 | ||
8 | 65 | 99 | 99 | 24 | 39 | 51 | 35 | 0.005 | ||
9 | 64 | 99 | 99 | 24 | 39 | 50 | 36 | 0.007 | ||
10 | 64 | 99 | 99 | 24 | 39 | 50 | 36 | 0.006 | ||
11 | 65 | 99 | 99 | 24 | 39 | 51 | 35 | 0.006 | ||
12 | 90 | 99 | 99 | 75 | 95 | 87 | 10 | 0.002 | ||
13 | 89 | 99 | 99 | 40 | 57 | 80 | 11 | 0.004 | ||
14 | 88 | 99 | 99 | 74 | 85 | 86 | 12 | 0.002 | ||
15 | 92 | 99 | 99 | 81 | 89 | 91 | 8 | 0.002 | ||
16 | 93 | 98 | 98 | 81 | 88 | 91 | 7 | 0.002 | ||
17 | 93 | 98 | 98 | 87 | 92 | 92 | 7 | 0.005 | ||
18 | 97 | 97 | 97 | 97 | 97 | 97 | 3 | 0.002 | ||
19 | 96 | 97 | 97 | 97 | 97 | 97 | 4 | 0.119 | ||
20 | 96 | 97 | 97 | 95 | 96 | 97 | 4 | 0.111 | ||
21 | 96 | 96 | 96 | 96 | 96 | 97 | 4 | 0.124 | ||
22 | 96 | 96 | 96 | 96 | 96 | 97 | 4 | 0.060 | ||
23 | 95 | 95 | 95 | 96 | 95 | 95 | 5 | 0.060 | ||
24 | 94 | 94 | 94 | 96 | 94 | 93 | 6 | 0.061 | ||
25 | 93 | 94 | 94 | 95 | 94 | 93 | 7 | 0.151 | ||
26 | 92 | 95 | 95 | 95 | 94 | 93 | 8 | 0.162 | ||
27 | 92 | 94 | 94 | 94 | 93 | 92 | 8 | 0.167 | ||
28 | 92 | 94 | 94 | 94 | 93 | 92 | 8 | 0.211 | ||
29 | 92 | 94 | 94 | 91 | 92 | 91 | 8 | 0.234 | ||
30 | 92 | 92 | 92 | 91 | 91 | 92 | 8 | 0.277 |
[figure omitted; refer to PDF]
[figure omitted; refer to PDF][figure omitted; refer to PDF]
3.6. Classification Results of SVM (Sigmoid)
The performance of the SVM (kernel = sigmoid) predictive model has been checked for the prediction of breast cancer on the full feature set and on the different selected feature subsets produced by the RFE FS algorithm. The SVM parameters C = 1 and γ = 0.0001 were used, and the classification results are reported in Table 7.
Table 7
Classification results of SVM (kernel = sigmoid).
Model | Classification performances evaluation metrics | |||||||||
Parameters (C, γ) |
Number of features | Acc (%) | Sp (%) | Precision (%) | Sen/recall (%) | F1-score | MCC | Classification error (%) | Execution time (s) |
SVM (sigmoid) | (1, 0.0001) | 1 | 64 | 100 | 99 | 20 | 34 | 50 | 36 | 0.006 |
2 | 64 | 100 | 99 | 20 | 34 | 50 | 36 | 0.006 | ||
3 | 71 | 100 | 99 | 20 | 34 | 60 | 29 | 0.006 | ||
4 | 79 | 100 | 99 | 42 | 56 | 70 | 21 | 0.006 | ||
5 | 78 | 100 | 99 | 62 | 77 | 70 | 22 | 0.011 | ||
6 | 79 | 100 | 99 | 42 | 59 | 71 | 21 | 0.005 | ||
7 | 79 | 100 | 100 | 63 | 77 | 71 | 21 | 0.005 | ||
8 | 80 | 100 | 100 | 43 | 60 | 72 | 20 | 0.005 | ||
9 | 80 | 100 | 100 | 42 | 59 | 71 | 20 | 0.008 | ||
10 | 80 | 100 | 100 | 42 | 59 | 71 | 20 | 0.005 | ||
11 | 81 | 100 | 100 | 42 | 59 | 70 | 19 | 0.006 | ||
12 | 80 | 100 | 100 | 41 | 58 | 50 | 20 | 0.016 | ||
13 | 84 | 54 | 54 | 60 | 45 | 77 | 16 | 0.005 | ||
14 | 75 | 99 | 99 | 70 | 82 | 60 | 25 | 0.009 | ||
15 | 76 | 100 | 100 | 72 | 83 | 63 | 24 | 0.010 | ||
16 | 74 | 100 | 100 | 64 | 77 | 64 | 26 | 0.008 | ||
17 | 45 | 70 | 70 | 01 | 2 | 35 | 55 | 0.031 | ||
18 | 28 | 45 | 45 | 2 | 4 | 22 | 72 | 0.012 | ||
19 | 28 | 45 | 45 | 02 | 4 | 22 | 72 | 0.020 | ||
20 | 28 | 45 | 45 | 02 | 4 | 22 | 72 | 0.011 | ||
21 | 29 | 45 | 45 | 02 | 4 | 22 | 71 | 0.148 | ||
22 | 28 | 45 | 45 | 02 | 4 | 22 | 72 | 0.044 | ||
23 | 28 | 45 | 45 | 02 | 4 | 22 | 72 | 0.044 | ||
24 | 28 | 45 | 45 | 02 | 4 | 22 | 72 | 0.004 | ||
25 | 28 | 45 | 45 | 02 | 4 | 22 | 72 | 0.004 | ||
26 | 27 | 45 | 45 | 02 | 4 | 22 | 73 | 0.014 | ||
27 | 27 | 45 | 45 | 02 | 4 | 22 | 73 | 0.016 | ||
28 | 27 | 45 | 45 | 02 | 4 | 22 | 73 | 0.017 | ||
29 | 27 | 45 | 45 | 02 | 4 | 22 | 73 | 0.117 | ||
30 | 27 | 45 | 45 | 02 | 4 | 22 | 73 | 0.221 |
[figure omitted; refer to PDF]
[figure omitted; refer to PDF][figure omitted; refer to PDF]
3.7. SVM Different Kernels Performance Comparison on Best-Selected Features
Table 8 shows the performance of the different SVM kernels on their best-selected feature subsets. The predictive performance of the SVM linear kernel is better than that of the RBF, polynomial, and sigmoid kernels. The accuracy of the linear SVM was 99%, which reflects the overall performance of the proposed system. The 99% specificity shows that the linear SVM effectively detected healthy people, and similarly, the 98% sensitivity shows that it effectively detected breast cancer patients. Furthermore, the F1-score of the linear SVM is 99, the MCC value is 99%, and the classification error is 1%. Thus, the linear SVM-based diagnostic system for breast cancer is very efficient and reliable. According to Table 8, the second best SVM kernel is RBF: on the reduced feature set, SVM RBF achieved 98% classification accuracy, 99% specificity, 98% sensitivity, an F1-score of 98, and 97% MCC, with an execution time of 0.004 seconds. The third best kernel is the polynomial kernel, which obtained 97% classification accuracy, 97% specificity, 97% sensitivity, an F1-score of 97, and an MCC of 97%, with an execution time of 0.002 seconds. The performance of the sigmoid kernel was very low compared with the other three kernels: on feature subset 13, it obtained 84% accuracy, 54% specificity, 60% sensitivity, and 77% MCC, with an execution time of 0.005 seconds. Thus, we conclude that the linear SVM kernel is the best predictive model for the diagnosis of breast cancer among the four kernels. The accuracy, specificity, and sensitivity of the four SVM kernels are graphically demonstrated in Figure 19 for better understanding, and their execution times are shown in Figure 20. A short sketch for reproducing this comparison is given after Table 8.
Table 8
Excellent performance metrics results and best SVM kernel on selected feature subset.
Predictive model | Best feature subset | Accuracy (%) | Specificity (%) | Sensitivity/recall (%) | F1-score | MCC | Error (%) | Execution time (s) |
SVM (kernel = linear) | 18 | 99 | 99 | 98 | 99 | 99 | 1 | 0.030 |
SVM (kernel = RBF) | 18 | 98 | 99 | 98 | 98 | 97 | 2 | 0.004 |
SVM (kernel = polynomial) | 18 | 97 | 97 | 97 | 97 | 97 | 3 | 0.002 |
SVM (kernel = sigmoid) | 13 | 84 | 54 | 60 | 45 | 77 | 16 | 0.005 |
[figure omitted; refer to PDF]
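The comparison in Table 8 can be approximated with a short loop over the four kernels. The sketch below assumes scikit-learn, an 18-feature RFE subset, and a fixed random 70/30 split, so the exact scores and timings will differ from the values reported above.

```python
# Compare the four SVM kernels on an 18-feature RFE subset with a 70/30 split.
import time
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X18 = RFE(SVC(kernel="linear", C=1), n_features_to_select=18).fit_transform(X, y)
X_tr, X_te, y_tr, y_te = train_test_split(
    X18, y, test_size=0.30, random_state=42, stratify=y)

for kernel in ("linear", "rbf", "poly", "sigmoid"):
    clf = SVC(kernel=kernel, C=1, gamma=0.0001)
    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    print(f"{kernel:8s} accuracy={accuracy_score(y_te, y_pred):.2f} "
          f"recall={recall_score(y_te, y_pred):.2f} "
          f"time={time.perf_counter() - t0:.3f}s")
```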
3.8. Proposed Method Performance Comparison with Previous Methods
The performance of the proposed method in terms of accuracy is good compared with previous methods. In Table 9, the accuracy of the proposed method is compared with that of different methods. Table 9 shows that the proposed method achieved higher accuracy than the other state-of-the-art methods, which might be due to the appropriate feature selection by the RFE FS algorithm.
Table 9
Proposed study classification performance and results of other previously proposed methods.
Reference | Method | Accuracy (%) |
[38] | PCA-AE-Ada | 85 |
[39] | ACO-SVM | 95.98 |
[35] | GA-SVM | 97.19 |
[35] | PSO-SVM | 97.37 |
[8] | GA-MOO-NN | 98.85 |
[14] | PCA-SVM | 96.84 |
[4] | Breast cancer diagnosis techniques using RBF, PNN, and MLP | 97.80 |
[11] | Classification system using fuzzy-GA method | 97.36 |
[20] | Classification system using mixture ensemble of convolutional neural network | 96.39 |
[41] | SAE-SVM | 98.25 |
[42] | Prediction of breast cancer using SVM and K-NN | 98.57 |
[43] | Breast cancer diagnosis using adaptive voting ensemble machine learning algorithm | 98.50 |
[44] | Cost sensitivity SVM with IG for FS and breast cancer diagnosis | 98.83 |
Proposed method | RFE-SVM | 99 |
4. Conclusions
The Internet of Things (IoT) has witnessed a transition in life over the last few years, providing a way to analyze both real-time and past data through the emerging role of artificial intelligence and data mining techniques. In this research study, a diagnosis system was developed for breast cancer diagnosis. In designing the system, the machine learning predictive model SVM was used for breast cancer detection. The recursive feature elimination (RFE) FS algorithm was used to select suitable and relevant features for the correct classification of malignant and benign people. The RFE algorithm produced new subsets of features from the Wisconsin Diagnostic Breast Cancer dataset. The dataset was split into 70% for training and 30% for validation purposes. Additionally, performance measuring metrics such as accuracy, sensitivity/recall, specificity, precision, F1-score, MCC, and execution time were used for model performance evaluation. The Wisconsin Diagnostic Breast Cancer dataset of 32 attributes, with 30 real-valued features and 569 instances, available in the UC Irvine data mining repository, was used for testing the proposed system. Machine learning libraries in Python were used for the implementation and development of the proposed system. The experimental results analysis shows that the proposed system classifies malignant and benign people effectively. The improvement in the prediction of malignant and benign people might be due to the varying contributions of the BC features. These findings suggest that the proposed diagnosis system could be used to accurately predict BC and, furthermore, could be easily incorporated into healthcare. The reduced feature space produced by the RFE FS algorithm shows that these highly important features diagnose BC more accurately than the original feature space. The classification performance of SVM with different kernels (linear, RBF, polynomial, and sigmoid) was tested, and the performance on the reduced 18-feature subset was the best compared with the full feature set and the other reduced feature subsets. According to Table 8, the performance of the SVM linear kernel is the best compared with the other SVM kernels (RBF, polynomial, and sigmoid); the linear SVM obtained 99% accuracy, 99% specificity, and 98% sensitivity. The 99% specificity value shows that it is good for the detection of healthy people; similarly, the 98% sensitivity shows that the classifier effectively detected BC people. According to the RFE FS algorithm, the most important features are {F1, F2, F3, F5, F7, F8, F9, F12, F14, F17, F21, F22, F23, F25, F27, F28, F29, F30}. These features have a great impact on the classification of BC and healthy people.
The novelty of this study is the design of a diagnosis system that classifies BC and healthy people. The system used the RFE FS algorithm, SVM, the training/testing split method, and performance measuring metrics for BC diagnosis. For better diagnosis of breast cancer, a machine learning-based decision support system is more reliable. Furthermore, irrelevant features degrade the performance of a diagnosis system and increase its computation time; hence, another innovative part of the proposed study is the use of a feature selection algorithm to select the relevant subset of features that improves the classification performance of the diagnosis system. According to Table 9, the performance of the proposed system (RFE-SVM) is excellent, achieving 99% classification accuracy compared with the classification performances of other proposed studies. In the future, other feature selection algorithms, optimization techniques, and deep neural network classification methods will be utilized to further increase the performance of the diagnosis system for BC diagnosis.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (grant no. 61370073), the National High Technology Research and Development Program of China (grant no. 2007AA01Z423), and the project of Science and Technology Department of Sichuan Province.
[1] American Institute of Cancer Research, "Breast cancer statistics," 2018. https://www.wcrf.org/dietandcancer/cancer-trends/breast-cancer-statistics
[2] M. Islam, H. Iqbal, R. Haque, K. Hasan, "Prediction of breast cancer using support vector machine and K-nearest neighbors," Proceedings of the IEEE Region 10 Humanitarian Technology Conference (R10-HTC), vol. 23,DOI: 10.1109/r10-htc.2017.8288944, .
[3] A. M. Ahmad, G. M. Khan, S. A. Mahmud, J. F. Miller, "Breast cancer detection using cartesian genetic programming evolved artificial neural networks," Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pp. 1031-1038, DOI: 10.1145/2330163.2330307, .
[4] A. T. Azar, S. A. El-Said, "Probabilistic neural network for breast cancer classification," Neural Computing and Applications, vol. 23 no. 6, pp. 1737-1751, DOI: 10.1007/s00521-012-1134-8, 2013.
[5] E. Warner, "Systematic review: using magnetic resonance imaging to screen women at high risk for breast cancer," Annals of Internal Medicine, vol. 148 no. 9, pp. 671-679, DOI: 10.7326/0003-4819-148-9-200805060-00007, 2008.
[6] A. U. Haq, J. P. Li, M. H. Memon, S. Nazir, R. Sun, "A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms," Mobile Information Systems, vol. 2018,DOI: 10.1155/2018/3860146, 2018.
[7] E. Aličković, A. Subasi, "Breast cancer diagnosis using GA feature selection and rotation forest," Neural Computing and Applications, vol. 28 no. 4, pp. 753-763, DOI: 10.1007/s00521-015-2103-9, 2017.
[8] F. Ahmad, N. A. Mat Isa, Z. Hussain, S. N. Sulaiman, "A genetic algorithm-based multi-objective optimization of an artificial neural network classifier for breast cancer diagnosis," Neural Computing and Applications, vol. 23 no. 5, pp. 1427-1435, DOI: 10.1007/s00521-012-1092-1, 2013.
[9] K. Hasan, M. Islam, M. M. A. Hashem, "Mathematical model development to detect breast cancer using multigene genetic programming," Proceedings of the 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV), pp. 574-579, DOI: 10.1109/iciev.2016.7760068, .
[10] A. A. Albrecht, G. Lappas, S. A. Vinterbo, C. Wong, L. Ohno-Machado, "Two applications of the LSA machine," Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP’02, pp. 184-189, .
[11] C. A. Peña-Reyes, M. Sipper, "A fuzzy-genetic approach to breast cancer diagnosis," Artificial Intelligence in Medicine, vol. 17 no. 2, pp. 131-155, DOI: 10.1016/s0933-3657(99)00019-6, 1999.
[12] M. F. Akay, "Support vector machines combined with feature selection for breast cancer diagnosis," Expert Systems with Applications, vol. 36 no. 2, pp. 3240-3247, DOI: 10.1016/j.eswa.2008.01.009, 2009.
[13] B. Zheng, S. W. Yoon, S. S. Lam, "Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms," Expert Systems with Applications, vol. 41 no. 4, pp. 1476-1482, DOI: 10.1016/j.eswa.2013.08.044, 2014.
[14] G. N. Ramadevi, "Importance of feature extraction for classification of breast cancer datasets, a study," International Journal of Scientific and Innovative Mathematical Research, vol. 3, pp. 763-368, 2015.
[15] H. A. Abbass, "An evolutionary artificial neural networks approach for breast cancer diagnosis," Artificial Intelligence in Medicine, vol. 25 no. 3, pp. 265-281, DOI: 10.1016/s0933-3657(02)00028-3, 2002.
[16] A. Marcano-Cedeño, J. Quintanilla-Domínguez, D. Andina, "WBCD breast cancer database classification applying artificial metaplasticity neural network," Expert Systems with Applications, vol. 38 no. 8, pp. 9573-9579, DOI: 10.1016/j.eswa.2011.01.167, 2011.
[17] Y.-Q. Liu, C. Wang, L. Zhang, "Decision tree based predictive models for breast cancer survivability on imbalanced data," Proceedings of the 2009 3rd International Conference on Bioinformatics and Biomedical Engineering,DOI: 10.1109/ICBBE.2009.5162571, .
[18] A. Onan, "A fuzzy-rough nearest neighbor classifier combined with consistency-based subset evaluation and instance selection for automated diagnosis of breast cancer," Expert Systems with Applications, vol. 42 no. 15, pp. 6844-6852, 2015.
[19] R. Sheikhpour, M. A. Sarram, R. Sheikhpour, "Particle swarm optimization for bandwidth determination and feature selection of kernel density estimation based classifiers in diagnosis of breast cancer," Applied Soft Computing, vol. 40, pp. 113-131, DOI: 10.1016/j.asoc.2015.10.005, 2016.
[20] R. Rasti, M. Teshnehlab, S. L. Phung, "Breast cancer diagnosis in DCE-MRI using mixture ensemble of convolutional neural networks," Pattern Recognition, vol. 72 no. 24, pp. 381-390, DOI: 10.1016/j.patcog.2017.08.004, 2017.
[21] R. Ani, S. Krishna, N. Anju, M. S. Aslam, O. S. Deepa, "IoT based patient monitoring and diagnostic prediction tool using ensemble classifier," Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI),DOI: 10.1109/ICACCI.2017.8126068, .
[22] Z. Yang, Q. Zhou, L. Lei, K. Zheng, W. Xiang, "An IoT-cloud based wearable ECG monitoring system for smart healthcare," Journal of Medical Systems, vol. 40,DOI: 10.1007/s10916-016-0644-912, 2016.
[23] A. Tsanas, M. A. Little, P. E. McSharry, L. O. Ramig, "Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson's disease symptom severity," Journal of the Royal Society Interface, vol. 8 no. 6, pp. 842-855, DOI: 10.1098/rsif.2010.0456, 2011.
[24] Z. Cai, J. Gu, H.-L. Chen, "A new hybrid intelligent framework for predicting Parkinson's disease," IEEE Access, vol. 5 no. 19, pp. 17188-17200, DOI: 10.1109/access.2017.2741521, 2017.
[25] R. J. Urbanowicza, M. Meeker, W. L. Cava, R. S. Olson, J. H. Moore, "Relief-based feature selection: introduction and review," Journal of Biomedical Informatics, vol. 21 no. 4,DOI: 10.1016/j.jbi.2018.07.014, 2018.
[26] L. Naranjo, C. J. Pérez, J. Martín, Y. C. Roca, "A two-stage variable selection and classification approach for Parkinson’s disease detection by using voice recording replications," Computer Methods and Programs in Biomedicine, vol. 142 no. 22, pp. 147-156, DOI: 10.1016/j.cmpb.2017.02.019, 2017.
[27] W. H. Wolberg, Wisconsin Diagnostic Breast Cancer (WDBC), 1995.
[28] S. Kotsiantis, "Data preprocessing for supervised learning," International Journal of Computer Science, vol. 1, pp. 111-117, 2006.
[29] A. Famili, W. Shen, R. Weber, E. Simoudis, "Data preprocessing and intelligent data analysis," Intelligent Data Analysis, vol. 1 no. 1–4,DOI: 10.3233/ida-1997-1102, 1997.
[30] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, vol. 46 no. 1–3, pp. 389-422, DOI: 10.1023/a:1012487302797, 2002.
[31] N. Cristianini, J. S. Taylor, An Introduction to Support Vector Machines, 2000.
[32] C.-C. Chang, C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2,DOI: 10.1145/1961189.1961199, 2011.
[33] H.-L. Chen, B. Yang, J. Liu, D.-Y. Liu, "A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis," Expert Systems with Applications, vol. 38 no. 7, pp. 9014-9022, DOI: 10.1016/j.eswa.2011.01.120, 2011.
[34] J. Mourão-Miranda, A. L.W. Bokde, C. Born, H. Hampel, M. Stetter, "Classifying brain states and determining the discriminating activation patterns: support vector machine on functional MRI data," NeuroImage, vol. 28 no. 4, pp. 980-995, DOI: 10.1016/j.neuroimage.2005.06.070, 2005.
[35] V. D. Sánchez A, "Advanced support vector machines and kernel methods," Neurocomputing, vol. 55 no. 1-2,DOI: 10.1016/s0925-2312(03)00373-4, 2003.
[36] A. U. Haq, J. Li, M. H. Memon, "Comparative analysis of the classification performance of machine learning classifiers and deep neural network classifier for Parkinson disease prediction," Proceedings of the 2018 IEEE, 15th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP),DOI: 10.1109/iccwamtip.2018.8632613, .
[37] A. U. Haq, J. P. Li, M. H. Memon, "Feature selection based on L1-norm support vector machine and effective recognition system for Parkinson’s disease using voice recordings," IEEE Access, vol. 7, pp. 37718-37734, DOI: 10.1109/access.2019.2906350, 2019.
[38] D. Zhang, L. Zou, X. Zhou, F. He, "Integrating feature selection and feature extraction methods with deep learning to predict clinical outcome of breast cancer," IEEE Access, vol. 6, pp. 28936-28944, DOI: 10.1109/access.2018.2837654, 2018.
[39] Y. Prasad, K. K. Biswas, C. K. Jain, "SVM classifier based feature selection using GA, ACO and PSO for siRNA design," Lecture Notes in Computer Science, pp. 307-314, DOI: 10.1007/978-3-642-13498-2_40, 2010.
[40] A. F. Al-Fatlawi, M. H. Jabardi, S. H. Ling, "Efficient diagnosis system for Parkinson’s disease using deep belief network," Proceedings of the IEEE Congress on Evolutionary Computation,DOI: 10.1109/cec.2016.7743941, .
[41] Y. Xiao, J. Wu, Z. Lin, X. Zhao, "Breast cancer diagnosis using an unsupervised feature extraction algorithm based on deep learning," Proceedings of the 2018 37th Chinese Control Conference (CCC), pp. 9428-9433, DOI: 10.23919/ChiCC.2018.8483140, .
[42] M. M. Islam, H. Iqbal, R. Haque, K. Hasan, "Prediction of breast cancer using support vector machine and K-nearest neighbors," Proceedings of the 2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), vol. 10, pp. 226-229, DOI: 10.1109/R10-HTC.2017.8288944, .
[43] N. Khuriwal, N. Mishra, "Breast cancer diagnosis using adaptive voting ensemble machine learning algorithm," Proceedings of the 2018 IEEMA Engineer Infinite Conference (eTechNxT),DOI: 10.1109/ETECHNXT.2018.8385355, .
[44] N. Liu, J. Shen, M. Xu, D. Gan, E.-S. Qi, B. Gao, "Improved cost-sensitive support vector machine classifier for breast cancer diagnosis," Mathematical Problems in Engineering, vol. 2018,DOI: 10.1155/2018/3875082, 2018.
Copyright © 2019 Muhammad Hammad Memon et al. This work is licensed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/).
Abstract
The accurate and efficient diagnosis of breast cancer is extremely necessary for recovery and treatment in the early stages in the IoT healthcare environment. The Internet of Things has witnessed a transition in life over the last few years, providing a way to analyze both real-time and past data through the emerging role of artificial intelligence and data mining techniques. Current state-of-the-art methods do not effectively diagnose breast cancer in the early stages, and many women suffer from this dangerous disease. Thus, the early detection of breast cancer poses a great challenge for medical experts and researchers. To solve the problem of early-stage detection of breast cancer, we propose a machine learning-based diagnostic system that effectively classifies malignant and benign people in the IoT environment. In the development of the proposed system, the machine learning classifier support vector machine (SVM) is used to classify malignant and benign people. To improve the classification performance of the system, we used a recursive feature elimination algorithm to select the most suitable features from the breast cancer dataset. The training/testing split method is applied for training and testing the classifier to obtain the best predictive model. Additionally, the classifier performance has been checked using performance evaluation metrics such as classification accuracy, specificity, sensitivity, Matthews's correlation coefficient, F1-score, and execution time. To test the proposed method, the “Wisconsin Diagnostic Breast Cancer” dataset has been used in this research study. The experimental results demonstrate that the recursive feature elimination algorithm selects the best subset of features and that the SVM classifier achieves optimal classification performance on this subset. The SVM linear kernel achieved high classification accuracy (99%), specificity (99%), and sensitivity (98%), and the Matthews's correlation coefficient is 99%. From these experimental results, we conclude that the proposed system performs excellently due to the selection of more appropriate features by the recursive feature elimination algorithm. Furthermore, we suggest the proposed system for effective and efficient early-stage diagnosis of breast cancer; through this system, recovery and treatment will be more effective. Lastly, the implementation of the proposed system is very reliable in all aspects of IoT healthcare for breast cancer.
1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 School of Information Science and Technology University of Science and Technology of China, Hefei, China