Abstract
Electrocardiography (ECG) is a non-invasive, harmless, and affordable tool used to identify abnormalities in heart rhythm and to evaluate dysfunctions in the electrical system of the heart, providing a comprehensive assessment of the heart’s condition. Although it offers a successful means of arrhythmia detection, its analysis is time-consuming and depends on the clinician’s experience. In addition, since ECG patterns in pediatric patients differ from those in adults, physicians consider their interpretation a difficult and complex task. For this reason, a custom dataset of pediatric patients was created in this study, consisting of 1318 abnormal beats and 1403 normal beats. The MobileNetv2 transfer learning architecture was used to classify this balanced dataset. However, the stability of the results is also valuable. Therefore, an optimization algorithm that minimizes the loss function and a regularization method that controls the complexity of the model are proposed. In this direction, the Proposed Optimization Algorithm V5 and Proposed Regularization Method V5 approaches were integrated into the MobileNetv2 transfer learning model. The accuracy rates produced on the training and test datasets are 0.9801 and 0.9509, respectively. These results show acceptable improvement and stability compared to the accuracies of 0.9633 and 0.9399 produced by the original MobileNetv2 architecture on the training and test datasets, respectively. However, performance values alone provide limited information about the generalizability of the model. Therefore, the same processes were repeated on a more complex dataset with 6 categories. As a result of this classification, accuracy rates of 0.9200 and 0.8975 were obtained for the training and test datasets, respectively. Training was performed under the same conditions as for the 2-category dataset.
Therefore, a decrease of approximately 5% on the test dataset is expected. The results obtained show that generalizations can be made for comprehensive, highly diverse, and rich datasets.
Introduction
Heart diseases are among the most common health problems in society. These disorders can negatively impact the quality of daily life for individuals and can lead to potentially life-threatening clinical issues. One of the most prevalent heart conditions is arrhythmia, which can occur in both adults and children. However, the frequency and clinical significance of arrhythmia in children differ from those in adults. It is rare in children and can be challenging and time-consuming to diagnose [1].
Arrhythmias are abnormalities of heart rhythm in which the heart departs from its normal function and cannot pump enough blood to the body. Arrhythmia occurs when the electrical signals in the heart do not work correctly, causing the heartbeat to become faster, slower, or irregular. However, changes in heart rate are not always a sign of abnormality; there are situations where fast or slow heartbeats are considered normal. Therefore, correctly distinguishing between “normal” and “abnormal” rhythms is critical for making a definitive diagnosis [2].
Rhythm disturbances are usually detected by electrocardiography (ECG). The ECG records the heart’s electrical activity and reveals irregularities and abnormalities in the heartbeat. In the medical world, ECG analysis plays a vital role in correctly diagnosing cardiac arrhythmias by providing critical information about the heart’s electrical activity. However, due to their nature, some arrhythmias present a time-consuming and challenging detection process [3, 4]. In addition, ECG signals have a dynamic and complex structure, so evaluating and analyzing them visually requires time. Under normal conditions, ECG analysis is part of basic medical education, and its interpretation is very important for the sound evaluation of the many patients who consult a doctor. For this reason, during medical education, students learn basic cardiac examination methods and the information necessary to detect cardiovascular diseases. Nevertheless, with the rise of medical specialization, ECG interpretation still poses a diagnostic challenge for physicians [5].
Arrhythmia, defined as irregular variations in heart rhythm, accounts for a large proportion of sudden cardiac deaths, especially in young individuals, and its rapid detection with an artificial intelligence-based system is an important issue [4]. Many studies have been conducted using machine learning and deep learning models to automate the diagnosis of heartbeats. It is stated that artificial intelligence (AI), machine learning, and deep learning, a subset of machine learning, offer promising solutions, especially in the context of arrhythmia disorders [6]. Therefore, computer-aided studies are essential for rapid and accurate diagnosis in this field. A comprehensive literature review was conducted to investigate the studies carried out in this area.
The study [7] proposes an improved deep residual convolutional neural network to classify arrhythmias automatically. First, the overlapping segmentation method is used to segment the ECG signals in the MIT-BIH database into 5-second segments to eliminate the imbalance between classes, and these segments are re-labeled. Then, the discrete wavelet transform (DWT) removes noise from these segments. The improved deep residual convolutional neural network achieved 94.54% sensitivity, 93.33% positive predictive value, and 80.80% specificity for normal segments; 35.22% sensitivity, 65.88% positive predictive value, and 98.83% specificity for supraventricular ectopic beats; and 88.35% sensitivity, 79.86% positive predictive value, and 94.92% specificity for ventricular ectopic beats.
A general and robust system for R-peak detection in Holter ECG signals is presented. Although many proposed algorithms have successfully addressed the problem of ECG R-peak detection, there is still a significant gap in the performance of these detectors on low-quality ECG records. Therefore, in the study [8], a novel implementation of a 1D Convolutional Neural Network (CNN) integrated with a verification model is used to reduce the number of false positives. This CNN architecture consists of an encoder block and a corresponding decoder block to generate a 1D segmentation map of R-peaks from the input ECG signal, followed by an example-based classification layer. This model has been tested on two open-access ECG databases. The first is the China Physiological Signal Challenge 2020 database (CPSC-DB). The second one is the MIT-BIH Arrhythmia Database (MIT-DB). Experimental results show that the proposed systematic approach achieves 99.30% F1-score, 99.69% recall, and 98.91% precision on CPSC-DB, while it achieves 99.83% F1-score, 99.85% recall, and 99.82% precision on MIT-DB.
The paper [9] proposes a hybrid 1D Resnet-GRU method consisting of Resnet and gated recurrent unit (GRU) modules to implement arrhythmia classification from 12-lead ECG recordings. Furthermore, the Grad-CAM++ mechanism is applied to the trained network model, thus generating thermal images superimposed on raw signals to discover the underlying explanations of various ECG segments. Experimental results indicate that the proposed method can achieve a high F1 score of 0.821 in classifying nine kinds of arrhythmia.
In another paper, a novel end-to-end Deep Multiscale Fusion Convolutional Neural Network (DMSFNet) architecture for multi-class arrhythmia detection was presented [10]. The proposed approach can effectively capture abnormal disease patterns and suppress noise interference through multiscale feature extraction and cross-scale information complementarity of ECG signals. It also employs a joint optimization strategy with multiple losses at different scales; this strategy not only learns scale-specific features but also performs multiscale complementary feature learning cumulatively during the learning process. The DMSFNet model was implemented on two public datasets (CPSC_2018 and PhysioNet/CinC_2017). The obtained F1 scores for these two datasets were higher than those of previous approaches.
The paper [11] proposes a novel latent attention residual network (HA-ResNet) for automatic arrhythmia classification. In this model, firstly, one-dimensional ECG signals are transformed into two-dimensional images and fed into an embedding layer to extract the corresponding shallow features in ECG. Then, a latent attention layer combining Squeeze-and-Excitation (SE) block and Bidirectional Convolutional LSTM (BConvLSTM) is used to capture the deep Spatio-temporal features further. HA-ResNet is evaluated on two public datasets and achieves F1 scores of 96.0%, 96.7%, and 87.6% on 2s segments, 5s segments, and 10s segments, respectively.
Deep learning models have shown remarkable effectiveness in detecting cardiac arrhythmias by analyzing ECG signals to classify heartbeats. To improve the performance of these models, a novel hybrid hierarchical attention-based bidirectional recurrent neural network (HARDC) has been developed. This model combines an extended convolutional neural network (CNN) with a bidirectional recurrent neural network (BiGRU-BiLSTM) for arrhythmia classification [12]. The HARDC model effectively integrates the architecture of the extended CNN with BiGRU-BiLSTM to generate fusion features. Experimental results from the trained HARDC model demonstrate impressive performance, achieving an accuracy of 99.60%, an F1 score of 98.21%, and a precision of 97.66% on the PhysioNet 2017 dataset. Additionally, it exhibits a recall value of 99.60% on the MIT-BIH dataset.
The lack of labeled ECG data and low classification accuracy can significantly impact the overall effectiveness of a classification algorithm. The study [13] proposed a feature extraction and classification strategy based on generative adversarial network data augmentation and model fusion to apply deep learning methods to arrhythmia classification more effectively. First, sparse arrhythmia data is augmented by generative adversarial networks. Then, the study proposes the hybrid use of a ResNet-based spatial information fusion model and a BiLSTM-based temporal information fusion model, aiming to identify different arrhythmia types in long-term ECG. The model effectively combines the location information of the nearest neighbors through the local feature extraction part of the generated ECG feature map, while the BiLSTM network in the global feature extraction part captures the correlation of global features through autonomous learning across multiple domains. Finally, the approach is validated on the improved MIT-BIH arrhythmia database. Experimental results show that the proposed classification technique achieves an arrhythmia diagnosis accuracy of 99.4%, and the algorithm has high recognition performance and clinical value.
In a study by [14], the researchers utilized the 3D Discrete Wavelet Transform (DWT) and Support Vector Machine (SVM) techniques to analyze and characterize electrocardiogram (ECG) signals. The ECG data in the arrhythmia database was raw, so noise was mixed with the original signals. Dhyani et al. implemented several preprocessing steps to enhance the clarity of the ECG signals and improve accuracy for subsequent experiments, including denoising the heartbeats, detecting R-waves, and segmenting the heartbeats. The SVM classifier was employed to categorize the ECG signals according to the nine types of heartbeats identified by various classifiers. Approximately 6,400 ECG beats from the CPSC 2018 arrhythmia dataset were analyzed in this study. The highest precision was achieved using level-4 coarse coefficients with the Symlet-8 (Sym8) wavelet. When applying the SVM classifier to the CPSC 2018 dataset, an accuracy of 99.02% was demonstrated.
A deep learning model [15] was proposed to detect and classify QRS complexes in single-channel Holter ECG. It uses 12,111 Holter ECG recordings of 30 seconds in length to train, validate, and test this method. The presented model comprises five ResNet blocks and a gated recurrent unit layer. The model output is a 4-channel probability vector (no QRS, normal QRS, premature ventricular contraction, premature atrial contraction) of 30 seconds long. The proposed method achieved an F1 score of 0.99 on the specific test set.
The study [16] evaluated cardiac cycle morphological and temporal features using 1D-CNN layers. The approach uses the MIT-BIH arrhythmia dataset for model training and testing, and the model’s generalization ability is assessed on the INCART dataset. For five-class classification, it achieves 99.13% and 99.17% average accuracies for 2-fold and 5-fold cross-validation, respectively. For patient classification with three and five classes, the model achieves 98.73% and 97.91% average accuracies, respectively. On the INCART dataset, the model offers 98.20% average accuracy for three classes.
Recent studies have focused on wearable devices for continuously monitoring heart rates and rhythms through single-channel electrocardiograms (ECGs). These devices have the potential to detect arrhythmias promptly. A study by [17] utilized a convolutional neural network (CNN) to classify various arrhythmias without requiring a QRS wave detection step. Research performed on the PhysioNet database demonstrated that the CNN model can effectively distinguish between Normal Sinus Rhythm (NSR) and several arrhythmias, including Atrial Fibrillation (AFIB), Atrial Flutter (AFL), Wolff-Parkinson-White Syndrome (WPW), Ventricular Fibrillation (VF), Ventricular Tachycardia (VT), Ventricular Flutter (VFL), Mobitz II AV Block (MII), and Sinus Bradycardia (SB). This study reported an average classification accuracy of 97.31% for 10-second and 5-second ECG segments.
In this study, the MobileNetv2 transfer learning architecture, a lightweight deep neural network, is used for arrhythmia detection. It is important to achieve stable results in pediatric arrhythmias, which have a complex and difficult detection process. In this regard, in order to obtain good generalization, fine-tuning is required for both the optimization and regularization methods, so that the overfitting and underfitting problems are effectively addressed. In this article, new optimization and regularization approaches were proposed. In the experiments performed, accuracy rates of 0.9801 and 0.9509 were obtained on the training and test datasets, respectively, by using the Proposed Opt. Algorithm V5 and Proposed Reg. Method V5 approaches together. Integrating these developed hyperparameters into various deep learning-based architectures consisting of different components offers the potential for generalization toward the target task.
Methodology
Artificial intelligence (AI) is a framework that is being rapidly integrated into the medical field, especially in cardiovascular medicine. AI approaches have proven their applicability in detecting, diagnosing, and managing various cardiovascular diseases, improving disease classification and typing [6]. Furthermore, AI’s capacity to process large amounts of patient data and identify disease patterns distinguishes it from traditional time- and resource-intensive methods. AI models have shown tremendous potential in case detection (including asymptomatic/hidden disease) and in genotype and phenotype identification. In addition, they have demonstrated applicability for general population screening, improving case detection in several usually asymptomatic conditions, such as left ventricular dysfunction [18].
On the other hand, machine learning (ML) and deep learning (DL) algorithms have the power to learn ECG signal denoising and quality assessment automatically. Applying artificial intelligence methods in arrhythmia detection will significantly reduce the pressure on physicians to analyze ECGs [19].
In this study, an original dataset was created for arrhythmia detection in pediatric patients. Experiments for the AI-based detection system were performed on an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz with 32 GB RAM and a 6 GB NVIDIA GeForce GTX 1660 Ti graphics card. All details of this study are explained below in order.
Dataset
Arrhythmia is an irregularity in the heart rhythm. However, not every irregularity in the heart rhythm indicates an arrhythmia. The anatomical and physiological characteristics of childhood are especially different from the normal adult pattern [20]. Therefore, the recognition and treatment of rhythm dysfunction seen in pediatric patients are difficult and complex for physicians. An artificial intelligence-based arrhythmia detection system is an important requirement for developing a solution to this complex situation. By providing faster and more accurate diagnosis for pediatric rhythm disorders, such a system is a critical factor in reducing mortality. However, the lack of a public dataset is a big problem in the development process. Therefore, in this study, a custom dataset was created using the ECG records of a Holter device (Biomedical, Shenzhen, China) used for 24-hour cardiac monitoring in the Pediatric Cardiology Clinic of the Sakarya University Faculty of Medicine. This dataset, shared publicly at https://github.com/fatmaakalin/Normal-and-Abnormal-ECG-beats-dataset, was collected with the approval of the Sakarya University Faculty of Medicine Health Sciences Scientific Research Ethics Committee, numbered E-43012747-050.04–399547-22. Consent documents were also obtained from the patients. The dataset, consisting of 1318 abnormal beats and 1403 normal beats, is balanced between classes, which has an important function in the generalizability of the results. Additionally, the images are presented in grayscale. A depiction of the dataset is given in Fig. 1.
Fig. 1
Depiction of normal and abnormal beats in the custom dataset
Channel 1 was used within the scope of the 3-channel ECG results obtained from the Holter device. Channel 1 is a time-dependent graph of the heart’s electrical activity and presents a regularly repeating cycle of heartbeats. The beats shown in Fig. 1 were obtained by cropping, from channel 1, the normal and abnormal beats whose coordinates were determined by an associate professor of pediatric cardiology, using the labelImg program together with the OpenCV library. To create the artificial intelligence model, 70% and 30% of the anonymized pediatric dataset were separated as training and testing sets, respectively.
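The 70%/30% split described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: the file paths are invented, and a stratified split is assumed so that the class balance of the dataset (1403 normal, 1318 abnormal beats) is preserved in both subsets.

```python
from sklearn.model_selection import train_test_split

# Hypothetical lists of beat-image file paths and labels; the actual
# file layout of the public dataset may differ.
paths = [f"beats/normal_{i}.png" for i in range(1403)] + \
        [f"beats/abnormal_{i}.png" for i in range(1318)]
labels = ["normal"] * 1403 + ["abnormal"] * 1318

# 70% training / 30% test, stratified to keep the class balance.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths, labels, test_size=0.30, stratify=labels, random_state=42
)

print(len(train_paths), len(test_paths))  # 1904 and 817 beats
```

Stratification matters here because a random split of a roughly balanced dataset can still drift a few percent per class; stratifying keeps the normal/abnormal ratio identical in both subsets.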
Deep learning-based MobileNetv2 transfer learning architecture
In recent years, deep learning has become a discipline that has been considered the gold standard. This discipline, which produces extraordinary results in complex tasks, has the power to create multi-layered learning models. This power gives it exceptional performance in visual data processing applications. The most common and popular approach among DL networks is the CNN architecture. It extracts features from visual data without human supervision. CNN architecture consists of multiple components [20].
The most important component of the CNN architecture is the convolution layer. The convolution operation performed by these layers lightens the computational load, and local correlations are discovered through the local features extracted from the inputs. The pooling layer reduces the large feature maps produced by the convolution process; this functionality, which preserves the dominant information, is important for reducing the computational cost. Fully connected layers, in turn, integrate the extracted features into the classification process, and their output is the final CNN output. In the CNN architecture, activation functions are applied after the convolution and fully connected layers; these functions provide the non-linearity the CNN needs to learn complex information. At the end of the learning process, the loss function is used to quantify the error: the difference between the true value and the predicted value measures the model’s erroneous predictions and forms the basis of the backpropagation algorithm that updates the weights. However, the selection of the optimization algorithm and of the regularization used together with it are two main problems in the learning process. Unless a successful selection is made, overfitting becomes an obstacle to producing generalizable outputs in CNN models. The optimization algorithm ensures that the model learns correctly during training and offers the potential to minimize the loss function. Regularization methods, on the other hand, prevent the model from becoming overly complex and improve its performance; they regularize the loss function, thus controlling the size of the weights. Therefore, new optimization algorithms and regularization methods were developed in this study [20].
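The convolution, activation, and pooling operations described above can be illustrated on a toy example. The numbers below are arbitrary and are not from the study; they only show the mechanics of the three operations.

```python
import numpy as np

# A 4x4 single-channel "image" and a 2x2 convolution filter
# (arbitrary values for illustration only).
image = np.array([[1., 2., 0., 1.],
                  [0., 1., 3., 1.],
                  [2., 0., 1., 0.],
                  [1., 1., 0., 2.]])
kernel = np.array([[1., 0.],
                   [0., -1.]])

# Valid convolution: a local weighted sum over every 2x2 patch,
# producing a 3x3 feature map that captures local correlations.
conv = np.array([[np.sum(image[i:i + 2, j:j + 2] * kernel)
                  for j in range(3)] for i in range(3)])

# ReLU activation adds the non-linearity the network needs.
relu = np.maximum(conv, 0.0)

# 2x2 max pooling (stride 1): keep the dominant value of each patch,
# shrinking the feature map while preserving its strongest responses.
pooled = np.array([[relu[i:i + 2, j:j + 2].max()
                    for j in range(2)] for i in range(2)])
print(pooled)  # [[0. 3.] [1. 3.]]
```

In a real CNN these operations run over many channels and filters at once, but the per-patch arithmetic is exactly this.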
Normally, deep CNN models require large data volumes to achieve maximum performance. However, for small datasets, this problem is effectively solved using the transfer learning technique, since transfer learning reuses CNN models already trained on large data volumes. Fine-tuning such a model for a small dataset is a suitable approach for reaching the desired result. Therefore, the developed approaches are integrated into the MobileNetv2 transfer learning architecture [20].
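A transfer-learning setup of the kind described above can be sketched in Keras. This is an assumption-laden sketch, not the authors' exact configuration: the input size and classification head are illustrative, and `weights=None` is used here only to avoid a download (in practice `weights="imagenet"` would load the pre-trained backbone).

```python
import tensorflow as tf

# MobileNetV2 backbone without its original classifier; in practice
# weights="imagenet" reuses features learned on a large dataset.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None)
base.trainable = False  # freeze the backbone; fine-tune only the new head

# New two-class head for normal/abnormal beats (hypothetical layers).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 2)
```

Freezing the backbone and training only the small head is what makes transfer learning workable on a dataset of a few thousand beats.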
The MobileNetv2 architecture was run for 100 iterations with the RMSprop optimizer using Tensorflow. The mathematical expression of the RMSProp optimization algorithm is given in Eqs. 1–3 [21].
1
2
3
The coefficient β in Eq. 1 is the momentum, gt² is the square of the gradients, and Vt is the moving average. The moving average obtained through Eq. 1 is used in Eq. 2, which is calculated to update the weights. In Eq. 2, wt is the current weight value, η is the fixed learning rate, ϵ is a constant set to 1 × 10−10, and gt is the gradient of the weights. The output obtained in Eq. 2 is used in Eq. 3, where η again denotes the learning rate.
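The RMSProp update described by Eqs. 1–3 can be sketched in a few lines. The exact split of the paper's three equations is not reproduced here; this is the standard single-step form in which a moving average of squared gradients rescales the learning rate.

```python
import numpy as np

def rmsprop_step(w, grad, v, beta=0.9, eta=0.001, eps=1e-10):
    """One RMSProp update: Eq. 1 maintains the moving average Vt of
    squared gradients; the weight update then divides the learning
    rate by the root of that average (Eqs. 2-3)."""
    v = beta * v + (1 - beta) * grad ** 2        # Eq. 1: moving average Vt
    w = w - eta * grad / (np.sqrt(v) + eps)      # Eqs. 2-3: weight update
    return w, v

# Toy usage: minimize f(w) = w^2 (gradient 2w) starting from w = 1.0.
w, v = 1.0, 0.0
for _ in range(200):
    w, v = rmsprop_step(w, 2 * w, v)
print(round(w, 3))  # w has moved toward the minimum at 0
```

The division by the running root-mean-square of gradients is what makes the step size adaptive: parameters with persistently large gradients take smaller effective steps.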
Optimization of classification performance
Since 1989, many changes have been made to the CNN architecture, including structural reformulations, regularization, and parameter optimization. In this direction, in the presented study, 6 different optimization algorithms based on the RMSProp adaptive optimization algorithm and 6 different regularization methods based on ElasticNet have been developed. The developed approaches are explained below.
Proposed optimization algorithm V1
Proposed Optimization Algorithm V1 is based on RMSProp. Compared to the RMSProp optimization algorithm, the modified part is the rearrangement of Eq. 2. The mathematical expression of the updated part is given in Eq. 4.
4
The h function in Eq. 4 is the homogeneity value calculated to monitor the data distribution. Homogeneity is a criterion used to calculate the similarity between data. Its mathematical expression is given in Eq. 5 [22].
5
The low difference between the gradients given as input to Eq. 5 indicates high homogeneity.
Proposed optimization algorithm V2
The proposed Optimization Algorithm V2 is based on RMSProp. Compared to the RMSProp optimization algorithm, the modified part is the rearrangement of Eq. 2. The mathematical expression of the updated part is given in Eq. 6.
6
The s function given in Eq. 6 is a parameter that measures the smoothness of the data; it measures the degree of change of the input. The mathematical expression is given in Eq. 7 [22].
7
The low difference between the gradients given as input to Eq. 7 indicates high smoothness.
Proposed optimization algorithm V3
The proposed Optimization Algorithm V3 is based on RMSProp. Compared to the RMSProp optimization algorithm, the modified part is the rearrangement of Eq. 2. The mathematical expression of the updated part is given in Eq. 8.
8
The v function given in Eq. 8 expresses the degree of deviation of the input from the mean. Its mathematical expression is shown in Eq. 9 [22].
9
The low gradients given as input to Eq. 9 indicate that gradient values are close to the average.
Proposed optimization algorithm V4
The proposed Optimization Algorithm V4 is based on RMSProp. Compared to the RMSProp optimization algorithm, the modified part is the rearrangement of Eq. 2. The mathematical expression of the updated part is given in Eq. 10.
10
The c1 function described in Eq. 10 aims to balance smoothness, homogeneity, and variance. Its mathematical expression is provided in Eq. 11.
11
The low variance within the gradients given as input to Eq. 11 indicates that smoothness and homogeneity are effective. In the opposite case, the effect of smoothness and homogeneity parameters will decrease. This proposed combination is useful for evaluating the general structure of the image.
Proposed optimization algorithm V5
Proposed Optimization Algorithm V5 is based on RMSProp. Compared to the RMSProp optimization algorithm, the modified part is the rearrangement of Eq. 2. The mathematical expression of the updated part is given in Eq. 12.
12
The c2 function given in Eq. 12 aims to evaluate the energy factor as well as the smoothness and homogeneity parameters balanced by variance. Its mathematical expression is given in Eq. 13.
13
The mathematical expression of the energy parameter given in Eq. 13 is given in Eq. 14.
14
The energy parameter measures the overall intensity of the image. A low variance in the final output ensures that the smoothness and homogeneity values are effective, while a high variance ensures that the energy value is effective [22].
Proposed optimization algorithm V6
Proposed Optimization Algorithm V6 is based on RMSProp. The modified part compared to the RMSProp optimization algorithm is the rearrangement of Eq. 2. The mathematical expression of the updated part is given in Eq. 15.
15
The c3 function given in Eq. 15 aims to evaluate the root mean square (rms) value as well as the smoothness, homogeneity and energy factor parameters balanced by variance. Its mathematical expression is given in Eq. 16.
16
The mathematical expression of the rms parameter is given in Eqs. 16 and 17.
17
The RMS parameter, which performs general size analysis, was used in the denominator to provide normalization [22].
Optimization algorithms are used to minimize the loss function of a model, updating its weights so that it performs best on the training data. Another hyperparameter choice that works to minimize the loss function is the regularization method, which is used to prevent the model from over-learning, because generalization ability on the test dataset is as important as performance on the training dataset [20].
New regularization methods have been integrated into the optimization algorithm that produces the most efficient output, namely Proposed Optimization Algorithm V5. The penalty term added by the regularization method is intended to strengthen the optimization. These developed regularization methods are based on ElasticNet. The mathematical expression for the ElasticNet regularization method, which combines the advantages of both the Ridge and Lasso regularization methods, is given in Eq. 18 [23].
18
The λ parameter in Eq. 18 governs the extent of coefficient reduction. The β parameter indicates the influence of the independent variables on the target variable. On the other hand, the ρ parameter determines the balance between Ridge and Lasso regularization techniques [23].
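The ElasticNet penalty with the parameters described above can be sketched as follows. Which side of the blend ρ weights is a convention that the elided Eq. 18 fixes; here ρ is assumed to weight the Ridge (L2) term.

```python
import numpy as np

def elastic_net_penalty(beta, lam=0.01, rho=0.5):
    """ElasticNet penalty in the spirit of Eq. 18: a rho-weighted blend
    of the Ridge (L2) and Lasso (L1) terms, scaled by lambda, which
    governs the extent of coefficient reduction."""
    l2 = np.sum(beta ** 2)      # Ridge part: shrinks all coefficients
    l1 = np.sum(np.abs(beta))   # Lasso part: drives some coefficients to zero
    return lam * (rho * l2 + (1 - rho) * l1)

# Toy coefficient vector (arbitrary values for illustration).
beta = np.array([0.5, -0.25, 0.0, 1.0])
print(elastic_net_penalty(beta))  # 0.0153125
```

With ρ = 1 the penalty reduces to pure Ridge, with ρ = 0 to pure Lasso; intermediate values combine the shrinkage of Ridge with the sparsity of Lasso.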
Six different regularization methods proposed based on ElasticNet are presented below.
Proposed regularization method V1
Proposed Regularization Method V1 is based on the ElasticNet technique. A softmax function is applied to the output generated by the dense layer, where the input data is provided. The mathematical expression of the softmax function is given in Eq. 19. The data, mapped into a certain range by the softmax function, is multiplied by the erf error function as formulated in Eq. 20. The mathematical expression of the erf function is given in Eq. 21. The erf function, used in error analyses, is employed in Eq. 22 to calculate the average of all elements. The obtained result is used in Eq. 23 as a parameter controlling the extent of coefficient reduction. This parameter is used in the mathematical equation given in Eq. 24 to calculate the proposed regularization method.
19
20
21
22
23
24
It is planned that the integration of the erf function used in modeling the errors into Eq. 24 will produce a successful output in analyzing the classification errors.
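Since Eqs. 19–24 are not reproduced here, the pipeline of Proposed Regularization Method V1 can only be sketched under assumptions: the softmax helper, the name `proposed_reg_v1`, and the exact scaling of the adaptive coefficient are all hypothetical readings of the prose (softmax the dense-layer output, multiply by erf, average, use the result as the coefficient-reduction parameter inside an ElasticNet penalty).

```python
import numpy as np
from math import erf

def softmax(z):
    """Map dense-layer outputs into (0, 1), summing to 1 (Eq. 19)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def proposed_reg_v1(dense_out, weights, rho=0.5):
    """Hypothetical sketch of Eqs. 19-24, not the authors' exact code."""
    p = softmax(dense_out)                       # Eq. 19: range the data
    e = p * np.array([erf(x) for x in p])        # Eq. 20: multiply by erf
    lam = e.mean()                               # Eq. 22: average of all elements
    l2 = np.sum(weights ** 2)                    # Ridge term
    l1 = np.sum(np.abs(weights))                 # Lasso term
    return lam * (rho * l2 + (1 - rho) * l1)     # Eq. 24: ElasticNet with adaptive lam

penalty = proposed_reg_v1(np.array([2.0, -1.0]), np.array([0.3, -0.4]))
print(penalty > 0)
```

The point of the construction is that the penalty strength is no longer a fixed hyperparameter but is driven by the network's own (error-transformed) output distribution.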
Proposed regularization method V2
Proposed Regularization Method V2 is based on the ElasticNet technique. A softmax function is applied to the output generated by the dense layer, where the input data is provided. The mathematical expression for the softmax function is presented in Eq. 25. The data is normalized to a specific range using the softmax function, as described in Eq. 26. Following this, the average is calculated in Eq. 27 and the variance in Eq. 28. This variance output is utilized in Eq. 30 as the parameter that controls the degree of coefficient reduction in Eq. 29. Finally, the regularization function is computed using Eq. 31.
25
26
27
28
29
30
31
The variance value, which measures the extent to which values in a dataset deviate from the mean, is a crucial parameter for analyzing data distribution. It is intended that this value, integrated into Eq. 31 [22], will yield successful outcomes in modeling.
Proposed regularization method V3
Proposed Regularization Method V3 is based on the ElasticNet technique. A softmax function is applied to the output generated by the dense layer, where the input data is provided. The mathematical expression for the softmax function is given in Eq. 32. The data, mapped into a certain range by the softmax function, is normalized as formulated in Eq. 33. Eq. 34 uses expressions that respond differently to positive and negative values. In this direction, Eq. 35 uses the erf function given in Eq. 36. The erf function is a Gaussian error function with a lower bound [24]. It can be used as a non-monotonic, continuous, and trainable adaptive activation function, and it has the power to improve gradient flow for negative inputs [25]. On the other hand, it produces small gradients for large negative values and causes slow convergence; slow convergence in turn leads to the vanishing-gradient problem. These drawbacks limit the effectiveness of deep neural networks [25]. However, there are successful studies in which the erf function is combined with activation functions, providing regularization on the negative side, to produce new trainable activation functions [24, 26]. In these studies, the erf function gives the network non-linearity, behaves as a smooth function since it is differentiable at every point, and offers the potential to saturate for large input values [24]. These properties indicate that the erf function entails a slower computational process, so it is not preferred in speed-sensitive situations. In medical studies, however, diagnostic accuracy is more important than speed. In this direction, for the purposes of smooth transitions, control of negative values, and stability, the erf function in Eq. 36 was used in the customization of the regularization function. Then an adaptive learning process is performed in Eq.
37 and its output was used in Eq. 38. In Eq. 38 is controlled degree of coefficient reduction. Then, the regularization function is calculated by Eq. 39.
32
33
34
35
36
37
38
39
This regularization method, which can respond differently to positive and negative input values, keeps the influence of the input under control and supports balanced learning.
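Since Eqs. 32–39 are not reproduced above, the chain can only be illustrated loosely. The following is a minimal sketch, under stated assumptions, of how a softmax normalization followed by a sign-sensitive erf response could modulate an ElasticNet penalty; the function names, the centering step, and the exact combination rule are hypothetical, not the paper's equations.

```python
import math

def softmax(z):
    # Numerically stable softmax, mirroring the Eq. 32-style normalization.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def erf_elasticnet_v3(weights, logits, lam=0.01, rho=0.5):
    probs = softmax(logits)
    n = len(probs)
    # Centering makes the erf response sign-dependent (in the spirit of
    # Eqs. 34-36): probabilities above the uniform level push the scale
    # up, those below pull it down, smoothly and boundedly.
    responses = [math.erf(p - 1.0 / n) for p in probs]
    scale = 0.5 * (1.0 + sum(responses) / n)  # mapped into (0, 1)
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * scale * (rho * l1 + (1 - rho) * l2)
```

The bounded erf response keeps the effective penalty strength from exploding for extreme inputs, which is the balancing behavior the method is described as providing.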
Proposed regularization method V4
Proposed Regularization Method V4 is based on ElasticNet. Softmax is applied to the output generated by the dense layer to which the input data is presented. The mathematical expression of the softmax function is provided in Eq. 40. The data, mapped into a bounded range by the softmax function, are normalized as formulated in Eq. 41. The average value, calculated in Eq. 42, is then utilized in Eq. 43 to find the variance. The error function, calculated in Eq. 44, is used in the result calculation step in Eq. 45. This result is necessary for Proposed Regularization Method V4, as indicated in Eq. 46, because the output of Eq. 46 is the parameter controlling the degree of coefficient reduction in Eq. 47. This parameter is used in the calculation of the regularization function through Eq. 48.
40
41
42
43
44
45
46
47
48
This regularization method, in which the variance and the error function are used together, enables the development of a model that is robust to extreme values and supports stable learning.
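Eqs. 40–48 are not reproduced here, but the described chain (softmax → mean → variance → erf → shrink parameter) can be sketched. The helper below is a hypothetical illustration of how a variance-driven erf term could yield a bounded coefficient-reduction factor; it is not the paper's exact formulation.

```python
import math

def variance_erf_shrink(logits):
    # Softmax normalization (Eq. 40-style).
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    p = [v / s for v in e]
    mu = sum(p) / len(p)                          # mean (Eq. 42-style)
    var = sum((x - mu) ** 2 for x in p) / len(p)  # variance (Eq. 43-style)
    # erf maps the spread into [0, 1): confident, high-variance outputs
    # produce a stronger shrink factor than near-uniform ones.
    return math.erf(math.sqrt(var))
```

Because erf saturates, even extreme logits cannot drive the shrink factor beyond 1, which is what makes the method resistant to outliers.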
Proposed regularization method V5
Proposed Regularization Method V5 is based on ElasticNet. The softmax function is applied to the output generated by the dense layer to which the input data is presented. The mathematical expression of the softmax function is provided in Eq. 49. The data are normalized within a specific range using the softmax function, as formulated in Eq. 50. The average value calculated in Eq. 51 is then utilized in Eq. 52 to compute the variance. The error function, calculated in Eq. 53, is used in the result calculation step in Eq. 54. This result is essential for Proposed Regularization Method V5, as described in Eq. 55. The output of Eq. 55 serves as a parameter that controls the degree of coefficient reduction in Eq. 56. This parameter is used to calculate the regularization function via Eq. 59. Normally, Eq. 59, which produces the output of the proposed regularization method, balances the contributions of the ridge and lasso algorithms with the ρ parameter fixed at 0.5. Proposed Regularization Method V5, however, calculates the ρ parameter adaptively: the entropy value computed in Eq. 57 is assigned to the ρ parameter through Eq. 58.
49
50
51
52
53
54
55
56
57
58
59
The goal is to manage uncertainty by integrating the entropy value into this regularization method [22]: at high entropy values, which indicate high uncertainty, the parameter ρ is small; in the opposite case, the parameter ρ is large.
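As described, ρ shrinks as the entropy of the softmax output grows. The sketch below illustrates this coupling under two explicit assumptions that the text does not spell out: entropy is normalized by its maximum, and Eq. 58 is taken to be a simple 1 − H form. It is an illustration, not the paper's implementation.

```python
import math

def adaptive_rho(probs):
    # Normalized Shannon entropy of the softmax output (Eq. 57-style).
    h = -sum(p * math.log(p) for p in probs if p > 0)
    h_max = math.log(len(probs))
    # High entropy (high uncertainty) -> small rho; low entropy -> large
    # rho, matching the behaviour the text describes for Eq. 58.
    return 1.0 - h / h_max

def elasticnet_v5(weights, logits, lam=0.01):
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    probs = [v / s for v in e]
    rho = adaptive_rho(probs)
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    # rho balances the lasso (L1) and ridge (L2) contributions (Eq. 59).
    return lam * (rho * l1 + (1 - rho) * l2)
```

With this form, an uncertain (near-uniform) prediction shifts the penalty toward the ridge term, while a confident prediction shifts it toward the lasso term.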
Results and discussion
Deep learning offers an approach based on multiple layers of artificial neurons [26]. The first layers of deep learning-based models are structured to extract low-level features, and the last layers to extract high-level features [21]. This hierarchical structure yields great success in learning complex features, because the learned feature representation is not limited by human imagination [26].
Deep learning demonstrates superior performance compared to its alternatives in the field of image classification. Convolutional neural networks (CNNs), a subset of the deep learning approach, are architectures originally developed for images [26]. However, obtaining good performance from these architectures requires large amounts of data, and a lack of data causes underfitting or overfitting. Three methods are commonly used to mitigate this situation: data augmentation, the use of simulated data, and transfer learning [21].
Transfer learning is the process of training CNN models on a large amount of data and then reusing them by fine-tuning on a smaller dataset. This enables previously learned features to be used in another task: since learning does not start from scratch, the training process is accelerated, and generalization and convergence are supported. However, modifications to the components of transfer learning architectures are a critical factor in improving application performance. Various CNN architectures have been proposed in this direction, and the choice among them depends on suitability for the target task [21]. For this reason, the MobileNetv2 transfer learning architecture was used in this study.
MobileNetv2 is a simple transfer learning architecture built on lightweight deep neural networks. It provides an efficient network design for building low-latency models, is suitable for resource-constrained settings, and can easily be adapted to the design requirements of mobile and embedded vision applications. It has also produced efficient results in various tasks [27]. Since the arrhythmia detection model developed to support physicians has the potential to evolve into a product-oriented output, training with the MobileNetv2 transfer learning architecture is therefore important. In this context, the results obtained on the training and test datasets for the default RMSprop optimization algorithm and six improved optimization algorithms based on it are presented in Tables 1 and 2.
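The six improved variants build on the standard RMSprop update; their full customization (energy factor, variance-balanced smoothness and homogeneity) is defined in the paper's earlier equations and is not reproduced here. As a reference point, a minimal sketch of the baseline RMSprop rule that the variants modify:

```python
import math

def rmsprop_step(w, grad, avg_sq, lr=0.001, beta=0.9, eps=1e-8):
    # Exponential moving average of the squared gradient.
    avg_sq = beta * avg_sq + (1.0 - beta) * grad * grad
    # Per-parameter step scaled by the root of that average; the
    # proposed variants adaptively reshape this gradient magnitude.
    w = w - lr * grad / (math.sqrt(avg_sq) + eps)
    return w, avg_sq
```

The per-parameter scaling is what makes RMSprop adaptive in the first place; the proposed customizations further modulate the effective step size rather than replacing this mechanism.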
Table 1. Performance results of the customized optimization algorithms on the training dataset
Train Dataset | TP | FP | FN | TN | Acc. | Sens. | Spec. | Prec. | F1 Score |
|---|---|---|---|---|---|---|---|---|---|
Original RMSprop Algorithm | 862 | 61 | 9 | 974 | 0.9633 | 0.9897 | 0.9411 | 0.9339 | 0.9610 |
Proposed Opt. Algorithm V1 | 898 | 25 | 19 | 964 | 0.9769 | 0.9793 | 0.9747 | 0.9729 | 0.9761 |
Proposed Opt. Algorithm V2 | 923 | 0 | 983 | 0 | 0.4843 | 0.4843 | 0 | 1.0000 | 0.6525 |
Proposed Opt. Algorithm V3 | 923 | 0 | 983 | 0 | 0.4843 | 0.4843 | 0 | 1.0000 | 0.6525 |
Proposed Opt. Algorithm V4 | 896 | 27 | 10 | 973 | 0.9806 | 0.9890 | 0.9730 | 0.9707 | 0.9798 |
Proposed Opt. Algorithm V5 | 896 | 27 | 10 | 973 | 0.9806 | 0.9890 | 0.9730 | 0.9707 | 0.9798 |
Proposed Opt. Algorithm V6 | 884 | 39 | 11 | 972 | 0.9738 | 0.9877 | 0.9614 | 0.9577 | 0.9725 |
Table 2. Performance results of the customized optimization algorithms on the test dataset
Test Dataset | TP | FP | FN | TN | Acc. | Sens. | Spec. | Prec. | F1 Score |
|---|---|---|---|---|---|---|---|---|---|
Original RMSprop Algorithm | 359 | 36 | 13 | 407 | 0.9399 | 0.9651 | 0.9187 | 0.9089 | 0.9361 |
Proposed Opt. Algorithm V1 | 372 | 23 | 20 | 400 | 0.9472 | 0.9490 | 0.9456 | 0.9418 | 0.9454 |
Proposed Opt. Algorithm V2 | 395 | 0 | 420 | 0 | 0.4847 | 0.4847 | 0 | 1.0000 | 0.6529 |
Proposed Opt. Algorithm V3 | 395 | 0 | 420 | 0 | 0.4847 | 0.4847 | 0 | 1.0000 | 0.6529 |
Proposed Opt. Algorithm V4 | 370 | 25 | 18 | 402 | 0.9472 | 0.9536 | 0.9415 | 0.9367 | 0.9451 |
Proposed Opt. Algorithm V5 | 371 | 24 | 19 | 401 | 0.9472 | 0.9513 | 0.9435 | 0.9392 | 0.9452 |
Proposed Opt. Algorithm V6 | 364 | 31 | 13 | 407 | 0.9460 | 0.9655 | 0.9292 | 0.9215 | 0.9430 |
The outputs given in Tables 1 and 2 were evaluated using the Accuracy, Sensitivity, Specificity, Precision, and F1 Score criteria. These metrics are used in two main stages, training and testing, to interpret deep learning models in classification tasks [20, 27]. In the training phase, the classifier is trained on the training dataset and its performance is first measured on that same dataset. Because this measurement uses the data the model was trained on, it offers only a limited view of the model's effectiveness; the success of the process must also be measured on data the model has never seen. Therefore, the success of the trained model is measured on the test dataset [20].
Closeness between the performance metrics obtained on the training and test datasets indicates that the model avoids overfitting and can be characterized as generalizable. In this context, the training and test datasets were compared in terms of accuracy, the ratio of correctly classified examples to the total number of examples; sensitivity, the ratio of successfully classified positive patterns; specificity, the ratio of successfully classified negative patterns; precision, the ratio of correctly predicted positive examples among those assigned to the positive class; and the F1 score, the harmonic mean of sensitivity and precision [20].
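The five criteria reduce to simple ratios of the confusion-matrix counts reported in the tables. A small helper (a straightforward sketch, not code from the study) reproduces, for example, the original RMSprop row of Table 1 from its TP/FP/FN/TN values:

```python
def classification_metrics(tp, fp, fn, tn):
    acc = (tp + tn) / (tp + fp + fn + tn)  # accuracy
    sens = tp / (tp + fn)                  # sensitivity (recall)
    spec = tn / (tn + fp)                  # specificity
    prec = tp / (tp + fp)                  # precision
    f1 = 2 * prec * sens / (prec + sens)   # harmonic mean of prec. and sens.
    return acc, sens, spec, prec, f1

# Original RMSprop row of Table 1: TP=862, FP=61, FN=9, TN=974
# yields acc 0.9633, sens 0.9897, spec 0.9411, prec 0.9339, F1 0.9610
# after rounding to four decimals, matching the reported values.
acc, sens, spec, prec, f1 = classification_metrics(862, 61, 9, 974)
```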
Proposed Optimization Algorithm V5 showed the maximum accuracy on the test dataset. In addition, its five performance metrics are balanced, which shows that the model separates the negative and positive classes with similar reliability rather than focusing on a single class type; the F1 score in particular confirms this. It is equally critical to evaluate the stability of the model: a model made generalizable with regularization functions produces consistent results on different datasets [23]. For this reason, five different regularization methods were applied to Proposed Optimization Algorithm V5, which showed the maximum success. The evaluation criteria for the applied regularization methods are given in Tables 3 and 4.
Table 3. Performance results of the customized optimization and regularization methods on the training dataset
Train Dataset | TP | FP | FN | TN | Acc. | Sens. | Spec. | Prec. | F1 Score |
|---|---|---|---|---|---|---|---|---|---|
Proposed Opt. Algorithm V5 + Proposed Reg. Method V1 | 903 | 20 | 22 | 961 | 0.9780 | 0.9762 | 0.9796 | 0.9783 | 0.9773 |
Proposed Opt. Algorithm V5 + Proposed Reg. Method V2 | 882 | 41 | 3 | 980 | 0.9769 | 0.9966 | 0.9598 | 0.9556 | 0.9757 |
Proposed Opt. Algorithm V5 + Proposed Reg. Method V3 | 891 | 32 | 7 | 976 | 0.9795 | 0.9922 | 0.9683 | 0.9653 | 0.9786 |
Proposed Opt. Algorithm V5 + Proposed Reg. Method V4 | 894 | 29 | 9 | 974 | 0.9801 | 0.9900 | 0.9711 | 0.9686 | 0.9792 |
Proposed Opt. Algorithm V5 + Proposed Reg. Method V5 | 893 | 30 | 8 | 975 | 0.9801 | 0.9911 | 0.9701 | 0.9675 | 0.9792 |
Table 4. Performance results of the customized optimization and regularization methods on the test dataset
Test Dataset | TP | FP | FN | TN | Acc. | Sens. | Spec. | Prec. | F1 Score |
|---|---|---|---|---|---|---|---|---|---|
Proposed Opt. Algorithm V5 + Proposed Reg. Method V1 | 375 | 20 | 23 | 397 | 0.9472 | 0.9422 | 0.9520 | 0.9494 | 0.9458 |
Proposed Opt. Algorithm V5 + Proposed Reg. Method V2 | 361 | 34 | 12 | 408 | 0.9436 | 0.9678 | 0.9231 | 0.9139 | 0.9401 |
Proposed Opt. Algorithm V5 + Proposed Reg. Method V3 | 368 | 27 | 15 | 405 | 0.9485 | 0.9608 | 0.9375 | 0.9316 | 0.9460 |
Proposed Opt. Algorithm V5 + Proposed Reg. Method V4 | 369 | 26 | 17 | 403 | 0.9472 | 0.9560 | 0.9394 | 0.9342 | 0.9449 |
Proposed Opt. Algorithm V5 + Proposed Reg. Method V5 | 368 | 27 | 13 | 407 | 0.9509 | 0.9659 | 0.9378 | 0.9316 | 0.9485 |
When Tables 3 and 4 are examined, the maximum result is obtained with Proposed Regularization Method V5 integrated into Proposed Optimization Algorithm V5. Using the two approaches together helps resolve overfitting and underfitting: optimization increases model performance by minimizing the loss function, while regularization controls model complexity to increase generalization ability. The strategic and balanced use of the proposed optimization and regularization methods thus ensures effective outputs, as confirmed by the graphs in Figs. 2–5.
Figures 2 and 3 show the accuracy and loss values achieved during training with the original RMSprop optimization algorithm, while Figs. 4 and 5 show the accuracy and loss of the model integrating Proposed Optimization Algorithm V5 and Proposed Regularization Method V5. The curves in Figs. 2 and 3 exhibit large oscillations compared to those in Figs. 4 and 5. A large oscillation amplitude indicates deviation from the steady state and difficulty converging to the target. Using Proposed Optimization Algorithm V5 and Proposed Regularization Method V5 together therefore improved the training process: since excessively large steps were avoided during optimization, the model tended to converge to the minimum point and behaved stably when approaching the target value. This shows that the optimization algorithm and regularization method were tuned successfully.
[See PDF for image]
Fig. 2
Accuracy values of the model trained with the original RMSprop optimization algorithm on the training and test datasets
[See PDF for image]
Fig. 3
Loss values of the model trained with the original RMSprop optimization algorithm on the training and test datasets
[See PDF for image]
Fig. 4
Accuracy values of the model trained with the Proposed Optimization Algorithm V5 and Proposed Regularization Method V5 on the training and test datasets
[See PDF for image]
Fig. 5
Loss values of the model trained with the Proposed Optimization Algorithm V5 and Proposed Regularization Method V5 on the training and test datasets
The maximum results in Tables 3 and 4 for the training and test datasets were obtained with the customized adaptive RMSprop optimization algorithm V5. The same customization, which integrates an energy factor together with smoothness and homogeneity parameters balanced by variance, was also applied to other adaptive optimization algorithms, namely Adagrad, Adam, and Nadam. The main goal of all these customizations is to change the gradient magnitude adaptively, thereby preventing overfitting, stabilizing gradients, and improving learning. Accordingly, the customized Adagrad, Adam, and Nadam optimization algorithms were applied to the MobileNetv2 transfer learning architecture, and the classification results are given in Tables 5 and 6.
Table 5. Performance results for the customized other adaptive optimization algorithms in the training dataset
Train Dataset | TP | FP | FN | TN | Acc | Sens. | Spec. | Prec. | F1 Score |
|---|---|---|---|---|---|---|---|---|---|
Proposed Opt. Algorithm V5 for rmsprop + Proposed Reg. Method V1 (proposed) | 860 | 63 | 25 | 958 | 0.9801 | 0.9911 | 0.9701 | 0.9675 | 0.9792 |
Proposed Opt. Algorithm V5 for Adagrad + Proposed Reg. Method V2 | 860 | 63 | 25 | 958 | 0.9538 | 0.9718 | 0.9383 | 0.9317 | 0.9513 |
Proposed Opt. Algorithm V5 for Adam + Proposed Reg. Method V2 | 883 | 40 | 4 | 979 | 0.9769 | 0.9955 | 0.9607 | 0.9567 | 0.9757 |
Proposed Opt. Algorithm V5 for Nadam + Proposed Reg. Method V3 | 897 | 26 | 8 | 975 | 0.9822 | 0.9912 | 0.9740 | 0.9718 | 0.9814 |
Table 6. Performance results for the customized other adaptive optimization algorithms in the test dataset
Test Dataset | TP | FP | FN | TN | Acc | Sens. | Spec. | Prec. | F1 Score |
|---|---|---|---|---|---|---|---|---|---|
Proposed Opt. Algorithm V5 for Rmsprop + Proposed Reg. Method V1 (proposed) | 860 | 63 | 25 | 958 | 0.9509 | 0.9659 | 0.9378 | 0.9316 | 0.9485 |
Proposed Opt. Algorithm V5 for Adagrad + Proposed Reg. Method V2 | 860 | 63 | 25 | 958 | 0.9264 | 0.9373 | 0.9167 | 0.9089 | 0.9229 |
Proposed Opt. Algorithm V5 for Adam + Proposed Reg. Method V2 | 883 | 40 | 4 | 979 | 0.9423 | 0.9703 | 0.9191 | 0.9089 | 0.9386 |
Proposed Opt. Algorithm V5 for Nadam + Proposed Reg. Method V3 | 372 | 23 | 18 | 402 | 0.9497 | 0.9538 | 0.9459 | 0.9418 | 0.9478 |
When Tables 5 and 6 are examined, the maximum accuracy and F1 score on the test dataset are obtained for "Proposed Opt. Algorithm V5 for Rmsprop + Proposed Reg. Method V1", indicating a balanced and strong performance. In addition, its gap between training and test accuracy is small compared to that of the Nadam-based algorithm, which showed high success on the training dataset; this points to a model that is more resistant to overfitting. The experiments confirm that, among the adaptive optimization algorithms, the customized RMSprop algorithm is more successful than the existing ones.
For the MobileNetv2 architecture into which the proposed approach "Proposed Opt. Algorithm V5 for rmsprop + Proposed Reg. Method V1" is integrated, the achieved accuracies on the training and test datasets are 0.9801 and 0.9509, respectively. However, the potential for improving this success with other transfer learning architectures also needs to be investigated. To this end, the customized optimization algorithm and regularization method were also integrated into the MobileNetv1, VGG19, ResNet50, Xception, DenseNet121, InceptionV3, and InceptionResNetV2 transfer learning architectures. Classification results for the training and test datasets are presented in Tables 7 and 8.
Table 7. Performance results of transfer learning architectures for the customized approach in the training dataset
Train Dataset | TP | FP | FN | TN | Acc. | Sens. | Spec. | Prec. | F1 Score |
|---|---|---|---|---|---|---|---|---|---|
MobileNetv1 | 913 | 10 | 9 | 974 | 0.9900 | 0.9902 | 0.9898 | 0.9892 | 0.9897 |
MobileNetv2 (proposed) | 893 | 30 | 8 | 975 | 0.9801 | 0.9911 | 0.9701 | 0.9675 | 0.9792 |
MobileNetv3 Small | 759 | 164 | 5 | 978 | 0.9113 | 0.9935 | 0.8564 | 0.8223 | 0.8998 |
VGG19 | 851 | 72 | 15 | 968 | 0.9544 | 0.9827 | 0.9308 | 0.9220 | 0.9514 |
ResNet50 | 840 | 83 | 10 | 973 | 0.9512 | 0.9882 | 0.9214 | 0.9101 | 0.9475 |
Xception | 863 | 60 | 13 | 970 | 0.9617 | 0.9852 | 0.9417 | 0.9350 | 0.9594 |
DenseNet121 | 846 | 77 | 3 | 980 | 0.9580 | 0.9965 | 0.9272 | 0.9166 | 0.9549 |
InceptionV3 | 903 | 20 | 7 | 976 | 0.9858 | 0.9923 | 0.9799 | 0.9783 | 0.9853 |
InceptionResNetV2 | 892 | 31 | 3 | 980 | 0.9822 | 0.9966 | 0.9693 | 0.9664 | 0.9813 |
Table 8. Performance results of transfer learning architectures for the customized approach in the test dataset
Test Dataset | TP | FP | FN | TN | Acc. | Sens. | Spec. | Prec. | F1 Score |
|---|---|---|---|---|---|---|---|---|---|
MobileNetv1 | 371 | 24 | 22 | 398 | 0.9436 | 0.9440 | 0.9431 | 0.9392 | 0.9416 |
MobileNetv2 (proposed) | 368 | 27 | 13 | 407 | 0.9509 | 0.9659 | 0.9378 | 0.9316 | 0.9485 |
MobileNetv3 Small | 318 | 77 | 8 | 412 | 0.8957 | 0.9755 | 0.8425 | 0.8051 | 0.8821 |
VGG19 | 358 | 37 | 19 | 401 | 0.9313 | 0.9496 | 0.9155 | 0.9063 | 0.9275 |
ResNet50 | 316 | 79 | 26 | 394 | 0.8712 | 0.9240 | 0.8330 | 0.8000 | 0.8575 |
Xception | 359 | 36 | 16 | 404 | 0.9362 | 0.9573 | 0.9182 | 0.9089 | 0.9325 |
DenseNet121 | 346 | 49 | 9 | 411 | 0.9288 | 0.9746 | 0.8935 | 0.8759 | 0.9227 |
InceptionV3 | 365 | 30 | 11 | 409 | 0.9497 | 0.9707 | 0.9317 | 0.9241 | 0.9468 |
InceptionResNetV2 | 368 | 27 | 19 | 401 | 0.9436 | 0.9509 | 0.9369 | 0.9316 | 0.9412 |
When Tables 7 and 8 are examined, the maximum accuracy and F1 score on the test dataset were obtained with the MobileNetv2 transfer learning architecture, indicating a more balanced prediction success. It is also valuable that the difference between the training and test accuracies of the MobileNetv2 architecture is low, which points to a model resistant to overfitting and enables more precise and balanced outputs.
The MobileNetv2 transfer learning architecture produced accuracy rates of 0.9801 and 0.9509 on the training and test datasets, respectively, for the two categories of 1318 abnormal beats and 1403 normal beats. However, it is important to verify that this success generalizes to real-world applications rather than being limited to a small sample space. For this purpose, it was tested on a new 6-class dataset derived from the MIT-BIH Arrhythmia Dataset and the PTB Diagnostic ECG Database on the Kaggle platform [28]. In this dataset, 803, 900, 900, 900, 900, and 900 beat images were randomly selected for normal beats (N), supraventricular ectopic beats (S), ventricular beats (V), fusion beats (F), unidentified beats (Q), and other beats (M), respectively. Each beat image, originally shown in a different color, was converted to grayscale to prevent color-dependent learning, and the dataset was split 70%/30% for training and testing. Finally, the dataset, balanced across categories, was given as input to the MobileNetv2 architecture into which the customized optimization and regularization methods are integrated. The confusion matrices obtained on the training and test datasets as a result of model training are presented in Fig. 6.
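The preprocessing described above (color removal plus a 70/30 per-category partition) can be sketched with the standard luminance conversion; the helper names, the seed, and the use of BT.601 weights are illustrative assumptions, not details taken from the study's code.

```python
import random

def to_grayscale(pixel):
    # ITU-R BT.601 luminance weights for an (R, G, B) triple, removing
    # the color channel information so learning cannot rely on it.
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def split_70_30(items, seed=42):
    # Shuffled 70% / 30% train/test partition of one category's beats.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * 0.30)
    return shuffled[n_test:], shuffled[:n_test]
```

Applying the split per category, as sketched here, keeps the class balance of the original selection intact in both partitions.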
[See PDF for image]
Fig. 6
Confusion matrices obtained as a result of training the customized approach for the complex dataset with 6 categories on the training and test dataset
When Fig. 6 is examined, the detection performance for the S and V categories on the training and test datasets is lower than that for the M and N categories. Performance measures for the 6 categories are presented in Tables 9 and 10.
Table 9. Obtained performance results for the complex dataset with 6 classes by the customized approach on the training dataset
Training Dataset (MobileNetv2, Total Acc. 0.9200) | Pre. | Sens. | F1 Score |
|---|---|---|---|
Class 0 | 0.9147 | 0.9347 | 0.9246 |
Class 1 | 0.9635 | 0.8887 | 0.9246 |
Class 2 | 0.9524 | 0.8696 | 0.9091 |
Class 3 | 0.9397 | 0.9785 | 0.9587 |
Class 4 | 0.8460 | 0.9335 | 0.8876 |
Class 5 | 0.9032 | 0.9282 | 0.9155 |
Table 10. Obtained performance results for the complex dataset with 6 classes by the customized approach on the test dataset
Test Dataset (MobileNetv2, Total Acc. 0.8975) | Pre. | Sens. | F1 Score |
|---|---|---|---|
Class 0 | 0.8625 | 0.9200 | 0.8903 |
Class 1 | 0.9407 | 0.8759 | 0.9071 |
Class 2 | 0.9519 | 0.8712 | 0.9097 |
Class 3 | 0.9259 | 0.9690 | 0.9470 |
Class 4 | 0.8370 | 0.8968 | 0.8659 |
Class 5 | 0.8630 | 0.8630 | 0.8630 |
In Tables 9 and 10, the maximum accuracies achieved by the proposed model on the training and test datasets are 0.9200 and 0.8975, respectively. These data, used to verify generalizability, were derived from the MIT-BIH Arrhythmia Dataset and the PTB Diagnostic ECG Database and were prepared for 6 classes, yielding a complex, comprehensive, highly diverse, and content-rich dataset. Under the same iteration conditions, it is normal for training on this dataset to show a drop of approximately 5% on the test dataset compared to the training performed on the 2-category dataset, because the same accuracy generally cannot be achieved on a more complex and richer dataset. On the other hand, the fact that the Precision, Sensitivity, and F1 Score values obtained for each class fall within a similar range indicates that discrimination between classes is balanced and that neither overfitting nor underfitting occurs. This shows that the powerful components integrated into the model can be used in real-world applications and that successful generalizations can be made. The performance of the proposed approach was also investigated on imbalanced datasets. In this context, 200, 300, 400, 500, 600, and 700 beat images were randomly selected for the F, M, N, Q, S, and V categories, respectively, from the dataset derived from the MIT-BIH Arrhythmia Dataset and the PTB Diagnostic ECG Database. The model was then trained for 100 iterations under the same conditions. The detection performance for each category is visualized in Fig. 7.
[See PDF for image]
Fig. 7
Confusion matrices obtained as a result of training the customized approach for the complex and unbalanced dataset with 6 categories on the training and test dataset
When Fig. 7 is examined, the performance for the S and V categories is somewhat low on both the training and test datasets. Despite the unbalanced dataset, the detection performance for the other categories is at a satisfactory level. Performance criteria for all categories are given in Tables 11 and 12.
Table 11. Obtained performance results for the unbalanced dataset with 6 classes by the customized approach on the training dataset
Training Dataset (MobileNetv2, Total Acc. 0.9201) | Pre. | Sens. | F1 Score |
|---|---|---|---|
Class 0 | 0.8571 | 0.9023 | 0.8791 |
Class 1 | 0.9000 | 0.9220 | 0.9108 |
Class 2 | 0.9607 | 0.8152 | 0.8820 |
Class 3 | 0.9486 | 0.9765 | 0.9623 |
Class 4 | 0.8738 | 0.9386 | 0.9051 |
Class 5 | 0.9429 | 0.9409 | 0.9419 |
Table 12. Obtained performance results for the unbalanced dataset with 6 classes by the customized approach on the testing dataset
Test Dataset (MobileNetv2, Total Acc. 0.8901) | Pre. | Sens. | F1 Score |
|---|---|---|---|
Class 0 | 0.8500 | 0.8947 | 0.8718 |
Class 1 | 0.8222 | 0.9136 | 0.8655 |
Class 2 | 0.9333 | 0.8000 | 0.8615 |
Class 3 | 0.9333 | 0.9589 | 0.9459 |
Class 4 | 0.8444 | 0.9048 | 0.8736 |
Class 5 | 0.9143 | 0.8807 | 0.8972 |
When Tables 11 and 12 are examined, training on the unbalanced and complex dataset shows a decrease of approximately 6% on the test dataset, under the same iteration conditions, compared to the training performed on the 2-category dataset. This is expected, because the same conditions do not provide a fair distribution for learning patterns in a more complex and unbalanced dataset. At the same time, the similarity of these results to those in Tables 9 and 10 shows that the imbalance added to the complex dataset does not significantly affect the final output.
This study aims to facilitate the detection of arrhythmias, a difficult and complex task, with artificial intelligence support by integrating the proposed optimization algorithm and regularization method into the MobileNetv2 transfer learning architecture. This AI-based system strengthens the potential for faster and more accurate diagnosis of pediatric rhythm disorders and thus aims to reduce mortality rates through early diagnosis.
Since this study was planned within the framework of medical decision support systems, low weight was given to the speed criterion. The calculate_metrics function given in Eq. 15 is called at each update of the custom RMSprop algorithm, a manually constructed accumulated_grad slot is created for each variable, and additional processing is performed through tensor expansion. In the developed custom regularization method, the softmax, erf, entropy, and variance components are integrated into the ElasticNet algorithm. As a result of this configuration, the computational load increases; the custom RMSprop algorithm and custom regularization method are therefore more intensive in terms of running time and memory consumption. Although the algorithmic complexity of the developed approaches remains at the O(n) level, training time increases due to the extra operations.
In computer-aided medical studies, accuracy takes precedence over time, so this does not negatively affect the course of the study. Moreover, since the preferred MobileNetv2 transfer learning architecture is lightweight, it alleviates potential RAM and run-time bottlenecks to a certain extent.
Conclusion
Deep learning-based algorithms, which have great potential for classifying ECG data, consist of various components, and modifying these components affects performance, stability, and generalizability. Therefore, an optimization algorithm and a regularization method, hyperparameter components that directly affect performance, are proposed. These original contributions, integrated into the MobileNetv2 transfer learning architecture, produced higher-quality outputs than the model trained with default parameters. It is expected that this development, in which deep learning-based model components work in harmony, will be beneficial in the transition to clinical applications. In the future, tachycardia and bradycardia cases are planned to be included in the current scenario.
Acknowledgements
This study was supported by the Scientific and Technological Research Council of Türkiye (TUBITAK) under Grant Number 124E599. The authors thank TUBITAK for its support. Also, we would like to express our gratitude to GenAI for its invaluable support in the following areas: writing some code parts, solving errors encountered in the code, analyzing mathematical expressions, and developing different perspectives.
Authors contributions
F.A.: Data Collection, Conceptualization and Design, Formal Analysis, Methodology, Experimental Work, Visualization, Resources, Supervision, Writing - Original Draft, Writing - Review & Editing. P.D.Ç.: Conceptualization and Design, Data Collection, Analysis and Interpretation, Resources, Writing - Review & Editing. M.F.O.: Investigation, Writing - Original Draft, Writing - Review & Editing
Data availability
In this study, a custom dataset was generated. In addition, the public database named ECG Image Dataset on the Kaggle platform was used (https://www.kaggle.com/).
Declarations
Ethics approval and consent to participate
This dataset, shared publicly at https://github.com/fatmaakalin/Normal-and-Abnormal-ECG-beats-dataset, was collected with the approval of the Sakarya University Faculty of Medicine Health Sciences Scientific Research Ethics Committee (approval no. E-43012747-050.04-399547-22). Informed consent was also obtained from the patients.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Bilici, M; Demir, F. Pediatrik disritmiler. Dicle Med J; 2015; 42,
2. Drago, F; Battipaglia, I; Di Mambro, C. Neonatal and pediatric arrhythmias: clinical and electrocardiographic aspects. Card Electrophysiol Clin; 2018; 10,
3. Ansari Y, Mourad O, Qaraqe K, Serpedin E. Deep learning for ECG Arrhythmia detection and classification: an overview of progress for period 2017–2023. Frontiers Media S A. 2023. https://doi.org/10.3389/fphys.2023.1246746.
4. Zhang, K; Aleexenko, V; Jeevaratnam, K. Computational approaches for detection of cardiac rhythm abnormalities: are we there yet?. J Electrocardiol; 2020; 59, pp. 28-34. [DOI: https://dx.doi.org/10.1016/J.JELECTROCARD.2019.12.009]
5. Daydulo YD, Thamineni BL, Dawud AA. Cardiac arrhythmia detection using deep learning approach and time frequency representation of ECG signals. BMC Med Inform Decis Mak. 2023;23(1). https://doi.org/10.1186/S12911-023-02326-W.
6. Shanmugavadivel K, Sathishkumar VE, Kumar MS, Maheshwari V, Prabhu J, Allayear SM. Investigation of Applying Machine Learning and Hyperparameter Tuned Deep Learning Approaches for Arrhythmia Detection in ECG Images. Comput Math Methods Med. 2022;2022. https://doi.org/10.1155/2022/8571970.
7. Li Y, Qian R, Li K. Inter-patient arrhythmia classification with improved deep residual convolutional neural network. Comput Methods Programs Biomed. 2022 Jan;214. https://doi.org/10.1016/J.CMPB.2021.106582.
8. Zahid, MU et al. Robust R-Peak Detection in Low-Quality Holter ECGs Using 1D Convolutional Neural Network. IEEE Trans Biomed Eng; 2022; 69,
9. Jiang M, et al. Visualization deep learning model for automatic arrhythmias classification. Physiol Meas. 2022 Jan;438. https://doi.org/10.1088/1361-6579/AC8469.
10. Wang, R; Fan, J; Li, Y. Deep Multi-Scale Fusion Neural Network for Multi-Class Arrhythmia Detection. IEEE J Biomed Health Inform; 2020; 24,
11. Guan, Y; An, Y; Xu, J; Liu, N; Wang, J. HA-ResNet: residual Neural Network With Hidden Attention for ECG Arrhythmia Detection Using Two-Dimensional Signal. IEEE/ACM Trans Comput Biol Bioinform; 2023; 20,
12. Islam, MS et al. HARDC: a novel ECG-based heartbeat classification method to detect arrhythmia using hierarchical attention based dual structured RNN with dilated CNN. Neural Netwk; 2023; 162, pp. 271-87. [DOI: https://dx.doi.org/10.1016/J.NEUNET.2023.03.004]
13. Ma S, Cui J, Xiao W, Liu L. Deep learning-based data augmentation and model fusion for automatic arrhythmia identification and classification algorithms. Comput Intell Neurosci. 2022;2022. https://doi.org/10.1155/2022/1577778.
14. Dhyani S, Kumar A, Choudhury S. Analysis of ECG-based arrhythmia detection system using machine learning. MethodsX. 2023 Jan;10. https://doi.org/10.1016/J.MEX.2023.102195.
15. Ivora A, et al. QRS detection and classification in Holter ECG data in one inference step. Sci Rep. 2022 Jan;121. https://doi.org/10.1038/S41598-022-16517-4.
16. Berrahou N, El Alami A, Mesbah A, El Alami R, Berrahou A. Arrhythmia detection in inter-patient ECG signals using entropy rate features and RR intervals with CNN architecture. Comput Methods Biomech Biomed Engin. 2024. https://doi.org/10.1080/10255842.2024.2378105.
17. Liu LR, et al. An Arrhythmia classification approach via deep learning using single-lead ECG without QRS wave detection. Heliyon. 2024 Jan;105. https://doi.org/10.1016/J.HELIYON.2024.E27200.
18. Sigfstead S, Jiang R, Avram R, Davies B, Krahn AD, Cheung CC. Applying Artificial Intelligence for Phenotyping of Inherited Arrhythmia Syndromes. Can J Cardiol. 2024. https://doi.org/10.1016/J.CJCA.2024.04.014.
19. Liu, J et al. A review of arrhythmia detection based on electrocardiogram with artificial intelligence. Expert Rev Med Devices; 2022; 19,
20. Alzubaidi L, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 2021 Dec;81. https://doi.org/10.1186/s40537-021-00444-8.
21. Sun, RY. Optimization for Deep Learning: an Overview. J Oper Res Soc China; 2020; 8,
22. Sonti K, Dhuli R. Pattern Based Glaucoma Classification Approach using Statistical Texture Features. in 2022 2nd International Conference on Artificial Intelligence and Signal Processing, AISP 2022, Institute of Electrical and Electronics Engineers Inc., 2022. https://doi.org/10.1109/AISP53593.2022.9760664.
23. Chukwura J, Chinenye Jecinta I. A review of techniques for regularization. 2023. [Online]. Available: http://www.ijres.org
24. Rajanand A, Singh P. ErfReLU: adaptive activation function for deep neural network. Pattern Anal Appl. Jun 2024;27(2):https://doi.org/10.1007/s10044-024-01277-w
25. Ullah A, Imran M, Basit MA, Tahir M, Younis J. AHerfReLU: a novel adaptive activation function enhancing deep neural network performance. Complexity. Jan 2025;2025(1):https://doi.org/10.1155/cplx/8233876
26. Oldham K, Myland J, Spanier J. An Atlas of Functions, Second Edition. Oldham K Ed. ch. Springer.
27. Talaei Khoei T, Ould Slimane H, Kaabouch N. Deep learning: systematic review, models, challenges, and research directions. Springer Science and Business Media Deutschland GmbH. 2023. https://doi.org/10.1007/s00521-023-08957-4. Nov. 01.
28. ECG Image Dataset - MIT-BIH and PTB Image database. https://www.kaggle.com/
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.