Abstract
In the face of mounting cybersecurity threats, there is an imperative for robust identity authentication systems to safeguard sensitive user data. Conventional biometric authentication methods, such as fingerprinting and facial recognition, are vulnerable to spoofing attacks. In contrast, electrocardiogram (ECG) signals offer distinct advantages as dynamic, “liveness”‐assured biomarkers that exhibit individual specificity. This study proposes a novel fusion network model, the convolutional neural network (CNN)‐transformer fusion network (CTFN), to achieve high‐precision ECG‐based identity authentication by synergizing local feature extraction and global signal correlation analysis. The proposed framework integrates a multistage enhanced CNN to capture fine‐grained local patterns in ECG morphology and a transformer encoder to model long‐range dependencies in heartbeat sequences. An adaptive weighting mechanism dynamically optimizes the contributions of both modules during feature fusion. The efficacy of CTFN was evaluated in three critical real‐world scenarios: single/multi‐heartbeat authentication, cross‐temporal consistency, and emotional variability resistance. The evaluation was conducted on 283 subjects from four public ECG databases: ECG‐ID, MIT‐BIH Arrhythmia, MIT‐BIH NSRDB, and CYBHi. On the CYBHi dataset, CTFN achieved state‐of‐the‐art recognition accuracies of 98.46%, 80.95%, and 90.76% in these three scenarios, respectively. Notably, the model attained 100% authentication accuracy using only six heartbeats, a 25% reduction in input requirements compared to prior work, while maintaining robust performance against physiological variations induced by emotional states or temporal gaps. These results demonstrate that CTFN significantly advances the practicality of ECG biometrics by balancing high accuracy with minimal data acquisition demands, offering a scalable and spoof‐resistant solution for secure authentication systems.
1. Introduction
Advancements in generative artificial intelligence have led to significant concerns regarding the security of static biometric features, such as fingerprints, faces, and irises, which are susceptible to forgery [1, 2]. Physiological signals, such as electrocardiogram (ECG) and electroencephalogram (EEG), have found extensive application in the domains of biometrics and health monitoring. This can be attributed to the distinctive characteristics and dynamic properties inherent in these signals. For instance, Bin Heyat et al. have demonstrated the capability of automatic detection of teeth grinding through the analysis of a single-channel EEG signal [3–5]. This finding underscores the promise of physiological signals in pattern recognition applications. Dynamic biosignals, like the ECG, possess inherent resistance to spoofing due to their ability to encode unique physiological patterns associated with cardiac activity [6]. The heart generates ECG signals through coordinated electrical impulses originating from the sinoatrial node, propagating across myocardial tissue to regulate systole and diastole. These signals reflect the depolarization and repolarization of cardiac cells, producing waveform features such as the P wave, QRS complex, and T wave that vary in amplitude, duration, and morphology across individuals. The ECG’s uniqueness stems from the interplay of anatomical factors, such as heart position and tissue conductivity, and dynamic physiological states, including heart rate variability and emotional arousal. This multifaceted nature renders ECG replication difficult without live acquisition, making it a robust authentication method. Unlike static biometrics, ECG-based authentication relies on real-time signal capture during specific activities, such as deep breathing, to verify user vitality and deter spoofing attacks [7]. These biological and functional advantages position ECG as a transformative solution for secure identity verification in an era of evolving cyber threats.
In recent years, there has been considerable research on identity authentication algorithms based on ECG signals. Although the accuracy of such algorithms has approached 100% through model optimization, this authentication method still faces numerous challenges in practical applications [8]. Specifically, the instability of feature extraction from a single modality is a prominent issue. To address this problem, Agrawal [9] constructed a convolutional neural network (CNN)-long short-term memory (LSTM) fusion network for multiple feature extraction. However, the model's accuracy and stability were only validated on databases with high data quality, such as Physikalisch-Technische Bundesanstalt (PTB) and ECG-ID, without testing the model's sensitivity to noise. Moreover, these publicly available databases were collected using the same equipment, with volunteers in stable emotional states and signals recorded only during specific periods. They lacked data recollected several months later, making it impossible to verify the model's robustness against temporal variations and emotional influences. Over time, the ECG signals of the same subject may change due to factors such as emotions, medications, or pathologies [10–12]. It is therefore particularly necessary to construct a multifeature extraction model and validate its performance under single-heartbeat input, time intervals between enrollment and verification, and emotional fluctuations to ensure the effectiveness and robustness of the authentication system.
To address the aforementioned issues, we focus on real-world applications of ECG biometric-based identity authentication technology and propose the multistage enhanced CNN and transformer encoder fusion network (CTFN) model, which is a fusion of a CNN and a transformer encoder, to achieve accurate verification of individual identities across multiple scenarios. The main contributions of the proposed study are as follows:
- 1.
The CTFN model is proposed, which utilizes a multistage enhanced CNN to effectively capture local patterns and fine-grained details from ECG signals and combines it with the transformer encoder, which extracts complex dependencies and contextual information over longer sequences. Together, they synthesize both local and global features of ECG signals, and their superior performance was verified through experiments.
- 2.
CTFN was comprehensively evaluated through four experiments: ablation experiments, single/multiple heartbeat analysis, cross-time stability analysis, and emotion stability analysis. Verification using 3-month interval ECG data from CYBHi demonstrated an accuracy of 80.95%, marking a 3.15% improvement over the state-of-the-art (SOTA) result. In addition, accuracy under high-arousal video viewing reached 90.76%, approaching the SOTA result while requiring fewer heartbeats (six beats compared to eight).
- 3.
We experimentally proved that the CTFN model not only performs exceptionally well on high-quality ECG datasets but also maintains high recognition accuracy on low signal-to-noise ratio datasets, such as the CYBHi database collected using off-the-person methods. Furthermore, for high-quality data, such as the on-the-person ECG-ID database, a single heartbeat suffices for authentication. This demonstrates the feasibility and reliability of CTFN-based ECG identity authentication in real-world applications.
2. Related Works
As research on ECG-based identity authentication matures, an increasing number of researchers are shifting their focus from sole accuracy to the robustness of the models. Consequently, we categorize studies on ECG-based identity authentication into two types. The first category focuses on evaluating the performance metrics of models, such as authentication accuracy and equal error rate (EER), without verifying the robustness of the model. The second category considers both the robustness of the model and practical factors, such as the number of heartbeats, cross-time variations, and the state of the subject (sitting, walking, running, or emotional fluctuations), which affect the authentication algorithm. In the research on ECG identity authentication algorithms, striking a balance between accuracy and robustness is a noteworthy concern.
2.1. Research Focused on Model Authentication Accuracy
In the early stages, methods focused on manually extracting physiological features from ECG signals for identity authentication. In 2001, Biel et al. [13] introduced ECG signals as a new modality for biometric recognition, validating their effectiveness by extracting 30 time–frequency domain features, such as the amplitude and slope of the QRS complex, from a private dataset of 20 subjects. From 2005 to 2009, Israel et al. [14] extracted baseline features and computed a stable feature set representing individual information, and quantified the minimum number of heartbeat cycles required for identity recognition [15]. In addition, scholars such as Chan et al. [16], Wang et al. [17], and Tantawi et al. [18] designed ECG feature engineering based on morphological features, such as QRS waveform intervals and amplitudes, as well as linear, nonlinear, and various higher order statistical indicators, to achieve ECG biometric recognition. Further research by Goshvarpour et al. [19], Yanık et al. [20], Zhang et al. [21], and Tan et al. [22] incorporated statistical features and frequency band information obtained from signal processing methods such as wavelet transforms, scalogram analysis, and matching pursuit into feature engineering designs to evaluate the effectiveness of such feature representations in ECG identification tasks. In 2017, deep learning methods gradually entered the field. From 2018 to 2023, scholars such as Labati et al. [23], Shin et al. [24], and Agrawal et al. [9] investigated ECG identity authentication based on standard CNN models, LSTM networks, and their improved variants. They constructed shallow CNN models with data augmentation, ECG identification models based on adaptive particle swarm-optimized bidirectional LSTM (BiLSTM), lightweight classifiers combining MobileNetV2 with BiLSTM, and fused CNN-LSTM classification models, all achieving classification accuracies above 85%. This line of research focused on optimizing algorithms so that models accurately identify and verify user identities under ideal conditions, without considering the influence of real-world factors.
2.2. Research Focused on Model Robustness
As the accuracy of ECG identity authentication approaches 100%, researchers have shifted their focus to model robustness. In 2020, Belo et al. [25] constructed a temporal convolutional neural network (TCNN) that feeds segments of ECG signals and their subsets (QRS waves) into one-dimensional (1D) convolutional networks. Training and validating on two sets of data from the CYBHi database collected 3 months apart, they achieved an accuracy of 60.3%. In 2021, Ibtehaz et al. [26] developed a novel convolutional architecture based on multiresolution analysis and experimentally compared the impact of single and multiple heartbeats on authentication accuracy, achieving accuracies of over 96% for a single heartbeat and 100% for multiple heartbeats across the ECG-ID, MIT-BIH Arrhythmia, MIT-BIH NSRDB, and PTB databases. In 2022, Fatimah et al. [27] applied Fourier decomposition to extract relevant features by decomposing signals into a set of Fourier intrinsic band functions. Training on ECG signals collected from the CYBHi database under normal conditions and testing on signals (eight beats) observed during high-arousal video viewing, they achieved the highest accuracy of 91.07% for this scenario. More recently, Zhang et al. [28] constructed a 1D integrated EfficientNet model and conducted cross-time validation on the CYBHi database, training and testing on two sets of data collected 3 months apart from each subject and achieving an accuracy of 75.5%. In the same year, D'angelis et al. [29] and Yi et al. [30] conducted similar experiments, achieving accuracies of 70% and 77.80%, respectively, for this scenario. These robustness studies focused on enhancing system adaptability to noise, interference, and variations to ensure high accuracy and stability in real-world applications.
3. Materials and Methods
We constructed an algorithmic module integrating heartbeat segmentation, data augmentation, and pattern recognition, and designed an ECG authentication workflow whose preprocessing and model design account for the robustness characteristics of ECG signals. The raw signals undergo filtering, QRS detection and segmentation, and data augmentation, followed by feature extraction and fusion using the multistage enhanced CNN and transformer encoder, and are then classified using SoftMax. The workflow of the proposed method is illustrated in Figure 1, and the specific model details are shown in Figure 2. This section delineates the databases and methodological approach used in this study.
[Figure 1: Workflow of the proposed ECG authentication method. Image omitted; see PDF.]
[Figure 2: Detailed architecture of the CTFN model. Image omitted; see PDF.]
3.1. Databases
The datasets used in this study were sourced from prominent ECG authentication databases: ECG-ID [31], MIT-BIH Arrhythmia [32], MIT-BIH NSRDB [33], and CYBHi [34]. The CYBHi database contains two subsets: long and short. The long subset comprises two sets of data collected 3 months apart, labeled Now and Later; it was used to verify the impact of ECG data collected at different times on the model. The short subset includes three sets of data collected under different emotional states: normal, low-arousal, and high-arousal video viewing; it was used to verify the effects of emotions on the model. Table 1 summarizes the databases used in this study, including additional information on data collection methods, participant states, and participant health conditions.
Table 1 Databases.
| Dataset | Abbreviation | Persons | Sampling rate (Hz) | Additional information |
| ECG-ID | ECG-ID | 90 | 500 | (a)(1)(Ⅰ) |
| MIT-BIH Arrhythmia | MIT-ARR | 47 | 360 | (b)(2)(Ⅱ) |
| MIT-BIH NSRDB | MIT-NSR | 18 | 128 | (b)(2)(Ⅰ) |
| CYBHi-long | Now, Later | 63 | 1000 | (c)(1)(Ⅰ) |
| CYBHi-short | Normal, Low, High | 65 | 1000 | (c)(3)(Ⅰ) |
3.2. Preprocessing of ECG Signals
The preprocessing consists of three stages: filtering and normalization, QRS detection and segmentation, and data augmentation. First, the ECG signal is passed through a finite impulse response bandpass filter that retains frequencies between 0.7 and 90 Hz while suppressing frequencies around 50 Hz (powerline noise), and is then normalized. Subsequently, QRS waves are detected and segmented using the real-time detection method proposed by Christov [35], which compares the absolute value of the cumulative differential ECG against an adaptive threshold. Following previous work [36] and considering an average heartbeat length of 0.8 s, the length of each segment is fixed at 0.8 × fs sampling points, allocated as 0.32 fs before and 0.48 fs after each R peak; the first and last R peaks are discarded. Finally, the ECG signals undergo augmentation to adapt the model to real-world environments. Given that the signals exhibit characteristic noise, frequency-domain, and time-domain properties, data augmentation targets these features to simulate real-world variations: noise is added, signals are shifted and resampled, and the results are combined with the original signals to form an augmented dataset four times the size of the original. Data augmentation enhances the model's ability to handle unknown data and improves its robustness and generalization capabilities.
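To make the pipeline concrete, the following is a minimal Python sketch of these three stages using standard SciPy routines; it is illustrative rather than the authors' code. The generic peak finder stands in for Christov's detector [35], and the noise level, shift range, and resampling factor are assumed values, not ones reported above.

```python
# Minimal sketch of the preprocessing pipeline: bandpass + notch filtering,
# R-peak segmentation, and the noise/shift/resample augmentation.
import numpy as np
from scipy.signal import firwin, filtfilt, iirnotch, find_peaks, resample

def preprocess(ecg, fs):
    """Bandpass-filter (0.7-90 Hz), notch out 50 Hz, and z-score normalize."""
    high = min(90.0, 0.45 * fs)                       # keep cutoff below Nyquist
    taps = firwin(301, [0.7, high], pass_zero=False, fs=fs)
    x = filtfilt(taps, [1.0], ecg)                    # zero-phase FIR bandpass
    b, a = iirnotch(50.0, Q=30.0, fs=fs)              # powerline notch
    x = filtfilt(b, a, x)
    return (x - x.mean()) / x.std()

def segment_beats(ecg, fs):
    """Cut fixed 0.8*fs windows: 0.32*fs before and 0.48*fs after each R peak."""
    r_peaks, _ = find_peaks(ecg, distance=int(0.4 * fs))  # stand-in for Christov [35]
    pre, post = int(0.32 * fs), int(0.48 * fs)
    return np.stack([ecg[r - pre:r + post] for r in r_peaks[1:-1]
                     if r - pre >= 0 and r + post <= len(ecg)])

def augment(beats, rng=np.random.default_rng(0)):
    """Noise, shift, and resample variants; with originals -> 4x dataset."""
    n = beats.shape[1]
    noisy = beats + rng.normal(0.0, 0.05, beats.shape)             # additive noise
    shifted = np.roll(beats, int(rng.integers(-n // 20, n // 20)), axis=1)
    resampled = np.stack([resample(resample(b, int(0.9 * n)), n) for b in beats])
    return np.concatenate([beats, noisy, shifted, resampled], axis=0)
```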
3.3. CTFN
The local feature perception property of a CNN limits its ability to extract global information, while the self-attention mechanism of the transformer excels at capturing global features, albeit with less capability to extract local information compared to a CNN [37]. To overcome these limitations, features are extracted and overlaid through multistage enhanced CNN and transformer encoder modules, jointly focusing on both global and local features. This resulted in the development of the CTFN algorithm. Figure 2 illustrates the detailed architecture of the model.
3.3.1. Multistage Enhanced CNN Module
This module includes an input layer followed by a sequence of convolutional layers with batch normalization and ReLU activation, organized into five blocks that gradually learn higher level abstract features. Each block incorporates a convolutional layer, batch normalization, ReLU activation, and max-pooling, contributing to stable training and improved computational efficiency. The output layer consists of a flatten layer followed by a dense layer with batch normalization and ReLU activation. The number of convolutional filters increases stage by stage, from 32 to 1024, allowing the network to learn abstract features at different levels. The structure of the module is presented in Table 2.
Table 2 Network structure of the multistage enhanced CNN module.
| Group | Network structure | (Filters, kernel size) |
| Input | Conv1d + BN + ReLU | (32, 1) |
| Block 1 | Conv1d + BN + ReLU + MaxPooling | (64, 2) |
| Block 2 | Conv1d + BN + ReLU + MaxPooling | (128, 2) |
| Block 3 | Conv1d + BN + ReLU + MaxPooling | (256, 2) |
| Block 4 | Conv1d + BN + ReLU + MaxPooling | (512, 2) |
| Block 5 | Conv1d + BN + ReLU + MaxPooling | (1024, 2) |
| Output | Flatten + Dense + BN + ReLU | — |
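A Keras sketch of this module, read directly from Table 2, might look as follows; the "same" padding follows the parameter settings in Section 4.1.2, while the 128-dimensional output of the final dense layer is our assumption rather than a reported detail.

```python
# Illustrative sketch of the multistage enhanced CNN branch from Table 2.
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, kernel_size, pool=True):
    x = layers.Conv1D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return layers.MaxPooling1D(2)(x) if pool else x

def multistage_cnn(input_len, feat_dim=128):
    inp = layers.Input(shape=(input_len, 1))
    x = conv_block(inp, 32, 1, pool=False)         # input group: Conv1d + BN + ReLU
    for filters in (64, 128, 256, 512, 1024):      # blocks 1-5, filters doubling
        x = conv_block(x, filters, 2)
    x = layers.Flatten()(x)
    x = layers.Dense(feat_dim)(x)                  # output group: Dense + BN + ReLU
    x = layers.BatchNormalization()(x)
    return tf.keras.Model(inp, layers.ReLU()(x), name="multistage_cnn")
```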
3.3.2. Transformer Encoder Module
This module first applies positional encoding to the input and then employs a multihead self-attention layer, followed by feature extraction through a feed-forward neural network layer, with residual connections and layer normalization around both. Finally, it compresses the features through fully connected layers with batch normalization, an activation function, and dropout. The encoded feature representation is fed into the fused-feature classifier module. The multihead self-attention mechanism can simultaneously attend to features at different positions in the ECG sequence, capturing correlations in both the time and frequency domains to better understand the structure of the ECG signal. It can also establish long-range dependencies, which are crucial when enrollment and later verification are separated by a time interval. The residual connections add the original signal to the features learned by the multihead self-attention layer and feed-forward network, enabling the model to acquire new abstract features while retaining the original signal information; this mitigates the vanishing and exploding gradient problems in deep network training and improves training stability.
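The following Keras sketch illustrates one plausible realization of this branch using the settings reported in Section 4.1.2 (four heads, embedding and feed-forward dimensions of eight, dropout of 0.25); the sinusoidal positional encoding and the 128-dimensional output projection are our assumptions.

```python
# Illustrative sketch of the transformer encoder branch, not the authors' code.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def positional_encoding(length, dim):
    """Standard sinusoidal encoding, shape (1, length, dim)."""
    angles = np.arange(length)[:, None] / np.power(
        10000.0, (np.arange(dim)[None, :] // 2 * 2) / dim)
    pe = np.zeros((length, dim))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return tf.constant(pe[None, ...], dtype=tf.float32)

def transformer_branch(input_len, d_model=8, heads=4, ff_dim=8, feat_dim=128):
    inp = layers.Input(shape=(input_len, 1))
    x = layers.Dense(d_model)(inp) + positional_encoding(input_len, d_model)
    attn = layers.MultiHeadAttention(num_heads=heads, key_dim=d_model)(x, x)
    x = layers.LayerNormalization()(x + attn)          # residual + norm
    ff = layers.Dense(ff_dim, activation="relu")(x)
    ff = layers.Dense(d_model)(ff)
    x = layers.LayerNormalization()(x + ff)            # residual + norm
    x = layers.Flatten()(x)
    x = layers.Dense(feat_dim)(x)                      # compress features
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return tf.keras.Model(inp, layers.Dropout(0.25)(x), name="transformer_branch")
```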
3.3.3. Feature Fusion and Classifier
After passing through the CNN and transformer encoder, the model obtains the local and global feature vectors of the ECG signal. The feature fusion module adopts a weighted summation, with the weight coefficients included in the training process so that they are learned automatically. The fused features are then fed into the SoftMax classifier for the final decision. Training of the CTFN algorithm is guided by the cross-entropy loss. For SoftMax discrimination, the following verification rule applies: if at least 95% of a test subject's ECG segments are recognized as that subject, the subject is authenticated; otherwise, authentication fails.
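A minimal sketch of the fusion, classification, and verification steps follows. The sigmoid-bounded scalar weight is one plausible parameterization of the self-learned fusion coefficients, not a detail confirmed by the text, and the helper names are ours.

```python
# Hedged sketch of adaptive weighted fusion, SoftMax classification,
# and the >=95% segment-level verification rule.
import tensorflow as tf
from tensorflow.keras import layers

class WeightedFusion(layers.Layer):
    """Learnable convex combination: alpha*local + (1-alpha)*global."""
    def build(self, input_shape):
        self.w = self.add_weight(name="w", shape=(), initializer="zeros",
                                 trainable=True)
    def call(self, inputs):
        local_feat, global_feat = inputs
        alpha = tf.sigmoid(self.w)                   # keep the weight in (0, 1)
        return alpha * local_feat + (1.0 - alpha) * global_feat

def build_ctfn(cnn, transformer, num_subjects, input_len):
    inp = layers.Input(shape=(input_len, 1))
    fused = WeightedFusion()([cnn(inp), transformer(inp)])
    out = layers.Dense(num_subjects, activation="softmax")(fused)
    return tf.keras.Model(inp, out, name="CTFN")

def authenticate(model, beats, subject_id, threshold=0.95):
    """Accept if at least 95% of a subject's test beats are classified as them."""
    pred = model.predict(beats).argmax(axis=1)
    return (pred == subject_id).mean() >= threshold
```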
3.4. Performance Evaluation Metrics
Three metrics were adopted to evaluate the model: accuracy (ACC), F1-score, and equal error rate (EER). ACC represents the average accuracy on the test set, where an individual is considered to pass verification if 95% or more of their heartbeats are successfully recognized. The EER is the value at which the false acceptance rate (FAR) equals the false rejection rate (FRR), indicating how well the system balances distinguishing legitimate users from illegitimate ones. In standard form, ACC = (TP + TN)/(TP + TN + FP + FN); F1 = 2 × Precision × Recall/(Precision + Recall), with Precision = TP/(TP + FP) and Recall = TP/(TP + FN); FAR = FP/(FP + TN); and FRR = FN/(FN + TP), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
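As an illustration, the EER can be read off the crossing point of the FAR and FRR curves; the following sketch approximates it with scikit-learn, under the assumption of a binary genuine-versus-impostor scoring setup.

```python
# Hedged sketch: EER from the FAR/FRR crossing point.
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(y_true, scores):
    """y_true: 1 for genuine attempts, 0 for impostors; scores: match scores."""
    far, tpr, _ = roc_curve(y_true, scores)   # false positive rate == FAR
    frr = 1.0 - tpr                           # miss rate == FRR
    idx = np.argmin(np.abs(far - frr))        # closest point to FAR == FRR
    return (far[idx] + frr[idx]) / 2.0
```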
4. Results and Discussion
In this section, we evaluate the authentication performance of CTFN using data from 283 subjects across four publicly available databases, together with three purpose-built baseline models. The experiments comprise an ablation study, single or multiple heartbeat analysis, cross-time stability analysis, and emotional stability analysis.
4.1. Experimental Background
4.1.1. Experimental Environment
All experiments were conducted using the following computer hardware configuration: an Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50 GHz, one NVIDIA RTX 3080 (10 GB) GPU, and 40 GB of memory. The software environment included Python 3.9, scikit-learn 0.23.1, TensorFlow 2.5.0, NumPy 1.19.0, and SciPy 1.4.1.
4.1.2. Parameter Settings
The following hyperparameters were determined through repeated experiments: the number of epochs was set to 100, with early stopping if the validation loss did not decrease for five consecutive epochs; the batch size was 64; the learning rate was 0.00001; the CNN input group used 32 convolution kernels of size 1, with padding to keep the input and output sizes equal; the transformer encoder used four attention heads, an embedding dimension of eight, and a feed-forward hidden dimension of eight; the activation function was ReLU; the dropout rate was 0.25; the optimizer was Adam; and the loss function was categorical cross-entropy.
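Reusing the model sketches from Section 3.3, these settings translate into the following illustrative training configuration; the 400-sample input length (0.8 s at 500 Hz, as for ECG-ID) and the validation split are assumptions.

```python
# Illustrative training setup with the stated hyperparameters.
import tensorflow as tf

model = build_ctfn(multistage_cnn(400), transformer_branch(400),
                   num_subjects=90, input_len=400)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, batch_size=64, callbacks=[early_stop])
```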
4.2. Ablation Experiment
The baseline models included CTFN without augmentation, the CNN module in CTFN, and the transformer encoder module in CTFN. CTFN without augmentation utilizes non-augmented data for verification through CTFN, while the remaining models use augmented data. The results are presented in Table 3.
Table 3 Results of the ablation experiment.
| Databases | Model | ACC (%) | F1-score (%) | EER (%) |
| ECG-ID | CTFN without augment | 95.51 | 96.82 | 0.55 |
| | CNN | 98.87 | 97.59 | 0.28 |
| | Transformer encoder | 97.75 | 96.85 | 0.36 |
| | CTFN | 100.00 | 99.79 | 0.12 |
| MIT-ARR | CTFN without augment | 95.83 | 97.12 | 0.72 |
| | CNN | 97.92 | 96.79 | 1.44 |
| | Transformer encoder | 95.83 | 95.92 | 1.54 |
| | CTFN | 100.00 | 98.78 | 0.45 |
| MIT-NSR | CTFN without augment | 100.00 | 98.45 | 0.86 |
| | CNN | 100.00 | 96.39 | 1.28 |
| | Transformer encoder | 94.44 | 95.71 | 1.52 |
| | CTFN | 100.00 | 99.35 | 0.30 |
As presented in Table 3, after data augmentation, CTFN achieved an accuracy of 100.00%, an F1-score of at least 98.78%, and an EER below 0.45% across all three databases. Compared to the baseline models, CTFN improved accuracy by 2.25%, 4.17%, and 5.56% for ECG-ID, MIT-ARR, and MIT-NSR, respectively; the F1-score increased by 2.97%, 2.86%, and 3.64%, while the EER decreased by 0.11%, 0.37%, and 0.21%. Figures 3 and 4 depict the confusion matrices and the test performance in terms of FAR and FRR for the three databases, where the intersection of the FAR and FRR curves marks the EER. As the confusion matrices indicate, the model effectively identified the testers (the blue diagonal entries), with very few false positives or false negatives (nonzero off-diagonal entries) and almost no misidentifications. The FAR and FRR curves confirm this observation.
[Figure 3: Confusion matrices for the three databases. Image omitted; see PDF.]
[Figure 4: FAR and FRR curves for the three databases; the crossing point marks the EER. Image omitted; see PDF.]
Furthermore, after training and testing the CTFN model on data processed by the augmentation algorithm, accuracy improved by up to 4.49%, the F1-score increased by up to 2.97%, and the EER decreased by up to 0.56%. The changes in validation loss (val_loss) during CTFN training with and without data augmentation for the three databases are illustrated in Figure 5. The model converged faster with augmented data, requiring significantly fewer training epochs than with unaugmented data and showing less tendency to diverge.
[Figure 5: Validation loss during CTFN training with and without data augmentation. Image omitted; see PDF.]
4.3. Robustness Experiments
This section validates the robustness of the CTFN model through three experiments: single or multiple beat, cross-time, and emotion stability analyses.
- 1.
Single or multiple beat (6 beats) analysis was conducted on the ECG-ID, MIT-ARR, MIT-NSR, and CYBHi datasets.
- 2.
Cross-time stability analysis involves training the CTFN model on the CYBHi long-term dataset “Now” and testing it both on “Now” (nonoverlapping with the training set) and on data collected 3 months later (“Later”), to analyze the impact of different collection times on model validation; a protocol sketch follows this list.
- 3.
Emotion stability analysis is performed by training the model on the CYBHi short-term dataset “Normal” and testing it on “Normal” (nonoverlapping with the training set), “Low,” and “High” emotional states to analyze the effect of emotions on the model.
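As referenced in experiment 2 above, the cross-time protocol can be sketched as follows; load_cybhi is a hypothetical loader, one-hot labels and an 80/20 split of “Now” are assumed details, and model is the compiled CTFN from the training sketch above. The emotion experiment follows the same pattern with the “Normal,” “Low,” and “High” subsets.

```python
# Hedged sketch of the cross-time evaluation protocol (experiment 2).
from sklearn.model_selection import train_test_split

x_now, y_now = load_cybhi("now")        # hypothetical: beats + one-hot labels
x_later, y_later = load_cybhi("later")  # same subjects, 3 months later

x_train, x_test_now, y_train, y_test_now = train_test_split(
    x_now, y_now, test_size=0.2, stratify=y_now.argmax(axis=1), random_state=0)

model.fit(x_train, y_train, epochs=100, batch_size=64)
acc_now = model.evaluate(x_test_now, y_test_now)[1]    # immediate verification
acc_later = model.evaluate(x_later, y_later)[1]        # verification after 3 months
```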
The results of the three experiments are summarized in Table 4, and the findings are discussed from three perspectives.
Table 4 Results of robustness experiments.
| Databases | Single beat | | | Multiple beat | | |
| | ACC (%) | F1 (%) | EER (%) | ACC (%) | F1 (%) | EER (%) |
| ECG-ID | 96.63 | 98.02 | 1.18 | 100.00 | 99.79 | 0.12 |
| MIT-ARR | 95.83 | 95.59 | 1.47 | 100.00 | 98.78 | 0.45 |
| MIT-NSR | 94.44 | 96.71 | 1.14 | 100.00 | 99.35 | 0.30 |
| Now | 96.83 | 97.62 | 0.78 | 98.41 | 98.62 | 0.38 |
| Later | 63.49 | 83.25 | 3.71 | 80.95 | 93.12 | 1.15 |
| Normal | 95.38 | 95.82 | 0.59 | 98.46 | 98.45 | 0.19 |
| Low | 90.77 | 92.03 | 1.21 | 96.92 | 96.68 | 0.43 |
| High | 67.69 | 84.41 | 2.59 | 90.76 | 95.81 | 0.48 |
4.3.1. Single or Multiple Beat Analysis
Single-beat data are susceptible to interference and are suited to scenarios with high data quality and strict real-time requirements. Conversely, multibeat data have a larger volume, allowing the model to learn more features, but require longer data collection and training times. The specific performance comparisons are shown in Figure 6. The CTFN model trained on single-beat data achieved a maximum test accuracy of 96.83% (Table 4), while the test accuracy with multibeat data approached 100%. However, single-beat performance declines significantly in complex scenarios (such as those involving temporal or emotional influences) and fails to meet the requirements for accurate identity verification; the results in this section likewise indicate that single-beat authentication cannot satisfy practical application demands. The subsequent cross-time and emotional stability experiments were therefore conducted using six-beat data.
[Figure 6: Performance comparison of single-beat and multibeat authentication. Image omitted; see PDF.]
4.3.2. Cross-Time Stability Analysis
After 3 months, there may be slight changes in heartbeat data due to factors such as season, environment, or individual physiological state. These changes can lead to differences in performance metrics between verification after 3 months and immediate verification. Figures 7 and 8 show the confusion matrices and FAR/FRR test performance curves for the Now and Later datasets. Combined with the data in Table 4, the accuracy of verification after 3 months dropped to 80.95%, F1-score decreased to 93.12%, and the EER increased to 1.15%. Compared with immediate verification, this represents a 17.46% decrease in accuracy. However, the accuracy of 80.95% still surpasses the 77.80% reported in the latest literature [28–30].
[Figure 7: Confusion matrices for the Now and Later datasets. Image omitted; see PDF.]
[Figure 8: FAR and FRR curves for the Now and Later datasets. Image omitted; see PDF.]
4.3.3. Emotional Stability Analysis
ECG data are also affected by emotional fluctuations [38], and these subtle changes can be captured for emotion recognition. However, in identity authentication, emotional fluctuations can affect a model’s judgment. Figures 9 and 10 show the confusion matrices and FAR/FRR test performance curves for normal, low, and high emotional states. Combined with the data in Table 4, when watching low-arousal videos, the accuracy decreased to 96.92%, F1-score decreased to 96.68%, and the EER increased to 0.43%. The impact on the model results was relatively small when the emotional fluctuations were minimal. When watching high-arousal videos, accuracy decreased to 90.76%, F1-score decreased to 95.81%, and EER increased to 0.48%. Compared with a stable emotional state, the accuracy of verification decreased by 1.54% and 7.70% when watching low- and high-arousal videos, respectively. The best-performing model in this scenario [27] achieved an accuracy of 91.07% using eight heartbeats while watching high-arousal videos, whereas this study achieved nearly identical accuracy using six heartbeats. However, because they did not evaluate the EER, a comprehensive comparison could not be made.
[Figure 9: Confusion matrices for the normal, low-arousal, and high-arousal states. Image omitted; see PDF.]
[Figure 10: FAR and FRR curves for the normal, low-arousal, and high-arousal states. Image omitted; see PDF.]
4.4. Comparison With State-of-the-Art Methods
To facilitate a meaningful comparison with SOTA methods, three primary metrics were selected for analysis: accuracy, training time, and average authentication time per person. The CTFN model achieves 100% authentication accuracy on the ECG-ID, MIT-ARR, and MIT-NSR databases, an improvement of ~1% over the existing SOTA methods, although the training and authentication times increase slightly. Furthermore, CTFN achieves 3.15% higher accuracy than the SOTA method on the CYBHi dataset with a 3-month interval between enrollment and authentication, while reducing training and authentication times. For high-arousal video viewing, fewer heartbeats are required to approach SOTA performance. The detailed comparison is presented in Table 5.
Table 5 Comparison with SOTA methods.
| Authors | Databases | Method | ACC (%) | Training time (s) | Authentication time (ms) |
| Prakash [6] | ECG-ID | CNN + LSTM | 99.49 | 103.73 | 69.08 |
| Asif et al. [39] | MIT-ARR | 1D-CRNN | 98.81 | 353.67 | 92.76 |
| | MIT-NSR | | 99.62 | 71.34 | 72.89 |
| D'angelis [29] | Now | ViT | 99.00 | 341.86 | 83.32 |
| Yi [30] | Later | ADAFF-Net | 77.80 | 832.77 | 104.25 |
| Fatimah [27] | High | FDM + PT | 91.07 | — | — |
| Ours | ECG-ID | CTFN | 100.00 | 127.82 | 75.56 |
| | MIT-ARR | | 100.00 | 391.72 | 97.92 |
| | MIT-NSR | | 100.00 | 81.72 | 78.34 |
| | Now | | 98.41 | 221.05 | 75.87 |
| | Later | | 80.95 | 614.72 | 95.24 |
| | High | | 90.76 | 844.74 | 93.69 |
5. Conclusions
This study presented the CTFN, a pioneering framework for ECG-based identity authentication that capitalizes on the complementary strengths of CNNs and transformers. The multistage enhanced CNN extracts local features, such as specific ECG waveforms, R peaks, and valleys, capturing fine-grained patterns that are unique to individuals. Concurrently, the transformer encoder models global dependencies across ECG sequences, thereby enabling the model to comprehend overall signal fluctuations. The integration of these modules, in conjunction with addressing identity authentication challenges through adaptive feature fusion, has led to significant advancements in the accuracy and robustness of the CTFN.
CTFN demonstrates notable efficacy; however, it is subject to several limitations: high computational complexity due to the transformer encoder, dependence on accurate preprocessing for QRS detection and noise filtering, and limited generalization to individuals with cardiac pathologies. In addition, while it handles moderate emotional and environmental variability, extreme conditions may affect its accuracy, and the model is not yet optimized for lightweight, real-time deployment on edge devices. Future research will concentrate on addressing these challenges by developing lightweight variants through techniques such as pruning and knowledge distillation, improving feature extraction on noisy data using transfer learning, validating the model on pathological datasets, and integrating multimodal biometrics for enhanced reliability. Beyond these points, the dual function of ECG authentication systems in health monitoring can be explored. For example, Bin Heyat et al. and Akhtar et al. used machine learning to analyze the association between cardiovascular diseases and insomnia [40, 41]. The ECG features captured by the CTFN model may provide auxiliary information for early disease diagnosis, further advancing its application in healthcare IoT.
Data Availability Statement
The datasets used in this study are openly available at the websites below, allowing reproducibility and further research by the scientific community. ECG-ID: . MIT-ARR: . MIT-NSR: . CYBHi: .
Conflicts of Interest
The authors declare no conflicts of interest.
Author Contributions
Heng Jia was responsible for model construction and experimental validation. Zhidong Zhao managed the project and secured funding, ensuring that the project was well-supported and aligned with funding objectives. Yefei Zhang handled the manuscript’s editing and review, refining the presentation, and ensuring clarity and coherence in the final submission. Xianfei Zhang was in charge of data visualization. Yanjun Deng focused on the methodology and validation. Hao Wang contributed to data curation and provided additional support in resource management. Pengfei Jiao participated in validating the experimental results.
Funding
This work was supported in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LQ24F010011, in part by the Natural Science Foundation of China under Grant 62301205, in part by the Fundamental Research Funds for the Provincial Universities of Zhejiang (Grant GK259909299001-021), in part by the Zhejiang Provincial Natural Science Foundation of China under Grants LDT23F01012F01 and LDT23F01015F01, in part by the Zhejiang Province Key Research and Development Program Project under Grant 2024C01023, and in part by the Zhejiang Provincial Key Laboratory for Sensitive Data Security Protection and Confidentiality Management (No. 2024E10048).
1 Zhang Y., Gao C., Pan S., Li Z., Xu Y., and Qiu H., A Score-Level Fusion of Fingerprint Matching With Fingerprint Liveness Detection, IEEE Access. (2020) 8, 183391–183400, https://doi.org/10.1109/ACCESS.2020.3027846.
2 Ming Z., Visani M., Luqman M. M., and Burie J.-C., A Survey on Anti-Spoofing Methods for Facial Recognition With RGB Cameras of Generic Consumer Devices, Journal of Imaging. (2020) 6, no. 12, https://doi.org/10.3390/jimaging6120139.
3 Bin Heyat M. B., Lai D., Akhtar F., et al., Bruxism Detection Using Single-Channel C4-A1 on Human Sleep S2 Stage Recording, in Intelligent Data Analysis, Gupta D., Bhattacharyya S., and Khanna A., Eds., 1st edition, John Wiley & Sons, 2020, 347–367.
4 Heyat M. B. B., Lai D., Khan F. I., and Zhang Y., Sleep Bruxism Detection Using Decision Tree Method by the Combination of C4-P4 and C4-A1 Channels of Scalp EEG, IEEE Access. (2019) 7, 102542–102553, https://doi.org/10.1109/ACCESS.2019.2928020.
5 Bin Heyat M. B., Akhtar F., Khan A., et al., A Novel Hybrid Machine Learning Classification for the Detection of Bruxism Patients Using Physiological Signals, Applied Sciences. (2020) 10, no. 21, 1–16, https://doi.org/10.3390/app10217410.
6 Jaya Prakash A., Patro K. K., Hammad M., Tadeusiewicz R., and Pławiak P., BAED: A Secured Biometric Authentication System Using ECG Signal Based on Deep Learning Techniques, Biocybernetics and Biomedical Engineering. (2022) 42, no. 4, 1081–1093, https://doi.org/10.1016/j.bbe.2022.08.004.
7 Sun L., Zhong Z., Qu Z., and Xiong N., PerAE: An Effective Personalized AutoEncoder for ECG-Based Biometric in Augmented Reality System, IEEE Journal of Biomedical and Health Informatics. (2022) 26, no. 6, 2435–2446, https://doi.org/10.1109/JBHI.2022.3145999.
8 Asadianfam S., Talebi M. J., and Nikougoftar E., ECG-Based Authentication Systems: A Comprehensive and Systematic Review, Multimedia Tools and Applications. (2024) 83, no. 9, 27647–27701, https://doi.org/10.1007/s11042-023-16506-3.
9 Agrawal V., Hazratifard M., Elmiligi H., and Gebali F., ElectroCardioGram (ECG)-Based User Authentication Using Deep Learning Algorithms, Diagnostics. (2023) 13, no. 3, https://doi.org/10.3390/diagnostics13030439.
10 Odinaka I., Lai P.-H., Kaplan A. D., O’Sullivan J. A., Sirevaag E. J., and Rohrbaugh J. W., ECG Biometric Recognition: A Comparative Analysis, IEEE Transactions on Information Forensics and Security. (2012) 7, no. 6, 1812–1824, https://doi.org/10.1109/TIFS.2012.2215324.
11 Bin Heyat M. B., Akhtar F., Abbas S. J., et al., Wearable Flexible Electronics Based Cardiac Electrode for Researcher Mental Stress Detection System Using Machine Learning Models on Single Lead Electrocardiogram Signal, Biosensors. (2022) 12, no. 6, https://doi.org/10.3390/bios12060427.
12 Akhtar F., Belal Bin Heyat M., Sultana A., et al., Medical Intelligence for Anxiety Research: Insights From Genetics, Hormones, Implant Science, and Smart Devices With Future Strategies, WIREs Data Mining and Knowledge Discovery. (2024) 14, no. 6, https://doi.org/10.1002/widm.1552.
13 Biel L., Pettersson O., Philipson L., and Wide P., ECG Analysis: A New Approach in Human Identification, IEEE Transactions on Instrumentation and Measurement. (2001) 50, no. 3, 808–812, https://doi.org/10.1109/19.930458.
14 Israel S. A., Irvine J. M., Cheng A., Wiederhold M. D., and Wiederhold B. K., ECG to Identify Individuals, Pattern Recognition. (2005) 38, no. 1, 133–142, https://doi.org/10.1016/j.patcog.2004.05.014.
15 Irvine J. M. and Israel S. A., A Sequential Procedure for Individual Identity Verification Using ECG, EURASIP Journal on Advances in Signal Processing. (2009) 2009, 243215, https://doi.org/10.1155/2009/243215.
16 Chan A. D. C., Hamdy M. M., Badre A., and Badee V., Wavelet Distance Measure for Person Identification Using Electrocardiograms, IEEE Transactions on Instrumentation and Measurement. (2008) 57, no. 2, 248–253, https://doi.org/10.1109/TIM.2007.909996.
17 Wang J.-S., Chiang W.-C., Hsu Y.-L., and Yang Y.-T. C., ECG Arrhythmia Classification Using a Probabilistic Neural Network With a Feature Reduction Method, Neurocomputing. (2013) 116, 38–45, https://doi.org/10.1016/j.neucom.2011.10.045.
18 Tantawi M. M., Revett K., Salem A.-B., and Tolba M. F., A Wavelet Feature Extraction Method for Electrocardiogram (ECG)-Based Biometric Recognition, Signal, Image and Video Processing. (2015) 9, no. 6, 1271–1280, https://doi.org/10.1007/s11760-013-0568-5.
20 Yanık H. C., Değirmenci E., Büyükakıllı B., Karpuz D., Kılınç O. H., and Gürgül S., Electrocardiography (ECG) Analysis and a New Feature Extraction Method Using Wavelet Transform With Scalogram Analysis, Biomedical Engineering/Biomedizinische Technik. (2020) 65, no. 5, 543–556, https://doi.org/10.1515/bmt-2019-0147.
21 Zhang Y., Zhao Z., Deng Y., Zhang X., and Zhang Y., ECGID: A Human Identification Method Based on Adaptive Particle Swarm Optimization and the Bidirectional LSTM Model, Frontiers of Information Technology & Electronic Engineering. (2021) 22, no. 12, 1641–1654, https://doi.org/10.1631/FITEE.2000511.
22 Tan C., Zhang L., Qian T., Bras S., and Pinho A. J., Statistical n-Best AFD-Based Sparse Representation for ECG Biometric Identification, IEEE Transactions on Instrumentation and Measurement. (2021) 70, 1–13, https://doi.org/10.1109/TIM.2021.3119138.
23 Donida Labati R., Muñoz E., Piuri V., Sassi R., and Scotti F., Deep-ECG: Convolutional Neural Networks for ECG Biometric Recognition, Pattern Recognition Letters. (2019) 126, 78–85, https://doi.org/10.1016/j.patrec.2018.03.028.
24 Shin S., Kang M., Zhang G., Jung J., and Kim Y. T., Lightweight Ensemble Network for Detecting Heart Disease Using ECG Signals, Applied Sciences. (2022) 12, no. 7, https://doi.org/10.3390/app12073291.
25 Belo D., Bento N., Silva H., Fred A., and Gamboa H., ECG Biometrics Using Deep Learning and Relative Score Threshold Classification, Sensors. (2020) 20, no. 15, https://doi.org/10.3390/s20154078.
26 Ibtehaz N., Chowdhury M. E. H., Khandakar A., et al., EDITH: ECG Biometrics Aided by Deep Learning for Reliable Individual Authentication, IEEE Transactions on Emerging Topics in Computational Intelligence. (2021) 6, no. 4, 928–940, https://doi.org/10.1109/TETCI.2021.3131374.
27 Fatimah B., Singh P., Singhal A., and Pachori R. B., Biometric Identification From ECG Signals Using Fourier Decomposition and Machine Learning, IEEE Transactions on Instrumentation and Measurement. (2022) 71, 1–9, https://doi.org/10.1109/TIM.2022.3199260.
28 Zhang L., Chen S., Lin F., Ren W., Choo K.-K., and Min G., 1DIEN: Cross-Session Electrocardiogram Authentication Using 1D Integrated EfficientNet, ACM Transactions on Multimedia Computing, Communications and Applications. (2023) 20, no. 1, 1–17.
29 D’angelis O., Bacco L., Vollero L., and Merone M., Advancing ECG Biometrics Through Vision Transformers: A Confidence-Driven Approach, IEEE Access. (2023) 11, 140710–140721, https://doi.org/10.1109/ACCESS.2023.3338191.
30 Yi P., Si Y., Fan W., and Zhang Y., ECG Biometrics Based on Attention Enhanced Domain Adaptive Feature Fusion Network, IEEE Access. (2023) 12, 1291–1307.
31 El_Rahman S. A., Biometric Human Recognition System Based on ECG, Multimedia Tools and Applications. (2019) 78, no. 13, 17555–17572, https://doi.org/10.1007/s11042-019-7152-0.
32 Moody G. B. and Mark R. G., The Impact of the MIT-BIH Arrhythmia Database, IEEE Engineering in Medicine and Biology Magazine. (2001) 20, no. 3, 45–50, https://doi.org/10.1109/51.932724.
33 Goldberger A. L., Amaral L. A. N., Glass L., et al., PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals, Circulation. (2000) 101, no. 23, e215–e220, https://doi.org/10.1161/01.CIR.101.23.e215.
34 da Silva H. P., Lourenço A., Fred A., Raposo N., and Aires-de-Sousa M., Check Your Biosignals Here: A New Dataset for Off-the-Person ECG Biometrics, Computer Methods and Programs in Biomedicine. (2014) 113, no. 2, 503–514, https://doi.org/10.1016/j.cmpb.2013.11.017.
35 Christov I. I., Real Time Electrocardiogram QRS Detection Using Combined Adaptive Threshold, BioMedical Engineering OnLine. (2004) 3, no. 1, 1–9, https://doi.org/10.1186/1475-925X-3-28.
36 Li Y., Pang Y., Wang K., and Li X., Toward Improving ECG Biometric Identification Using Cascaded Convolutional Neural Networks, Neurocomputing. (2020) 391, 83–95, https://doi.org/10.1016/j.neucom.2020.01.019.
37 Dosovitskiy A., Beyer L., Kolesnikov A., et al., An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020, arXiv preprint arXiv:2010.11929.
38 Rumpa L. D., Suluh S., Ramopoly I. H., and Jefriyanto W., Development of ECG Sensor Using Arduino Uno and E-Health Sensor Platform: Mood Detection From Heartbeat, Journal of Physics: Conference Series. (2020) 1528, IOP Publishing.
39 Asif M. S., Faisal M. S., Dar M. N., et al., Hybrid Deep Learning and Discrete Wavelet Transform-Based ECG Biometric Recognition for Arrhythmic Patients and Healthy Controls, Sensors. (2023) 23, no. 10, https://doi.org/10.3390/s23104635.
40 Bin Heyat M. B., Akhtar F., Sultana A., et al., Role of Oxidative Stress and Inflammation in Insomnia Sleep Disorder and Cardiovascular Diseases: Herbal Antioxidants and Anti-Inflammatory Coupled With Insomnia Detection Using Machine Learning, Current Pharmaceutical Design. (2022) 28, no. 45, 3618–3636, https://doi.org/10.2174/1381612829666221201161636.
41 Akhtar F., Heyat M. B. B., Parveen S., et al., Early Coronary Heart Disease Deciphered via Support Vector Machines: Insights From Experiments, 2023 20th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), 2023, IEEE, 1–7, https://doi.org/10.1109/ICCWAMTIP60502.2023.10387051.
