Abstract

After more than a century of development, electrocardiogram (ECG) diagnostics has become the preferred tool for healthcare professionals in diagnosing and monitoring cardiovascular disease. As wearable devices and mobile monitoring technologies become widespread, ECG data are growing more diverse and are collected over longer periods, so traditional manual annotation can no longer keep up with the volume of data to be analyzed. This research addresses three core challenges in ECG signal classification (extremely imbalanced data, significant individual physiological differences, and the difficulty of fitting long sequences) by proposing a Principal Component Analysis-based Conditional Generative Adversarial Network (PCA-CGAN). Through in-depth analysis of the principal component distribution of ECG signals, we found that a few principal components explain over 90% of signal variance, which reveals the inherent inefficiency and limitations of traditional methods that generate complete waveforms. On this theoretical foundation, we shift the data augmentation paradigm from generating surface waveforms to generating principal component features with high information density, resolving the waveform jitter and heterogeneity issues of traditional methods. We also designed a two-stage conditional encoding-decoding architecture that builds category-independent feature spaces from the early stages of training, breaking the feature space bias caused by the “Matthew effect” and preventing majority classes from compressing minority class features during generation. Using the Transformer’s global attention mechanism, the model captures key diagnostic features of various arrhythmias, maximizing inter-class differences while maintaining intra-class consistency.
Experiments demonstrate that PCA-CGAN achieves stable convergence, for the first time, on a large-scale heterogeneous dataset comprising 43 patients. It also resolves the “dilution effect” problem in data augmentation, avoiding the asymmetric outcome in which Precision increases while Recall decreases. After data augmentation, the ResNet model’s average F1 score improved significantly, with particularly strong performance on rare categories such as atrial premature beats, far surpassing traditional methods like SigCWGAN and TD-GAN. This research redefines the objectives and methods of ECG signal generation from the theoretical perspectives of information entropy and feature manifolds, providing a systematic solution to data imbalance problems in the medical field and establishing a theoretical foundation for applying ECG-assisted diagnostic systems in real clinical environments.
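
The observation that a few principal components can account for most of the signal variance can be illustrated with a minimal sketch. It is not the authors' code and does not use their data: the beats are synthetic, built from three fixed bump-shaped waves plus a small amount of noise, and the names `make_synthetic_beats` and `components_for_variance` are hypothetical. The sketch only shows how the number of components needed to reach a variance threshold such as 90% might be computed.

```python
import numpy as np


def make_synthetic_beats(n_beats=500, n_samples=200, seed=0):
    """Build toy 'beats' as random mixtures of three fixed waves plus noise.

    Assumption: this is illustrative synthetic data, not real ECG.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_samples)
    # Three Gaussian bumps loosely standing in for distinct wave shapes.
    basis = np.stack([
        np.exp(-((t - 0.50) ** 2) / (2 * 0.02 ** 2)),
        np.exp(-((t - 0.20) ** 2) / (2 * 0.04 ** 2)),
        np.exp(-((t - 0.75) ** 2) / (2 * 0.06 ** 2)),
    ])
    # Per-beat amplitudes vary around a mean, one column per bump.
    weights = rng.normal(loc=[1.0, 0.2, 0.3], scale=[0.3, 0.1, 0.1],
                         size=(n_beats, 3))
    noise = 0.01 * rng.normal(size=(n_beats, n_samples))
    return weights @ basis + noise


def components_for_variance(X, threshold=0.90):
    """Return (k, ratio): the smallest k whose cumulative explained
    variance ratio reaches `threshold`, and the full cumulative curve."""
    Xc = X - X.mean(axis=0)                  # center each time sample
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to PCA variances
    ratio = np.cumsum(var) / var.sum()
    k = int(np.searchsorted(ratio, threshold)) + 1
    return k, ratio


if __name__ == "__main__":
    X = make_synthetic_beats()
    k, ratio = components_for_variance(X, threshold=0.90)
    print(f"{k} of {X.shape[1]} components reach 90% of the variance")
```

On real recordings the count would depend on the dataset, the beat segmentation, and the preprocessing, so the result from this toy data should not be read as a reproduction of the paper's finding.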

Details

Title
Principal component conditional generative adversarial networks for imbalanced ECG classification enhancement
Author
Publication title
PLoS One; San Francisco
Volume
20
Issue
8
First page
e0330707
Number of pages
40
Publication year
2025
Publication date
Aug 2025
Section
Research Article
Publisher
Public Library of Science
Place of publication
San Francisco
Country of publication
United States
e-ISSN
19326203
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Milestone dates
2025-05-13 (Received); 2025-08-06 (Accepted); 2025-08-22 (Published)
ProQuest document ID
3242333117
Document URL
https://www.proquest.com/scholarly-journals/principal-component-conditional-generative/docview/3242333117/se-2?accountid=208611
Copyright
© 2025 Chao Tang. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-08-29
Database
ProQuest One Academic