J-HJ and S-WL are joint first authors.
WHAT IS ALREADY KNOWN ON THIS TOPIC
Current diagnostic methods for dyspnoea in emergency departments rely primarily on clinical assessments and basic imaging, supplemented by biomarkers such as N-terminal probrain natriuretic peptide, often leading to delayed or inaccurate differentiation between cardiac and pulmonary causes. A recognised need exists for more rapid and accurate diagnostic tools in this setting.
WHAT THIS STUDY ADDS
This study presents an artificial intelligence (AI)-powered ECG tool that analyses standard 12-lead ECGs using a transformer neural network, demonstrating high diagnostic accuracy with an area under the receiver operating characteristic curve of 0.938. The tool significantly outperforms traditional biomarkers and provides immediate results, aiding in the rapid differentiation between cardiac and pulmonary dyspnoea.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
AI-powered ECG analysis could be integrated into clinical protocols, potentially transforming the initial assessment process in emergency departments. By providing faster and more accurate diagnoses, this tool could reduce healthcare costs and improve patient outcomes. Further research could explore its integration with other diagnostic tools and its effectiveness across different hospital settings.
Introduction
Dyspnoea is a common yet complex presentation in the emergency department (ED), where numerous life-threatening conditions necessitate rapid and precise differential diagnoses.1 In the USA, dyspnoea causes three to four million ED visits annually, with 50% of the patients admitted to acute tertiary care hospitals. In the Asia-Pacific region, it accounts for 5% of all ED visits.2 Contrary to prevailing knowledge, acute dyspnoea is associated with a 2.57-fold increase in mortality (annual mortality 4.9% vs 2.3%) compared with acute chest pain.3 Pneumonia and heart failure (HF) are the two major causes of dyspnoea in the ED, with incidences of lower respiratory tract infections (including pneumonia) and HF of 24.9% and 17.3%, respectively.2 In 10%–15% of cases, the two diseases may coexist, necessitating combined treatment.
HF is a leading cause of hospitalisation and mortality, especially in patients >65 years old.4–6 Its global prevalence is rapidly increasing and is closely associated with increased morbidity and mortality.4 However, accurately diagnosing the cause of dyspnoea in the ED is often challenging and can lead to misdiagnosis owing to multifactorial causes. Additionally, HF requires various assessments to determine its clinical and haemodynamic profiles. Although intensive physical examinations and diagnostic tests are important, they have limitations as screening tools, and for interpretation, still rely on the proficiency of the clinician. Transthoracic echocardiography (TTE) remains underutilised owing to its complex analysis and the need for highly trained professionals, despite being the most effective diagnostic tool for HF owing to its rapid and accurate structural/functional evaluation.
ECG is a rapid, inexpensive, non-invasive and simple diagnostic tool widely used in medicine. Recently, artificial intelligence (AI) has been used to analyse ECG data and predict cardiac function. Expert-driven and machine-learning-driven AI models can predict HF regardless of ejection fraction in patients with dyspnoea, and convolutional neural network-based AI-ECG effectively identifies left ventricular systolic dysfunction in selected patients with dyspnoea in the ED.7–9 However, clinical data on AI-based ECG analysis are lacking, and further clinical research is warranted for the use of AI-based technologies in real-world clinical settings.
Therefore, we hypothesised that deep learning algorithms can be used in ED settings to identify the cause of dyspnoea.
Methods
Study design and population
This retrospective cross-sectional study was conducted at the ED of Inha University Hospital, Incheon, South Korea, from February 2006 to September 2023. It included consecutive patients presenting with dyspnoea; the ED’s capacity for rapid testing to differentiate causes of dyspnoea allowed accurate and efficient retrieval of patient records for retrospective analysis. The inclusion criteria were age ≥18 years, admission to either the cardiology or pulmonology department after the ED visit, and having undergone ECG and TTE. For individuals who visited the ED multiple times, the initial visit was designated as the index visit. Patients admitted to departments other than cardiology or pulmonology were excluded. Patients with ‘HF’ as their primary diagnosis on admission to the cardiology department were categorised under cardiac origin. Conversely, patients admitted to the pulmonology department from the ED with ‘pneumonia’ as their initial diagnosis were categorised under pulmonary origin. The HF classification adhered to the most recent guidelines.5 6
Patient informed consent was waived owing to its impracticality and minimal risk of harm.
ECG and echocardiographic measurements
All patients underwent an ECG within 24 hours of their ED visit. When multiple ECGs were obtained, the first ECG performed in the ED was selected. All ECGs were acquired at a sampling rate of 500 Hz using a GE-Marquette ECG machine (Marquette Tools, Milwaukee, Wisconsin, USA), and raw data were stored as XML documents using the MUSE® cardiology information system in relational databases.
Thereafter, patients who underwent comprehensive two-dimensional TTE within 180 days of their initial ED visit were identified. Echocardiographic studies were performed using commercially available instruments, and parameters were measured according to the guidelines.10 The left ventricular ejection fraction (LVEF) was calculated using the modified Simpson’s method from apical four-chamber and two-chamber views. Early mitral inflow (E) velocity and septal mitral annular velocity (e’) were obtained from tissue Doppler imaging.
Deep learning modelling (DLM) for identifying the origin of dyspnoea
The proposed AI-ECG architecture used in this study is schematically illustrated in figure 1. This figure depicts the workflow when a patient with dyspnoea visits the ED: a clinical evaluation, including a physical examination, radiographic examination and laboratory investigation, is performed. The ECG data are then processed through the AI-ECG model to predict the likelihood of cardiac or pulmonary causes of dyspnoea. Based on the AI-ECG predictions, clinicians can make informed decisions about the patient’s condition, aiding in diagnosing the underlying cause of dyspnoea.
A Light Gradient Boosting Machine (LightGBM),11 a machine-learning model, was used for preprocessing ECGs, alongside an automatic labelling method that we developed to facilitate learning and prediction of the R-peak location. The preprocessed ECGs were used to pretrain a transformer-based12 network shared between the HF and pulmonary disease tasks, with an HF pretraining structure that classified patients with LVEF ≤40%. The transformer structure used here, like those employed in GPT13 and BERT,14 effectively captures global features of the data. The weights in the Multi-Head Attention block can also be used to compute an attention score that quantifies important features in the input data. After that, the HF pretraining structure was replaced with the pulmonary disease post-training structure, and the network was post-trained to classify cardiac or pulmonary diseases. This transfer learning approach effectively uses pretrained ECG features to classify cardiac or pulmonary diseases with increased accuracy. Additionally, the dataset used in this study was split into training, validation and holdout test sets with a ratio of 7:1:2, respectively, through random sampling. The training and validation sets were used for model training and tuning. The holdout test set was constructed independently from the training and validation sets, ensuring that it did not include any patients present in the training and validation sets. Furthermore, the patients included in the holdout test set were not included in the training and validation sets of either the pretraining or the post-training phase. All performance metrics reported in this paper were evaluated using the holdout test set. Figure 2 shows the dataset creation and analysis strategy to ensure a robust and reliable dataset for training, validating and testing the network.
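The patient-level 7:1:2 split described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `split_patients`, the seed and the integer patient IDs are assumptions made for the example. The point it shows is that the split is made over unique patients, so no patient can appear in more than one set.

```python
import random


def split_patients(patient_ids, ratios=(7, 1, 2), seed=0):
    """Randomly assign unique patient IDs to train/validation/test sets.

    Hypothetical helper: splitting on unique IDs (not on ECG records)
    guarantees that no patient appears in more than one set.
    """
    unique_ids = sorted(set(patient_ids))  # one entry per patient
    rng = random.Random(seed)              # fixed seed for reproducibility
    rng.shuffle(unique_ids)

    total = sum(ratios)
    n = len(unique_ids)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total

    train = set(unique_ids[:n_train])
    val = set(unique_ids[n_train:n_train + n_val])
    test = set(unique_ids[n_train + n_val:])  # remainder goes to the holdout set
    return train, val, test


# Hypothetical integer IDs; 3105 is the cohort size reported in the Results.
train, val, test = split_patients(range(3105))
print(len(train), len(val), len(test))
```

To keep the holdout set unseen in both phases, one option (again an assumption, not a detail given in the paper) is to fix the holdout IDs once and exclude them from every dataset used for pretraining and post-training.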
Figure 2. Development, validation and schematic strategy for dataset creation and analysis. Performance of deep-learning-based model for cardiac origin. EF, ejection fraction; GBM, gradient boosting machine; HF, heart failure; HFrEF, heart failure-reduced ejection fraction.
Statistical analyses
Continuous variables are described as means and SD, and skewed data as medians with IQRs. Categorical variables are reported as numbers and percentages. Student’s t-test or Mann-Whitney U test was used for continuous variables to compare baseline characteristics between the groups. The χ2 test or Fisher’s exact test was used to analyse categorical variables.
Diagnostic utility was calculated using sensitivity, specificity, negative and positive predictive values, and negative and positive likelihood ratios. Receiver operating characteristic (ROC) curves were generated for both the AI-powered ECG and N-terminal probrain natriuretic peptide (NT-proBNP), with HF diagnosis considered the gold standard, and the area under the curve (AUC) was calculated. The AI model’s performance was summarised using the ROC curve and AUC together with accuracy, recall (sensitivity), specificity and F1 score. Recall represents the ratio of correctly predicted positive observations to all actual positive observations. The F1 score (balanced F-score) is the harmonic mean of precision and recall.
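As an illustration of these definitions, the sketch below computes the metrics from a 2×2 confusion matrix. The function name `diagnostic_metrics` is hypothetical. The counts are the whole-cohort AI-ECG classifications reported in table 1 (913 of 1197 cardiac-origin patients predicted as cardiac, 1681 of 1908 pulmonary-origin patients predicted as pulmonary), so the output differs from the holdout test set figures reported in the Results.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic accuracy measures from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # recall: true positives / all actual positives
    specificity = tn / (tn + fp)  # true negatives / all actual negatives
    ppv = tp / (tp + fp)          # precision (positive predictive value)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),  # harmonic mean
    }


# Whole-cohort counts taken from table 1 (cardiac origin = positive class).
m = diagnostic_metrics(tp=913, fn=284, tn=1681, fp=227)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```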
Logistic regression analysis assessed the independent contribution of AI-powered ECG in identifying dyspnoea of cardiac origin. A comprehensive independent variable set was evaluated, including patient demographics (age and sex), medical history, laboratory findings, echocardiographic measures and AI-ECG prediction probabilities. The final multivariate model was constructed using the forward selection method based on a p value of 0.2 and clinical significance. The performance of the two models was assessed to determine the incremental prognostic value of the AI-powered ECG prediction probabilities: a baseline model including the same parameters used in the logistic regression analysis and an extended model that included the baseline model parameters plus the AI-powered ECG prediction probability. The area under the receiver operating characteristic curve, net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were calculated, and the DeLong test was used to compare the models statistically.
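For readers unfamiliar with NRI and IDI, the sketch below shows one common way to compute them from the predicted probabilities of a baseline and an extended model. It assumes the category-free (continuous) form of the NRI; the paper does not state which variant was used, and the function names and toy data are hypothetical.

```python
def continuous_nri(p_old, p_new, y):
    """Category-free NRI: net upward movement in events plus net downward
    movement in non-events (assumed variant, not confirmed by the paper)."""
    up_e = down_e = up_n = down_n = 0
    for po, pn, yi in zip(p_old, p_new, y):
        if pn > po:
            if yi == 1:
                up_e += 1
            else:
                up_n += 1
        elif pn < po:
            if yi == 1:
                down_e += 1
            else:
                down_n += 1
    n_events = sum(y)
    n_nonevents = len(y) - n_events
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


def idi(p_old, p_new, y):
    """IDI: gain in mean predicted probability among events minus the
    change in mean predicted probability among non-events."""
    def mean(xs):
        return sum(xs) / len(xs)

    ev_old = [po for po, yi in zip(p_old, y) if yi == 1]
    ev_new = [pn for pn, yi in zip(p_new, y) if yi == 1]
    ne_old = [po for po, yi in zip(p_old, y) if yi == 0]
    ne_new = [pn for pn, yi in zip(p_new, y) if yi == 0]
    return (mean(ev_new) - mean(ev_old)) - (mean(ne_new) - mean(ne_old))


# Toy data (hypothetical): 1 = cardiac origin, 0 = pulmonary origin.
y = [1, 1, 1, 0, 0, 0]
p_baseline = [0.6, 0.5, 0.7, 0.4, 0.5, 0.3]
p_extended = [0.8, 0.4, 0.9, 0.2, 0.6, 0.1]

nri_value = continuous_nri(p_baseline, p_extended, y)
idi_value = idi(p_baseline, p_extended, y)
print(round(nri_value, 3), round(idi_value, 3))
```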
Additional sensitivity analyses were conducted in the pulmonary origin group to determine whether some patients had dyspnoea of primarily cardiac rather than pulmonary origin, potentially indicating misclassification in the ED. A high probability of cardiac origin was defined as an AI-ECG-predicted probability of cardiac origin ≥0.70 in patients within the pulmonary origin group. Additionally, cases of true pulmonary origin that AI-ECG misinterpreted as cardiac origin were analysed among patients in the pulmonary origin group. Two certified cardiologists manually and blindly reviewed all patient data to determine whether the dyspnoea was of cardiac origin.
A two-sided p value of <0.05 was considered statistically significant. R statistical software (V.4.1.0; R Foundation for Statistical Computing, Vienna, Austria) was used for all analyses.
Results
Of the 3677 patients initially enrolled, 572 were excluded owing to missing demographics (n=59), ECGs (n=55) or echocardiographic information (n=458). Finally, 3105 patients (mean age 68±16 years, male:female=1805:1300, pneumonia=1908, HF=1197) were included in our analysis.
Baseline clinical and ECG characteristics
Table 1 shows the study population’s baseline characteristics and AI-ECG prediction probabilities. The NT-proBNP level was higher in patients with dyspnoea of cardiac origin (3524.0 (1068.0–8191.0) pg/mL vs 445.0 (131.0–1767.0) pg/mL, p<0.001). Among patients with dyspnoea of cardiac origin, 76.3% were predicted as cardiac causes by AI-ECG, whereas among patients with dyspnoea of pulmonary origin, 88.1% were predicted as pulmonary causes (p<0.001). The median (IQR) AI-ECG prediction probability value for cardiac causes in patients with dyspnoea of cardiac origin was 0.90 (0.54–0.99). The patients were matched using 1:1 exact matching on the propensity score for age and sex, which resulted in adequate balance. After matching, all the standardised mean differences for the covariates were below 0.1, and all standardised mean differences for squares and two-way interactions between covariates were below 0.15. After the propensity score matching analysis, the AI-ECG prediction probability value showed significant differences between the two groups (online supplemental table S1). Additionally, subgroup analyses stratified by an age cut-off of 65 years confirmed these findings (online supplemental table S2).
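To show what the balance criterion refers to, the sketch below computes a standardised mean difference for a continuous covariate. It assumes the common definition that divides the difference in means by the square root of the average of the two group variances; the paper does not state the exact formula. The input is the unmatched age data from table 1 (73±14 vs 65±17 years), so the result reflects the imbalance before matching.

```python
import math


def standardised_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """SMD for a continuous covariate, assuming the pooled SD is the square
    root of the average of the two group variances (a common convention)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Age before matching, from table 1: cardiac 73 (SD 14) vs pulmonary 65 (SD 17).
smd_age = standardised_mean_difference(73, 14, 65, 17)
print(round(smd_age, 3))  # well above the 0.1 threshold used after matching
```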
Table 1. Baseline characteristics of the study population
Characteristic | Cardiac origin (n=1197) | Pulmonary origin (n=1908) | Total (n=3105) | P value |
Clinical characteristics | ||||
Age, years | 73±14 | 65±17 | 68±16 | <0.001 |
Female sex, n (%) | 537 (44.9) | 763 (40.0) | 1300 (41.9) | 0.008 |
BMI (kg/m²) | 24.2±4.7 | 22.9±4.1 | 23.4±4.4 | <0.001 |
History of HF, n (%) | 291 (24.3) | 85 (4.5) | 376 (12.1) | <0.001 |
DM, n (%) | 338 (28.3) | 258 (13.5) | 596 (19.2) | <0.001 |
HTN, n (%) | 435 (36.4) | 327 (17.1) | 762 (24.5) | <0.001 |
CKD, n (%) | 257 (21.5) | 54 (2.8) | 311 (10.0) | <0.001 |
CAD, n (%) | 235 (19.6) | 117 (6.1) | 352 (11.3) | <0.001 |
MI, n (%) | 75 (6.3) | 42 (2.2) | 117 (3.8) | <0.001 |
COPD, n (%) | 57 (4.8) | 196 (10.3) | 253 (8.1) | <0.001 |
AF, n (%) | 458 (38.3) | 92 (4.8) | 550 (17.7) | <0.001 |
Echocardiographic findings | ||||
LVEF (%) | 45.5±14.7 | 56.7±12.2 | 48.8±14.9 | <0.001 |
E/e’ | 17.3±7.5 | 15.6±6.7 | 16.9±7.4 | 0.001 |
Laboratory findings | ||||
BUN (mg/dL) | 24.6±15.8 | 19.6±14.7 | 21.6±15.3 | <0.001 |
Creatinine (mg/dL) | 1.4±1.4 | 1.2±1.2 | 1.3±1.2 | <0.001 |
eGFR (mL/min/1.73 m²) | 63.2±30.4 | 75.4±32.9 | 70.7±32.5 | <0.001 |
NT-proBNP (pg/mL) | 3524.0 (1068.0–8191.0) | 445.0 (131.0–1767.0) | 1241.0 (247.0–4715.0) | <0.001 |
Log-NT-proBNP (pg/mL) | 3.5 (3.0–3.9) | 2.6 (2.1–3.2) | 3.0 (2.3–3.7) | <0.001 |
WBC (×10⁹/L) | 8.6±3.7 | 10.9±5.8 | 9.9±5.2 | <0.001 |
Neutrophil (%) | 67.1±13.2 | 76.3±12.7 | 72.8±13.6 | <0.001 |
ESR (mm/hour) | 23.0±24.0 | 52.4±34.1 | 41.0±33.8 | <0.001 |
CRP (mg/dL) | 1.8±3.0 | 10.8±9.8 | 7.5±9.1 | <0.001 |
ECG findings | ||||
Heart rate (beats/min) | 89.8±24.6 | 92.3±20.5 | 91.7±21.5 | 0.017 |
PR interval (ms) | 167.8±34.5 | 155.6±26.1 | 157.8±28.1 | <0.001 |
QRS duration (ms) | 99.6±22.9 | 90.7±15.6 | 92.8±18.0 | <0.001 |
QT interval (ms) | 397.5±63.1 | 365.9±45.4 | 373.3±51.8 | <0.001 |
Corrected QT interval (ms) | 472.5±43.7 | 445.7±38.9 | 452.0±41.6 | <0.001 |
P axis | 48.2±29.3 | 51.6±24.2 | 51.0±25.2 | 0.021 |
R axis | 30.0±56.9 | 40.6±44.5 | 38.1±47.9 | <0.001 |
T axis | 79.1±81.5 | 47.1±42.0 | 54.5±55.5 | <0.001 |
AI-ECG prediction probability | ||||
Probability for pulmonary origin | 0.10 (0.01–0.46) | 0.91 (0.78–0.96) | 0.79 (0.17–0.94) | <0.001 |
Pulmonary causes, n (%) | 284 (23.7) | 1681 (88.1) | 1965 (63.3) | <0.001 |
Probability for cardiac origin | 0.90 (0.54–0.99) | 0.09 (0.04–0.22) | 0.21 (0.06–0.83) | <0.001 |
Cardiac causes, n (%) | 913 (76.3) | 227 (11.9) | 1140 (36.7) | <0.001 |
Variables are expressed as the mean±SD, median (IQR), or n (%).
AF, atrial fibrillation; AI, artificial intelligence; BMI, body mass index; BUN, blood urea nitrogen; CAD, coronary artery disease; CKD, chronic kidney disease; COPD, chronic obstructive pulmonary disease; CRP, C reactive protein; DM, diabetes mellitus; e’, septal mitral annular velocity; E, early mitral inflow; eGFR, estimated glomerular filtration rate; ESR, erythrocyte sedimentation rate; HF, heart failure; HTN, hypertension; LVEF, left ventricular ejection fraction; MI, myocardial infarction; NT-proBNP, N-terminal probrain natriuretic peptide; WBC, white blood cell.
Performance of DLM for identifying the origin of dyspnoea
Online supplemental figure S1 depicts the ROC curve of the AI-ECG model after pretraining to predict LVEF ≤40%, with an AUC of 0.855 (95% CI 0.801 to 0.901). After pretraining, the AI-ECG model was post-trained to identify cardiac causes among patients with dyspnoea visiting the ED. The post-trained AI-ECG model achieved an AUC of 0.938 (95% CI 0.897 to 0.965) with an accuracy of 88.1% (95% CI 84.0% to 92.1%) in identifying dyspnoea of cardiac origin among patients visiting the ED. The sensitivity, specificity and positive and negative predictive values were 93.0%, 79.5%, 89.0% and 86.4%, respectively. The F1 score was 0.828 (95% CI 0.757 to 0.891). When compared with NT-proBNP (cut-off value >1383 pg/mL), AI-ECG showed superior diagnostic value in this study population for identifying dyspnoea of cardiac origin (AUC 0.938, 95% CI 0.897 to 0.965 for AI-ECG; AUC 0.765, 95% CI 0.701 to 0.823, p<0.001 for NT-proBNP, figure 3). Figure 4 presents the ECG heatmap of attention scores for patients with dyspnoea of cardiac origin.
Figure 3. AUROC curve for identifying cardiac causes. The red line indicates the AI-ECG model, and the blue line indicates the NT-proBNP level. AI, artificial intelligence; AUROC, area under the receiver operating characteristic curve; NT-proBNP, N-terminal probrain natriuretic peptide.
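The AUC values compared in figure 3 can be read as the probability that a randomly chosen patient with cardiac-origin dyspnoea receives a higher score than a randomly chosen patient with pulmonary-origin dyspnoea. The sketch below computes the AUC directly from that pairwise definition; the function name and the six toy scores are hypothetical and are not study data.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities (hypothetical): 1 = cardiac, 0 = pulmonary.
scores = [0.95, 0.80, 0.40, 0.70, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 0]
auc = auc_from_scores(scores, labels)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of patients and is meant only to make the definition concrete; rank-based implementations give the same value more efficiently.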
Figure 4. The ECG heatmap displays attention scores for patients with cardiac origin, highlighting bright yellow segments on the waveform that indicate regions that have substantially contributed to the classification of cardiac origin.
Association between parameters to identify the cardiac origin of dyspnoea
Table 2 shows the univariate and multivariate logistic regression analyses. In the univariate analysis, the AI-ECG probability was significantly associated with the cardiac origin of dyspnoea. After adjusting for clinical, laboratory and echocardiographic parameters, AI-ECG probability was independently associated with dyspnoea of cardiac origin (adjusted OR, 3.45 (95% CI 2.26 to 5.30), p<0.001). In contrast, the NT-proBNP level was not significantly associated with dyspnoea of cardiac origin after adjustment. We conducted an additional analysis to assess the incremental value of the AI-ECG probability. Two models were compared: a baseline model including the clinical, laboratory and echocardiographic parameters shown in table 2, and an extended model that included the baseline model parameters plus the AI-ECG probability. Compared with the baseline model, the extended model showed superior diagnostic value even after adjusting for the clinical, laboratory and echocardiographic parameters (AUC 0.958, 95% CI 0.951 to 0.964, p<0.001 for the extended model; AUC 0.917, 95% CI 0.907 to 0.927, p<0.001 for the baseline model, online supplemental figure S2). The NRI and IDI were 0.136 and 0.035, respectively. In the DeLong test, the Z-score was 4.369, and the p value was <0.001.
Table 2. Factors associated with identifying the cardiac origin of dyspnoea
Parameter | Univariate OR | Univariate 95% CI | Univariate P value | Multivariate OR | Multivariate 95% CI | Multivariate P value |
Age (per year) | 1.03 | 1.03 to 1.04 | <0.001 | 0.98 | 0.96 to 1.00 | 0.042 |
Female sex | 1.22 | 1.05 to 1.41 | 0.007 | 2.33 | 1.48 to 3.73 | <0.001 |
HF history | 6.92 | 5.39 to 8.97 | <0.001 | 0.83 | 0.46 to 1.44 | 0.499 |
DM | 2.52 | 2.10 to 3.02 | <0.001 | 0.87 | 0.53 to 1.42 | 0.562 |
HTN | 2.78 | 2.35 to 3.29 | <0.001 | 0.85 | 0.54 to 1.36 | 0.503 |
CKD | 9.39 | 6.99 to 12.84 | <0.001 | 2.41 | 1.31 to 4.60 | 0.006 |
AF | 12.23 | 9.67 to 15.63 | <0.001 | 4.64 | 2.74 to 8.14 | <0.001 |
Log-NT-proBNP (per pg/mL) | 3.87 | 3.41 to 4.41 | <0.001 | 1.08 | 0.77 to 1.52 | 0.639 |
WBC (per 10³/μL) | 0.90 | 0.88 to 0.91 | <0.001 | 0.95 | 0.91 to 0.99 | 0.022 |
CRP (per mg/dL) | 0.73 | 0.71 to 0.75 | <0.001 | 0.80 | 0.77 to 0.84 | <0.001 |
LVEF (per %) | 0.94 | 0.93 to 0.95 | <0.001 | 0.96 | 0.95 to 0.98 | <0.001 |
E/e’ | 1.04 | 1.02 to 1.06 | <0.001 | 0.98 | 0.95 to 1.01 | 0.127 |
AI-ECG | 23.81 | 19.68 to 28.91 | <0.001 | 3.45 | 2.26 to 5.30 | <0.001 |
AF, atrial fibrillation; AI, artificial intelligence; CKD, chronic kidney disease; CRP, C reactive protein; DM, diabetes mellitus; e’, septal mitral annular velocity; E, early mitral inflow; HF, heart failure; HTN, hypertension; LVEF, left ventricular ejection fraction; NT-proBNP, N-terminal probrain natriuretic peptide; WBC, white blood cell.
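As a reading aid for table 2, an OR and its 95% CI can be converted back into an approximate Wald z statistic and two-sided p value. The sketch below assumes the CI was built symmetrically on the log-odds scale with a normal approximation, which is the usual construction but is not stated in the paper; the function name is hypothetical and rounding in the table limits the precision of the result.

```python
import math


def wald_from_or(odds_ratio, ci_low, ci_high):
    """Approximate Wald z and two-sided p from an OR and its 95% CI,
    assuming a symmetric CI on the log-odds scale."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.959964)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Multivariate AI-ECG row of table 2: OR 3.45 (95% CI 2.26 to 5.30).
z_ai, p_ai = wald_from_or(3.45, 2.26, 5.30)
# Multivariate log-NT-proBNP row: OR 1.08 (95% CI 0.77 to 1.52).
z_bnp, p_bnp = wald_from_or(1.08, 0.77, 1.52)
print(round(z_ai, 2), p_ai < 0.001)
print(round(z_bnp, 2), p_bnp > 0.05)
```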
Sensitivity analysis for AI-ECG prediction probability showing cardiac causes in pulmonary origin group
We identified patients predicted to have cardiac causes from the pulmonary origin group using AI-ECG analysis and performed an additional sensitivity analysis to determine whether these patients had true cardiac causes.
Among the 1908 patients in this group, 129 (6.76%) were identified as having a high probability of dyspnoea of cardiac origin, with an AI-ECG-predicted probability ≥0.70. These patients were classified into three categories based on the predicted probability of cardiac origin: ≥0.90 (n=65), 0.80–0.89 (n=30) and 0.70–0.79 (n=34). On cardiologist review, the number of patients with dyspnoea of true cardiac origin was 53 (81.5%), 23 (76.7%) and 20 (58.8%) for those with probabilities of ≥0.90, 0.80–0.89 and 0.70–0.79, respectively (p=0.046, figure 5). Online supplemental table S3 shows the baseline characteristics of the study population. The NT-proBNP levels of patients with true pulmonary causes within the pulmonary group did not differ from those of patients with dyspnoea of true cardiac origin in the pulmonary group.
Figure 5. Sensitivity analysis according to AI-ECG probability for cardiac origin in patients with pulmonary origin reviewed by a cardiologist. The blue box represents the true cardiac origin, and the red box represents the true pulmonary origin, as reviewed by expert cardiologists. AI, artificial intelligence.
Additionally, we analysed patients with dyspnoea of true pulmonary origin where AI-ECG was misinterpreted as having a cardiac origin. Among the 34 cases involving 33 patients, 22 had abnormal ECG findings, including bundle branch block, whereas 7 exhibited abnormal lung conditions, including cancer (n=3) and tuberculosis-destroyed lung, empyema thorax and emphysema lung (n=1 each). One patient exhibited abnormal ECG findings and lung cancer. Four patients exhibited no definitive abnormalities (online supplemental table S4).
Discussion
Main findings
Our study aimed to assess the efficacy of AI-ECG in distinguishing between cardiac and pulmonary causes of dyspnoea, a critical challenge in the ED. Our findings highlighted several key issues, notably the participant characteristics, the validity of AI-ECG compared with NT-proBNP levels and the identification of dyspnoea origins in patients misclassified by their initial presentation.
AI-ECG demonstrated significantly greater predictive value than traditional diagnostic methods (especially NT-proBNP) in identifying dyspnoea of cardiac origin. Furthermore, among patients initially classified as having a pulmonary origin in the ED, 74.4% of those with a high AI-ECG-predicted probability of cardiac origin (≥0.70) were confirmed to have cardiac issues on expert review, demonstrating the high accuracy of AI-ECG predictions in this group.
Difficulties in diagnosing HF in real-world clinical settings
Dyspnoea stemming from diverse cardiac and pulmonary conditions often presents overlapping clinical features, posing challenges in initial assessments and increasing the risk of misdiagnosis. Approximately 10%–15% of ED dyspnoea cases involve concurrent cardiac and pulmonary issues, necessitating complex clinical interventions. Our study (including diverse patient demographics and comorbidities) revealed distinctions between cardiac-origin and pulmonary-origin dyspnoea. Notably, patients with cardiac-origin dyspnoea were older and had multiple comorbidities. These characteristics emphasise the complexity of diagnosing dyspnoea and highlight the need for advanced diagnostic tools to accommodate such diversity.
Validity of AI-ECG compared to NT-proBNP
Natriuretic peptides effectively diagnose HF in acute care settings.15–18 A previous systematic review found that using lower recommended thresholds of 300 ng/L for NT-proBNP achieved sensitivities of 0.99 (0.97–1.00) and negative predictive values of 0.98 (0.89–1.0) for diagnosing acute HF.15 Another study comparing the diagnostic value of natriuretic peptide in HF with that of ECG showed that NT-proBNP provided a higher negative predictive value (0.97) and a lower positive predictive value (0.44), whereas an abnormal ECG did not add any further predictive value.17 However, their effectiveness in distinguishing HF may be limited owing to multiple factors affecting natriuretic peptide levels.19 Our study demonstrated that the AI-ECG algorithm outperformed NT-proBNP in identifying the cardiac origin of dyspnoea (AUC, 0.938 vs 0.765). Moreover, even after adjusting for the clinical, laboratory and echocardiographic parameters, AI-ECG significantly improved the model’s predictive performance (AUC, 0.958 vs 0.917). This suggests that AI-ECG analysis has the potential to serve as a new tool that surpasses NT-proBNP, improving the diagnostic accuracy of HF.
Advancing ECG diagnostics with AI: unveiling the power of deep learning and transformer models
In our sensitivity analysis, AI-ECG predicted a cardiac cause in 11.9% of patients whom clinicians had initially classified as having dyspnoea of pulmonary origin in the ED. Among those with an AI-ECG prediction probability ≥0.70, 74.4% had a cardiac origin on expert review, indicating high accuracy, particularly at higher prediction probability values. AI-ECG offers benefits not found in conventional diagnostic approaches. The advantage of the transformer model is its ability to effectively handle temporal dependencies across long sequences via self-attention mechanisms, allowing for the accurate modelling of complex temporal patterns without relying on fixed-size windows or recurrent connections. Furthermore, its parallelisable architecture enables efficient processing of large-scale time-series data, contributing to faster training and inference times. Our results suggest that AI-ECG could enhance clinical decision-making in the time-sensitive ED environment by integrating rapidly into workflows, thereby facilitating quicker diagnoses and avoiding delays from more invasive procedures. This innovation has the potential to help physicians deliver better-informed care efficiently.
ECG pattern recognition via DLM: potential mechanisms differentiating HF from pulmonary conditions
Traditional AI models (particularly those with deep-learning architectures) often serve as ‘black boxes’, obscuring the basis of their decisions.20 However, the proposed transformer architecture resolves this challenge by employing an attention mechanism that distinctly accentuates significant segments of the input data with attention scores. The attention score heatmap displays the specific areas of the ECG that our AI-ECG model focused on when predicting patients with cardiogenic causes, thereby enhancing the transparency of the model and providing insights into its decision-making process.
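To make the idea of an attention score concrete, the sketch below computes scaled dot-product attention weights for a toy sequence and averages them per input position, which is one simple way to obtain a heatmap-like importance value. It is a generic illustration, not the authors' model: it omits the learned query/key projections and multiple heads, and the averaging step, function names and toy embeddings are all assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention: softmax(q . k / sqrt(d)) for each query."""
    d = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights.append(softmax(scores))
    return weights


def mean_attention_per_position(weights):
    """Average attention received by each input position across all queries
    (an assumed aggregation for building a heatmap)."""
    n_queries = len(weights)
    n_positions = len(weights[0])
    return [sum(row[j] for row in weights) / n_queries for j in range(n_positions)]


# Toy embeddings for three ECG segments (hypothetical values); the same
# vectors serve as queries and keys because projections are omitted here.
segments = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
w = attention_weights(segments, segments)
importance = mean_attention_per_position(w)
print([round(v, 3) for v in importance])
```

In this toy case the third segment has the largest embedding and receives the most attention on average, which is the kind of position a heatmap would highlight.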
Our investigation of ECGs via deep learning analysis revealed distinctive patterns, particularly within the intrinsicoid segments of the QRS complex and the T wave. These are crucial for differentiating between patients with HF and those with pulmonary conditions. Despite the challenges posed by the ‘black-box’ nature of deep learning algorithms, our results support the hypothesis that the electrocardiographic differences observed in patients with HF are primarily owing to abnormalities in ventricular depolarisation and repolarisation.21–23 These findings suggest that ventricular electrical remodelling plays a substantial role in the analytical capabilities of deep learning algorithms, indicating that such ECG modifications are crucial for evaluating patients presenting with dyspnoea in the ED. This highlights the potential of deep learning for enhancing the diagnostic accuracy of HF in clinical settings, thereby providing a novel approach for identifying and managing patients with this condition.
Limitations
Our study has some limitations. First, we defined the cardiac and pulmonary origins as HF and pneumonia, respectively. Therefore, we limited our representation of these disease categories, although these conditions are highly prevalent and represent significant cases. However, we focused on comparing the two conditions with the highest incidence and most representative of each disease, yielding positive results. This lays the foundation for extending the findings of this study to other diseases.
Second, HF and pneumonia can coexist. As previously mentioned, the two diseases may coexist in 10%–15% of the cases. The presence of overlapping conditions requires clinical decision-making based on the clinical situation and test results, making subjective judgments by clinicians unavoidable. Consequently, the differentiation of AI-ECG in such cases remains a future challenge. Third, AI-ECG was misinterpreted as indicating dyspnoea of cardiac origin in cases where the true origin was pulmonary. This suggests the need for further algorithm training to address misinterpretations related to certain ECG patterns and conditions affecting cardiac remodelling, especially as specific causes were unidentified in four cases.
Fourth, our transformer network required more computational power than traditional networks such as convolutional24 and recurrent neural networks,25 owing to its many parameters and the quadratic increase in computational demands with input sequence length. We mitigated this by developing a method to identify R-peaks and select only five ECG patterns, reducing the sequence used to approximately 32% via linear interpolation, thereby lowering computational demands without significantly affecting prediction accuracy. Despite these advances, the single-centre study and the imbalanced dataset underscore the need for larger-scale validation and comparison with established models.
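The effect of shortening the input can be illustrated with a small sketch. Because self-attention compares every position with every other, its cost grows with the square of the sequence length, so a sequence reduced to about 32% of its original length needs roughly 0.32² ≈ 10% of the pairwise comparisons. The resampling function below is a generic linear-interpolation routine with hypothetical names; the paper does not describe the authors' implementation at this level of detail.

```python
def resample_linear(signal, n_out):
    """Resample a 1-D signal to n_out points by linear interpolation."""
    n_in = len(signal)
    if n_out == 1:
        return [signal[0]]
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)  # fractional index into the input
        left = int(pos)
        right = min(left + 1, n_in - 1)
        frac = pos - left
        out.append(signal[left] * (1 - frac) + signal[right] * frac)
    return out


def relative_attention_cost(length_ratio):
    """Pairwise attention cost relative to the full-length sequence,
    counting only the quadratic self-attention term."""
    return length_ratio ** 2


# Toy beat segment (hypothetical samples) shortened from 5 to 3 points.
shortened = resample_linear([0.0, 1.0, 2.0, 3.0, 4.0], 3)
cost = relative_attention_cost(0.32)
print(shortened, round(cost, 4))
```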
Fifth, a biased population based on disease severity and comorbidities may have been selected because this retrospective study was conducted at a tertiary university hospital, thereby limiting its representativeness of the general population. Therefore, prospective studies are warranted to establish the usefulness of AI-ECG in the medical field.
Conclusion
Our findings suggest that the application of AI-ECG represents a promising advancement in the ED and provides a new and effective means of identifying the cause of dyspnoea. AI-ECG could become an important tool in the evolving medical diagnostic landscape owing to its potential to increase diagnostic accuracy and shorten treatment times. This could lead to more tailored treatment strategies, ultimately improving patient outcomes and reducing the burden on the ED. Prospective studies are warranted to further evaluate the practicality and effectiveness of real-time improvement in acute care settings.
We want to thank Editage (www.editage.co.kr) for providing writing assistance, improving the language, and proofreading the article.
Data availability statement
No data are available.
Ethics statements
Patient consent for publication
Not applicable.
Ethics approval
The study was approved by the Institutional Review Board of Inha University Hospital and conducted in accordance with the ethical principles outlined in the Declaration of Helsinki (INHAUH 2023-03-024).
J-HJ and S-WL contributed equally.
Contributors The major contributions to this study were led by J-HJ, S-WL and Y-SB, who participated in various stages, including conceptualisation, data curation and manuscript drafting. D-YK, S-HS, S-CL, D-HK and WC contributed to the formal analysis, investigation, project management and validation processes. S-CL, D-HK and WC handled the software development and funding acquisition. The team supervised, reviewed and edited the research at all stages. Guarantor: Y-SB.
Funding This work was supported by an Inha University Research Grant and a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number RS-2023-00265440).
Disclaimer The funders played no role in the study design, data collection and analysis, decision to publish, or manuscript preparation.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
1 Mueller C. Acute dyspnoea in the emergency department. In: Price S, Rosano GMC, Vrints C, eds. The ESC textbook of intensive and acute cardiovascular care. 2nd ed. Oxford: Oxford University Press, 2017: 103–12.
2 Laribi S, Keijzers G, van Meer O, et al. Epidemiology of patients presenting with dyspnea to emergency departments in Europe and the Asia-Pacific region. Eur J Emerg Med 2019; 26: 345–9. doi:10.1097/MEJ.0000000000000571
3 Argulian E, Agarwal V, Bangalore S, et al. Meta-analysis of prognostic implications of dyspnea versus chest pain in patients referred for stress testing. Am J Cardiol 2014; 113: 559–64. doi:10.1016/j.amjcard.2013.10.019
4 Cho JY, Cho D-H, Youn J-C, et al. Korean Society of Heart Failure Guidelines for the Management of Heart Failure: Definition and Diagnosis. Int J Heart Fail 2023; 5: 51–65. doi:10.36628/ijhf.2023.0009
5 Heidenreich PA, Bozkurt B, Aguilar D, et al. 2022 AHA/ACC/HFSA Guideline for the Management of Heart Failure: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. J Am Coll Cardiol 2022; 79: e263–421. doi:10.1016/j.jacc.2021.12.012
6 McDonagh TA, Metra M, Adamo M, et al. 2021 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur Heart J 2021; 42: 3599–726. doi:10.1093/eurheartj/ehab368
7 Choi D-J, Park JJ, Ali T, et al. Artificial intelligence for the diagnosis of heart failure. NPJ Digit Med 2020; 3: 54. doi:10.1038/s41746-020-0261-3
8 Adedinsewo D, Carter RE, Attia Z, et al. Artificial Intelligence-Enabled ECG Algorithm to Identify Patients With Left Ventricular Systolic Dysfunction Presenting to the Emergency Department With Dyspnea. Circ Arrhythm Electrophysiol 2020; 13: e008437. doi:10.1161/CIRCEP.120.008437
9 Yoon M, Park JJ, Hur T, et al. Application and Potential of Artificial Intelligence in Heart Failure: Past, Present, and Future. Int J Heart Fail 2024; 6: 11–9. doi:10.36628/ijhf.2023.0050
10 Lang RM, Badano LP, Mor-Avi V, et al. Recommendations for cardiac chamber quantification by echocardiography in adults: an update from the American Society of Echocardiography and the European Association of Cardiovascular Imaging. Eur Heart J Cardiovasc Imaging 2015; 16: 233–70. doi:10.1093/ehjci/jev014
11 Ke GL, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 2017; 30.
12 Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Adv Neural Inf Process Syst 2017; 30.
13 Brown T, Mann B, Ryder N, et al. Language models are few-shot learners. Adv Neural Inf Process Syst 2020; 33: 1877–901.
14 Devlin J, Chang M-W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv 2018. doi:10.48550/arXiv.1810.04805
15 Roberts E, Ludman AJ, Dworzynski K, et al. The diagnostic accuracy of the natriuretic peptides in heart failure: systematic review and diagnostic meta-analysis in the acute care setting. BMJ 2015; 350: h910. doi:10.1136/bmj.h910
16 Kelder JC, Cramer MJ, Verweij WM, et al. Clinical utility of three B-type natriuretic peptide assays for the initial diagnostic assessment of new slow-onset heart failure. J Card Fail 2011; 17: 729–34. doi:10.1016/j.cardfail.2011.04.013
17 Zaphiriou A, Robb S, Murray-Thomas T, et al. The diagnostic accuracy of plasma BNP and NTproBNP in patients referred from primary care with suspected heart failure: results of the UK natriuretic peptide study. Eur J Heart Fail 2005; 7: 537–41. doi:10.1016/j.ejheart.2005.01.022
18 Cowie MR, Struthers AD, Wood DA, et al. Value of natriuretic peptides in assessment of patients with possible new heart failure in primary care. Lancet 1997; 350: 1349–53. doi:10.1016/S0140-6736(97)06031-5
19 Mueller C, McDonald K, de Boer RA, et al. Heart Failure Association of the European Society of Cardiology practical guidance on the use of natriuretic peptide concentrations. Eur J Heart Fail 2019; 21: 715–31. doi:10.1002/ejhf.1494
20 Hassija V, Chamola V, Mahapatra A, et al. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cogn Comput 2024; 16: 45–74. doi:10.1007/s12559-023-10179-8
21 O’Neal WT, Mazur M, Bertoni AG, et al. Electrocardiographic Predictors of Heart Failure With Reduced Versus Preserved Ejection Fraction: The Multi-Ethnic Study of Atherosclerosis. J Am Heart Assoc 2017; 6: e006023. doi:10.1161/JAHA.117.006023
22 Coronel R, Wilders R, Verkerk AO, et al. Electrophysiological changes in heart failure and their implications for arrhythmogenesis. Biochim Biophys Acta 2013; 1832: 2432–41. doi:10.1016/j.bbadis.2013.04.002
23 Houser SR, Margulies KB. Is depressed myocyte contractility centrally involved in heart failure? Circ Res 2003; 92: 350–8. doi:10.1161/01.RES.0000060027.40275.A6
24 LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition. Proc IEEE 1998; 86: 2278–324. doi:10.1109/5.726791
25 Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986; 323: 533–6. doi:10.1038/323533a0
© 2024 Author(s) (or their employer(s)) 2024. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. http://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ . Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Background
Acute dyspnoea is common in acute care settings. However, identifying the origin of dyspnoea in the emergency department (ED) is often challenging. We aimed to investigate whether our artificial intelligence (AI)-powered ECG analysis reliably distinguishes between the causes of dyspnoea and to evaluate its potential as a clinical triage tool in comparison with the conventional heart failure diagnostic process using natriuretic peptides.
Methods
A retrospective analysis was conducted using an AI-based ECG algorithm on patients ≥18 years old presenting with dyspnoea at the ED from February 2006 to September 2023. Patients were categorised into cardiac or pulmonary origin groups based on initial admission. The performance of AI-ECG, which uses a transformer neural network to analyse standard 12-lead ECGs, was assessed in terms of accuracy, sensitivity, specificity and area under the receiver operating characteristic curve (AUC). Additionally, we compared the diagnostic performance of AI-ECG models with that of N-terminal probrain natriuretic peptide (NT-proBNP) levels in identifying cardiac origin.
Results
Among the 3105 patients included in the study, 1197 had cardiac-origin dyspnoea. The AI-ECG model demonstrated an AUC of 0.938 and 88.1% accuracy for cardiac-origin dyspnoea. The sensitivity, specificity and positive and negative predictive values were 93.0%, 79.5%, 89.0% and 86.4%, respectively. The F1 score was 0.828. AI-ECG demonstrated superior diagnostic performance in identifying cardiac-origin dyspnoea compared with NT-proBNP. In a sensitivity analysis of 129 patients initially diagnosed with pulmonary-origin dyspnoea for whom AI-ECG predicted a high probability of cardiac origin, true cardiac origin was confirmed in 96.
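For readers less familiar with the performance measures reported above, the following sketch shows how they are conventionally computed from a two-by-two confusion matrix and from predicted scores. The counts and scores are hypothetical and do not reproduce the study data; the function names are illustrative, and the rank-based AUC is the standard Mann-Whitney formulation rather than necessarily the procedure used in the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    ppv = tp / (tp + fp)                    # positive predictive value (precision)
    npv = tn / (tn + fn)                    # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }

def auc_from_scores(labels, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Toy scores: three of the four positive-negative pairs are ranked correctly.
toy_auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(f"auc: {toy_auc:.2f}")  # 0.75
```

Note that the F1 score is the harmonic mean of the positive predictive value and sensitivity for whichever class is treated as positive, so its value depends on that choice.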
Conclusions
AI-ECG demonstrated superior diagnostic accuracy over NT-proBNP and showed promise as a clinical triage tool. It is a potentially valuable tool for identifying the origin of dyspnoea in emergency settings and supporting decision-making.

1 Division of Cardiology, Department of Internal Medicine, Inha University College of Medicine, Incheon, South Korea
2 Department of Electrical and Computer Engineering, Inha University, Incheon, South Korea
3 DeepCardio Inc, Incheon, South Korea; Department of Computer Engineering, Inha University, Incheon, South Korea
4 Division of Cardiology, Department of Internal Medicine, Inha University College of Medicine, Incheon, South Korea; DeepCardio Inc, Incheon, South Korea
5 DeepCardio Inc, Incheon, South Korea; Department of Information and Communication Engineering, Inha University, Incheon, South Korea