Correspondence to L. Malin Overmars; [email protected]

WHAT IS ALREADY KNOWN ON THIS TOPIC

Excluding coronary stenosis in low-to-intermediate-risk individuals presenting with related symptoms remains challenging and often leads to inefficient use of coronary imaging modalities such as cardiac CT. Machine learning algorithms using electronic health record (EHR) data have potential in reducing unnecessary imaging, but their generalisability across different healthcare settings has not been well validated.

WHAT THIS STUDY ADDS

This study externally validates sex-stratified machine learning algorithms for excluding coronary stenosis using EHR data. The algorithms demonstrated high negative predictive values when tested on data from the Cardiology Centers of the Netherlands, confirming their reliability and applicability in different healthcare settings.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

The validated algorithms may support clinical decision-making by assisting in the initial assessment of patients with symptoms suggestive of coronary stenosis. However, their clinical utility requires further validation and refinement. This study highlights the potential of EHR-based machine learning tools in non-invasive diagnostics, promoting personalised and data-driven cardiac care.

Introduction

Excluding coronary stenosis in low-to-intermediate-risk individuals with suggestive symptoms remains a clinical challenge, often leading to inefficiencies in healthcare workflows.^1–4 Heavy reliance on coronary imaging modalities, such as cardiac CT, results in frequent and often inefficient use of imaging. Recently, we trained sex-stratified algorithms based on electronic health record (EHR) data from an academic hospital that may offer non-invasive, efficient alternatives for excluding coronary stenosis in men and women.^{5 6} The target population includes individuals presenting with symptoms suggestive of coronary stenosis. Using clinical and demographic data, raw electrocardiographic data and haematological characteristics, we showed that the algorithms could identify patients unlikely to have coronary stenosis with high negative predictive values (NPVs), though specificity remained limited.⁵ Given the inherent limitations of single-site development and the heterogeneity of populations, robust external validation is critical to determine the broader applicability and reliability of such algorithms.⁷ The algorithms' performance may differ across primary, secondary and academic healthcare settings due to variations in population characteristics and clinical practices.^8–10 Therefore, to determine the generalisability of the developed algorithms to a broader population and setting, we retrained algorithms on EHR data from the academic University Medical Center Utrecht (UMCU) and tested the algorithms externally on EHR data from the Cardiology Centers of the Netherlands (CCN), consisting of patients referred by a general practitioner to a cardiologist for diagnostic evaluation throughout the Netherlands.¹¹ The intended purpose of the model is to support healthcare professionals in non-invasively ruling out coronary stenosis before imaging is performed.
The study considers potential health inequalities by sex, aiming to validate the model in a sex-stratified manner to ensure fairness and accuracy across demographic groups.

Methods

Training cohort: academic hospital

In this study, the training cohort consisted of patients who underwent coronary CT at the UMCU, an academic hospital in the Netherlands, between 1997 and 2020. EHR data on these individuals were extracted from the Utrecht Patient Oriented Database (UPOD).^{6 12} To identify individuals who had undergone cardiac CT, we used procedure codes 385143 and 385143C, which are used within the Dutch healthcare system for the declaration of healthcare costs. These codes are unique to the Netherlands and may not be directly applicable in other countries. Individuals under the age of 18 years were excluded from the study. The data collected included medication prescriptions (Anatomical Therapeutic Chemical (ATC) codes) with prescription dates, laboratory measurements, clinical measurements and clinical correspondence, such as radiology reports and cardiology letters. All data were pseudonymised before use in the study, and current privacy and ethical regulations were followed. The current study was conducted in compliance with the Declaration of Helsinki.

Testing cohort: cardiology clinics

The external testing cohort consisted of patients who visited 1 of the 13 CCN clinics between 2007 and 2018. These are ‘one-stop-shop’ diagnostic clinics for patients suspected of cardiac disease by their general practitioner.¹¹ Patients who underwent coronary CT or coronary angiography were included. Patients who were referred to the UMCU were excluded to avoid inclusion of the same individuals in the training and external testing cohort. The testing cohort was made available under implied consent and transferred to the UMCU under the Dutch Personal Data Protection Act.¹¹ The unique patient IDs could not be traced back to the individual without accessing the original database, which was not available to UMCU researchers. Unstructured text fields were anonymised using a text mining tool before being included in the final database.¹³

Preprocessing and feature engineering

Predictors were chosen based on clinical relevance and availability in EHR (online supplemental table S3). Measurements were standardised across the dataset. To ensure consistency across the dataset, we implemented a rigorous data preprocessing pipeline informed by our previous research on the reuse of routine laboratory data.¹⁴ Given the long data collection period (1997–2020), we conducted extensive trend analyses and visualisation techniques to detect systematic shifts in measurement distributions over time. Laboratory measurements that had undergone unit changes, such as glucose and cholesterol levels, were converted to a consistent standard to ensure comparability. Similarly, earlier diagnostic coding systems were mapped to International Classification of Diseases (ICD)-10 codes to maintain uniformity in comorbidity classification. Also, we recalculated the glomerular filtration rate as a measure of renal functioning using the 2009 Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) formula.¹⁵ Cardiovascular history and cardiovascular risk factors which were used as predictor variables in the models were extracted from structured fields within the EHRs. These fields comprise Diagnosis Treatment Combinations codes—used for reimbursement purposes in the Netherlands—ICD-10 codes, and other diagnosis and procedure entries recorded by medical specialists, using terms compiled with clinical domain experts (online supplemental table S1 and S2).
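The recalculation of renal function described above can be illustrated with a minimal sketch of the published 2009 CKD-EPI creatinine equation. The function name, the assumption that serum creatinine is stored in µmol/L, and the optional race coefficient (switched off by default) are choices made for this illustration; the text does not state how the study handled them.

```python
def egfr_ckd_epi_2009(creatinine_umol_l, age_years, female, black=False):
    """Estimate GFR (mL/min/1.73 m^2) with the 2009 CKD-EPI creatinine equation.

    Assumption for this sketch: creatinine is supplied in umol/L and is
    converted to mg/dL (divide by 88.4) before the equation is applied.
    """
    scr = creatinine_umol_l / 88.4          # umol/L -> mg/dL
    kappa = 0.7 if female else 0.9          # sex-specific creatinine knot
    alpha = -0.329 if female else -0.411    # sex-specific exponent below the knot
    ratio = scr / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:                               # optional coefficient of the 2009 equation
        egfr *= 1.159
    return egfr


# Example: a 55-year-old man with a serum creatinine of 80 umol/L.
example_egfr = egfr_ckd_epi_2009(80.0, 55, female=False)
```

Recomputing the estimate from creatinine, age and sex in both cohorts, rather than reusing stored laboratory values, keeps the predictor defined identically across sites and over time.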

Additionally, the training and testing cohorts had to share exactly the same variables, because a model can only be tested on the predictors it was trained on. We observed that raw haematological characteristics and ECG data within the training cohort were unavailable in the testing cohort. Therefore, we created matched subsets of the training and testing cohorts based on the overlapping variables (online supplemental table S3). Given the missing-not-at-random nature of EHR data, missing values were not imputed. Instead, the XGBoost algorithm’s ability to handle missingness was relied on, acknowledging that this approach may limit model generalisability.^{14 16–19}

Outcome definition

Outcomes were extracted from radiology reports using a natural language processing approach based on regular expressions to identify affirmative and negative mentions of coronary stenosis. Examples of conclusions from the radiology reports and the label attached to the conclusions by the text mining procedure are given in online supplemental table S4. When an Agatston/calcium score was mentioned and it was 0, we labelled it no coronary stenosis. When a stenosis score was mentioned and it was greater than 49 (%), we labelled it coronary stenosis. Furthermore, the text mining pipeline mainly focused on negations,²⁰ such as ‘no coronary artery disease’, ‘no abnormalities’ and ‘clean coronaries’. When in doubt, the pipeline labelled it as coronary stenosis. We used a conservative approach and aimed to label only the clear denials as no coronary stenosis.
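The labelling rules described above can be sketched as follows. The regular expressions are hypothetical English stand-ins (the study's actual expressions are not given in the text and the reports are presumably in Dutch), and the order in which the rules are applied is an assumption of this sketch. The default return value implements the conservative rule that doubtful reports are labelled as coronary stenosis.

```python
import re

# Hypothetical patterns for illustration only; not the study's expressions.
STENOSIS_PCT = re.compile(r"stenosis\D{0,20}?(\d{1,3})\s*%", re.IGNORECASE)
CALCIUM_SCORE = re.compile(r"(?:agatston|calcium)\s+score\D{0,10}?(\d+)", re.IGNORECASE)
CLEAR_DENIALS = re.compile(
    r"no coronary artery disease|no abnormalities|clean coronaries",
    re.IGNORECASE,
)


def label_report(conclusion):
    """Label a report conclusion as 'coronary stenosis' or 'no coronary stenosis'."""
    match = STENOSIS_PCT.search(conclusion)
    if match and int(match.group(1)) > 49:        # stenosis score greater than 49%
        return "coronary stenosis"
    match = CALCIUM_SCORE.search(conclusion)
    if match and int(match.group(1)) == 0:        # Agatston/calcium score of 0
        return "no coronary stenosis"
    if CLEAR_DENIALS.search(conclusion):          # clear negation phrases
        return "no coronary stenosis"
    return "coronary stenosis"                    # when in doubt, label as stenosis


examples = [
    "Agatston score 0. No abnormalities.",
    "Stenosis of 70% in the LAD.",
    "Possible plaque, study limited by motion artefacts.",
]
labels = [label_report(text) for text in examples]
```

Because only explicit denials yield the label no coronary stenosis, errors of the text mining step are pushed towards false positives rather than missed disease.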

Modelling methods

Predictors were standardised and included in the XGBoost algorithm without transformation. The model was internally validated using 10-fold cross-validation and externally validated on a separate cohort (online supplemental methods S1). The absence of coronary stenosis was predicted using XGBoost, a gradient boosting algorithm using built-in cross-validation, Least Absolute Shrinkage and Selection Operator (LASSO) and ridge regularisation to prevent overfitting.²¹ XGBoost can deal with sparse data and missing values by including the missing values in constructing decision trees, thus giving value to the missingness and allowing algorithms to be trained on EHR data. Model hyperparameter optimisation was omitted to avoid the risk of overfitting. The models were sex-stratified to ensure fairness and accuracy across different sexes.
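To illustrate how missing values can be included in tree construction, the sketch below evaluates a single candidate split in the manner of XGBoost's sparsity-aware split finding: rows with a missing predictor are tried in the left and in the right branch, and the direction with the larger regularised gain becomes the default. This is a simplified pure-Python illustration, not the xgboost implementation; the function names, the toy data, the starting probability of 0.5 and the ridge penalty of 1.0 are assumptions.

```python
def _score(gradient_sum, hessian_sum, ridge):
    """Structure score of one leaf under a ridge (L2) penalty."""
    return gradient_sum ** 2 / (hessian_sum + ridge)


def default_direction(values, labels, threshold, ridge=1.0, base_prob=0.5):
    """Return the branch ('left' or 'right') for missing values and both gains."""
    sums = {"left": [0.0, 0.0], "right": [0.0, 0.0], "missing": [0.0, 0.0]}
    for x, y in zip(values, labels):
        gradient = base_prob - y                  # first derivative of the logistic loss
        hessian = base_prob * (1.0 - base_prob)   # second derivative of the logistic loss
        key = "missing" if x is None else ("left" if x < threshold else "right")
        sums[key][0] += gradient
        sums[key][1] += hessian
    g_all = sum(s[0] for s in sums.values())
    h_all = sum(s[1] for s in sums.values())
    gains = {}
    for side, other in (("left", "right"), ("right", "left")):
        g_side = sums[side][0] + sums["missing"][0]   # missing rows join this branch
        h_side = sums[side][1] + sums["missing"][1]
        gains[side] = 0.5 * (_score(g_side, h_side, ridge)
                             + _score(sums[other][0], sums[other][1], ridge)
                             - _score(g_all, h_all, ridge))
    return max(gains, key=gains.get), gains


# Toy data: a predictor with two unmeasured values (None); 1 = coronary stenosis.
toy_values = [1.0, 2.0, None, 8.0, 9.0, None]
toy_labels = [0, 0, 1, 1, 1, 1]
direction, gains = default_direction(toy_values, toy_labels, threshold=5.0)
```

In the toy data, the patients with a missing value resemble those above the threshold, so sending them to the right branch gives the larger gain. This is the sense in which missingness itself carries information for the model.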

Performance metrics and model explainability

Performance metrics included NPVs, sensitivities, specificities and calibration plots. The model output was the predicted probability of coronary stenosis. A threshold of 0.05 was chosen to prioritise sensitivity, ensuring high NPV.
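A minimal sketch of how the 0.05 threshold turns predicted probabilities into a confusion matrix is given below. Treating a probability equal to the threshold as a positive prediction, as well as the toy probabilities and outcomes, are assumptions made for illustration.

```python
def confusion_counts(probabilities, outcomes, threshold=0.05):
    """Count TP, FP, FN and TN when probability >= threshold predicts stenosis."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for probability, has_stenosis in zip(probabilities, outcomes):
        predicted_stenosis = probability >= threshold
        if predicted_stenosis and has_stenosis:
            counts["TP"] += 1
        elif predicted_stenosis and not has_stenosis:
            counts["FP"] += 1
        elif has_stenosis:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Toy example (not study data): predicted probability and observed outcome.
toy_probabilities = [0.02, 0.04, 0.06, 0.30, 0.80, 0.03]
toy_outcomes = [False, False, False, True, True, True]
toy_counts = confusion_counts(toy_probabilities, toy_outcomes)
```

With such a low threshold, only patients with a very small predicted probability are classified as having no coronary stenosis, which favours sensitivity and NPV at the expense of specificity.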

Additionally, the actual/empirical probability was plotted against the estimated/predicted probability using calibration curves, where values under the curve suggest risk overestimation, whereas values above the curve suggest risk underestimation.^{7 22}
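The points of such a calibration curve can be obtained by grouping patients on predicted probability and comparing the mean prediction in each group with the observed proportion of coronary stenosis. The sketch below uses equal-width bins, which is an assumption; the binning used by the predtools package in the study may differ.

```python
def calibration_bins(probabilities, outcomes, n_bins=10):
    """Return (mean predicted, observed proportion, n) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for probability, outcome in zip(probabilities, outcomes):
        index = min(int(probability * n_bins), n_bins - 1)  # 1.0 falls in the last bin
        bins[index].append((probability, outcome))
    rows = []
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((mean_predicted, observed, len(members)))
    return rows


# Toy example (not study data): 1 = coronary stenosis, 0 = no coronary stenosis.
toy_rows = calibration_bins([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 0], n_bins=2)
```

A bin whose observed proportion lies below its mean predicted probability indicates overestimated risk; the reverse indicates underestimated risk.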

To determine the extent to which the ‘true negatives’ in the training cohort and the testing cohort matched, baseline characteristics were compared. Additionally, for model explainability, SHapley Additive exPlanations (SHAP) values were calculated and plotted. For every individual, each SHAP value represents the impact that a variable has in the prediction.
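The idea behind these values can be shown with an exact, brute-force Shapley computation on a toy model. This is an illustration only: the study's values would come from a tree-specific SHAP implementation applied to the XGBoost models, whereas the linear toy model, its coefficients and the baseline patient below are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take the baseline value."""
    names = list(instance)
    n = len(names)

    def value(present):
        inputs = {k: (instance[k] if k in present else baseline[k]) for k in names}
        return model(inputs)

    result = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        result[name] = total
    return result


def toy_risk(x):
    """Hypothetical risk score, not the study model."""
    return 0.01 * x["age"] + 0.002 * x["sbp"]


patient = {"age": 65, "sbp": 160}
reference_patient = {"age": 55, "sbp": 140}
contributions = shapley_values(toy_risk, patient, reference_patient)
```

The contributions sum to the difference between the prediction for the patient and the prediction for the reference patient, which is what allows each value to be read as the share of an individual prediction attributable to one variable.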

Continuous variables are presented as means and SD or medians and IQR and were compared using the unpaired Mann-Whitney U test (non-normally distributed) or Student’s t-test (normally distributed). Categorical variables are displayed as frequencies and percentages and compared with χ² tests. A two-sided p value of <0.05 was considered significant.
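As an illustration of the categorical comparison, the sketch below computes a Pearson χ² test for a 2×2 table using the arrhythmia counts for men from table 1. It omits the continuity correction, which is an assumption of this sketch; R's chisq.test applies one to 2×2 tables by default, so the statistic reported by the study software may differ slightly.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    column_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * column_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(statistic / 2.0))  # upper tail of chi-square with 1 df
    return statistic, p_value


# Arrhythmia in men (table 1): training 1230 of 9298, testing 949 of 4762.
statistic, p_value = chi_square_2x2(1230, 9298 - 1230, 949, 4762 - 949)
significant = p_value < 0.05
```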

All analyses were performed in R (V.4.4.1) and RStudio (V.2024.12.1). We used dplyr (V.1.1.4),²³ xgboost (V.1.7.8.1),²⁴ ggplot2,²⁵ predtools (V.0.0.3),²⁶ tidyverse,²⁷ ggstatsplot²⁸ and caret.²⁹

Results

Baseline characteristics

The training cohort comprised 9298 men (median age 55 years (IQR 48, 63), 13% history of arrhythmia, 14% history of heart failure, 22% history of cardiovascular intervention) and 5376 women (median age 59 years (IQR 49, 67), 10% history of arrhythmia, 12% history of heart failure, 15% history of cardiovascular intervention) (table 1 and online supplemental tables S1 and S2). Of the 9298 men within the training cohort, 73% did not have coronary stenosis (figure 1). Of the 5376 women within the training cohort, 83% did not have coronary stenosis (figure 1).

Table 1

Baseline characteristics

	Men (n=14 060)		Women (n=9866)
Characteristic	Training, n=9298	Testing, n=4762	Training, n=5376	Testing, n=4490
Age (years)	55 (48, 63)	60 (52, 67)	59 (49, 67)	60 (52, 68)
Systolic blood pressure, mm Hg	135 (124, 150)	145 (130, 160)	134 (120, 151)	140 (125, 155)
Diastolic blood pressure, mm Hg	81 (74, 89)	85 (80, 95)	79 (72, 87)	85 (75, 90)
Body mass index, kg/m²	26.3 (24.2, 29.0)	26.3 (24.5, 29.1)	25.4 (22.5, 29.1)	26.0 (23.1, 29.4)
Glomerular filtration rate (CKD-EPI), mL/min/1.73 m²	83 (60, 90)	94 (82, 103)	80 (60, 90)	92 (79, 102)
Total cholesterol, mmol/L	5.00 (4.10, 5.80)	5.10 (4.30, 5.80)	5.10 (4.40, 6.00)	5.30 (4.60, 6.10)
High-density lipoprotein cholesterol, mmol/L	1.16 (1.00, 1.38)	1.20 (1.00, 1.60)	1.45 (1.23, 1.73)	1.60 (1.30, 2.00)
Low-density lipoprotein cholesterol, mmol/L	3.00 (2.30, 3.70)	3.00 (2.20, 3.70)	3.00 (2.30, 3.70)	3.10 (2.30, 3.80)
Arrhythmia (n, %)	1230 (13%)	949 (20%)	522 (10%)	771 (17%)
Valvular disease (n, %)	428 (5%)	634 (13%)	272 (5%)	634 (14%)
Cardiac intervention (n, %)	1420 (15%)	1022 (21%)	621 (14%)	815 (18%)
Heart failure (n, %)	1341 (14%)	1163 (24%)	589 (12%)	925 (21%)
Coronary artery disease (n, %)	2954 (32%)	2551 (54%)	1249 (23%)	1912 (43%)
Cardiovascular intervention (n, %)	2019 (22%)	484 (10%)	823 (15%)	186 (4%)
Cerebrovascular disease (n, %)	529 (5%)	202 (4%)	308 (6%)	180 (4%)
Congenital cardiac disease (n, %)	90 (1%)	17 (0.4%)	72 (1%)	26 (0.5%)

CKD-EPI, Chronic Kidney Disease Epidemiology Collaboration.

Figure 1. Distribution of coronary stenosis outcomes in train/validation and test cohorts (CCN). This bar plot illustrates the distribution of coronary stenosis outcomes in the train/validation (UPOD) and test cohorts (CCN). The plot displays the proportions of patients with coronary stenosis and without coronary stenosis in each cohort. The train/validation cohort represents the initial dataset used for algorithm development and evaluation, while the test cohort represents an independent dataset used for external validation. The bar plot provides an overview of the coronary stenosis prevalence in each cohort, highlighting the relative proportions of coronary stenosis cases. CCN, Cardiology Centers of the Netherlands; UPOD, Utrecht Patient Oriented Database; CAD, Coronary Artery Disease.

The testing cohort comprised 4762 men (median age 60 years (IQR 52, 67), 20% history of arrhythmia, 24% history of heart failure, 10% history of cardiovascular intervention) and 4490 women (median age 60 years (IQR 52, 68), 17% history of arrhythmia, 21% history of heart failure, 4% history of cardiovascular intervention) (table 1). Of the 4762 men within the testing cohort, 60% did not have coronary stenosis (figure 1). Of the 4490 women within the testing cohort, 83% did not have coronary stenosis (figure 1).

Model explainability

The most predictive variable for both men and women was age: from an age of about 55 years in men and the early 60s in women, the Shapley value was higher than 0 (figure 2). This implies an association between higher age and higher risk of coronary stenosis. For women and men, a systolic blood pressure (SBP) lower than 150 mm Hg corresponded to a Shapley value <0, implying that lower SBP is associated with reduced risk of coronary stenosis. For men, haemoglobin levels around 0.08 g/L and total cholesterol levels below 4 mmol/L corresponded to Shapley values <0, implying an association with reduced risk of coronary stenosis. For women, haemoglobin levels higher than 0.08 g/L and cholesterol levels below 7 mmol/L corresponded to Shapley values <0, implying an association with reduced risk of coronary stenosis. Other relationships between predictive variables and Shapley values can be derived from figure 2. Of note, these relationships do not imply causality.

Figure 2. Shapley plots for male and female models. These Shapley plots visualise the Shapley values for the male (A) and female (B) models. Shapley values represent the contribution of each input feature to the output prediction of the model. In the plots, the features are listed on the x-axis, and their corresponding Shapley values are represented by the y-axis. Positive values indicate that the feature value positively contributes to the prediction, while negative values indicate a negative contribution. The Shapley plots provide insights into the importance and influence of each input feature on the model’s decision-making process for both the male and female models. They help to identify the key factors driving the predictions and highlight the relative contributions of each feature in determining the probability of absence of coronary artery disease for individuals. SHAP, SHapley Additive exPlanations; CKD-EPI, Chronic Kidney Disease Epidemiology Collaboration

Model performance: training and testing cohorts

Based on the results of the 10-fold cross-validation, the algorithms started overfitting after an area under the curve (AUC) of 0.72. The maximum obtained AUC for the internal out-of-fold sets was 0.72 (online supplemental figure S1). For the male algorithm, the NPV was 0.95, the positive predictive value (PPV) was 0.30, sensitivity was 0.98 and specificity was 0.14 (table 2). For the female algorithm, the NPV was 0.93, PPV was 0.20, sensitivity was 0.91 and specificity was 0.26 (table 2).

Table 2

Model performance of the sex-stratified algorithms to exclude CAD

A. Training—men
Observed
Predicted		Coronary stenosis	No coronary stenosis
	Coronary stenosis	2480 (TP)	5836 (FP)	PPV=0.30
	No coronary stenosis	48 (FN)	934 (TN)	NPV=0.95
		Sensitivity=0.98	Specificity=0.14
B. Testing—men
Observed
Predicted		Coronary stenosis	No coronary stenosis
	Coronary stenosis	1860 (TP)	2689 (FP)	PPV=0.41
	No coronary stenosis	23 (FN)	190 (TN)	NPV=0.89
		Sensitivity=0.99	Specificity=0.07
C. Training—women
Observed
Predicted		Coronary stenosis	No coronary stenosis
	Coronary stenosis	837 (TP)	3306 (FP)	PPV=0.20
	No coronary stenosis	83 (FN)	1150 (TN)	NPV=0.93
		Sensitivity=0.91	Specificity=0.26
D. Testing—women
Observed
Predicted		Coronary stenosis	No coronary stenosis
	Coronary stenosis	682 (TP)	3037 (FP)	PPV=0.18
	No coronary stenosis	97 (FN)	674 (TN)	NPV=0.87
		Sensitivity=0.88	Specificity=0.18

These tables present the confusion matrices for the sex-stratified algorithms for exclusion of coronary stenosis. The confusion matrices show the distribution of true positive (TP), true negative (TN), false positive (FP) and false negative (FN) results for both the training cohort (A and C) and the external testing cohort (B and D). The performance metrics, including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy, are calculated based on these values.
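The arithmetic linking the counts in table 2 to the reported metrics can be sketched as follows. The counts are copied from table 2; the function and variable names are chosen for this illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # stenosis cases that are flagged
        "specificity": tn / (tn + fp),   # stenosis-free patients that are cleared
        "ppv": tp / (tp + fp),           # flagged patients who have stenosis
        "npv": tn / (tn + fn),           # cleared patients who are stenosis-free
    }


# Counts from table 2 as (TP, FP, FN, TN).
table_2 = {
    "training_men": (2480, 5836, 48, 934),
    "testing_men": (1860, 2689, 23, 190),
    "training_women": (837, 3306, 83, 1150),
    "testing_women": (682, 3037, 97, 674),
}

rounded_metrics = {
    cohort: {name: round(value, 2)
             for name, value in diagnostic_metrics(*counts).items()}
    for cohort, counts in table_2.items()
}
```

Rounded to two decimals, these values agree with the sensitivity, specificity, PPV and NPV printed in table 2.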

In the testing cohort, the results were as follows: for the male algorithm, the NPV was 0.89, PPV was 0.41, sensitivity was 0.99 and specificity was 0.07 (table 2). For the female algorithm, the NPV was 0.87, PPV was 0.18, sensitivity was 0.88 and specificity was 0.18 (table 2). The observed difference in NPV between the training and testing cohorts was 0.06 for both the male and female models, indicating a modest decline in performance across datasets. The algorithms showed reasonable calibration overall, although they underestimated risk in the highest-risk categories in the testing cohort, indicating potential areas for improvement (figure 3).

Figure 3. Calibration plots of the training (A - men, B - women) and testing cohorts (C - men, D - women). This calibration plot illustrates the calibration performance of the models in both the train/validation (UPOD) and test cohorts (CCN). The plot visually compares the predicted probabilities of coronary stenosis against the observed proportions of coronary stenosis cases. The x-axis represents the predicted probabilities, while the y-axis represents the observed proportions. The diagonal reference line indicates perfect calibration, where the predicted probabilities match the observed proportions. Deviations from the reference line indicate miscalibration. The plot provides insights into how well the models’ predicted probabilities align with the actual coronary stenosis outcomes in both the train/validation and test cohorts, assessing the reliability and accuracy of the models’ predictions. CCN, Cardiology Centers of the Netherlands; UPOD, Utrecht Patient Oriented Database.

Characteristics of true negatives and true positives

For male patients predicted to be at low risk by the algorithm and without coronary stenosis (true negatives; training n=904, testing n=190), the training cohort had a mean age of 32 years (SD 11) and the testing cohort had a mean age of 41 years (SD 11). In the training cohort, the mean SBP was 128 mm Hg (SD 17) and the mean diastolic blood pressure (DBP) was 78 mm Hg (SD 10); in the testing cohort, the mean SBP was 139 mm Hg (SD 17) and the mean DBP was 82 mm Hg (SD 11). Low-density lipoprotein (LDL) cholesterol levels were below 3.0 mmol/L in both cohorts (online supplemental table S5).

For female patients predicted to be at low risk by the algorithm and without coronary stenosis (true negatives; training n=1161, testing n=674), the training cohort had a mean age of 44 years (SD 13) and the testing cohort had a mean age of 53 years (SD 11). In the training cohort, the mean SBP was 129 mm Hg (SD 20) and the mean DBP was 78 mm Hg (SD 12). In the testing cohort, the mean SBP was 134 mm Hg (SD 21) and the mean DBP was 82 mm Hg (SD 11). LDL cholesterol levels were below 3.0 mmol/L in both cohorts (online supplemental table S5).

Discussion

The findings of this study offer insights into the performance and generalisability of the sex-stratified algorithms in excluding coronary stenosis prior to coronary imaging. The models displayed promising results in both the training cohort (UPOD, NPV men 0.95, NPV women 0.93) and the external testing cohort (CCN, NPV men 0.89, NPV women 0.87), indicating their ability to accurately identify a proportion of patients without coronary stenosis. Notably, the female algorithm exhibited higher specificity (UPOD 0.26, CCN 0.18) compared with the male algorithm (UPOD 0.14, CCN 0.07), suggesting a better ability to correctly identify women without coronary stenosis. The results suggest that sex-stratified XGBoost algorithms can assist in excluding coronary stenosis across different clinical settings, though their utility may be limited by low specificity.

Current clinical decision-making regarding the need for coronary imaging is often guided by Bayesian-based risk stratification, which incorporates prior probabilities based on clinical parameters such as age, sex, risk factors and symptom presentation. While this approach is widely used, our findings suggest that a substantial proportion of patients (70–80% in our study) who undergo imaging ultimately do not have coronary stenosis. This highlights the potential for additional data-driven tools to refine patient selection for imaging. Our machine learning algorithm, which leverages EHR data, differs from traditional Bayesian methods by dynamically learning from large-scale real-world data rather than relying on predefined risk categories. By integrating a broader range of clinical parameters and optimising their weighting, our model could complement existing stratification methods by helping to identify patients with a very low probability of coronary stenosis at the time of referral, potentially reducing unnecessary imaging while maintaining diagnostic safety. Further research is needed to explore how such algorithms can be incorporated into clinical workflows alongside existing decision-making strategies.

To the best of our knowledge, this is the first study that created EHR-based algorithms to specifically exclude coronary stenosis in patients suspected of coronary stenosis and validated it externally. In comparison, recent cardiovascular risk prediction models based on EHRs often report different performance metrics tailored towards future disease prediction rather than immediate exclusion of disease. For instance, Suo et al used a Bayesian network-based model for predicting coronary artery disease, achieving an AUC of 0.84 in their validation cohort, reflecting good overall discrimination.³⁰ Du et al, also applying XGBoost along with random forest algorithms to a hypertensive patient population, reported notably higher discrimination performance, with an AUC of 0.943, surpassing simpler statistical methods such as logistic regression (AUC 0.865).³¹ Similarly, Huang et al employed a deep learning-based stacked denoising autoencoder model for acute coronary syndrome prediction, achieving an AUC of 0.868 and accuracy of 0.73.³² For most studies, external validation is lacking. Direct comparisons regarding specificity are challenging due to differences in reported performance metrics across studies; hence, the utility of our models should primarily be assessed in the context of their intended clinical use, specifically the reliable exclusion of immediate disease rather than prediction of future risk.

Several important limitations warrant caution in interpreting these findings. First, the outcome was defined based on cardiac CT reports extracted via text mining rather than through direct human verification, and a conservative endpoint (absence of any plaque) was used. This approach, while safe, deviates from the gold standard for excluding obstructive coronary artery disease. Second, the long data retrieval period (1997–2020) may introduce heterogeneity due to technological advances in imaging, which may affect the generalisability of the findings despite stratified analyses indicating stable performance over time. Third, the exclusion of ECG and haematology data from the external validation—driven by data availability across cohorts—necessitated the use of standard clinical variables only. While this may reduce the potential of the machine learning approach to identify novel, non‐intuitive predictors, it also improves the model’s transportability and generalisability. In real‐world applications, reliance on readily available clinical data ensures that the algorithm can be widely implemented without the need for additional specialised tests.

The study’s strengths include external validation across diverse healthcare settings, compatibility with established diagnostic processes and model explainability through Shapley values, which highlighted the clinical relevance of predictors such as age, body mass index, SBP, renal function and lipid profiles. The use of sex-stratified models further underscores the importance of capturing demographic nuances in coronary disease risk. However, before these algorithms can be integrated into clinical practice as a reliable, standalone decision support tool, further refinement is required, particularly to improve specificity. Prospective clinical validation in more diverse populations is also essential to determine whether unnecessary CT scans can be reduced without increasing cardiovascular events.

While this study validates the performance of a machine learning algorithm for predicting the absence of coronary stenosis, its real-world impact on clinical decision-making and patient outcomes remains to be determined. A crucial next step is a prospective evaluation assessing how the algorithm influences clinical management, including whether its implementation leads to changes in imaging referrals, reduces unnecessary testing or affects treatment decisions. A discordance analysis—comparing cases where the algorithm’s predictions differ from standard clinical assessments—could provide insight into its potential to optimise care. Additionally, studies assessing long-term patient outcomes following algorithm-guided decision-making would be valuable in determining its safety and effectiveness in routine practice. Future research should focus on integrating the algorithm into clinical workflows and assessing its role alongside existing risk stratification strategies to ensure both efficiency and patient benefit.

In conclusion, this study externally validates sex‐stratified machine learning algorithms using EHR data to non-invasively predict the absence of coronary stenosis, with high NPVs observed across settings. However, given the modest specificity and study limitations, these findings should be considered preliminary, warranting further refinement before clinical adoption.

Data availability statement

No data are available. Not applicable.

Ethics statements

Patient consent for publication

Not applicable.

Ethics approval

The ethics of our study was reviewed by the medical ethics committee NedMec (METC NedMec). METC NedMec is a recognised medical research ethics committee in the Netherlands, formed through a collaboration between UMC Utrecht, Princess Máxima Center for Pediatric Oncology and the Antoni van Leeuwenhoek institute. As our research involves retrospective analysis of EHRs and does not fall under the scope of the Dutch Medical Research Involving Human Subjects Act (WMO), the committee determined that formal ethical approval was not required (waived). Furthermore, in line with GDPR Article 14(5)(b), individual patient consent was not required due to the disproportionate effort involved in contacting all individuals. Data management specialists from an ISO 9001 certified database (Utrecht Patient Oriented Database) extracted, pseudonymised and securely stored the data.

Footnote

Contributors LMO and BvE performed the data analysis. LMO wrote the manuscript. MCHDG was responsible for data management. All authors reviewed and edited the manuscript and approved the final version of the manuscript. SH is guarantor.

Funding This project is part of the Dutch Cardiovascular Alliance Consortium IMPRESS+ (2020B004).

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

References

¹ Shaw LJ, Hachamovitch R, Berman DS, et al. The economic consequences of available diagnostic and prognostic strategies for the evaluation of stable angina patients: an observational assessment of the value of precatheterization ischemia. Economics of Noninvasive Diagnosis (END) Multicenter Study Group. J Am Coll Cardiol 1999; 33: 661–9. doi:10.1016/s0735-1097(98)00606-8

² Rumberger JA, Behrenbeck T, Breen JF, et al. Coronary calcification by electron beam computed tomography and obstructive coronary artery disease: a model for costs and effectiveness of diagnosis as compared with conventional cardiac testing methods. J Am Coll Cardiol 1999; 33: 453–62. doi:10.1016/s0735-1097(98)00583-x

³ Bertoldi EG, Stella SF, Rohde LEP, et al. Cost-effectiveness of anatomical and functional test strategies for stable chest pain: public health perspective from a middle-income country. BMJ Open 2017; 7: e012652. doi:10.1136/bmjopen-2016-012652

⁴ Bertoldi EG, Stella SF, Rohde LE, et al. Long-term Cost-Effectiveness of Diagnostic Tests for Assessing Stable Chest Pain: Modeled Analysis of Anatomical and Functional Strategies. Clin Cardiol 2016; 39: 249–56. doi:10.1002/clc.22532

⁵ Overmars LM, van Es B, Groepenhoff F, et al. Preventing unnecessary imaging in patients suspect of coronary artery disease through machine learning of electronic health records. Eur Heart J Digit Health 2022; 3: 11–9. doi:10.1093/ehjdh/ztab103

⁶ Overmars LM. Big data, small vessels (doctoral dissertation). University Medical Center Utrecht, 2024

⁷ van Smeden M, Heinze G, Van Calster B, et al. Critical appraisal of artificial intelligence-based prediction models for cardiovascular disease. Eur Heart J 2022; 43: 2921–30. doi:10.1093/eurheartj/ehac238

⁸ Knottnerus JA. Between iatrotropic stimulus and interiatric referral: the domain of primary care research. J Clin Epidemiol 2002; 55: 1201–6. doi:10.1016/s0895-4356(02)00528-0

⁹ Riley RD, Ensor J, Snell KIE, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ 2016; 353: i3140. doi:10.1136/bmj.i3140

¹⁰ Pennells L, Kaptoge S, White IR, et al. Assessing risk prediction models using individual participant data from multiple studies. Am J Epidemiol 2014; 179: 621–32. doi:10.1093/aje/kwt298

¹¹ Bots SH, Siegersma KR, Onland-Moret NC, et al. Routine clinical care data from thirteen cardiac outpatient clinics: design of the Cardiology Centers of the Netherlands (CCN) database. BMC Cardiovasc Disord 2021; 21: 287. doi:10.1186/s12872-021-02020-7

¹² ten Berg MJ, Huisman A, van den Bemt PMLA, et al. Linking laboratory and medication data: new opportunities for pharmacoepidemiological research. Clin Chem Lab Med 2007; 45: 13–9. doi:10.1515/CCLM.2007.009

¹³ Menger V, Scheepers F, van Wijk LM, et al. DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text. Telematics and Informatics 2018; 35: 727–36. doi:10.1016/j.tele.2017.08.002

¹⁴ Overmars LM, Niemantsverdriet MSA, Groenhof TKJ, et al. A Wolf in Sheep’s Clothing: Reuse of Routinely Obtained Laboratory Data in Research. J Med Internet Res 2022; 24: e40516. doi:10.2196/40516

¹⁵ Levey AS, Stevens LA, Schmid CH, et al. A new equation to estimate glomerular filtration rate. Ann Intern Med 2009; 150: 604–12. doi:10.7326/0003-4819-150-9-200905050-00006

¹⁶ Nijman S, Leeuwenberg AM, Beekers I, et al. Missing data is poorly handled and reported in prediction model studies using machine learning: a literature review. J Clin Epidemiol 2022; 142: 218–29. doi:10.1016/j.jclinepi.2021.11.023

¹⁷ Sterne JAC, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 2009; 338: b2393. doi:10.1136/bmj.b2393

¹⁸ Groenwold RHH. Informative missingness in electronic health record systems: the curse of knowing. Diagn Progn Res 2020; 4: 8. doi:10.1186/s41512-020-00077-0

¹⁹ Oostenbrink R, Moons KGM, Bleeker SE, et al. Diagnostic research on routine care data: prospects and problems. J Clin Epidemiol 2003; 56: 501–6. doi:10.1016/s0895-4356(03)00080-5

²⁰ van Es B, Reteig LC, Tan SC, et al. Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods. BMC Bioinformatics 2023; 24: 10. doi:10.1186/s12859-022-05130-x

²¹ Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD’16; 2016: 785–94. doi:10.1145/2939672.2939785

²² Van Calster B, McLernon DJ, van Smeden M, et al. Calibration: the Achilles heel of predictive analytics. BMC Med 2019; 17: 230. doi:10.1186/s12916-019-1466-7

²³ Wickham H, François R, Henry L, et al. Dplyr: a grammar of data manipulation. The R Foundation; 2024.

²⁴ Chen T, He T, Benesty M, et al. Xgboost: extreme gradient boosting. 2024.

²⁵ Wickham H. Ggplot2: elegant graphics for data analysis (use r!). 2nd edn. Cham: Springer, 2016.

²⁶ Sadatsafavi M, Safari A, Lee TY. Predtools: prediction model tools. 2023.

²⁷ Wickham H, Averick M, Bryan J, et al. Welcome to the Tidyverse. JOSS 2019; 4: 1686. doi:10.21105/joss.01686

²⁸ Patil I. Visualizations with statistical details: The 'ggstatsplot' approach. JOSS 2021; 6: 3167. doi:10.21105/joss.03167

²⁹ Kuhn M. Building Predictive Models in R Using the caret Package. J Stat Softw 2008; 28. doi:10.18637/jss.v028.i05

³⁰ Suo X, Huang X, Zhong L, et al. Development and Validation of a Bayesian Network-Based Model for Predicting Coronary Heart Disease Risk From Electronic Health Records. J Am Heart Assoc 2024; 13: e029400. doi:10.1161/JAHA.123.029400

³¹ Du Z, Yang Y, Zheng J, et al. Accurate Prediction of Coronary Heart Disease for Patients With Hypertension From Electronic Health Records With Big Data and Machine-Learning Methods: Model Development and Performance Evaluation. JMIR Med Inform 2020; 8: e17257. doi:10.2196/17257

³² Huang Z, Dong W, Duan H, et al. A Regularized Deep Learning Approach for Clinical Risk Prediction of Acute Coronary Syndrome Using Electronic Health Records. IEEE Trans Biomed Eng 2018; 65: 956–68. doi:10.1109/TBME.2017.2731158



© 2025 Author(s) (or their employer(s)) 2025. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ Group. http://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ . Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract


Background

Exclusion of coronary stenosis in individuals with suggestive symptoms is challenging. Cardiac CT or coronary angiography is often used but is inefficient and costly and involves risks. Sex-stratified algorithms based on electronic health records (EHRs) could be a non-invasive alternative for excluding coronary stenosis, yet their performance may vary across healthcare settings. Thus, external validation is crucial for determining their generalisability. This study aimed to externally validate sex-stratified machine learning algorithms based on EHR data to predict the absence of coronary stenosis, evaluated in diverse clinical settings.

Methods

Sex-stratified XGBoost algorithms were trained on EHR data from patients who underwent coronary imaging at the University Medical Center Utrecht (n=14 674) and externally tested on EHR data of 13 Cardiology centres in the Netherlands (n=9252). The outcome was defined as the absence of coronary stenosis, identified through text mining of radiology report conclusions, and predictive performance was assessed by negative predictive values (NPVs) and specificities.

Results

In the training cohort (9298 men (median age 55 years, 73% no coronary stenosis) and 5376 women (median age 59 years, 83% no coronary stenosis)), the algorithms showed NPVs and specificities of 0.95 and 0.14 in men and 0.93 and 0.26 in women, respectively. In the testing cohort (4762 men (median age 60 years, 60% no coronary stenosis) and 4490 women (median age 60 years, 83% no coronary stenosis)), the algorithms showed NPVs and specificities of 0.89 and 0.07 in men and 0.87 and 0.18 in women, respectively.
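To make the two reported metrics concrete, the following is a minimal sketch of how NPV and specificity follow from confusion-matrix counts. It assumes the conventional framing in which stenosis is the positive class, so that a negative prediction means the algorithm predicts absence of stenosis; this convention, the class and function names, and the example counts are all assumptions of the sketch and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts with stenosis as the positive class (an assumption of this sketch)."""
    tp: int  # stenosis present, algorithm predicts stenosis
    fp: int  # no stenosis, algorithm predicts stenosis
    tn: int  # no stenosis, algorithm predicts no stenosis
    fn: int  # stenosis present, algorithm predicts no stenosis


def negative_predictive_value(c: ConfusionCounts) -> float:
    """Share of 'no stenosis' predictions that are correct: TN / (TN + FN)."""
    predicted_negative = c.tn + c.fn
    if predicted_negative == 0:
        raise ValueError("no negative predictions; NPV is undefined")
    return c.tn / predicted_negative


def specificity(c: ConfusionCounts) -> float:
    """Share of stenosis-free patients predicted as 'no stenosis': TN / (TN + FP)."""
    actual_negative = c.tn + c.fp
    if actual_negative == 0:
        raise ValueError("no stenosis-free patients; specificity is undefined")
    return c.tn / actual_negative


# Hypothetical counts, NOT taken from the study.
example = ConfusionCounts(tp=390, fp=860, tn=140, fn=10)
print(f"NPV = {negative_predictive_value(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
```

With these made-up counts, NPV is 140/150 and specificity is 140/1000. This shows how a high NPV can coexist with a low specificity: the algorithm rules out stenosis in only a small share of stenosis-free patients, but it is rarely wrong when it does.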

Conclusions

This study externally validates sex‐stratified machine learning algorithms using EHR data to non-invasively predict the absence of coronary stenosis, with high NPVs observed across settings. However, given the modest specificity and study limitations, these findings should be considered preliminary, warranting further refinement before clinical adoption.

Details

Title

Optimising coronary imaging decisions with machine learning: an external validation study

Author

L Malin Overmars¹; Bram van Es¹; Groepenhoff, Floor²; Mark C H De Groot¹; Somsen, G Aernout³; Bots, Sophie Heleen⁴; Tulevski, I Igor⁵; Hofstra, Leonard⁶; den Ruijter, Hester M²; van Solinge, Wouter W¹; Hoefer, Imo¹; Haitjema, Saskia¹

¹ Central Diagnostic Laboratory, University Medical Centre Utrecht, Utrecht, The Netherlands
² Experimental Cardiology, University Medical Center Utrecht, Utrecht, The Netherlands
³ Cardiology, Cardiology Centers of the Netherlands, Utrecht, The Netherlands
⁴ Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht University Utrecht Institute for Pharmaceutical Sciences, Utrecht, The Netherlands
⁵ Cardiology, Cardiology Centers of the Netherlands, Amsterdam, The Netherlands
⁶ Cardiology, Cardiology Centers of the Netherlands, Amsterdam, The Netherlands; Department of Cardiology, Amsterdam UMC Locatie VUmc, Amsterdam, Noord-Holland, The Netherlands

First page

e003072

Section

Coronary artery disease

Publication year

2025

Publication date

2025

Publisher

BMJ Publishing Group LTD

ISSN

2398595X

e-ISSN

20533624

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1136/openhrt-2024-003072

ProQuest document ID

3199967936
