1. Introduction
One of the most significant contributions to cardiovascular epidemiology was the creation of the Framingham study [1], which aimed to detect heart disease at an early stage and to identify subtle manifestations (predisposing factors) in apparently healthy individuals. Since then, various risk stratification models have been proposed to assist physicians in decision-making [2]. These models use risk factors to produce a numerical value (score) that represents the probability of experiencing a cardiovascular event within a given time period. However, according to Cortés et al. [3] and Ridker et al. [4], a significant number of people at risk are not identified by these tools, while others receive unnecessary preventive treatment.
Most vascular risk tables based on quantitative methods are derived from the Framingham study [5]. These scores use a common set of risk factors, namely, age, sex, smoking status, arterial pressure, and lipid levels. Additionally, some scores have integrated more sophisticated markers for cardiovascular disease. However, the addition of new risk factors, while useful in reclassifying those with medium risk above or below a chosen intervention threshold, often has a small effect on the overall model performance measured by the area under the receiver operating characteristic (ROC) curve (AUC-ROC) [6].
Some of the limitations inherent in these models lie in their predominant focus on conventional risk factors, potentially underestimating the influence of emerging factors such as genetics, inflammation, obesity, or short stature. Moreover, the constant evolution of knowledge in cardiovascular health may diminish the accuracy of these models in capturing contemporary trends, thereby compromising the reliability of their predictions. Part of this discrepancy may be attributed to the methodological framework underpinning these risk prediction tools, which relies on traditional regression statistics. Examples include the Cox model [1] used in the Framingham risk score (AUC-ROC of 0.734), the American College of Cardiology/American Heart Association (ACC/AHA) model (AUC-ROC of 0.728) [7], the Reynolds risk score (AUC-ROC of 0.765) [8], the Prospective Cardiovascular Münster Study (PROCAM) model (AUC-ROC of 0.744) [9], and the Weibull model employed in the Systematic Coronary Risk Evaluation (SCORE) model (AUC-ROC of 0.63) [10], later adjusted with Fine and Gray competing-risks models for SCORE2 [11].
In aiming for the prediction of cardiovascular events, the efficacy of these techniques is constrained by a myriad of underlying assumptions. These include the requirement for linearity in the relationship between independent variables and the logarithm of odds, as well as assumptions of normality, homoscedasticity, and independence within the data. When the problem conforms to these statistical assumptions, the model typically demonstrates robust performance. However, when the interaction between predictor variables and outcomes contravenes these assumptions, the model’s ability to generalize predictions to novel cases diminishes significantly [12]. Given these constraints, novel cardiovascular risk models based on machine learning (ML) methodologies have emerged, offering alternative paradigms to traditional logistic or Cox regression models.
Furthermore, these models were developed from specific populations, which may limit their generalizability across different ethnic and geographic cohorts under different health systems. As a result, it is imperative for different countries to either develop their own customized models or conduct calibration studies [13]. In Colombia, several studies have been undertaken to validate:
a. The Framingham and PROCAM models, wherein it was found that despite calibration efforts by the study group, the former exhibited low discriminatory capacity (AUC-ROC of 0.5819), while PROCAM performed more favorably, particularly upon adjustment for sex (AUC-ROC: 0.7446) [14].
b. The ACC/AHA ASCVD score, which demonstrated no significant disparities between expected and observed events and achieved a good discriminatory capacity with an AUC-ROC of 0.782 (95% CI 0.71–0.85) [15].
Moreover, a model known as GLOBORISK-LAC has been developed incorporating a substantial proportion of local Colombian data, which attained a C-statistic of 72%, with calibration slopes of 0.994 for men and 0.852 for women [16]. However, it is crucial to note that this model has not yet undergone validation in other Colombian populations.
In light of the above considerations, there is a striking need to develop a novel strategy to facilitate the construction of a model tailored for predicting cardiovascular risk in the Colombian population. This study advocates the formulation of predictive cardiovascular risk models employing advanced artificial intelligence methodologies, including neural networks, decision trees, support vector machines (SVMs), random forests, and Gaussian Bayesian networks.
2. Materials and Methods
2.1. Study Design
Figure 1 illustrates the general methodology employed in this research to derive ML models for cardiovascular risk estimation. The study used the same database as the validation study of the Framingham and PROCAM models in Colombia. The population characteristics, operational definitions of the variables, and outcome determinations are fully explained by Muñoz et al. [14]. In brief, the study included patients aged 30–74 years who were free of cardiovascular events at baseline and were followed at the Primary Prevention Clinic of the Central Military Hospital in Bogotá (Colombia) from 1984 to 2006. Previous studies conducted at the Central Military Hospital have shown that the demographic characteristics and incidence of the most common diseases (including cardiovascular disease) in this population are similar to those reported for the broader Colombian population. This investigation exclusively incorporated clinical variables collected retrospectively, with no names, identification numbers, or other confidential information, thereby obviating the need for informed consent. The authors assert that this research adheres to international standards of biomedical research as per the 64th version of the Helsinki Declaration. The Institutional Research and Ethics Committee of the School of Medicine at the Pontificia Universidad Javeriana approved the study (approval code: FM-CIE-1094-21).
[figure(s) omitted; refer to PDF]
The dataset comprises records from individual patients, each represented by 14 values: 13 independent variables—including age, gender, weight, height, diabetes, systolic and diastolic blood pressure, cholesterol levels, triglycerides, smoking status, and family history of early coronary disease (Table 1)—and one dependent variable. The dependent variable, “Cardiovascular Event,” refers to the diagnosis confirmed by a domain expert physician. To assess model performance and enhance generalizability, a 5-fold cross-validation was applied over the entire dataset. In this procedure, the data are partitioned into five equal subsets; in each iteration, four subsets are used for training and the remaining one for validation. This process is repeated five times so that each subset is used once as the validation set, and the performance metrics are averaged across all folds to provide a more robust estimate of model performance. Five ML techniques were specifically utilized, namely, neural networks, decision trees, SVMs, random forests, and Gaussian Bayesian networks.
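The 5-fold procedure described above can be sketched with scikit-learn. The records below are random placeholders rather than the study's data, and the regressor is an arbitrary stand-in used only to show the fold mechanics:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))                           # 13 predictors, as in Table 1
y = (X[:, 0] + rng.normal(size=200) > 1).astype(float)   # synthetic binary outcome

kf = KFold(n_splits=5, shuffle=True, random_state=0)
aucs = []
for train_idx, val_idx in kf.split(X):
    model = MLPRegressor(hidden_layer_sizes=(5, 3), activation="tanh",
                         solver="lbfgs", max_iter=2000, random_state=0)
    model.fit(X[train_idx], y[train_idx])                # train on four folds
    scores = model.predict(X[val_idx])                   # continuous risk estimates
    aucs.append(roc_auc_score(y[val_idx], scores))       # validate on the held-out fold

mean_auc = float(np.mean(aucs))                          # performance averaged over folds
```

Each of the five folds serves as the validation set exactly once, and the averaged metric is what a procedure of this kind would report.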
Table 1
Attributes used for cardiovascular risk estimation.
Index | Variable | Definition | Variable type | Operational level |
1 | Age | Time between the date of birth and the date of entry into the record (years) | Discrete quantitative | (30–74) |
2 | Gender | Patient’s gender | Nominal qualitative | 0 = male; 1 = female |
3 | Weight | Patient’s weight (kg) | Continuous quantitative | (41–102) |
4 | Height | Patient’s height (cm) | Continuous quantitative | (136–197) |
5 | Diabetes | Indicates whether the patient suffers from diabetes | Nominal qualitative | 0 = negative; 1 = positive |
6 | Systolic blood pressure | Measurement of systolic blood pressure (mmHg) | Continuous quantitative | (90–230) |
7 | Diastolic blood pressure | Measurement of diastolic blood pressure (mmHg) | Continuous quantitative | (60–140) |
8 | Total cholesterol | Measurement of the total amount of cholesterol in the blood (mg/dL) | Continuous quantitative | (98–478) |
9 | HDL | Measurement of the amount of HDL cholesterol in the blood (mg/dL) | Continuous quantitative | (18.1–100) |
10 | LDL | Measurement of the amount of LDL cholesterol in the blood (mg/dL) | Continuous quantitative | (5.8–386.4) |
11 | Triglycerides | Measurement of the amount of triglycerides in the blood (mg/dL) | Continuous quantitative | (40–932) |
12 | Smoking | Tobacco or similar substance consumption | Nominal qualitative | 0 = negative; 1 = positive |
13 | Family history | Indicates whether first-degree relatives have experienced any coronary event before the age of 60 | Nominal qualitative | 0 = negative; 1 = positive |
14 | Coronary event | Confirmed coronary heart disease: cardiovascular death, acute myocardial infarction, angina pectoris, or coronary insufficiency | Nominal qualitative | 0 = negative; 1 = positive |
Ultimately, the best model was selected based on the AUC-ROC. The ROC curve examines the relationship between the true positive rate (TPR) and the false positive rate (FPR) across varying classification thresholds. TPR is the proportion of actual positive cases correctly classified as positive by the model, while FPR is the proportion of true negative cases incorrectly classified as positive. An AUC of 1.0 signifies perfect discriminatory capability, while a value of 0.5 indicates performance no better than chance. Additionally, we evaluated the mean absolute error (MAE), the mean of the absolute differences between observed and predicted values, which preserves the magnitude of errors without considering their direction. This metric is particularly valuable for evaluating the accuracy of regression models, providing a clear view of how close the predictions are to the actual values.
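Both metrics are available directly in scikit-learn; the outcome labels and risk scores below are made-up values used only to show the computation:

```python
from sklearn.metrics import roc_auc_score, mean_absolute_error

# Hypothetical outcomes (1 = event) and model risk scores in [0, 1]
y_true = [0, 0, 0, 1, 0, 1, 0, 1]
y_score = [0.10, 0.20, 0.35, 0.80, 0.15, 0.60, 0.40, 0.55]

auc = roc_auc_score(y_true, y_score)        # discrimination: 1.0 perfect, 0.5 chance
mae = mean_absolute_error(y_true, y_score)  # mean of |observed - predicted|
```

In this toy example every event receives a higher score than every non-event, so the AUC is exactly 1.0, while the MAE (0.28125) still reflects how far the scores sit from the 0/1 labels.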
Final results were made accessible through a web application, enabling specialist physicians to estimate cardiovascular risk using artificial intelligence. An integral aspect of this article is that the ML models aim to estimate cardiovascular risk by providing an associated risk score. This approach mirrors the Framingham score calculation, which similarly considers patient factors in the context of potential cardiovascular events. Finally, the discriminative ability of the ML models was compared with that of traditional statistical methods such as the Framingham score.
2.2. Dataset and Data Balancing
In this research, a total of 847 records were utilized, with 62 (7.31%) corresponding to patients who experienced a cardiovascular event and 785 (92.69%) to patients who did not. This imbalance primarily stems from patient inclusion based on the presence or absence of cardiovascular risk rather than the occurrence of a cardiovascular event. Unlike other studies that select the population based on whether patients experienced events such as myocardial infarction, this dataset originates from relatively healthy patients, with the aim of evaluating the need to monitor them to reduce the incidence of cardiovascular events.
Following attribute selection, an analysis of patient distribution based on the presence of the outcome was conducted. This analysis revealed a significant imbalance between the number of patients who experienced cardiovascular events and those who did not. The utilization of datasets with imbalanced class distribution can introduce biases in ML models. This may cause the model’s estimates to lean toward the predominant class in the data, hindering the detection of cases from the minority class. A similar imbalance has been observed in various works addressing the issue of cardiovascular risk prediction [17–20].
Therefore, an oversampling technique called SMOTE-NC [21] was employed to generate additional records. This technique creates a synthetic record by interpolating between a minority-class sample and one of its nearest neighbors, with the interpolation distance set by a random number between 0 and 1, and it handles both numerical and categorical variables when generating synthetic records. This oversampling approach differs from others in that it creates records with subtle variations relative to the real data, yielding a dataset closer to reality and mitigating the impact of class imbalance on the model. To improve data quality and reduce the risk of overfitting due to excessive synthetic data, a partial balancing strategy with a 2:1 ratio was applied. In this configuration, the original 62 instances of the minority class were expanded to 392, while all 785 instances of the majority class were retained, resulting in a dataset of 1177 records, 330 of them synthetic. This partial oversampling maintains a realistic class distribution, helping the model detect patterns associated with rare events while preserving the underlying characteristics of the dataset. Compared to complete balancing, the 2:1 ratio offers a compromise that enhances minority-class representation without introducing excessive synthetic noise.
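As a rough illustration of the interpolation step underlying SMOTE, the sketch below covers numeric features only; the SMOTE-NC variant actually used in the study [21] additionally assigns categorical values from the most frequent category among the neighbors, a step omitted here. The minority-class matrix and counts are placeholders:

```python
import numpy as np

def smote_like_samples(X_min, n_new, k=5, seed=0):
    """Simplified SMOTE-style interpolation for numeric features only."""
    rng = np.random.default_rng(seed)
    new_rows = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        x = X_min[i]
        d = np.linalg.norm(X_min - x, axis=1)   # distances to other minority samples
        d[i] = np.inf                           # exclude the sample itself
        neighbors = np.argsort(d)[:k]           # k nearest minority neighbors
        nb = X_min[rng.choice(neighbors)]
        gap = rng.random()                      # random number in [0, 1)
        new_rows.append(x + gap * (nb - x))     # interpolate toward the neighbor
    return np.array(new_rows)

rng = np.random.default_rng(1)
minority = rng.normal(size=(62, 4))             # e.g., 62 minority records
synthetic = smote_like_samples(minority, n_new=330)  # 62 + 330 = 392, the 2:1 design
```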
2.3. ML Models for Cardiovascular Risk Estimation
In this study, Python was employed as the programming language, and scikit-learn [22] served as the ML tool to derive various models for cardiovascular risk estimation. Each technique involves a set of hyperparameters that must be fine-tuned through experimentation to determine a model capable of making predictions with greater accuracy.
2.3.1. Models Obtained With Neural Networks
To obtain models using neural networks, the architecture of the multilayer perceptron for regression outputs (MLP Regressor) was implemented, available in the scikit-learn library of Python [22]. The choice of this implementation was based on the need to obtain continuous values in the range 0–1, as opposed to binary positive or negative responses that might result from other implementations. Manipulating hyperparameters emerges as a crucial aspect to derive various configurations of the neural network intended for cardiovascular risk estimation. In this context, activation functions such as hyperbolic tangent and sigmoid were explored, given the requirement to normalize output values for results within a defined interval. These functions facilitate the generation of bounded values considering the weights and bias of the last hidden layer. Additionally, solvers such as “adam,” “lbfgs,” and “sgd” were employed for the optimization of neural network weights.
To prevent overfitting and improve generalization, L2 regularization was applied through the hyperparameter alpha, which controls the magnitude of the penalty imposed on large weight values. The values explored for alpha were 0.0001, 0.001, 0.01, 0.1, and 1.0, allowing the assessment of different levels of regularization. During the experimental process, networks with 2–5 hidden layers were evaluated, and each layer considered between 1 and 20 nodes. Considering the inclusion of alpha as an additional hyperparameter, a total of 15,000 models were generated. Figure 2 depicts one of the obtained neural networks with a specific topology of 13-5-3-1. This entails 13 neurons in the input layer corresponding to independent variables, two hidden layers with five and three neurons, respectively, and an output layer with a single neuron.
[figure(s) omitted; refer to PDF]
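A drastically reduced version of this search might look as follows; the grid below is a tiny subset of the 15,000 configurations actually explored, and the data are synthetic placeholders:

```python
import itertools
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))
y = (X[:, 0] - X[:, 1] + rng.normal(size=300) > 0).astype(float)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

activations = ["tanh", "logistic"]      # bounded activations, as in the study
solvers = ["lbfgs"]                     # "adam" and "sgd" were also explored
alphas = [0.0001, 0.01, 0.1]            # L2 regularization strengths
topologies = [(5, 3), (17, 10)]         # hidden-layer sizes

best = (None, -1.0)
for act, sol, alpha, topo in itertools.product(activations, solvers, alphas, topologies):
    net = MLPRegressor(hidden_layer_sizes=topo, activation=act, solver=sol,
                       alpha=alpha, max_iter=2000, random_state=0)
    net.fit(X_tr, y_tr)
    auc = roc_auc_score(y_va, net.predict(X_va))
    if auc > best[1]:
        best = ((act, sol, alpha, topo), auc)   # keep the best configuration
```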
To enhance the interpretability of the model and better understand the factors influencing its predictions, we employed SHapley Additive exPlanations (SHAP). This technique provides insights into how each variable contributes to the model’s output, offering a more transparent and explainable decision-making process. Figure 3 presents a summary of the SHAP values for each feature in the dataset.
[figure(s) omitted; refer to PDF]
2.3.2. Models Obtained With Decision Trees
Decision trees represent a valuable strategy for making estimations on datasets, standing out for their interpretability compared to other techniques. This feature holds particular significance for medical professionals, giving them the ability to justify the scores assigned to each patient. In this context, the scikit-learn library [22] offers the DecisionTreeRegressor implementation of this technique, enabling regression outputs ranging from 0 to 1 based on the training set. During hyperparameter tuning, different split-quality criteria were explored, namely “squared_error,” “friedman_mse,” “absolute_error,” and “poisson,” to optimize the model’s performance. Additionally, the max_depth hyperparameter was varied from 10 to 500 to examine its impact on the tree’s predictive ability. Finally, strategies for attribute selection at each node were evaluated, considering the options “best” and “random.” This exploration led to the assessment of a total of 3000 models. Figure 4 illustrates a representative example of a decision tree with a depth of three, providing a visual and accessible insight into the structure and decisions made by the model. To assign a score to a new patient, the attribute at each node is evaluated and the corresponding branch is followed. For instance, for a male patient (encoded as 0) with an LDL level of 96 mg/dL and a weight of 70 kg, the resulting estimation would be 0.763.
[figure(s) omitted; refer to PDF]
2.3.3. Models Obtained With SVMs
The models developed through the application of the SVM technique were obtained using the sklearn.svm.SVR module [22]. Throughout the experimental process, a thorough adjustment of hyperparameters was conducted to fine-tune the model’s performance. Specifically, various kernels were explored, including linear, radial basis function, and sigmoid. The gamma and penalty coefficient C hyperparameters were adjusted using random floating-point values in the range 0–1. The coef0 hyperparameter, which shifts the decision boundary in the sigmoid kernel, was varied between −100 and 100 using random floating-point values. Finally, the options true and false were explored for the shrinking hyperparameter, which indicates whether a shrinking heuristic is employed in SVM optimization; this heuristic removes from consideration elements unlikely to affect the solution, yielding a smaller optimization problem. The combination of all these hyperparameter settings led to the evaluation of a total of 10,000 SVM models in search of the configuration that maximizes performance.
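A condensed sketch of this random search follows; the 20 draws and synthetic data are illustrative stand-ins for the 10,000 configurations evaluated, while the parameter ranges mirror those described above:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))
y = (X[:, 0] + rng.normal(size=300) > 0).astype(float)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

best = (None, -1.0)
for _ in range(20):                               # the study drew far more configurations
    params = dict(
        kernel=str(rng.choice(["linear", "rbf", "sigmoid"])),
        C=float(rng.uniform(0.01, 1.0)),          # penalty coefficient in (0, 1]
        gamma=float(rng.uniform(0.001, 1.0)),
        coef0=float(rng.uniform(-100, 100)),      # used by the sigmoid kernel
        shrinking=bool(rng.integers(2)),          # shrinking heuristic on/off
    )
    svr = SVR(**params).fit(X_tr, y_tr)
    auc = roc_auc_score(y_va, svr.predict(X_va))
    if auc > best[1]:
        best = (params, auc)
```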
2.3.4. Models Obtained With Random Forests
The technique of random forests is grounded in the principle of ensemble learning, a process that combines multiple classifiers to address complex problems and enhance model accuracy. By amalgamating individual models, classification becomes more flexible, characterized by lower bias and less sensitivity to data variations, resulting in reduced variance. In the context of random forests, classification is executed based on predictions from individual decision trees, utilizing the average of the outputs from these trees.
To conduct this research, the RandomForestRegressor classifier from the scikit-learn library was utilized [22]. Among the most critical hyperparameters are the number of trees (n_estimators) used in the ensemble and the criterion parameter, which determines the function to measure the quality of a split. The criterion parameter was varied using four options: “squared_error,” “friedman_mse,” “absolute_error,” and “poisson.” During the experimental phase, the number of trees was adjusted from 10 to 500 in increments of 10, while the max_depth hyperparameter varied from 10 to 200, also in increments of 10. The combination of all these hyperparameter modifications resulted in the evaluation of a total of 4000 models, aiming to identify the optimal configuration that maximizes the performance of the random forest in the context of the research.
2.3.5. Models Obtained With Gaussian Bayesian Networks
Another technique employed to obtain models for cardiovascular risk estimation is Gaussian Bayesian networks, grounded in Bayes’ theorem and conditional probabilities. The scikit-learn library [22] provides several implementations of this family of techniques, adapted to the binary or discrete nature of the attributes. Throughout this research, initial tests were conducted to select the implementation best suited to the distribution of the training set, resulting in the choice of the standard Gaussian variant, Gaussian Naive Bayes (GaussianNB).
The GaussianNB classifier features a single hyperparameter, known as the smoothing variable, enabling the adaptation of various models without the use of randomness. To explore its impact, 3000 instances were generated in which this variable was adjusted, thereby contributing to smoothing the curve and, in some instances, enhancing the classification capability. Although this technique generates binary or classification responses, it is essential to note that the resulting probabilities from each test can be used as numerical outputs, allowing their consideration as an estimation rather than an absolute diagnosis. This approach offers valuable insight by providing continuous information on the probability of belonging to a particular category, thus enriching the interpretation of the obtained results.
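A sketch of this procedure is given below, assuming the “smoothing variable” described above corresponds to scikit-learn’s var_smoothing parameter; the data and candidate values are placeholders:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))
y = (X[:, 0] + rng.normal(size=300) > 0).astype(int)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for vs in [1e-12, 1e-9, 1e-7]:                    # candidate smoothing values
    gnb = GaussianNB(var_smoothing=vs).fit(X_tr, y_tr)
    # use the class-1 probability as a continuous risk estimate
    risk = gnb.predict_proba(X_va)[:, 1]
    results[vs] = roc_auc_score(y_va, risk)
```

Taking `predict_proba` rather than the hard class labels is what turns the classifier's output into the continuous estimation discussed above.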
3. Results
The results are displayed using both the balanced and imbalanced datasets. Furthermore, a comparative analysis is conducted through the AUC-ROC and illustrative graphs comparing the Framingham scale with the scores obtained by the models proposed in this research.
3.1. Framingham Risk Score
Before testing the proposed artificial intelligence models, the Framingham risk score, adjusted for the Colombian population [14], was applied to calculate the AUC-ROC for both the 847 instances in the unbalanced dataset and the 1177 instances in the balanced dataset. For the unbalanced dataset, the AUC was 0.538 (95% CI: 0.468–0.607), while for the balanced dataset, the AUC was 0.519 (95% CI: 0.482–0.551). Figure 5(a) (unbalanced data) illustrates that the Framingham method assigned high risk scores to individuals who did not experience a cardiovascular event, while individuals who did experience the event were assigned low scores, typically in the range of 0–0.2. The presence of patients who suffered a coronary event but whose risk scores do not accurately reflect their condition highlights the limited discriminatory power of the model.
[figure(s) omitted; refer to PDF]
3.2. Neural Networks
Table 2 presents the configurations of neural networks that achieved the top five results based on the AUC-ROC, for both the imbalanced dataset and the dataset balanced using the SMOTE-NC technique. For the imbalanced dataset, the highest AUC of 0.690 (95% CI: 0.622–0.759) was obtained using a tanh activation function, the Adam solver, a network architecture with three hidden layers containing 17, 11, and 18 neurons, and an alpha parameter of 0.1. For the balanced dataset, the highest AUC achieved was 0.677 (95% CI: 0.601–0.754) with a tanh activation function, the SGD solver, a network topology with two hidden layers containing 17 and 10 neurons, and an alpha parameter of 0.01. Figure 5(b) displays the distribution of estimations from one of the neural networks employed in this study, demonstrating superior performance compared to the Framingham score. However, it is important to note that the model exhibits certain limitations, suggesting the need for further refinement to enhance its accuracy in identifying individuals at risk.
Table 2
Results obtained by the top five estimation models using neural networks.
Activation function | Solver | Topology of hidden layers | Alpha | MAE | AUC-ROC (95% CI) |
Unbalanced dataset | |||||
Tanh | Adam | 17-11-18 | 0.1 | 0.138 | 0.690 (0.622–0.759) |
Logistic | Lbfgs | 3-8 | 0.1 | 0.141 | 0.666 (0.598–0.730) |
Tanh | Lbfgs | 20-19-16-4 | 0.1 | 0.134 | 0.665 (0.597–0.727) |
Logistic | Adam | 17-15-2-4 | 0.001 | 0.130 | 0.663 (0.593–0.730) |
Logistic | Adam | 12-20-8-8 | 0.01 | 0.137 | 0.662 (0.596–0.728) |
Balanced dataset | |||||
Tanh | sgd | 17-10 | 0.01 | 0.271 | 0.677 (0.601–0.754) |
Tanh | Adam | 9-12 | 1.0 | 0.248 | 0.675 (0.606–0.742) |
Tanh | Adam | 11-1 | 0.0001 | 0.251 | 0.671 (0.605–0.735) |
Tanh | Adam | 2-4 | 0.001 | 0.272 | 0.668 (0.602–0.733) |
Tanh | sgd | 19-7-16 | 0.1 | 0.263 | 0.661 (0.595–0.718) |
An analysis of the top five neural networks revealed that L2 regularization, controlled by the hyperparameter alpha, played a significant role in model performance. The most accurate network, which reached an AUC-ROC of 0.690 (95% CI: 0.622–0.759), used an alpha value of 0.1. Other high-performing configurations used alpha values of 0.01, 0.001, and even 1.0, suggesting that moderate to strong regularization contributed positively to model generalization. Notably, only one of the top configurations (the 11-1 network on the balanced dataset) used the lowest alpha value (0.0001), indicating that minimal regularization was rarely sufficient to prevent overfitting in this context. This behavior is consistent with the complexity of the models and the limited size of the dataset, where adequate penalization of large weights helps prevent the model from fitting noise in the training data.
3.3. Decision Trees
Table 3 presents the selected hyperparameters for the decision trees that achieved the top five results based on the AUC-ROC. For the unbalanced dataset, a decision tree using the Poisson criterion, a maximum depth of 12, and a random splitter achieved an AUC of 0.637 (95% CI: 0.565–0.716), outperforming the Framingham score. Similarly, for the balanced dataset, a decision tree with the Poisson criterion, a maximum depth of 10, and the best splitter achieved an AUC-ROC of 0.656 (95% CI: 0.589–0.721). However, it is important to note that in both the unbalanced and balanced datasets, decision trees exhibited lower AUC values compared to those obtained using neural networks.
Table 3
Results obtained by the top five estimation models using different ML models.
Decision trees (unbalanced dataset)
Criterion | Maximum depth | Splitter | MAE | AUC-ROC (95% CI) |
Poisson | 12 | Random | 0.133 | 0.637 (0.565–0.716) |
Poisson | 11 | Best | 0.126 | 0.627 (0.562–0.687) |
Poisson | 11 | Random | 0.142 | 0.627 (0.556–0.692) |
Poisson | 10 | Best | 0.127 | 0.624 (0.559–0.684) |
Poisson | 13 | Random | 0.137 | 0.620 (0.554–0.690) |
Decision trees (balanced dataset)
Criterion | Maximum depth | Splitter | MAE | AUC-ROC (95% CI) |
Poisson | 10 | Best | 0.192 | 0.656 (0.589–0.721) |
Poisson | 13 | Best | 0.181 | 0.649 (0.583–0.714) |
Poisson | 16 | Best | 0.176 | 0.648 (0.581–0.713) |
Poisson | 17 | Best | 0.175 | 0.648 (0.581–0.713) |
Poisson | 14 | Best | 0.181 | 0.648 (0.581–0.710) |
Support vector machines (unbalanced dataset)
Kernel | C | Gamma | Coef0 | Shrinking | MAE | AUC-ROC (95% CI) |
Sigmoid | 0.67 | 0.01 | −11.1 | True | 0.159 | 0.648 (0.576–0.718) |
Sigmoid | 0.67 | 0.01 | −11.1 | False | 0.159 | 0.648 (0.576–0.718) |
Sigmoid | 0.51 | 0.01 | −11.1 | True | 0.159 | 0.648 (0.576–0.718) |
Sigmoid | 0.59 | 0.01 | −11.1 | False | 0.159 | 0.648 (0.576–0.718) |
Sigmoid | 0.59 | 0.01 | −11.1 | False | 0.159 | 0.648 (0.576–0.718) |
Support vector machines (balanced dataset)
Kernel | C | Gamma | Coef0 | Shrinking | MAE | AUC-ROC (95% CI) |
Sigmoid | 0.75 | 0.01 | −11.1 | False | 0.159 | 0.651 (0.577–0.721) |
Sigmoid | 0.75 | 0.01 | −11.1 | True | 0.159 | 0.651 (0.577–0.721) |
Sigmoid | 0.34 | 0.01 | −11.1 | True | 0.159 | 0.651 (0.577–0.721) |
Sigmoid | 0.34 | 0.01 | −11.1 | False | 0.159 | 0.651 (0.577–0.721) |
Sigmoid | 0.42 | 0.01 | −11.1 | False | 0.159 | 0.651 (0.577–0.721) |
Random forests (unbalanced dataset)
Criterion | Maximum depth | Number of trees | MAE | AUC-ROC (95% CI) |
Friedman_mse | 10 | 10 | 0.145 | 0.578 (0.501–0.661) |
Squared_error | 10 | 10 | 0.145 | 0.578 (0.501–0.661) |
Friedman_mse | 10 | 30 | 0.143 | 0.572 (0.488–0.656) |
Squared_error | 10 | 30 | 0.143 | 0.572 (0.488–0.656) |
Squared_error | 10 | 220 | 0.145 | 0.568 (0.482–0.654) |
Random forests (balanced dataset)
Criterion | Maximum depth | Number of trees | MAE | AUC-ROC (95% CI) |
Squared_error | 10 | 20 | 0.199 | 0.607 (0.533–0.682) |
Friedman_mse | 10 | 20 | 0.199 | 0.607 (0.533–0.682) |
Squared_error | 10 | 50 | 0.203 | 0.602 (0.527–0.674) |
Friedman_mse | 10 | 50 | 0.203 | 0.602 (0.527–0.674) |
Absolute_error | 173 | 130 | 0.224 | 0.601 (0.523–0.677) |
The feature importance tool provided by the scikit-learn library was utilized to identify the attributes that play the most significant role in the estimations made by this technique. The most determining variables in the assessment of cardiovascular risk, listed in decreasing order of importance, were age, gender, triglycerides, height, weight, medical history, diastolic blood pressure, LDL, systolic blood pressure, cholesterol, smoking, diabetes, and HDL.
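A minimal sketch of this ranking follows, using a decision tree fitted to synthetic placeholder data; the variable names mirror Table 1, but the importances produced here are illustrative, not the study's:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
names = ["age", "gender", "weight", "height", "diabetes", "sbp", "dbp",
         "cholesterol", "hdl", "ldl", "triglycerides", "smoking", "history"]
X = rng.normal(size=(300, 13))
# make "age" (column 0) and "triglycerides" (column 10) informative
y = (X[:, 0] + 0.5 * X[:, 10] + rng.normal(size=300) > 0).astype(float)

tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
# impurity-based importances, normalized to sum to 1
ranking = sorted(zip(names, tree.feature_importances_),
                 key=lambda p: p[1], reverse=True)
```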
3.4. SVMs
Table 3 presents the key hyperparameters associated with the SVM technique that yielded the top five results based on the AUC-ROC. For the imbalanced dataset, the SVM with a sigmoid kernel, a regularization parameter C of 0.67, a gamma coefficient of 0.01, a coef0 of −11.1, and the shrinking hyperparameter set to true, achieved an AUC of 0.648 (95% CI: 0.576–0.718), demonstrating strong performance. In contrast, for the balanced dataset, the configuration of the SVM with a sigmoid kernel, a regularization parameter C of 0.75, a gamma coefficient of 0.01, a coef0 of −11.1, and the shrinking hyperparameter set to false, yielded an AUC of 0.651 (95% CI: 0.577–0.721), indicating its effectiveness in handling balanced data.
3.5. Random Forests
Table 3 presents the hyperparameters corresponding to the top five results based on the AUC-ROC when applying the random forest technique to both the imbalanced and balanced datasets. The optimal performance was achieved with the balanced dataset, yielding an AUC-ROC of 0.607 (95% CI: 0.533–0.682) using 20 trees, a maximum depth of 10, and the squared-error criterion to evaluate split quality. In contrast, anomalous behavior was observed with the imbalanced dataset, where estimation values were consistently repeated, undermining the reliability of the model for this research. Consequently, this technique produced the lowest AUC-ROC recorded among all those evaluated, 0.578 (95% CI: 0.501–0.661), comparable to the results obtained with the Framingham method.
3.6. Gaussian Bayesian Networks
Table 4 presents the top five results based on the AUC-ROC for the application of Bayesian networks to both the imbalanced and balanced datasets. In the case of the imbalanced dataset, a unique hyperparameter stands out, revealing multiple configurations that achieve the same AUC value of 0.593 (95% CI: 0.522–0.667). For the balanced dataset, an AUC-ROC of 0.579 (95% CI: 0.517–0.648) is obtained using different configurations of the smoothing parameter.
Table 4
Results obtained by the top five estimation models using Bayesian networks.
Smoothing variable | MAE | AUC-ROC (95% CI)
Unbalanced dataset
1.00e−07 | 0.105 | 0.593 (0.522–0.667)
1.00e−12 | 0.105 | 0.593 (0.522–0.667)
1.02e−12 | 0.105 | 0.593 (0.522–0.667)
1.01e−12 | 0.105 | 0.593 (0.522–0.667)
1.03e−12 | 0.105 | 0.593 (0.522–0.667)
Balanced dataset
1.00e−07 | 0.312 | 0.579 (0.517–0.648)
1.00e−12 | 0.312 | 0.579 (0.517–0.648)
1.01e−11 | 0.312 | 0.579 (0.517–0.648)
1.01e−12 | 0.312 | 0.579 (0.517–0.648)
1.02e−12 | 0.312 | 0.579 (0.517–0.648)
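The smoothing variable in Table 4 plausibly corresponds to the variance-smoothing term of a Gaussian naive Bayes model (scikit-learn's `var_smoothing` in `GaussianNB` — an assumption, as the text does not name the implementation). The sketch below, on synthetic data, shows why such tiny values yield identical AUCs: they barely perturb the per-class variances.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the cohort: 13 predictors, imbalanced outcome.
X, y = make_classification(n_samples=800, n_features=13,
                           weights=[0.9, 0.1], random_state=0)

# Sweep the smoothing values from the unbalanced-dataset rows of Table 4.
aucs = []
for vs in (1e-07, 1e-12, 1.02e-12, 1.01e-12, 1.03e-12):
    nb = GaussianNB(var_smoothing=vs)
    proba = cross_val_predict(nb, X, y, cv=5, method="predict_proba")[:, 1]
    aucs.append(roc_auc_score(y, proba))

# The additive smoothing is orders of magnitude below the feature variances,
# so the cross-validated AUC is essentially flat across configurations.
print(max(aucs) - min(aucs))
```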
4. Discussion
In general, the five ML techniques evaluated in this study for cardiovascular risk assessment exhibited an acceptable ability to discriminate between patients who experienced cardiovascular events and those who did not, surpassing the performance of the widely used Framingham scale. Among these, neural networks stood out, achieving an AUC-ROC of 0.690 (95% CI: 0.622–0.759). This study constitutes a significant contribution in the Colombian context, as it is the first to address the challenge of cardiovascular risk estimation using artificial intelligence techniques. The results are promising and suggest a potential for meaningful impact on clinical decision-making.
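Confidence intervals for the AUC-ROC are quoted throughout; the article does not state how they were computed, but a percentile bootstrap is one common approach and is sketched below on illustrative labels and scores (not the study's data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Illustrative outcomes and model scores standing in for cohort data.
y = rng.integers(0, 2, size=500)
scores = y * 0.3 + rng.normal(0.5, 0.3, size=500)

# Percentile bootstrap: resample (label, score) pairs with replacement
# and recompute the AUC-ROC on each replicate.
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y), size=len(y))
    if len(np.unique(y[idx])) < 2:  # AUC needs both classes present
        continue
    boot.append(roc_auc_score(y[idx], scores[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(lo, hi)
```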
These results highlight the potential of artificial intelligence to address complex medical challenges such as cardiovascular prediction. Our findings align with a recent meta-analysis by Liu et al. [23], which compared ML against conventional methodologies for forecasting atherosclerotic cardiovascular risk in primary prevention cohorts. The meta-analysis concluded that ML models exhibit statistically superior discriminative capability, as quantified by Harrell's C statistic, compared with traditional risk assessment tools, an observation that remained robust across levels of risk of bias. However, the assessment of calibration and net reclassification improvement was hindered by the absence of calibration metrics in several studies.
Although the AUC-ROC values obtained in this study are generally lower than those reported in other research [17–19, 24–34], in several cases they reach comparable levels. For example, Quesada et al. [29] reported an AUC-ROC of 0.708 with Bayesian networks and 0.704 with neural networks in a Spanish population. In our study, the application of neural networks to a Colombian population resulted in an AUC-ROC of 0.690 (95% CI: 0.622–0.759), a value close to that reported in the Spanish context. Likewise, Alaa et al. [17] reported an AUC-ROC of 0.774 using SVMs, random forests, and neural networks in a UK Biobank cohort. Although our values are lower, they reflect the potential of these ML techniques when applied to local data, considering differences in demographic and clinical characteristics across populations.
Finally, as mentioned earlier, very few studies have developed software that enables qualified medical personnel to use the proposed models, which limits the practical application of artificial intelligence models for decision-making in the medical field. To address this limitation, a web application for estimating cardiovascular risk in the Colombian population was designed and developed as an integral part of this study. Its purpose is to provide healthcare professionals with a simple tool for applying the neural network model, which demonstrated the best discrimination capacity, to cardiovascular risk estimation.
Figure 6 depicts the home page of the application, where healthcare professionals can input the 13 independent variables used as input to the model, detailed in Table 1. After reading and accepting the privacy policy, the healthcare professional can click the “generate estimation” button. As a result, the score obtained by the neural network is displayed on the right side of the screen. In the visual example shown in Figure 6, this value is 0.799. Additionally, the value obtained according to the Framingham scale is provided, offering the medical professional additional information to support the decision-making process. This comprehensive approach not only facilitates the interpretation of the result provided by the neural network but also enables healthcare professionals to compare and contextualize scores in relation to the reference established by the Framingham scale, thus enhancing the analysis and assessment of the patient’s cardiovascular risk.
[figure(s) omitted; refer to PDF]
Some limitations need to be acknowledged. First, our study was developed with data from patients followed up two decades ago, so our results need to be externally validated in settings that represent the current conditions of the Colombian health system. Besides, a direct comparison with currently used models, such as the ACC/AHA ASCVD score [7] and SCORE2 [11], is needed to confirm our conclusions. Second, the ML techniques used may not behave similarly in new populations, as they tend to overfit the original data, so external validation is also needed in contemporary settings. Finally, our data do not adequately represent certain ethnic groups with high prevalence in Colombia: Afro-Colombian and indigenous populations make up 14% of the Colombian population but are not represented in our study, so our conclusions cannot be applied to these populations without prior verification. Indeed, work remains to be done before these models are suitable for implementation, including new studies based on prospective cohorts and nonrandomized controlled trials, among other designs.
5. Conclusions
In conclusion, our study supports the notion that flexible ML approaches to cardiovascular risk prediction could be a way forward for enhanced cardiovascular risk assessment in Colombia, taking advantage of an increasingly data-rich world. Our data even suggest that risk prediction using these techniques could be more discriminative than widely used risk-estimation models such as the Framingham score adapted to the Colombian population. However, new prospective studies need to validate our data before generalized implementation.
Author Contributions
Jared Agudelo: conceptualization, formal analysis, investigation, writing – original draft, and writing – review and editing.
Oscar Bedoya: data curation, formal analysis, investigation, writing – original draft, and writing – review and editing.
Oscar Muñoz-Velandia: conceptualization, data curation, formal analysis, investigation, and writing – review and editing.
Kevin David Rodriguez Belalcazar: data curation, formal analysis, investigation, writing – original draft, and writing – review and editing.
Alvaro Ruiz-Morales: data curation, investigation, and writing – review and editing.
Funding
No funding was received for this research.
[1] P. W. F. Wilson, R. B. D’Agostino, D. Levy, A. M. Belanger, H. Silbershatz, W. B. Kannel, "Prediction of Coronary Heart Disease Using Risk Factor Categories," Circulation, vol. 97 no. 18, pp. 1837-1847, DOI: 10.1161/01.cir.97.18.1837, 1998.
[2] J. A. A. G. Damen, L. Hooft, E. Schuit, "Prediction Models for Cardiovascular Disease Risk in the General Population: Systematic Review," BMJ, vol. 353, DOI: 10.1136/bmj.i2416, 2016.
[3] M. Cortes-Bergoder, R. J. Thomas, F. N. Albuquerque, "Validity of Cardiovascular Risk Prediction Models in Latin America and Among Hispanics in the United States of America: A Systematic Review," Revista Panamericana de Salud Publica/Pan American Journal of Public Health, vol. 32 no. 2, pp. 131-139, DOI: 10.1590/S1020-49892012000800007, 2012.
[4] P. M. Ridker, M. Cushman, M. J. Stampfer, R. P. Tracy, C. H. Hennekens, "Inflammation, Aspirin, and the Risk of Cardiovascular Disease in Apparently Healthy Men," New England Journal of Medicine, vol. 336 no. 14, pp. 973-979, DOI: 10.1056/NEJM199704033361401, 1997.
[5] J. R. Banegas, F. Villar, A. Graciani, F. Rodríguez-Artalejo, "Epidemiología de las Enfermedades Cardiovasculares en España," Revista Española de Cardiología Suplementos, vol. 6 no. 7, pp. 3G-12G, DOI: 10.1016/s1131-3587(06)75324-9, 2006.
[6] M. T. Cooney, A. L. Dudina, I. M. Graham, "Value and Limitations of Existing Scores for the Assessment of Cardiovascular Risk," Journal of the American College of Cardiology, vol. 54 no. 14, pp. 1209-1227, DOI: 10.1016/j.jacc.2009.07.020, 2009.
[7] D. C. Goff, D. M. Lloyd-Jones, G. Bennett, "ACC/AHA Guideline on the Assessment of Cardiovascular Risk," Journal of the American College of Cardiology, vol. 63 no. 25, 2013.
[8] N. R. Cook, N. P. Paynter, C. B. Eaton, "Comparison of the Framingham and Reynolds Risk Scores for Global Cardiovascular Risk Prediction in the Multiethnic Women’s Health Initiative," Circulation, vol. 125 no. 14, pp. 1748-1756, DOI: 10.1161/circulationaha.111.075929, 2012.
[9] G. Assmann, P. Cullen, H. Schulte, "Simple Scoring Scheme for Calculating the Risk of Acute Coronary Events Based on the 10-Year Follow-Up of the Prospective Cardiovascular Münster (PROCAM) Study," Circulation, vol. 105 no. 3, pp. 310-315, DOI: 10.1161/hc0302.102575, 2002.
[10] R. Conroy, "Estimation of Ten-Year Risk of Fatal Cardiovascular Disease in Europe: The SCORE Project," European Heart Journal, vol. 24 no. 11, pp. 987-1003, DOI: 10.1016/s0195-668x(03)00114-3, 2003.
[11] S. Hageman, L. Pennells, F. Ojeda, "SCORE2 Risk Prediction Algorithms: New Models to Estimate 10-Year Risk of Cardiovascular Disease in Europe," European Heart Journal, vol. 42 no. 25, pp. 2439-2454, DOI: 10.1093/eurheartj/ehab309, 2021.
[12] K. P. Kresoja, M. Unterhuber, R. Wachter, H. Thiele, P. Lurz, "A Cardiologist’s Guide to Machine Learning in Cardiovascular Disease Prognosis Prediction," Basic Research in Cardiology, vol. 118 no. 1, DOI: 10.1007/s00395-023-00982-7, 2023.
[13] D. Zhao, J. Liu, W. Xie, Y. Qi, "Cardiovascular Risk Assessment: A Global Perspective," Nature Reviews Cardiology, vol. 12 no. 5, pp. 301-311, DOI: 10.1038/nrcardio.2015.28, 2015.
[14] O. M. Muñoz, N. I. Rodríguez, Á. Ruiz, M. Rondón, "Validación de los Modelos de Predicción de Framingham y PROCAM Como Estimadores del Riesgo Cardiovascular en una Población Colombiana," Revista Colombiana de Cardiología, vol. 21 no. 4, pp. 202-212, 2014.
[15] C. D. Rodríguez-Ariza, A. Cabrera-Villamizar, A. L. Rodríguez-Pulido, "External Validation of the ACC/AHA ASCVD Risk Score in a Colombian Population Cohort," Scientific Reports, vol. 13 no. 1, DOI: 10.1038/s41598-023-32668-4, 2023.
[16] Cohorts Consortium of Latin America and the Caribbean (CC-LAC), "Derivation, Internal Validation, and Recalibration of a Cardiovascular Risk Score for Latin America and the Caribbean (Globorisk-LAC): A Pooled Analysis of Cohort Studies," Lancet Regional Health: Americas, vol. 9, DOI: 10.1016/j.lana.2022.100258, 2022.
[17] A. M. Alaa, T. Bolton, E. Di Angelantonio, J. H. F. Rudd, M. van der Schaar, "Cardiovascular Disease Risk Prediction Using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants," PLoS One, vol. 14 no. 5, DOI: 10.1371/journal.pone.0213653, 2019.
[18] Y. Li, M. Sperrin, D. M. Ashcroft, T. P. van Staa, "Consistency of Variety of Machine Learning and Statistical Models in Predicting Clinical Risks of Individual Patients: Longitudinal Cohort Study Using Cardiovascular Disease as Exemplar," BMJ, vol. 371, DOI: 10.1136/bmj.m3919, 2020.
[19] A. C. Dimopoulos, M. Nikolaidou, F. F. Caballero, "Machine Learning Methodologies Versus Cardiovascular Risk Scores, in Predicting Disease Risk," BMC Medical Research Methodology, vol. 18 no. 1, DOI: 10.1186/s12874-018-0644-1, 2018.
[20] F. Commandeur, P. J. Slomka, M. Goeller, "Machine Learning to Predict the Long-Term Risk of Myocardial Infarction and Cardiac Death Based on Clinical Risk, Coronary Calcium, and Epicardial Adipose Tissue: A Prospective Study," Cardiovascular Research, vol. 116 no. 14, pp. 2216-2225, DOI: 10.1093/cvr/cvz321, 2020.
[21] M. Mukherjee, M. Khushi, "SMOTE-ENC: A Novel SMOTE-Based Method to Generate Synthetic Data for Nominal and Continuous Features," Applied System Innovation, vol. 4 no. 1, DOI: 10.3390/asi4010018, 2021.
[22] F. Pedregosa, G. Varoquaux, A. Gramfort, "Scikit-Learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12 no. Oct, pp. 2825-2830, 2011.
[23] W. Liu, L. Laranjo, H. Klimis, "Machine-Learning versus Traditional Approaches for Atherosclerotic Cardiovascular Risk Prognostication in Primary Prevention Cohorts: A Systematic Review and Meta-Analysis," European Heart Journal - Quality of Care and Clinical Outcomes, vol. 9 no. 4, pp. 310-322, DOI: 10.1093/ehjqcco/qcad017, 2023.
[24] B. Ambale-Venkatesh, X. Yang, C. O. Wu, "Cardiovascular Event Prediction by Machine Learning: The Multi-Ethnic Study of Atherosclerosis," Circulation Research, vol. 121 no. 9, pp. 1092-1101, DOI: 10.1161/circresaha.117.311312, 2017.
[25] S. Barbieri, S. Mehta, B. Wu, "Predicting Cardiovascular Risk From National Administrative Databases Using a Combined Survival Analysis and Deep Learning Approach," International Journal of Epidemiology, vol. 51 no. 3, pp. 931-944, DOI: 10.1093/ije/dyab258, 2022.
[26] I. A. Kakadiaris, M. Vrigkas, A. A. Yen, T. Kuznetsova, M. Budoff, M. Naghavi, "Machine Learning Outperforms ACC/AHA CVD Risk Calculator in MESA," Journal of the American Heart Association, vol. 7 no. 22, DOI: 10.1161/jaha.118.009476, 2018.
[27] E. H. Kennedy, W. L. Wiitala, R. A. Hayward, J. B. Sussman, "Improved Cardiovascular Risk Prediction Using Nonparametric Regression and Electronic Health Record Data," Medical Care, vol. 51 no. 3, pp. 251-258, DOI: 10.1097/mlr.0b013e31827da594, 2013.
[28] R. Nakanishi, P. J. Slomka, R. Rios, "Machine Learning Adds to Clinical and CAC Assessments in Predicting 10-Year CHD and CVD Deaths," Journal of the American College of Cardiology: Cardiovascular Imaging, vol. 14 no. 3, pp. 615-625, DOI: 10.1016/j.jcmg.2020.08.024, 2021.
[29] J. A. Quesada, A. Lopez-Pineda, V. F. Gil-Guillén, "Machine Learning to Predict Cardiovascular Risk," International Journal of Clinical Practice, vol. 73 no. 10, DOI: 10.1111/ijcp.13389, 2019.
[30] P. Unnikrishnan, D. K. Kumar, S. Poosapadi Arjunan, H. Kumar, P. Mitchell, R. Kawasaki, "Development of Health Parameter Model for Risk Prediction of CVD Using SVM," Computational and Mathematical Methods in Medicine, vol. 2016, DOI: 10.1155/2016/3016245, 2016.
[31] S. F. Weng, J. Reps, J. Kai, J. M. Garibaldi, N. Qureshi, "Can Machine-Learning Improve Cardiovascular Risk Prediction Using Routine Clinical Data?," PLoS One, vol. 12 no. 4, DOI: 10.1371/journal.pone.0174944, 2017.
[32] R. Voss, P. Cullen, H. Schulte, G. Assmann, "Prediction of Risk of Coronary Events in Middle-Aged Men in the Prospective Cardiovascular Münster Study (PROCAM) Using Neural Networks," International Journal of Epidemiology, vol. 31 no. 6, pp. 1253-1262, DOI: 10.1093/ije/31.6.1253, 2002.
[33] J. You, Y. Guo, J. J. Kang, "Development of Machine Learning-Based Models to Predict 10-Year Risk of Cardiovascular Disease: A Prospective Cohort Study," Stroke and Vascular Neurology, vol. 8 no. 6, pp. 475-485, DOI: 10.1136/svn-2023-002332, 2023.
[34] J. Zhao, Q. Feng, P. Wu, "Learning From Longitudinal Data in Electronic Health Record and Genetic Data to Improve Cardiovascular Event Prediction," Scientific Reports, vol. 9 no. 1, DOI: 10.1038/s41598-018-36745-x, 2019.
Copyright © 2025 Jared Agudelo et al. Cardiology Research and Practice published by John Wiley & Sons Ltd. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”).
Abstract
Introduction: There is no information on the potential of machine learning (ML)–based techniques to improve cardiovascular risk estimation in the Colombian population. This article presents innovative models using five artificial intelligence techniques: neural networks, decision trees, support vector machines, random forests, and Gaussian Bayesian networks.
Methods: The research is based on a cohort of 847 patients free of cardiovascular disease at baseline and followed for cardiovascular disease events over 10 years at the Central Military Hospital in Bogotá, Colombia. To enhance robustness and reduce the risk of overfitting, model evaluation was conducted using 5-fold cross-validation on the entire dataset. Discriminatory ability was evaluated with the area under the receiver operating characteristic curve (AUC-ROC) for each ML-based model and for the Framingham model.
Results: Experimental results showed that the neural network technique had the best discriminative ability to predict cardiovascular events, with an AUC-ROC of 0.69 (95% CI: 0.622–0.759) for unbalanced data and 0.67 (95% CI: 0.601–0.754) for balanced data. Other ML techniques also showed good discriminatory ability, with AUC-ROC values between 0.56 and 0.65, superior to that observed for the Framingham model (0.53; 95% CI: 0.468–0.607).
Conclusion: Our study supports flexible ML approaches to cardiovascular risk prediction as a way forward for cardiovascular risk assessment in Colombia. Our data even suggest that risk prediction using these techniques could be more discriminative than widely used risk-estimation models such as Framingham, adapted to the Colombian population. However, new prospective studies need to validate our data before general implementation.
Author Affiliations
1 Department of Internal Medicine, Universidad Libre, Cali, Colombia
2 Department of Systems Engineering and Computer Science, Universidad del Valle, Cali, Colombia
3 Department of Internal Medicine, Pontificia Universidad Javeriana, Hospital Universitario San Ignacio, Bogotá, Colombia
4 Department of Clinical Epidemiology and Biostatistics, Pontificia Universidad Javeriana, Bogotá, Colombia