Introduction
Diabetes mellitus (DM) is a chronic disease responsible for morbidity and mortality worldwide; in 2019, it was reported to account for 1.5 million direct deaths and 460,000 indirect deaths, mainly through kidney disease complications [1]. Additionally, it is estimated that 534.6 million people were living with DM in 2021, and the number of people with this condition will rise to 783.2 million by 2045 [2]. DM significantly increases medical costs and causes loss of productivity, premature mortality, and other effects such as reduced quality of life [3]. For example, in 2019, diabetes-related healthcare expenditure was around USD 760 billion for adults aged 20–79 years worldwide, and it will be approximately USD 845 billion by 2045 [4].
Healthcare systems implement comprehensive care pathways through evidence-based Clinical Practice Guidelines (CPG), and adherence to these recommendations by patients and their families, physicians, and institutions is critical to reducing DM’s short- and long-term complications. However, several barriers have been identified, such as health system management, lack of clarity and credibility of guidelines, the knowledge of health professionals, and patients’ knowledge and sociocultural beliefs [5]. To tackle these barriers from the institutional side, it is possible to develop clinical tools that combine structured information and unstructured data from Electronic Health Records (EHR) to predict DM progression and thereby support clinical decision-making.
The literature usually develops predictive models using socio-demographic characteristics and biomarkers as standard predictors of progression from diabetes to chronic kidney disease (CKD) [6–14]. These models typically consider only progression to CKD or the development of some particular complication, but do not consider whether the patient is meeting HbA1c targets. Moreover, these models typically do not consider the health professional’s adherence to CPG recommendations as an attribute, especially for recommendations on lifestyle changes and patients’ attitudes and support networks. This is because, among other things, health professional and patient adherence is typically registered within the EHR’s free text (instead of structured fields). The closest study extracts information on past history of diseases in the context of a CKD progression model [15].
We have two objectives. First, using Natural Language Processing (NLP) techniques and data from the EHR of a Health Maintenance Organization (HMO) in Colombia, we extract information not only on the patient’s characteristics but also on their pharmacological compliance and the professional’s adherence to the care pathways (non-pharmacological and pharmacological recommendations). Second, we develop machine learning (ML) models to estimate the progression of DM, in terms of metabolic goals and the development of complications, at one and two years and to determine critical variables in such transitions.
Research design and methods
A retrospective cohort study was conducted. The population belongs to a Colombian HMO with extensive coverage in the national territory. EHR records of patients with a confirmed diagnosis of DM and with at least one glycosylated hemoglobin (HbA1c) measurement in each of 2018, 2019, and 2020 were included. We used ICD-10 codes registered in the EHR to identify patients with a diagnosis of diabetes mellitus type 1 or type 2, excluding gestational diabetes cases. Furthermore, to ensure accuracy, the status of these diagnoses was cross-checked by the auditing department of the HMO. We received approval from the UNISANITAS Ethics and Research Board (CEIFUS 2116–21; October 29, 2021). Consent was not obtained, as the data were analyzed anonymously; the data were provided by the insurance company on March 25, 2022.
Transition model
We proposed a theoretical disease progression model with four stages based on metabolic goals defined by HbA1c levels (<6.5, <7.0, <7.5 according to patients’ characteristics) suggested by the American Diabetes Association standards and the presence of chronic complications (retinopathy, cerebrovascular disease, and chronic kidney disease) [16]. According to the model, a patient with DM can be in one of four basic stages at the beginning of follow-up: 1. Within HbA1c goals without complications (ON-NOT), 2. Outside goals without complications (OUT-NOT), 3. Within goals, but with complications (ON-YES), and 4. Out of goals and with complications (OUT-YES). Patients can move from stages with no complications to stages with complications in a unidirectional way; that is, once they develop one of the complications, they are in a stage with complications and cannot return to a stage without complications. In contrast, the transitions between the on-goals and off-goals stages are bidirectional, since they are determined by HbA1c levels, as indicated by the arrows in Fig 1. Likewise, a patient might remain in the same stage during the period.
[Figure omitted. See PDF.]
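The stage definitions and the transition constraint described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: the function names are hypothetical, and the personalized HbA1c goal (6.5, 7.0, or 7.5) is assumed to be supplied by the caller.

```python
# Hypothetical sketch of the four-stage model; all names are illustrative.
STAGES = ("ON-NOT", "OUT-NOT", "ON-YES", "OUT-YES")


def assign_stage(hba1c: float, goal: float, has_complication: bool) -> str:
    """Map an HbA1c value, a personalized goal, and complication status to a stage."""
    goal_part = "ON" if hba1c < goal else "OUT"
    comp_part = "YES" if has_complication else "NOT"
    return f"{goal_part}-{comp_part}"


def is_allowed_transition(start: str, end: str) -> bool:
    """Complications are irreversible: a YES stage can never move to a NOT stage."""
    return not (start.endswith("YES") and end.endswith("NOT"))
```

Under this sketch, a patient starting in ON-YES may move to OUT-YES or stay put, but a move to ON-NOT is rejected.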
Data
The primary database comprised electronic health records (EHRs) that provided comprehensive information on outpatient care. This included demographic details such as age and sex, as well as complications like acute myocardial infarction, heart failure, peripheral vascular disease, chronic kidney disease, retinopathy, arrhythmias, and chronic obstructive pulmonary disease (COPD). Complications were documented using the International Statistical Classification of Diseases and Related Health Problems (ICD-10) (see Supplemental Material S1 File for the exact codes). It also covered measurements from physical examinations, including systolic and diastolic blood pressure, weight, height, and body mass index (BMI). Laboratory test results were part of the dataset, featuring HbA1c, LDL cholesterol, creatinine, and estimated glomerular filtration rate (eGFR) levels. The database also included information on diabetes mellitus treatment drugs, such as oral hypoglycemics and insulins, and concomitant treatments like analgesics, antacids, anticoagulants, antihypertensives, and lipid-lowering medications. Furthermore, the data encompassed referrals to specialists in ophthalmology, nutrition, psychology, and social services. It tracked professional adherence to personalized HbA1c goals, blood pressure goals, and dyslipidemia management. Additionally, free text information might provide insights into patient compliance with medication and adherence to medical recommendations for non-pharmacological interventions, including nutritional advice, physical exercise, and cessation of alcohol and tobacco use.
The information was recorded in standardized forms (structured data), free-text fields (unstructured data), or both. Hence, before estimating transition models, we first needed to consolidate a structured dataset. Table 1 presents the variables that were selected for the estimation of transition models due to their availability and relevance according to the CPG, and the source of information used to construct those variables.
[Figure omitted. See PDF.]
Deriving structured data from free text fields: the NLP pipeline
As an example of the goal of this data processing step, Table 2 shows the information from the medical consultation of a patient. Originally, this patient had no data associated with complications or drugs in the structured fields of the EHR. However, the free text has detailed information on both fields as well as information on non-adherence to drug therapy. Hence, we use NLP to classify the patient and to fill in the missing fields: the patient has hypertension (bold and italic text) and is taking several drugs that are typically used to treat diabetes or that are relevant for choosing a particular treatment strategy according to the guidelines (italic text). Notice that the words that indicate either health conditions or drugs may differ from the ‘tags’ that we use to identify the condition. For instance, the free text mentions “HBP” (high blood pressure) instead of using the word “hypertension”. For adherence to drug therapy (bold text), the procedure is more elaborate than searching for specific words, and the algorithm for assigning the indicator needs validation by a clinical expert.
[Figure omitted. See PDF.]
We follow a three-step methodology to assign the corresponding labels to the unstructured data. We start by “cleaning” and “preparing” the data set (pre-processing). This involved removing URLs and special characters and digits, converting the text to lowercase, and removing stop words. Specific details of the pre-processing step are presented in supplemental material S2 File. Then, we extracted the relevant characteristics to classify the information (characterization). Next, we inferred the labels assigned to the records of interest (classification). This process creates new data, initially absent in the structured EHR fields, and then integrates it into the dataset.
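The pre-processing operations listed above can be illustrated with a small sketch. It is not the pipeline documented in the S2 File: the stop-word list is a tiny placeholder, the function name is hypothetical, and the character filter keeps only unaccented Latin letters, so notes in a language with accented characters would need a wider character class.

```python
import re

# Tiny illustrative stop-word list (assumption); a real list would be far longer.
STOP_WORDS = {"the", "a", "of", "and", "with", "is", "on", "for"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def preprocess(note: str) -> list[str]:
    """Lowercase a note, drop URLs, digits, and special characters, then remove stop words."""
    text = note.lower()
    text = URL_PATTERN.sub(" ", text)         # remove URLs first, while they are still intact
    text = NON_LETTER_PATTERN.sub(" ", text)  # digits and punctuation become spaces
    return [token for token in text.split() if token not in STOP_WORDS]
```

The resulting token list is the input that the characterization and classification steps would consume.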
For the characterization and classification steps, we used two approaches to process the data: a simple search, using a pipeline in which labels are generated from patterns in the free text, and the Bag of Words (BoW) method [17,18]. The BoW method is well suited to scenarios where the simple search might not be straightforward (for instance, to detect whether the physician considers that there is no pharmacological adherence) but where we can still rely on a pre-established “dictionary”, that is, a set of relevant words. This is the case here, as clinical literature uses relatively standard terms to indicate health conditions, treatments, and medications, among others. When such a dictionary is not available, NLP techniques should rely on methods that infer the most relevant words from the free text (for instance, the Term Frequency-Inverse Document Frequency, TF-IDF, method).
We created a custom medical dictionary to strengthen the spelling correction, lemmatization, and stemming stages of the text analysis. To construct it, we reviewed medical literature, including CPG and research articles, to ensure the vocabulary was complete and relevant to DM. Terms were carefully selected to include standard medical nomenclature, common variants, and colloquial terms that might appear in patient-reported data or the EHR. The dictionary was validated by engineers and clinicians and tailored to the specific context of DM. The Supplemental Material S2 File presents the details of how the BoW is implemented and the pre-processing steps involved.
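A dictionary-driven Bag of Words labeling step of the kind described above might look as follows. This is a hedged sketch: the dictionary entries, tag names, and function names are invented for illustration, and the actual dictionary and procedure are those described in the S2 File.

```python
# Hypothetical dictionary: surface forms found in notes -> canonical tag.
MEDICAL_DICTIONARY = {
    "hbp": "hypertension",
    "hypertension": "hypertension",
    "hypertensive": "hypertension",
    "metformin": "oral_hypoglycemic",
    "glibenclamide": "oral_hypoglycemic",
    "insulin": "insulin",
    "glargine": "insulin",
}


def bag_of_tags(tokens: list[str]) -> dict[str, int]:
    """Count how often each canonical tag is evidenced by the tokens of one note."""
    counts: dict[str, int] = {}
    for token in tokens:
        tag = MEDICAL_DICTIONARY.get(token)
        if tag is not None:
            counts[tag] = counts.get(tag, 0) + 1
    return counts


def label_note(tokens: list[str]) -> set[str]:
    """Binary labels: a tag is assigned when at least one of its variants appears."""
    return set(bag_of_tags(tokens))
```

Mapping several surface forms to one tag is what lets a note mentioning "HBP" populate the hypertension field even though the word "hypertension" never appears.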
With complete information on every field of interest in the database, we evaluated the level of adherence to the CPG recommendations. We assessed adherence in three key aspects: pharmacological, nutritional, and lifestyle recommendations such as those on tobacco and alcohol consumption. For medications, we considered whether the patients received adequate prescriptions for hypoglycemic agents, antihypertensives, and drugs to control cholesterolemia. Treatment inertia, the lack of a support network, the attitude of the patient, or an incorrect prescription can explain the discrepancies between the CPG algorithms and what is reported in the EHR. Again, we used NLP to examine these inconsistencies and obtain more precise patient adherence figures.
Estimation of transition models
To estimate the predictive model of DM progression, we followed an architecture corresponding to a nested tree model in which one multiclass classification problem is transformed into two levels of binary classifications [19,20]. In particular, outcomes are defined as the stages in the model of Fig 1. Thus, we train two models per stage and time horizon, except for the stages ON-YES and OUT-YES, for which we only predict whether patients will be in or out of goals. Since we predicted one and two years forward, this gave twelve different models. Initially, we modeled the transitions in a multinomial setting, but the best results regarding the metrics described below were attained with the present modeling strategy. We compared the performance of several statistical models, including ML algorithms, as alternatives. In particular, we used K-nearest neighbors (KNN), logistic regression (LR), decision trees (DT), random forest (RF), neural networks (NN), and boosting (Boost).
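The decomposition into binary problems, and the count of twelve models, can be made explicit with a short sketch. The question names and function names are hypothetical labels chosen for illustration; only the stages, the two horizons, and the rule that stages with complications need a single model come from the text.

```python
HORIZONS = (1, 2)  # years ahead

# Binary questions asked for each starting stage. Complications are irreversible,
# so stages that already have complications only need the goals question.
QUESTIONS_BY_STAGE = {
    "ON-NOT": ("out_of_goals", "develops_complications"),
    "OUT-NOT": ("out_of_goals", "develops_complications"),
    "ON-YES": ("out_of_goals",),
    "OUT-YES": ("out_of_goals",),
}


def binary_targets(end_stage: str) -> dict[str, int]:
    """Split a four-class end stage into the two binary outcomes."""
    goal_part, comp_part = end_stage.split("-")
    return {
        "out_of_goals": int(goal_part == "OUT"),
        "develops_complications": int(comp_part == "YES"),
    }


def model_specs() -> list[tuple[str, int, str]]:
    """Enumerate (start stage, horizon, binary question) for every model to train."""
    return [
        (stage, horizon, question)
        for stage, questions in QUESTIONS_BY_STAGE.items()
        for horizon in HORIZONS
        for question in questions
    ]
```

Enumerating the specifications gives (2 + 2 + 1 + 1) questions times 2 horizons, that is, twelve binary models.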
The selection of ML models was guided by the characteristics of the dataset and the complexity of predicting diabetes progression. Specifically, LR was included due to its interpretability, making it a valuable tool for understanding the impact of individual predictors in binary classification tasks. Additionally, RF was chosen for its robustness to overfitting and ability to handle high-dimensional clinical data with complex interactions, which are prevalent in disease progression modeling. These models complement other approaches, such as KNN for benchmarking, DT for interpretability, NN for capturing nonlinear relationships and Boosting methods for handling imbalanced datasets. Supplemental material S3 File provides a detailed justification for selecting each model.
We used PyCaret (an ML library in Python) to automate part of the preprocessing tasks and to compare performance across models. PyCaret allowed us to build the entire ML pipeline with minimal coding and to compare the performance across architectures.
For the preprocessing stage, we oversampled unbalanced classes and performed a feature selection that removed variables with low variance and variables that were highly collinear. In a nutshell, the feature selection iteratively runs a Light Gradient Boosting Machine (LightGBM) model on different sets of characteristics to pick the most relevant ones for classification.
In the next step, the pipeline sets up a grid of models, where each model has a parameter grid, so we perform a grid search across both models and hyperparameters to determine the best predictor for a specific transition and outcome. We trained twenty models with a 10-fold cross-validation strategy (StratifiedKFold) and compared their performance in the validation set. This approach ensures that the model is trained on various partitions of the data set while maintaining the proportion of classes in each fold. In this case, cross-validation aimed to select hyperparameters and evaluate the model’s generalization. Upon completion of the ten iterations, the model’s performance at each fold is averaged to obtain an overall estimate of its performance on the data. This strategy ensures that the results reflect an average of several data set splits, providing a more robust metric. To determine the most adequate model in the grid search, we considered accuracy, the F1-score, and the Area Under the Curve (AUC) as performance metrics. Among them, we prioritized the F1-score. A common scenario with imbalanced datasets is that the F1 score exhibits proficiency while the AUC appears suboptimal. The AUC can yield a lower value due to challenges in effectively distinguishing the minority class amidst the dominance of the majority class. Supplemental material S3 File includes more details about the process and alternative strategies that we considered.
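To make the evaluation strategy concrete, the sketch below shows the F1-score for the positive class and a simplified stratified split. Both functions are plain-Python stand-ins written for illustration; they are not the PyCaret or scikit-learn implementations used in the study, and the round-robin split omits the shuffling that library versions offer.

```python
def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stratified_folds(labels: list[int], k: int) -> list[list[int]]:
    """Deal the indices of each class round-robin into k folds to keep class proportions."""
    folds: list[list[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds
```

Each fold serves once as the validation set while the remaining folds are used for training, and the per-fold F1 values are then averaged.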
Notwithstanding PyCaret’s ease of use, its performance with some models was quite low, so we turned to modeling a neural network with three fully connected layers, each with 256 neurons and Rectified Linear Unit (ReLU) activation functions. Between consecutive layers we inserted a dropout layer that helps regularize the weights and tackle overfitting. The final layer has a sigmoid activation function for modeling the transition probability. For each model we designed a hyperparameter grid that varies the regularization parameter, the learning rate, and the number of epochs. Finally, we estimated marginal changes in the predicted transition probability to determine the relevance of the variables included in the training.
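One way to read "marginal changes in the predicted transition probability" is as the average difference in predicted probability when a binary predictor is switched from 0 to 1. The text does not spell out the exact procedure, so the sketch below is an assumption: a logistic score with made-up weights stands in for the fitted classifier, and all names are hypothetical.

```python
import math


def predict_probability(weights: dict[str, float], bias: float, row: dict[str, float]) -> float:
    """Sigmoid of a linear score; stands in for any fitted binary classifier."""
    score = bias + sum(weights[name] * row[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-score))


def average_marginal_effect(
    weights: dict[str, float], bias: float, rows: list[dict[str, float]], feature: str
) -> float:
    """Mean change in predicted probability when a binary feature goes from 0 to 1."""
    diffs = []
    for row in rows:
        p_on = predict_probability(weights, bias, {**row, feature: 1.0})
        p_off = predict_probability(weights, bias, {**row, feature: 0.0})
        diffs.append(p_on - p_off)
    return sum(diffs) / len(diffs)
```

Multiplying the returned value by 100 expresses the effect in percentage points (pp.).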
Results
Population description, NLP results, and transitions
The original dataset started with 75,714 patients. Of those, only 25,320 had HbA1c measurements in the three study years, even though the CPG suggest at least two measurements per year. Of those, a total of 23,802 patients were included in the analysis, as the remainder had missing information in some of the predictor variables. These patients were dynamically distributed in the four stages (Table 3). The median age of the patients and the proportion of women in the four stages are similar. Some differences in biomarkers and results of physical examinations are explained by being at different stages. Interestingly, the proportion of compliant patients with pharmacological treatment is lower in the better scenarios (OUT-YES and ON-YES) than in the worst settings (ON-NOT and OUT-NOT). As expected, the proportion of patients treated by a professional who adheres to the HbA1c guideline is higher among those who are ON-goals than among those who are out of goals. The proportion of patients receiving non-pharmacological recommendations is similar for patients ON-goals (independent of whether they have complications) and lower for those OUT-goals. In contrast, tobacco cessation recommendations seem more focused on patients with complications than on patients without complications, regardless of their metabolic goals stage.
[Figure omitted. See PDF.]
Using NLP, we found that 99% of patients received nutritional recommendations, 96% received advice on physical activity, 74% on alcohol consumption, and only 14% on tobacco use. In addition, NLP helps identify text patterns when analyzing professional adherence to the metabolic control guidelines algorithms. For example, Table 4 shows how the classification of patients receiving hypoglycemic treatment improved significantly after NLP (see differences between panels A and B in Table 4). Ideally, all patients would be on the diagonal of the matrix. However, in real clinical practice, patients may be reluctant to accept prescriptions, physicians may deviate from the CPG, or there may be treatment inertia instead of the CPG recommendations. With NLP, we reduced the number of patients out of the diagonal due to the lack of structured data.
[Figure omitted. See PDF.]
In general, for one-year transitions, patients tend to remain at their initial stage (see panel A in Table 5). However, at the two-year follow-up, patients without complications moved to other stages during the first year, while the transitions for patients with complications were stable (see panel B in Table 5). In particular, 18.1% of patients in the ON-YES stage moved out of goals after one year and 19% after two years. In addition, 26.9% of those who start in the OUT-YES stage tend to return to metabolic goals in the first year, and 32.9% do so in the second year, transitioning to the ON-YES stage. The second most common scenario for patients without complications and within goals is to develop some complications at one year (47.7% remain in ON-NOT, and 34.6% move to ON-YES). Analyzing two years, the scenario where the patients have complications becomes the most feasible: 52% move to the ON-YES stage, and only 26.8% remain in the ON-NOT stage. The group of patients initially in the OUT-NOT stage showed the most profound changes in transition patterns. In one year, 16.1% of them achieved goals, 40.8% remained in the same conditions, 11.9% only developed complications, and 31.1% moved to the worst scenario of OUT-YES. At two years, these percentages are 11.5%, 21%, 22.3%, and 45%, respectively.
[Figure omitted. See PDF.]
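Transition percentages of the kind reported above are row-normalized counts of (initial stage, final stage) pairs. The sketch below shows the computation on invented toy pairs; the function name is hypothetical and no study data are reproduced.

```python
from collections import Counter, defaultdict


def transition_percentages(pairs: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Row-normalize counts of (start stage, end stage) pairs into percentages."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for start, end in pairs:
        counts[start][end] += 1
    table = {}
    for start, row in counts.items():
        total = sum(row.values())
        table[start] = {end: 100.0 * n / total for end, n in row.items()}
    return table


# Toy example (made-up patients): three start in ON-NOT, one in OUT-YES.
toy_pairs = [
    ("ON-NOT", "ON-NOT"),
    ("ON-NOT", "ON-NOT"),
    ("ON-NOT", "ON-YES"),
    ("OUT-YES", "ON-YES"),
]
toy_table = transition_percentages(toy_pairs)
```

Each row of the resulting table sums to 100, which is the property that makes rows comparable across initial stages of different sizes.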
ML prediction models
Table 6 presents metrics for the training and testing datasets for the best algorithms according to their F1 score in the testing data, for each of the 12 models. Fig 2 illustrates the F1 scores for each of the models. Conditional on starting in the better scenarios (ON-goals and NOT complications), the NN models have acceptable performance in predicting the OUT-goals transition at one and two years (F1-scores between 0.72 and 0.76 in Table 6). For patients who start in the worst-case scenario (YES-complications), LR fits well for predicting the OUT-goals transition at one and two years (F1-scores > 0.8 for both). Predictions for those who remain in the same OUT-goals scenario were performed at one year with LR and at two years with LGBoost, with good performance in both cases (F1-scores of 0.83 and 0.78, respectively). The prediction of developing complications at one year was estimated using Boosting models for both ON-goals and OUT-goals starters (F1-score of 0.81 for AdaBoost and GBC). Similar predictions, but with superior performance, were made for the second year using LR (F1-score > 0.94 in both cases). The full set of results for all tested models is available in Supplemental Material S4 File. As discussed in the methods section, in imbalanced datasets, the F1 score gives favorable results when the predictive model excels in identifying the less prevalent but critical class (outside goals or with complications).
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
Finally, to address the unbalanced nature of our training set, we tried two different adjustments to make all the classes equally represented. Using the Synthetic Minority Oversampling Technique (SMOTE), (i) we upsampled the minority class while holding the majority class constant; and (ii) we upsampled the minority class and downsampled the majority class. This was performed only on the training set, while holding the test set constant. Even though there were minor gains in the cross-validation process, the testing metrics did not substantially change, as can be seen in Table 7 and Table 8. Given that the processing pipeline for the initial models was simpler, we preferred to keep them.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
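The core idea behind SMOTE is to create synthetic minority samples by interpolating between an existing minority sample and one of its nearest minority neighbors. The sketch below is a simplified stand-in, not the library routine used in the study: it always uses the single nearest neighbor, whereas standard SMOTE picks at random among several neighbors, and the function name is hypothetical.

```python
import random


def smote_like(minority: list[list[float]], n_new: int, seed: int = 0) -> list[list[float]]:
    """Interpolate between a random minority sample and its nearest minority neighbor."""
    rng = random.Random(seed)

    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]  # needs at least two samples
        neighbor = min(others, key=lambda p: sq_dist(base, p))
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic
```

As in the text, such resampling would be applied to the training split only, so that the test set keeps its original class proportions.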
Table 9 presents the confusion matrix for the selected models. To construct it, we translated predicted probabilities into binary predictions using the cutoff that maximizes the kappa score, in order to balance sensitivity (true positive rate) and specificity (true negative rate). In general, the resulting models perform better at detecting true positives (predicting transitions that actually happened) than true negatives. In particular, in the 2-year horizon, for those with complications the model has problems predicting that the patient will not move to OUT-goals, and the same holds for predictions into YES-complications from both ON and OUT-goals.
[Figure omitted. See PDF.]
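Choosing the cutoff that maximizes Cohen's kappa can be sketched as a simple search over candidate thresholds. The candidate grid (0.01 to 0.99 in steps of 0.01) and the function names are assumptions made for illustration; the text does not state how the search was carried out.

```python
def cohen_kappa(y_true: list[int], y_pred: list[int]) -> float:
    """Agreement between predictions and labels, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    p_true_pos = sum(y_true) / n
    p_pred_pos = sum(y_pred) / n
    expected = p_true_pos * p_pred_pos + (1 - p_true_pos) * (1 - p_pred_pos)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present on both sides
    return (observed - expected) / (1 - expected)


def best_cutoff(y_true: list[int], probabilities: list[float]) -> float:
    """Return the candidate cutoff (0.01 to 0.99) with the highest kappa."""
    candidates = [c / 100 for c in range(1, 100)]

    def kappa_at(cutoff: float) -> float:
        return cohen_kappa(y_true, [int(p >= cutoff) for p in probabilities])

    return max(candidates, key=kappa_at)
```

When several cutoffs tie, this sketch returns the lowest one, which favors sensitivity over specificity; a different tie-breaking rule would be equally defensible.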
Aside from the performance metrics, the fitted models allow us to establish which factors exerted the most significant influence on predictions. Feature importance plots are provided for the models in rows two and four of Table 6, which correspond to logistic regressions predicting the one-year transition of HbA1c levels. Specifically, Fig 3 illustrates the most influential factors for patients without complications, while Fig 4 does so for patients with complications. For patients initially without complications, staying out of target ranges is associated with receiving nutrition recommendations and not adhering to clinical guidelines. This underscores the critical role of prevention measures and proper nutrition in maintaining healthy HbA1c levels. Conversely, patients with early-stage chronic kidney disease who do not follow the cholesterol control guidelines also tend to miss their HbA1c targets. Overall, NLP-extracted features, particularly those related to nutritional habits, highlight the impact of dietary control in preventing HbA1c increases. Notably, lipid-lowering agents play a key role as well. Patients who consistently receive nutrition recommendations and take these medications often remain out of target, likely due to unchanged dietary habits.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
We move from feature importance to understand the marginal effect of each variable on the transition probabilities. Figs 5 and 6 show predictors, such as age, explaining the probability of being OUT-goals at one and two years, regardless of whether the patient has complications. Fig 7 considers instead the role of predictors of developing complications at one year for patients OUT-goals, from which we highlight the importance of eGFR. This result is expected as most patients in this category develop CKD.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
Professional adherence to HbA1c treatment recommendations reduces the probability of deviating from goals (by between 2 pp. and 5 pp. at one year and by around 2 pp. at two years) as long as the patients are on goals (Figs 5 and 6). Adherence to the dyslipidemia treatment guidelines decreases by around 5 pp. the probability that patients with complications remain OUT-goals at two years (Fig 6). The probability of moving outside goals for patients in the better initial stage, i.e., ON-NOT, decreases by more than 10% at one year (Fig 5) and by only 1 pp. at two years (Fig 6) when the physicians adhere to the cholesterol recommendations. In addition, if professionals follow such guidelines, patients are less likely (20 pp. less for those ON-goals and around 3 pp. for OUT-goals) to develop complications at two years (Fig 7).
From the patients’ side, non-compliance with pharmacological treatments increases, by more than 5 pp., the probability of developing complications at one year for the group of patients in the better scenario (see ON-NOT in Fig 7). Surprisingly, the direction of the effect of non-compliance on the probability of being out of the HbA1c goals at two years is the opposite for patients with complications compared with those without complications: the probability of being OUT-goals increases by 4 pp. if the patient belongs to the first group and decreases by around 3 pp. for the second (Fig 6).
Discussion
The transitions explored in the current analysis include a critical variable for the disease’s management (metabolic goals) and dimensions involving multidimensional treatment factors. In particular, the model systematizes, in a replicable and scalable way, the information on the pharmacological compliance of the patient and the professional’s adherence to the care pathways (non-pharmacological and pharmacological). In this sense, the model’s predictions regarding the development of complications go beyond classic predictors such as age or creatinine levels, providing additional information on how institutional variables such as physicians’ adherence to metabolic control guidelines and lifestyle recommendations play an essential role in the disease progression.
We ran the models using a number of patients and a follow-up time similar to those of other studies in the literature [21]. Our models achieve close to 80% accuracy, similar to some results reported in the literature. For example, to predict the development of diabetes in China, the accuracy of RF models reaches up to 80% and that of NN 78% [22], and to predict risk factors for the progression of diabetic kidney disease to end-stage renal disease in the same Chinese context, RF models show accuracy around 82% [23]. In Japan, using big data machine learning methods with EHR data on a CKD aggravation model, the AUC was 0.743 and the accuracy was 71% [15]. A CKD model using data from a biobank, which incorporated novel biomarkers on top of the EHR for predicting the deterioration of renal function, reported an AUC of 0.77 [24]. Our models even outperform a recent Chinese study that used LR, DT, RF, and extreme gradient boosting (XGBoost) to predict progression to developing DM at 1 and 2 years (accuracy around 60% and F1-scores close to 40%) [25].
This study could impact patients and the healthcare system from a clinical perspective. The model developed might aid in identifying individuals at risk of future complications early, enabling timely preventive measures. We highlight that the incidence of complications, despite achieving glycemic control, suggests the need for more comprehensive management strategies in patients with diabetes mellitus. For example, healthcare providers could implement more frequent monitoring of renal function (as indicated by the glomerular filtration rate) and cardiovascular health, especially in older patients or those with a history of poor compliance to treatment, even when they are within HbA1c targets. Additionally, our findings support the importance of reinforcing adherence to treatment guidelines for HbA1c and dyslipidemia, as well as personalized patient education focusing on lifestyle interventions, such as diet and exercise. These interventions could lead to better management of chronic complications like cardiovascular disease and kidney failure, ultimately improving long-term health outcomes in this population.
This analysis has several limitations. First, DM is a chronic disease in which the incidence of complications is usually assessed over five years, whereas our follow-up considers only one and two years, which is too short to identify all new cases in the cohort. Our model focuses on measuring the progression of DM towards critical stages such as the development of complications. However, we had no data on more decisive outcomes such as hospitalization and mortality. Second, we recognize that the retrospective nature of our study and the inclusion criteria based on the availability of HbA1c measurements could introduce selection bias. Patients with more frequent follow-ups or better adherence to treatment might be overrepresented, potentially influencing the results. In addition, if the patients with diabetes who moved to another HMO were more (or less) likely to develop complications or to drop out of the sample, our selection would induce a bias. The usual transfer rate of general patients between HMOs in the country was around 6% [26], but we do not have such data for the HMO studied. Third, as our study was conducted within a single HMO in Colombia, the findings may not be directly generalizable to other populations or healthcare systems. Elements such as the NLP-generated labels are typically specific to the Colombian context, and to some extent to the specific HMO. Still, the general procedure can be replicated elsewhere. For future work, we suggest validating the NLP exercise by obtaining metrics through the revision of the classifications by an independent clinical expert team in another dataset. In general, we recommend validating the presented prediction model in other populations prior to its clinical use.
In a more recent cohort of patients from the HMO studied, only 30% of patients with DM developed chronic complications, and about half were controlled. Surprisingly, we found that more than 60% of patients who start in the better scenario develop chronic complications at two years. Our result may be explained by characteristics specific to the study cohort analyzed. Considering the ML-model performance metrics, our F1 scores exhibit proficiency, while the AUC appears suboptimal and for some models the true negatives rate is not ideal. This phenomenon is observed in imbalanced datasets, where the instances of one class significantly outnumber those of the other [27].
Conclusion
Using NLP techniques to incorporate unstructured information, we developed an ML-based model to estimate the probability of diabetes progression over one and two years in a cohort of 23,802 patients from a large Colombian HMO. Our findings reveal that glycemic control alone does not prevent disease progression. Despite starting in the best-case scenario—on-target HbA1c and no complications—more than 60% of patients develop chronic complications within two years. This highlights the need for a more comprehensive management approach beyond glucose control.
Our models also show that adherence to dyslipidemia treatment guidelines significantly reduces the likelihood of patients falling outside HbA1c targets and developing complications, while non-adherence to pharmacological treatments is a strong predictor of worsening outcomes. These results suggest that managing diabetes effectively requires enhanced monitoring of kidney and cardiovascular health, reinforcement of professional adherence to lipid and metabolic control guidelines, and improved patient education on lifestyle interventions.
Although our models perform well compared to existing literature, the results suggest that even with highly granular patient-level data from a major insurance company, predictions remain far from ideal. This underscores the importance of improving data collection and integration, particularly regarding lifestyle factors and long-term patient behaviors. Future work should focus on validating these findings in other populations and exploring interventions that can actively improve adherence and long-term outcomes.
The analysis here is based on correlations; in this sense, adherence to recommendations may be endogenous, as doctors may be more insistent depending on what they expect their patients to do [28]. Further research should include some experimental variation, for instance, in the intensity of patients' exposure to physicians' recommendations. Finally, the ML models were packaged into a calculator that can be used as a decision-support tool in clinical practice. The calculator works like a simulator, allowing physicians to enter values of clinical variables and determine in real time the probability that the patient will move to each stage. Currently, the calculator is a test prototype intended for research purposes.
Supporting information
S1 File. List of ICD-10 codes used to characterize complications.
https://doi.org/10.1371/journal.pone.0321258.s001
(DOCX)
S2 File. Details on the NLP algorithm.
https://doi.org/10.1371/journal.pone.0321258.s002
(DOCX)
S3 File. Machine learning algorithm details.
https://doi.org/10.1371/journal.pone.0321258.s003
(DOCX)
S4 File. All trained models’ metrics.
https://doi.org/10.1371/journal.pone.0321258.s004
(CSV)
Acknowledgments
We would like to thank Andrés Ramírez for the intellectual support and technical assistance provided during the preparation of this paper, and Andrea Castillo Niuman for her help with administrative tasks. We also thank Fundación Universitaria Sanitas in Colombia for providing the primary dataset for this research.
References
1. Institute for Health Metrics and Evaluation. Global Burden of Disease Study Results. In: University of Washington. 2019.
2. Sun H, Saeedi P, Karuranga S, Pinkepank M, Ogurtsova K, Duncan BB, et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. 2022;183:109119. pmid:34879977
3. Yang W, Dall TM, Beronjia K, Lin J, Semilla AP, Chakrabarti R, et al. Economic Costs of Diabetes in the U.S. in 2017. Diabetes Care. 2018;41(5):917–28. pmid:29567642
4. Williams R, Karuranga S, Malanda B, Saeedi P, Basit A, Besançon S, et al. Global and regional estimates and projections of diabetes-related health expenditure: Results from the International Diabetes Federation Diabetes Atlas, 9th edition. Diabetes Res Clin Pract. 2020;162:108072. pmid:32061820
5. Correa VC, Lugo-Agudelo LH, Aguirre-Acevedo DC, Contreras JAP, Borrero AMP, Patiño-Lugo DF, et al. Individual, health system, and contextual barriers and facilitators for the implementation of clinical practice guidelines: a systematic metareview. Health Res Policy Syst. 2020;18(1):74. pmid:32600417
6. Aljumah AA, Ahamad MG, Siddiqui MK. Application of data mining: Diabetes health care in young and old patients. J King Saud University - Comp Information Sciences. 2013;25(2):127–36.
7. Zhou H, Isaman DJM, Messinger S, Brown MB, Klein R, Brandle M, et al. A computer simulation model of diabetes progression, quality of life, and cost. Diabetes Care. 2005;28(12):2856–63. pmid:16306545
8. De Gaetano A, Hardy T, Beck B, Abu-Raddad E, Palumbo P, Bue-Valleskey J, et al. Mathematical models of diabetes progression. Am J Physiol Endocrinol Metab. 2008;295(6):E1462–79. pmid:18780774
9. Jones AP, Homer JB, Murphy DL, Essien JDK, Milstein B, Seville DA. Understanding diabetes population dynamics through simulation modeling and experimentation. Am J Public Health. 2006;96(3):488–94. pmid:16449587
10. Ahmad Kiadaliri A, Gerdtham U-G, Nilsson P, Eliasson B, Gudbjörnsdottir S, Carlsson KS. Towards renewed health economic simulation of type 2 diabetes: risk equations for first and second cardiovascular events from Swedish register data. PLoS One. 2013;8(5):e62650. pmid:23671618
11. Yi Y, Philips Z, Bergman G, Burslem K. Economic models in type 2 diabetes. Curr Med Res Opin. 2010;26(9):2105–18. pmid:20642392
12. Krishnamurthy S, Ks K, Dovgan E, Luštrek M, Gradišek Piletič B, Srinivasan K, et al. Machine Learning Prediction Models for Chronic Kidney Disease Using National Health Insurance Claim Data in Taiwan. Healthcare (Basel). 2021;9(5):546. pmid:34067129
13. Alexiuk M, Tangri N. Prediction models for earlier stages of chronic kidney disease. Curr Opin Nephrol Hypertens. 2024;33(3):325–30. pmid:38420892
14. Islam R, Sultana A, Tuhin MN, Saikat MSH, Islam MR. Clinical Decision Support System for Diabetic Patients by Predicting Type 2 Diabetes Using Machine Learning Algorithms. J Healthc Eng. 2023;2023:6992441. pmid:37287539
15. Makino M, Yoshimoto R, Ono M, Itoko T, Katsuki T, Koseki A, et al. Artificial intelligence predicts the progression of diabetic kidney disease using big data machine learning. Sci Rep. 2019;9(1):11862. pmid:31413285
16. Committee ADAPP, ElSayed NA, Aleppo G, Bannuru RR, Bruemmer D, Collins BS, et al. 16. Diabetes Care in the Hospital: Standards of Care in Diabetes-2024. Diabetes Care. 2024;47(Suppl 1):S295–306. pmid:38078585
17. Hughes M, Li I, Kotoulas S, Suzumura T. Medical Text Classification Using Convolutional Neural Networks. Studies in Health Technology and Informatics. 2017.
18. Shao Y, Taylor S, Marshall N, Morioka C, Zeng-Treitler Q. Clinical Text Classification with Word Embedding Features vs. Bag-of-Words Features. 2018 IEEE International Conference on Big Data (Big Data). 2018:2874–8.
19. Rezaei MJ, Woodward JR, Ramírez J, Munroe P. A Novel Two-Stage Heart Arrhythmia Ensemble Classifier. Computers. 2021;10(5):60.
20. Leathart T, Frank E, Pfahringer B, Holmes G. Ensembles of nested dichotomies with multiple subset evaluation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2019;11439:81–93.
21. Bertsimas D, Kallus N, Weinstein AM, Zhuo YD. Personalized Diabetes Management Using Electronic Medical Records. Diabetes Care. 2017;40(2):210–7. pmid:27920019
22. Zou Q, Qu K, Luo Y, Yin D, Ju Y, Tang H. Predicting Diabetes Mellitus With Machine Learning Techniques. Front Genet. 2018;9:416440.
23. Zou Y, Zhao L, Zhang J, Wang Y, Wu Y, Ren H, et al. Development and internal validation of machine learning algorithms for end-stage renal disease risk prediction model of people with type 2 diabetes mellitus and diabetic kidney disease. Ren Fail. 2022;44(1):562–70. pmid:35373711
24. Chan L, Nadkarni GN, Fleming F, McCullough JR, Connolly P, Mosoyan G, et al. Derivation and validation of a machine learning risk score using biomarker and electronic patient data to predict progression of diabetic kidney disease. Diabetologia. 2021;64(7):1504–15. pmid:33797560
25. Liu Q, Zhou Q, He Y, Zou J, Guo Y, Yan Y. Predicting the 2-Year Risk of Progression from Prediabetes to Diabetes Using Machine Learning among Chinese Elderly Adults. J Pers Med. 2022;12(7):1055. pmid:35887552
26. Prada Ríos SI. Traslados entre eps en Colombia: ¿Qué dicen las historias laborales de cotizantes en cinco ciudades del país? RGYPS. 2016;15(30).
27. DeVries Z, Locke E, Hoda M, Moravek D, Phan K, Stratton A, et al. Using a national surgical database to predict complications following posterior lumbar surgery and comparing the area under the curve and F1-score for the assessment of prognostic capability. Spine J. 2021;21(7):1135–42. pmid:33601012
28. Kaestner R, Darden M, Lakdawalla D. Are investments in disease prevention complements? The case of statins and health behaviors. J Health Econ. 2014;36:151–63. pmid:24814322
Citation: Colmenares-Mejia CC, García-Suaza AF, Rodríguez-Lesmes P, Lochmuller C, Atehortúa SC, Camacho-Cogollo J, et al. (2025) Predicting diabetes mellitus metabolic goals and chronic complications transitions—analysis based on natural language processing and machine learning models. PLoS ONE 20(4): e0321258. https://doi.org/10.1371/journal.pone.0321258
About the Authors:
Claudia C. Colmenares-Mejia
Roles: Conceptualization, Formal analysis, Funding acquisition, Investigation, Project administration, Resources, Validation, Writing – review & editing
Affiliation: Fundación Universitaria Sanitas, Bogotá, Colombia
Andrés F. García-Suaza
Roles: Conceptualization, Formal analysis, Funding acquisition, Investigation, Project administration, Supervision, Validation, Writing – review & editing
Affiliation: Universidad del Rosario, Bogotá, Colombia
ORCID: https://orcid.org/0000-0002-9617-6873
Paul Rodríguez-Lesmes
Roles: Conceptualization, Formal analysis, Investigation, Validation, Writing – review & editing
E-mail: [email protected]
Affiliation: Universidad del Rosario, Bogotá, Colombia
ORCID: https://orcid.org/0000-0003-1058-3062
Christian Lochmuller
Roles: Investigation, Validation
Affiliation: Universidad EIA, Envigado, Colombia
Sara C. Atehortúa
Roles: Formal analysis, Investigation, Visualization, Writing – original draft
Affiliation: Universidad de Antioquia, Medellín, Colombia
J.E. Camacho-Cogollo
Roles: Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation
Affiliation: Universidad EIA, Envigado, Colombia
ORCID: https://orcid.org/0000-0003-0252-4580
Juan P. Martínez
Roles: Data curation, Investigation, Visualization
Affiliation: Universidad del Rosario, Bogotá, Colombia
Juliana Rincón
Roles: Investigation, Validation
Affiliation: Fundación Universitaria Sanitas, Bogotá, Colombia
Yohan R. Céspedes
Roles: Data curation, Investigation, Methodology, Visualization
Affiliation: Fundación Universitaria Sanitas, Bogotá, Colombia
Esteban Morales-Mendoza
Roles: Data curation, Investigation, Resources, Validation
Affiliation: Fundación Universitaria Sanitas, Bogotá, Colombia
ORCID: https://orcid.org/0009-0001-1189-8288
Mario A. Isaza-Ruget
Roles: Conceptualization, Investigation, Resources
Affiliation: Fundación Universitaria Sanitas, Bogotá, Colombia
[/RAW_REF_TEXT]
1. Institute for Health Metrics and Evaluation. Global Burden of Disease Study Results. In: University of Washington. 2019.
2. Sun H, Saeedi P, Karuranga S, Pinkepank M, Ogurtsova K, Duncan BB, et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. 2022;183:109119. pmid:34879977
3. Yang W, Dall TM, Beronjia K, Lin J, Semilla AP, Chakrabarti R, et al. Economic Costs of Diabetes in the U.S. in 2017. Diabetes Care. 2018;41(5):917–28. pmid:29567642
4. Williams R, Karuranga S, Malanda B, Saeedi P, Basit A, Besançon S, et al. Global and regional estimates and projections of diabetes-related health expenditure: Results from the International Diabetes Federation Diabetes Atlas, 9th edition. Diabetes Res Clin Pract. 2020;162:108072. pmid:32061820
5. Correa VC, Lugo-Agudelo LH, Aguirre-Acevedo DC, Contreras JAP, Borrero AMP, Patiño-Lugo DF, et al. Individual, health system, and contextual barriers and facilitators for the implementation of clinical practice guidelines: a systematic metareview. Health Res Policy Syst. 2020;18(1):74. pmid:32600417
6. Aljumah AA, Ahamad MG, Siddiqui MK. Application of data mining: Diabetes health care in young and old patients. J King Saud University - Comp Information Sciences. 2013;25(2):127–36.
7. Zhou H, Isaman DJM, Messinger S, Brown MB, Klein R, Brandle M, et al. A computer simulation model of diabetes progression, quality of life, and cost. Diabetes Care. 2005;28(12):2856–63. pmid:16306545
8. De Gaetano A, Hardy T, Beck B, Abu-Raddad E, Palumbo P, Bue-Valleskey J, et al. Mathematical models of diabetes progression. Am J Physiol Endocrinol Metab. 2008;295(6):E1462-79. pmid:18780774
9. Jones AP, Homer JB, Murphy DL, Essien JDK, Milstein B, Seville DA. Understanding diabetes population dynamics through simulation modeling and experimentation. Am J Public Health. 2006;96(3):488–94. pmid:16449587
10. Ahmad Kiadaliri A, Gerdtham U-G, Nilsson P, Eliasson B, Gudbjörnsdottir S, Carlsson KS. Towards renewed health economic simulation of type 2 diabetes: risk equations for first and second cardiovascular events from Swedish register data. PLoS One. 2013;8(5):e62650. pmid:23671618
11. Yi Y, Philips Z, Bergman G, Burslem K. Economic models in type 2 diabetes. Curr Med Res Opin. 2010;26(9):2105–18. pmid:20642392
12. Krishnamurthy S, Ks K, Dovgan E, Luštrek M, Gradišek Piletič B, Srinivasan K, et al. Machine Learning Prediction Models for Chronic Kidney Disease Using National Health Insurance Claim Data in Taiwan. Healthcare (Basel). 2021;9(5):546. pmid:34067129
13. Alexiuk M, Tangri N. Prediction models for earlier stages of chronic kidney disease. Curr Opin Nephrol Hypertens. 2024;33(3):325–30. pmid:38420892
14. Islam R, Sultana A, Tuhin MN, Saikat MSH, Islam MR. Clinical Decision Support System for Diabetic Patients by Predicting Type 2 Diabetes Using Machine Learning Algorithms. J Healthc Eng. 2023;2023:6992441. pmid:37287539
15. Makino M, Yoshimoto R, Ono M, Itoko T, Katsuki T, Koseki A, et al. Artificial intelligence predicts the progression of diabetic kidney disease using big data machine learning. Sci Rep. 2019;9(1):11862. pmid:31413285
16. Committee ADAPP, ElSayed NA, Aleppo G, Bannuru RR, Bruemmer D, Collins BS, et al. 16. Diabetes Care in the Hospital: Standards of Care in Diabetes-2024. Diabetes Care. 2024;47(Suppl 1):S295–306. pmid:38078585
17. Hughes Mark, Li Irene, Kotoulas Spyros, Suzumura Toyotaro. Medical Text Classification Using Convolutional Neural Networks. Studies in Health Technology and Informatics. 2017.
18. Shao Y, Taylor S, Marshall N, Morioka C, Zeng-Treitler Q. Clinical Text Classification with Word Embedding Features vs. Bag-of-Words Features. 2018 IEEE International Conference on Big Data (Big Data). 2018:2874–8.
19. Rezaei MJ, Woodward JR, Ramírez J, Munroe P. A Novel Two-Stage Heart Arrhythmia Ensemble Classifier. Computers. 2021;10(5):60.
20. Leathart T, Frank E, Pfahringer B, Holmes G. Ensembles of nested dichotomies with multiple subset evaluation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2019;11439: 81–93.
21. Bertsimas D, Kallus N, Weinstein AM, Zhuo YD. Personalized Diabetes Management Using Electronic Medical Records. Diabetes Care. 2017;40(2):210–7. pmid:27920019
22. Zou Q, Qu K, Luo Y, Yin D, Ju Y, Tang H. Predicting Diabetes Mellitus With Machine Learning Techniques. Front Genet. 2018;9:416440.
23. Zou Y, Zhao L, Zhang J, Wang Y, Wu Y, Ren H, et al. Development and internal validation of machine learning algorithms for end-stage renal disease risk prediction model of people with type 2 diabetes mellitus and diabetic kidney disease. Ren Fail. 2022;44(1):562–70. pmid:35373711
24. Chan L, Nadkarni GN, Fleming F, McCullough JR, Connolly P, Mosoyan G, et al. Derivation and validation of a machine learning risk score using biomarker and electronic patient data to predict progression of diabetic kidney disease. Diabetologia. 2021;64(7):1504–15. pmid:33797560
25. Liu Q, Zhou Q, He Y, Zou J, Guo Y, Yan Y. Predicting the 2-Year Risk of Progression from Prediabetes to Diabetes Using Machine Learning among Chinese Elderly Adults. J Pers Med. 2022;12(7):1055. pmid:35887552
26. Prada Ríos SI. Traslados entre eps en Colombia: ¿Qué dicen las historias laborales de cotizantes en cinco ciudades del país?. RGYPS. 2016;15(30).
27. DeVries Z, Locke E, Hoda M, Moravek D, Phan K, Stratton A, et al. Using a national surgical database to predict complications following posterior lumbar surgery and comparing the area under the curve and F1-score for the assessment of prognostic capability. Spine J. 2021;21(7):1135–42. pmid:33601012
28. Kaestner R, Darden M, Lakdawalla D. Are investments in disease prevention complements? The case of statins and health behaviors. J Health Econ. 2014;36:151–63. pmid:24814322
© 2025 Colmenares-Mejia et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Objective
To estimate diabetes mellitus (DM) progression at one and two years in terms of glycemic targets and development of complications.
Research design and methods
We analyzed a retrospective cohort of adult DM patients treated in a Health Maintenance Organization in Colombia, including those with at least one glycosylated hemoglobin (HbA1c) measurement in 2018, 2019, and 2020. We defined four disease transition stages based on metabolic goals according to HbA1c levels and complications: (1) within HbA1c goals and without complications; (2) outside goals and without complications; (3) within goals but with complications; and (4) outside goals and with complications. We applied Natural Language Processing (NLP) techniques to extract relevant clinical information from Electronic Health Records. Machine learning (ML) models were used to predict patient progression.
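The four stages amount to crossing two binary indicators: whether the patient is within HbA1c goals and whether complications are present. A minimal sketch of that mapping follows. The function name and the boolean inputs are illustrative assumptions, and the HbA1c threshold that defines "within goals" is not stated here, so it is taken as an already computed flag.

```python
def transition_stage(within_hba1c_goal: bool, has_complications: bool) -> int:
    """Return the stage number (1 to 4) as defined in the text.

    Both arguments are hypothetical, precomputed flags; how they are derived
    from HbA1c values and diagnosis codes is outside this sketch.
    """
    if not has_complications:
        # Stages 1 and 2: no complications recorded.
        return 1 if within_hba1c_goal else 2
    # Stages 3 and 4: complications present.
    return 3 if within_hba1c_goal else 4


# Enumerate every combination of the two flags.
for goal in (True, False):
    for comp in (False, True):
        print(goal, comp, transition_stage(goal, comp))
```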
Results
A total of 23,802 patients were included. Despite achieving initial glycemic control, more than 60% of patients who started within HbA1c targets and without complications developed chronic complications within two years. Our models, which achieved up to 80% accuracy and F1 scores above 74%, identified key predictors of disease progression. Adherence to dyslipidemia treatment guidelines significantly reduced the likelihood of HbA1c deterioration and complications, whereas non-adherence to pharmacological treatments increased the risk of complications. These findings suggest that HbA1c control alone is insufficient to prevent disease progression and that a more comprehensive management approach—including lipid control, kidney function monitoring, and improved adherence to clinical guidelines—is necessary.
Conclusions
Patient compliance with pharmacological treatments, professional adherence to clinical practice guidelines, and lifestyle interventions play a crucial role in diabetes progression. While our models provide strong predictive capabilities, improving data quality and integration remains essential for better forecasting and intervention strategies.