Machine learning-based prediction model for cognitive impairment risk in patients with chronic kidney disease


1. Introduction

Chronic kidney disease (CKD) has emerged as a significant global public health challenge. Epidemiological studies indicate that the worldwide prevalence of CKD is approximately 14.3%, with a prevalence of 10.8% in China [1–4]. Moreover, CKD is projected to become the fifth leading cause of mortality globally by 2040 [5].

Cognition is the core of human mental activity and involves the processes of acquiring and understanding knowledge [6]. Cognitive impairment (CI), a process closely linked to neurodegeneration and aging, is characterized by deficits in language, attention, memory, reasoning, judgment, and visual perception [7].

It has been estimated that CI affects approximately 10% to 40% of CKD patients [8]. CI reduces patients’ quality of life, impairs their adherence to treatment, and increases the risk of death [9]. Predicting the risk of CI in the CKD population is therefore important. However, few studies in China or elsewhere have developed CI risk prediction models for CKD patients, and most have focused on exploring the factors that influence CI. Most existing models are designed for maintenance hemodialysis patients and predominantly use traditional logistic regression. This study applies four machine learning algorithms, namely a neural network (NNET), random forest (RF), logistic regression (LR), and a support vector machine (SVM), to construct a CI risk prediction model for the CKD population using data from the China Health and Retirement Longitudinal Study (CHARLS) database. The model with the best predictive performance is identified by comparison, the importance of its predictors is evaluated using the SHapley Additive exPlanations (SHAP) method, and the optimal model is deployed on a web page using the Streamlit library. This approach aims to improve CI risk prediction in the CKD population and provide a basis for early intervention.

2. Methods

2.1. Data source

The data for this study were sourced from the China Health and Retirement Longitudinal Study (CHARLS), which is led by the National Development Research Institute of Peking University and investigates health and aging among middle-aged and elderly people in China. A nationwide baseline survey was conducted in 2011–2012 using random sampling across 150 county-level units. It surveyed 10,257 households with at least one resident aged 45 or older, totaling 17,708 individuals. Follow-up surveys in 2013 and 2015 covered 20,284 individuals [10]. The database includes comprehensive health and socioeconomic information, such as renal disease history and cognitive function test results. CHARLS was reviewed and approved by the Ethics Committee of Peking University (Ethics No. IRB00001052–11015), and all participants or their proxies provided signed informed consent. Because this study is a secondary analysis of CHARLS data, no additional ethical review or institutional review board approval was required by the investigators’ affiliated institution.

2.2. Participants

Data were obtained from the 2015 wave of the CHARLS database, which we accessed on April 10, 2024. Of the 13,273 individuals relevant to the scope of the study, 800 were identified as having chronic kidney disease. After 385 individuals with missing cognitive function test data were excluded, the final valid sample comprised 415 individuals (Fig 1).

[Figure omitted. See PDF.]

2.3. Relevant definition

Chronic kidney disease (CKD) was assessed using the estimated glomerular filtration rate (eGFR). It was defined as eGFR < 60 mL/(min·1.73 m²) according to the KDIGO consensus conference report on kidney disease definitions [11]. eGFR was calculated using the Modification of Diet in Renal Disease (MDRD) study equation [12]:

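Male: $\mathrm{eGFR} = 175 \times \mathrm{Scr}^{-1.154} \times \mathrm{Age}^{-0.203}$

Female: $\mathrm{eGFR} = 175 \times \mathrm{Scr}^{-1.154} \times \mathrm{Age}^{-0.203} \times 0.742$

Here eGFR is in mL/(min·1.73 m²), Scr is serum creatinine in mg/dL, and Age is in years.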
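A minimal Python sketch of the eGFR calculation and the CKD cut-off follows. It assumes the IDMS-traceable MDRD form of Levey et al. [12], which has negative exponents on serum creatinine (mg/dL) and age (years). The function names `egfr_mdrd` and `is_ckd` are hypothetical. The race coefficient of the original equation is left out because the definitions in this study do not use it.

```python
def egfr_mdrd(scr_mg_dl: float, age_years: float, sex: str) -> float:
    """Four-variable MDRD eGFR in mL/(min·1.73 m²), IDMS-traceable form (constant 175).

    The race coefficient is omitted, matching the definitions used in this study.
    """
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("serum creatinine and age must be positive")
    # Negative exponents: eGFR falls as creatinine and age rise
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if sex == "female":
        egfr *= 0.742  # sex coefficient applied to women only
    elif sex != "male":
        raise ValueError("sex must be 'male' or 'female'")
    return egfr


def is_ckd(egfr: float, threshold: float = 60.0) -> bool:
    """CKD as defined in this study: eGFR below 60 mL/(min·1.73 m²)."""
    return egfr < threshold


# Hypothetical patient: 70-year-old man, serum creatinine 2.0 mg/dL
value = egfr_mdrd(2.0, 70, "male")
print(f"eGFR = {value:.1f} mL/(min·1.73 m²); CKD: {is_ckd(value)}")
```

For this hypothetical patient the eGFR is roughly 33, which is below the 60 threshold. A 60-year-old man with a creatinine of 1.0 mg/dL would fall above it.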

2.4. Outcome variable

CI determination: The 2015 CHARLS wave does not provide a clinical diagnosis of cognitive impairment. Following previous literature [13], cognitive function was assessed with two indicators from the Health Status and Functioning questionnaire: mental status and episodic (situational) memory. The mental status test assessed orientation to the date, day of the week, and season, along with numeracy and figure-drawing ability. Each correct response scored 1 point, for a total of 0–11. The episodic memory test involved word recall, in which participants were asked to recall two sets of 10 words at two different times. Each correctly recalled word scored 1 point, and the average of the two recall scores (range 0–10) represented episodic memory. The total cognitive score was the sum of the mental status and episodic memory scores, with a possible range of 0–21. CI was defined as a total score in the lowest 10% of the population.
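The scoring rule can be sketched in Python as follows. The function names are hypothetical. Two details are assumptions rather than statements from the text. First, the 10th-percentile cut point is computed with Python's `statistics.quantiles` over whatever reference population is passed in. Second, scores tied with the cut point are counted as impaired.

```python
import statistics


def cognition_score(mental_status: int, first_recall: int, second_recall: int) -> float:
    """Total cognition (0-21): mental status (0-11) + mean of the two 10-word recalls (0-10)."""
    if not 0 <= mental_status <= 11:
        raise ValueError("mental status score must be in 0..11")
    for recall in (first_recall, second_recall):
        if not 0 <= recall <= 10:
            raise ValueError("recall scores must be in 0..10")
    memory = (first_recall + second_recall) / 2  # average of the two recall trials
    return mental_status + memory


def label_ci(scores: list[float]) -> list[bool]:
    """Flag scores at or below the first-decile cut point as cognitive impairment."""
    cut = statistics.quantiles(scores, n=10)[0]  # default 'exclusive' interpolation
    return [score <= cut for score in scores]


# Usage on a small hypothetical reference population
population = [cognition_score(m, r1, r2) for m, r1, r2 in [(11, 8, 6), (9, 5, 4), (4, 1, 0), (10, 7, 7)]]
print(population, label_ci(population))
```

Because the definition is relative, the number of people labeled as impaired depends on the population used to set the cut point.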

2.5. Predictor variables

A total of 21 candidate factors were extracted, based on factors reported in recent studies and on the availability of variables in the database:

(1) demographic characteristics: gender, age, education level, and marital status [14];
(2) health behavior factors: smoking, alcohol consumption [7], self-rated health, sleep duration, depression status [15], and social participation [16];
(3) physical examination indicators: BMI [7];
(4) laboratory indicators: blood creatinine, hemoglobin, glycosylated hemoglobin, total cholesterol, triglycerides, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, fasting blood glucose [8], high-sensitivity C-reactive protein [15], and glomerular filtration rate.

More than 20% of the data were missing for smoking and self-rated health [17], so these two variables were excluded, leaving 19 variables. After univariate and multivariate binary logistic regression analysis, variables with P < 0.2 were included in model construction.
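The 20% missing-data exclusion rule can be sketched as a simple filter. The dictionary-of-columns layout, the use of `None` for missing values, the function name, and the example variables are hypothetical. They illustrate the rule and are not the authors' data pipeline.

```python
def drop_sparse_variables(columns: dict[str, list], max_missing: float = 0.20) -> dict[str, list]:
    """Keep only variables whose share of missing (None) values does not exceed max_missing."""
    kept = {}
    for name, values in columns.items():
        missing_share = sum(v is None for v in values) / len(values)
        if missing_share <= max_missing:
            kept[name] = values
        else:
            print(f"excluding {name}: {missing_share:.0%} missing")
    return kept


# Hypothetical columns: 'smoking' is 60% missing and is dropped; 'hemoglobin' is 20% and is kept
data = {
    "age": [60, 70, 65, 80, 72],
    "smoking": [None, None, 1, None, 0],
    "hemoglobin": [13.1, None, 12.0, 11.5, 14.2],
}
print(sorted(drop_sparse_variables(data)))
```

Variables that pass this filter still contain some missing values. In this study those were imputed afterwards.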

Depression status was assessed using 10 questions. (1) Eight negative mood items evaluated whether the individual was bothered by small things, had difficulty concentrating, felt depressed, struggled to accomplish tasks, experienced fear, had trouble sleeping, felt lonely, or was unable to get along with others. (2) Two positive mood items assessed whether the individual felt hopeful about the future and experienced happiness. Each question was scored on a 4-point scale (0 = rarely or not at all, 1 = not very much, 2 = sometimes or half the time, 3 = most of the time), with reverse scoring applied to the two positive mood items. The scores for all 10 items were summed to create a composite index of depressive symptoms ranging from 0 to 30 points. Higher scores indicate more severe depressive symptoms and poorer mental health.
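A sketch of the depressive-symptom score, assuming each response is already coded 0 to 3. The positions of the two positive mood items (`POSITIVE_ITEMS`) are a hypothetical placeholder and must be set to match the actual questionnaire order. The function name is also hypothetical.

```python
# Hypothetical 0-based positions of "hopeful about the future" and "felt happy"
POSITIVE_ITEMS = {4, 7}


def depression_score(responses: list[int], positive_items: set[int] = POSITIVE_ITEMS) -> int:
    """Sum of 10 items coded 0-3, with positive mood items reverse-scored (3 - x); range 0-30."""
    if len(responses) != 10:
        raise ValueError("expected 10 item responses")
    total = 0
    for index, value in enumerate(responses):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {index} must be coded 0-3, got {value}")
        # Reverse-score positive items so that a higher total always means more symptoms
        total += 3 - value if index in positive_items else value
    return total


print(depression_score([1, 0, 2, 0, 3, 1, 0, 2, 1, 0]))
```

Reverse scoring means that answering "most of the time" to a positive item adds 0 points instead of 3.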

Social participation data were derived from 11 activities listed in the CHARLS questionnaire, which included items such as “Have you engaged in any of the following activities in the past month: interacting with friends, playing mahjong, or playing cards,” among others. Each activity was scored based on the frequency of participation: a value of 3 was assigned if the respondent participated almost daily, 2 if they participated almost weekly, and 1 if participation was infrequent during the month. The scores for all 11 activities were summed to calculate the total social participation score, with higher scores indicating greater levels of social participation [18].
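A sketch of the social participation total. The text specifies 3, 2, and 1 points for near-daily, near-weekly, and infrequent participation. Scoring non-participation as 0, which gives a range of 0 to 33, is an assumption. The frequency labels and the function name are hypothetical.

```python
# Points per activity; "none" -> 0 is an assumption, not stated in the text
FREQUENCY_POINTS = {"almost daily": 3, "almost weekly": 2, "not regularly": 1, "none": 0}


def social_participation_score(frequencies: list[str]) -> int:
    """Sum the frequency points over the 11 CHARLS social activities."""
    if len(frequencies) != 11:
        raise ValueError("expected one frequency per activity (11 activities)")
    try:
        return sum(FREQUENCY_POINTS[f] for f in frequencies)
    except KeyError as err:
        raise ValueError(f"unknown frequency label: {err}") from None


# Hypothetical respondent: sees friends daily, plays mahjong weekly, nothing else
print(social_participation_score(["almost daily", "almost weekly"] + ["none"] * 9))
```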

Body mass index (BMI) was calculated as BMI = weight (kg) / height (m)² [19].

2.6. Statistical analysis

Continuous variables were expressed as mean ± standard deviation and compared between groups using independent-samples t-tests. Categorical variables were presented as n (%) and compared using chi-square tests. Missing values were imputed by multiple imputation using the mice package in R 4.3.1 (RStudio). Binary logistic regression, including univariate and multivariate analyses, was used to screen for factors influencing CI in CKD patients. Variables with P < 0.2 were included in subsequent model construction.
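The univariate screening step (keep predictors with P < 0.2) can be illustrated with a dependency-free sketch. It fits a one-predictor logistic regression by Newton-Raphson and computes a two-sided Wald P value. This is not the authors' code. In practice one would use `glm` in R or an equivalent library. The sketch covers only the univariate step, does not handle complete separation, and uses hypothetical function names.

```python
import math


def _score_and_information(b0, b1, x, y):
    """Gradient and observed information of the log-likelihood for logit(p) = b0 + b1*x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def univariate_logit(x: list[float], y: list[int], max_iter: int = 50, tol: float = 1e-10) -> tuple[float, float]:
    """Fit a one-predictor logistic model by Newton-Raphson; return (slope, two-sided Wald P)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = _score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        if det <= 0:
            raise ValueError("information matrix is singular (e.g. constant predictor)")
        # Newton step = inverse(information) @ gradient, written out for the 2x2 case
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    _, (h00, h01, h11) = _score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))  # sqrt of [inverse information]_11
    z = b1 / se_b1
    return b1, math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability


def screen(candidates: dict[str, list[float]], y: list[int], alpha: float = 0.2) -> list[str]:
    """Names of candidate predictors whose univariate Wald P value is below alpha."""
    return [name for name, x in candidates.items() if univariate_logit(x, y)[1] < alpha]
```

A lenient threshold such as 0.2 is meant to avoid discarding predictors too early. It is a screening device, not a test of significance.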

The study sample consisted of 415 participants, which is a relatively small dataset. The predictors include both categorical (multi-category) and continuous variables, and the outcome is binary. RF is an ensemble of decision trees. SVM is a maximum-margin classifier that can model nonlinear decision boundaries through kernel functions. LR is a linear model, and NNET is a neural network model. Given the limited data, complex models such as deep learning were considered unsuitable, so four machine learning algorithms (NNET, RF, LR, and SVM) were selected for model construction. These algorithms handle mixed data types and perform stably across varying data contexts. RF, SVM, and NNET can capture complex nonlinear relationships, while LR provides an interpretable linear baseline.

The dataset was split into training and test sets at a 75:25 ratio using the `train_test_split` function in Python 3.10. The predictor variables were one-hot encoded and normalized. The classes were imbalanced, with a negative-to-positive ratio of approximately 7:1, so the training set was oversampled using the Synthetic Minority Oversampling Technique (SMOTE). Rather than duplicating minority class samples, SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbors. This balances the class distribution and improves the model’s ability to identify the minority class. To keep the evaluation unbiased, oversampling was not applied to the test set. Model hyperparameters were optimized using 5-fold cross-validation with grid search. Model performance was evaluated using AUC, accuracy, recall, specificity, precision, and F1 score, and calibration was assessed using calibration curves. The model with the highest AUC was selected as the optimal model. The importance of its predictors was evaluated using the SHAP method, and the model was deployed on a web page using the Streamlit library.
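To contrast SMOTE with simple duplication, the sketch below shows the core SMOTE idea: each synthetic point lies on the line segment between a minority sample and one of its k nearest minority-class neighbors. It is a simplified illustration with hypothetical names. Production code would more likely use the `SMOTE` class from the imbalanced-learn package (`imblearn.over_sampling.SMOTE`, via `fit_resample`), applied to training data only. Interpolating one-hot encoded columns yields fractional values, which is one reason variants such as SMOTENC exist for mixed data.

```python
import random


def smote(minority: list[list[float]], n_new: int, k: int = 5, seed: int = 0) -> list[list[float]]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    For each synthetic point: pick a minority sample x at random, pick one of its
    k nearest minority-class neighbors z, and return x + u * (z - x), u ~ U[0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # k nearest minority neighbors of every minority sample, excluding the sample itself
    neighbors = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: sq_dist(x, minority[j]))
        neighbors.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x, z = minority[i], minority[rng.choice(neighbors[i])]
        u = rng.random()  # position along the segment from x to z
        synthetic.append([xi + u * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic


# Hypothetical minority class at the corners of a unit square
print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_new=3, k=2, seed=1))
```

Unlike duplication, every generated point is new, although all of them stay within the region spanned by existing minority samples.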

3. Results

3.1. Description of the characteristics of the research sample

Of the 415 CKD participants, 53 had CI and 362 had normal cognition. Table 1 compares their characteristics. Compared with cognitively normal CKD patients, those with CI were older (mean age 71.7 ± 8.0 years), had lower hemoglobin levels, and had lower social participation. Illiteracy and incomplete elementary education were more common among CKD patients with CI than among cognitively normal patients. Completion of elementary school or junior high school was less common among patients with CI.

[Figure omitted. See PDF.]

3.2. The outcomes of univariate and multivariate analyses

In the univariate and multivariate logistic regression analyses, several variables did not meet the screening threshold (P > 0.2): gender, marital status, alcohol consumption, sleep duration, depression status, BMI, blood creatinine, glycosylated hemoglobin, total cholesterol, triglycerides, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, fasting blood glucose, high-sensitivity C-reactive protein, and glomerular filtration rate. Age, education level, social participation, and hemoglobin met the screening threshold of P < 0.2 (Table 1).

For further descriptive analysis, odds ratios (ORs) for each predictor variable were calculated using univariate and multivariate logistic regression (Table 1). Among the demographic characteristics, age was a risk factor for CI in the CKD population (OR 1.07, 95% CI 1.02–1.13). Compared with illiterate CKD patients, the odds of CI were lower for those who had not completed elementary school (OR 0.34, 95% CI 0.15–0.76), had completed elementary school (OR 0.11, 95% CI 0.04–0.30), or had completed junior high school (OR 0.03, 95% CI 0.00–0.20). Among the health behavior variables, higher social participation was associated with lower odds of CI (OR 0.85, 95% CI 0.71–1.01), although the 95% CI includes 1. Among the laboratory indicators, higher hemoglobin levels were associated with lower odds of CI (OR 0.75, 95% CI 0.60–0.93).
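The ORs above are exponentiated logistic regression coefficients. The sketch below uses hypothetical function names and shows three things. First, it converts a coefficient and its standard error to an OR with a Wald 95% CI. Second, it shows how a per-unit OR compounds over several units, assuming age was entered in years and acts linearly on the log-odds. Third, it shows why an interval such as 0.71–1.01 does not exclude 1.

```python
import math


def odds_ratio_ci(beta: float, se: float, z: float = 1.959964) -> tuple[float, float, float]:
    """OR and Wald 95% CI from a logistic regression coefficient (log-odds) and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def scale_or(or_per_unit: float, units: float) -> float:
    """OR for a change of `units` in a continuous predictor (assumes linearity in the log-odds)."""
    return or_per_unit ** units


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True when the 95% CI lies entirely on one side of 1 (for a Wald CI, P < 0.05)."""
    return upper < 1.0 or lower > 1.0


# An age OR of 1.07 per year compounds to roughly 1.97 over ten years
print(round(scale_or(1.07, 10), 2))
# Social participation: OR 0.85 (95% CI 0.71-1.01) does not exclude 1
print(ci_excludes_one(0.71, 1.01))
```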

3.3. Comparison of predictive performance among machine learning algorithms.

In this study, four machine learning models were developed to predict the probability of CI in CKD patients. Fig 2 shows the ROC curves of the four models on the test set. The NNET model (AUC = 0.918) showed the highest discrimination for CI in CKD patients, followed by the RF model (AUC = 0.889), the LR model (AUC = 0.872), and the SVM model (AUC = 0.760).
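The AUC reported for each model equals the probability that a randomly chosen patient with CI receives a higher predicted risk than a randomly chosen patient without CI. A short pairwise sketch makes that definition concrete. Its cost is quadratic in sample size, which is acceptable for a test set of this size. The function name is hypothetical, and `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
def auc_mann_whitney(y_true: list[int], scores: list[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    positives = [s for s, y in zip(scores, y_true) if y == 1]
    negatives = [s for s, y in zip(scores, y_true) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


print(auc_mann_whitney([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ranked correctly
```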

[Figure omitted. See PDF.]

Table 2 presents detailed performance metrics for the four models in both the training and test sets. In the test set, the RF model achieved the highest accuracy of 0.875, the NNET model exhibited the highest recall of 0.941, the SVM model attained the highest specificity of 0.931, and the LR model demonstrated the highest precision and F1 score of 0.75 and 0.834, respectively.
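Each metric in Table 2 is derived from the test-set confusion matrix at a fixed classification threshold. Below is a minimal sketch with a hypothetical function name. Undefined ratios return NaN here, whereas libraries such as scikit-learn let the caller choose a `zero_division` value.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, recall (sensitivity), specificity, precision and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Undefined ratios (zero denominator) are reported as NaN
        return num / den if den else float("nan")

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),  # harmonic mean of P and R
    }


print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```

Because F1 is the harmonic mean of precision and recall, one model can lead on recall while another leads on F1, as Table 2 shows.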

[Figure omitted. See PDF.]

The calibration curves of the four models are shown in Fig 3. The curves lie close to the diagonal line (y = x), indicating good agreement between predicted and observed risks. The NNET model showed the best calibration.
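A calibration curve groups patients by predicted probability and compares the mean prediction in each group with the observed event rate. Points on y = x indicate perfect calibration. The sketch uses equal-width bins, which is an assumption because the text does not state the binning. This is similar to `sklearn.calibration.calibration_curve` with `strategy="uniform"`. The function name is hypothetical.

```python
def calibration_table(y_true: list[int], probs: list[float], n_bins: int = 5) -> list[tuple[float, float, int]]:
    """Equal-width probability bins: (mean predicted risk, observed event rate, count)."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[index].append((p, y))
    table = []
    for members in bins:
        if members:  # empty bins are skipped
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table


for mean_pred, observed, count in calibration_table([0, 0, 1, 0, 1, 1], [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

With a small test set, individual bins hold few patients, so the observed rates in each bin are noisy.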

[Figure omitted. See PDF.]

3.4. Optimal model SHAP visualization interpretation.

SHAP decomposes each patient’s prediction from the output model into contributions from the individual predictors. The higher a predictor’s SHAP value, the more it increases that patient’s predicted risk of CI. The predicted risk of CI increased progressively with age, and lower education level and lower hemoglobin concentration were associated with higher risk (Fig 4). The overall importance of each variable was quantified as its mean absolute SHAP value. Age, education level, and hemoglobin concentration were the most influential predictors in the final model (Fig 5).
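The SHAP quantities described above can be made concrete for a linear model. If features are treated as independent, the exact SHAP value of feature i in such a model is w_i(x_i - E[x_i]). This closed form is a stand-in chosen for illustration. The NNET model in this study is nonlinear and needs an approximate, model-agnostic explainer instead. Function names and feature values are hypothetical.

```python
def linear_shap(weights: list[float], background: list[list[float]], x: list[float]) -> list[float]:
    """Exact SHAP values for f(x) = b + sum_i w_i * x_i, assuming independent features.

    phi_i = w_i * (x_i - mean_i), where mean_i is the background (reference) mean.
    """
    n = len(background)
    means = [sum(row[i] for row in background) / n for i in range(len(weights))]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


def mean_abs_shap(weights, background, samples, names):
    """Global importance: mean |SHAP| per feature over the explained samples, largest first."""
    phis = [linear_shap(weights, background, x) for x in samples]
    importance = {
        name: sum(abs(phi[i]) for phi in phis) / len(phis) for i, name in enumerate(names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two-feature model f(x) = 0.5 + 2*x1 - 1*x2
weights, intercept = [2.0, -1.0], 0.5
background = [[0.0, 0.0], [2.0, 2.0]]


def predict(v: list[float]) -> float:
    return intercept + sum(w * vi for w, vi in zip(weights, v))


x = [3.0, 0.0]
phi = linear_shap(weights, background, x)
# Local accuracy: prediction at the background mean plus the SHAP values equals f(x)
print(phi, predict(x), predict([1.0, 1.0]) + sum(phi))
```

The same two summaries appear in Fig 4 (per-patient SHAP values) and Fig 5 (mean absolute SHAP per feature), whatever explainer produces the values.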

[Figure omitted. See PDF.]

3.5. Deployment of the optimal model on the Web.

Using the Streamlit library, the optimal model was deployed on a web platform that presents prediction results visually. Users open the web page over the local area network (LAN), enter the relevant indicator values, and click the prediction button to obtain a prediction. The basic interface of the web application is shown in Fig 6.

[Figure omitted. See PDF.]

4. Discussion

4.1. Analysis of variables associated with the risk of CI in patients with CKD.

4.1.1. Age.

This study identifies age as a significant risk factor for CI in the CKD population. As age advances, CKD patients become increasingly vulnerable to cardiovascular risk factors, which can precipitate cerebrovascular pathology. Additionally, age-related physiological and pathological renal changes may elevate blood inflammatory marker levels, heightening susceptibility to cerebrovascular conditions [8] such as microembolism and hypoperfusion. These factors contribute to the development of lacunar cerebral infarcts and cerebral white matter degeneration, ultimately leading to CI. Therefore, healthcare professionals should prioritize early screening and rigorous monitoring of cognitive function in middle-aged and elderly CKD patients [20].

4.1.2. Education level.

The study findings indicate that higher education levels are a protective factor for cognitive function in the CKD population. Individuals with greater educational attainment possess enhanced cognitive reserves and a more comprehensive array of cognitive skills and strategies, which enable them to better mitigate the impact of pathological changes associated with CKD. Moreover, higher education levels correlate with healthier behaviors [21], including balanced dietary patterns, regular physical activity, and reduced consumption of tobacco and alcohol, thereby lowering the risk of cardiovascular and other chronic diseases that can compromise cognitive function. Additionally, individuals with higher education levels often benefit from stronger social support networks and a more positive mental health status, which help alleviate the psychological stress linked to CKD, further safeguarding cognitive function. Hence, healthcare providers should consider patients’ educational backgrounds and tailor health education programs and interventions to address their specific needs effectively.

4.1.3. Social participation.

Social engagement is a protective factor for cognitive function in individuals with CKD. The 2019 World Health Organization guidelines on risk reduction of cognitive decline and dementia indicate that low social engagement is a risk factor for cognitive decline or dementia [22]. Studies have shown that prolonged social isolation affects brain structure and cognitive function in older adults, while active participation in social activities helps improve cognitive function and may slow cognitive decline [23]. Caregivers should encourage CKD patients to participate in social activities according to their interests to reduce the risk of cognitive decline.

4.1.4. Hemoglobin concentration.

In this study, lower hemoglobin concentrations were associated with an increased risk of CI in CKD patients. Research indicates that each 1 g/L increase in hemoglobin reduces the risk of mild cognitive impairment by 1.6% [20]. Low hemoglobin levels may impair cognitive function by inducing cerebral hypoxia or altering the microstructure of cerebral white matter [24,25]. Clinically, anemia is a common comorbidity in CKD patients and predisposes them to CI through reduced oxygen-carrying capacity and inadequate cerebral oxygenation [26]. Regular monitoring of hemoglobin levels, together with iron supplementation and erythropoietin therapy where indicated, can help maintain target hemoglobin levels. Preventing and managing vascular diseases such as atherosclerosis and thrombosis promotes vascular health and helps ensure sufficient cerebral oxygenation. Proactive management of underlying conditions such as hypertension and diabetes is needed to prevent further renal impairment and anemia. This can mitigate the detrimental effects of anemia on cognitive function and improve quality of life in CKD patients.

4.2. Analysis of CI risk prediction model for CKD patients

Traditional CI risk prediction models have mainly targeted maintenance hemodialysis patients and have mostly used logistic regression. Standard logistic regression assumes a linear relationship between the predictors and the log-odds of the outcome. Unless interactions and nonlinear effects are specified explicitly, it does not capture them, which can limit predictive accuracy. The machine learning-based risk prediction models established in this study were designed specifically for the CKD population and provide a practical tool for CI risk prediction in this group. They can help medical staff identify patients at high risk of CI at an early stage and formulate individualized intervention plans to slow disease progression and improve prognosis. They also provide a scientific basis for cognitive health management. The four risk prediction models performed well overall, with AUC values ≥ 0.760 and good discrimination and calibration. The NNET model predicted the risk of CI in CKD patients better than the other three models. The SHAP method was used to visualize how the model works, so that clinical staff can make a general judgment about the risk of CI in CKD patients. The Streamlit library was used to deploy the model on a web page, enabling real-time online prediction and enhancing the model’s utility.

We compared the machine learning model in this study with LR models from previous CKD research. The LR-based nomogram of Jiang et al. [27] for predicting CI after hemodialysis in patients with chronic renal failure achieved an AUC of 0.895 in its validation group. The LR model of Xu et al. [28] for mild CI in non-dialysis CKD patients achieved a validation AUC of 0.897. Both are lower than the AUC of 0.918 achieved by the NNET model in this study. However, these studies differ in population, outcome definition, and validation data, so the comparisons are indicative rather than direct.

4.3. Application of research

The successful deployment of this model through a web-based interface enhances its potential for widespread application in real-world clinical settings. The required input data are easily accessible, ensuring the model’s practicality. Specifically, age and education level can be directly retrieved from medical records and are readily available to physicians during routine consultations. Hemoglobin levels are included in regular blood monitoring and explicitly reported in laboratory results. Social participation is assessed through specific questions in the CHARLS questionnaire, which asks, “Have you engaged in any of the following activities in the past month?”—covering 11 social activities such as interacting with friends, playing mahjong, and card games. Physicians can simply review questionnaire responses, eliminating the need for additional testing equipment or specialized training. This tool enables healthcare professionals to conduct real-time risk assessments, facilitating timely interventions and ultimately improving patient outcomes.

4.4. Limitations

This study has several limitations. First, the relatively small sample size may not adequately cover all subtypes and stages of CKD progression or account for complex individual differences. This could bias the feature patterns learned by the model and limit its applicability and generalizability to similar patient populations in other regions or healthcare settings. Second, blood test indicators are not yet available in CHARLS data after 2015, so this study used a cross-sectional design, and causal relationships between the predictors and CI cannot be established. Third, because of constraints on research time and resources, the four models have not undergone external validation. In the future, external data will be collected, conditions permitting, to further assess the applicability and robustness of the models.

5. Conclusions

In conclusion, the four machine learning-based risk prediction models developed in this study are valuable tools for evaluating CI risk in CKD patients, and the NNET model showed the best predictive performance. The SHAP package and Streamlit library enabled visual interpretation and web-based deployment of the models, facilitating real-time prediction and enhancing their practical applicability. Age, education level, social participation, and hemoglobin concentration were key factors associated with CI in CKD patients. The SHAP analysis of the optimal model identified age, education level, and hemoglobin concentration as the most important predictors. These findings can help healthcare professionals objectively assess the probability of CI in CKD patients and provide a basis for early intervention.

References

1. GBD 2021 Stroke Risk Factor Collaborators. Global, regional, and national burden of stroke and its risk factors, 1990-2021: a systematic analysis for the Global Burden of Disease Study 2021. Lancet Neurol. 2024;23(10):973–1003. pmid:39304265

2. Yang C, Wang W, Wang F, Wang Y, Zhang F, Liang Z, et al. Ambient PM2.5 components and prevalence of chronic kidney disease: a nationwide cross-sectional survey in China. Environ Geochem Health. 2024;46(2):70. pmid:38353840

3. Zhang F, Wang W, Zhang H. Interpretation on the clinical practice guidelines and expert consensus about exercise rehabilitation in chronic kidney disease. Chin J Blood Purif. 2022;21(2):111–4.

4. Ma Y. Management of exercise rehabilitation in chronic kidney disease patients. J Nephrol Dial Transplant. 2022;31(4):353–4.

5. Ortiz A. RICORS2040: the need for collaborative research in chronic kidney disease. Clin Kidney J. 2021;15(3):372–87. pmid:35211298

6. Miao L, Yang C, Zhu B. Research progress on cognitive impairment caused by chronic kidney disease. J Pract Med. 2017;33(11):1882–4.

7. Li W, Zeng L, Yuan S, Shang Y, Zhuang W, Chen Z. Machine learning for the prediction of cognitive impairment in older adults. Front Neurosci. 2023;17:1158141.

8. Feng YD, Huang Q, Zhou YL. Cognitive function and risk factors in patients with chronic kidney disease. J Capital Med Univ. 2023;44(5):788–94.

9. Yu Q, Li H, Wang S. Research progress on cognitive impairment in patients with chronic kidney disease. Chin J Blood Purif. 2021;20(08):509–11.

10. Lou F, Tian W, Tian H, Shi G. Construction and diagnostic value of a cardiovascular index joint prediction model for the risk of cognitive impairment in middle-aged and elderly populations. Nurs Res. 2022;36(05):753–61.

11. Lameire NH, Levin A, Kellum JA, Cheung M, Jadoul M, Winkelmayer WC, et al. Harmonizing acute and chronic kidney disease definition and classification: report of a Kidney Disease: Improving Global Outcomes (KDIGO) Consensus Conference. Kidney Int. 2021;100(3):516–26. pmid:34252450

12. Levey AS, Coresh J, Greene T, Stevens LA, Zhang YL, Hendriksen S, et al. Using standardized serum creatinine values in the modification of diet in renal disease study equation for estimating glomerular filtration rate. Ann Intern Med. 2006;145(4):247–54. pmid:16908915

13. Fang K, Pan XM, Liu J, Wei T, Zhao PY, Qu QM, et al. Study on the correlation between visceral fat index and cognitive impairment under different age stratification. J Nanjing Med Univ (Nat Sci). 2023;43(09):1216–22.

14. Huang Y, Huang Z, Yang Q, Jin H, Xu T, Fu Y, et al. Predicting mild cognitive impairment among Chinese older adults: a longitudinal study based on long short-term memory networks and machine learning. Front Aging Neurosci. 2023;15:1283243. pmid:37937119

15. He Y, Zhao Q, Li X, Zhang T, Wu H. Research progress on cognitive impairment in maintenance hemodialysis patients. Chin J Blood Purif. 2023;22(7):529–32.

16. Wang SJ, Wang WR, Li XW, Liu YF, Wei JM, Zheng JG. Using machine learning algorithms for predicting cognitive impairment and identifying modifiable factors among Chinese elderly people. Front Aging Neurosci. 2022;14:977034.

17. Jakobsen JC, Gluud C, Wetterslev J, Winkel P. When and how should multiple imputation be used for handling missing data in randomised clinical trials - a practical guide with flowcharts. BMC Med Res Methodol. 2017;17(1):162. pmid:29207961

18. Du M, Dai W, Liu J, Tao J. Less Social Participation Is Associated With a Higher Risk of Depressive Symptoms Among Chinese Older Adults: A Community-Based Longitudinal Prospective Cohort Study. Front Public Health. 2022;10:781771. pmid:35223728

19. Chinese Obesity Working Group. Guidelines for the prevention and control of overweight and obesity in Chinese adults (excerpt). Acta Nutr Sin. 2004;26(1):1–4.

20. Yang Q, Xiang Y, Ma G, Cao M, Fang Y, Xu W, et al. A nomogram prediction model for mild cognitive impairment in non-dialysis outpatient patients with chronic kidney disease. Ren Fail. 2024;46(1):2317450. pmid:38419596

21. Shao Y. Research on the intergenerational transmission of health and influencing factors in China. Shandong University of Finance and Economics. 2022.

22. Gong WJ. Mechanisms and rehabilitation treatment of age-related cognitive impairment. J Rehabil. 2024;34(06):529–35.

23. Wang Q, Liu C, Hou X, Hang X, Wu B. Impact of social participation types on cognitive function in the elderly. Chin J Prev Med. 2023;24(07):632–6.

24. Drew DA, Weiner DE, Sarnak MJ. Cognitive Impairment in CKD: Pathophysiology, Management, and Prevention. Am J Kidney Dis. 2019;74(6):782–90. pmid:31378643

25. Mu J, Chen T, Li P, Ding D, Ma X, Zhang M, et al. Altered white matter microstructure mediates the relationship between hemoglobin levels and cognitive control deficits in end-stage renal disease patients. Hum Brain Mapp. 2018;39(12):4766–75. pmid:30062855

26. Pan Y, Gong N, Xie W, Xie D, Jia L, Tao X. Prevalence and influencing factors of mild cognitive impairment in out-patients with chronic kidney disease. Chin Nurs Manag. 2021;21(08):1–17.

27. Jiang J, Ru J, Jiao H, Zhang X, Hou Y. Construction and validation of a predictive model for cognitive impairment after hemodialysis in chronic renal failure patients. Nurs Res. 2023;37(21):3838–44.

28. Xu W, Yang Q, Li L, Xiang Y, Yang Q. Construction and Validation of a Risk Prediction Model for Mild Cognitive Impairment in Non-Dialysis Chronic Kidney Disease Patient. Kidney Blood Press Res. 2024;49(1):556–80. pmid:38952104

Citation: Cao M, Tang B, Yang L, Zeng J (2025) Machine learning-based prediction model for cognitive impairment risk in patients with chronic kidney disease. PLoS One 20(6): e0324632. https://doi.org/10.1371/journal.pone.0324632

About the Authors:

Meng Cao

Contributed equally to this work with: Meng Cao, Bixia Tang

Roles: Conceptualization, Data curation, Methodology, Writing – original draft

Affiliations: School of Nursing, Chengdu Medical College, Chengdu, China, School of Nursing, Sichuan Vocational College of Health and Rehabilitation, Zigong, China

Bixia Tang

Contributed equally to this work with: Meng Cao, Bixia Tang

Roles: Conceptualization, Data curation, Methodology, Writing – review & editing

Affiliation: Department of Urology, Chengdu Seventh People’s Hospital, Chengdu, China

Liwei Yang

Roles: Supervision

E-mail: [email protected] (JZ); [email protected] (LY)

Affiliation: School of Nursing, Chengdu Medical College, Chengdu, China

Jing Zeng

Roles: Supervision

E-mail: [email protected] (JZ); [email protected] (LY)

Affiliation: School of Nursing, Chengdu Medical College, Chengdu, China

ORCID: https://orcid.org/0009-0008-0856-5934


References

1. GBD 2021 Stroke Risk Factor Collaborators. Global, regional, and national burden of stroke and its risk factors, 1990-2021: a systematic analysis for the Global Burden of Disease Study 2021. Lancet Neurol. 2024;23(10):973–1003. pmid:39304265

2. Yang C, Wang W, Wang F, Wang Y, Zhang F, Liang Z, et al. Ambient PM2.5 components and prevalence of chronic kidney disease: a nationwide cross-sectional survey in China. Environ Geochem Health. 2024;46(2):70. pmid:38353840

3. Zhang F, Wang W, Zhang H. Interpretation on the clinical practice guidelines and expert consensus about exercise rehabilitation in chronic kidney disease. Chin J Blood Purif. 2022;21(2):111–4.

4. Ma Y. Management of exercise rehabilitation in chronic kidney disease patients. J Nephrol Dial Transplant. 2022;31(4):353–4.

5. Ortiz A. RICORS2040: the need for collaborative research in chronic kidney disease. Clin Kidney J. 2021;15(3):372–87. pmid:35211298

6. Miao L, Yang C, Zhu B. Research progress on cognitive impairment caused by chronic kidney disease. J Pract Med. 2017;33(11):1882–4.

7. Li W, Zeng L, Yuan S, Shang Y, Zhuang W, Chen Z. Machine learning for the prediction of cognitive impairment in older adults. Front Neurosci. 2023;17:1158141.

8. Feng YD, Huang Q, Zhou YL. Cognitive function and risk factors in patients with chronic kidney disease. J Capital Med Univ. 2023;44(5):788–94.

9. Yu Q, Li H, Wang S. Research progress on cognitive impairment in patients with chronic kidney disease. Chin J Blood Purif. 2021;20(08):509–11.

10. Lou F, Tian W, Tian H, Shi G. Construction and diagnostic value of a cardiovascular index joint prediction model for the risk of cognitive impairment in middle-aged and elderly populations. Nurs Res. 2022;36(05):753–61.

11. Lameire NH, Levin A, Kellum JA, Cheung M, Jadoul M, Winkelmayer WC, et al. Harmonizing acute and chronic kidney disease definition and classification: report of a Kidney Disease: Improving Global Outcomes (KDIGO) Consensus Conference. Kidney Int. 2021;100(3):516–26. pmid:34252450

12. Levey AS, Coresh J, Greene T, Stevens LA, Zhang YL, Hendriksen S, et al. Using standardized serum creatinine values in the modification of diet in renal disease study equation for estimating glomerular filtration rate. Ann Intern Med. 2006;145(4):247–54. pmid:16908915

13. Fang K, Pan XM, Liu J, Wei T, Zhao PY, Qu QM, et al. Study on the correlation between visceral fat index and cognitive impairment under different age stratification. J Nanjing Med Univ (Nat Sci). 2023; 43(09): 1216–22.

14. Huang Y, Huang Z, Yang Q, Jin H, Xu T, Fu Y, et al. Predicting mild cognitive impairment among Chinese older adults: a longitudinal study based on long short-term memory networks and machine learning. Front Aging Neurosci. 2023;15:1283243. pmid:37937119

15. He Y, Zhao Q, Li X, Zhang T, Wu H. Research progress on cognitive impairment in maintenance hemodialysis patients. Chin J Blood Purif. 2023;22(7):529–32.

16. Wang SJ, Wang WR, Li XW, Liu YF, Wei JM, Zheng JG. Using machine learning algorithms for predicting cognitive impairment and identifying modifiable factors among Chinese elderly people. Front Aging Neurosci. 2022;14:977034.

17. Jakobsen JC, Gluud C, Wetterslev J, Winkel P. When and how should multiple imputation be used for handling missing data in randomised clinical trials - a practical guide with flowcharts. BMC Med Res Methodol. 2017;17(1):162. pmid:29207961

18. Du M, Dai W, Liu J, Tao J. Less Social Participation Is Associated With a Higher Risk of Depressive Symptoms Among Chinese Older Adults: A Community-Based Longitudinal Prospective Cohort Study. Front Public Health. 2022;10:781771. pmid:35223728

19. Chinese Obesity Working Group. Guidelines for the prevention and control of overweight and obesity in Chinese adults (excerpt). Acta Nutr Sin. 2004;26(1):1–4.

20. Yang Q, Xiang Y, Ma G, Cao M, Fang Y, Xu W, et al. A nomogram prediction model for mild cognitive impairment in non-dialysis outpatient patients with chronic kidney disease. Ren Fail. 2024;46(1):2317450. pmid:38419596

21. Shao Y. Research on the intergenerational transmission of health and influencing factors in China. Shandong University of Finance and Economics. 2022.

22. Gong WJ. Mechanisms and rehabilitation treatment of age-related cognitive impairment. J Rehabil. 2024;34(06):529–35.

23. Wang Q, Liu C, Hou X, Hang X, Wu B. Impact of social participation types on cognitive function in the elderly. Chin J Prev Med. 2023;24(07):632–6.

24. Drew DA, Weiner DE, Sarnak MJ. Cognitive Impairment in CKD: Pathophysiology, Management, and Prevention. Am J Kidney Dis. 2019;74(6):782–90. pmid:31378643

25. Mu J, Chen T, Li P, Ding D, Ma X, Zhang M, et al. Altered white matter microstructure mediates the relationship between hemoglobin levels and cognitive control deficits in end-stage renal disease patients. Hum Brain Mapp. 2018;39(12):4766–75. pmid:30062855

26. Pan Y, Gong N, Xie W, Xie D, Jia L, Tao X. Prevalence and influencing factors of mild cognitive impairment in out-patients with chronic kidney disease. Chin Nurs Manag. 2021;21(08):1–17.

27. Jiang J, Ru J, Jiao H, Zhang X, Hou Y. Construction and validation of a predictive model for cognitive impairment after hemodialysis in chronic renal failure patients. Nurs Res. 2023;37(21):3838–44.

28. Xu W, Yang Q, Li L, Xiang Y, Yang Q. Construction and Validation of a Risk Prediction Model for Mild Cognitive Impairment in Non-Dialysis Chronic Kidney Disease Patient. Kidney Blood Press Res. 2024;49(1):556–80. pmid:38952104

Word count: 5392


© 2025 Cao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract


Background

The high prevalence of cognitive impairment (CI) in chronic kidney disease (CKD) patients impairs their quality of life and prognosis, yet risk prediction models for CI in this population remain underexplored.

Objective

This study aimed to develop a risk prediction model for CI in CKD patients using machine learning algorithms, with the objective of enhancing risk prediction accuracy and facilitating early intervention.

Methods

A total of 415 CKD patients from the 2015 wave of the China Health and Retirement Longitudinal Study (CHARLS) were included. Participants were categorized into two groups: the CI group (n = 53) and the non-CI group (n = 362). Binary logistic regression, with both univariate and multivariate analyses, was used to identify influencing factors. A CI risk prediction model was then constructed using four machine learning algorithms: Support Vector Machine (SVM), Random Forest (RF), Neural Network (NNET), and Logistic Regression (LR). Predictor importance in the optimal model was assessed with the SHAP method, and the model was deployed on a web platform using the Streamlit library.
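The abstract states that predictor importance was assessed with SHAP but does not describe how SHAP values are computed. As a hedged illustration of what those values mean, the sketch below uses the one case with a simple closed form: a linear log-odds model (as in logistic regression) with features assumed independent, where the SHAP value of feature i is w_i(x_i - E[x_i]). This is not the authors' pipeline; their optimal model was the NNET, which would need an approximate explainer. All coefficients, feature codings, and background rows below are hypothetical, not CHARLS estimates.

```python
import math


def linear_shap(weights, background, x):
    """SHAP values for a linear log-odds model f(x) = b + sum(w_i * x_i),
    assuming independent features: phi_i = w_i * (x_i - mean_i), where
    mean_i is the average of feature i over the background rows."""
    n_rows = len(background)
    means = [sum(row[i] for row in background) / n_rows
             for i in range(len(weights))]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


def log_odds(weights, intercept, x):
    """Linear predictor (log-odds) of a logistic model."""
    return intercept + sum(w * xi for w, xi in zip(weights, x))


# Hypothetical coefficients, codings, and data (NOT estimates from CHARLS).
features = ["age", "education_level", "hemoglobin_g_L", "social_participation"]
weights = [0.08, -0.5, -0.03, -0.6]
intercept = -1.0
background = [
    [60, 2, 130, 1],
    [70, 1, 120, 0],
    [65, 3, 140, 1],
    [75, 0, 110, 0],
]
patient = [80, 0, 100, 0]

phi = linear_shap(weights, background, patient)
for name, value in zip(features, phi):
    print(f"{name:>22}: {value:+.2f}")

# Additivity check: contributions sum to f(patient) minus the mean background output.
baseline = sum(log_odds(weights, intercept, row) for row in background) / len(background)
print(math.isclose(sum(phi), log_odds(weights, intercept, patient) - baseline, abs_tol=1e-9))
```

The final line checks SHAP's additivity property: the per-feature contributions add up to the gap between this patient's log-odds and the average log-odds over the background data, which is what lets a SHAP summary rank predictors on a common scale.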

Results

Logistic regression analysis identified age, hemoglobin concentration, education level, and social participation as significant factors influencing CI. Models based on the NNET, RF, LR, and SVM algorithms achieved areas under the ROC curve (AUCs) of 0.918, 0.889, 0.872, and 0.760, respectively, on the test set. Calibration curves showed that all models were well calibrated. The NNET model had the highest predictive performance. In the SHAP analysis of this optimal model, the most influential predictors were age, education level, and hemoglobin concentration.
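The AUC values reported above are areas under the ROC curve. An equivalent reading, useful when comparing 0.918 with 0.760, is the probability that a randomly chosen CI patient receives a higher predicted risk than a randomly chosen non-CI patient (the Mann-Whitney formulation, with ties counted as one half). The pure-Python sketch below computes AUC that way; the scores and labels are made up for illustration and are not outputs of the study's models.

```python
from itertools import product


def auc_mann_whitney(scores, labels):
    """ROC AUC as the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted CI probabilities (1 = CI, 0 = no CI); not study data.
scores = [0.91, 0.72, 0.40, 0.35, 0.30, 0.12, 0.08]
labels = [1, 1, 0, 1, 0, 0, 0]
print(round(auc_mann_whitney(scores, labels), 3))  # prints 0.917 (11 of 12 pairs ordered correctly)
```

The pairwise loop costs O(P x N), which is acceptable for test sets of a few hundred participants; scikit-learn's `roc_auc_score` returns the same value for untied and tied scores alike.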

Conclusion

Machine learning models are valuable tools for predicting the risk of CI in CKD patients and can assist healthcare professionals in developing appropriate intervention strategies.

Details

Title: Machine learning-based prediction model for cognitive impairment risk in patients with chronic kidney disease

Author: Cao, Meng; Tang, Bixia; Yang, Liwei; Zeng, Jing

First page: e0324632

Section: Research Article

Publication year: 2025

Publication date: Jun 2025

Publisher: Public Library of Science

e-ISSN: 1932-6203

Source type: Scholarly Journal

Language of publication: English

DOI: https://doi.org/10.1371/journal.pone.0324632

ProQuest document ID: 3216324440
