© 2023 Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. http://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ . Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Objectives

The prevalence of diabetes has increased globally, leading to a significant disease burden and financial cost. Early prediction is crucial to control its prevalence.

Design

A prospective cohort study.

Setting

A nationally representative study of the Irish population.

Participants

8504 individuals aged 50 years or older were included.

Primary and secondary outcome measures

Surveys were conducted to collect over 40 000 variables related to social, financial, health, mental and family status. Feature selection was performed using logistic regression. Different machine/deep learning algorithms were trained, including distributed random forest, extremely randomised trees, a generalised linear model with regularisation, a gradient boosting machine and a deep neural network. These algorithms were integrated into a stacked ensemble to generate the best model. The model was tested using various metrics, such as the area under the curve (AUC), log loss, mean per classification error, mean square error (MSE) and root MSE (RMSE). The SHapley Additive exPlanations (SHAP) method was used to interpret the established model.
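The evaluation metrics named above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the function names, the toy labels and probabilities, and the 0.5 classification threshold used for the mean per-class error (the "mean per classification error" in the text) are all assumptions made for illustration.

```python
import math

def log_loss(y_true, p, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, q in zip(y_true, p):
        q = min(max(q, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(q) + (1 - y) * math.log(1 - q))
    return total / len(y_true)

def mse(y_true, p):
    """Mean squared difference between labels and predicted probabilities."""
    return sum((y - q) ** 2 for y, q in zip(y_true, p)) / len(y_true)

def rmse(y_true, p):
    """Square root of the MSE."""
    return math.sqrt(mse(y_true, p))

def auc(y_true, p):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [q for y, q in zip(y_true, p) if y == 1]
    neg = [q for y, q in zip(y_true, p) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def mean_per_class_error(y_true, p, threshold=0.5):
    """Average of the error rates computed separately within each class.

    The 0.5 threshold is an assumption; the source does not state one.
    """
    errors = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for y, q in zip(y_true, p):
        counts[y] += 1
        if (q >= threshold) != (y == 1):
            errors[y] += 1
    return sum(errors[c] / counts[c] for c in (0, 1)) / 2

# Hypothetical toy data: 1 = developed diabetes, 0 = did not.
y_true = [0, 0, 1, 1]
p_hat = [0.1, 0.4, 0.35, 0.8]

for name, fn in [("AUC", auc), ("log loss", log_loss),
                 ("mean per-class error", mean_per_class_error),
                 ("MSE", mse), ("RMSE", rmse)]:
    print(f"{name}: {fn(y_true, p_hat):.3f}")
```

AUC measures ranking quality only, whereas log loss, MSE and RMSE also penalise poorly calibrated probabilities, which is one reason to report them together.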

Results

After 2 years, 105 baseline features were identified as major contributors to diabetes risk, including sex, low-density lipoprotein cholesterol and cirrhosis. The best model achieved high accuracy, robustness and discrimination in predicting diabetes risk, with an AUC of 0.854, log loss of 0.187, mean per classification error of 0.267, RMSE of 0.229 and MSE of 0.052 in the independent test set. The model was also shown to be well calibrated. The SHAP algorithm provided insights into the decision-making process of the model.
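As a quick consistency check on the reported test-set figures (a reader's calculation, not part of the original abstract), the RMSE should equal the square root of the MSE, and the two values agree to rounding:

```latex
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad 0.229^{2} = 0.052441 \approx 0.052
```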

Conclusions

These findings could help physicians identify high-risk patients early and implement targeted interventions to reduce diabetes incidence.

Details

Title
Tailored machine learning for evaluating the long-term diabetes risk in older individuals: findings from the Irish Longitudinal Study on Ageing (TILDA)
Author
Xu, Xuezhong 1; Xue, Mingyang 2; Yang, Jie 3; Zheng, Hailong 4; Che, Zhifei 5

1 Department of Endocrinology, People's Hospital of Wanning, Wanning, Hainan Province, China
2 Department of Industrial Design, Hubei University of Technology, Wuhan, Hubei Province, China
3 Department of Urology, The Second Affiliated Hospital of Hainan Medical University, Haikou, Hainan, China
4 Department of Endocrinology, The First Affiliated Hospital of Hainan Medical University, Haikou, Hainan, China
5 Department of Urology, The First Affiliated Hospital of Hainan Medical University, Haikou, Hainan, China
First page
e072991
Section
Diabetes and endocrinology
Publication year
2023
Publication date
2023
Publisher
BMJ Publishing Group LTD
e-ISSN
20446055
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2820495556