© 2024 Author(s) (or their employer(s)) 2024. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. http://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ . Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Introduction

No study of type 2 diabetes (T2D) subtyping to date has used linked population-level data for incident and prevalent T2D, incorporated a diverse set of variables, applied explainable methods for cluster characterization, or adhered to an established framework. We aimed to develop and validate machine learning (ML)-informed T2D subtypes using nationally representative data.

Research design and methods

In population-based electronic health records (Clinical Practice Research Datalink; 2006–2020), we studied individuals aged ≥18 years with incident T2D (n=420 448) and included 3787 factors spanning demography, history, examination, biomarkers and medications. Following a published framework, we identified subtypes through nine unsupervised ML methods (K-means, K-means++, K-mode, K-prototype, mini-batch, agglomerative hierarchical clustering, Birch, Gaussian mixture models, and consensus clustering). We characterized clusters using intracluster distributions and explainable artificial intelligence (AI) techniques. We evaluated subtypes for (1) internal validity (within dataset; across methods); (2) prognostic validity (prediction of 5-year all-cause mortality, hospitalization and new chronic diseases); and (3) medication burden.
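As an illustration of one of the listed methods, the following is a minimal sketch of K-means with K-means++ seeding, written with the Python standard library only. It is not the authors' pipeline: the two-dimensional points are synthetic, the four well-separated groups are an assumption chosen to make the example easy to follow, and the function names (`kmeans`, `kmeans_pp_init`, `sq_dist`) are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: each later center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the labels stop changing or n_iter is reached."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Synthetic data (assumption): four tight groups of 50 points each,
# standing in for standardized patient features.
data_rng = random.Random(42)
blob_means = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
points = [
    (data_rng.gauss(mx, 0.3), data_rng.gauss(my, 0.3))
    for mx, my in blob_means
    for _ in range(50)
]

labels, centers = kmeans(points, k=4)
```

Real health-record data mix numeric and categorical variables, which is presumably why the study also lists K-mode and K-prototype; this sketch handles numeric features only.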

Results

Development: We identified four T2D subtypes: metabolic, early onset, late onset and cardiometabolic. Internal validity: Subtypes were predicted with high accuracy (F1 score >0.98). Prognostic validity: 5-year all-cause mortality, hospitalization, new chronic disease incidence and medication burden differed across T2D subtypes. Compared with the metabolic subtype, 5-year risks of mortality and hospitalization in incident T2D were highest in the late-onset subtype (HR 1.95, 1.85–2.05 and 1.66, 1.58–1.75) and lowest in the early-onset subtype (1.18, 1.11–1.27 and 0.85, 0.80–0.90). Incidence of chronic diseases was highest in the late-onset subtype and lowest in the early-onset subtype. Medications: Compared with the metabolic subtype, after adjusting for age, sex, and pre-T2D medications, the late-onset subtype (1.31, 1.28–1.35) and the early-onset subtype (0.83, 0.81–0.85) were most and least likely, respectively, to be prescribed medications within 5 years following T2D onset.
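The internal validity result is reported as an F1 score. The sketch below shows how a per-class and macro-averaged F1 score can be computed when cluster assignments are treated as reference labels and compared with a classifier's predictions. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here, as are the toy labels and the function names `f1_per_class` and `macro_f1`.

```python
def f1_per_class(y_true, y_pred):
    """F1 for each class: 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Toy labels (assumption): four individuals per subtype, with one
# late-onset individual misclassified as cardiometabolic.
subtypes = ["metabolic", "early onset", "late onset", "cardiometabolic"]
y_true = [s for s in subtypes for _ in range(4)]
y_pred = list(y_true)
y_pred[8] = "cardiometabolic"  # index 8 is the first "late onset" entry

scores = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

A single misclassification lowers the F1 of two classes at once: the true class loses recall and the predicted class loses precision.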

Conclusions

In the largest study using ML to date in incident T2D, we identified four distinct subtypes, with potential future implications for etiology, therapeutics, and risk prediction.

Details

Title
Identifying subtypes of type 2 diabetes mellitus with machine learning: development, internal validation, prognostic validation and medication burden in linked electronic health records in 420 448 individuals
Author
Mizani, Mehrdad A 1 ; Dashtban, Ashkan 2 ; Pasea, Laura 2 ; Zeng, Qingjia 3 ; Khunti, Kamlesh 4 ; Valabhji, Jonathan 5 ; Mamza, Jil Billy 6 ; Gao, He 7 ; Morris, Tamsin 7 ; Banerjee, Amitava 8

1 University College London, London, UK; British Heart Foundation Data Science Centre, Health Data Research UK, London, UK
2 University College London, London, UK
3 University College London, London, UK; Peking Union Medical College Hospital, Beijing, China
4 Diabetes Research Department, University of Leicester, Leicester, UK
5 NHS England and NHS Improvement London, London, UK; Imperial College Healthcare NHS Trust, London, UK
6 AstraZeneca Cambridge Biomedical Campus, Cambridge, UK
7 AstraZeneca, Cambridge, UK
8 University College London, London, UK; Barts Health NHS Trust, London, UK
First page
e004191
Section
Epidemiology/Health services research
Publication year
2024
Publication date
2024
Publisher
BMJ Publishing Group LTD
e-ISSN
20524897
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3064221893