Abstract

Clinical classification is essential for estimating disease prevalence but is difficult, often requiring complex investigations. The widespread availability of population-level genetic data makes novel genetic stratification techniques a highly attractive alternative. We propose a generalizable mathematical framework for determining disease prevalence within a cohort using genetic risk scores. We compare and evaluate methods based on the means of the genetic risk score distributions; the Earth Mover’s Distance between distributions; a linear combination of kernel density estimates of distributions; and an Excess method. We demonstrate that genetic stratification produces robust prevalence estimates. Specifically, we show that robust estimates of prevalence are still possible even with rarer diseases, smaller cohort sizes and less discriminative genetic risk scores, highlighting the general utility of these approaches. Genetic stratification techniques offer exciting new research tools, enabling unbiased insights into disease prevalence and clinical characteristics unhampered by clinical classification criteria.
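To illustrate the simplest of the approaches listed above, the means-based method, here is a minimal sketch. It assumes the cohort's genetic risk scores are a two-component mixture of a case reference distribution and a non-case reference distribution, so the cohort mean lies between the two reference means in proportion to prevalence. The function name, variable names and toy scores are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def prevalence_from_means(mixture, ref_cases, ref_controls):
    """Estimate the proportion of cases in `mixture` from sample means.

    Assumes (for illustration) that the mixture mean equals
    p * mean(cases) + (1 - p) * mean(controls), and solves for p.
    """
    mu_case = fmean(ref_cases)
    mu_ctrl = fmean(ref_controls)
    if mu_case == mu_ctrl:
        raise ValueError("Reference means are equal; prevalence is not identifiable.")
    p = (fmean(mixture) - mu_ctrl) / (mu_case - mu_ctrl)
    # Sampling noise can push the raw estimate outside [0, 1], so clip it.
    return min(1.0, max(0.0, p))


# Toy genetic risk scores (made up for this example).
ref_cases = [0.28, 0.30, 0.32]           # reference cohort: known cases
ref_controls = [0.18, 0.20, 0.22]        # reference cohort: known non-cases
mixture = [0.30, 0.20, 0.20, 0.20]       # cohort of unknown composition

estimate = prevalence_from_means(mixture, ref_cases, ref_controls)
print(f"Estimated prevalence: {estimate:.2f}")
```

An estimate of this kind uses only the first moment of each distribution, which is one reason a study might compare it against methods that use the full distribution shape, such as those based on the Earth Mover’s Distance or kernel density estimates.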

Estimating disease prevalence in biobanks is prone to error, especially for self-reported traits. Here, the authors propose a method to estimate the prevalence of a disease within a cohort based on genetic risk scores.

Details

Title
Estimating disease prevalence in large datasets using genetic risk scores
Author
Evans, Benjamin D 1; Słowiński, Piotr 2; Hattersley, Andrew T 3; Jones, Samuel E 4; Sharp, Seth 4; Kimmitt, Robert A 3; Weedon, Michael N 4; Oram, Richard A 3; Tsaneva-Atanasova, Krasimira 5; Thomas, Nicholas J 6

1 University of Exeter, Department of Mathematics, Exeter, UK (GRID:grid.8391.3) (ISNI:0000 0004 1936 8024); University of Exeter, Living Systems Institute, Centre for Biomedical Modelling and Analysis, Exeter, UK (GRID:grid.8391.3) (ISNI:0000 0004 1936 8024); University of Bristol, School of Psychological Science, Bristol, UK (GRID:grid.5337.2) (ISNI:0000 0004 1936 7603)
2 University of Exeter, Department of Mathematics, Exeter, UK (GRID:grid.8391.3) (ISNI:0000 0004 1936 8024); University of Exeter, Living Systems Institute, Translational Research Exchange @ Exeter, Exeter, UK (GRID:grid.8391.3) (ISNI:0000 0004 1936 8024)
3 University of Exeter Medical School, Institute of Biomedical & Clinical Science, Exeter, UK (GRID:grid.8391.3) (ISNI:0000 0004 1936 8024); Royal Devon & Exeter NHS Foundation Trust, Exeter, UK (GRID:grid.419309.6) (ISNI:0000 0004 0495 6261)
4 University of Exeter Medical School, Institute of Biomedical & Clinical Science, Exeter, UK (GRID:grid.8391.3) (ISNI:0000 0004 1936 8024)
5 University of Exeter, Department of Mathematics, Exeter, UK (GRID:grid.8391.3) (ISNI:0000 0004 1936 8024); Living Systems Institute, EPSRC Hub for Quantitative Modelling in Healthcare, University of Exeter, Exeter, UK (GRID:grid.8391.3) (ISNI:0000 0004 1936 8024)
6 University of Exeter, Department of Mathematics, Exeter, UK (GRID:grid.8391.3) (ISNI:0000 0004 1936 8024); University of Exeter, Living Systems Institute, Centre for Biomedical Modelling and Analysis, Exeter, UK (GRID:grid.8391.3) (ISNI:0000 0004 1936 8024); Royal Devon & Exeter NHS Foundation Trust, Exeter, UK (GRID:grid.419309.6) (ISNI:0000 0004 0495 6261)
Publication year
2021
Publication date
2021
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2594891122
Copyright
© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.