Abstract

Rare Mendelian disorders pose a major diagnostic challenge and collectively affect 300–400 million patients worldwide. Many automated tools aim to uncover causal genes in patients with suspected genetic disorders, but evaluation of these tools is limited by the lack of comprehensive benchmark datasets that include previously unpublished conditions. Here, we present a computational pipeline that simulates realistic clinical datasets to address this deficit. Our framework jointly simulates complex phenotypes and challenging candidate genes and produces patients with novel genetic conditions. We demonstrate the similarity of our simulated patients to real patients from the Undiagnosed Diseases Network and evaluate common gene prioritization methods on the simulated cohort. These prioritization methods recover known gene-disease associations but perform poorly on diagnosing patients with novel genetic disorders. Our publicly available dataset and codebase can be utilized by medical genetics researchers to evaluate, compare, and improve tools that aid in the diagnostic process.

Rare Mendelian disorders pose a major diagnostic challenge, but evaluation of automated tools that aim to uncover causal genes is limited. Here, the authors present a computational pipeline that simulates realistic clinical datasets to address this deficit.

Details

Title
Simulation of undiagnosed patients with novel genetic conditions
Author
Alsentzer, Emily 1; Finlayson, Samuel G. 2; Li, Michelle M. 3; Kobren, Shilpa N. 4; Kohane, Isaac S. 4

1 Harvard Medical School, Department of Biomedical Informatics, Boston, USA (GRID:grid.38142.3c) (ISNI:000000041936754X); Program in Health Sciences and Technology, MIT, Cambridge, USA (GRID:grid.116068.8) (ISNI:0000 0001 2341 2786)
2 Harvard Medical School, Department of Biomedical Informatics, Boston, USA (GRID:grid.38142.3c) (ISNI:000000041936754X); Program in Health Sciences and Technology, MIT, Cambridge, USA (GRID:grid.116068.8) (ISNI:0000 0001 2341 2786); Seattle Children’s Hospital, Department of Pediatrics, Division of Genetic Medicine, Seattle, USA (GRID:grid.240741.4) (ISNI:0000 0000 9026 4165); University of Washington, Division of Medical Genetics, Department of Medicine, Seattle, USA (GRID:grid.34477.33) (ISNI:0000 0001 2298 6657)
3 Harvard Medical School, Department of Biomedical Informatics, Boston, USA (GRID:grid.38142.3c) (ISNI:000000041936754X); Harvard Medical School, Bioinformatics and Integrative Genomics, Boston, USA (GRID:grid.38142.3c) (ISNI:000000041936754X)
4 Harvard Medical School, Department of Biomedical Informatics, Boston, USA (GRID:grid.38142.3c) (ISNI:000000041936754X)
Pages
6403
Publication year
2023
Publication date
2023
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2876182265
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.