Abstract

Summary

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved human mutation rate model, we classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.

Footnotes

* Numerous additions, particularly in the Supplementary Information (including dozens of supplementary figures and tables).

* https://gnomad.broadinstitute.org/

Details

Title
The mutational constraint spectrum quantified from variation in 141,456 humans
Author
Karczewski, Konrad J; Francioli, Laurent C; Tiao, Grace; Cummings, Beryl B; Alföldi, Jessica; Wang, Qingbo; Collins, Ryan L; Laricchia, Kristen M; Ganna, Andrea; Birnbaum, Daniel P; Gauthier, Laura D; Brand, Harrison; Solomonson, Matthew; Watts, Nicholas A; Rhodes, Daniel; Singer-Berk, Moriel; England, Eleina M; Seaby, Eleanor G; Kosmicki, Jack A; Walters, Raymond K; Tashman, Katherine; Farjoun, Yossi; Banks, Eric; Poterba, Timothy; Wang, Arcturus; Seed, Cotton; Whiffin, Nicola; Chong, Jessica X; Samocha, Kaitlin E; Pierce-Hoffman, Emma; Zappala, Zachary; O’Donnell-Luria, Anne H; Minikel, Eric Vallabh; Weisburd, Ben; Lek, Monkol; Ware, James S; Vittal, Christopher; Armean, Irina M; Bergelson, Louis; Cibulskis, Kristian; Connolly, Kristen M; Covarrubias, Miguel; Donnelly, Stacey; Ferriera, Steven; Gabriel, Stacey; Gentry, Jeff; Gupta, Namrata; Jeandet, Thibault; Kaplan, Diane; Llanwarne, Christopher; Munshi, Ruchi; Novod, Sam; Petrillo, Nikelle; Roazen, David; Ruano-Rubio, Valentin; Saltzman, Andrea; Schleicher, Molly; Soto, Jose; Tibbetts, Kathleen; Tolonen, Charlotte; Wade, Gordon; Talkowski, Michael E; Genome Aggregation Database (gnomAD) Consortium; Neale, Benjamin M; Daly, Mark J; MacArthur, Daniel G
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2020
Publication date
Apr 8, 2020
Publisher
Cold Spring Harbor Laboratory Press
Source type
Working Paper
Language of publication
English
ProQuest document ID
2171884208
Copyright
© 2020. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.