Abstract

Background

Germline copy number variants (CNVs) increase risk for many diseases, yet detecting CNVs and quantifying their contribution to disease risk in large-scale studies is challenging because biological and technical sources of heterogeneity vary across the genome, both within and between samples.

Methods

We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease.
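
As a rough illustration of what a probabilistic, batch-aware copy number estimate can look like, the sketch below computes posterior probabilities of integer copy number for one sample under a simple Gaussian mixture whose component means differ by batch. It is not the CNPBayes implementation: the function names, batch labels, and all parameter values are hypothetical, and the parameters are treated as known rather than estimated hierarchically.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def copy_number_posterior(log_ratio, batch, means, sds, weights):
    """Posterior probability of each copy number state for one sample.

    means[batch][k] and sds[batch][k] are the (hypothetical) batch-specific
    mean and standard deviation of the log ratio for copy number k;
    weights[k] is the prior frequency of copy number k.
    """
    joint = [
        w * normal_pdf(log_ratio, m, s)
        for w, m, s in zip(weights, means[batch], sds[batch])
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Illustrative values only; list index = integer copy number (0, 1, 2).
MEANS = {"A": [-2.0, -0.5, 0.0], "B": [-1.8, -0.3, 0.2]}
SDS = {"A": [0.3, 0.1, 0.1], "B": [0.3, 0.1, 0.1]}
WEIGHTS = [0.01, 0.09, 0.90]

# The same observed log ratio is interpreted differently in each batch.
post_a = copy_number_posterior(-0.3, "A", MEANS, SDS, WEIGHTS)
post_b = copy_number_posterior(-0.3, "B", MEANS, SDS, WEIGHTS)
```

With these made-up parameters, a log ratio of -0.3 is an ambiguous call in batch A but a near-certain single-copy deletion in batch B, which is why ignoring batch can distort copy number calls.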

Results

Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not the individual study sites. Modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated in a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3).
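
One simplified way to see how copy number uncertainty can enter an association model is to sum over the unknown deletion status when evaluating a logistic likelihood for case status. The sketch below does only that. It is an assumption-laden stand-in for the Bayesian regression described above, not the authors' model: there are no priors or posterior sampling, and the function names, participant data, and coefficient values are invented for illustration.

```python
import math


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_log_likelihood(cases, deletion_probs, intercept, log_odds_ratio):
    """Log-likelihood of case status with deletion status summed out.

    cases[i] is 1 for a case and 0 for a control; deletion_probs[i] is the
    (hypothetical) posterior probability that participant i carries the
    deletion. Each participant contributes a two-component mixture.
    """
    risk_del = logistic(intercept + log_odds_ratio)  # risk for carriers
    risk_ref = logistic(intercept)                   # risk for non-carriers
    total = 0.0
    for y, p_del in zip(cases, deletion_probs):
        lik_del = risk_del if y == 1 else 1.0 - risk_del
        lik_ref = risk_ref if y == 1 else 1.0 - risk_ref
        total += math.log(p_del * lik_del + (1.0 - p_del) * lik_ref)
    return total


# Invented data: three cases followed by three controls.
CASES = [1, 1, 1, 0, 0, 0]
DELETION_PROBS = [0.95, 0.90, 0.05, 0.10, 0.02, 0.01]

# Compare no effect (log odds ratio 0) with a positive effect of 1.
ll_null = marginal_log_likelihood(CASES, DELETION_PROBS, 0.0, 0.0)
ll_alt = marginal_log_likelihood(CASES, DELETION_PROBS, 0.0, 1.0)
```

Because each participant's contribution is weighted by the deletion probability, confidently called samples drive the comparison while ambiguous samples contribute little, rather than being forced into a hard call.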

Conclusions

Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases.

Details

Title
Bayesian copy number detection and association in large-scale studies
Author
Cristiano, Stephen; McKean, David; Carey, Jacob; Bracci, Paige; Brennan, Paul; Chou, Michael; Du, Mengmeng; Gallinger, Steven; Goggins, Michael G; Hassan, Manal M; Hung, Rayjean J; Kurtz, Robert C; Li, Donghui; Lu, Lingeng; Neale, Rachel; Olson, Sara
Pages
1-14
Section
Research article
Publication year
2020
Publication date
2020
Publisher
BioMed Central
e-ISSN
1471-2407
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2444112343
Copyright
© 2020. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.