Introduction
Genetic studies can help identify the contributions of different variants and genes to various processes and pathways. Identifying pleiotropic genes can help us better understand the mechanisms of metabolic pathways1,2. Technological advances have greatly accelerated the availability of multi-omics data (e.g., genomics, epigenomics, transcriptomics, proteomics, metabolomics, glycomics)3. This creates an unprecedented opportunity to characterize and quantify the pleiotropic genes and genetic variants that regulate multiple phenotypes. However, analytic techniques for detecting pleiotropic genes lag behind the demands of increasingly high-dimensional data. Few adequate methods and software tools are available to address the complexity and multimodality of biological data when detecting pleiotropic genes. Valid statistical methods are essential for exploring and understanding the underlying biology, generating new hypotheses, and designing new experiments that may lead to better therapeutics. This work is part of the broader effort to turn data into knowledge that ultimately improves human quality of life.
Our methods development is largely motivated by the goal of identifying pleiotropic genes for various metabolic traits associated with Type 2 diabetes (T2D) in the Metabolic Syndrome in Men (METSIM) cohort4. METSIM is a longitudinal study of 10,197 middle-aged and older Finnish men that seeks to identify genetic variants contributing to the risk of metabolic and cardiovascular disease. T2D is a complex trait that largely involves the interplay among multiple genes5,6. Discovering pleiotropic genetic variants is a key step toward understanding how multiple genetic variants interact in biochemical pathways to influence the risk of developing T2D. Currently, most genome-wide association studies (GWAS) do not formally test for pleiotropy. When pleiotropy is tested, the analysis typically uses a single-trait, single-variant approach that tests the association of each trait with each variant7,8. A second stage then detects pleiotropic variants from GWAS summary statistics9–12. Our investigation in this paper shows that existing approaches based on marginal associations, unlike our proposed joint modeling approach, cannot control the false discovery rate (FDR). They are therefore susceptible to spurious findings in the study of genetic pleiotropy. This is largely because existing marginal methods may overestimate the variance of...