Heredity (2014) 113, 526–532 © 2014 Macmillan Publishers Limited All rights reserved 0018-067X/14
http://www.nature.com/hdy
ORIGINAL ARTICLE
Multiple-trait genome-wide association study based on principal component analysis for residual covariance matrix
H Gao1, T Zhang2, Y Wu3, Y Wu1, L Jiang4, J Zhan4, J Li1 and R Yang4
Given the drawbacks of implementing multivariate analysis for mapping multiple traits in genome-wide association studies (GWAS), principal component analysis (PCA) has been widely used to generate independent super traits from the original multivariate phenotypic traits for univariate analysis. However, parameter estimates in this framework may not be the same as those from the joint analysis of all traits, leading to spurious linkage results. In this paper, we propose to perform the PCA on the residual covariance matrix instead of the phenotypic covariance matrix, based on which multiple traits are transformed to a group of pseudo principal components. The PCA for the residual covariance matrix allows each pseudo principal component to be analyzed separately. In addition, all parameter estimates are equivalent to those obtained from the joint multivariate analysis under a linear transformation. Moreover, a fast least absolute shrinkage and selection operator (LASSO) for estimating the sparse oversaturated genetic model greatly reduces the computational costs of this procedure. Extensive simulations show the statistical and computational efficiency of the proposed method. We illustrate this method in a GWAS for 20 slaughtering traits and meat quality traits in beef cattle.
Heredity (2014) 113, 526–532; doi:10.1038/hdy.2014.57; published online 2 July 2014
INTRODUCTION
With the advance of high-throughput genotyping technology, the paradigm of mapping quantitative trait loci (QTL) based on the linkage analysis of sparse genetic markers has gradually shifted to genome-wide association studies (GWAS) based on many thousands of single-nucleotide polymorphisms (SNPs). At the same time, association studies tend to involve more than one quantitative trait or complex disease located in different regions of chromosomes, allowing the investigation of common genetic risk factors underlying multiple traits. Although these traits could be analyzed separately with a univariate genetic model, statistical methods and algorithms have been developed for simultaneously analyzing multiple normal traits (Jiang and Zeng, 1995; Fang et al., 2008; Ayroles et al., 2009; Zhu and Zhang, 2009; Stephens, 2010; Nadeau and Dudley, 2011; Shriner, 2012), multiple discrete traits (Lange and Whittaker, 2001; Xu...