ABSTRACT Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis.
(ProQuest: ... denotes formulae omitted.)
MANY modern statistical learning problems involve the analysis of high-dimensional data; this is particularly common in genetic studies where, for instance, phenotypes are regressed on large numbers of predictor variables (e.g., SNPs) concurrently. Implementing these large-p-with-small-n regressions (where n denotes sample size and p represents the number of predictors) poses several statistical and computational challenges, including how to confront the so-called "curse of dimensionality" (Bellman 1961) as well as the complexity of a genetic mechanism that can involve various types and orders of interactions. Recent developments in shrinkage and variable selection estimation procedures have made the implementation of these large-p-with-small-n regressions feasible. Consequently, whole-genome-regression approaches (Meuwissen et al. 2001) are becoming increasingly popular for the analysis and prediction of complex traits in plants (e.g., Crossa et al. 2010), animals (e.g., Hayes et al. 2009; VanRaden et al. 2009), and humans (e.g., Yang et al. 2010; Makowsky et al. 2011; Vazquez et al. 2012; de los Campos et al. 2013b).
In the past decade a large collection of parametric and nonparametric methods has been proposed, and empirical evidence has demonstrated that no single approach performs best across data sets and traits. Indeed, the choice of the...