Interpreting single-step genomic evaluation as a neural network of three layers: pedigree, genotypes, and phenotypes

Background

The single-step approach [1–3] has been successfully adopted in genomic evaluations when only a subset of phenotyped individuals in the pedigree are genotyped. The single-step approach uses information from genotyped and non-genotyped relatives in two equivalent ways: (a) calculating an improved relationship matrix from pedigree and observed genotypes of genotyped individuals to model the covariances of breeding values for all relatives [1]; or equivalently, (b) imputing genotypes for non-genotyped individuals linearly based on gene contents (i.e., genotypes) of genotyped individuals and the pedigree, then propagating the uncertainty from the imputation by fitting additional random effects accounting for imputation errors in genomic evaluations [3] (see “Appendix”). In practice, the linear imputation in (b) can be obtained by modeling the gene content of each marker as a quantitative trait with a very high heritability and fitting the “expected” gene content as random effects based on covariances defined by the pedigree [4]. Thus, interpretation (b) of the single-step approach involves three sequential layers of information: pedigree, genotypes, and phenotypes. This leads to our new representation of the single-step approach as a neural network of three fully-connected sequential layers of information: pedigree (input layer), genotypes (middle layer), and phenotypes (output layer), as demonstrated in Fig. 1.

Fig. 1 [Images not available. See PDF.]

Framework of single-step NNMM with three fully-connected sequential layers of data: pedigree, genotypes, and phenotypes. Between the layer of pedigree and the layer of genotypes, the gene content of each marker is treated as a quantitative trait, and the pedigree is used to define the random effects covariance matrix. Each node in the middle layer represents the gene content of one marker. “NA” denotes missing values. For example, the nodes in the middle layer may be 2,2,0,1,0 for a genotyped individual or all missing (“NA”) for a non-genotyped individual. The missing gene contents of non-genotyped individuals are sampled conditional on pedigree, genotypes, and phenotypes in MCMC.

In previous work, we have proposed a method named “NNMM” (neural network with mixed models) for quantitative genetics, to extend mixed models (“MM”) to neural networks (“NN”) by adding intermediate layers of data (e.g., gene expression levels) between genotype and phenotype layers [5, 6]. Better prediction accuracies were observed when intermediate omics data were incorporated into genomic prediction using NNMM. In this paper, we show that NNMM can be adopted to incorporate pedigree, genotype, and phenotype information as a unified network named “single-step NNMM”, thus providing a new representation of the single-step approach, and yielding equivalent or higher prediction accuracies, due to the advantages described below.

Single-step NNMM has several advantages over the conventional single-step approach [1–3]. First, in the conventional single-step approach, gene contents of non-genotyped individuals are imputed based on the genotypes of genotyped individuals only through pedigree relationships. This can be considered as pre-analysis processing using Gengler’s method [4], and phenotypes are not included in this pre-analysis. We will show that, in single-step NNMM, such pre-analysis is not needed, and gene contents of non-genotyped individuals can be “imputed” based on pedigree, genotypes, and phenotypes in the Bayesian neural networks using Markov chain Monte Carlo (MCMC). Second, in single-step NNMM, the relationships between genotypes and phenotypes can be approximated by nonlinear activation functions of the neural network to introduce non-linearity between genotypes and phenotypes. Lastly, the conventional single-step approach requires individuals to be genotyped using the same single nucleotide polymorphism (SNP) panel (i.e., same markers for all genotyped individuals), while single-step NNMM can include individuals genotyped by different SNP panels (i.e., different markers for genotyped individuals) without pre-analysis.

In this paper, we present single-step NNMM for genomic evaluation, study its performance, and compare it to the conventional single-step approach [1–3]. Here, we focus on studying the effect of fitting pedigree, genotypes, and phenotypes jointly as three unified fully-connected sequential layers, in which gene contents of non-genotyped individuals are sampled conditional on all three layers of data. The same assumptions of linearity and of individuals being genotyped using the same SNP panel, as in the conventional single-step approach, were used in single-step NNMM (i.e., a linear activation function, and individuals genotyped with the same SNP panel).

Methods

In single-step NNMM, three sequential layers of information, i.e., pedigree, genotypes and phenotypes, form a unified neural network (instead of two separate steps) as demonstrated in Fig. 1. Mixed models were used to infer unknowns, including missing gene content of non-genotyped individuals and marker effects. In detail, at each iteration of the MCMC, unknowns will be sampled using Gibbs sampling from their full conditional posterior distributions at three levels: (1) from the input layer (pedigree) to the middle layer (gene contents): pedigree-based best linear unbiased prediction (PBLUP); (2) from the middle layer (gene contents) to the output layer (phenotypes): genomic BLUP (GBLUP) or Bayesian Alphabet; and (3) sampling missing values in the middle layer (genotypes for non-genotyped individuals) based on three layers of information, including pedigree, observed genotypes of genotyped individuals, and phenotypes.

From input layer (pedigree) to middle layer (gene contents): Pedigree-based BLUP

Assuming there are m markers (i.e., m nodes in the middle layer), for the jth marker, the observed gene content (i.e., genotypes) of genotyped individuals can be modeled as:

$$ z_{g,j} = \mathbf{1} \mu_{j} + W u_{j} + \epsilon_{j}, \tag{1} $$

where $z_{g,j}$ is a vector of observed gene contents (i.e., genotypes coded as 0/1/2) of marker j for genotyped individuals, and $\mu_{j}$ is its overall mean with a flat prior; $u_{j}$ is the vector of gene content deviations (i.e., centered genotypes) for individuals in the pedigree with a prior $u_{j} \sim MVN(0, A \sigma_{u_{j}}^{2})$, where the covariance matrix is the numerator relationship matrix of individuals in the pedigree ($A$), scaled by variance component $\sigma_{u_{j}}^{2}$; and $W$ is the incidence matrix associating $u_{j}$ with $z_{g,j}$. The vector of random residuals, $\epsilon_{j}$, is included to allow the use of mixed model equations, and to account for genotype or pedigree errors [4]. The prior of $\epsilon_{j}$ is $\epsilon_{j} \sim N(0, I \sigma_{\epsilon_{j}}^{2})$. In principle, the heritability of gene content of each SNP, $\sigma_{u_{j}}^{2} / (\sigma_{u_{j}}^{2} + \sigma_{\epsilon_{j}}^{2})$, should be 1 if the genotypes and pedigree information are perfectly correct and, thus, a small value of the estimated heritability indicates that there are errors in either genotypes or pedigree. Variance components are treated as unknowns in single-step NNMM, and scaled inverse chi-square distributions are assigned as prior distributions for variance components.
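The prior covariance of the gene content deviations is defined by the numerator relationship matrix A. As a minimal sketch of how A could be built from a pedigree, the block below uses the standard tabular method. The function name, the 0-for-unknown-parent coding, and the five-animal toy pedigree are illustrative assumptions and are not taken from the JWAS implementation.

```python
def numerator_relationship_matrix(pedigree):
    """Build A with the tabular method.

    pedigree: list of (sire, dam) tuples; individual i + 1 is described by
    pedigree[i]; ids are 1-based, 0 means an unknown parent, and parents
    must appear before their offspring (assumed input convention).
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        s, d = sire - 1, dam - 1  # -1 marks an unknown parent
        # Relationship with every older individual j:
        # average of j's relationships with the two parents of i.
        for j in range(i):
            a_js = A[j][s] if s >= 0 else 0.0
            a_jd = A[j][d] if d >= 0 else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + inbreeding coefficient (half the parents' relationship).
        A[i][i] = 1.0 + (0.5 * A[s][d] if s >= 0 and d >= 0 else 0.0)
    return A


# Toy pedigree: 1 and 2 are founders, 3 and 4 are full sibs, 5 is their offspring.
toy_pedigree = [(0, 0), (0, 0), (1, 2), (1, 2), (3, 4)]
A_toy = numerator_relationship_matrix(toy_pedigree)
```

In this toy pedigree the full sibs 3 and 4 have a relationship of 0.5, and their inbred offspring 5 has a diagonal element of 1.25.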

From middle layer (gene contents) to output layer (phenotypes): GBLUP or Bayesian Alphabet

The phenotypes can be modeled as:

$$ y = \mathbf{1} \mu + \sum_{j=1}^{m} z_{j} \alpha_{j} + e, \tag{2} $$

where $y$ is the vector of phenotypes, $\mu$ is the overall mean with a flat prior, $z_{j}$ is a vector of (observed and sampled) gene contents for the jth marker ($j = 1, \dots, m$), and $\alpha_{j}$ is the corresponding marker effect. Priors from GBLUP [7–9] or the Bayesian Alphabet [10–18], such as BayesC$\pi$, can be used for sampling marker effects or breeding values. The vector $e$ represents the residuals of phenotypes, with prior $e \sim N(0, I \sigma_{e}^{2})$. The prior distribution of $\sigma_{e}^{2}$ itself follows a scaled inverse chi-square distribution.
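To illustrate how marker effects in the phenotype model could be sampled, the following is a minimal single-site Gibbs sampler sketch. It assumes a common normal prior on all marker effects (a ridge-regression-type prior) and, for brevity, treats the variances `var_e` and `var_alpha` as known, whereas single-step NNMM samples variance components. All names and the toy data are hypothetical.

```python
import random


def gibbs_marker_effects(y, Z, var_e, var_alpha, n_iter=1000, burn_in=200, seed=1):
    """Posterior means of marker effects for y = 1*mu + Z*alpha + e.

    Assumes alpha_j ~ N(0, var_alpha), a flat prior on mu, and known variances.
    """
    rng = random.Random(seed)
    n, m = len(y), len(Z[0])
    cols = [[Z[i][j] for i in range(n)] for j in range(m)]
    ztz = [sum(v * v for v in col) for col in cols]
    mu, alpha = 0.0, [0.0] * m
    resid = list(y)  # y - mu - Z*alpha at the current state
    post_mean = [0.0] * m
    for it in range(n_iter):
        # Overall mean: normal full conditional under a flat prior.
        new_mu = rng.gauss(sum(resid) / n + mu, (var_e / n) ** 0.5)
        resid = [r - (new_mu - mu) for r in resid]
        mu = new_mu
        # Marker effects, one at a time.
        for j in range(m):
            col = cols[j]
            rhs = sum(c * r for c, r in zip(col, resid)) + ztz[j] * alpha[j]
            lhs = ztz[j] + var_e / var_alpha
            new_a = rng.gauss(rhs / lhs, (var_e / lhs) ** 0.5)
            delta = new_a - alpha[j]
            resid = [r - c * delta for c, r in zip(col, resid)]
            alpha[j] = new_a
        if it >= burn_in:
            for j in range(m):
                post_mean[j] += alpha[j] / (n_iter - burn_in)
    return post_mean


# Synthetic toy data (not the pig dataset): 200 individuals, 3 markers.
toy_rng = random.Random(0)
true_alpha = [1.0, 0.0, -0.5]
Z_toy = [[toy_rng.choice([0, 1, 2]) for _ in true_alpha] for _ in range(200)]
y_toy = [2.0 + sum(z * a for z, a in zip(row, true_alpha)) + toy_rng.gauss(0.0, 0.5)
         for row in Z_toy]
alpha_hat = gibbs_marker_effects(y_toy, Z_toy, var_e=0.25, var_alpha=1.0)
```

Keeping a running residual vector avoids recomputing the full linear predictor for every marker.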

Sampling missing values in the middle layer (gene contents)

Here we label the matrices related to non-genotyped and genotyped individuals with subscripts “n” and “g”, respectively. For the jth marker, the full conditional posterior distribution of the missing gene content is proportional to the product of its prior and the likelihood:

$$ f(z_{n,j} \mid Z_{n,-j}, Z_{g}, y, A, U, \text{ELSE}) \propto f(y \mid Z_{n}, Z_{g}, \text{ELSE}) \, f(z_{n,j}, z_{g,j} \mid u_{j}, A, \text{ELSE}), \tag{3} $$

where ELSE includes $\mu$, $\alpha_{j}$, $\sigma_{e}^{2}$, $\mu_{j}$, $\sigma_{u_{j}}^{2}$, and $\sigma_{\epsilon_{j}}^{2}$ for $j = 1, \dots, m$, denoting the current values of all other unknowns except $Z_{n} = [z_{n,1}, \dots, z_{n,m}]$ and $U = [u_{1}, \dots, u_{m}]$. Detailed derivations are in “Appendix”.

When a nonlinear relationship is assumed between the middle layer (gene contents) and the output layer (phenotypes), Hamiltonian Monte Carlo (HMC) [19] may be employed for sampling missing genotypes. Note that if a linear relationship is assumed, missing genotypes can be sampled directly from a normal distribution at each iteration.
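For the linear case, the normal draw can be sketched for a single phenotyped, non-genotyped individual i and marker j. The sketch assumes that the gene content model also holds for non-genotyped individuals, so that, given $u_{j}$, the prior of the missing gene content is normal with mean $\mu_{j} + u_{ij}$ and variance $\sigma_{\epsilon_{j}}^{2}$; combining this prior with the normal likelihood of the phenotype gives a normal full conditional by standard conjugacy. This is an illustration under those assumptions, not a reproduction of the derivation in the Appendix, and all function and argument names are hypothetical.

```python
import random


def gene_content_conditional(prior_mean, var_eps, alpha_j, partial_resid, var_e):
    """Mean and variance of the normal full conditional of one missing gene content.

    prior_mean    : mu_j + u_ij (pedigree-based expectation)
    var_eps       : residual variance of the gene content model
    alpha_j       : current effect of marker j
    partial_resid : y_i - mu - sum over k != j of z_ik * alpha_k
    var_e         : residual variance of the phenotype model
    """
    precision = 1.0 / var_eps + alpha_j ** 2 / var_e
    mean = (prior_mean / var_eps + alpha_j * partial_resid / var_e) / precision
    return mean, 1.0 / precision


def sample_gene_content(prior_mean, var_eps, alpha_j, partial_resid, var_e, rng):
    """Draw one value from the full conditional above."""
    mean, var = gene_content_conditional(prior_mean, var_eps, alpha_j,
                                         partial_resid, var_e)
    return rng.gauss(mean, var ** 0.5)


# Toy numbers chosen only so the arithmetic is easy to follow.
cond_mean, cond_var = gene_content_conditional(1.0, 0.5, 1.0, 2.0, 0.5)
draw = sample_gene_content(1.0, 0.5, 1.0, 2.0, 0.5, random.Random(0))
```

In this sketch, when `var_eps` is much smaller than `var_e / alpha_j ** 2`, the pedigree-based term dominates the conditional mean; when `alpha_j` is zero, the draw comes from the prior alone.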

Data analysis

Assuming linear relationships between genotypes (middle layer) and phenotypes (output layer), and that the same SNP panel is used for all genotyped individuals in the conventional single-step approach, we applied the same assumptions in the single-step NNMM (i.e., a linear activation function and individuals genotyped with the same SNP panel) to compare the prediction performance of these two methods. Thus, GBLUP was employed between the middle layer (gene contents) and the output layer (phenotypes) in the single-step NNMM (i.e., SS-NN-GBLUP), and its performance was compared to the conventional single-step GBLUP approach (i.e., SS-GBLUP).

The pig dataset from [20] was used, which includes 3534 genotyped individuals, and a pedigree of 6473 individuals including parents and grandparents of the genotyped animals. Estimates of heritability of gene content for each marker were close to 1 [21]. In our analysis, we used 10,000 randomly-selected SNPs as the genotype data. A random sample of 0.5%, i.e. 50, of these markers was selected as quantitative trait loci (QTL), and they were included in the genotypes. Phenotypes were simulated with a heritability of 0.7 and a phenotypic variance of 1. The 100 youngest individuals, whose genotypes were observed but phenotypes were unknown, were used for testing, while the remaining individuals (i.e., 3434 individuals) with known phenotypes were used for training.
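The phenotype simulation described above can be sketched as follows. The block picks 0.5% of the markers as QTL, rescales the resulting genetic values to a variance of $h^{2}$ times the phenotypic variance, and adds normal residuals. The distribution of QTL effects (standard normal here), the centering of breeding values, and the random toy genotypes are assumptions made for illustration; the text does not specify them.

```python
import random
import statistics


def simulate_phenotypes(genotypes, n_qtl, h2=0.7, var_p=1.0, seed=0):
    """Simulate true breeding values and phenotypes from a genotype matrix."""
    rng = random.Random(seed)
    n_markers = len(genotypes[0])
    qtl = rng.sample(range(n_markers), n_qtl)
    effects = {j: rng.gauss(0.0, 1.0) for j in qtl}  # assumed effect distribution
    raw = [sum(row[j] * a for j, a in effects.items()) for row in genotypes]
    # Center and rescale so the genetic variance equals h2 * var_p.
    mean_raw = statistics.fmean(raw)
    scale = (h2 * var_p / statistics.pvariance(raw)) ** 0.5
    tbv = [(g - mean_raw) * scale for g in raw]
    # Residual variance makes up the rest of the phenotypic variance.
    sd_e = ((1.0 - h2) * var_p) ** 0.5
    y = [g + rng.gauss(0.0, sd_e) for g in tbv]
    return tbv, y, sorted(qtl)


# Random toy genotypes (not the pig data): 200 individuals, 1000 SNPs.
geno_rng = random.Random(42)
toy_genotypes = [[geno_rng.choice([0, 1, 2]) for _ in range(1000)] for _ in range(200)]
n_qtl_toy = round(0.005 * 1000)  # 0.5% of markers, as in the text
tbv_toy, pheno_toy, qtl_toy = simulate_phenotypes(toy_genotypes, n_qtl_toy)
```

With 10,000 SNPs, the same 0.5% rule gives the 50 QTL used in the study.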

To compare the single-step NNMM with the conventional single-step method in this study, different proportions of non-genotyped individuals in the training dataset were considered, including 30, 50, 70, and 90%, and there were 10 replicates for each scenario. For each replicate, individuals were randomly selected to be non-genotyped individuals. The prediction accuracy was calculated as the Pearson correlation between the true breeding values and the estimated breeding values for individuals in the testing dataset. In single-step NNMM, at least 2000 MCMC iterations were applied to ensure convergence.
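The two bookkeeping steps in this design, choosing which training individuals are treated as non-genotyped and scoring predictions by Pearson correlation, can be sketched as below. The function names are hypothetical, and the ids are simple integers standing in for the 3434 training animals.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def choose_non_genotyped(train_ids, proportion, rng):
    """Randomly pick the training individuals whose genotypes are hidden."""
    k = round(proportion * len(train_ids))
    return set(rng.sample(train_ids, k))


train_ids = list(range(3434))
non_genotyped = choose_non_genotyped(train_ids, 0.30, random.Random(1))
perfect = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
reverse = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Repeating the random choice with a different seed for each of the 10 replicates would reproduce the replicate structure described above.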

In single-step NNMM, the heritability ( $h^{2}$ ) of gene content in Eq. 1 can be considered as known to be 1 or unknown. When the heritability is considered known, a value close to 1 (i.e., $h^{2} = 0.999$ ) is used to facilitate the use of mixed model equations. In single-step NNMM, two strategies were used to sample missing genotypes of non-genotyped individuals, i.e., missing genotypes were sampled conditionally on or unconditionally on phenotypes.
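As a worked illustration of why a value just below 1 is used, the variance ratio entering the mixed model equations for the gene content model can be written out. The symbol $\lambda_{j}$ is introduced here for convenience, and the equations are the standard Henderson form for a model with an overall mean and one random effect, shown as a sketch rather than as the exact system solved in JWAS.

```latex
\lambda_{j} = \frac{\sigma_{\epsilon_{j}}^{2}}{\sigma_{u_{j}}^{2}}
            = \frac{1 - h^{2}}{h^{2}}
            = \frac{0.001}{0.999} \approx 0.001001,
\qquad
\begin{bmatrix}
\mathbf{1}'\mathbf{1} & \mathbf{1}'W \\
W'\mathbf{1} & W'W + A^{-1}\lambda_{j}
\end{bmatrix}
\begin{bmatrix}
\hat{\mu}_{j} \\ \hat{u}_{j}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{1}'z_{g,j} \\ W'z_{g,j}
\end{bmatrix}
```

With $h^{2}$ exactly 1, the residual variance would be zero and $\lambda_{j} = 0$, so the $A^{-1}$ term would vanish and the rows for non-genotyped individuals, which have no entries in $W'W$, would leave the system singular.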

Unlike the conventional single-step approach, which requires individuals to be genotyped using the same SNP panel (i.e., identical markers for all genotyped individuals), the single-step NNMM can accommodate individuals genotyped with different SNP panels (i.e., varying markers for genotyped individuals). Thus, we also tested scenarios where SNP sets differed among individuals.

Results

In single-step NNMM (SS-NN-GBLUP, i.e., single-step NNMM with GBLUP between the middle and output layers), when the heritability ( $h^{2}$ ) in Eq. 1 is considered unknown, the estimated heritability for each SNP was very close to 1.0, and similar results for SS-NN-GBLUP were observed regardless of whether the heritability ( $h^{2}$ ) in Eq. 1 was assumed known (i.e., $h^{2} = 1$ ) or unknown. Thus, only the results obtained with SS-NN-GBLUP with $h^{2} = 1$ in Eq. 1 are presented. We compared the results of the conventional single-step method (SS-GBLUP, i.e., conventional single-step GBLUP) and single-step NNMM (SS-NN-GBLUP) when missing genotypes of non-genotyped individuals were sampled conditionally on phenotypes, as described in Eq. 3. The results, presented in Table 1, show the prediction accuracy for various proportions of non-genotyped individuals among the phenotyped individuals. In general, the prediction accuracy of both methods decreased as the proportion of non-genotyped individuals increased. Overall, SS-NN-GBLUP displayed a similar prediction accuracy to the SS-GBLUP approach, with no significant differences observed (pairwise t-test at a significance level of p < 0.01). The correlation between estimated marker effects from these two methods was high. Both the conventional single-step approach and single-step NNMM had significantly higher prediction accuracies compared to GBLUP using genotyped individuals only. The running time of SS-NN-GBLUP was less than 2 h using 20 central processing units (CPUs), while the conventional SS-GBLUP only took a few minutes (see “Discussion”). In addition, similar results were observed for SS-NN-GBLUP whether missing genotypes were sampled conditional on or unconditional on phenotypes.

Table 1. Comparison of prediction performances between conventional single-step (SS-GBLUP) and single-step NNMM (SS-NN-GBLUP)

Method	Proportion of non-genotyped individuals
	30%	50%	70%	90%
SS-GBLUP	0.808 (0.005)	0.757 (0.007)	0.694 (0.013)	0.558 (0.015)
SS-NN-GBLUP	0.810 (0.006)	0.754 (0.007)	0.691 (0.013)	0.559 (0.014)

Values are the average prediction accuracies from 10 replications, with the standard deviation in parentheses.

We also tested scenarios where SNP sets differed among individuals, and the prediction accuracies aligned with our expectations. For example, when we randomly introduced 50% missing values in the genotype covariate matrix of the training dataset, the prediction accuracy was 0.767, with a standard deviation of 0.011 across 10 replications. This result is reasonable when compared to our previous findings. Note that when all individuals were genotyped, the prediction accuracy of GBLUP was 0.849.
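The random introduction of missing values into the genotype covariate matrix can be sketched as follows. The block hides a fixed fraction of cells chosen uniformly at random and marks them with `None`, standing in for “NA”. The function name and the small toy matrix are hypothetical, and the exact masking procedure used in the study may differ.

```python
import random


def mask_genotypes(Z, missing_rate, rng):
    """Return a copy of Z with a random fraction of cells set to None."""
    n, m = len(Z), len(Z[0])
    cells = [(i, j) for i in range(n) for j in range(m)]
    hidden = rng.sample(cells, round(missing_rate * n * m))
    masked = [list(row) for row in Z]  # leave the input matrix untouched
    for i, j in hidden:
        masked[i][j] = None
    return masked


# Toy matrix: 4 individuals by 6 markers.
Z_small = [[0, 1, 2, 2, 0, 1],
           [2, 2, 0, 1, 0, 0],
           [1, 1, 1, 0, 2, 2],
           [0, 0, 2, 1, 1, 2]]
Z_masked = mask_genotypes(Z_small, 0.5, random.Random(7))
n_missing = sum(v is None for row in Z_masked for v in row)
```

Each individual then has its own pattern of observed and missing markers, which mimics genotyping with different SNP panels.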

Discussion

In this paper, we propose a new method named single-step NNMM, which presents a novel framework for single-step methods by treating gene content (i.e., genotypes) as a middle layer of data between pedigree and phenotypes. Single-step NNMM represents single-step genomic evaluations as a neural network of three sequential layers: pedigree, genotypes, and phenotypes. Single-step NNMM is based on linear mixed models, i.e. PBLUP between the input layer (pedigree) and middle layer (gene content) and GBLUP/Bayesian Alphabet between the middle layer and the output layer (phenotype). This approach allows us to benefit from the implementation and optimization of well-studied linear mixed models for genomic prediction. Using the pedigree-based relationship matrix as an input of a neural network is not new. Gianola et al. [22] have shown that PBLUP is equivalent to a single (middle) layer neural network with a linear activation function, when the input is a pedigree-based relationship matrix. However, single-step NNMM extends conventional mixed models to a neural network with heterogeneous input data across multiple layers (more than two, i.e., pedigree, genotypes, phenotypes), whereas conventional mixed models or neural networks only consider two layers of data (input and output layers).

Compared to the conventional single-step method, the three sequential layers of information in single-step NNMM form a unified network, rather than two separate steps. Thus, the unobserved gene contents of non-genotyped individuals can be sampled based on information from all three layers: pedigree, observed genotypes of genotyped individuals, and phenotypes. Single-step NNMM offers a highly flexible framework for single-step methods, which allows nonlinear relationships between gene contents and phenotypes, as well as the genotyping of different individuals using distinct SNP panels (i.e., various patterns of missing genotypes). The single-step NNMM has been implemented in the software package “JWAS” [23, 24].

In our comparison, the same assumptions of linearity and identical SNP panels as in the conventional single-step approach were used in single-step NNMM. Overall, when some individuals were not genotyped, single-step NNMM had similar prediction accuracy to the conventional single-step approach, and the correlation between estimated marker effects from these two methods was high. Both the conventional single-step approach and single-step NNMM had significantly higher prediction accuracies compared to GBLUP using genotyped individuals only.

As we have described, in addition to allowing non-linearity and individuals being genotyped with different SNP panels, a difference between single-step NNMM and the conventional single-step approach is in genotype imputation. Besides genotypes and pedigree, phenotypic information can also be used in the sampling of missing genotypes for non-genotyped individuals in single-step NNMM. However, similar prediction accuracies were observed regardless of whether missing genotypes were sampled conditional on or unconditional on phenotypes in SS-NN-GBLUP. For polygenic traits, this observation may be attributed to at least two reasons. First, a single SNP contributes only a small proportion of heritability and the correlation between the gene content of one SNP and phenotypes is generally low. As a result, incorporating phenotypic information into genotype imputation may introduce more noise than useful information. Second, phenotypes aid only in the imputation of causal variants and of variants in high linkage disequilibrium with causal variants, yet phenotypic information is employed in the imputation of all SNPs and can therefore introduce errors in marker imputation. Nevertheless, when genotypes of relatives provide limited information (e.g., most individuals are not genotyped), the additional benefit of including phenotypic information in genotype imputation may not be negligible.

To enhance the applicability of our method to more realistic datasets, we implemented parallel computing using Message Passing Interface (MPI) [25], taking advantage of multiple computer processors’ capabilities. Ideally, with a sufficient number of computer processors, the computation time from the input layer (pedigree) to the middle layer (gene content) would be equal to the time required for one PBLUP, which should be relatively fast. The speed improvement from parallel computing is limited, however, by the hardware used. In our analysis of the pig dataset (i.e., 6473 individuals in the pedigree, 10,000 SNPs, and 3534 individuals with genotypes), running 2000 MCMC iterations on this dataset using 20 central processing units (CPUs) took less than 2 h for single-step NNMM, while the conventional single-step approach only took a few minutes. In future research, we plan to explore the use of graphics processing units (GPUs), which are commonly employed in neural networks, and more advanced parallel computing strategies (e.g., [26, 27]).
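Because each marker has its own gene content model, the work between the input and middle layers can be divided by marker. The sketch below shows one way such a division could look: markers are split into near-equal contiguous chunks, and each chunk is handed to a worker. It uses Python threads from the standard library purely to illustrate the partitioning pattern; it is not MPI, it is not the JWAS implementation, and `process_chunk` is a placeholder (column means) standing in for the per-marker PBLUP sampling.

```python
from concurrent.futures import ThreadPoolExecutor


def split_markers(n_markers, n_workers):
    """Split marker indices into near-equal contiguous chunks, one per worker."""
    base, extra = divmod(n_markers, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def process_chunk(columns):
    """Placeholder for per-marker work; here it only returns column means."""
    return [sum(col) / len(col) for col in columns]


def run_in_parallel(marker_columns, n_workers):
    """Process all markers chunk by chunk and return results in marker order."""
    chunks = split_markers(len(marker_columns), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(
            lambda idx: process_chunk([marker_columns[j] for j in idx]), chunks)
        return [value for part in parts for value in part]


chunk_sizes = [len(c) for c in split_markers(10000, 20)]
toy_means = run_in_parallel([[0, 2], [1, 1], [2, 2]], n_workers=2)
```

With 10,000 SNPs and 20 workers, each worker would handle 500 markers per iteration under this split.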

Acknowledgements

Dr. Rohan Fernando’s research, mentorship, and teaching have fundamentally shaped our studies. He has entirely changed HC’s career path. HC’s mentoring approach has also been deeply impacted by Rohan, as reflected in this paper.

Author contributions

HC conceived the study. HC and TZ developed the methods, implemented the algorithms, planned the validations, and wrote the manuscript. Both authors read and approved the final manuscript.

Funding

This work was supported by the United States Department of Agriculture, Agriculture and Food Research Initiative National Institute of Food and Agriculture Competitive Grant No. 2021-67015-33412 and No. 2023-67015-39564.

Availability of data and materials

Pig genotypes and pedigree used in the analysis are publicly available in [20]. The simulated phenotypes and all scripts are available at https://github.com/zhaotianjing/SSNNMM. The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

Declarations

Competing interests

The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Legarra, A; Aguilar, I; Misztal, I. A relationship matrix including full pedigree and genomic information. J Dairy Sci; 2009; 92, pp. 4656-63. [DOI: https://dx.doi.org/10.3168/jds.2009-2061] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19700729]

2. Christensen, OF; Lund, MS. Genomic prediction when some animals are not genotyped. Genet Sel Evol; 2010; 42, 2. [DOI: https://dx.doi.org/10.1186/1297-9686-42-2] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20105297] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2834608]

3. Fernando, RL; Dekkers, JC; Garrick, DJ. A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genet Sel Evol; 2014; 46, 50. [DOI: https://dx.doi.org/10.1186/1297-9686-46-50] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25253441] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262255]

4. Gengler, N; Mayeres, P; Szydlowski, M. A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle. Animal; 2007; 1, pp. 21-8. [DOI: https://dx.doi.org/10.1017/S1751731107392628] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22444206]

5. Zhao, T; Zeng, J; Cheng, H. Extend mixed models to multi-layer neural networks for genomic prediction including intermediate omics data. Genetics; 2022; 221, [DOI: https://dx.doi.org/10.1093/genetics/iyac034] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35212766] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9071534]

6. Zhao, T; Fernando, R; Cheng, H. Interpretable artificial neural networks incorporating Bayesian alphabet models for genome-wide prediction and association studies. G3 (Bethesda); 2021; 11, [DOI: https://dx.doi.org/10.1093/g3journal/jkab228] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34499126]

7. Habier, D; Fernando, RL; Dekkers, JC. The impact of genetic relationship information on genome-assisted breeding values. Genetics; 2007; 177, pp. 2389-97. [DOI: https://dx.doi.org/10.1534/genetics.107.081190] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18073436] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2219482]

8. VanRaden, PM. Efficient methods to compute genomic predictions. J Dairy Sci; 2008; 91, pp. 4414-23. [DOI: https://dx.doi.org/10.3168/jds.2007-0980] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18946147]

9. Hayes, BJ; Visscher, PM; Goddard, ME. Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res; 2009; 91, pp. 47-60. [DOI: https://dx.doi.org/10.1017/S0016672308009981]

10. Meuwissen, THE; Hayes, BJ; Goddard, ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics; 2001; 157, pp. 1819-29. [DOI: https://dx.doi.org/10.1093/genetics/157.4.1819] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/11290733] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1461589]

11. Kizilkaya, K; Fernando, RL; Garrick, DJ. Genomic prediction of simulated multibreed and purebred performance using observed fifty thousand single nucleotide polymorphism genotypes. J Anim Sci; 2010; 88, pp. 544-51. [DOI: https://dx.doi.org/10.2527/jas.2009-2064] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19820059]

12. Habier, D; Fernando, RL; Kizilkaya, K; Garrick, DJ. Extension of the Bayesian alphabet for genomic selection. BMC Bioinform; 2011; 12, 186. [DOI: https://dx.doi.org/10.1186/1471-2105-12-186]

13. Park, T; Casella, G. The Bayesian lasso. J Am Stat Assoc; 2008; 103, pp. 681-6. [DOI: https://dx.doi.org/10.1198/016214508000000337]

14. Cheng, H; Qu, L; Garrick, DJ; Fernando, RL. A fast and efficient Gibbs sampler for BayesB in whole-genome analyses. Genet Sel Evol; 2015; 47, 80. [DOI: https://dx.doi.org/10.1186/s12711-015-0157-x] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26467850] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4606519]

15. Gianola, D; Fernando, RL. A multiple-trait Bayesian Lasso for genome-enabled analysis and prediction of complex traits. Genetics; 2020; 214, pp. 305-31. [DOI: https://dx.doi.org/10.1534/genetics.119.302934] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31879318]

16. Erbe, M; Hayes, B; Matukumalli, L; Goswami, S; Bowman, P; Reich, C et al. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J Dairy Sci; 2012; 95, pp. 4114-29. [DOI: https://dx.doi.org/10.3168/jds.2011-5019] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22720968]

17. Moser, G; Lee, SH; Hayes, BJ; Goddard, ME; Wray, NR; Visscher, PM. Simultaneous discovery, estimation and prediction analysis of complex traits using a Bayesian mixture model. PLoS Genet; 2015; 11, [DOI: https://dx.doi.org/10.1371/journal.pgen.1004969] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25849665] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4388571]

18. Cheng, H; Kizilkaya, K; Zeng, J; Garrick, D; Fernando, R. Genomic prediction from multiple-trait Bayesian regression methods using mixture priors. Genetics; 2018; 209, pp. 89-103. [DOI: https://dx.doi.org/10.1534/genetics.118.300650] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29514861] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5937171]

19. Betancourt M. A conceptual introduction to Hamiltonian Monte Carlo. arXiv. 2018:1701.02434.

20. Cleveland, MA; Hickey, JM; Forni, S. A common dataset for genomic analysis of livestock populations. G3 (Bethesda); 2012; 2, pp. 429-435. [DOI: https://dx.doi.org/10.1534/g3.111.001453] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22540034]

21. Forneris, NS; Legarra, A; Vitezica, ZG; Tsuruta, S; Aguilar, I; Misztal, I et al. Quality control of genotypes using heritability estimates of gene content at the marker. Genetics; 2015; 199, pp. 675-81. [DOI: https://dx.doi.org/10.1534/genetics.114.173559] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25567991] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4349063]

22. Gianola, D; Okut, H; Weigel, KA; Rosa, GJ. Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genet; 2011; 12, 87. [DOI: https://dx.doi.org/10.1186/1471-2156-12-87] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/21981731] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3474182]

23. Cheng H, Fernando R, Garrick D. JWAS: Julia implementation of whole-genome analysis software. In: Proceedings of the 11th world congress on genetics applied to livestock production. Auckland; 11–16 February 2018.

24. Cheng H, Fernando R, Garrick D, Zhao T, Qu J. JWAS version 2: leveraging biological information and high throughput phenotypes into genomic prediction and association. In: Proceedings of the 12th world congress on genetics applied to livestock production. Rotterdam; 3–8 July 2022.

25. Byrne S, Wilcox LC, Churavy V. MPI.jl: Julia bindings for the message passing interface. In: Proceedings of the JuliaCon conferences. virtual; 28–30 July 2021.

26. Zhao, T; Fernando, R; Garrick, D; Cheng, H. Fast parallelized sampling of Bayesian regression models for whole-genome prediction. Genet Sel Evol; 2020; 52, 16. [DOI: https://dx.doi.org/10.1186/s12711-020-00533-x] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32293243] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7087391]

27. Breen, EJ; MacLeod, IM; Ho, PN; Haile-Mariam, M; Pryce, JE; Thomas, CD et al. BayesR3 enables fast MCMC blocked processing for largescale multi-trait genomic prediction and QTN mapping analysis. Commun Biol; 2022; 5, 661. [DOI: https://dx.doi.org/10.1038/s42003-022-03624-1] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35790806] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9256732]

© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

The single-step approach has become the most widely-used methodology for genomic evaluations when only a subset of phenotyped individuals in the pedigree are genotyped, where the genotypes for non-genotyped individuals are imputed based on gene contents (i.e., genotypes) of genotyped individuals through their pedigree relationships. We propose a new method named single-step neural network with mixed models (NNMM) to represent single-step genomic evaluations as a neural network of three sequential layers: pedigree, genotypes, and phenotypes. These three sequential layers of information create a unified network instead of two separate steps, allowing the unobserved gene contents of non-genotyped individuals to be sampled based on pedigree, observed genotypes of genotyped individuals, and phenotypes. In addition to imputation of genotypes using all three sources of information (phenotypes, genotypes, and pedigree), single-step NNMM provides a more flexible framework that allows nonlinear relationships between genotypes and phenotypes, and allows individuals to be genotyped with different single-nucleotide polymorphism (SNP) panels. The single-step NNMM has been implemented in the software package “JWAS”.

Details

Title

Interpreting single-step genomic evaluation as a neural network of three layers: pedigree, genotypes, and phenotypes

Author

Zhao, Tianjing¹; Cheng, Hao²

¹ University of California Davis, Department of Animal Science, Davis, USA (GRID:grid.27860.3b) (ISNI:0000 0004 1936 9684); University of California Davis, Integrative Genetics and Genomics Graduate Group, Davis, USA (GRID:grid.27860.3b) (ISNI:0000 0004 1936 9684)
² University of California Davis, Department of Animal Science, Davis, USA (GRID:grid.27860.3b) (ISNI:0000 0004 1936 9684)

Publication year

2023

Publication date

Dec 2023

Publisher

BioMed Central

ISSN

0999193X

e-ISSN

12979686

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1186/s12711-023-00838-7

ProQuest document ID

2871981028
