Multi‐trait genomic prediction improves selection

Abbreviations

BLUE
best linear unbiased estimate

GEBV
genomic estimated breeding value

GS
genomic selection

ICP-AES
inductively coupled plasma atomic emission spectrometry

MT
multi-trait

MT-GS
multi-trait genomic selection

NDSU
North Dakota State University

PBT
partially balanced testing

SNP
single-nucleotide polymorphism

UNI
univariate

INTRODUCTION

In recent times, demand for genetic improvement of nutritional traits in crops has increased because of the growing demand for plant-based protein, mineral elements, and vitamins. Pulse crops are known to have high protein value and are rich in micronutrients with potential to alleviate hidden hunger (micronutrient deficiency) (Mudryj et al., 2014; Wadhawan et al., 2021; Bari et al., 2021). However, phenotyping and screening for nutritional traits, such as protein, manganese, selenium, copper, zinc, iron, potassium, phosphorus, magnesium, and calcium, is expensive and time consuming, especially in the early yield testing stage with hundreds of lines to evaluate. Consequently, this becomes a major limitation in a public breeding program aiming to have a biofortified product profile. Because of advancement in next-generation sequencing and development of genotyping platforms, the cost of genotyping is becoming relatively less expensive than the cost of phenotyping; thus, genomic selection (GS) that uses whole-genome information to predict genomic estimated breeding value (GEBV) of unobserved genotypes is gaining traction as breeders’ choice of selection method (Poland et al., 2012; Zhao et al., 2021; Bassi et al., 2016; Santantonio et al., 2020; Atanda et al., 2021a). Though GS research in pea (Pisum sativum L.) is scanty, the available studies (Annicchiarico et al., 2019; Crosta et al., 2021; Bari et al., 2021) showed GS potential to predict the genetic merit of pea breeding lines and germplasm accessions. Following Bari et al. (2021), the North Dakota State University (NDSU) pulse breeding program is prioritizing the use of GS, particularly in the preliminary yield trial (or Stage 1), where effectiveness of phenotypic selection is limited by phenotyping in one or two locations because of seed multiplication challenges for hundreds of lines for multi-location trials.
Consequently, the NDSU pulse breeding program is focused on redesigning the preliminary yield trial from phenotypic based selection to GS to reduce the number of seeds for phenotyping and increase selection accuracy for advancement of promising lines to advanced yield testing stage.

In general, GS is often performed with univariate (UNI) models that assume genetic correlation among traits to be zero (Jia & Jannink, 2012; Montesinos-López et al., 2016, 2018; Bhatta et al., 2020; Gaire et al., 2022). However, in practice, breeders select for multiple traits that are genetically correlated. To harness the genetic correlation between traits and among genotypes to improve predictive ability, multi-trait (MT) models, which are the generalization of UNI models, have been investigated. Several empirical studies (Calus & Veerkamp, 2011; Montesinos-López et al., 2018; Bhatta et al., 2020; Gaire et al., 2022) have reported improved predictive ability in different crops using MT models that allow borrowing of information between correlated traits and among genotypes compared with UNI models. Predictive ability in MT-GS improves as correlation between traits increases (Jia & Jannink, 2012; Okeke et al., 2017; Montesinos-López et al., 2018, 2019; Neyhart et al., 2019); however, trait heritability varies and is a key limiting factor to the upper bound of predictive ability (Manolio et al., 2009; Yang et al., 2015; Schopp et al., 2017; Zhang et al., 2019; Atanda et al., 2021a). These factors (heritability and genetic correlation) will likely influence the composition of traits in the training and the prediction sets in MT-GS and ultimately the predictive ability.

In the MT-GS models, the training set consists of individuals with phenotypic records for all traits to predict the genetic values of unphenotyped individuals in the prediction set using genome-wide marker information. The crucial question is how to design a MT-GS strategy that can optimize the trade-off between the limiting factors and accuracy of predicting the genetic value of the traits. A few studies (Montesinos-López et al., 2016, 2018, 2019; Guo et al., 2014; Bhatta et al., 2020; Gaire et al., 2022) have highlighted the importance of each factor to predictive ability; however, nothing is known about their combinations on composition of traits in the training and prediction set in the context of MT-GS. Consequently, we investigated the influence of the combination of limiting factors on composition of traits in the training and prediction sets to further guide the use of MT-GS in a breeding program.

Further, in most MT-GS cross-validation studies (Montesinos-López et al., 2016, 2018, 2019; Guo et al., 2014; Bhatta et al., 2020; Gaire et al., 2022), the same set of genotypes overlap across traits for testing prediction models (Supplemental Figure S1a,c). In such a scenario, the same set of genotypes have phenotypic records for all traits, while the other genotypes serve as a prediction set (partially balanced testing [PBT] strategy ); however, results from multi-environment GS studies have shown that this approach is less optimal than sparse testing using genomic prediction where phenotyping of genotypes is split across environments (Burgueño et al., 2012; Jarquin et al., 2020; Atanda et al., 2021b, 2022). In this study, we extend the sparse phenotyping method into the MT-GS framework in which the phenotyping of lines is split across traits (Supplemental Figure S1b,d). This strategy can improve predictive ability in MT-GS by maximizing use of information across traits and genotypes. To further evaluate the potential of GS in the NDSU pulse breeding program and how it can be efficiently deployed to improve genetic gain, the following were our objectives in this study: (a) determine the efficiency of MT-GS vs. UNI-GS in predicting nutritional traits in pea, (b) determine the optimal method for combining traits in MT-GS using heritability and genetic correlation between traits as metrics, and (c) identify optimal resource allocation for phenotyping nutritional traits in the early yield testing stage by comparing the predictive ability of sparse and partial balanced testing.

MATERIALS AND METHODS

Genetic materials and field or greenhouse evaluation

The genetic material consisted of 282 pea lines (DS1) from NDSU pulse breeding program and 192 USDA pea accessions (DS2) previously described in Bari et al. (2021). The NDSU lines were planted in augmented row–column design with five repeated checks in the 2020–2021 growing season at the North Dakota Agricultural Experiment Station, Minot, ND, USA (27°29′ N, 109°56′ W). Seeds were treated with fungicide and insecticide prior to planting. At planting, 30 seeds were planted on a 152- × 60-cm plot size with 30-cm spacing between plots. Plots were harvested at physiological maturity (90–120 d after planting) and dried to 15% moisture content. For the USDA pea accessions, six plants of each accession were grown in 5-L black plastic pots filled with a synthetic soil mix composed of two parts Metro-Mix 360 (Scotts-Sierra Horticultural Products Co.) and one part vermiculite (Strong-Lite Medium Vermiculite, Sun Gro Horticulture Co.). Plants were grown in a controlled environment greenhouse with a temperature regime of 22 ± 3 (day) and 20 ± 3 °C (night) with a relative humidity ranging from 45 to 65% throughout the day–night cycle. Sunlight was supplemented with metal halide lamps set to a 15-h day, 9-h night cycle (lights on at 700 h). In order to maintain an adequate supply of all mineral nutrients, a complete fertilizer mixture was provided to each pot on a daily basis. Pots were irrigated with an automated drip irrigation system (one drip line to each pot); the system was regulated with a timer that delivered nutrient solution twice a day (younger plants) or three times a day (older plants) in sufficient quantity to saturate the soil mass at each irrigation. 
The nutrient solution contained the following concentrations of mineral salts: 1.0 mM KNO₃, 0.4 mM Ca(NO₃)₂, 0.1 mM MgSO₄, 0.15 mM KH₂PO₄, 25 μM CaCl₂, 25 μM H₃BO₃, 2 μM MnSO₄, 2 μM ZnSO₄, 0.5 μM CuSO₄, 0.5 μM H₂MoO₄, 0.1 μM NiSO₄, and 1 μM Fe(III)-N,N′-ethylenebis[2-(2-hydroxyphenyl)-glycine] (Sprint 138; Becker-Underwood, Inc.). We thus attempted to maintain all essential minerals at sufficient but nontoxic levels in the soil. Seeds were harvested from each accession at physiological maturity.

Core Ideas

We extended the use of sparse phenotyping into the multi-trait genomic selection (MT-GS) framework by split testing of entries.
The sparse-phenotyping aided MT-GS can further improve predictive ability by >12% across traits.
Heritability and genetic correlation are possible metrics to optimize and further improve prediction performance of MT-GS.
The sparse-testing-aided MT-GS can be further extended to multi-environment, multi-trait GS framework.

Mineral analysis

Mineral elements for DS1 were measured following procedures described in Ma et al. (2017) and Lan et al. (2019). Briefly, 200 g nondehulled pea seeds were ground to fine flour and then digested with concentrated nitric acid (70% HNO₃) in a digestion system block at 90 °C for 60 min. Afterward, 3 ml of hydrogen peroxide was added to further the digestion process for 15 min followed by the addition of 3 ml hydrochloric acid (70% HCl) and heated for additional 5 min. After cooling to room temperature, the digested samples were filtered through DigiFILTER (SCP Science) and diluted to 10 ml with nanopure water. To validate the procedure and analytical measurement, an apple leaf standard (SRM 1515; National Institute of Standards and Technology) was analyzed simultaneously with the pea flour samples. Total concentration of the mineral elements was measured using inductively coupled plasma atomic emission spectrometry (ICP-AES, IRIS Advantage; Thermo Elemental). Mineral values were determined with the ICP-AES using the following spectral emission lines (in nm): Ca, 184.0; Mg, 285.2; K, 769.8; P, 177.4; Fe, 238.2; Zn, 213.8; Mn, 260.5; Cu, 324.7; Ni, 231.6; B, 208.9; and Mo, 202.0.

For the DS2, dried seeds (with seed coats) from six plants of each accession were ground to a uniform powder using a coffee grinder with stainless steel blades. Two subsamples of each accession were weighed (∼200 mg each), dry ashed, resuspended in ultra-pure nitric acid, and analyzed for Ca, Mg, K, P, Fe, Zn, Mn, Cu, Ni, B, and Mo concentrations using ICP-AES. Dry ashing was performed in quartz tubes, with samples ashed for 6 h at 450 °C. After cooling, to ensure complete oxidation of all tissues, 2.5 ml of 30% H₂O₂ was added to each tube and samples were reheated to 450 °C for 1 h. Apple leaf standards (SRM 1515; National Institute of Standards and Technology) were ashed and analyzed along with pea seed samples to verify the reliability of the procedures and analytical measurements. Mineral values were determined with the ICP-AES using the same spectral emission lines (in nm) noted for the DS1 population.

Genotyping

Details on DNA isolation and genotyping-by-sequencing can be found in Bari et al. (2021). Both DS1 and DS2 were genotyped using genotyping-by-sequencing, and 28,832 single-nucleotide polymorphism (SNP) markers were generated for DS1, whereas 380,527 SNP markers were generated for DS2. After removing SNPs with >90% missing values, heterozygosity >20%, and with a minor allele frequency <5%, 11,858 and 30,645 SNPs remained for DS1 and DS2, respectively, and were used for the analysis. Missing SNPs were imputed with Beagle v5.1 (Browning et al., 2018).
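The three marker filters described above can be sketched in a few lines of Python. This is only an illustration of the thresholds, not the pipeline used in the study: the 0/1/2 genotype coding, the use of `None` for a missing call, the choice to compute heterozygosity and allele frequency over non-missing calls, and the function name `filter_snps` are all assumptions made for the example.

```python
def filter_snps(genotypes, max_missing=0.90, max_het=0.20, min_maf=0.05):
    """Return the column indices of SNPs that pass all three filters.

    genotypes: list of rows (one per line); each row is a list of calls
    coded 0, 1, 2 (copies of the alternate allele) or None if missing.
    The coding is an assumption of this sketch.
    """
    n_lines = len(genotypes)
    n_snps = len(genotypes[0])
    kept = []
    for j in range(n_snps):
        calls = [row[j] for row in genotypes if row[j] is not None]
        missing_rate = 1.0 - len(calls) / n_lines
        if not calls or missing_rate > max_missing:
            continue  # too many missing values
        het_rate = sum(1 for c in calls if c == 1) / len(calls)
        if het_rate > max_het:
            continue  # too many heterozygous calls
        alt_freq = sum(calls) / (2 * len(calls))
        maf = min(alt_freq, 1.0 - alt_freq)
        if maf < min_maf:
            continue  # minor allele too rare
        kept.append(j)
    return kept


# Toy data (hypothetical): 4 lines x 4 SNPs.
toy = [
    [0, 0, 1, None],
    [2, 0, 1, None],
    [0, 0, 1, None],
    [2, 0, 0, None],
]
passing = filter_snps(toy)
```

In the toy matrix only the first SNP passes: the second is monomorphic, the third is mostly heterozygous, and the fourth has no calls.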

Phenotypic data analysis

Best linear unbiased estimates (BLUEs) of the phenotypes for DS1 were obtained with the SpATS R package (Rodríguez-Álvarez et al., 2016), which accounts for spatial trend in the field through a smooth bivariate function of the spatial coordinates, f(r, c), represented by 2D P-splines. The model was as follows:

$$\mathbf{y}=\mathbf{X}\mathbf{b}+f(r,c)+{\mathbf{Z}}_{\mathrm{r}}{\mathbf{u}}_{\mathrm{r}}+{\mathbf{Z}}_{\mathrm{c}}{\mathbf{u}}_{\mathrm{c}}+\mathrm{\varepsilon} \qquad (1)$$

where y is the response variable for n-th trait; b is the fixed effect of the genotype; u_r and u_c are row and column random effects accounting for discontinuous field variation with multivariate normal distribution ${u}_{\mathrm{r}}\sim \mathrm{N}(0,\mathbf{I}{\mathrm{\sigma}}_{\mathrm{r}}^{2})$ and ${u}_{\mathrm{c}}\sim \mathrm{N}(0,\mathbf{I}{\mathrm{\sigma}}_{\mathrm{c}}^{2})$, respectively; I is an identity matrix; $\sigma _{\rm{r}}^2$ and $\sigma _{\rm{c}}^2$ are the variances for the row and column effects; f(r, c) is a smooth bivariate function defined over the row and column positions (see Velazco et al., 2017 for details); ε is the measurement error from each plot with distribution $\mathrm{\varepsilon}\sim \mathrm{N}(0,\mathbf{I}{\mathrm{\sigma}}_{\mathrm{\varepsilon}}^{2})$, where I is the same as above and $\sigma _\varepsilon ^2$ is the variance for the residual term, or simply the nugget; and X and Z_r, Z_c are the incidence matrices for the fixed and random terms, respectively.

For the DS2, the mineral elements value of each accession was estimated as follows; mineral values from the two subsamples (see Mineral Analysis section for details) were averaged for each accession; these averaged values are presented as parts per million (ppm), which is equivalent to micrograms per gram dry weight (μg g⁻¹ dry wt.). In this study, it was denoted as mean phenotypic value of each accession for each mineral element. In general, the standard deviations for each mineral were low (i.e., within each accession). Across all accessions, the average standard deviation for each mineral (calculated as percentage of the mean of the two subsamples) was as follows: Ca, 10.4%; Mg, 2.0%; K, 2.5%; P, 2.5%; Fe, 6.0%; Zn, 5.1%; Mn, 6.9%; Cu, 7.5%; Ni, 15.4%; B, 5.8%; and Mo, 3.7%.
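The per-accession mean and the standard deviation expressed as a percentage of that mean can be computed as in the short Python sketch below. The subsample values are invented for illustration, and the use of the sample standard deviation (n − 1 denominator) is an assumption; the text does not say which estimator was used.

```python
from statistics import mean, stdev


def percent_sd(sub1, sub2):
    """SD of two subsamples as a percentage of their mean (sample SD assumed)."""
    values = [sub1, sub2]
    return 100.0 * stdev(values) / mean(values)


def average_percent_sd(pairs):
    """Average the percentage SD across accessions for one mineral."""
    return mean(percent_sd(a, b) for a, b in pairs)


# Hypothetical subsample concentrations (ppm) for two accessions.
pairs = [(90.0, 110.0), (48.0, 52.0)]
accession_means = [mean(p) for p in pairs]
avg_pct = average_percent_sd(pairs)
```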

Genomic selection models

Given the genetic architecture of traits evaluated in this study and computational efficiency, genomic best linear unbiased prediction was the model of choice for UNI-GS and MT-GS. Furthermore, the sparse-testing-aided MT-GS proposed in this study involved split phenotyping of genotypes across traits and was thus highly unbalanced (missing values in the training set). To our knowledge, such unbalanced data have not yet been accommodated in machine-learning models; therefore, machine-learning models were not considered in this study. The UNI-GS model was implemented in the R package BGLR (Pérez & de los Campos, 2014) and expressed as follows:

$$\mathbf{y}={\mathbf{1}}_{k}\mu +\mathbf{Z}\mathbf{u}+\mathrm{\varepsilon} \qquad (2)$$

where y is the vector (k × 1) of adjusted means (best linear unbiased estimates [BLUEs]) using DS1 or mean phenotypic value using DS2 for k-th genotypes for a given n-th trait or mineral element; μ is the overall mean; 1_k (k × 1) is a vector of ones; u is the genomic effect of k-th genotypes for n-th trait and assumed to follow multivariate normal distribution expressed as $\mathbf{u}\sim \mathrm{N}(0,\mathbf{G}{\mathrm{\sigma}}_{\mathrm{g}}^{2})$, where G is the genomic relationship matrix and $\sigma _{\rm{g}}^2$ is the additive genetic variance; Z is the incidence matrix for genomic effect of the lines; and ε is the residual vector.
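To make the univariate model concrete, the following Python sketch builds a genomic relationship matrix and predicts the genomic values of untested lines. It is not the BGLR implementation, which fits the model by Bayesian sampling. Here the variance ratio `lam` (residual over genetic variance) is treated as known, the overall mean is estimated by the training mean, and G is computed with the VanRaden formula; the text does not state how G was built, so that choice is an assumption. All data are simulated and the function names are hypothetical.

```python
import numpy as np


def vanraden_g(markers):
    """Genomic relationship matrix from a lines x SNPs matrix coded 0/1/2."""
    p = markers.mean(axis=0) / 2.0           # alternate-allele frequencies
    w = markers - 2.0 * p                    # centre each SNP column
    return w @ w.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(g, y_train, train_idx, test_idx, lam):
    """Predict test-line values; lam = sigma_e^2 / sigma_g^2 (assumed known)."""
    mu = y_train.mean()                      # simple estimate of the overall mean
    g_tt = g[np.ix_(train_idx, train_idx)]   # training x training relationships
    g_pt = g[np.ix_(test_idx, train_idx)]    # test x training relationships
    alpha = np.linalg.solve(g_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + g_pt @ alpha


# Simulated example (all values hypothetical).
rng = np.random.default_rng(1)
markers = rng.integers(0, 3, size=(60, 500)).astype(float)
effects = rng.normal(0.0, 0.1, size=500)
y = markers @ effects + rng.normal(0.0, 0.5, size=60)

g = vanraden_g(markers)
train_idx, test_idx = np.arange(45), np.arange(45, 60)
gebv = gblup_predict(g, y[train_idx], train_idx, test_idx, lam=1.0)
```

The untested lines receive predictions only through their genomic relationships with the training lines, which is the mechanism the model relies on.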

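The MT-GS model, which is an extension of Equation 2, was fit using a Bayesian multivariate Gaussian model in the MTM R package (de los Campos & Grüneberg, 2016). This is expressed as the following:

$$\begin{bmatrix}{\mathbf{y}}_{1}\\ \vdots \\ {\mathbf{y}}_{n}\end{bmatrix}=\begin{bmatrix}\mathbf{1}{\mu}_{1}\\ \vdots \\ \mathbf{1}{\mu}_{n}\end{bmatrix}+\begin{bmatrix}{\mathbf{Z}}_{1} & \cdots & \mathbf{0}\\ \vdots & \ddots & \vdots \\ \mathbf{0} & \cdots & {\mathbf{Z}}_{n}\end{bmatrix}\begin{bmatrix}{\mathbf{u}}_{1}\\ \vdots \\ {\mathbf{u}}_{n}\end{bmatrix}+\begin{bmatrix}{\mathrm{\varepsilon}}_{1}\\ \vdots \\ {\mathrm{\varepsilon}}_{n}\end{bmatrix} \qquad (3)$$

where y₁ … y_n are the vectors of phenotypes; μ₁ … μ_n are the overall means for each n-th trait; Z₁ … Z_n are the incidence matrices for genomic effect of the lines for each n-th trait; u₁ … u_n are the genomic effects of the lines for each n-th trait; and ε₁ … ε_n are the residual errors for each n-th trait. The random term is assumed to follow multivariate normal distribution $[{\mathbf{u}}_{1}\cdots {\mathbf{u}}_{n}]\sim \mathrm{N}[0,(\mathbf{G}\otimes {\mathbf{G}}_{\mathbf{o}})]$, where G is the same as above and G_o is an n × n unstructured variance–covariance matrix of the genetic effect of the traits, which is represented as follows:

$${\mathbf{G}}_{\mathbf{o}}=\begin{bmatrix}{\mathrm{\sigma}}_{{\mathrm{g}}_{1}}^{2} & \cdots & {\mathrm{\sigma}}_{{\mathrm{g}}_{1}{\mathrm{g}}_{n}}\\ \vdots & \ddots & \vdots \\ {\mathrm{\sigma}}_{{\mathrm{g}}_{n}{\mathrm{g}}_{1}} & \cdots & {\mathrm{\sigma}}_{{\mathrm{g}}_{n}}^{2}\end{bmatrix} \qquad (4)$$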

The diagonal elements represent the genetic variance of each trait, and the off-diagonal elements are the genetic covariances between traits.

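Further, the residual term for each n-th trait is assumed to follow multivariate normal distribution: [ε₁… ε_n] ∼ N[0,(I ⊗ R)], where I is the same as above and R is a heterogeneous diagonal matrix of the residual variances for each n-th trait:

$$\mathbf{R}=\begin{bmatrix}{\mathrm{\sigma}}_{{\mathrm{\varepsilon}}_{1}}^{2} & \cdots & 0\\ \vdots & \ddots & \vdots \\ 0 & \cdots & {\mathrm{\sigma}}_{{\mathrm{\varepsilon}}_{n}}^{2}\end{bmatrix} \qquad (5)$$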

The diagonal elements represent the residual variance for each n-th trait, and the off-diagonal elements of the R matrix are equal to zero. In our preliminary analysis (results not shown), an unstructured R matrix, in which the off-diagonal elements of R represent the covariances of the residual effects of the traits, was considered; however, we observed inconsistent model convergence for all iterations. The same results were observed when a factor analytic model (Piepho, 1998; Smith et al., 2001; Crossa et al., 2004), which identifies one or a few factors underlying the correlation among traits by their relationship to unobservable latent variables, was considered for the R structure. This might be due to the size of the dataset used in our study relative to the number of model parameters to estimate.

Genomic heritability estimate (de los Campos et al., 2015; Feldmann et al., 2021) for the n-th trait using individual level data was derived from the variance components obtained from the model using the complete dataset:

$${h}_{n}^{2}=\frac{{\mathrm{\sigma}}_{{\mathrm{g}}_{n}}^{2}}{{\mathrm{\sigma}}_{{\mathrm{g}}_{n}}^{2}+{\mathrm{\sigma}}_{{\mathrm{\varepsilon}}_{n}}^{2}} \qquad (6)$$

where ${\mathrm{\sigma}}_{{\mathrm{g}}_{\mathrm{n}}}^{2}$ and ${\mathrm{\sigma}}_{{\mathrm{\varepsilon}}_{\mathrm{n}}}^{2}$ are the genetic and residual variance estimates for n-th trait.
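As a worked illustration, the Python sketch below derives heritability and a genetic correlation from estimated variance components. It assumes that heritability is the genetic variance divided by the sum of genetic and residual variance, and that the genetic correlation between two traits is their genetic covariance scaled by the two genetic standard deviations. The numerical values of `g_o` and `r_diag` are invented for the example.

```python
from math import sqrt


def genomic_heritability(var_g, var_e):
    """Ratio of genetic variance to genetic plus residual variance."""
    return var_g / (var_g + var_e)


def genetic_correlation(g_o, i, j):
    """Genetic correlation between traits i and j from the covariance matrix."""
    return g_o[i][j] / sqrt(g_o[i][i] * g_o[j][j])


# Hypothetical estimates for two traits.
g_o = [[0.50, 0.30],
       [0.30, 0.80]]      # genetic variances (diagonal) and covariance
r_diag = [0.50, 0.20]     # residual variances (diagonal of R)

h2 = [genomic_heritability(g_o[t][t], r_diag[t]) for t in range(2)]
r12 = genetic_correlation(g_o, 0, 1)
```

With these values the two heritabilities are 0.5 and 0.8, and the genetic correlation is about 0.47.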

Cross-validation scheme

To evaluate the performance of sparse-testing-strategy-aided MT-GS, different cross-validations mimicking potential applications of MT-GS in a breeding program were explored. Leveraging the results from sparse testing in multi-environment yield trials using GS (Jarquin et al., 2020; Persa et al., 2020; Atanda et al., 2021b, 2022), we varied the number of genotypes that serve as connectivity across traits to assess predictive ability in the different scenarios. Depending on the size of the data set and the number of phenotypes, different overlapping sizes were evaluated (Supplemental Table S1). Five different overlapping sizes (50, 60, 70, 80, and 90%) were considered for DS1 (n = 282), which had the highest total number of genotypes, followed by four overlapping sizes (40, 50, 60, and 70%) in DS2 (n = 192). For example, when 50% of the total genotypes in DS1 served as connectivity across the traits, the remaining 141 genotypes were partitioned into 10 distinct sets (14 genotypes each), each trait with a unique set. Thus, each trait had 155 genotypes (141 overlapping plus 14 unique) as training set to predict the genetic merit of the other 127 genotypes (Supplemental Figure S1b). As the size of the overlapping genotypes increased (60, 70, 80, and 90% of total genotypes), the training set size increased to 180, 205, 230, and 255, and the prediction set size reduced to 102, 77, 52, and 27, respectively. The splitting of the genotypes across traits was repeated 50 times for each overlapping size scenario; each replication had different genotypes that served as connectivity across traits, a nonoverlapping training set for each trait, and the prediction set (Supplemental Table S1). In each replication, the Pearson correlation of the predicted GEBV and the BLUE estimates of the genotypes for each trait obtained using the complete dataset was calculated, and the mean was recorded as the predictive ability of the prediction set for each trait.
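One replication of the sparse-testing assignment can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' script: the function name, the trait labels, the rounding down of the overlap size, and the handling of leftover genotypes (they stay unphenotyped for every trait) are choices made for the example.

```python
import random


def sparse_testing_split(genotypes, traits, overlap_fraction, seed=None):
    """Assign training and prediction genotypes per trait for one replication.

    A shared (overlapping) set is phenotyped for every trait; the remaining
    genotypes are cut into equal, disjoint blocks, one block per trait.
    """
    rng = random.Random(seed)
    shuffled = list(genotypes)
    rng.shuffle(shuffled)

    n_overlap = int(overlap_fraction * len(shuffled))  # rounded down (assumption)
    overlap = shuffled[:n_overlap]
    rest = shuffled[n_overlap:]
    per_trait = len(rest) // len(traits)               # leftovers stay untested

    design = {}
    for t, trait in enumerate(traits):
        unique = rest[t * per_trait:(t + 1) * per_trait]
        training = set(overlap) | set(unique)
        prediction = [g for g in shuffled if g not in training]
        design[trait] = {"training": sorted(training),
                         "prediction": sorted(prediction)}
    return design


# DS1-sized example: 282 genotypes, 10 traits (labels are hypothetical).
traits = [f"trait_{i}" for i in range(1, 11)]
design = sparse_testing_split(range(282), traits, overlap_fraction=0.5, seed=0)
sizes = {t: (len(d["training"]), len(d["prediction"])) for t, d in design.items()}
```

With 282 genotypes, 10 traits, and 50% overlap, every trait gets 155 training and 127 prediction genotypes, which agrees with the counts given in the text; repeating the call with different seeds gives the replications.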

To determine the efficiency of the MT-GS (sparse and partially balanced phenotyping) and UNI-GS models, we compared the predictive ability of the prediction sets using the different training set sizes defined in each dataset. Again, this process was repeated 50 times, each replication having different genotypes included in the training and prediction set for all traits in the UNI-GS model and across traits for the partially balanced phenotyping-aided MT-GS. For sparse phenotyping, each replication had different genotypes that serve as connectivity across traits, a nonoverlapping training set for each trait, and the prediction set (Supplemental Figure S1). For DS1, predictive ability for each replication was measured as the Pearson correlation of the predicted GEBV and the BLUE estimates of the genotypes for each trait obtained using the full dataset, and the average across replications was reported. In DS2, the BLUE estimates were replaced with the mean phenotypic value of each accession for each trait.
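The predictive ability described above amounts to a Pearson correlation per replication followed by an average. A minimal Python sketch, with hypothetical function names and toy values, is:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_predictive_ability(replicates):
    """Average the correlation over replications.

    replicates: list of (predicted_gebv, observed_value) pairs, one pair of
    sequences per replication, restricted to the prediction set.
    """
    values = [pearson(pred, obs) for pred, obs in replicates]
    return sum(values) / len(values)


# Toy replications (hypothetical values).
reps = [([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
        ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])]
ability = mean_predictive_ability(reps)
```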

Based on the preliminary analysis results (Figure 2a,b), only the sparse-testing-aided MT-GS was considered to evaluate the efficiency of using heritability, genetic correlation between traits, or combination of the factors for trait assignment in the prediction set and training set, respectively. The following scenarios were assessed:

Exclusion of traits with the lowest heritability but moderate-to-high genetic correlation with other traits from the prediction set; however, these traits are reserved in the model as secondary traits. We also evaluated the scenario in which they were removed from the model.
Exclusion of traits with moderate-to-high heritability but the highest occurrence of negative correlation with other traits from the prediction set, however, reserved in the model as secondary traits. We also evaluated the scenario when it was removed from the model.
Exclusion of traits with the lowest heritability and moderate-to-high genetic correlation with other traits as well as traits with moderate-to-high heritability and the highest occurrence of negative correlation with other traits from the prediction set but retained in the model as secondary traits. We also evaluated the scenario in which both were left out of the model.
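The trait-assignment logic behind these scenarios can be sketched in Python. The sketch only identifies candidate traits from heritability and genetic-correlation estimates; the trait labels, the numerical values, and the rule of counting negative correlations are illustrative assumptions, not the authors' procedure.

```python
def lowest_heritability_trait(h2):
    """Trait with the smallest heritability estimate."""
    return min(h2, key=h2.get)


def most_negatively_correlated_trait(corr):
    """Trait with the largest number of negative genetic correlations."""
    def n_negative(trait):
        return sum(1 for other, r in corr[trait].items()
                   if other != trait and r < 0)
    return max(corr, key=n_negative)


def prediction_set_traits(all_traits, excluded):
    """Traits kept in the prediction set after exclusion."""
    return [t for t in all_traits if t not in excluded]


# Hypothetical estimates for four traits A-D.
traits = ["A", "B", "C", "D"]
h2 = {"A": 0.05, "B": 0.60, "C": 0.70, "D": 0.55}
pairs = {("A", "B"): 0.5, ("A", "C"): 0.4, ("A", "D"): -0.2,
         ("B", "C"): 0.6, ("B", "D"): -0.1, ("C", "D"): -0.3}
corr = {t: {} for t in traits}
for (a, b), r in pairs.items():
    corr[a][b] = r
    corr[b][a] = r

low_h2 = lowest_heritability_trait(h2)              # first scenario candidate
neg_corr = most_negatively_correlated_trait(corr)   # second scenario candidate
kept = prediction_set_traits(traits, {low_h2, neg_corr})  # third scenario
```

Whether the excluded traits stay in the model as secondary traits or are dropped entirely is then a modelling choice applied on top of this selection.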

RESULTS

Genomic heritability (diagonal) and genetic correlation among traits (upper diagonal)

In DS1, heritability was moderately high for all traits except Ca, which had a very low heritability value of 0.01 (Figure 1a). However, in DS2, Ca had a moderate heritability of 0.40 (Figure 1b). Similarly, Fe had heritability of 0.63 in DS2 vs. 0.29 in DS1, and, in general, the trait heritability in DS2 ranged from moderate to high. In the two datasets, P consistently had the highest heritability of 0.87 in DS1 and 0.73 in DS2. The genetic correlation between traits in DS1 ranged from −0.01 to 0.96, whereas it ranged from −0.01 to 0.99 in DS2. In the DS1, Cd had zero or no genetic correlation with most of the traits. Similarly, K had no genetic correlation with Ca in DS2, contrary to the 0.33 genetic correlation observed in DS1 (Figure 1a,b). Generally, in DS1, K, Fe, P, and Mg had moderate to high genetic correlation with most of the traits, whereas in the DS2, Ni, Cu, and Mg had high genetic correlation with other traits except with Mo.

FIGURE 1. Genomic heritability (diagonal) and genetic correlation between pairs of traits (upper diagonal) from multi-trait genomic selection model using complete datasets: (a) results using DS1 dataset and (b) results using DS2 dataset

Sparse-testing-aided MT-GS improves predictive ability across traits vs. PBT-aided MT-GS and UNI-GS models

Regardless of cross-validation scheme or dataset, the sparse-testing-aided MT-GS model outperformed the PBT-aided MT-GS and UNI-GS models for all traits except for Ca in DS1, which might be attributed to the near-zero genetic signal observed for this trait (Figure 2a,b). For instance, in DS2, where the predictive ability is generally high compared with DS1, sparse testing using MT-GS outperformed PBT-aided MT-GS by 25, 36, 15, 26, 27, 67, 81, 50, 66, and 56%, respectively, for Mo, Mg, P, K, Ca, Mn, Fe, Zn, Cu, and Ni, while it improved predictive ability by 7, 58, 23, 17, 60, 12, 2, 95, and 56% compared with the UNI-GS model (Figure 2b). Surprisingly, PBT-aided MT-GS did not consistently outperform the UNI-GS model in DS2 compared with DS1, where PBT-aided MT-GS resulted in marginally improved predictive ability for all traits.
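For readers who wish to reproduce such comparisons, a percentage improvement of one model over another can be computed as a relative gain in mean predictive ability. The sketch below assumes that definition; the text does not state the exact formula, and the example values are hypothetical.

```python
def relative_gain(new, reference):
    """Percentage change of `new` relative to `reference` (assumed definition)."""
    if reference == 0:
        raise ValueError("reference predictive ability must be nonzero")
    return 100.0 * (new - reference) / reference


# Hypothetical mean predictive abilities for one trait.
gain = relative_gain(0.50, 0.40)
```

Under this definition, a change in predictive ability from 0.40 to 0.50 is a gain of about 25%.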

FIGURE 2. Predictive performance of univariate (UNI) and multi-trait genomic selection (MT-GS) using partially balanced (PBT) and sparse testing (ST) phenotyping of the traits. (a) and (b) highlight results using DS1 and DS2 dataset, respectively. The number within each box represents mean predictive ability of 50 iterations of the process of line assignment as training and prediction set. In each replication different genotypes were assigned as training and prediction set for the traits in the UNI-GS model and across traits for the partially balanced and sparse phenotyping aided MT-GS, respectively

Traits combination as a function of heritability, genetic correlation between traits, and their combination

When either heritability or genetic correlation was considered a decision tool for a combination of traits in the prediction and calibration set, the predictive ability improved for all traits compared with having all the traits in the prediction and the training set. However, the magnitude of the gain in predictive ability varied by trait (Figures 3 and 4). In DS1, for example, when Ca, with the lowest heritability (0.01) but moderate-to-high genetic correlation with other traits, was dropped from the prediction set but kept in the model, the gain in the predictive ability for the remaining nine traits in the prediction set across the overlapping scenarios ranged from 2.79 to 85.07% (Figure 3b), whereas it ranged from 3.28 to 63.37% when excluded from the model (Figure 3b*). When Cd, with high heritability (0.51) but zero genetic correlation with most traits, was removed from the prediction set but retained in the calibration model, the improvement in predictive ability ranged from 6.75 to 104.41% (Figure 3c) and ranged from 1.38 to 45.42% when removed from the training model (Figure 3c*). Similar results were obtained in DS2, when Mo, with heritability of 0.43 and negative correlation with the majority of the traits, was removed from the prediction set but reserved in the calibration model. The gain in predictive ability of traits in the prediction set ranged from 2.41 to 77.92% (Figure 4b) and ranged from 0.62 to 19.62% when removed from the model (Figure 4b*). Because Mo has a negative genetic correlation with the majority of the traits, in addition to having the lowest heritability of all the traits in DS2, we substituted Mo with Ca, which has a heritability of 0.41 and a moderate-to-high genetic correlation with other traits, to disentangle the confounding effect of heritability and genetic correlation.
The gain in predictive ability ranged from 3.19 to 90.34% when reserved in the calibration model (Figure 4c) and from 1.34 to 14.65% when excluded from the calibration model (Figure 4c*).

FIGURE 3. Predictive ability of untested lines in DS1 for each trait for different overlapping and nonoverlapping size. The different colors denote traits in the prediction set, which might also be present in the calibration model. The asterisk (*) indicates exclusion of traits from the calibration model based on its heritability, degree of genetic correlation with other traits, or combination of the two factors

FIGURE 4. Predictive ability of untested lines in DS2 for each trait for different overlapping and nonoverlapping size. The different colors denote traits in the prediction set, which might also be present in the calibration model. The asterisk (*) indicates exclusion of traits from the calibration model based on its heritability, degree of genetic correlation with other traits, or combination of the two factors

Unsurprisingly, when Mo and P, which have moderate and high heritabilities of 0.32 and 0.73, respectively, but are negatively correlated with other traits, were excluded from the prediction set, the predictive ability improved for all traits except Ca and Fe (Figure 4d). When the traits were removed from the calibration model, the predictive ability decreased for a majority of the traits (Figure 4d*). On the contrary, removing Mo and K, with moderate heritabilities of 0.32 and 0.43, from the prediction set resulted in an improved predictive ability (Figures 4e,e*). In contrast to P, K has zero, weak negative, and strong positive correlations with other traits. Figures 3d and 3d* corroborate the findings in Figures 4d and 4d*, in which Se and Cd, with moderate heritability of 0.51 and 0.52, respectively, but low genetic correlation with other traits, were excluded from the prediction set but retained or removed from the calibration model. The additional improvement in predictive ability observed when Se, Cd, and Ca were excluded from the prediction set (Figures 3e,e*) demonstrates the efficacy of heritability and genetic correlation between traits as decision metrics, corroborating the results obtained in Figures 4e and 4e*.

DISCUSSION

Plant breeders make advancement decisions based on multiple traits with varying genetic correlations ranging from negative to positive and, in exceptional cases, no genetic correlation at all. Thus, the use of the MT-GS model is gaining popularity as a choice GS model to estimate the genetic merit of new genotypes. When comparing models, our results corroborate previous studies (Calus & Veerkamp, 2011; Jia & Jannink, 2012; Montesinos-López et al., 2018; Lado et al., 2018; Bhatta et al., 2020; Gaire et al., 2022) indicating that MT-GS outperforms UNI-GS by harnessing genetic correlation between traits to improve predictive ability across traits. The proposed MT-GS-aided sparse phenotyping departed from the previous reports of weak genetic correlation between traits as a limitation to the advantage of MT-GS over UNI-GS, which was evident in the partially balanced phenotyping-aided MT-GS. The performance of sparse-phenotyping-enabled MT-GS was consistently superior to UNI-GS, with at least 12% improvement in predictive ability on average across traits, suggesting the importance of borrowing information across traits and related genotypes. Similar results have been reported in sparse-testing-aided GS in multi-environment trials (Atanda et al., 2021a, 2021b, 2022). This demonstrates that the improvement in predictive performance in sparse-testing-aided MT-GS is primarily due to efficient estimation of correlated effects across genetically related traits, as phenotypic records are available for all traits, albeit in a different set of genotypes. In addition, allowing for significant genotype overlap improves predictive ability because genetic connectivity across traits improves estimates of trait-to-trait correlation effects. The results are consistent with previous research on sparse-testing-enabled GS in maize (Zea mays L.) (Jarquin et al., 2020; Atanda et al., 2021b) and wheat (Triticum aestivum L.) (He et al., 2021; Crespo-Herrera et al., 2021; Atanda et al., 2022).
The inflection points observed in this study, however, suggest that more research is required to determine the optimal number of overlapping genotypes, which might be influenced by the degree of genetic relationship between lines, the number of lines per cross, the genetic correlation between traits, the yield testing stage, and the expected predictive ability.
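
As a rough illustration of the sparse phenotyping idea discussed above, the sketch below builds a genotype-by-trait mask in which a set of overlapping genotypes is measured for every trait and the remaining genotypes are each measured for a single trait. The function name `sparse_phenotyping_mask`, the even split of non-overlapping genotypes, the random seed, and the choice of 50 overlapping genotypes are all assumptions made for illustration; they are not the allocation scheme used in this study. Only the 300 lines and 10 traits are taken from the study.

```python
import numpy as np

def sparse_phenotyping_mask(n_genotypes, n_traits, n_overlap, seed=0):
    """Return a boolean (genotype x trait) mask; True = phenotype is recorded.

    Hypothetical allocation: `n_overlap` genotypes are measured for every
    trait, and each remaining genotype is measured for exactly one trait,
    with those genotypes spread as evenly as possible across traits.
    """
    if not 0 <= n_overlap <= n_genotypes:
        raise ValueError("n_overlap must be between 0 and n_genotypes")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_genotypes)          # random genotype order
    mask = np.zeros((n_genotypes, n_traits), dtype=bool)
    mask[order[:n_overlap], :] = True             # overlap set: all traits
    rest = order[n_overlap:]
    trait_of = np.arange(rest.size) % n_traits    # round-robin assignment
    mask[rest, trait_of] = True                   # one trait per genotype
    return mask

# 300 lines and 10 traits as in the study; 50 overlapping lines is made up.
mask = sparse_phenotyping_mask(n_genotypes=300, n_traits=10, n_overlap=50)
records_per_trait = mask.sum(axis=0)
```

Varying `n_overlap` in a sketch like this is one way to explore how genetic connectivity across traits trades off against the total number of phenotypic records, which is the question the inflection points raise.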

Overall, predictive ability improves with heritability in all models, except for Se, Cd, and Mn in DS1. DS1 generally has lower predictive ability than DS2, presumably for two reasons: genetic variation for nutritional traits is lower in DS1, which consists of elite breeding lines, than in DS2, which consists of accessions; and the growing conditions differed, with the accessions grown in the greenhouse and DS1 planted in the field. Thavarajah et al. (2022) reported heritability estimates of nearly zero for Ca, K, P, Mg, Mn, Fe, Zn, Cu, and Se in 44 pea lines evaluated in two locations in 2019 and one location in 2020 with two replications in each location. On the contrary, Ma et al. (2017) observed moderately high genetic diversity for mineral elements in 158 recombinant inbred lines evaluated in two locations with two replications. The number of replications and locations used in these studies further suggests that the degree of genetic variation (and, by inference, heritability) for nutritional traits in DS1 may be responsible for the observed low predictive ability.

Multi-trait combinations were created in training and prediction sets based on genetic correlations between traits, heritability, and the combination of these limiting factors to optimize the trade-off between the limiting factors and the accuracy of predicting the genetic value of the phenotypes. In general, traits with very low heritability or genetic correlation with other traits cannot be adequately predicted because of a lack of genetic signal (Jia & Jannink, 2012; Gaire et al., 2022). This is consistent with our findings that improvement in predictive ability was low for traits with low heritability but moderate-to-high genetic correlation with other traits, and for traits with a high occurrence of negative correlation with other traits but moderate-to-high heritability. However, retaining these traits in the training and prediction sets as secondary traits improves estimation of model parameters, resulting in an improvement in predictive ability compared with excluding them from the model. The observed difference in predictive ability for each limiting factor suggests that both factors independently affect predictive ability. Consequently, both factors are equally important in determining trait combinations in MT-GS. In practice, this information can be sourced from relevant literature on the phenotypes or from historical data in the breeding program. To our knowledge, this is the first time these two factors have been used to designate traits in the training and prediction sets to improve predictive performance in MT-GS. The gain in predictive performance achieved by using this strategy requires further investigation because it has only been tested in pea datasets with limited environments (year × location combinations) and replication, which is a major limitation of this study, and does not represent the extensive data generated in breeding programs. The availability of multi-environment datasets can improve estimates of genotypic values for quantitative traits.
Because significant progress has been made in multi-trait, multi-environment genomic prediction (Montesinos-López et al., 2016, 2018, 2019; Gill et al., 2021; Sandhu et al., 2022), our findings suggest future research should focus on developing an optimal strategy for genomic-prediction-enabled sparse testing of multiple traits in multi-environment trials. This will likely further lower the cost of phenotyping and the time-consuming data collection process. In addition, we encourage the use of different crops with varying genetic backgrounds that fairly cover the diversity of data generated in breeding programs to gather more evidence on the efficiency of this strategy in improving prediction performance in MT-GS.
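
The use of heritability and genetic correlation as screening metrics, as discussed in this section, can be sketched in a few lines. The example below labels each trait by the limiting factor that applies to it. The function name `flag_limiting_traits`, the two thresholds, the use of mean absolute genetic correlation as a summary, and all numeric values are hypothetical and chosen only for illustration; they do not reproduce the criteria or estimates of this study.

```python
import numpy as np

def flag_limiting_traits(traits, h2, rg, h2_min=0.2, rg_min=0.3):
    """Label each trait by the limiting factor(s) that apply to it.

    h2: heritability per trait; rg: square genetic correlation matrix.
    h2_min and rg_min are hypothetical thresholds, not values from the study.
    """
    h2 = np.asarray(h2, dtype=float)
    rg = np.asarray(rg, dtype=float)
    n = len(traits)
    off_diag = ~np.eye(n, dtype=bool)
    # Mean absolute genetic correlation with all other traits (assumed summary).
    mean_abs_rg = np.where(off_diag, np.abs(rg), 0.0).sum(axis=1) / (n - 1)
    labels = {}
    for i, trait in enumerate(traits):
        low_h2 = h2[i] < h2_min
        low_rg = mean_abs_rg[i] < rg_min
        if low_h2 and low_rg:
            labels[trait] = "low heritability and weak genetic correlation"
        elif low_h2:
            labels[trait] = "low heritability"
        elif low_rg:
            labels[trait] = "weak genetic correlation"
        else:
            labels[trait] = "no limiting factor"
    return labels

# Made-up inputs for three traits, for illustration only.
traits = ["Fe", "Zn", "Se"]
h2 = [0.6, 0.1, 0.5]
rg = [[1.0, 0.7, 0.1],
      [0.7, 1.0, 0.2],
      [0.1, 0.2, 1.0]]
labels = flag_limiting_traits(traits, h2, rg)
```

In a breeding program, the inputs would come from historical data or the literature, and flagged traits might be kept in the model as secondary traits rather than dropped.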

CONCLUSION

In this study, we proposed the use of sparse-testing-aided MT-GS to further improve prediction performance and reduce the cost of phenotyping and the time-consuming data collection process. By redesigning the phenotyping strategy, we showed that trait combinations in training and prediction sets can influence the prediction performance of MT-GS. Therefore, when designing MT-GS strategies, consideration should be given to trait combinations in the training and prediction sets. In addition, the results suggest the use of heritability and genetic correlation between traits as metrics to further improve prediction accuracy. The sparse-testing-aided MT-GS proposed in this study can be further extended to a multi-environment, multi-trait GS framework.

ACKNOWLEDGMENTS

The authors would like to acknowledge the funding provided by the North Dakota Department of Agriculture through the Specialty Crop Block Grant Program (19-429 and 20-489) and the Northern Pulse Growers Association for the sequencing, field phenotyping, mineral analysis, and GS analysis of the NDSU breeding lines. The SNP genotyping, field phenotyping, and mineral analysis of the USDA germplasm were supported through funding provided by the USDA Plant Genetic Resource Evaluation, the USA Dry Pea and Lentil Council Research Committee, the USDA–ARS Pulse Crop Health Initiative, and USDA–ARS Projects 5348-21000-017-00D (CC) and 5348-21000-024-00D (RJM). This work used computing resources provided by the Center for Computationally Assisted Science and Technology (CCAST) at North Dakota State University, Fargo, ND, USA, which were made possible in part by NSF MRI Award No. 2019077. We thank Dr. Diego Jarquin and the two anonymous reviewers whose suggestions helped improve this manuscript.

AUTHOR CONTRIBUTIONS

Sikiru Adeniyi Atanda: Conceptualization; Data curation; Formal analysis; Methodology; Validation; Writing-original draft; Writing-review & editing. Jenna Steffes: Data curation; Resources. Yang lan: Data curation; Resources. Md Abdullah Al Bari: Data curation; Resources. Jeong-Hwa Kim: Data curation; Resources. Mario Morales: Data curation; Resources. Josephine P. Johnson: Data curation; Resources; Writing-review & editing. Rica Saludares: Data curation; Resources; Writing-review & editing. Hannah Worral: Data curation; Resources; Writing-review & editing. Lisa Piche: Data curation; Resources; Writing-review & editing. Andrew Ross: Data curation; Resources; Writing-review & editing. Mike Grusak: Funding acquisition; Resources; Writing-review & editing. Clarice Coyne: Data curation; Funding acquisition; Resources; Writing-review & editing. Rebecca McGee: Data curation; Funding acquisition; Resources; Writing-review & editing. Jiajia Rao: Data curation; Funding acquisition; Resources; Writing-review & editing. Nonoy Bandillo: Conceptualization; Funding acquisition; Supervision; Validation; Writing-review & editing.

CONFLICT OF INTEREST

The authors declare that the study was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

DATA AVAILABILITY STATEMENT

The SNP dataset used in this study is available online (https://www.ncbi.nlm.nih.gov/sra/PRJNA730349). The phenotypic data will be made available upon request to the corresponding author.

© 2022. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Multi‐trait genomic selection (MT‐GS) has the potential to improve predictive ability by maximizing the use of information across related genotypes and genetically correlated traits. In this study, we extended the sparse phenotyping method into the MT‐GS framework by split testing of entries to maximize borrowing of information across genotypes and predict missing phenotypes for targeted traits without additional phenotyping expenditure. Using 300 advanced breeding lines from the North Dakota State University (NDSU) pulse breeding program and ∼200 USDA accessions that were evaluated for 10 nutritional traits, our results show that the proposed sparse‐phenotyping‐aided MT‐GS can further improve predictive ability by >12% across traits compared with univariate (UNI) genomic selection. The proposed strategy departs from previous reports that weak genetic correlation limits the advantage of MT‐GS over UNI genomic selection, a limitation that was evident in the partially balanced phenotyping‐enabled MT‐GS. Our results point to heritability and genetic correlation between traits as possible metrics to optimize and further improve the estimation of model parameters and, ultimately, prediction performance. Overall, our study offers a new approach to optimize prediction performance using MT‐GS and further highlights a strategy to maximize the efficiency of GS in a plant breeding program. The sparse‐testing‐aided MT‐GS proposed in this study can be further extended to multi‐environment, multi‐trait GS to improve prediction performance and further reduce the cost of phenotyping and the time‐consuming data collection process.

Details

Title

Multi‐trait genomic prediction improves selection accuracy for enhancing seed mineral concentrations in pea

Author

Sikiru Adeniyi Atanda¹; Steffes, Jenna¹; Yang, lan¹; Md Abdullah Al Bari¹; Jeong‐Hwa Kim¹; Morales, Mario¹; Johnson, Josephine P¹; Saludares, Rica¹; Worral, Hannah²; Piche, Lisa¹; Ross, Andrew¹; Grusak, Mike³; Coyne, Clarice⁴; McGee, Rebecca⁵; Rao, Jiajia¹; Bandillo, Nonoy¹

¹ Dep. of Plant Sciences, North Dakota State Univ., Fargo, ND, USA
² North Central Research Extension Center, NDSU, South Minot, ND, USA
³ Edward T. Schafer Agricultural Research Center, USDA‐ARS, Fargo, ND, USA
⁴ USDA–ARS Plant Germplasm Introduction and Testing, Washington State Univ., Pullman, WA, USA
⁵ USDA–ARS, Grain Legume Genetics and Physiology Research, Pullman, WA, USA; Dep. of Horticulture, Washington State Univ., Pullman, WA, USA

Section

ORIGINAL RESEARCH

Publication year

2022

Publication date

Dec 2022

Publisher

John Wiley & Sons, Inc.

ISSN

19403372

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1002/tpg2.20260

ProQuest document ID

2753989816
