Abbreviations

AIC
Akaike information criterion

BLUP
best linear unbiased prediction

GBLUP
genomic best linear unbiased prediction

GEBV
genomic estimated breeding value

GWAS
genome-wide association study

NDVI
normalized difference vegetation index

NPCI
normalized pigment chlorophyll ratio index

PBLUP
phenomic best linear unbiased prediction

PGBLUP
genomic and phenomic best linear unbiased prediction

PSRI
plant senescence reflectance index

SNP
single nucleotide polymorphism

TOI
trait of interest

WRSI
weighted rank sum index

INTRODUCTION

Despite improvements in genetics and agronomics, food and nutritional security remain a global concern (Rezvi et al., 2023). Much effort has been placed on improving the availability of staple grains such as rice, wheat, and maize (Rosegrant & Cline, 2003). While this has led to increases in availability of calories, it has not increased the nutritional value of food consumed in low-income countries. One suggestion to overcome deficiencies in protein and micronutrients has been to increase the amount of legumes in human diets (Singh & Pratap, 2016). Mungbean would be an excellent crop for this purpose due to its wide cultural acceptability and ability to thrive in diverse environments.

Mungbean (Vigna radiata (L.) R. Wilczek var. radiata), native to the Indo-Burma region of Asia, is one of the most important warm-season grain legume crops in Asia and is spreading rapidly to other parts of the world. A South Asian domesticate, it appears in Neolithic crop assemblages from ∼5000 years ago in India before it expanded across the continent (Murphy & Fuller, 2017; Ong et al., 2023). The crop is currently grown on over 7 million ha and provides significant amounts of calories, protein, and micronutrients, with improved varieties fitting into many production systems as rotation crops (Burlyaeva et al., 2019; Van Haeften et al., 2023). The cultural uses of mungbean vary from dahl in South Asia to a greater emphasis on sprouts in Southeast Asia to flour in parts of East Asia, thus creating regional differences in preferred seed traits (e.g., Burlyaeva et al., 2019). Mungbean breeding is currently focused on addressing production constraints such as leaf and root diseases, insect pests, salinity, waterlogging, and heat; developing erect plant habits for mechanical harvesting; and improving nutritional properties of the beans, such as raising sulfur, amino acid, and iron contents. Traits for breeding are typically sourced from genebank collections maintained at different locations across the world (Ebert, 2013; Ong et al., 2023). To enable efficient utilization of the collected mungbean biodiversity, a mini-core collection (289 accessions) was developed from the WorldVeg genebank collection, which comprises ∼6700 accessions (Schafleitner et al., 2015).

Genome-wide association studies (GWASs) have emerged as a powerful tool to dissect variation in traits in mungbean as well as in other crops. GWAS panels for mungbean have been assembled from most of the large international genebank collections (WorldVeg, China, Australia, USDA-GRIN, Russia). Sokolkova et al. (2020) found substantial variation across mungbean lines in phenology, seed and plant color traits, and seed size in the WorldVeg mini-core collection. Chitieri and colleagues (2022, 2023) analyzed variation in seed, shoot, and root traits in a USDA mungbean panel of 495 accessions. Using smaller panels of released varieties and exotic germplasm, Reddy et al. (2020, 2021) found variation in phosphorus uptake. Wu et al. (2020) used 95 USDA accessions to examine differences in mineral uptake and their concentration in seeds. Chang et al. (2023) examined drought tolerance in a panel of mostly East Asian accessions, and Aski et al. (2021) reported genetic loci associated with root phenotypes. Flowering time, an important trait for short-duration mungbean and for synchronous maturation, was investigated in a diversity panel of accessions from 20 countries (Seo et al., 2023). Most GWASs were performed in a single environment, often in a single year, and although the studies appear to have been motivated by breeding, there are not yet reports of any use of the identified candidate loci for selection in crop improvement. Moreover, limitations arise when single-environment trials are conducted, as a single-environment trial does not capture the complete phenotypic variation for a quantitative trait of interest (TOI). Combinations of component traits through indices may increase variation and improve the accuracy of prediction. Prediction-based selection is becoming the norm in plant breeding for quantitative traits. The methodologies applied are evolving rapidly, especially with the application of machine learning algorithms.
However, the tried-and-true method of genomic best linear unbiased prediction (GBLUP) continues to maintain its rank among these novel techniques, especially for the prediction of additive genetic effects (Montesinos-López et al., 2021; Rutkoski et al., 2016). The method uses relationship matrices based on genomic single nucleotide polymorphisms (SNPs) and can be expanded to phenomic systematic measurement data (Van Tassel et al., 2022). Selection for highly quantitative traits is a difficult task, and these are often the major TOIs in breeding programs, such as yield and extreme environmental tolerance. Quantitative traits are based on the infinitesimal model of genetics (Fisher, 1919), which states that these traits are controlled by an infinitely large number of genes, each contributing an infinitesimally small effect to the variation of the trait. Importantly, most of the variation for quantitative traits is additive and well explained by additive models, even though interactions among alleles (dominance) or loci (epistasis) are important at the molecular level (Jannink et al., 2010). These relationships can be estimated through SNP markers (genomics) and through systematic phenotypic observation of secondary traits (phenomics) (Meuwissen et al., 2001; Van Tassel et al., 2022). Genomics increases the predictive ability of these models and improves the confidence in the additive genetic effect predicted (GBLUP). Phenomics further increases our confidence in model performance by taking into account non-additive genetic effects, such as genotype-by-environment interaction, which often impact quantitative trait variation. Selection of lines with a large additive genetic effect on the phenotypic variation in the TOI is done by selecting lines with the highest genomic estimated breeding value (GEBV), phenomic estimated breeding value, or phenomic and genomic estimated breeding value.

However, limitations arise when single-environment trials are conducted. Here, an environment is defined as a combination of country, location, and year (other conceptualizations define the environment based on quantitative measurements such as temperature, precipitation, and soils). A single-environment trial is unlikely to generate the complete phenotypic variation for a quantitative TOI, on which the above framework relies. But combinations of component traits through indices may increase variation and improve the accuracy of prediction. A more appropriate methodology is increasing the environments in which TOIs are tested for the target population of genotypes, ideally in the target population of environments. Phenotypic measurement of the TOI across many environments improves confidence that the phenotypic variation measured is a realistic sample of the true phenotypic variation. Selection of lines with large additive genetic effects based on multi-environment evaluation identifies stable genotypes in the environment.

Genomic selection has become a standard way to increase genetic gain per unit time in many different crop species (Voss-Fels et al., 2019; Wartha & Lorenz, 2021). Genomic selection uses markers to predict performance rather than phenotypic screens, thus relying on GEBVs established in a test population (Bernardo, 1994; Daetwyler et al., 2013; Jannink et al., 2010). Genomic selection helps operationalize other breeding strategies such as performance prediction for hybrids, speed breeding, introgression from wild relatives, and crop modeling (Technow et al., 2015; Watson et al., 2019; Wolfe et al., 2019). Phenomics aims to measure all the phenotypic traits of a plant and then use that information to help understand its overall performance (Chen et al., 2022). Phenomic selection, operationalized through high-dimensional data derived from different types of imagery (e.g., thermal, red–green–blue, multispectral, and hyperspectral), has been gaining popularity and has increased the understanding of quantitative traits and improved breeding outcomes (Jangra et al., 2021). Creating models that can use this information efficiently has been shown to increase gain per cycle. Within the mungbean community, both single-environment selection and multi-environment selection are common; however, it is unclear whether selection under these contrasting conditions leads to similar choices. Therefore, the goal of this study was to compare two common selection methodologies for mungbean yield: (1) single-environment trials using yield components and (2) multi-environment trials using yield per se.

Core Ideas

Core collections are a key place to identify useful variation for crop improvement.
Combining genomic and phenomic information improves prediction accuracy.
Selection indices from single-environment trials show overlaps with selection in multi-environment trials.

MATERIALS AND METHODS

Germplasm

The World Vegetable Center mini-core collection was used for this study. This population of 292 accessions reflects the diversity of accessions of cultivated mungbean conserved in the World Vegetable Center genebank (Schafleitner et al., 2015; Sokolkova et al., 2020). The germplasm of the mini-core collection was obtained from 19 countries in Asia, the Americas, Australia, and Africa and is available from the WorldVeg genebank under a Standard Material Transfer Agreement.

Genotyping

The raw reads were obtained from Illumina under the Illumina Greater Agricultural Good Award 2020, which funded high-depth (>40X) resequencing of the whole mini-core collection. We cleaned the reads using Cutadapt v1.14 (Martin, 2011) and SolexaQA++ v3.1.7.1 (Cox et al., 2010), mapped them to the Crystal reference genome with Burrows–Wheeler aligner v0.7.15 (Li & Durbin, 2009) for reference-based SNP calling, and marked the duplicated reads with Picard v2.9.0-1. SNP calling followed the GATK Best Practices with GATK v4.2.2.0 (Van der Auwera et al., 2013). We used VCFtools to retain biallelic SNPs with --max-missing 0.9 and --minQ 30 (Danecek et al., 2011). Because mungbean is a selfing crop, SNPs and individuals with more than 10% heterozygous sites were removed. For GWAS, we further filtered out SNPs with a missing rate > 0.05 or a minor allele frequency < 0.05 and pruned for linkage disequilibrium at a threshold of 0.9, resulting in a final SNP matrix with 292 individuals and 200,000 SNPs.
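The per-SNP thresholds described above can be illustrated with a minimal sketch. This is not the VCFtools or GATK pipeline used in the study; the function name, the 0/1/2 genotype coding, and the toy data are assumptions made only for illustration.

```python
def filter_snps(genotypes, max_missing=0.05, min_maf=0.05, max_het=0.10):
    """Return SNP ids that pass missing-rate, MAF, and heterozygosity filters.

    genotypes: dict mapping a SNP id to a list of calls per individual,
    coded as alternate-allele counts (0, 1, 2) or None for a missing call.
    """
    kept = []
    for snp_id, calls in genotypes.items():
        observed = [c for c in calls if c is not None]
        if not observed:
            continue  # nothing called at this site
        missing_rate = 1 - len(observed) / len(calls)
        alt_freq = sum(observed) / (2 * len(observed))
        maf = min(alt_freq, 1 - alt_freq)
        het_rate = sum(1 for c in observed if c == 1) / len(observed)
        if missing_rate <= max_missing and maf >= min_maf and het_rate <= max_het:
            kept.append(snp_id)
    return kept


# Toy data (hypothetical): only the first SNP passes all three filters.
toy = {
    "snp_ok": [0, 0, 2, 2, 0, 2, 0, 2, 0, 2],
    "snp_monomorphic": [0] * 10,
    "snp_missing": [0, 2, None, 2, 0, 2, 0, 2, 0, 2],
    "snp_het": [1, 1, 0, 2, 0, 2, 0, 2, 0, 2],
}
passing = filter_snps(toy)
```

The heterozygosity filter is applied per SNP here; the same rate computed per individual would cover the removal of individuals mentioned in the text.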

Phenotyping

Single-environment trial

The trial was grown at the World Vegetable Center in Shanhua, Tainan, Taiwan (latitude 23.1°N, longitude 120.3°E, elevation 12 m.a.s.l.). The trial was planted on October 6, 2022, with 20 seeds per plot in a randomized complete block design with four blocks. The treatments consisted of 292 accessions of the mungbean mini-core collection, four experimental check cultivars (NM94, KPS1, KPS2, and CO5), and one commercial check (3890, Tainan #5, replicated three times per block). Soil samples were collected at 12 places within the experimental field to understand overall variability.

Phenospex

Automated data collection was done using four PlantEye F500 3D-spectral scanners (Phenospex) mounted on a gantry moving over the experimental field (20 m × 100 m) at 3 m/min. The three-dimensional (3D) plant multispectral data in the near-infrared range (NIR: 720–750 nm) and at three color bands (RED: 620–645 nm, GREEN: 530–540 nm, and BLUE: 460–485 nm) were automatically processed using the PHENA analytics platform (Phenospex) and visualized and analyzed with HortControl 3.0 (Phenospex). This platform collected 44 parameters related to plant and leaf morphology and color reflectance, including normalized difference vegetation index (NDVI), normalized pigment chlorophyll ratio index (NPCI), and plant senescence reflectance index (PSRI). Data were collected twice a day from each plot for 35 days (November 17–December 19, 2022), during vegetative and reproductive plant phases.
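For orientation, the three reflectance indices named above can be sketched from band reflectances. The broadband formulas below are the commonly used forms and are an assumption here; the exact definitions applied by the PHENA platform are not given in the text, and the function name and input values are illustrative only.

```python
def vegetation_indices(nir, red, green, blue):
    """Compute NDVI, NPCI, and PSRI from band reflectances (0 to 1).

    Assumed broadband forms:
      NDVI = (NIR - RED) / (NIR + RED)
      NPCI = (RED - BLUE) / (RED + BLUE)
      PSRI = (RED - GREEN) / NIR
    """
    return {
        "NDVI": (nir - red) / (nir + red),
        "NPCI": (red - blue) / (red + blue),
        "PSRI": (red - green) / nir,
    }


# Hypothetical reflectances for a healthy green canopy.
indices = vegetation_indices(nir=0.6, red=0.2, green=0.3, blue=0.1)
```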

Manual phenotyping

To explore the utility of automated phenotypes, standard manually collected phenotypes were also taken: days to maturity, plant count per plot, 100-seed weight, pod with seed weight, seed weight, and seeds per pod. Ten pods were picked at random from the entire plot harvest and used to calculate seeds per pod and mean seeds per pod (see Figure 1).

[IMAGE OMITTED. SEE PDF]

Multi-environment

To explore the utility of different selection methods, we conducted a series of multi-environment trials with the mungbean mini-core performed in 15 countries across 23 locations over 6 years. In these trials, 40 phenotypic parameters were collected, including 100-seed weight, resistance scores to anthracnose, halo-blight, Mungbean yellow mosaic disease, powdery mildew, Cercospora leaf spot, yield, seed color and luster, pod count per plant, and plant height at flowering. Not all traits were collected across all environments and accessions. To test selection methods, yield was used for proof of concept. Yield was phenotyped in eight environments, where we classify an environment as a combination of location and year. Across these environments, yield was phenotyped in 1 of 4 years (2016–2019).

Analysis/prediction

All modeling was conducted using R Statistical Software and the packages lme4 (function lmer) and sommer (Bates et al., 2015; Covarrubias-Pazaran, 2016). Data summarization and correlations were estimated using the packages dplyr and stats (R Core Team, 2013; Wickham et al., 2023). Correlations between manual and automatic phenotypes were estimated by first averaging each automatic phenotype by accession and then calculating Pearson's correlation coefficient between the mean automatic phenotype and the manual phenotype.
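The two-step correlation described above (average the automatic readings per accession, then correlate with the manual phenotype) can be sketched as follows. The study used R; this stdlib-only Python version, its function names, and its toy values are assumptions for illustration.

```python
from collections import defaultdict
from math import sqrt


def accession_means(records):
    """records: iterable of (accession, value) scanner readings."""
    sums, counts = defaultdict(float), defaultdict(int)
    for accession, value in records:
        sums[accession] += value
        counts[accession] += 1
    return {acc: sums[acc] / counts[acc] for acc in sums}


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def auto_vs_manual(scans, manual):
    """Correlate per-accession scan means with one manual phenotype."""
    means = accession_means(scans)
    shared = sorted(set(means) & set(manual))  # accessions present in both
    return pearson([means[a] for a in shared], [manual[a] for a in shared])


# Hypothetical readings: repeated scans per accession, one manual value each.
scans = [("A", 1.0), ("A", 3.0), ("B", 4.0), ("B", 4.0), ("C", 5.0), ("C", 7.0)]
manual = {"A": 10.0, "B": 20.0, "C": 30.0}
r = auto_vs_manual(scans, manual)
```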

All data analysis and code are available.

Single-environment

Within the single environment, the best linear unbiased predictions (BLUPs) were calculated for each TOI. The form is as follows:

$\begin{equation} \begin{array}{c} y_{ijklm} = u_i + b_j + s_k + \beta_l m_l + \beta_m c_m + \varepsilon_{ijklm} \\ u \sim N\left(0, \sigma_u^2\right) \\ b \sim N\left(0, \sigma_b^2\right) \\ s \sim N\left(0, \sigma_s^2\right) \end{array} \tag{1} \end{equation}$

where y_ijklm is the TOI value modeled by linear mixed-model regression; u_i is the random effect of the ith genotype, with u ∼ N(0, σ²_u); b_j is the random effect of the jth block, with b ∼ N(0, σ²_b); s_k is the random effect of the kth soil sample, with s ∼ N(0, σ²_s); m_l is days to maturity, with β_l its fixed effect on yield; c_m is the number of plants in a plot, with β_m its fixed effect on yield; and ε_ijklm is the residual.

Phenomics

Here, we predict line performance for each TOI informed by a phenomic relationship. The form is as follows:

$\begin{equation} \begin{array}{c} y_{ijklm} = u_i + b_j + s_k + \beta_l m_l + \beta_m c_m + \varepsilon_{ijklm} \\ u \sim N\left(0, \mathbf{P}\sigma_u^2\right) \\ b \sim N\left(0, \sigma_b^2\right) \\ s \sim N\left(0, \sigma_s^2\right) \\ \mathbf{P} = \left[\mathrm{HTP}\,\sigma_u^2\right] \end{array} \tag{2} \end{equation}$

where y_ijklm is the TOI value modeled by linear mixed-model regression of genotype informed by all phenomics readings; P = HTPσ²_u is the variance–covariance matrix for the random genotype effect, u ∼ N(0, Pσ²_u), with P being the phenomic relationship matrix formed from the variance–covariance of the phenomics readings; b_j is the random effect of the jth block, with b ∼ N(0, σ²_b); s_k is the random effect of the kth soil sample, with s ∼ N(0, σ²_s); m_l is days to maturity, with β_l its fixed effect on yield; and c_m is the number of plants in a plot, with β_m its fixed effect on yield.
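One way to form a phenomic relationship matrix from the variance–covariance of phenomics readings is sketched below. The construction (standardize each parameter, then take the scaled cross-product among lines) is an assumption, since the text does not state how P was scaled; names and data are hypothetical.

```python
from math import sqrt


def phenomic_relationship(traits):
    """Build an n x n relationship matrix among lines from HTP parameters.

    traits: list of rows, one per line; each row holds the per-line mean of
    every phenomic parameter. Columns are standardized (mean 0, sd 1) and
    constant columns are dropped; P = Z Z' / k for k retained columns.
    """
    n, p = len(traits), len(traits[0])
    z_cols = []
    for j in range(p):
        col = [row[j] for row in traits]
        mean = sum(col) / n
        sd = sqrt(sum((v - mean) ** 2 for v in col) / n)
        if sd == 0:
            continue  # a constant parameter carries no information
        z_cols.append([(v - mean) / sd for v in col])
    if not z_cols:
        raise ValueError("no variable phenomic parameters")
    k = len(z_cols)
    return [[sum(z[a] * z[b] for z in z_cols) / k for b in range(n)]
            for a in range(n)]


# Four hypothetical lines, three parameters (the third is constant).
P = phenomic_relationship([
    [1.0, 10.0, 5.0],
    [2.0, 8.0, 5.0],
    [3.0, 12.0, 5.0],
    [4.0, 9.0, 5.0],
])
```

With this scaling the diagonal of P averages 1, which keeps it on a scale comparable to a genomic relationship matrix.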

Genomics

Here, we predict line performance for each TOI informed by a genomic relationship. The form is as follows:

$\begin{equation} \begin{array}{c} y_{ijklm} = u_i + b_j + s_k + \beta_l m_l + \beta_m c_m + \varepsilon_{ijklm} \\ u \sim N\left(0, \mathbf{G}\sigma_u^2\right) \\ b \sim N\left(0, \sigma_b^2\right) \\ s \sim N\left(0, \sigma_s^2\right) \\ \mathbf{G} = \left[K\sigma_u^2\right] \end{array} \tag{3} \end{equation}$

where y_ijklm is the TOI value modeled by linear mixed-model regression of genotype informed by SNPs; G = Kσ²_u is the variance–covariance matrix for the random genotype effect, u ∼ N(0, Gσ²_u), with G being the genomic relationship matrix; b_j is the random effect of the jth block, with b ∼ N(0, σ²_b); s_k is the random effect of the kth soil sample, with s ∼ N(0, σ²_s); m_l is days to maturity, with β_l its fixed effect on yield; and c_m is the number of plants in a plot, with β_m its fixed effect on yield.
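A genomic relationship matrix of the kind used in GBLUP can be sketched from a marker matrix. The text does not say which estimator was used; the VanRaden form below is a common choice and is assumed here, with hypothetical names and toy markers.

```python
def vanraden_g(markers):
    """Genomic relationship matrix, VanRaden-style (assumed estimator).

    markers: list of rows, one per individual, coded 0/1/2 as counts of the
    alternate allele with no missing calls. G = W W' / (2 * sum p(1 - p)),
    where W is the marker matrix centered by twice the allele frequency.
    """
    n, m = len(markers), len(markers[0])
    freqs = [sum(row[j] for row in markers) / (2 * n) for j in range(m)]
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in markers]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(ra, rb)) / scale for rb in centered]
            for ra in centered]


# Three hypothetical individuals, two SNPs (allele frequency 0.5 at each).
G = vanraden_g([[0, 2], [2, 0], [1, 1]])
```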

Phenomics with genomics BLUP

Here, we predict line performance for each TOI informed by both phenomic and genomic relationships. The form is as follows:

$\begin{equation} \begin{array}{c} y_{ijklm} = u_i + b_j + s_k + \beta_l m_l + \beta_m c_m + \varepsilon_{ijklm} \\ u \sim N\left(0, \mathbf{PG}\sigma_u^2\right) \\ b \sim N\left(0, \sigma_b^2\right) \\ s \sim N\left(0, \sigma_s^2\right) \\ \mathbf{PG} = \left[K\sigma_u^2\right] + \left[\mathrm{HTP}\,\sigma_u^2\right] \end{array} \tag{4} \end{equation}$

where y_ijklm is the TOI value modeled by linear mixed-model regression of genotype informed by SNPs and by all phenomics readings; the random genotype effect follows u ∼ N(0, PGσ²_u), where PG is the sum of G = Kσ²_u, the genomic relationship matrix, and P = HTPσ²_u, the phenomic relationship matrix; b_j is the random effect of the jth block, with b ∼ N(0, σ²_b); s_k is the random effect of the kth soil sample, with s ∼ N(0, σ²_s); m_l is days to maturity, with β_l its fixed effect on yield; and c_m is the number of plants in a plot, with β_m its fixed effect on yield.

Multi-environment

Here, we remove random variation associated with replication/blocking in order to obtain a single BLUP value for each TOI in each environment. The form is as follows:

$\begin{equation} \begin{array}{c} y_{ij} = \mathrm{ge}_i + b_j + \varepsilon_{ij} \\ \mathrm{ge} \sim N\left(0, \sigma_{\mathrm{ge}}^2\right) \\ b \sim N\left(0, \sigma_b^2\right) \end{array} \tag{5} \end{equation}$

where y_ij is the TOI value modeled by linear mixed-model regression; ge_i is the random effect of the ith genotype-by-environment combination, with ge ∼ N(0, σ²_ge); and b_j is the random effect of the jth replication, with b ∼ N(0, σ²_b). We then use the TOI BLUPs within each environment as the response variable in the following prediction models, employing different assumptions of variance–covariance structure in the R package "sommer" (Covarrubias-Pazaran, 2016; Crossa et al., 2022).

Main effects model

This is a main effects model because we are specifying the genetic correlation structure as a genomic relationship matrix, with the residual variance being the same across environments.

$\begin{equation} y_{ij} = u_i + \beta_j e_j + \varepsilon_{ij} \tag{6} \end{equation}$

where y_ij is the TOI value modeled by linear mixed-model regression; u_i is the random effect of the ith genotype, with u ∼ N(0, σ²_u); and e_j is the jth environment, with β_j its fixed effect, that is, the environment-specific mean yield.

Compound symmetry

This is compound symmetry because we are specifying the genetic correlation structure as compound symmetry, with equal genotypic variance and equal correlation among all pairs of environments, and with the residual variance being the same across environments.

$\begin{equation} \begin{array}{c} y = g + e \\ y = \left\{ y_1, \ldots, y_m \right\} \\ g = \left\{ g_1, \ldots, g_m \right\} \\ e = \left\{ e_1, \ldots, e_m \right\} \end{array} \tag{7} \end{equation}$

$\begin{equation*} g \sim N\left(0, G \otimes \left[ \begin{array}{ccc} \sigma_g^2 + \sigma_{gxe}^2 & \sigma_g^2 & \sigma_g^2 \\ \sigma_g^2 & \sigma_g^2 + \sigma_{gxe}^2 & \sigma_g^2 \\ \sigma_g^2 & \sigma_g^2 & \sigma_g^2 + \sigma_{gxe}^2 \end{array} \right] \right) \end{equation*}$

$\begin{equation*} e \sim N\left(0, I\sigma_e^2\right) \end{equation*}$

where y is the phenotypic value, g is the genotypic value, and e is the non-genotypic value; y₁,…,y_m are the phenotypic values in environments 1,…,m; g₁,…,g_m are the genotypic values in environments 1,…,m; and e₁,…,e_m are the non-genotypic values in environments 1,…,m.

Compound symmetry with heterogeneous variance

This is compound symmetry with heterogeneous variance because we are specifying the genetic correlation structure as compound symmetry with heterogeneous genotypic variance in the environments and constant genotypic covariance across all environments, with the residual variance being the same across environments.

$\begin{equation} \begin{array}{c} y = g + e \\ y = \left\{ y_1, \ldots, y_m \right\} \\ g = \left\{ g_1, \ldots, g_m \right\} \\ e = \left\{ e_1, \ldots, e_m \right\} \end{array} \tag{8} \end{equation}$

$\begin{equation*} g \sim N\left(0, G \otimes \left[ \begin{array}{ccc} \sigma_g^2 + \sigma_{gxe_1}^2 & \sigma_g^2 & \sigma_g^2 \\ \sigma_g^2 & \sigma_g^2 + \sigma_{gxe_2}^2 & \sigma_g^2 \\ \sigma_g^2 & \sigma_g^2 & \sigma_g^2 + \sigma_{gxe_3}^2 \end{array} \right] \right) \end{equation*}$

$\begin{equation*} e \sim N\left(0, I\sigma_e^2\right) \end{equation*}$

where y is the phenotypic value, g is the genotypic value, and e is the non-genotypic value; y₁,…,y_m are the phenotypic values in environments 1,…,m; g₁,…,g_m are the genotypic values in environments 1,…,m; and e₁,…,e_m are the non-genotypic values in environments 1,…,m. Unlike the compound symmetry model, the genotype-by-environment variance σ²_gxe_j on the diagonal is specific to environment j, while the covariance σ²_g is shared by all pairs of environments.
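To make the two among-environment covariance structures concrete, the sketch below builds both matrices and their Kronecker product with a relationship matrix. It reads "heterogeneous variance" as an environment-specific genotype-by-environment variance on the diagonal, which is an assumption; the variance values and function names are hypothetical, and this is not the sommer implementation.

```python
def compound_symmetry(n_env, var_g, var_gxe):
    """Equal variance (var_g + var_gxe) and equal covariance (var_g)."""
    return [[var_g + var_gxe if i == j else var_g for j in range(n_env)]
            for i in range(n_env)]


def compound_symmetry_het(var_g, var_gxe_by_env):
    """Shared covariance var_g; diagonal var_g + environment-specific var_gxe."""
    n_env = len(var_gxe_by_env)
    return [[var_g + var_gxe_by_env[i] if i == j else var_g
             for j in range(n_env)] for i in range(n_env)]


def kronecker(a, b):
    """Kronecker product a (x) b of two matrices given as lists of rows."""
    return [[a[i][k] * b[j][l]
             for k in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for j in range(len(b))]


cs = compound_symmetry(3, var_g=2.0, var_gxe=1.0)
cs_het = compound_symmetry_het(2.0, [1.0, 0.5, 2.0])

# Covariance of the stacked genotypic values for two unrelated lines
# (identity relationship matrix) across three environments.
identity = [[1.0, 0.0], [0.0, 1.0]]
full_cov = kronecker(identity, cs)
```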

Selection

Truncation selection was applied at a proportion of 10%, ranking lines by either a weighted rank sum index (WRSI) or yield.

Single-environment

The WRSI is a selection index formed by the linear combination of line rankings for yield component trait BLUPs, weighted by the predictive ability of the underlying model (Fumia et al., 2023). This index technique serves as a proxy for yield, combining yield component trait BLUPs additively: 100-seed weight, pod with seed weight, seed weight, and seed count per pod. Moreover, the weighting of the index components serves to reduce component contribution proportionally to its ability of being accurately predicted. Days to maturity is not included in the index because this effect is removed from the predictions by using it as a covariate in the models. The intersecting lines, those selected by each method, are extracted to compare against the multi-environment models.
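A weighted rank sum index of this kind can be sketched as follows. The ranking direction (rank 1 for the highest BLUP, so the lowest index is best), the tie handling, and all names and values are assumptions for illustration; the index in Fumia et al. (2023) may differ in detail.

```python
def rank_descending(blups):
    """Rank lines so that the highest BLUP receives rank 1."""
    ordered = sorted(blups, key=blups.get, reverse=True)
    return {line: position + 1 for position, line in enumerate(ordered)}


def weighted_rank_sum_index(trait_blups, weights):
    """Sum each line's per-trait rank, weighted by predictive ability.

    trait_blups: dict trait -> dict line -> BLUP
    weights:     dict trait -> predictive ability of that trait's model
    """
    index = {}
    for trait, blups in trait_blups.items():
        for line, rank in rank_descending(blups).items():
            index[line] = index.get(line, 0.0) + weights[trait] * rank
    return index


def truncation_select(index, proportion=0.10):
    """Keep the best-ranked proportion of lines (lowest index), at least one."""
    n_keep = max(1, round(proportion * len(index)))
    return sorted(index, key=index.get)[:n_keep]


# Hypothetical BLUPs for two yield components in three lines.
trait_blups = {
    "seed_weight": {"L1": 3.0, "L2": 2.0, "L3": 1.0},
    "seeds_per_pod": {"L1": 1.0, "L2": 3.0, "L3": 2.0},
}
weights = {"seed_weight": 0.8, "seeds_per_pod": 0.2}
wrsi = weighted_rank_sum_index(trait_blups, weights)
selected = truncation_select(wrsi)
```

Weighting ranks rather than raw BLUPs keeps traits on different scales comparable, and a poorly predicted trait moves the index less than a well-predicted one.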

Multi-environment

Here, selection is based on yield rank. Selections are made from each model by selecting the top 30 lines (highest BLUP). Comparisons are then made between these G × E models, and the intersecting lines selected are extracted. The single-environment intersecting line selections and the multi-environment intersecting line selections are then compared to identify lines selected for the WRSI (yield index) and for G × E yield BLUP (yield).
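The comparison of selections across models reduces to set intersections of top-ranked lines, sketched below. Line names and BLUP values are hypothetical, and the small n is for illustration only (the study selected 30 lines per model).

```python
def top_lines(blups, n=30):
    """Return the set of n lines with the highest BLUP."""
    return set(sorted(blups, key=blups.get, reverse=True)[:n])


def intersecting_lines(selections):
    """Lines selected by every model; selections maps model -> set of lines."""
    return set.intersection(*selections.values())


# Hypothetical yield BLUPs from three model structures.
model_blups = {
    "main": {"L1": 2.1, "L2": 1.9, "L3": 0.4, "L4": 1.5},
    "cs": {"L1": 2.0, "L2": 0.3, "L3": 1.2, "L4": 1.8},
    "cs_het": {"L1": 1.1, "L2": 0.2, "L3": 0.9, "L4": 1.6},
}
selections = {model: top_lines(b, n=2) for model, b in model_blups.items()}
shared = intersecting_lines(selections)
```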

RESULTS AND DISCUSSION

Trait relationships

The purpose of data summarization is to understand the data structure, identify correlations among factors, and begin the process of analysis. An important part of understanding our trials is knowing how much variation is present across replications or environments for a TOI, so that when we move to regression, we know what a reliable effect estimate should be for different covariates. For example, we can view variation among replications from the single-environment trial for the manually collected phenotypes (Figure 1), the automatically collected phenotypes (Figure 2), and the soil samples (Figure S1). There is little variation among replications within the manually or automatically collected phenotypes, an indicator of a homogeneous trial location. However, the soil samples (Figure S1) show substantial variation by sample point, identifying different soil characteristics and suggesting that the soil sample identifier is likely to be a covariate with a larger effect than replication in our regression models, although both are important to include. There was also substantial phenotypic variation in the multi-environment trials (Figure 3).

[IMAGE OMITTED. SEE PDF]

Correlations (Figure 4) are also an important first analysis step. Some meaningful correlations between manual phenotypes include the expected positive correlation between 100-seed weight and 10 pod weight (0.866) and the negative correlation between days to maturity and total seed weight (−0.436). This means that if 100-seed weight is difficult to collect, it may not need to be collected in other trials as it is mostly redundant with 10 pod weight. We could substitute 10 pod weight and have confidence in it as a proxy for 100-seed weight. Other meaningful correlations between manually and automatically collected phenotypes include the positive correlation between 100-seed weight/10 pod weight and mean light penetration depth (0.316 and 0.333). Mean projected leaf area is moderately correlated with pod and 100-seed weight (0.290) and with total seed weight (0.256). Mean seeds per pod is moderately correlated with height (0.285). Ten pod weight is negatively correlated with mean NDVI Bin 3 (−0.239).

[IMAGE OMITTED. SEE PDF]

Single environment

The analysis starts with a linear mixed-model analysis to identify the most parsimonious model to carry forward. The first model uses the covariates of genotype, replication, soil sample, and days to maturity, and the second model uses all of the above plus plant count per plot. When compared through Akaike information criterion (AIC) and analysis of variance, the second model is significantly different and fits better (AIC difference = 94; p-value < 0.001). The covariates from the second model are therefore included in all subsequent models. There were strong relationships among phenotypes collected in the experiment (Figure 4).
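As a reminder of how such a comparison works, AIC is computed as 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood, and the model with the lower value is preferred. The sketch below uses invented log-likelihoods and parameter counts, not values from this study, and the model names are placeholders.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def compare_aic(models):
    """models: dict name -> (log_likelihood, n_params).

    Returns the name of the best model and each model's AIC difference
    from the best one.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores, key=scores.get)
    return best, {name: score - scores[best] for name, score in scores.items()}


# Invented placeholder values for two nested candidate models.
best, deltas = compare_aic({
    "without_plant_count": (-120.0, 4),
    "with_plant_count": (-110.0, 5),
})
```

For models that differ in their fixed effects, as here, likelihoods should come from maximum likelihood rather than restricted maximum likelihood fits for the AIC values to be comparable.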

Moving forward to the use of phenomics, genomics, and their combination results in some different interpretations from the findings. We will focus primarily on the cross-validated models from which our selections are made (Figure S2; Table 1), predictive abilities calculated, and trait heritability estimated (Figure 5). Predictive abilities vary depending on the model and the trait. The larger the variation of predictive ability within a trait, the smaller the heritability for that trait, and vice versa. This is an expected outcome, as the more heritable a specific trait is, the more the phenotype of that trait is similar across related individuals, and therefore the prediction is more accurate.

TABLE 1 Lines selected from the single-environment trial for yield index by weighted rank sum.

Rank	BLUP	PBLUP	GBLUP	PGBLUP
1	VI002872BG	VI001339AG	VI002587AG	VI002587AG
2	VI001339AG	VI002523AG	VI001096AG	VI001096AG
3	VI001096AG	VI002647AG	VI005041AG	VI005041AG
4	VI002587AG	VI003057BG	VI001339AG	VI000805BG
5	VI003470BG	VI001124AG	VI000805BG	VI001339AG
6	KPS1	VI003034BG	VI000380AG	VI003470BG
7	VI001733BG	VI000380AG	VI003470BG	VI003951AG
8	VI000380AG	VI002587AG	VI002872BG	VI000380AG
9	VI003951AG	VI003959BG	VI003951AG	VI002872BG
10	VI000805BG	VI003187BG	VI003801BG	VI001733BG
11	VI002190BG	VI000317BG	VI002646AG	VI002646AG
12	VI003187BG	VI001096AG	VI001023BG	VI001023BG
13	VI003220AG	VI004006A-GM	VI001733BG	VI003801BG
14	VI002877BG	VI001548AG	VI004958BG	VI003220AG
15	VI002646AG	3890	VI003364AG	VI002402BG
16	VI000981BG	VI000981BG	VI003220AG	VI004958BG
17	VI005041AG	VI000188A-BLM	VI002402BG	VI004942BG
18	VI004958BG	VI003083BG	VI001859BG	VI003364AG
19	VI000317BG	VI003465BG	VI004069BG	VI001859BG
20	VI003648BG	VI002611AG	VI004942BG	VI000232AG
21	3890	VI003220AG	VI000317BG	VI000735BG
22	VI002009BG	VI002469AG	VI000232AG	VI004069BG
23	VI004915BG	VI002646AG	VI001191BG	VI000317BG
24	VI003083BG	VI000232AG	VI003242AG	VI004954BG
25	VI002647AG	VI000680AG	VI003187BG	VI002672AG
26	VI002993BG	VI001191BG	VI004954BG	VI003187BG
27	VI004942BG	VI003648BG	VI000735BG	VI003019BG
28	VI001023BG	VI000470AG	VI000981BG	VI000981BG
29	VI000232AG	VI002190BG	VI002190BG	VI002206AG
30	VI002934AG	VI003276BG	VI002587AG	VI002587AG

[IMAGE OMITTED. SEE PDF]

Selections were made from each type of cross-validated model BLUPs by using a WRSI. There are sizable overlaps between the selections made by techniques of BLUP, GBLUP, and genomic and phenomic best linear unbiased prediction (PGBLUP), with fewer between different combinations of phenomic best linear unbiased prediction (PBLUP) and the others. This is likely due to PBLUP relationships being formed based upon phenomics data from a single-environment trial, taking into account additive and non-additive effects. Phenomics is observing this variation and making predictions about effects that are not heritable, reducing predictive ability (Figure 5) and identifying a larger proportion of novel lines (Figure S2). But, considering the relationship matrix is formed by similar values of phenomic parameters related to plant health and growth, the selections made by PBLUP were healthy growing lines with apparent low expression of the TOI. Furthermore, the selection index is based upon selecting large and heavy seeds, something that could be causing this difference. The GBLUP and PGBLUP selections are almost entirely the same (27/30), where the three outlier selections made by each technique could be further investigated to parse the differences between the predictive models. Model comparisons of PBLUP, GBLUP, and PGBLUP show a tremendous improvement in model performance, with an AIC of 335, 296, and 261, respectively. Although GBLUP has an advantage of performance over PBLUP, the combination of the two improves overall performance. This is likely due to the P matrix (phenomics relationship matrix) identifying line correlations not identified by the G matrix (genomics relationship matrix) due to non-additive effects of trait variation captured by 3D multispectral systematic data.

Multi-environment

Quantitative traits such as yield are important to evaluate across many environments, years, and seasons, especially when predicting the genotypic effects associated with them. The environment has an increasing effect on phenotypic performance when the trait becomes increasingly quantitative. There are three methodologies of addressing genotype-by-environment interaction: (1) ignore it—average performance across environments; (2) reduce it—cluster environments into subgroups and select cultivars for those environmental subgroups; or (3) exploit it—select environment-specific cultivars to maximize productivity (Bernardo, 2002). Methods (2) and (3) require specific environmental variables to be used in order to parse variation appropriately; therefore, we decided to use method (1), where the goal is the selection of an environmentally stable line with a good BLUP value across environments. Traits of interest along the geographies where the mini-core has been evaluated show a large amount of variation; therefore, it is imperative to confirm some selections along the geographical gradient.

Specifying different variance-covariance structures in different models and selecting within each, as done here, is a useful approach for selection because it identifies the intersections between BLUP-selected lines. However, if the goal is to use the model for prediction of untested genotypes (progeny), proper model choice through cross-validation and comparison of predictive ability is an imperative step. A total of 30 selections per G × E model structure were made and compared to one another, where the main effects and compound symmetry models have the highest intersection (19/30; Figure S3; Table 2). Compound symmetry with heterogeneous variance is an outlier, with the majority of its selections (21/30) not matching those of the other methods. However, this model has the highest predictive accuracy and should be considered for cross-validation and genomics-assisted breeding integration. Across the three methods, a total of eight intersecting lines were selected for G × E stability and high G × E yield BLUP.
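This section does not write out the three variance-covariance structures. As a rough guide, the sketch below builds the covariance of genotypic values across environments under one common parameterization: a genotype main effect only, a main effect plus a G × E term with a single variance (compound symmetry), and a main effect plus a G × E term with an environment-specific variance (compound symmetry with heterogeneous variance). The function names and variance values are hypothetical, and the authors' software may parameterize these models differently.

```python
def main_effects_cov(n_env, var_g):
    """Genotype main effect only: the same genotypic value in every environment."""
    return [[var_g for _ in range(n_env)] for _ in range(n_env)]


def compound_symmetry_cov(n_env, var_g, var_ge):
    """Main effect plus a G x E term sharing one variance across environments."""
    return [[var_g + (var_ge if i == j else 0.0) for j in range(n_env)]
            for i in range(n_env)]


def cs_heterogeneous_cov(var_g, var_ge_by_env):
    """Main effect plus a G x E term with its own variance in each environment."""
    n_env = len(var_ge_by_env)
    return [[var_g + (var_ge_by_env[i] if i == j else 0.0) for j in range(n_env)]
            for i in range(n_env)]


def to_correlation(cov):
    """Convert a covariance matrix into the implied correlation matrix."""
    n = len(cov)
    return [[cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5 for j in range(n)]
            for i in range(n)]


# Hypothetical variance components for three environments.
cs = compound_symmetry_cov(3, var_g=2.0, var_ge=1.0)
csh = cs_heterogeneous_cov(2.0, [1.0, 0.5, 2.0])
cs_corr = to_correlation(cs)
print(cs)
print(csh)
```

Under the main-effects structure all environments are perfectly correlated. Compound symmetry implies one common correlation between environments, and the heterogeneous version lets each environment have its own genotypic variance.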

TABLE 2 Lines selected from the multi-environment trials for yield by G × E models.

Rank	gemainBLUP	gecsBLUP	gecsdgBLUP
1	VI001124AG	VI001124AG	VI001191BG
2	VI001339AG	VI001339AG	VI001339AG
3	VI002569BG	VI002569BG	VI000736AG
4	VI000736AG	VI002469AG	VI002063BG
5	VI002469AG	VI000736AG	VI002569BG
6	VI000578AG	VI000578AG	VI000805BG
7	VI000380AG	VI003957AG	VI001993BG
8	VI001211AG	VI003958B-BLM	VI003490AG
9	VI003958B-BLM	VI001211AG	VI001023BG
10	VI003957AG	VI000380AG	VI001974BG
11	VI000942AG	VI003212B-BLM	VI002469AG
12	VI003212B-BLM	VI004958BG	VI004069BG
13	VI004958BG	VI000942AG	VI003242AG
14	VI004931AG	VI004931AG	VI000380AG
15	VI002523AG	VI003490AG	VI000735BG
16	VI002195AG	VI003470BG	VI003534AG
17	VI003470BG	VI002195AG	VI002009BG
18	VI002206AG	VI002523AG	VI001859BG
19	VI003490AG	VI002206AG	VI003220AG
20	VI004954BG	VI004010AG	VI000175BY
21	VI004010AG	VI000559AG	VI003057BG
22	VI000461BG	VI000852AG	VI003379BG
23	VI000559AG	VI000175BY	VI001520A-BLM
24	VI000175BY	VI004954BG	VI002672AG
25	VI003220AG	VI000461BG	VI000818BG
26	VI000470AG	VI001548AG	VI003886BY
27	VI000317BG	VI003220AG	VI003725BG
28	VI001221AG	VI001974BG	VI003364AG
29	VI002739AG	VI001562AG	VI004965BG
30	VI000852AG	VI000317BG	VI001605BG
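
The overlap counts reported for the G × E models can be checked directly from Table 2 with plain Python sets. The variable names below are illustrative. On this reading, the 19/30 reported for the main effects and compound symmetry models appears to count lines shared by only those two models, in addition to the eight lines shared by all three. That interpretation matches the table but is not stated explicitly in the text.

```python
# Accessions transcribed from the three columns of Table 2.
main = set("""
VI001124AG VI001339AG VI002569BG VI000736AG VI002469AG VI000578AG
VI000380AG VI001211AG VI003958B-BLM VI003957AG VI000942AG VI003212B-BLM
VI004958BG VI004931AG VI002523AG VI002195AG VI003470BG VI002206AG
VI003490AG VI004954BG VI004010AG VI000461BG VI000559AG VI000175BY
VI003220AG VI000470AG VI000317BG VI001221AG VI002739AG VI000852AG
""".split())

cs = set("""
VI001124AG VI001339AG VI002569BG VI002469AG VI000736AG VI000578AG
VI003957AG VI003958B-BLM VI001211AG VI000380AG VI003212B-BLM VI004958BG
VI000942AG VI004931AG VI003490AG VI003470BG VI002195AG VI002523AG
VI002206AG VI004010AG VI000559AG VI000852AG VI000175BY VI004954BG
VI000461BG VI001548AG VI003220AG VI001974BG VI001562AG VI000317BG
""".split())

csdg = set("""
VI001191BG VI001339AG VI000736AG VI002063BG VI002569BG VI000805BG
VI001993BG VI003490AG VI001023BG VI001974BG VI002469AG VI004069BG
VI003242AG VI000380AG VI000735BG VI003534AG VI002009BG VI001859BG
VI003220AG VI000175BY VI003057BG VI003379BG VI001520A-BLM VI002672AG
VI000818BG VI003886BY VI003725BG VI003364AG VI004965BG VI001605BG
""".split())

all_three = main & cs & csdg            # selected by every G x E model
main_cs_only = (main & cs) - csdg       # shared by main effects and CS only
csdg_unique = csdg - (main | cs)        # selected only by CS with heterogeneous variance

print(sorted(all_three))
print(len(all_three), len(main_cs_only), len(csdg_unique))  # prints 8 19 21
```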

Comparisons and considerations

The final step is a comparison of the 10 intersecting lines from Section 2 and the eight intersecting lines from Section 3. When compared, three lines are selected by every method of BLUP employed (Figure S4; Table 3). These three lines are accessions PI 369787 from the Philippines, EG-MD-6D from the Philippines, and PI 363534 from India.

TABLE 3 Intersecting lines selected from all methods (green; marked ***), intersecting lines selected from multi-environment trials (blue; marked **), and intersecting lines selected from the single-environment trial (red; marked *).

All-environment selections—Yield and yield index
Accession number	Accession name	Provenance
VI001339AG***	PI369787	Philippines
VI000380AG***	EG-MD-6D	Philippines
VI003220AG***	PI363534	India
VI002569BG**	PI381396	Nigeria
VI000736AG**	M.S.9719	India
VI002469AG**	(CES 28 × ML-18)	Philippines
VI003490AG**	PI363845	India
VI000175BY**	EC15131	India
VI002587AG*	CPI10594	Australia
VI001096AG*	Q10590	Australia
VI002646AG*	Watt & Finkner '73	Thailand
VI000232AG*	EC15179	Iran
VI000317BG*	PI323285	Pakistan
VI003187BG*	PI363487	India
VI000981BG*	EC15006 PARTO	Philippines

The comparison of selections was done on the biodiverse germplasm of a core collection, which displays more variation than breeding populations. In such diverse populations, the selection of genetic resources as donor accessions for new traits is challenging, as TOIs may be blurred by lack of adaptation. In such a germplasm set, selecting for yield may be inappropriate, while selecting for traits that contribute to improved yield in adapted lines is of greater interest. Single-environment automated phenotyping provides values for a multitude of traits that may or may not contribute to yield in adapted genotypes. Moreover, single-environment predicted values for component traits are likely to represent the true values poorly, but our multi-environment genotype-by-environment predicted values for yield help alleviate some of this potential limitation. Since cultivated mungbean is bred to meet geographical market preferences, future work should consider the application of the above methods to a selection index. A potential selection index of interest could be the Smith-Hazel index, in which different economically based weights can be applied to the various TOIs on a geographical basis, assisting in the selection of geographically specific accessions (Hazel, 1943; Smith, 1936). This technique would be strengthened by defining the environment based on quantitative measurements of temperature, precipitation, and soils, because selections could then be made in accordance with consumer and producer preferences within each defined environment.
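For reference, the Smith-Hazel index scores each accession as I = b'x, where x holds its trait values and the index weights are b = P⁻¹Ga, with P the phenotypic variance-covariance matrix, G the genetic variance-covariance matrix, and a the vector of economic weights. The sketch below works through a two-trait case in Python. All matrices, weights, and region labels are hypothetical and serve only to show how region-specific economic weights change the index weights.

```python
def solve_2x2(m, v):
    """Solve m @ b = v for a 2 x 2 matrix m using the closed-form inverse."""
    (p11, p12), (p21, p22) = m
    det = p11 * p22 - p12 * p21
    if det == 0:
        raise ValueError("matrix is singular")
    return [(p22 * v[0] - p12 * v[1]) / det,
            (p11 * v[1] - p21 * v[0]) / det]


def smith_hazel_weights(P, G, a):
    """Index weights b = P^-1 G a for two traits."""
    Ga = [sum(g * w for g, w in zip(row, a)) for row in G]
    return solve_2x2(P, Ga)


def index_value(b, x):
    """Index score I = b'x for one accession."""
    return sum(bi * xi for bi, xi in zip(b, x))


# Hypothetical phenotypic (P) and genetic (G) variance-covariance matrices.
P = [[4.0, 1.0], [1.0, 2.0]]
G = [[2.0, 0.5], [0.5, 1.0]]

# Hypothetical economic weights for two market regions.
economic_weights = {"region_1": [1.0, 2.0], "region_2": [2.0, 1.0]}

b_by_region = {region: smith_hazel_weights(P, G, a)
               for region, a in economic_weights.items()}
score = index_value(b_by_region["region_1"], [10.0, 5.0])
print(b_by_region, score)
```

The same accession can rank differently in each region because only the economic weights change, while P and G stay fixed.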

The predictive ability of the models is expected to increase once they are applied to a breeding population because predictive ability is determined by the training population size, the relatedness of the training and target populations, and linkage disequilibrium (Wartha & Lorenz, 2021). This is driven by improved marker effect estimation from a larger training population as well as by high relatedness, which ensures that the loci varying in the target population are the same as those in the training population and share the same linkage phase. Therefore, a breeding population can be split into training/validation and target sets, effectively decreasing the phenotyping requirements because phenotypes in the target population can be predicted by leveraging population relatedness and linkage disequilibrium through genomic selection (Voss-Fels et al., 2019). Furthermore, training population composition is important for accurately predicting performance in the target population: adding increasingly unrelated individuals to the training population reduces predictive ability compared with a smaller training population of related individuals (Lorenz & Smith, 2015).
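The training/validation split described above can be sketched with k-fold cross-validation, taking predictive ability as the correlation between predicted and observed values for held-out lines. In the Python sketch below, a one-variable least-squares regression stands in for the GBLUP-type model, and the line identifiers, scores, and phenotypes are hypothetical; only the splitting and scoring logic is meant to carry over.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def k_fold_predictive_ability(ids, observed, fit_predict, k=5, seed=1):
    """Predict every line while it is held out, then correlate with observations."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    predicted = {}
    for fold in folds:
        held_out = set(fold)
        train = [i for i in shuffled if i not in held_out]
        predicted.update(fit_predict(train, fold))
    return pearson([predicted[i] for i in ids], [observed[i] for i in ids])


# Hypothetical data: one predictor score per line and a noisy phenotype.
ids = list(range(20))
score = {i: i / 10 for i in ids}
observed = {i: 2 * score[i] + ((i * 7) % 5 - 2) * 0.1 for i in ids}


def fit_predict(train, test):
    """Stand-in model: simple linear regression fitted on the training lines only."""
    xs = [score[i] for i in train]
    ys = [observed[i] for i in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return {i: intercept + slope * score[i] for i in test}


ability = k_fold_predictive_ability(ids, observed, fit_predict, k=5)
print(round(ability, 3))
```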

Future work should focus on the following steps: (1) selection of geographically specific accessions meeting consumer and producer preferences, and (2) forming breeding populations within these groups to improve predictive ability by population relatedness and linkage disequilibrium. Key benefits of this approach include improved predictive ability of models by borrowing information from increasingly related individuals and correlated environments (Crossa et al., 2014).

Breeding insights and implications

Single-environment evaluation gives insight into overall performance, and there is a small overlap of selections between single- and multi-environment evaluations. Here, three of the eight accessions selected in the multi-location trials, out of 293 entries, were also selected in the single-environment evaluation, suggesting that the model performs well. In general, multi-environment models have better prediction accuracy than single-environment models. Hence, the overlap between the single- and multi-environment evaluations is highly encouraging, gives more confidence in the predictive ability of the model, and indicates that genomic prediction can be used on a larger scale even in a crop where genomic technologies are just being developed. However, it is important to note that heritability, correlation among traits, and correlation between environments all help with interpretation. For example, depending on the goals of the individual breeder, an overlap between two environments that are highly correlated may not be as meaningful as an overlap between divergent environments. Phenomic selection provides a much more cost-effective way to enhance breeding efforts, and here, in line with previous work, we see that investment in intensive, high-quality data generation in a single environment can have large impacts in other environments (Rincent et al., 2018).

CONCLUSION

Access to multi-environment trial networks, the ability to genotype every line in a breeding program, and the ability to turn around data fast enough to make breeding decisions determine the use of genomic selection in breeding programs. Here we have shown that genomic selection is worthwhile, despite these limitations, for crops that have fewer resources than the breeding programs for major commodities. Intersecting lines were selected within a single-environment trial using GBLUP, PBLUP, and PGBLUP, as well as within a multi-environment framework using genotype-by-environment interaction BLUP. The three mungbean lines selected through all methods employed were PI 369787 from the Philippines, EG-MD-6D from the Philippines, and PI 363534 from India. These were selected through BLUP in a single environment based on a yield index by use of genomics, phenomics, and their combination, as well as through multi-environment prediction of yield with genotype-by-environment interaction under alternate variance-covariance assumptions. These methods can be deployed for different crops and/or different quantitative traits to streamline the selection process.

AUTHOR CONTRIBUTIONS

Nathan Fumia: Conceptualization; data curation; formal analysis; visualization; writing—original draft; writing—review and editing. Ramakrishnan Nair: Conceptualization; funding acquisition; data curation; investigation; methodology; writing—review and editing. Ya-Ping Lin: Conceptualization; formal analysis; methodology; writing—review and editing. Cheng-Ruei Lee: Conceptualization; data curation; investigation; methodology; writing—review and editing. Hung-Wei Chen: Data curation; formal analysis; investigation; writing—review and editing. Eric Bishop von Wettberg: Conceptualization; investigation; methodology; writing—review and editing. Michael Kantar: Conceptualization; writing—original draft; writing—review and editing. Roland Schafleitner: Conceptualization; methodology; writing—review and editing.

ACKNOWLEDGMENTS

The work was funded by the Australian Centre for International Agricultural Research (ACIAR) through the projects on International Mungbean Improvement Network (CIM-2014-079 and CROP-2019-144) and long-term strategic donors to the World Vegetable Center, Taiwan: UK aid from the UK Government, United States Agency for International Development (USAID), Australian Centre for International Agricultural Research (ACIAR), Germany, Thailand, Philippines, Korea, and Japan. Cheng-Ruei Lee was supported by 112-2628-B-002-023-MY3 of the National Science and Technology Council, Taiwan. The DNA sequences used for developing the markers for this study were generously provided by the Twelfth Annual Illumina Agricultural Greater Good Initiative Grant.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

© 2023. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Grown on 7 million ha, mungbean is a warm‐season grain legume with regional importance in parts of Asia and Africa. Under forecasted climate change, due to its tolerance to drought and heat, the short crop duration, and its nutritional properties, mungbean could serve to fill an important need for human diets. However, selection of accessions becomes difficult where plant and consumer market variation is large. We performed selection on genebank accessions, specifically the mini‐core collection at the World Vegetable Center, for yield and yield component traits. Our selection index uses refined accuracy by leveraging genomics, phenomics, and genotype‐by‐environment interactions. Best linear unbiased prediction (BLUP) is used to predict the genotypic effects of the 292 mini‐core accessions toward seed yield based on genomic relationships formed from ∼200,000 SNPs. We expanded BLUP analysis to predict phenotypic effects based on the phenomic relationships formed from ∼75,000 measurements from three‐dimensional multispectral data. While this method is restricted to a single environment, our multi‐environment trials across eight countries and 4 years serve to quantify the genotype‐by‐environment effect. K‐fold cross‐validation finds predictive ability to vary by methods but to be related to the narrow‐sense heritability of the yield component trait. Our weighted rank sum index (WRSI) linearly combines yield component traits to proxy yield within our single environment phenomics trial by first ranking genomic and/or phenomic BLUPs, then weighting by predictive accuracy from the cross‐validated model, and then summing the component weighted ranks for each accession. Selections were made from the predicted random effects in each location, identifying three accessions overlapping across both methodologies: PI 369787 (VI001339A‐G) and EG‐MD‐6D (VI000380A‐G) from the Philippines, and PI 363534 (VI003220A‐G) from India.

Details

Title

Leveraging genomics and phenomics to accelerate improvement in mungbean: A case study in how to go from GWAS to selection

Author

Fumia, Nathan¹; Nair, Ramakrishnan²; Lin, Ya‐Ping³; Lee, Cheng‐Ruei⁴; Chen, Hung‐Wei⁵; Wettberg, Eric Bishop⁶; Kantar, Michael¹; Schafleitner, Roland³

¹ Department of Tropical Plant and Soil Science, University of Hawaii at Manoa, Honolulu, Hawaii, USA
² World Vegetable Center South Asia, Patancheru, India
³ World Vegetable Center, Shanhua, Taiwan
⁴ Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei, Taiwan, Institute of Plant Biology, National Taiwan University, Taipei, Taiwan
⁵ Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei, Taiwan
⁶ Department of Plant and Soil Science, University of Vermont, Burlington, Vermont, USA

Section

ORIGINAL ARTICLES

Publication year

2023

Publication date

2023

Publisher

John Wiley & Sons, Inc.

e-ISSN

25782703

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1002/ppj2.20088

ProQuest document ID

3139008313
