Genome‐wide Association Mapping and Prediction of

Abbreviations

BLUEs
best linear unbiased estimates

GBLUP
genomic best linear unbiased prediction

GWAS
genome-wide association studies

GP
genomic prediction

LD
linkage disequilibrium

MTA
marker–trait association

PC
principal component

p_G
percentage of genotypic variance

QTL
quantitative trait locus

RKHSR
reproducing kernel Hilbert space regression

STB
Septoria tritici blotch

Core Ideas

Elite European winter wheat varieties hold large genetic variance for resistance to Septoria tritici blotch (STB) infection.
Genome-wide association studies with high-density marker platforms identified hitherto undetected genetic loci for STB infection.
Genomic prediction suggested the effective use of genomic selection for STB resistance.

Septoria tritici blotch is one of the most serious diseases of wheat. It is caused by an ascomycete fungus, Zymoseptoria tritici (syn. Mycosphaerella graminicola; anamorph Septoria tritici). The fungus enters the leaves almost exclusively through the stomata and colonizes the mesophyll tissues without producing any feeding structures like haustoria (Palmer and Skinner, 2002). The macro-STB symptoms become visible after an extended latent period of symptomless growth in the leaf when a switch in the fungal reproductive mode from asexual to sexual occurs. The switch coincides with a rapid increase in fungal biomass and is associated with the appearance of pycnidia, which results in chlorosis or necrosis of the leaves, making the fungus essentially necrotrophic (O'Driscoll et al., 2014; Palmer and Skinner, 2002). The necrotic lesions produced by the STB infection prove most devastating on the flag and the first leaves, which provide most of the photosynthetic assimilates at the grain filling stage (Palmer and Skinner, 2002; Shaw and Royle, 1989; Thomas et al., 1989). The spread of STB may occur vertically (i.e., from the base to the top of the plants) via rain splashes or horizontally (i.e., from one plant to another) via leaf interaction. The yield losses from STB infection have been estimated to range from 30 to 50% under optimum fungal growth conditions (Fones and Gurr, 2015; Goodwin, 2007). The detection and restriction of STB spread are therefore of high priority.

The STB growth may be controlled by agronomic management practices (e.g., crop rotation and cultivar mixtures) and the application of fungicides (O'Driscoll et al., 2014). Currently, most STB infection is controlled by the latter. Nevertheless, both measures are costly: agronomic practices can result in substantial yield penalties and fungicide use can lead to a high rate of pathogen evolution (Brown et al., 2015; O'Driscoll et al., 2014). Furthermore, resistance against strobilurin fungicides has been reported for various field populations of Z. tritici (Fraaije et al., 2005; Hayes et al., 2016; Heick et al., 2017). Septoria tritici blotch infections became more widespread after the release of semidwarf wheat varieties under high fertilizer input conditions during the Green Revolution. As a result, most of the short-statured and high-yielding varieties are susceptible to STB (Eyal, 1981). Susceptibility to STB infection is thus cultivar-dependent, which suggests that it is under genetic control. Therefore, exploiting the genetic diversity for STB resistance is an option to breed for more resistant cultivars.

The genetic architecture of STB infection in wheat can be qualitative or quantitative; in other words, some varieties may harbor a single or a few dominant genes conferring resistance to STB whereas some varieties may have a combination of many genes working in an additive manner. Although a single major gene is of utmost importance for controlling STB infection, qualitative resistance is a short-term measure because of the continuous pathogen–host warfare resulting from gene-to-gene interaction. Quantitative resistance, on the other hand, is a durable option and can be achieved by detecting and stacking moderate effect loci into an elite genetic background (Brown et al., 2015; Goodwin, 2007).

The genetic nature of STB infection has mostly been elucidated via linkage mapping studies in biparental populations, mostly with the objective of detecting a large-effect gene or quantitative trait locus (QTL) (Brown et al., 2015; Dreisigacker et al., 2015; Goodwin, 2007; Risser et al., 2011). Taller and late-heading varieties escape STB infection; as a result, Rht-D1 and Ppd-D1 were observed to be large-effect QTLs influencing STB infection (Arraiano et al., 2009; Simón et al., 2004). Genome-wide association studies (GWAS) can be performed on a diverse set of varieties with different genetic backgrounds and different infection rates. Since GWAS do not involve parental crosses and work effectively on all the available genetic diversity in a given panel of varieties, they promise increased mapping resolution (Myles et al., 2009). Although GWAS have detected significant loci involved in quantitative STB resistance (Gurung et al., 2014; Kollers et al., 2013; Miedaner et al., 2013), it is still difficult to stack multiple loci in an elite background. Genomic prediction (GP) is a slightly different approach that exploits the effects of genome-wide markers, rather than only the significant loci, to predict the genetic merit of an individual for the trait for selection purposes (Meuwissen et al., 2001). The GP results based on experimental data on STB and other diseases suggest that it has promising potential in breeding for improved quantitative resistance (Jiang et al., 2015; Miedaner et al., 2013; Mirdita et al., 2015; Rutkoski et al., 2012).

Our study is based on a panel of 371 European elite wheat varieties that were genotyped with two different high-density SNP arrays. The objectives of this study were (i) to determine the potential of dense marker arrays in GWAS to detect novel STB disease resistance loci and (ii) to study the potential of GP to predict STB infection by exploiting additive effects and additive and epistatic interactions in the genomic selection models.

MATERIALS AND METHODS

Phenotypic Data Analyses

The data for STB disease infection (percentage of necrosis) were gathered on a European wheat panel comprising 371 (357 winter types and 14 spring types) elite varieties for two consecutive years (2009 and 2010) at Cecilienkoog, Germany, in replicated (three replications) incomplete α-block design trials. A detailed description of the STB spray inoculation method is provided in Kollers et al. (2013). Mixed field isolates with various degrees of virulence were applied. Briefly, the spray inoculation was performed in each environment (year) at two time points at 10-d intervals to increase the rate of disease infestation. The first inoculation was performed at growth stage 39/41 (emerged flag leaf). Septoria tritici blotch infection was visually assessed on the flag and first leaves 32 and 48 d after spray inoculation as the percentage of leaf area infected. High positive and significant correlations were observed between STB infection scoring at both time points on flag leaves and first leaves (Supplemental Fig. S1). Therefore, the arithmetic mean of STB infection on both leaves at both time points was taken to represent the adjusted STB infection in each replication. To correct the skewness, a square root transformation was applied, which resulted in a scale from 0 (least STB infection) to 5 (most STB infection). The following linear mixed-effect model was used for within-year phenotypic data analysis:

$y_{ik} = μ + G_{i} + r_{k} + e_{ik}$ [1]

where y_ik is the STB infection on the i^th genotype in the k^th replication, μ is the common intercept term, G_i is the effect of the i^th genotype, r_k is the effect of the k^th replication, and e_ik denotes the corresponding error term. All effects, except the intercept, were assumed to be random to calculate the individual variance components. The repeatability among the replications was calculated as:

$\text{Repeatability} = \frac{σ_{G}^{2}}{σ_{G}^{2} + σ_{e}^{2} / r}$ [2]

where $σ_{G}^{2}$ and $σ_{e}^{2}$ denote the variance components of genotype and error, whereas r denotes the number of replications. To calculate the best linear unbiased estimates (BLUEs), the intercept and genotypic effects were assumed to be fixed. Phenotypic data across years were analyzed as:

$y_{ijk} = μ + G_{i} + E_{j} + (G \times E)_{ij} + r_{(j)k} + e_{ijk}$ [3]

where y_ijk is the infection rate on the i^th genotype in the j^th environment and k^th replication, μ is the common intercept, G_i is the effect of the i^th genotype, E_j is the effect of the j^th environment, (G × E)_ij is the genotype × environment interaction effect of the i^th genotype and j^th environment, r_(j)k is the effect of the k^th replication in year j, and e_ijk is the corresponding error. To calculate the heritability of STB infection across environments, Eq. [3] was used to estimate the individual variance components by assuming all effects except the intercept to be random, and the broad-sense heritability was calculated as:

$H^{2} = \frac{σ_{G}^{2}}{σ_{G}^{2} + σ_{G \times E}^{2} / E + σ_{e}^{2} / (r E)}$ [4]

where $σ_{G}^{2}$ , $σ_{G \times E}^{2}$ , and $σ_{e}^{2}$ denote the variance components of genotype, genotype × environment interaction and error, respectively, and r and E denote the number of replications and environments, respectively. To calculate best linear unbiased estimates (BLUEs), all effects except the intercept and genotype were assumed random in Eq. [3].
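
As a quick numerical check, the across-environment heritability can be recomputed from the variance components reported in Table 1. The sketch below assumes the usual entry-mean forms, H² = σ²_G / (σ²_G + σ²_G×E / E + σ²_e / (rE)) and, within a year, repeatability = σ²_G / (σ²_G + σ²_e / r), with E = 2 years and r = 3 replications. The function names are illustrative only; the original analysis was run in R.

```python
# Variance components as reported in Table 1 of this article.
var_g = 0.3055    # genotype
var_gxe = 0.1113  # genotype x environment interaction
var_e = 0.1728    # residual
n_env = 2         # environments (years 2009 and 2010)
n_rep = 3         # replications per environment


def broad_sense_heritability(var_g, var_gxe, var_e, n_env, n_rep):
    """Entry-mean heritability across environments (assumed form)."""
    return var_g / (var_g + var_gxe / n_env + var_e / (n_env * n_rep))


def repeatability(var_g, var_e, n_rep):
    """Within-environment repeatability (assumed form)."""
    return var_g / (var_g + var_e / n_rep)


h2 = broad_sense_heritability(var_g, var_gxe, var_e, n_env, n_rep)
print(round(h2, 2))  # 0.78
```

Under these assumptions the result agrees with the heritability of 0.78 reported in the Results.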

Genotypic Data Analyses and Population Structure

All 371 varieties were genotyped with the 35k Affymetrix and 90k iSELECT single nucleotide polymorphism (SNP) arrays (Allen et al., 2017; Wang et al., 2014), which generated 35,143 and 81,587 SNP markers, respectively. Quality control was performed on SNP data from both arrays by removing the markers with >5% heterozygous or missing calls and with a minor allele frequency <0.05. The remaining missing data were imputed with the mean of both alleles. The quality control resulted in a total of 28,222 SNPs (35k = 8964; 90k = 19,258), which were used in subsequent analyses. Overall, these SNPs represented 10,005 haplotypes (i.e., unique patterns) across all markers in the examined panel of genotypes. Of these, 2926 haplotypes were in common between the 90k iSELECT chip and the 35k Affymetrix chip, 5122 haplotypes were only present in the 90k chip, and 1957 haplotypes were specific for the 35k chip. The polymorphic SNPs were mapped on a reference population (TraitGenetics GmbH, unpublished data), namely the International Triticeae Mapping Initiative-doubled haploid population described in Sorrells et al. (2011). Of the 28,222 SNPs used for GWAS, 4073 markers from the 35k array and 7357 markers from the 90k array were mapped on the reference population. Along with SNP genotyping, the whole panel was genotyped with the candidate gene markers associated with photoperiodism (Ppd-D1), height (Rht-B1 and Rht-D1), and vernalization (Vrn-A1, Vrn-B1, and Vrn-D1) (Beales et al., 2007; Ellis et al., 2002; Zhang et al., 2008).
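
The marker filter described above can be sketched in a few lines. This is an illustration, not the authors' pipeline: it assumes genotypes coded 0/1/2 with `None` for a missing call, applies the 5% limit to missing and heterozygous calls separately, and imputes with the marker's mean genotype, which is only one possible reading of the imputation step. All names are hypothetical.

```python
def marker_passes_qc(calls, max_bad=0.05, min_maf=0.05):
    """Return True if one marker passes the quality filter.

    calls: genotype codes (0, 1, 2) across varieties, None for a missing call.
    """
    n = len(calls)
    missing = sum(c is None for c in calls) / n
    het = sum(c == 1 for c in calls) / n
    observed = [c for c in calls if c is not None]
    if not observed:
        return False
    # Frequency of the allele counted by the 0/1/2 code, then the minor one.
    freq = sum(observed) / (2 * len(observed))
    maf = min(freq, 1 - freq)
    return missing <= max_bad and het <= max_bad and maf >= min_maf


def impute_mean(calls):
    """Replace missing calls with the marker's mean genotype (assumption)."""
    observed = [c for c in calls if c is not None]
    mean = sum(observed) / len(observed)
    return [mean if c is None else c for c in calls]


# Toy marker scored on 40 varieties with one missing call.
example = [0] * 30 + [2] * 9 + [None]
if marker_passes_qc(example):
    example = impute_mean(example)
```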

Population structure based on marker genotypes was examined by principal component (PC) analysis via singular value decomposition. The first two PCs were plotted to examine the clustering among varieties. Moreover, the genetic relatedness among varieties was evaluated via an additive variance–covariance genomic relationship matrix. To infer the hidden population substructuring, an inference algorithm (LEA, Landscape and Ecological Association Studies) was used by assuming 1 to 10 ancestral populations (K = 1–10). The function snmf was used, which provides least squares estimates of ancestry proportions and estimates an entropy criterion to evaluate the quality of fit of the model by cross-validation. The number of ancestral populations best explaining the data can be chosen by using the entropy criterion. We performed 10 repetitions for each K, and the repetition with the minimal cross-entropy value was used to visualize clustering among varieties as bar plots (Frichot and François, 2015).

Genome-Wide Association Studies

Genome-wide association studies were performed on the BLUEs calculated in individual years and SNPs passing the quality criteria plus the scoring for the candidate genes for photoperiodism, height, and vernalization. Let n be the number of varieties and p be the number of marker genotypes used as predictors. A standard linear mixed-effect model following Yu et al. (2006) was used to perform GWAS as:

$y = 1μ + Eτ + Xβ + Pv + Zu + e$ [5]

where y is the column vector of the BLUEs of each genotype in each environment calculated in Eq. [1]; μ is the common intercept; τ, β, v, u, and e are the vectors of environment, marker, population (PCs), polygenic background, and the error effects, respectively; E, X, P, and Z are the corresponding design matrices. In the model, μ, τ, β, and v were assumed to be fixed and u to be random, so that $u \sim N (0, G σ_{a}^{2})$ and $e \sim N (0, I σ_{e}^{2})$ . The n × n additive variance–covariance relationship matrix (G) was calculated from the n × p matrix W = (w_ik) of marker genotypes (coded as 0, 1, or 2) as:

$G_{ij} = \frac{\sum_{k=1}^{p} (w_{ik} - 2 p_{k}) (w_{jk} - 2 p_{k})}{2 \sum_{k=1}^{p} p_{k} (1 - p_{k})}$ [6]

where w_ik and w_jk are the profiles of the k^th marker for the i^th and j^th variety, respectively and p_k is the estimated frequency of one allele in the k^th marker. Subtraction by 2p_k in the numerator and standardization by the denominator allows us to interpret $σ_{a}^{2}$ as the additive genomic variance (VanRaden, 2008).
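
The centering by 2p_k and the scaling by 2Σp_k(1 − p_k) described above can be illustrated on a toy genotype matrix. This is a minimal pure-Python sketch of the VanRaden (2008) relationship matrix, assuming complete 0/1/2 genotypes and allele frequencies estimated from the same panel; the function name and the toy data are hypothetical.

```python
def vanraden_g(W):
    """Additive genomic relationship matrix from an n x p genotype matrix.

    W[i][k] is the 0/1/2 genotype of variety i at marker k (no missing data).
    """
    n, p = len(W), len(W[0])
    # Estimated frequency of the counted allele at each marker.
    freqs = [sum(W[i][k] for i in range(n)) / (2 * n) for k in range(p)]
    # Scaling term: twice the sum of p_k * (1 - p_k) over markers.
    denom = 2 * sum(f * (1 - f) for f in freqs)
    # Center each genotype by twice the allele frequency.
    Z = [[W[i][k] - 2 * freqs[k] for k in range(p)] for i in range(n)]
    return [
        [sum(Z[i][k] * Z[j][k] for k in range(p)) / denom for j in range(n)]
        for i in range(n)
    ]


# Toy panel: 4 varieties x 3 polymorphic markers.
W = [
    [0, 2, 0],
    [2, 0, 0],
    [0, 2, 2],
    [2, 2, 0],
]
G = vanraden_g(W)
```

Because every column of the centered matrix sums to zero, each row of G sums to zero as well, which is a convenient sanity check on an implementation.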

As population stratification and familial relatedness can severely impact the power to detect true marker–trait associations (MTA) in GWAS, different models were used to correct for population stratification: (i) PCs, (ii) the genomic relationship matrix (G), and (iii) both PCs and G. It is expected that using both PCs and G as fixed and random effects, respectively, in the model can enhance the accuracy of GWAS. Moreover, a two-step strategy that involves, firstly, within-year calculation of BLUEs and then setting the BLUEs as a fixed effect in the model further enhances the accuracy of GWAS. In all model scenarios, fixed environmental effects (BLUEs) were assigned. The models described above were compared by plotting their expected versus observed –log₁₀(P) values in a quantile–quantile plot. The best model was determined by checking how well the observed –log₁₀(P) values aligned with the expected ones.

To determine the MTA, a liberal (0.20) false discovery rate to account for multiple testing was applied (Benjamini and Hochberg, 1995). However, only Ppd-D1 crossed the false discovery rate threshold. An arbitrary threshold of significance of –log₁₀(P) ≥ 3.0 to determine the MTA was therefore applied. Following Utz et al. (2000), the total genotypic variance explained by all QTL (p_G) was determined as p_G = (R²_adj/H²) × 100. $R_{adj}^{2}$ was calculated by fitting all significant markers (–log₁₀(P) ≥ 3.0) in a multiple linear regression model in the order of ascending P-values; H² is the broad-sense heritability.
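
The two quantities used in this step, the Benjamini–Hochberg decision at a given false discovery rate and p_G, can be sketched as follows. The P-values in the test of the procedure are invented for illustration, and the function names are hypothetical; the R²_adj input would come from the multiple regression described above.

```python
def benjamini_hochberg(p_values, fdr=0.20):
    """Return the indices of tests declared significant at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose P-value is below rank/m * FDR.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def percent_genotypic_variance(r2_adj, h2):
    """p_G = (R2_adj / H2) x 100, following the definition in the text."""
    return r2_adj / h2 * 100


# Invented P-values for five markers.
significant = benjamini_hochberg([0.3, 1e-6, 0.8, 0.02, 0.001], fdr=0.20)
print(significant)  # [1, 3, 4]
```

Dividing by H² expresses the variance explained relative to the genotypic rather than the phenotypic variance, so p_G is larger than R²_adj × 100 whenever H² < 1.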

Since all varieties were genotyped with 35k and 90k SNP arrays plus the candidate gene markers, three GWAS scenarios were adopted to see if the marker density influenced the detection of MTAs. These scenarios were (1) GWAS based on the 35k array, (2) GWAS based on the 90k array, and (3) GWAS based on the combined set of 35k and 90k arrays.

Estimation of Global and Local Linkage Disequilibrium and Demarcation of the QTLs

Linkage disequilibrium (LD), the nonrandom association of alleles at different loci, was measured as the squared correlation (r²) among markers. The genetic mapping positions of the markers for both arrays were adopted from the International Triticeae Mapping Initiative map as described in Sorrells et al. (2011). Although inter- and intrachromosomal LD among the loci vary, genome-wide calculation of LD gives a global estimate of the genetic mapping distance over which LD decays in a particular population. The genome-wide (global) LD was calculated from mapped markers only.
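
For two markers scored on the same varieties, r² is simply the squared Pearson correlation of their genotype codes. The sketch below assumes complete 0/1/2 genotypes and polymorphic markers (a monomorphic marker would give a zero variance); the function name is hypothetical.

```python
def ld_r2(marker_a, marker_b):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(marker_a)
    mean_a = sum(marker_a) / n
    mean_b = sum(marker_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(marker_a, marker_b))
    var_a = sum((a - mean_a) ** 2 for a in marker_a)
    var_b = sum((b - mean_b) ** 2 for b in marker_b)
    # Both markers must be polymorphic, otherwise the variances are zero.
    return cov * cov / (var_a * var_b)


# Two markers with identical patterns are in complete LD.
print(ld_r2([0, 2, 2, 0, 2], [0, 2, 2, 0, 2]))  # 1.0
```

Because the correlation is squared, the value does not depend on which allele is counted: recoding one marker from 0/2 to 2/0 leaves r² unchanged.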

To see the local LD among the MTAs and to define the QTLs, pairwise LD was calculated among all the MTAs. For the unmapped MTAs, the chromosomal locations were retrieved from other published consensus maps (Allen et al., 2017; Wang et al., 2014) and a heat map was drawn to show the LD among them (Gu et al., 2016). The QTL harboring more than one MTA were shown by one representative MTA [that with the highest –log₁₀(P) value]. Furthermore, pairwise LD values were calculated between each representative MTA and the genome-wide markers (mapped and unmapped). A threshold of r² ≥ 0.20 was assumed to define the markers that co-segregated with the representative MTA. The sequences of all markers crossing the r² threshold were BLASTed against the wheat reference sequence to retrieve the corresponding genes and their functional descriptions (Altschul et al., 1990; International Wheat Genome Sequencing Consortium, 2018).

Genome-Wide Prediction

The BLUEs calculated across years were used in two different genomic selection models, namely genomic best linear unbiased prediction (GBLUP) and reproducing kernel Hilbert space regression (RKHSR) (Gianola et al., 2006; Gianola and van Kaam, 2008; Meuwissen et al., 2001; VanRaden, 2008). A brief description of the models follows.

Genomic best linear unbiased prediction is a standard and robust parametric procedure that exploits the main additive effects of the loci to predict the genetic value of the trait. It involves regression of the phenotypic data (y) on the marker genotypes in a linear model of the form y = 1μ + Xβ + e, where μ is a common intercept, X is an n × p incidence matrix of marker genotypes, β is a p × 1 vector of marker effects, and e is an n × 1 vector of error terms with the assumption that $β \sim N (0, I σ_{β}^{2})$ and $e \sim N (0, I σ_{e}^{2})$ . By setting g = Xβ, GBLUP can be written as y = 1μ + g + e, where $g \sim N (0, G σ_{a}^{2})$ ; G is as described in the GWAS section.

Reproducing kernel Hilbert space regression is a semiparametric regression procedure that accounts for the epistatic interactions among the loci as well. It is of the same form as GBLUP with the assumption that g = Kα and thus can be represented as y = 1μ + Kα + e, where y and e are the same as described in the GBLUP model and α is the vector of random effects. In RKHSR, $α \sim N (0, K σ_{α}^{2})$ and K is an n × n symmetric positive-definite matrix defined as $K_{i j} = \exp (- h d_{i j}^{2})$ , where K_ij represents the measured relationship between the i^th and j^th varieties based on their marker profiles, $d_{i j}^{2}$ is the squared Euclidean distance between the i^th and j^th varieties, and h is the bandwidth parameter. To determine the optimum h, a range of values was tested in a cross-validation scenario and the value giving the highest accuracy was chosen.
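
The kernel itself is straightforward to compute from the marker profiles. The sketch below builds K for a toy panel, assuming complete 0/1/2 genotypes and an unscaled squared Euclidean distance (some implementations divide the distance by the number of markers or by its maximum, which only changes the useful range of h). The function name and the bandwidth value are hypothetical.

```python
import math


def gaussian_kernel(W, h):
    """K[i][j] = exp(-h * d2_ij), d2_ij being the squared Euclidean distance."""
    n = len(W)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(W[i], W[j]))
            K[i][j] = math.exp(-h * d2)
    return K


# Toy panel: 3 varieties x 3 markers, arbitrary bandwidth.
W = [
    [0, 2, 0],
    [2, 0, 0],
    [0, 2, 2],
]
K = gaussian_kernel(W, h=0.1)
```

Small values of h make all varieties look alike (K approaches a matrix of ones), whereas large values make K approach the identity matrix, which is why h is tuned by cross-validation.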

We evaluated the prediction accuracy (r_GP) of both GP models by using a fivefold cross-validation scenario. The varieties were randomly divided into five subsets and four of them were used as the training set to estimate the genetic values of the remaining test set. The accuracy of prediction was defined as the Pearson's correlation between the predicted and the observed value standardized by the square root of the broad-sense heritability as r_GP = cor(y_pred, y_obs)/H. The cross-validation runs were repeated for 1000 iterations. Furthermore, to check whether the increase in the number of markers resulted in an increase in r_GP, GP was performed on all varieties based on separate marker arrays (35k and 90k) and by combining markers from both arrays as described in the GWAS section. All calculations were performed in R software (R Core Team, 2016) and by using the packages lme4 and rrBLUP (Bates et al., 2015; Endelman, 2011).
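
The cross-validation bookkeeping can be sketched independently of the prediction model. The example below only shows how the 371 varieties might be split into five folds and how the correlation is standardized by the square root of H²; fitting GBLUP or RKHSR on each training set is left as a placeholder. All names are hypothetical, and the original analysis used R with rrBLUP.

```python
import math
import random


def five_fold_indices(n, k=5, seed=0):
    """Randomly partition the indices 0..n-1 into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prediction_accuracy(y_pred, y_obs, h2):
    """r_GP = cor(y_pred, y_obs) / sqrt(H2)."""
    return pearson(y_pred, y_obs) / math.sqrt(h2)


folds = five_fold_indices(371)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(371) if i not in held_out]
    # A model would be fitted on train_idx and used to predict test_idx here.
```

Repeating the random split many times (1000 in this study) and averaging r_GP reduces the dependence of the estimate on any single partition.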

RESULTS

Phenotypic Data Analyses Show Significant Genetic Variation and High Heritability for STB Infection

The assessment of STB infection was performed in replicated trials in two environments (years). Square root transformation was applied to correct for the skewness in the data. The ANOVA showed that the genotype $(σ_{G}^{2})$ , environment $(σ_{E}^{2})$ , and genotype × environment $(σ_{G \times E}^{2})$ variances were significantly (P < 0.001) larger than zero. The major contributing factor was $σ_{G}^{2}$ , followed by $σ_{G \times E}^{2}$ and $σ_{E}^{2}$ , with the remaining $σ_{e}^{2}$ ascribed to the residuals (Table 1). The Pearson's correlation (r) among the BLUEs of STB infection estimated in individual environments was positive and significant (Fig. 1a). The STB BLUEs calculated across years followed a normal distribution, which is typical of quantitative traits (Fig. 1b). The within-environment repeatability of the STB score was high (2009 = 0.86; 2010 = 0.89) and the across-environment broad-sense heritability (H²) amounted to 0.78. Although the significant $σ_{E}^{2}$ and $σ_{G \times E}^{2}$ indicated inconsistent disease pressure across environments, the high repeatability and H² values indicate the good quality of the phenotypic data. A large $σ_{G}^{2}$ coupled with high H² values is suggestive of a strong selection response for STB resistance in European wheat. Moreover, the quality of the phenotypic data promises its reliable use in GWAS and GP of the trait.

Table 1 ANOVA for Septoria tritici blotch in wheat.

	Df†	Sum Sq†	Mean Sq	F-value	Pr(> F)	Sig†	σ²†	SD†	% σ²
Genotype	370	865.60	2.34	13.22	<2 × 10⁻¹⁶	***	0.3055	0.55	43.39
Env.†	1	124.40	124.38	702.79	<2 × 10⁻¹⁶	***	0.1097	0.33	15.58
Genotype × Env.	370	187.40	0.51	2.86	<2 × 10⁻¹⁶	***	0.1113	0.33	15.80
Rep† × Env.	1	1.20	1.17	6.62	0.0102	*	0.0048	0.07	0.68
Residuals	1483	262.50	0.18	–	–	–	0.1728	0.42	24.55
Total	–	–	–	–	–	–	0.7041	–	100.00

*

Significant at the 0.05 probability level.

***

Significant at the 0.001 probability level.

†

Df = degrees of freedom; Sq = squares; Sig = significance codes; σ² = variance; SD = standard deviation; Env. = environment; Rep = replication.

Fig. 1. Distribution of Septoria tritici blotch (STB) infection in wheat. (a) Association between the best linear unbiased estimates (BLUEs) of STB infection calculated in the years 2009 and 2010. (b) Distribution of the BLUEs of STB infection calculated across years. r, P, and n denote the Pearson's product moment correlation, the significance of correlation, and the number of varieties, respectively.

The European Winter Wheat Shows the Absence of Distinct Subpopulations and a Sharp Decline in LD

Extensive genotyping on the whole panel resulted in 28,222 polymorphic SNPs and functional markers for the genes Ppd-D1, Rht-B1, Rht-D1, Vrn-A1, Vrn-B1, and Vrn-D1. We analyzed the population structure on the basis of marker genotypes by PC analysis and observed the absence of distinct subpopulations, with the first two PCs representing only 12.6% of the total variation (Fig. 2). Principal component analysis-based inference of population structure performed on individual marker arrays showed a similar pattern (Supplemental Fig. S2). The absence of distinct population stratification was further supported by the STRUCTURE-like inference algorithm LEA, which distinguished subpopulations but with only a slight shift in the entropy criterion. The bar plots indicate admixed and weak subpopulations (Supplemental Fig. S3).

Fig. 2. Principal component (PC) analysis on the wheat marker loci combined from the 35k and 90k single nucleotide polymorphism arrays. (a) Scree plot showing first 10 PCs and their corresponding proportion of variance. (b) Scatterplot showing the absence of pronounced clustering among the varieties. Different colors represent the Ppd-D1 alleles. n and p denote the number of varieties and the marker genotypes used in the analysis, respectively.

The marker density required for GWAS of a trait depends on the extent of LD in the population under consideration. The analysis of genome-wide LD (r²) between adjacent mapped marker genotypes showed a rapid decline in LD with increasing genetic map distance (Supplemental Fig. S4), with the first quartile dropping to 0.001 and the third quartile to 0.013. The mean and median values of LD amounted to 0.013 and 0.004, respectively. Although the marker density was high, the LD values suggest that it can still be improved to increase the power of the GWAS.

Genome-Wide Association Studies of STB Infection Detect Multiple Marker–Trait Associations

Among the different GWAS models used in our study, we observed that the G-based model could sufficiently control the spurious associations. Our GWAS identified QTLs on chromosomes 1A, 1B, 2B, 2D, 4A, 5A, 6D, and 7A that exerted minor to modest effects on the trait. Supplemental Table S1 describes the MTAs detected in all three GWAS scenarios (i.e., GWAS performed on markers from the individual SNP arrays and on the combined set of markers). We noted that the detection of QTLs depends on the marker array used. For example, GWAS performed on individual marker arrays detected lower numbers of QTLs than GWAS performed on the combined genotypic data from both arrays (Fig. 3 and Supplemental Table S2). Moreover, the total genotypic variance explained by all QTLs in GWAS Scenario 1 (using markers from the 35k array only) was slightly lower (p_G = 43.02%) than in GWAS Scenario 2 (based on 90k array markers; p_G = 47.20%). Scenario 3 (with the combined set of 35k and 90k markers) detected QTLs that remained undetected in GWAS Scenarios 1 and 2 and explained ∼20% more genotypic variance altogether (p_G = 65.05%). Therefore, in the following, we present the GWAS results based on Scenario 3.

Fig. 3. Summary of the genome-wide association studies (GWAS) of wheat in different scenarios based on marker platforms. (a) Manhattan plot shows the distribution of marker significance –log₁₀(P) values along the chromosomes. The correction for population stratification was performed by using an additive relationship matrix (G) in the linear mixed-effect model. The red dashed line marks an arbitrary threshold [–log₁₀(P) ≥ 3.0] for the detection of marker–trait associations, since only the functional marker (i.e., Ppd-D1) crossed the multiple testing criterion of false discovery rate < 0.20. unm stands for unmapped markers. (b) Quantile–quantile plot showing the distribution of observed versus expected (red dashed line) –log₁₀(P) values based on the naïve model (the general linear model without correction for population structure), P[1–10] model [the population structure corrected with first 10 principal components (PC)], the G model (population structure corrected with a genomic relationship matrix), and the P[1–10] + G model (population structure corrected with both PCs and the G matrix). The color code of the different models is given in the figure legend.

Scenario 3 detected 44 MTAs, 13 of which were unmapped (Table 2). We observed a tight linkage among the MTAs located close to each other on the same chromosome. For example, 21 MTAs were detected on chromosome 5A, 18 of which were present at 48.1 to 52.1 cM, whereas three were located at 90.4 cM. Similarly, four MTAs were detected on chromosome 7A (three at 81.9 cM and one at 84.2 cM). Of the 13 unmapped markers, eight were mapped according to other consensus maps on chromosomes 1A, 2B, 2D, 6A, 6D, and 7B. Interestingly, four of the unmapped MTAs formed a cluster, three of which mapped on chromosome 7B. The MTAs forming clusters (in LD) can be considered as a single QTL (Fig. 4). These QTLs (harboring multiple MTAs) were shown by one representative MTA [with the highest –log₁₀(P) value]. The allelewise distribution of phenotypic data in the representative MTAs revealed the influence of minor (variant) alleles with respect to the allele frequency of the STB infection phenotype (Fig. 5). However, the minor alleles of the MTAs on chromosomes 1A, 7A, and 7B, which exhibited improved STB resistance, were noteworthy.

Table 2 Associated markers with Septoria tritici blotch from the combined set of 35k and 90k marker arrays (genome-wide association study Scenario 3).

Marker	Chr‡	Pos‡	Chr(WA)†	Pos(WA)†	–log₁₀(P)	p_G‡
wsnp_Ex_rep_c109742_92411838	1A	72.10	1A	81.54	3.03	0.87
IAAV3905	1B	41.30	1B	67.14	3.01	6.24
AX_94734086	2B	145.00	unm	unm	3.42	6.54
Ppd_D1	2D	unm	2D	unm	5.98	14.93
wsnp_JD_c27162_22206547	4A	120.40	4A	108.76	3.31	5.80
AX_94955360	5A	48.10	5A	5.50	3.95	2.55
IAAV8258	5A	48.10	5A	86.91	3.50	2.11
Excalibur_c11656_1760	5A	50.50	5A	88.03	4.06	2.95
Ex_c27046_3425	5A	50.50	5A	88.03	3.98	2.90
Ex_c898_1319	5A	50.50	5A	88.03	3.98	2.90
RAC875_c3046_1764	5A	50.50	5A	88.03	3.98	2.90
wsnp_Ex_c27046_36265198	5A	50.50	5A	88.03	3.98	2.90
BS00089795_51	5A	50.50	5A	88.03	3.84	2.75
wsnp_Ex_c17523_26244256	5A	50.50	5A	88.03	3.84	2.75
wsnp_Ex_c898_1738424	5A	50.50	5A	88.03	3.84	2.75
IAAV2473	5A	50.50	5A	88.03	3.84	2.75
wsnp_Ku_c40349_48594583	5A	50.50	5A	88.03	3.84	2.75
RFL_Contig3739_2135	5A	50.50	5A	88.03	3.84	2.76
Ex_c27046_1362	5A	50.50	5A	88.03	3.83	2.82
AX_95628896	5A	50.50	5A	7.19	3.38	2.31
AX_95073216	5A	52.10	5A	10.64	3.84	2.75
Tdurum_contig56267_180	5A	52.10	5A	88.17	3.55	2.10
BobWhite_c40643_370	5A	52.10	5A	88.17	3.38	2.31
AX_95681646	5A	90.40	5A	31.05	3.31	–0.28
BS00003088_51	5A	90.40	5A	114.51	3.19	–0.29
BS00022378_51	5A	90.40	5A	114.51	3.16	–0.29
Excalibur_rep_c106283_82	6D	101.00	6D	118.53	3.04	2.13
BS00052668_51	7A	81.90	7A	127.75	3.03	2.59
IACX1831	7A	81.90	7A	127.75	3.03	2.59
wsnp_CAP11_c651_429263	7A	81.90	7A	127.75	3.03	2.59
AX_94466147	7A	84.20	7A	27.37	3.03	2.59
BobWhite_c1890_712	unm	unm	1A	43.28	3.27	3.74
AX_94459800	unm	unm	2B	62.31	3.01	1.17
TA003482_1069	unm	unm	2D	103.33	3.09	2.35
BobWhite_c35035_317	unm	unm	6A	135.76	3.20	6.48
wsnp_Ex_rep_c110800_93013978	unm	unm	6D	121.75	3.37	5.98
BS00010616_51	unm	unm	7B	58.17	3.08	15.50
AX_95223861	unm	unm	7B	177.89	3.16	15.72
AX_94589755	unm	unm	7B	177.89	3.02	15.60
AX_94449265	unm	unm	unm	unm	3.21	12.00
AX_94441789	unm	unm	unm	unm	3.03	13.45
JD_c3930_358	unm	unm	unm	unm	4.31	6.17
BS00035576_51	unm	unm	unm	unm	3.05	6.64
AX_94797489	unm	unm	unm	unm	3.12	9.59

† Chr(WA) and Pos(WA), chromosome and cM positions based on Wang et al. (2014) and Allen et al. (2017).

‡ p_G, percentage of genotypic variance; Chr, chromosome name; Pos, cM position; unm, unmapped.

Fig. 4. Pairwise linkage disequilibrium (r²) among significant markers in wheat. Markers in tight linkage are represented as quantitative trait loci (QTLs). The color key is given in the figure.

Fig. 5. Allelic influence of the representative markers associated with the Septoria tritici blotch (STB) infection phenotype in wheat. Horizontal line marks the mean of best linear unbiased estimate (BLUE) values of STB infection. R and V denote the distribution of the reference and variant alleles (i.e., varieties harboring major and minor alleles, respectively).

Local LD Analysis Reveals the Presence of Putative Disease Resistance Genes

Our LD analysis between each representative MTA of individual QTLs and the genome-wide markers (including unmapped markers) yielded 500 markers (LD markers) at a threshold of r² ≥ 0.20. BLAST searches of the LD marker sequences against the respective chromosomes of the wheat genome reference sequence (International Wheat Genome Sequencing Consortium, 2018) resulted in 186 unique gene identifiers (Supplemental Table S2). The functional descriptions of these gene identifiers included several disease resistance proteins. For example, the markers on chromosome 1A at 77.70 cM were in LD with the representative MTA of the QTL on 1A (72.10 cM), and all of them corresponded to TraesCS1A01G323600, which is described as a nucleotide-binding site leucine-rich repeat disease resistance protein. Similarly, the representative MTA of the QTL on 6D was in LD with unmapped markers corresponding to TraesCS6D01G365100, whose function is described as a disease resistance protein.
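
The selection of LD markers at r² ≥ 0.20 can be illustrated with a minimal sketch. It assumes that genotypes are coded numerically (e.g., 0 and 2 for the two homozygous classes) and that r² is estimated as the squared Pearson correlation between marker columns. The function names `ld_r2` and `ld_markers` and the toy genotype matrix are hypothetical and are not taken from the study.

```python
import numpy as np

def ld_r2(rep, markers):
    """Squared Pearson correlation (r^2) between one representative
    marker and every column of `markers` (lines x markers)."""
    rep = np.asarray(rep, dtype=float)
    markers = np.asarray(markers, dtype=float)
    rep_c = rep - rep.mean()
    m_c = markers - markers.mean(axis=0)
    num = rep_c @ m_c
    den = np.sqrt((rep_c @ rep_c) * (m_c ** 2).sum(axis=0))
    r = np.zeros(markers.shape[1])
    # Monomorphic markers have zero variance; their r is left at 0.
    np.divide(num, den, out=r, where=den > 0)
    return r ** 2

def ld_markers(rep, markers, threshold=0.20):
    """Column indices of markers in LD with `rep` at r^2 >= threshold."""
    return np.flatnonzero(ld_r2(rep, markers) >= threshold)

# Toy data: six lines; columns are an identical marker, an inverted
# marker, a weakly correlated marker, and a monomorphic marker.
rep = [0, 0, 2, 2, 0, 2]
markers = np.array([
    [0, 2, 0, 2],
    [0, 2, 2, 2],
    [2, 0, 0, 2],
    [2, 0, 2, 2],
    [0, 2, 0, 2],
    [2, 0, 2, 2],
])
r2 = ld_r2(rep, markers)
hits = ld_markers(rep, markers)
print(r2, hits)
```

In practice, the marker sequences of the retained columns would then be aligned to the reference genome to retrieve gene identifiers; that step is not shown here.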

The Accuracy of GP of STB Infection

The mean fivefold cross-validated prediction accuracies of STB infection were similar for both models (i.e., the GBLUP model, which accounted for the main additive effects, and RKHSR, which accounted for epistatic interactions as well) (Fig. 6). A similar pattern was observed when the prediction accuracies were evaluated for the individual marker platforms, indicating that increasing marker density did not influence the prediction accuracy of STB infection (Fig. 6).

Fig. 6. Accuracy of genomic prediction (GP) of Septoria tritici blotch (STB) infection in wheat. Assessment of GP accuracy is based on two models: genomic best linear unbiased prediction (GBLUP) and reproducing kernel Hilbert space regression (RKHSR) evaluated through 1000 random fivefold cross-validation cycles. Symbols μ and σ denote the mean accuracy and SD.
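
The cross-validation scheme behind Fig. 6 can be sketched for the GBLUP case as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the genomic relationship matrix follows the VanRaden formulation, the shrinkage parameter `lam` (the ratio of residual to genetic variance) is fixed instead of being estimated, accuracy is taken as the Pearson correlation between predicted and observed values in the test fold, and the data are simulated. All function names are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from a (lines x markers) matrix
    coded 0/1/2 (assumed VanRaden-type formulation)."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                 # allele frequencies
    Z = M - 2.0 * p                          # centered genotypes
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, lam=1.0):
    """Predict test lines from training lines; lam is an assumed,
    fixed ratio of residual to genetic variance."""
    mu = y[train].mean()
    K = G[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(K, y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha

def cv_accuracy(G, y, k=5, seed=0):
    """Mean correlation between predicted and observed values
    over one random k-fold split."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = gblup_predict(G, y, train, test)
        accs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accs))

# Simulated example: 100 inbred lines, 300 markers coded 0/2.
rng = np.random.default_rng(1)
M = 2 * rng.integers(0, 2, size=(100, 300))
g = M @ rng.normal(size=300)                      # simulated genetic values
y = g + rng.normal(scale=g.std(), size=100)       # add residual noise
G = vanraden_g(M)
acc = cv_accuracy(G, y)
print(round(acc, 2))
```

Repeating `cv_accuracy` with different seeds (e.g., 1000 times) and summarizing the results would give a mean and SD comparable in form to the μ and σ reported in Fig. 6.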

DISCUSSION

Exploiting Large, Heritable Genetic Variation as Well as Morphological Escape Traits can Help to Mitigate STB Infection

Besides large genetic variation, we observed a significant genotype × environment (year) interaction, which is typical because disease pressures are uneven across years and because yearly effects are not predictable. Nevertheless, the high broad-sense heritability (0.78) suggested that the genetic variation is heritable. This is important because, coupled with large and significant genetic variation, high heritability promises a strong selection response. Similar heritability values have been reported in other recent biparental and diverse mapping populations and therefore seem to be common for STB infection (Dreisigacker et al., 2015; Gurung et al., 2014; Miedaner et al., 2012, 2013).

In addition to genetic variation, certain morphological traits have been proposed as escape mechanisms providing resistance to STB infection. For example, since STB infection is more detrimental to yield when it occurs on the flag and first leaves, which provide photosynthetic assimilates at the grain filling stage, taller and later heading varieties help to reduce the spread of STB spores to the upper leaves (Arraiano et al., 2009; Simón et al., 2004). In our study, we observed a significant reduction of STB infection in the varieties harboring Ppd-D1, the photoperiod-sensitive and late flowering allele of the photoperiodism gene on chromosome 2D. Moreover, in previous studies based on multienvironment data on the same panel (Zanke et al., 2014a, 2014b), we observed a significant negative correlation of STB infection with plant height (–0.40, P < 0.001) and heading date (–0.50, P < 0.001). Although escape traits seem to be beneficial, the agronomic and yield penalties often associated with them suggest that the efficient exploitation of genetic variation should provide a better means for durable STB resistance.

Marker Density, LD, and Correction of Population Stratification Govern the Efficacy of Marker–STB Associations

The detection of MTAs and the resolution of GWAS depend on the population and on the phenotypic and genotypic variation within it, as well as on the availability of high-density genome-wide polymorphic markers and a sharp decline of LD between adjacent marker pairs (Bernardo, 2010; Mackay, 2001). The proximity of a marker to the gene of interest strengthens its association and thus its usefulness for marker-assisted selection in breeding programs. Therefore, the efficacy of marker-assisted selection hinges on the usefulness of the MTAs. Spurious MTAs, which can severely reduce the gains from marker-assisted selection, can be avoided by controlling for population stratification (Yu et al., 2006). In our analysis, we observed no sharp distinction into subpopulations at the genetic level, which indicates that European wheat varieties are or have been bred, by and large, with similar goals. It should be noted, however, that similar breeding goals do not necessarily result in a low number of subpopulations at the genetic level. The lack of genetic differentiation could result from a narrow genetic base, in combination with selection in a target population of environments characterized by low levels of genotype × environment interaction. Genetic relatedness can also be attributed to the easy intra-European transfer of varieties (i.e., varieties bred in one country may be registered or released in another country). This is in line with other reports on elite European wheat varieties, where the absence of pronounced clustering was observed for different marker genotypes (Kollers et al., 2013; Würschum et al., 2013). Nevertheless, we observed that controlling for population stratification by assuming a variance–covariance relationship among the varieties sufficiently controlled the false positives and that additional covariables did not increase the power of MTA detection.
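
One common way to check for distinct subpopulations is a principal component analysis of the marker matrix (cf. Supplemental Fig. S2). The following minimal sketch assumes numerically coded genotypes and uses a singular value decomposition of the column-centered matrix. The function name `pc_variance_explained` and the toy data are hypothetical, and the toy panel is deliberately constructed with two groups to show what a pronounced structure would look like.

```python
import numpy as np

def pc_variance_explained(M, n_pcs=2):
    """Principal component scores of the lines and the proportion of
    marker variance explained by the first `n_pcs` components."""
    M = np.asarray(M, dtype=float)
    X = M - M.mean(axis=0)                       # center each marker
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = U[:, :n_pcs] * s[:n_pcs]
    return scores, explained[:n_pcs]

# Toy panel: lines 0-2 and lines 3-5 differ at four markers, and a
# fifth marker varies within both groups.
M = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 0, 0],
    [2, 2, 2, 2, 2],
    [2, 2, 2, 2, 0],
    [2, 2, 2, 2, 2],
])
scores, explained = pc_variance_explained(M)
print(explained)
print(scores[:, 0])
```

In a structured panel such as this toy example, the first component explains most of the variance and separates the two groups by the sign of their scores. In a panel without pronounced clustering, the variance would be spread over many components.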

Examining the extent of LD is important in GWAS, as this approach rests on the principle that the phenotype-controlling gene and the markers in LD with it are coinherited. Our analysis showed that LD decayed within ∼1 cM, which is a shorter distance than in previous reports.

Marker-Assisted Selection for STB Resistance

We analyzed the data in three GWAS scenarios representing different marker densities (using markers from the 35k and 90k arrays separately and in combination). In none of the scenarios did our GWAS capture loci explaining the full variance, which is typical of highly quantitative disease traits (Gurung et al., 2014; Kollers et al., 2013; Miedaner et al., 2012, 2013; Mirdita et al., 2015). The remaining unexplained variance can be ascribed to the very complex nature of STB infection and therefore to loci that did not meet the significance criteria. Nevertheless, we noticed enhanced QTL detection and explained genotypic variance (∼20%) by combining both marker arrays (35k and 90k), which can be attributed to the improved marker density.

In our study, loci contributing p_G > 5% can be of relevance in breeding programs. A comparison of QTL positions with those in previous studies was not possible, largely because of the different populations and marker systems used. Some QTLs, however, possibly coincide with the mapping location of known major genes. For example, the QTL (p_G > 5%) on chromosome 1B may represent Stb2, Stb11, or both (Chartrain et al., 2005b; Liu et al., 2013); the QTL on 2BL may be Stb9 (Chartrain et al., 2009); and the QTL on 4AL may represent Stb12 (Chartrain et al., 2005a). In the original study by Kollers et al. (2013), in which simple sequence repeat markers were used, several QTLs were discovered, including the genes Stb1 on chromosome 5B (Adhikari et al., 2004c), Stb3 on chromosome 6D (Adhikari et al., 2004b), Stb4 and Stb5 on chromosome 7D (Adhikari et al., 2004a; Arraiano et al., 2001), Stb6 on chromosome 3A (Brading et al., 2002), and Stb10 on chromosome 1D (Chartrain et al., 2005a), even though the marker density was much lower. Possible reasons are that simple sequence repeat markers are multiallelic, in contrast to the biallelic SNP markers, and that a less stringent GWAS model was applied in the previous study. In this study, markers that were not mapped to the International Triticeae Mapping Initiative map but were mapped in other consensus maps also harbored important QTLs. For example, a QTL on chromosome 7BL (p_G = 15.72%, mapped at 177.89 cM) may represent Stb8 (Adhikari et al., 2003). Of note are the QTLs on chromosomes 6AL and 6DL, which explained >5% of the genotypic variance. To our knowledge, none of the known major genes have been mapped in these regions of the chromosomes and therefore these QTLs can be regarded as novel, suggesting the use of high-density marker arrays to capture loci that were otherwise not detected. Four of the significant markers were not assigned to any chromosomes in any published map.

The allelic distribution of the STB infection phenotype in the representative MTA of QTLs is noteworthy and may benefit breeding programs. In our analysis, we observed that most of the European wheat varieties already harbor the alleles for resistance (shown in Fig. 5) and that resistance is associated with the minor allele at very few loci. Nevertheless, the resistance loci exert moderate effects and can be used in positive marker-assisted selection. On the other hand, STB resistance can be efficiently improved by negative marker-assisted selection against the infection-promoting alleles.

Quantitative trait locus demarcation based on local LD decay is an approach to narrowing down the putative candidate genes. Significant markers (MTAs) act mostly as proxies (by virtue of being in LD) for the causal genes. Estimating the LD between the representative MTA and the remaining genome-wide markers and anchoring the markers onto the reference sequence of wheat helps us to study the corresponding genes and their annotated function. By following this approach, we observed several markers to be in tight linkage with the MTAs. The annotated function of the genes corresponding to the LD markers suggested their possible involvement in disease resistance.

Epistatic Interactions and the Marker Platform do not Hamper the Accuracy of GP of STB Infection in European Winter Wheat

Improving STB resistance via major genes is a short-term measure because some genes are responsive to only one or a few races of the pathogen and therefore have no broad-environment application. Moreover, the introgression of major genes or QTLs into elite backgrounds incurs costs that may inadvertently affect breeding operations, and it may accelerate the evolution of the pathogen. Instead of concentrating on a few large-effect QTLs, GP of STB infection based on both small- and large-effect markers promises a holistic genetic-based approach for broad-spectrum resistance to avoid the growth and spread of STB infection (Heffner et al., 2009; Meuwissen et al., 2001; Mirdita et al., 2015). We evaluated the prediction accuracies of the genetic values of STB infection by modeling either the main effects only or the main plus epistatic effects. The mean prediction accuracy across 1000 iterations in a fivefold cross-validation scenario amounted to ∼0.43 and did not change significantly between the two models. Although epistatic interactions have been proposed to be pervasive in self-pollinating species like wheat and barley (Hordeum vulgare L.) (Heslot et al., 2012), we observed only a slight increase in the prediction accuracy by modeling epistatic interactions. This is in contrast to other findings, where modeling epistasis to predict STB in European wheat outperformed the model focusing exclusively on main effects (Miedaner et al., 2013; Mirdita et al., 2015).

Since the whole set of 371 varieties was genotyped with two state-of-the-art SNP arrays (35k and 90k), their capacity to predict the genetic value of STB infection was evaluated individually and together to test the hypothesis that increasing marker density improves prediction accuracy. However, increasing the marker density did not result in any significant increase in GP accuracy. This finding is in line with previous reports where GP accuracy was not influenced above a certain number of markers (Jiang et al., 2015), underlining that both marker platforms are equally efficient for predicting STB infection. In practical breeding, nevertheless, the usefulness of GP may be hampered by shifts in the virulence spectrum of the pathogen in different years.

Supplemental Information Available

Supplemental Fig. S1: Correlation of STB infection among leaves and time-points of inoculation.

Supplemental Fig. S2: Principal component analysis.

Supplemental Fig. S3: Population structure.

Supplemental Fig. S4: Linkage disequilibrium decay over genetic distance.

Supplemental Table S1: Significant markers across marker arrays.

Supplemental Table S2: Representative-marker-trait associations and the LD markers.

Conflict of Interest Disclosure

JP and MWG are employed by the company TraitGenetics GmbH. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Authors' Contributions

QHM performed the data analyses and prepared the manuscript. YZ participated in the data analysis. BR participated in phenotypic data collection. JP and MWG contributed the genotypic data. QHM and MSR conceived the idea. All authors read and approved the final manuscript.

Acknowledgments

QHM thanks Deutscher Akademischer Austauschdienst for supporting his PhD candidature. The data for this research were generated in the projects GABI-Wheat and VALID (project numbers 0315067 and 0315947) funded by the German Federal Ministry of Education and Research. The authors gratefully acknowledge two anonymous reviewers whose comments helped to improve this manuscript.

© 2019. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Septoria tritici blotch (STB) caused by the fungus Zymoseptoria tritici is a devastating foliar disease of wheat (Triticum aestivum L.) that can lead to substantial yield losses. Quantitative genetic resistance has been proposed as a durable strategy for STB control. In this study, we dissected the genetic basis of STB infection in 371 European wheat varieties based on 35k and 90k single nucleotide polymorphism marker arrays. The phenotypic data analyses suggested that large genetic variance exists for STB infection with a broad‐sense heritability of 0.78. Genome‐wide association studies (GWAS) propose the highly quantitative nature of STB infection with potential associations on chromosomes 1A, 1B, 2D, 4A, 5A, 6A, 6D, 7A, and 7B. Increased marker density in GWAS by combining markers from both arrays helped to detect additional markers explaining increased genotypic variance. Linkage disequilibrium analyses revealed genes with a possible role in disease resistance. The potential of genomic prediction (GP) assessed via two models accounting for additive effects and additive plus epistatic interactions among the loci suggested the possibility of genomic selection for improved STB resistance. Genomic prediction results also indicated that the higher‐order epistatic interactions are not abundant and that both marker platforms are equally suitable for GP of STB infection. Our results provide further understanding of the quantitative genetic nature of STB infection, serve as a resource for marker‐assisted breeding, and highlight the potential of genomic selection for improved STB resistance.

Details

Title

Genome‐wide Association Mapping and Prediction of Adult Stage Septoria tritici Blotch Infection in European Winter Wheat via High‐Density Marker Arrays

Author

Muqaddasi, Quddoos H¹; Zhao, Yusheng¹; Rodemann, Bernd²; Plieske, Jörg³; Ganal, Martin W³; Röder, Marion S¹

¹ Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Germany
² Julius Kühn Institute (JKI), Braunschweig, Germany
³ TraitGenetics GmbH, OT Gatersleben, Germany

Section

Original Research

Publication year

2019

Publication date

Mar 2019

Publisher

John Wiley & Sons, Inc.

ISSN

19403372

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3835/plantgenome2018.05.0029

ProQuest document ID

2664993278
