Metabolome-based genome-wide association study of

Full text

Turn on search term navigation

ARTICLE

Received 4 Oct 2013 | Accepted 12 Feb 2014 | Published 17 Mar 2014

Weiwei Wen1,*, Dong Li1,*, Xiang Li1, Yanqiang Gao1, Wenqiang Li1, Huihui Li2, Jie Liu1, Haijun Liu1, Wei Chen1, Jie Luo1 & Jianbing Yan1

genome-wide association mapping. Most (58.5%) of the identied loci are supported by expression QTLs, and some (14.7%) are validated through linkage mapping. Re-sequencing and candidate gene association analysis identies potential causal variants for ve candidate genes involved in metabolic traits. Two of these genes were further validated by mutant and transgenic analysis. Metabolite features associated with kernel weight could be used as biomarkers to facilitate genetic improvement of maize.

DOI: 10.1038/ncomms4438 OPEN

Metabolome-based genome-wide association study of maize kernel leads to novel biochemical insights

1 National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China. 2 Institute of Crop Science, CIMMYT China Ofce, Chinese Academy of Agricultural Sciences, Beijing 100081, China. * These authors contributed equally to this work. Correspondence and requests for materials should be addressed to J.Y. (email: mailto:[email protected]

Web End [email protected] ) or to J.L. (email: mailto:[email protected]

Web End [email protected] ).

NATURE COMMUNICATIONS | 5:3438 | DOI: 10.1038/ncomms4438 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications 1

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms4438

Plants produce numerous structurally diverse metabolites, which play essential roles in growth, cellular replenishment and whole-plant resource allocation as well as roles in plant

development and stress responses. In addition, they provide essential resources for human nutrition, bioenergy, medicine, avourings, and so on1. Understanding plant biochemistry is thus of fundamental importance for sustainable agriculture and resource conservation, especially under changing climate conditions2. Metabolomics, which enables comprehensive high-throughput quantication of a broad range of metabolites, is invaluable for both phenotyping and diagnostic studies in humans, animals and plants35. The importance of maize as one of the critical crops for food and feed worldwide makes a comprehensive metabolomic study of this species imperative68. Moreover, maize manifests exceptional genome and phenotypic diversity among its inbred lines9,10, making it an attractive model organism for crop genetics.

As the perceived end product of cellular regulatory and metabolic processes, the metabolite spectrum and quantities making up the metabolic complement may be viewed as the metabolic phenotype or metabotype11,12. As the metabolic phenotype provides a link between gene sequence and visible phenotypes, metabolites can be used as biomarkers for trait prediction6,13. In humans, genome-wide association studies (GWAS) are beginning to unravel the genetics of metabolic traits and demonstrate their utility in biomedical and pharmaceutical research14. Numerous studies on plant primary and secondary metabolites have been carried out that allowed the detection of hundreds of QTLs in Arabidopsis, rice and tomato1518. Recently, the rst metabolite-based association study in maize demonstrated the utility of this approach in genetic improvement7. However, the understanding of the genetic and molecular basis of natural variation in plant metabolomes, including those of maize, is still limited to relatively small population size and a few selected biochemical pathways7,1518. Further, despite of the respective advantages of GWAS, it is logical to borrow the strengths from linkage analysis to complement GWAS in order to validate and identify causal polymorphisms19.

The rapid development of RNA sequencing and metabolomic technologies coupled with SNP data has enabled eQTL and mQTL mapping at a large scale that help us to understand the ow of biology information underlying complex traits in the systems genetics level20. Combing GWAS and transcript data can allow the rapid identication of a large number of novel genes and potential networks that affect specic metabolism, as suggested by a previous study in Arabidopsis thaliana21. Recently, we obtained expression data of 28,769 genes and B1 million high-quality SNPs by deep RNA sequencing of the immature seeds of 368 diverse maize inbreds22. A pilot GWAS for oil concentration and composition in maize kernels has identied 74 loci associated with target traits that explain the majority of the observed phenotypic variation23.

Here we analyse 184 metabolites with associated chemical structures and additional 799 unknown metabolite features, using 368 diverse maize inbreds, SNP and gene expression information as mentioned above, along with two recombinant inbred (RIL) populations. We reveal a relatively simple genetic architecture for most metabotype compositions using GWAS and linkage mapping analysis. GWAS manifests strong power to dissect metabolite traits and its ndings can be validated using linkage and eQTL analysis, as well as functional validation through molecular experiments. We nd novel metabolites and genes, constituting key processes in the formation of phenolamides (PAs) and avonoids. Combining genetics, metabolomics and expression proles signicantly improves our knowledge of both the functional genomics and metabolism of maize and provides a powerful tool for crop improvement.

ResultsMetabolite proling. Using high-throughput LC-MS/MS analysis, we detected and quantied 983 distinct metabolite features from mature kernel extracts of the association panel (368 inbred lines) harvested at three locations in China. Most of them (793/983) were also detected in the two RIL populations (334 lines), as well as overlapped metabolite features found in replications of the association panel (Supplementary Fig. 1). Chemical structures of 184 metabolites were identied or annotated (Supplementary Data 1 and 2).

The level of each metabolite feature varied widely across the lines in both the association panel and linkage populations. For the intensity of a majority of metabolite features (483% in the association panel, 466% in the linkage population), 410-fold change difference measured in each experiment was observed (Supplementary Fig. 2), indicating high natural variability.Greater phenotypic diversity for these metabolite features was observed among the lines of the association panel than within both linkage populations (Supplementary Fig. 2).

In the association panel, 725 metabolite features were detected in two or three environments, 71.7% of which were observed with a repeatability of 40.5, and 48.3% with a repeatability of 40.7 (Supplementary Fig. 3). The level of repeatability indicated a precise phenotyping of metabolic level measurement and a signicant genetic contribution to the determination of metabolic content within the association panel and the three experiments.GWAS was performed for each experiment independently as the level of replication within each experiment was not sufcient to directly test the genotype by experiment interaction term.

Genetic basis of maize metabolome. In GWAS, 445% of the metabolite features in each of the three environments had at least one associated locus at a genome-wide signicance level of Pr1.8 10 6 (calculated by mixed linear model controlling Q

and K (MLM); N 335339). In total, 1,459 distinct locustrait

associations were identied across three environments (Table 1;

Table 1 | Summary of signicant locitrait associations identied by GWAS and QTL identied by linkage mapping.

E1* E2 E3 BB ZY Number of traitsw 258/548 347/748 332/735 550/725 447/736

Number of lociz 484 655 583 1152 724

Average loci per traity 1.92.0* 1.91.7 1.81.7 2.11.1 1.60.8

*E1, E2 and E3 represent the three experiments conducted on the association panel; BB, linkage population B73/By804 RIL; ZY, linkage populationZong3/Yu87-1 RIL. wNumber of traits having signicantly associated loci or QTL (before slash), number of total detected traits(after slash).

zNumber of signicant loci detected in each experiment on the association panel (Pr1.8 10

6, MLM) and QTL detected in each RIL population (LODZ3.0).

yAverage number of signicant loci (or QTL) detected per traits.d.

2 NATURE COMMUNICATIONS | 5:3438 | DOI: 10.1038/ncomms4438 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms4438 ARTICLE

Supplementary Data 3). Each locus explained 5.749.1% of the observed metabolic variance, with a median of 7.8%. Among 725 metabolite features that were detected in more than one environment, we found a total of 1,256 signicant loci associated with 501 of them (Pr1.8 10 6). Of the 1,256 associations, 210

(16.7%) were consistently identied in two or three environments at Pr1.8 10 6 (Supplementary Data 3). Additionally, with

relaxed cutoff values of Pr1.8 10 6 in one environment and

Pr1.0 10 4 in at least one of the other two environments, the

proportion of signicant locustrait associations that are found in at least two environments increased to 50.2% (Supplementary Data 3).

Linkage mapping in the BB RIL population identied 1,152 QTLs for 550 metabolite features, which accounted for 75.9% of all metabolic traits detected in this population. For the ZY RIL population, 60.7% traits (447 of 736) had at least one QTL (Table 1). Each QTL explained 3.380.4% (in the BB RIL) and5.365.6% (in the ZY RIL) of the phenotypic variance (Supplementary Data 4). Of the signicant GWAS loci identied in three environments, 14.7% overlapped with the QTLs identied in at least one of the two RIL populations (Supplementary Data 3 and 4).

In the present study, 1,197 unique candidate genes corresponding to 1,459 signicant locustrait associations identied across three environments were annotated (Supplementary Data 3; only the nearest candidate was reported here, but for the metabolites with identied or annotated structure, genes within a 100-kb anking region of the lead SNPs were also provided in Supplementary Data 5). Cis expression QTLs (eQTL, Pr1.8

10 6, MLM, N 368) were identied for the majority of these

candidate genes (58.5% or 700/1,197), which were from 946 signicant locustrait associations identied across three environments. Within these 946 locustrait associations, signicant correlation (Pr0.01, t approximation, N 335339) between the

expression level of the candidate genes and the phenotypic variation of the target metabolic traits were found in 238 cases(25.2%), which implied that at least some of these candidate genes affect the phenotypic variation via transcriptional regulation. Functions of 24% of these genes are currently unknown based on the available database. Catalytic enzymes, regulators and

transporters were involved in the metabolite content control (Supplementary Fig. 4).

Biochemical and functional interpretation of GWAS results.

The utility of a metabolic phenotype is enhanced by the rich knowledge base of many metabolic pathways and the ability to corroborate candidate associations with biological and functional arguments12,24. In addition, using GWAS on these metabolic phenotypes, we were able to verify and have the chance to update the annotation of many maize genes. Correlating gene annotation and the biochemical characteristics of the associated metabolite frequently allows selection of a single most likely causative gene.

The association between caffeoyl CoA 3-O-methyltransferase 1(CCoAOMT1)25,26 and caffeic acid, dicaffeoylspermidine and several other metabolites is one example of easily pinpointing the most likely causative gene (Table 2 and Supplementary Data 3).GWAS associations with N, N-Di-feruloylputrescine and Apigenin di-C-pentoside provided us the opportunity to potentially update functional annotation of their causal genes (Fig. 1; Table 2 and Supplementary Data 3). On the other hand, the annotation of candidate genes provides useful clues to the biochemical nature of the associated metabolites. Locus TDC1 (tryptophan decarboxylase, located on chromosome 10 at 82851072bp), which was signicantly associated with 16 meta-bolites (P 7.21 10 181.54 10 6, MLM, N 335339),

contains two highly homologous genes (GRMZM2G021277 and GRMZM2G021388, 89% and 88% DNA and amino-acid homology, respectively). Their annotated function in Hordeumvulgare is tryptophan decarboxylase (IPR010977, query coverage 99% and max identity 86%).We predicted that some of these 16 metabolites are tryptophan derivatives based on tryptamine (3-(2-Aminoethyl) indole hydrochloride) standard (Table 2 and Supplementary Data 3). Likewise, GWAS result of metabolite n0769 and functional annotation of the candidate gene (steroleosin, STE) suggested the structure of n0769 (Table 2 and Supplementary Data 3). STE is a sterol-binding dehydrogenase in seed oil bodies27. Indeed, n0769 could be fatty acid moiety suggested by its mass spectrum fragmentation pattern, even though the complete structure remains to be determined.

Table 2 | Validation of candidate genes through various methodologies and associated information.

Gene Lead trait Marker* Sitew Allele Frequency Location Amino-acid change

P-valuez Ny eQTL(P) Mu/

transgenic

PHT DFP(DiFer-Put) (n0381-1)

InDel_17/ 15/0

Chr.1_140321926 17/ 15/0

159/76/ 16

Promoter No 2.27 10 13 251 3.42 10 5 Yes

SNPT/A Chr.1_140321605 T/A 106/144 Exon Leu to Gln

1.26 10 11 250 0.26

CCoAOMT1 n1544-1 InDel_28 Chr.6_79193717 0/28 223/92 5UTR No 1.99 10 22 351 0.06 Yes

STE n0769 InDel_1 Chr.2_68601038 0/1 310/16 Exon Frame shift

3.80 10 8 326 0.02

InDel_878/ 185

Chr.2_68602648 878/ 185

176/87 Promoter No 0.13

(0.003)||

263 3.65 10 13

UGT Apigenin di-C-pentoside

(n1201)

SNPA/C Chr.6_120019623 A/C 138/178 Exon Asp to

Ala

1.08 10 9 316 0.08

TDC1 Tryptamine

(n0671)

InDel_602 Chr.10_82851072 602/0 235/74 Promoter No 4.80 10 14 309 N.A.

*Candidate functional polymorphisms.wPosition for the SNP and indel markers according to version 5b.60 of the B73 reference sequence (MaizeSequence, http://www.maizesequence.org/).

zCalculated by using mixed linear model controlling Q and K (MLM). yNumber of samples used in statistical analysis.

||The P 0.003 in parenthesis is calculated by ANOVA.

NATURE COMMUNICATIONS | 5:3438 | DOI: 10.1038/ncomms4438 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications 3

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms4438

Functional validation of candidate genes. To further validate GWAS ndings and investigate functional variations in the candidate sequences, we tested ve representative genes, PHT (putrescinehydroxycinnamoyltransferase), CCoAOMT1, STE, UGT (UDP glycosyltransferase) and TDC1, using multiple molecular approaches. These included re-sequencing PCR products that encompassed the genetically associated polymorphisms in the relevant inbred lines, eQTL analysis, linkage analysis, mutant analysis and/or transgenic expression.

The PHT locus (GRMZM2G030436) showed the highest signicance (P 2.57 10 15, MLM, N 339, Supplementary

Data 3) in the association with the content of compound N, N-Diferuloylputrescine (DFP) in maize28. We further veried its function by transgenic analysis in rice (Table 2 and Fig. 1). Over-expression of PHT in rice resulted in the accumulation of DFP in the leaf tissue in which it is normally absent, which strongly suggest the involvement of PHT in the biosynthesis of DFP (Supplementary Fig. 5). The functional annotation of PHT was thus updated from transferase (IPR003480) to a putative putrescinehydroxycinnamoyltransferase. Re-sequencing uncovered a 0/15/17bp InDel polymorphism in the promoter region, which is the most likely responsible for the natural variation of DFP content, as well as the expression difference of PHT. Re-sequencing also identied ve polymorphisms in the rst exon that were signicantly associated with the target traits (P 4.34 10 141.26 10 11, MLM, N 230320) and in

modest-to-high LD with the InDel identied in the promoter region (r2 0.480.81). One of the polymorphisms caused the

deletion of an amino acid, and three resulted in amino-acid replacements (Supplementary Table 1 and 2). Taken together, genetic variants within promoter and exon regions might contribute to the functional variation of PHT (Fig. 1).

CCoAOMT1 (GRMZM2G127948) encodes a Caffeoyl-CoA O-methyltransferase. It inuences the content of several metabolites, and its function was validated by examining both CCoAOMT1 knockout maize lines and transgenic rice lines (Table 2 and Supplementary Fig. 6). A monoacylatedagmatine, putrescine and an unknown spermidine derivative (S11) were signicantly upregulated in the CCoAOMT1 knockout lines (Supplementary Fig. 5). The association between its allelic variations and the metabolite n1544-1 (spermidine derivative S11) was supported by metabolite QTL mapping in both BB and ZY RIL populations. After re-sequencing the association panel, a strong association signal was detected with a 28 bp InDel in the 50 untranslated region (UTR) of CCoAOMT1 (P 1.99 10 22,

MLM, N 315, Supplementary Table 1 and 2 and Supplementary

Fig. 6). At the site of this InDel, the parents of the ZY RIL population, but not of the BB RIL population are polymorphic, suggesting that the 28-bp InDel might not be causative, or not the only causative sequence change. The negative correlation between metabolite content and gene expression level suggested that transcriptional regulation may cause the phenotype, although the 28-bp InDel is only marginally correlated (P 0.06, t approxima

tion, N 315) with gene expression (Supplementary Table 1 and

Supplementary Fig. 6).

In STE (GRMZM2G108338), re-sequencing revealed a 1-bp InDel in the coding region, causing a frame shift that was signicantly associated with the content of n0769 (P 3.8

10 8, MLM, N 326, Supplementary Table 1 and 2 and

Supplementary Fig. 7). We also found a signicant difference between the expression levels of the two alleles at this InDel (P 0.02, t-test, N 326, Supplementary Fig. 7). In addition, a

strong cis eQTL was detected for STE (P 1.4 10 18, MLM,

N 368, Supplementary Fig. 7), and the expression level of this

gene was positively correlated with the quantity of n0769 measured (r 0.21, P 7.8 10 4, N 339; Table 2 and

Supplementary Fig. 7). Re-sequencing the promoter region of STE indicated that another potentially causative polymorphism (an 878/185 bp InDel located 370 bp upstream of STE) was strongly associated with the STE expression level (P 3.7

10 13, ANOVA, N 263, Supplementary Table 1 and

Supplementary Fig. 7) and slightly associated with the phenotypic trait (P 0.003, AVONA, N 263, Supplementary Fig. 7). Low

LD (r2 0.02) was observed between the two polymorphisms. We

thus postulate that the two InDels are both associated with phenotypic and expression variation to different extents.

UGT (GRMZM2G383404), annotated as UDP-glycosyltransferase, was associated with the natural variation of a avonoid putatively named Apigenin di-C-pentoside. Despite the fact that it is homologous to rice gene anthocyanin-3-O-glycosyltransferase, the protein sequence similarity of UGT to rice avone-6-C-glucosyltransferase (Os06g18010) is higher than the anthocyanin-3-O-glucosyltransferase gene in maize (also known asBz1GRMZM2G165390; Supplementary Fig. 8)29,30. Strongly associated SNPs were identied in UGT by re-sequencing (Supplementary Table 1 and 2). Eight signicant SNPs found in the exon region were located in a LD block. Six of these eight SNPs cause substitution of amino acids and one SNP (A/C, SYN13426; P 1.1 10 9, MLM, N 316) results in amino-

acid polarity change (Asp to Ala). This and other SNPs in the LD block are likely to constitute the functional variation; however, it is difcult to exclude other variants surrounding this region (Supplementary Table 1 and Supplementary Fig. 9).

A 602-bp InDel in the promoter region of TDC1 (GRMZM2G021277), identied by re-sequencing, is signicantly associated with tryptamine content (P 4.8 10 14, MLM,

N 309; Supplementary Table 2). TDC1 was located within the

QTL region mapped in the ZY RIL population and the 602-bp InDel was segregating in the parents. Although expression level of this gene in 60 tissues in maize is extremely low31 and was not detected in our RNA-sequencing study22, this large InDel may affect the gene expression and, consequently, the phenotype (Supplementary Table 1 and Supplementary Fig. 10).

New genes in phenolamide and avonoid biosynthesis pathways. PAs, which are frequently referred to as hydroxycinnamic acid amides and phenylamides, have been identied in many plant species. PAs participate in many physiological and developmental processes3234, related to defence against abiotic (temperature, drought and salt, and UV) and biotic (pathogen and insects)3337 stresses in plants. One of the major secondary metabolite groups in plants, avonoids, is widely distributed and has a variety of functions38. Combining metabolomics analysis and GWAS, we found novel metabolites and genes constituting key processes in the formation of PAs and avonoids, which had not been previously characterized in maize.

In the biosynthesis of phenolamides, N-hydrocinnamoyltransferasesthatuse aliphatic amines as acyl acceptors and hydro-xycinnamoyl-CoA as a donor were considered the key enzymes. Some were identied in plants, such as ACT (Agmatinecoumaroyltransferase) in barley39, SDT (spermidinesinapoyl CoA acyltransferase) and SCT (spermidinecoumaroyl CoA acyltransferase) in Arabidopsis32 and AT1 (acyltransferase1), DH29 (acyltransferase DH29) and CV86 (acyltransferase CV86) in tobacco40. In addition, the conjugates can be further modied via species-specic hydroxylation, methylation, cyclization and coupling reactions41. In Arabidopsis, AtTSM1, which encodes a CCoAOMT-like protein, was proven to have methylation activity in the biosysnthesis of phenolamides42.

In this study, we quantied 27 phenolamides. GWAS indicated that locus PHT (GRMZM2G030436) was highly associated

4 NATURE COMMUNICATIONS | 5:3438 | DOI: 10.1038/ncomms4438 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms4438 ARTICLE

InDel_17/15/0 Chr1.S_140321605(Leu to Gln)

10.80.60.40.2 0

r 2

DFP

n=159/76/16

n=168/87/18

a e

P=9.881022

P=5.95106

177.24 TGA

ATG

R 2=32%

R 2=9%

265.20

DFP

145.20

Intensity (%)

100

Expression level

248.16

117.0089.04

72.24

291.24 206.04

441.12

0 0 100 200 300 400

m/z

17 bp 15 bp 0 bp

InDel_17/15/0

1.5

c g

1.5

DFP

12 GRMZM2G030436

P = 5.121020

P=0.87

R 2 = 29%

Log 10P

DFP

Expression level

0 1 2 3 4 5 6 7 8

InDel_17/15/0

Chr1.S_140321605

9 10

Chr.

0 1 2

3 4 5 6 7 8 9

10 Chr.

A(Gln)

T(Leu)

Chr1.S_140321605

DFP

n =144/106

n=152/120

h k

(106)

0 120 125 130 135 140 145PHT Phy_pos.(Mb)

GRMZM2G030436

InDel_17/15/0

r = 0.316

P = 3.18107

150

17 bp n =159

15 bp n =76

0 bp n =16

100

DFP

Log 10P

Expression level

100

(1)

12 15 18 21

1 3

2 4 5 6 7 8 9

DFP

Transgenic individual

Figure 1 | Casual sites identication and functional validation of putrescinehydroxycinnamoyltransferase. (a) Structure of N, N-Di-feruloylputrescine (DFP or DiFer-Put) in the polyamine pathway. (b) LC/MS fragmentation of DFP. Possible structures of the major fragments are shown. (c) Manhattan plot displaying the GWAS result of the content of DFP (MLM, N 339). (d) Regional association plot for locus PHT. The signicant sites identied by

re-sequencing PHT (GRMZM2G030436), show in red (MLM, N 230B320). The bigger red points show the putative functional polymorphisms, an

insertion/deletion at the site InDel_17/15/0 and a SNP at Chr1.S_140321605. (e) Gene model of PHT. Filled blue boxes represent exons and UTRs.

The dashed boxes mark the re-sequenced region, and the stars represent the signicant sites identied by re-sequencing, the bigger stars represent the proposed functional sites. (f) A representation of the pair-wise r2 value among all polymorphic sites in PHT, where the colour of each box corresponds to the r2 value according to the legend. (g) Manhattan plot shows the association between expression level of PHTand genome-wide SNPs. Signicant signals are mapped to SNPs within PHT, indicating a cis transcriptional regulation of this gene (MLM, N 368). The presence of the proposed functional site,

InDel_17/15/0, is associated with both the expression level and the content of DFP (h,i), implying that the changed expression level is partially responsible for the change in DFP content. (h) Plot of correlation between the content of DFP and the normalized expression level of the PHT gene. Maize inbred lines with different genotypes at the InDel_17/15/0 site were shown in red, sky blue and midnight blue, respectively. The r value is based on a Pearson correlation coefcient. The P value is calculated using the t approximation. (i) Box plot for DFP content (red) and expression of PHT (sky blue); plotted as a function of genotypes at the site InDel_17/15/0. (j) Box plot for DFP content (red) and expression of PHT (sky blue), plotted as a function of genotypes at the site Chr1.S_140321605. Horizontal line represents the mean and vertical lines mark the range from 5th and 95th percentile of the total data (i,j), respectively. (k) Bar plot for DFP content and PHTexpression level in rice transgenic individuals (T0). The content of DFP and expression level of PHT in the leaves of each transgenic individual is shown in red and sky blue, respectively. Vertical lines represent the s.e. (N 3).

with the metabolite diferuloylputrescine (P6), while CCoAOMT1 (GRMZM2G127948) was responsible for the content of N-(caffeoyl-O-hexoside)-spermidine (S8) and two of N,N-caffeoyl,feruloyl-spermidine derivatives (S10 and 11) (Supplementary Data 3 and Supplementary Fig. 5). PHT has 38% amino acid identity with previously identied AT1 from

Nicotianaattenuata40 and CCoAOMT1 shows 81% identity with CCoAOMT1 from Arabidopsis thaliana42. The in vivo function of PHT and CCoAOMT1 were validated in this study as described above. In the rice PHT over-expression lines, both agmatine- and putrescine-associated conjugates were signicantly upregulated (Supplementary Fig. 5). Interestingly, no change of the

NATURE COMMUNICATIONS | 5:3438 | DOI: 10.1038/ncomms4438 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications 5

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms4438

monoacylated spermidine was observed while some of the diacylated spermidines were downregulated in the PHT-over-expressing lines (Supplementary Fig. 5). In addition, some monoacylated agmatine and putrescine were signicantly upregulated in the CCoAOMT1 knockout lines (Supplementary Fig. 5), which further conrmed its biochemical function in vivo. Unlike the agmatineacyl transferase (ACT)39 and putrescineacyltransferase (AT1)40 reported previously, the PHT in our study seems to have a broader substrate specicity and can recognize both agmatine and putrescine, which has similar function with AtACT in Arabidopsis thaliana43. The lack of hydroxycinnamoyl-CoA may result in the downregulation of diacylatedspermidine in PHT-overexpressing lines. Therefore, we conclude that the PHT is likely an acyltransferase that can act on both agmatine and putrescine in maize. Furthermore, the latter modication of PAs in maize was also conrmed in CCoAOMT1 knockout lines (Supplementary Fig. 5). Further, based on the results inferred by the metabolic proling of our over-expressing rice lines and knockout maize lines, the biosynthetic pathway of phenolamides was reconstructed (Fig. 2).

Flavonol derivatives are highly enriched in mature maize kernels; avanone and anthocyanin derivatives were also identied in our study. Based on the natural variation of these compounds, genomic loci responsible for the abundance of these avonoids were identied (Fig. 3). Genes involved in the

regulation, as well as the biosynthesis, of individual steps in the avonoid biosynthetic pathways were among these loci (Fig. 3 and Supplementary Data 3). Notably, a known locus P1, which encodes a R2R3-MYB transcription factor44, was responsible for natural variation of 20 avonoids found in this study (Supplementary Data 3). More interestingly, a considerable number of loci identied in this study as associated with avonoids were direct targets of (or regulated by) P1, as illustrated by Morohashi et al.45 Functional annotation of these loci, including ABCT (ABC transporter; GRMZM2G018074), GRD2 (Glucose/ribitol dehydrogenase; GRMZM2G170013), HCT (hydroxycinnamoyl-CoA shikimate/quinatehydroxycinnamoyltransferase; GRMZM2G156816), UGT (UDP-glycosyltransferase; GRMZM2G383404), HLY (hemolysin-III homologue; GRMZM2G114650), UGT88A1 (UDP-glycosyltransferase 88A1; GRMZM2G122072) and SAMDC (S-adenosylmethionine decarboxylase proenzyme Precursor; GRMZM2G154397), provided important clues for their involvement in maize avonoid biosynthesis (Fig. 3). Further experimental investigations are needed to elucidate the precise functions of these loci.

Naringenin is the key intermediate of the avone/anthocyanin pathway, serving as the common precursor for a large number of downstream avonoids, as described previously46. The occurrence of various avones and O- or C-glycosyl avones found here demonstrates the existence of the pathway including glycosyltransferase genes, implicating the genetic and biochemical basis for the formation of diverse avonoids in the maize kernel. Metabolite GWAS thus facilitated characterization of the avonoid metabolic pathway and identication of genes involved in its biosynthesis.

Potential utilization of metabolites. Reliable biomarkers signicantly related to plant phenotypic performance is exceptionally attractive for breeders and plant biologists. Using variables with the highest signicance above an arbitrary cutoff value, a set of candidate biomarkers can be dened47. In the present study, 26 metabolite features signicantly associated with 100-kernel weight were detected in E2 that can explain 72.6% of the phenotypic variance. The most signicant metabolite feature was n1043. For comparison, 17 signicant metabolite features were found in E3, explaining 34.5% of the phenotypic variance. The most signicant metabolite feature is n0486. Two metabolite features (that is, n0956 and n1618) were signicant in both E2 and E3. It is demonstrated that a limited number of metabolite features signicantly correlated with kernel weight (Pr0.05, general stepwise regression, N 335339; Supplementary

Table 3) can be explored as useful markers for plant breeding. Using 100-kernel weight as an example, ve and two QTLs were found from linkage mapping in ZY and BB RIL populations, respectively. Forty-three QTLs of 42 metabolic traits identied in this study colocalized with these seven QTLs found for 100-kernel weight (Supplementary Table 4). More interestingly, of these 42 metabolic traits, two (n1104 and n1266) were signicantly associated with 100-kernel weight according to the general stepwise regression (Pr0.05; Supplementary Tables 3 and 4).

Additionally, the strongest signicant locus associated with n1266, which is also validated by linkage mapping, was exactly located in the region of a 100-kernel weight QTL identied in the ZY RIL population. Sixteen annotated genes were found within the B500-Kb region including the lead candidate gene GRMZM2G066067 (annotated as a UDP-glucosyltransferase), and other genes such as GRMZM2G472651 (Thylakoid assembly4), GRMZM2G366373 (Aux/IAA-transcription factor), GRMZM2G141379 (Zinc

Arginine

ADC CYP98A

OMT

Cou-Agm

Caf-Agm

Fer-Agm N-Fer,N-methoxy-Agm

Di-Fer-put

N-(Cou-O-Hex)-Spd

N-(Caf-O-Hex)-Spd

N-(Fer-O-Hex)-Spd

Agmatine

Putrescine

Spermidine

PHT

AIH

CPA

PAO

(MT)

Sin-Agm

Cou-Put

Caf-Put

Fer-Put

Sin-Put Cou-Spd

Caf-Spd

Fer-Spd

Sin-Spd

CYP98A

OMT

(PHT)

SPDS

CYP98A

OMT

SPMS

CV 86

CYP98A ??

(CCoAOMT1)

Spermine

Figure 2 | Proposed pathway of polyamine conjugates biosynthesis. The common conjugates are indicated in blue and new candidate genes in red (conrmed) and golden (not veried). ADC, arginine decarboxylase; AIH, agmatineiminohydrolase; CPA, N-carbamoylputrescineamidohydrolase; DAO, diamine oxidase; SPDS, spermidinesynthase; SPMS, spermine synthase; PAO, polyamine oxidase; PHT, putrescine: hydroxycinnamoyltransferase; GT, glycosyltransferase; CCoAOMT1, caffeoyl-CoA O- methyltransferase 1. Candidate gene revealed by the association analysis was put in the bracket under the corresponding metabolite.

6 NATURE COMMUNICATIONS | 5:3438 | DOI: 10.1038/ncomms4438 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms4438 ARTICLE

4-Coumaric acid

Tri 4 O-(tg)eth O-hex

3,4,5.-Tri 3,4,5.-Tri O-hex

3,4,5.-Tri O-rha-O-hex

Tri O-hex-O-(rha/pen)

4CL

CHS

CHI

FNS

F3 H

FOMT

UDP73B5, ZF

Tri 4 O-(e/tg)eth Sel der

Que

P1, RSP

Hes O-neo Hes

Cya 3-O-glu

Mal di-O-hex

Api di-C-hex

Api C-hex

Api C-hex-C-pen

Api C-pen-O-(Cou)hex

Api C-pen-O-(Caf)hex

Api di-C-pen

Api C-pen

Chr O-hex

P1, AQU, GRD3, BTF

SAMDC, NMT, ADS1 P1, ATD, HIS

BAM, OMT, DB, ADS2 CPSF, HAD, OMT, RSP, MADS, ELMO

P1, GRD2, IMI, HLY

P1, HLY, HCT, CYP86A35, UGT GRD1, WRKY23, RMVB

P1, CYP81D P1, PHC1A, GRAM

UGT-88A1

Nar

Api

Lut

Chr

Sel

FOMT FOMT

bHLH OBG, TPR

Tri O-hex

UGT-88A1

ABCT, PDHE

P1, PK

Api 7-O-glu

OXY, BX8, PPR

BX8, ZFR, SDR

P1, HLY, GH35, PK, GRD1, DPB

Chr di-O-hex

Chr O-hex-O-rha

Mal O-hex

MethylChr C-hex

Chr C-hex

Chr di-C-hex

Chr C-hex-O-pen

Chr C-hex-O-rha

P1 P1

P1,

P1, PK

Figure 3 | Proposed pathway of avonoid biosynthesis in maize kernel. Candidate genes identied by GWAS are shown in orange, under the corresponding associated metabolites. Api, Apigenin; Chr, chrysoeriol; Lut, Luteolin; cafpen, caffeoylpentoside; couhex, coumaroylhexoside; Cya, Cyanidin; der, derivative; glc, glucose; hes, hesperetin; hex, hexose; MethylChr, Methylchrysoeriol; Mal, Malvidin; pen, pentose; rha, rhamnose; Sel, Selgin; Tri, trincin; 30,40,50-Tri, 30,40,50-tricetin,(ebg)eth, (erythro-b-guaiacylglyceryl)ether; (tbg)eth, (threo-b-guaiacylglyceryl)ether; 4CL, 4-coumarate-CoA ligase; CHS, chalcone synthase; CHI, chalconeisomerase; FNS, avone synthase; F3H, avonoid 3-hydroxylase; FOMT, avonoid O-methyltransferase; bHLH, basic helix-loop-helix (GRMZM2G162382); CPSF, cleavage and polyadenylation specicity factor 73-I(GRMZM2G422649); HAD, haloaciddehalogenase-like hydrolase superfamily(GRMZM2G035651); OMT, O-methyltransferase (GRMZM2G104710); RSP, ribosomal protein (GRMZM2G344279); MADS, MADS-box family protein (GRMZM2G129034); ELMO, ELMO/CED-12 family protein (GRMZM2G031952); BAM, beta-amylase (GRMZM2G069486); DB, DNA-binding (GRMZM2G478370); ADS, AMP-dependent synthetase and ligase (GRMZM2G019746); IMI, plant invertase/pectin methylesterase inhibitor superfamily (GRMZM2G054225); P1, MYB R2R3type transcription factor (GRMZM2G084799); AQU, aquaporin NIP-type (GRMZM2G126582); GRD2, glucose/ribitol dehydrogenase (GRMZM2G170013); GRD3, glucose/ribitol dehydrogenase (GRMZM2G059361); BTF, basic transcription factor(GRMZM2G110116); ATD, acetamidase/formamidasefamily protein (GRMZM2G424857); HIS, histone superfamily protein (GRMZM2G176358); UGT88A1, UDP-glycosyltransferase 88A1 (GRMZM2G122072); ABCT, ABC transporter (GRMZM2G018074); PDHE, erythronate-4-phosphate dehydrogenase family protein (GRMZM2G177982); RSP, 60S ribosomal protein (GRMZM2G344279); OBG, GTP1/OBG family protein (GRMZM2G077632); TPR, tetratricopeptide repeat (TPR)-like superfamily protein (GRMZM2G177072); SDH, succinate dehydrogenase (GRMZM2G134134); ABCB2, ABC transporter group B2 (GRMZM2G156145); PK, pyruvate kinase (GRMZM2G119175); UGT73B5: UDP-glycosyltransferase 73B5 (GRMZM5G888620); ZF, RING/U-box superfamily protein zinc nger (GRMZM2G145104); SAMDC, S-adenosylmethionine decarboxylase proenzyme Precursor (GRMZM2G154397); NMT, histone-lysine N-methyltransferase (GRMZM2G025924); RHC1A, RING-H2 nger C1A (GRMZM2G176028); GRAM, GRAM domain family protein (GRMZM2G106622); HLY, hemolysin-III homologue (GRMZM2G114650);GH35, glycoside hydrolase, family 35 (GRMZM2G153200); GRD1, glucose/ribitol dehydrogenase (GRMZM2G076981); DPB, DNA binding and protein binding (GRMZM2G393471); HCT, hydroxycinnamoyl-CoA shikimate/quinatehydroxycinnamoyltransferase (GRMZM2G156816); CYP86A35, cytochrome P450 family 86, subfamily A, polypeptide 35 (GRMZM2G062151); UGT, UDP glycosyltransferases (GRMZM2G383404); WRKY53, superfamily of transcriptional factors having WRKY and zinc nger domains (GRMZM2G449681); RMVB, regulator of Vps4 activity in the MVB pathway protein (GRMZM2G059590); bx8, benzoxazinone synthesis 8 (GRMZM2G085054); ZFR, zinc nger, RING-CH-type (GRMZM2G358987); SDR, short-chain dehydrogenase/reductase (GRMZM2G000586); OXY, 2OG-Fe(II) oxygenase superfamily (GRMZM5G843555); PPR, PPR repeat domain containing protein (GRMZM2G325019).

nger, C3HC4 type), GRMZM2G112596 (ATPase-like), GRMZM2G043191 (Endonuclease/exonuclease/phosphatase), and so on (Supplementary Fig. 11). Further evaluation and identication of the underlying genes will help to clone the QTL affecting the kernel weight as well as to understand the genetic architecture of complex traits, and thus further enhance the crop-breeding toolbox.

DiscussionPlants are rich in metabolites and it is critical to explore the immense diversity of plant metabolism for the products important to human well being1,48. Metabolites may exert control on growth either by acting as substrates for the

synthesis of cellular components that become limiting under conditions of maximum growth, or by acting as signals regulating growth and development13,49. Many secondary metabolites are involved in biotic and abiotic stress responses. The economic value of maize grain and the very large contribution of maize to the diets of humans and animals make grain chemical composition studies an invaluable research target. The ability to understand quality determinants at the metabolic level, and use this information to boost grain nutrition, is one direct benet of this study. By measuring 983 metabolite features that include 184 metabolites with associated chemical structures in kernels of 702 genotypes, our understanding of natural variation at the metabolite level of maize has largely furthered. More than 80% metabolite features identied in this study exhibited large fold

NATURE COMMUNICATIONS | 5:3438 | DOI: 10.1038/ncomms4438 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications 7

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms4438

change (410) within maize lines, which suggested an interesting direction to explore how the huge quantitative variations regulate plant growth and development.

GWAS has become a popular approach in plant genetic studies owing to the rapid advance of the sequencing technology in recent years50,51. Maize has exceptionally large diversity within species and rapid LD decay51. In our metabolite GWAS high analytical precision and marker density facilitated high-resolution mapping. We can achieve the mapping resolution down to a single gene in most cases in this study even though additional improvements will be possible with larger association panels as well as resolution and structure determination of a larger number of metabolites. Linkage mapping is an excellent tool for the validation of GWAS results. Rapidly developing genotyping platforms such as high-density SNP chip and genotyping by sequencing52 will benet the genotyping of larger panels of genotypes to achieve the identication of causative sequence variants. Availability of gene expression data owing to the RNA sequencing on our association panel also played an important role in validating function of candidate genes and investigating how the alleles work to change the phenotype. Picking and validating candidate genes would be signicantly challenging in some cases based on current genomic annotation of maize. Function annotation of their orthologous genes in other species can be helpful clue to gain novel ndings in maize, as shown by the cases of UGT and TDC in this study. Various approaches or tools are therefore useful and needed to interpret and utilize the GWAS results. Functional link between genetic variants and metabolic traits is relatively evident as suggested by our study, which demonstrates the great potential of combining genetics and metabolomics to dissect the biological mechanisms of maize metabolism. For instance, we updated the PAs and avonoid biosynthetic pathways in this study. Our knowledge of both pathways is now greatly improved by the identication of previously unknown metabolites and candidate genes, including those for metabolic enzymes, transcription factor and transporters. Further studies, including structure conrmation of the selected metabolites and functional validation of additional candidate genes, can now be undertaken. Understanding natural variation at the metabolite level facilitates reconstruction of biosynthetic pathways, which in turn will benet synthetic biology and metabolic engineering of desirable compounds in plants.

Metabolites that are correlated with complex traits like biomass of plants possess great predictive power to be used as biomarkers18,49. A number of metabolite features signicantly associated with kernel trait were identied in this study. Although further validation was required, the combination of GWAS and metabolomics provided an alternative to uncover agronomically important traits, which will enhance the molecular design breeding in maize as well as other crops.

In summary, the combination of multiple technologies, including transcript and metabolite proling, has facilitated candidate gene selection and allowed novel functional and biological insights into the association results. Future genetic studies in conjunction with genomics, transcriptomics, metabolomics and proteomics, as well as precision phenotyping, will help to fully uncover the mechanisms of complex agronomic and biochemical traits, and will lead to an accelerated rate of genetic gain in crop improvement.

Methods

Populations and eld trials. Genetic materials used in this study included a panel of 368 diverse maize inbred lines for GWAS, which was referred to as the association panel23 and two RIL populations, B73/By804 (ref. 53) and Zong3/Yu87-1

(ref. 54) for linkage analysis. Field trials for the association panel were conducted in three sites: Hainan (Sanya, E 109510, N 18250) in 2010, Yunnan (Kunming,

E 102300, N 24250) and Chongqing (E 106500, N 29250) in 2011. These three experiments were hereafter referred to as E1, E2 and E3, respectively. The173 RIL lines from the B73/By804 cross (referred to as BB hereafter) were planted in Henan in 2011. The 161 RILs from the Zong3/Yu87-1 cross (referred to as ZY hereafter) were planted in Yunnan in 2011. All the inbred lines were divided into two groups (temperate and tropical/subtropical) based on pedigree information and planted in one-row plots in an incompletely randomized block design within the group. All lines were self-pollinated and ears of each plot were hand-harvested at their respective physiological maturity, followed by air drying and shelling. For each line, ears from ve plants were harvested at the same maturity and bulked. One hundred kernels of each line were also counted and weighted for the association panel planted in three environments (Sichuan, 2009; Yunnan, 2009 and 2010) and for the two linkage populations planted in three environments (Chongqing, 2011; Hainan, 2011; Henan, 2011), respectively. The blup values of HKW across all environments were used for GWAS and linkage analysis.

Genotyping and RNA sequencing. All 368 lines of the association panel were genotyped using Illumina MaizeSNP50 BeadChip, which contains 56,110 SNP loci55. Ninety-base pair pair-end Illumina RNA sequencing was subsequently performed on the immature seeds of 15 days after pollination for these 368 lines. In all, 1.06 million high-quality SNPs were identied in the whole panel and the expression data for 28,769 genes were also obtained in all the 368 lines. The detailed information was described in the recent studies22,23. Both RIL populations have been genotyped using Illumina GoldenGate BeadChip containing 1,536 SNPs and linkage map was constructed for both populations56.

Sample preparation and metabolite proling. We carried out metabolic proling on mature maize kernels from lines of the association panel (n 368) and

two RIL populations (N 173 and 167, respectively). For each line, 12-well growth

kernels were randomly selected from ve plants and bulked for grinding. The kernels were ground using a mixer mill (MM 400, Retsch) with zirconia beads for2.0 min at 30 Hz. The powder of each genotype was partitioned into two sample sets and stored at 80 C until they were required for extraction. One sample set

was extracted for lipid-soluble metabolites, while the other was for extracting water-soluble metabolites. One hundred milligram of powder and 1 ml absolute methanol, which contained 0.1 mg l Lincomycin and Lidocaine, were used for lipid-soluble metabolites (or 70% methanol for water-soluble metabolites). Samples were extracted overnight at 4 C. After centrifugation at 10,000g for 10 min, 0.4 ml of each extract was combined and lter spun using 0.22-mm lters (ANPEL, Shanghai,

China, http://www.anpel.com.cn/

Web End =http://www.anpel.com.cn/ ) before analysis using an LC-ESI-MS/MS system. The metabolite quantication and annotation is performed by our newly developed method57, which is described in detail in the Supplementary Notes. All the data are reported in detail in the Supplementary Materials, following the recommendations for reporting metabolite data by Fernie et al.58 (see the Supplementary Note 1; Supplementary Data 1 and 2 and Supplementary Fig. 12).

Metabolite identication and annotation. To facilitate the identication/annotation of detected metabolites by our widely targeted metabolomics approach, accurate m/z of each Q1 was obtained, if possible. To this end, extracted ion chromatograms (XICs) of the ESI-QqTOF-MS data for each of Q1 (m/z0.2 Da) of the 983 transitions in the MS2T library were manually evaluated for the presence of the targeted substances by analysing corresponding mass spectra, and the accurate m/z values were obtained. For each of the corresponding accurate m/z, fragmentation pattern was obtained by running the analysis under targeted MS/MS mode using three different collision energies of 10, 20 and 30 eV. The accurate m/z was assigned to the corresponding Q1 if similar fragmentation patterns were obtained between the ESI-Q TRAP-MS/MS and the ESI-QqTOF-MS/MS. Eventually, accurate mass of 245 of Q1 was obtained.

The MS2T library was annotated based on the fragmentation pattern (delivered by ESI-Q TRAP-MS/MS and/or the accurate m/z value delivered by ESI-QqTOFMS/MS) and the retention time of each metabolite. Based on the annotation, commercially available standards were purchased and analysed using the same proling procedure as the extracts. By comparing the m/z values, the retention time and the fragmentation patterns with the standards, 49 metabolites were identied, including amino acids, avonoids, lysoPCs and fatty acid (such as a-linolenic acid), and some phytohormones (see Supplementary Data 2). For the metabolites that couldnt be identied by available standards, peaks in the MS2T library, especially the peaks that have similar fragmentation patterns with the metabolites identied by authentic standards, were used to query the MS/MS spectral data taken from the literature or to search the databases (MassBank, KNApSAcK, HMDB, MoTo DB and METLIN). Best matches were then searched in the Dictionary of Natural products and KEGG for possible structures. In all, 184 metabolites were identied and more than four different pathways were detected in our study.

Statistical analysis of metabolic traits. The mixture of 150 randomly chosen extracts from the association panel was used as a reference control of each measured batch59. One reference control that contains 983 molecular features was

8 NATURE COMMUNICATIONS | 5:3438 | DOI: 10.1038/ncomms4438 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms4438 ARTICLE

placed and measured per 25 genotypes. In our study, the reference control were used for normalization even though two internal standards were added, since we think the internal standards are not the best to reect the change of all metabolite features through the analysis procedure, considering different properties of the large number of metabolite features we analysed. Moreover, each set of 25 samples and each molecular feature were normalized to the average level of reference control that was injected before and after these 25 samples. All procedures were applied after normalization of the metabolite data using the reference control. All the metabolite concentrations were log2-transformed for further analysis. Since only one replication was performed in each experiment, phenotypic variance (Vp)

was partitioned into genotype (Vg), environment (Ve) and the error (Ver) using a SAS proc mixed procedure. Repeatability (R) was then calculated for each metabolite as R Vg/Vp according to Holland et al.60

Genome-wide association studies. Given that the MS system for E1 was different from that used in the other two experiments (Supplementary Note 1), for simplicity, GWAS was independently performed for each metabolic trait obtained in each experiment. We used a compressed mixed linear model61 accounting for the population structure (Q) and familial relationship (K)23 to examine the association between SNPs and metabolic traits. SNPs with a moderate minor allele frequency (MAF45%) in the 368 lines were employed in the association analysis. P value of each SNP was calculated and signicance was dened at a uniform threshold of r1.8 10 6 (P 1/n; n total markers used, which is roughly a Bonferroni

correction). SNP with the lowest P value (that is, lead SNP) and its corresponding gene were reported for each signicant metabolic locus (see SupplementaryData 3). To validate each signicant locus by linkage analysis, the physical position of its lead SNP was compared with the physical region of QTL. For the purpose of evaluating each candidate gene, eQTL analysis was conducted to investigate the regulatory pattern of each gene, and the relationship between its expression level and the corresponding metabolic trait was further investigated.

Linkage mapping. We conducted QTL analysis using Composite Interval Mapping implemented in Windows QTL Cartographer V2.5 (refs 62,63) for metabolites detected in the two RIL populations. Zmap (model 6) with a 10-cM window and an interval-mapping increment of 2 cM were used. To determine a uniform threshold for signicant QTLs 1,000 permutations were used for 100 randomly selected traits, 50 traits from each RIL population. The average LOD value at Po0.05 is 2.88, so we chose a uniform value (LOD 3) as the cutoff. The genomic region in which a

peak of LOD value reached the threshold (LOD 3), and the LOD of the anking

markers was 42.5, was designated as a condence interval.

eQTL mapping. Using the same method as for GWAS, the associations between genome-wide SNPs and the expression level of each metabolic trait-associated gene were investigated. SNPs within a 100-kb region of the lead SNP for each metabolic trait were evaluated for their possible regulatory involvement.

Only genes expressed in 450% of the 368 sequenced lines that had a mean quantication of 410 reads were used in this analysis.

Vector construction and rice transformation. The over-expression vector (pJC034) for rice is constructed from the gateway over-expression vector pH2GW7; 35S promoter of pH2GW7 is replaced by maize ubiquitin promoter, which is more suitable for rice over-expression study. To generate PHT and CCoAOMT1 over-expression constructs, the full-length cDNA of PHT was amplied from maize inbred line B73 by reverse transcription (RT)-PCR. The PCR product was cloned into the gateway vector pDONR207 using the BP enzyme (Invitrogen, Shanghai, China), and then sequenced; the right clones would be used for LR reaction with pJC034 using the LR enzyme (Invitrogen, Shanghai, China). This construct were introduced into japonica rice cultivar ZH11 by Agrobacterium tumefaciens-mediated transformation64.

Expression analyses of the transformed genes. We isolated total RNA from rice and maize leaves using an RNA extraction kit (TRIzol reagent; Invitrogen, Shanghai, China) according to the manufacturers instructions. The rst-strand cDNA was synthesized using MLV (Invitrogen, Shanghai, China) according to the manufacturers protocol. Real-time PCR was performed on an optical 96-well plate in an ABI SteponePlus PCR system (Applied Biosystems, Shanghai, China) by using SYBR Premix reagent F-415 (Thermo scientic, Shanghai, China). The relative expression level of gene PHT and CCoAOMT1 was determined with the rice Actin1 (ref. 65) gene as an internal control. The expression measurements were obtained using the relative quantication method66. For semi-quantitative RT-PCR, reactions were performed in 20-ml volumes with the following protocol: one cycle of 94 C for 5 min and 30 cycles of 94 C for 30 s, 58 C for 30 s and 72 C for 60 s.

Detection of metabolites signicantly associated with 100-kernel weight.

General step-wise regression, implemented by GLMSelect procedure in SAS software67, was used to detect metabolites signicantly associated with 100-kernel

weight investigated in E2 and E3. The probability of marker entering into the model was set at 0.05 for selecting the top metabolites.

Data availability. All data are available as a public resource to aid functional studies and interpretation of GWAS ndings. Data sets including genotypic, phenotypic, expression and the mass spec data of each line and detailed information of called SNPs from RNA-sequencing result can be viewed and downloaded from the website http://www.maizego.org/Resources.html

Web End =http://www.maizego.org/Resources.html .

References

1. DeLuca, V. et al. Mining the biodiversity of plants: a revolution in the making. Science 336, 16581661 (2012).

2. Milo, R. & Last, R. L. Achieving diversity in the face of constraints: lessons from metabolism. Science 336, 16631667 (2012).

3. Kettunen, J. et al. Genome-wide association study identies multiple loci inuencing human serum metabolite levels. Nat. Genet. 44, 269276 (2012).

4. Grifn, J. L. Understanding mouse models of disease through metabolomics. Curr. Opin. Chem. Biol. 10, 309315 (2006).

5. Fernie, A. R. & Schauer, N. Metabolomics-assisted breeding: a viable option for crop improvement? Trends Genet. 25, 3948 (2009).

6. Riedelsheimer, C. et al. Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nat. Genet. 44, 217220 (2012).

7. Riedelsheimer, C. et al. Genome-wide association mapping of leaf metabolic proles for dissecting complex traits in maize. Proc. Natl Acad. Sci. USA 109, 88728877 (2012).

8. Shen, M. et al. Leveraging non-targeted metabolomic proling via statistical genomics. PLoS One 8, e57667 (2013).

9. Gore, M. A. et al. A rst-generation haplotype map of maize. Science 326, 11151117 (2009).

10. Huang, X. H. & Han, B. A crop of maize variants. Nat. Genet. 44, 734735 (2012).11. Fiehn, O. et al. Metabolite proling for plant functional genomics. Nature 18, 11571161 (2000).

12. Suhre, K. & Gieger, C. Genetic variation in metabolic phenotypes: study designs and applications. Nat. Rev. Genet. 13, 759769 (2012).

13. Meyer, R. C. et al. The metabolic signature related to high plant growth rate in Arabidopsis thaliana. Proc. Natl Acad. Sci. USA 104, 47594764 (2007).

14. Gieger, C. et al. Genetics meets metabolomics: a genome-wide association study of metabolite proles in human serum. PLoS. Genet. 4, e1000282 (2008).

15. Chan, E. K. F. et al. The complex genetic architecture of the metabolome. PLoS. Genet. 4, e1001198 (2010).

16. Keurentjes, J. J. B. et al. The genetics of plant metabolism. Nat. Genet. 38, 842849 (2006).

17. Matsuda, F. et al. Dissection of genotype-phenotype associations in rice grains using metabolome quantitative trait loci analysis. Plant J.l 70, 624636 (2012).

18. Schauer, N. et al. Comprehensive metabolic proling and phenotyping of interspecic introgression lines for tomato improvement. Nat. Biotechnol. 24, 447454 (2006).

19. Chan, E. K. F., Rowe, H. C. & Kliebenstein, D. J. Understanding the evolution of defense metabolites in Arabidopsis thaliana using genome-wide association mapping. Genetics (2010; 185, 9911007.

20. Civelek, M. & Lusis, A. J. Systems genetics approaches to understand complex traits. Nat. Rev. Genet. 15, 3448 (2014).

21. Chan, E. K. F., Rowe, H. C., Corwin, J. A., Joseph, B. & Kliebenstein, D. J. Combining genome-wide association mapping and transcriptional networks to identify novel genes controlling glucosinolates in Arabidopsis thaliana. PLoS Biol. 9, e1001125 (2011).

22. Fu, J. J. et al. RNA sequencing reveals the complex regulatory network in the maize kernel. Nat. Commun. 4, 2832 (2013).

23. Li, H. et al. Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat. Genet. 45, 4350 (2013).

24. Tohge, T. & Fernie, A. R. Combining genetic diversity, informatics and metabolomics to facilitate annotation of plant gene function. Nat. Protoc. 5, 12101227 (2010).

25. Zhong, R. et al. Essential role of caffeoyl coenzyme A O-methyltransferase in lignin biosynthesis in woody poplar plants. Plant Physiol. 124, 563578 (2000).

26. Do, C. T. et al. Both caffeoyl Coenzyme A 3-O-methyltransferase 1 and caffeic acid O-methyltransferase 1 are involved in redundant functions for lignin, avonoids and sinapoyl malate biosynthesis in Arabidopsis. Planta 226, 11171129 (2007).

27. Lin, L. J., Tai, S. S., Peng, C. C. & Tzen, J. T. Steroleosin, a sterol-binding dehydrogenase in seed oil bodies. Plant Physiol. 128, 12001211 (2002).

28. Moreau, R. A., Nunez, A. & Singh, V. Diferuloylputrescine and p-coumaroylferuloylputrescine, abundant polyamine conjugates in lipid extracts of maize kernels. Lipids 36, 839844 (2001).

29. Christie, P. J., Alfenito, M. R. & Walbot, V. Impact of low-temperature stress on general phenylpropanoid and anthocyaninpathways: enhancement of transcript abundance and anthocyanin pigmentation in maize seedlings. Planta 194, 541549 (1994).

NATURE COMMUNICATIONS | 5:3438 | DOI: 10.1038/ncomms4438 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications 9

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms4438

30. Brazier-Hicks, M. et al. The C-glycosylation of avonoids in cereals. J. Biol. Chem. 284, 1792617934 (2009).

31. Sekhon, R. S. et al. Genome-wide atlas of transcription during maize development. Plant J. 66, 553563 (2011).

32. Blackmore, S., Wortley, A. H., Skvarla, J. J. & Rowley, J. R. Pollen wall development in owering plants. New Phytol. 174, 483498 (2007).

33. Luo, J. et al. A novel polyamine acyltransferase responsible for the accumulation of Spermidine conjugates in Arabidopsis seed. Plant Cell 21, 318333 (2009).34. Martin-Tanguy, J. The occurrence and possible function of hydroxycinnamoyl acid amides in plants. Plant Growth Regul. 3, 381399 (1985).

35. Back, K. et al. Cloning and characterization of a hydroxycinnamoyl-CoA: tyramine N-(hydroxycinnamoyl) transferase induced in response to UV-C and wounding from Capsicum annuum. Plant Cell Physiol. 42, 475481 (2001).

36. Goyal, M. & Asthir, B. Polyamine catabolism inuences antioxidative defense mechanism in shoots and roots of ve wheat genotypes under high temperature stress. Plant Growth Regul. 60, 1325 (2010).

37. Walters, D. Resistance to plant pathogens: possible roles for free polyamines and polyamine catabolism. New Phytol. 159, 109115 (2003).

38. Koes, R., Verweij, W. & Quattrocchio, F. Flavonoids: a colorful model for the regulation and evolution of biochemical pathways. Trends Plant Sci. 10, 236242 (2005).

39. Burhenne, K., Kristensen, B. K. & Rasmussen, S. K. A new class of N-hydroxycinnamoyltransferases. Purication, cloning, and expression of a barley agmatinecoumaroyltransferase (EC 2.3.1.64). J. Biol. Chem. 278, 1391913927 (2003).

40. Onkokesung, N. et al. MYB8 controls induciblephenolamidelevels by activating three novelhydroxycinnamoyl-coenzyme A: polyamine transferases in nicotianaattenuata. Plant Physiol. 158, 389407 (2012).

41. Bassard, J. E., Ullmann, P., Bernier, F. & Werck-Reichhart, D. Phenolamides: bridging polyamines to the phenolic metabolism. Phytochemistry 71, 18081824 (2010).

42. Fellenberg, C., Boettcher, C. & Vogt, T. Phenylpropanoid polyamine conjugate biosynthesis in Arabidopsis thaliana ower buds. Phytochemistry 70, 13921400 (2009).

43. Muroi, A. et al. Accumulation of hydroxycinnamic acid amides induced by pathogen infection and identication of agmatinecoumaroyltransferase in Arabidopsis thaliana. Planta 230, 517527 (2009).

44. Grotewold, E., Athma, P. & Peterson, T. Alternatively spliced products of the maize P gene encode proteins with homology to the DNA -binding domain of myb-like transcription factors. Proc. Natl Acad. Sci. USA 88, 45874591 (1991).

45. Morohashi, K. et al. A Genome-wide regulatory framework identies maize pericarp Color1 controlled genes. Plant Cell 24, 27452764 (2012).

46. Wang, Y., Chen, S. & Yu, O. Metabolic engineering of avonoids in plants and microorganisms. Appl. Microbiol. Biotechnol. 91, 949956 (2011).

47. Bijlsma, S. et al. Large-scale human metabolomics studies: A strategy for data (pre-) processing and validation. Anal. Chem. 78, 567574 (2006).

48. Saito, K. & Matsuda, F. Metabolomics for functional genomics, systems biology,and biotechnology. Annu. Rev. Plant Biol. 61, 463489 (2010).

49. Sulpice, R. et al. Starch as a major integrator in the regulation ofplant growth. Proc. Natl Acad. Sci. USA 106, 1034810353 (2009).

50. Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627631 (2010).

51. Yan, J. B., Warburton, M. & Crouch, J. Association mapping for enhancing maize genetic improvement. Crop Sci. 51, 433449 (2011).

52. Elshire, R. J. et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6, e19379 (2011).

53. Chander, S. et al. Using molecular markers to identify two major loci controlling carotenoid contents in maize grain. Theor. Appl. Genet. 116, 223233 (2008).

54. Ma, X. Q. et al. Epistatic interaction is an important genetic basis of grain yield and its components in maize. Mol. Breeding 20, 4151 (2007).

55. Ganal, M. W. et al. A large maize (Zea mays L.) SNP genotyping array: development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome. PLoS One 6, e28334 (2011).

56. Pan, Q., Ali, F., Yang, X., Li, J. & Yan, J. Exploring the genetic characteristics of two recombinant inbred line populations via high-density SNP markers in maize. PLoS One 7, e52777 (2012).

57. Chen, W. et al. A novel integrated method for large-scale detection, identication and quantication of widely-targeted metabolites: application in study of rice metabolomics. Mol. Plant 6, 17691780 (2013).

58. Fernie, A. R. et al. Recommendations for reporting metabolite data. Plant Cell 23, 24772482 (2011).

59. Roessner, U. et al. Metabolic proling allows comprehensive phenotyping of genetically or environmentally modied plant systems. Plant Cell 13, 1129 (2001).

60. Holland, J. B., Nyquist, W. E. & Cervantes-Martinez, C. T. Estimating and interpreting heritability for plant breeding: anupdate. Plant Breed. Rev. 22, 9111 (2003).

61. Zhang, Z. W. et al. Mixed linear model approach adapted for genome-wide association studies. Nat. Genet. 42, 355360 (2010).

62. Zeng, Z. B. Precision mapping of quantitative trait loci. Genetics 136, 14571468 (1994).

63. Wang, S., Basten, C. J. & Zeng, Z. Windows QTL Cartographer 2.5 (North Carolina State University, 2005).

64. Lin, Y. J. & Zhang, Q. Optimising the tissue culture conditions for high efciency transformation of indica rice. Plant Cell Rep. 23, 540547 (2005).

65. Hou, X. et al. A homologue of human ski-interacting protein in rice positively regulates cell viability and stress tolerance." Proc. Natl Acad. Sci. USA 106, 64106415 (2009).

66. Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2 DDCT method. Methods 25, 402408

(2001).67. Freund, R. J. & Littell, R. C. SAS system for regression 1986 edn (SAS Institute Inc., 1986).

Acknowledgements

We appreciate the helpful comments on the manuscript made by Dr Alisdair Fernie, Dr Takayuki Tohge and Dr Marilyn Warburton. This research was supported by the National Hi-Tech Research and Development Program of China (2012AA10A307), the National Program on Key Basic Research Project of China (2013CB127000, 2014CB138202) and the National Natural Science Foundation of China (31101156, 31201220).

Author contributions

J.Y. and J. Luo designed and supervised this study. W.W., D.L., X.L., Y.G. and W.L. performed the experiments. W.W., D.L., X.L., H. Li, J. Liu, H.Liu and W.C. performed the data analysis. W.W., J. Luo and J.Y. prepared the manuscript with inputs from other authors.

Additional information

Accession Codes: RNA sequencing data of 368 maize inbred lines have been deposited in the GenBank Sequence Read Archive (SRA) under the accession code SRP026161.

Supplementary Information accompanies this paper at http://www.nature.com/naturecommunications

Web End =http://www.nature.com/ http://www.nature.com/naturecommunications

Web End =naturecommunications

Competing nancial interests: The authors declare no competing nancial interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

Web End =http://npg.nature.com/ http://npg.nature.com/reprintsandpermissions/

Web End =reprintsandpermissions/

How to cite this article: Wen, W. et al. Metabolome-based genome-wide association study of maize kernel leads to novel biochemical insights. Nat. Commun. 5:3438doi: 10.1038/ncomms4438 (2014).

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. The images or other third party material in this article are included in the articles Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/

Web End =http:// http://creativecommons.org/licenses/by-nc-sa/3.0/

Web End =creativecommons.org/licenses/by-nc-sa/3.0/

10 NATURE COMMUNICATIONS | 5:3438 | DOI: 10.1038/ncomms4438 | http://www.nature.com/naturecommunications

Web End =www.nature.com/naturecommunications

Word count: 9845

Show less

Abstract

Translate

Plants produce a variety of metabolites that have a critical role in growth and development. Here we present a comprehensive study of maize metabolism, combining genetic, metabolite and expression profiling methodologies to dissect the genetic basis of metabolic diversity in maize kernels. We quantify 983 metabolite features in 702 maize genotypes planted at multiple locations. We identify 1,459 significant locus-trait associations (P≤1.8 × 10^-6 ) across three environments through metabolite-based genome-wide association mapping. Most (58.5%) of the identified loci are supported by expression QTLs, and some (14.7%) are validated through linkage mapping. Re-sequencing and candidate gene association analysis identifies potential causal variants for five candidate genes involved in metabolic traits. Two of these genes were further validated by mutant and transgenic analysis. Metabolite features associated with kernel weight could be used as biomarkers to facilitate genetic improvement of maize.

Details

Title

Metabolome-based genome-wide association study of maize kernel leads to novel biochemical insights

Author

Wen, Weiwei; Li, Dong; Li, Xiang; Gao, Yanqiang; Li, Wenqiang; Li, Huihui; Liu, Jie; Liu, Haijun; Chen, Wei; Luo, Jie; Yan, Jianbing

Pages

3438

Publication year

2014

Publication date

Mar 2014

Publisher

Nature Publishing Group

e-ISSN

20411723

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1038/ncomms4438

ProQuest document ID

1507806471

Metabolome-based genome-wide association study of maize kernel leads to novel biochemical insights

Jump to:

Full text

Abstract

Details

Suggested sources